Boxplots in matplotlib: Markers and outliers
I have some questions about boxplots in matplotlib:
Question A. What do the markers that I highlighted below with Q1, Q2, and Q3 represent? I believe Q1 is the maximum and Q3 marks the outliers, but what is Q2?
Question B. How does matplotlib identify outliers (i.e., how does it know that they are not the true max and min values)?
Here's a graphic that illustrates the components of the box from a stats.stackexchange answer. Note that k=1.5 if you don't supply the whis keyword in Pandas.
The boxplot function in Pandas is a wrapper for matplotlib.pyplot.boxplot. The matplotlib docs explain the components of the boxes in detail:
Question A:
The box extends from the lower to upper quartile values of the data, with a line at the median.
That is, a quarter of the input data values lies below the box, a quarter lies in each part of the box, and the remaining quarter lies above the box.
Question B:
whis : float, sequence, or string (default = 1.5)
As a float, determines the reach of the whiskers beyond the first and third quartiles. In other words, where IQR is the interquartile range (Q3-Q1), the upper whisker will extend to the last datum less than Q3 + whis*IQR. Similarly, the lower whisker will extend to the first datum greater than Q1 - whis*IQR. Beyond the whiskers, data are considered outliers and are plotted as individual points.
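To make that rule concrete, here is a rough sketch in plain Python of how the whisker ends and outliers fall out of Q1, Q3, and whis. It is not matplotlib's own code: the function name whisker_summary and the sample data are made up for illustration, and it assumes quartiles are computed by linear interpolation between data points (the "inclusive" method of statistics.quantiles), which should agree with NumPy's default percentile method.

```python
import statistics


def whisker_summary(data, whis=1.5):
    """Hypothetical helper: quartiles, whisker ends, and outliers for one sample."""
    values = sorted(data)
    # Assumption: linear interpolation between data points for the quartiles.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - whis * iqr
    high_fence = q3 + whis * iqr

    # Whiskers stop at the most extreme data points that are still inside the
    # fences; they never end inside the box itself.
    whisker_low = min(q1, min(v for v in values if v >= low_fence))
    whisker_high = max(q3, max(v for v in values if v <= high_fence))

    # Everything past the whisker ends is drawn as an individual point.
    outliers = [v for v in values if v < whisker_low or v > whisker_high]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whiskers": (whisker_low, whisker_high),
        "outliers": outliers,
    }


sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]  # made-up data with one extreme value
summary = whisker_summary(sample)
print(summary)
```

With this sample, the upper fence is Q3 + 1.5*IQR, and 30 lies beyond it, so the upper whisker stops at 9 and 30 is reported as an outlier rather than as the maximum.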
Matplotlib (and Pandas) also give you several options to change this default definition of the whiskers:
Set this to an unreasonably high value to force the whiskers to show the min and max values. Alternatively, set this to an ascending sequence of percentiles (e.g., [5, 95]) to set the whiskers at specific percentiles of the data. Finally, whis can be the string 'range' to force the whiskers to the min and max of the data.
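A quick way to see the effect of these options is to draw the same data three times and count the points that end up plotted as outliers. This is only a sketch with made-up data; it uses the Agg backend so it runs without a display. It deliberately skips the 'range' string, since I believe newer matplotlib releases no longer accept it, and uses a very large float instead.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; no window is opened
import matplotlib.pyplot as plt

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]  # made-up data with one extreme value

fig, axes = plt.subplots(1, 3, sharey=True)

# Default: whiskers reach at most 1.5 * IQR past the quartiles.
default_box = axes[0].boxplot(data)
# Whiskers placed using the 5th and 95th percentiles of the data.
percentile_box = axes[1].boxplot(data, whis=[5, 95])
# Unreasonably high float: whiskers reach the min and max, so no outliers.
full_range_box = axes[2].boxplot(data, whis=1000)

# boxplot returns a dict of artists; "fliers" holds the outlier markers.
flier_counts = [
    len(box["fliers"][0].get_ydata())
    for box in (default_box, percentile_box, full_range_box)
]
print(flier_counts)
```

The same whis argument can be passed through the Pandas boxplot wrapper, since it forwards to matplotlib.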