Measures of Position



Overview of measures of position in samples or distributions.

In descriptive statistics, measures of position are a way to know where a certain data point or range falls in a sample or population distribution. Most of them are used to summarize the data descriptively while staying relatively insensitive to the influence of extreme observations, known as outliers.


Quantiles


Quantiles are cut points that divide the range of the data into contiguous intervals with equal probabilities. In this way, they give rise to q-quantiles, which split the data into q consecutive subsets; the q - 1 cut points mark the boundaries between those subsets.

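Some q-quantiles have special names. Among them are the median (2-quantile), quartiles (4-quantiles), deciles (10-quantiles), and percentiles (100-quantiles).

[Figure: measures of position, quartiles]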

For example, given the list of 5 numbers: 5, 87, 45, 32, 1

The first step is to sort all the elements: 1, 5, 32, 45, 87

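Finally, we can get the quartile points. Using the position formula $p = q \cdot (n - 1)$ described in the next section, they fall exactly on elements of the sorted list: $Q_1 = 5$, $Q_2 = 32$ (the median), and $Q_3 = 45$.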

Notice that we can also specify a quantile by its relative position expressed as a percentage; for example, the first quartile is the 25th percentile.
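As a quick check of the example above, the sketch below uses Python's standard `statistics.quantiles`. The choice of `method="inclusive"` is an assumption made here so that the cut points are interpolated over positions 0 to n - 1; other conventions can give different values for small samples.

```python
from statistics import quantiles

data = [5, 87, 45, 32, 1]  # the five-number example; quantiles() sorts a copy internally

# n=4 asks for the 3 cut points that split the data into 4 groups (quartiles).
quartiles = quantiles(data, n=4, method="inclusive")
print(quartiles)  # [5.0, 32.0, 45.0]

# n=100 asks for the 99 percentile cut points; index k - 1 holds the k-th percentile.
percentiles = quantiles(data, n=100, method="inclusive")
print(percentiles[24], percentiles[49], percentiles[74])  # 5.0 32.0 45.0
```

The 25th, 50th, and 75th percentiles coincide with the three quartiles, which is the sense in which a quantile can be named by a percentage.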

Number of set elements


As with measures of central tendency, the number of elements in the set matters here. Take the median as an example: if the data has an odd number of elements, the median is the middle element (the $\frac{n+1}{2}$th element). If the data has an even number of elements, it is the mean of the two central elements (the $\frac{n}{2}$th and the $\left[ \frac{n}{2} + 1 \right]$th).
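The odd/even rule can be sketched in a few lines of Python. The function name `median` is chosen here for illustration; the standard library offers `statistics.median` for real use.

```python
def median(values):
    """Median of a non-empty sequence, handling odd and even sizes."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty sequence is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the single middle element (1-based position (n + 1) / 2).
        return ordered[mid]
    # Even n: mean of the two central elements (1-based positions n/2 and n/2 + 1).
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([5, 87, 45, 32, 1]))      # 32
print(median([5, 87, 45, 32, 1, 38]))  # 35.0
```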

[Figure: median as a measure of central location]

However, depending on the number of elements, the point for our measure of position does not always fall exactly in the middle of two elements. Sometimes it falls closer to one than to the other, so we have to interpolate between their values to find a weighted result. First, let's use a quantile value $q$, which ranges from 0 to 1.

Having the quantile value, we can find the position value $p$ as:

$$ \large p = q \cdot (n - 1) $$

where $n$ is the number of elements in the set and positions are counted from 0, so $q = 0$ maps to the first element and $q = 1$ to the last.

Considering that our value $p$ falls between two elements with indices $a$ and $b$ (that is, $a = \lfloor p \rfloor$ and $b = a + 1$), the interpolated value $I$ is:

$$ \large t = p - a \quad ; \quad I = (1 - t) \cdot v_a + t \cdot v_b $$

where $v_a$ and $v_b$ are the values of the elements $a$ and $b$, respectively.
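A minimal Python sketch of this interpolation follows. The function name `interpolated_quantile` is hypothetical, and the weight is taken as the fractional part of the position ($t = p - a$), so that $t = 0$ returns $v_a$ and $t$ close to 1 approaches $v_b$.

```python
import math


def interpolated_quantile(values, q):
    """Quantile at q in [0, 1] by linear interpolation between neighbours."""
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    n = len(ordered)
    p = q * (n - 1)        # fractional zero-based position
    a = math.floor(p)      # index of the element at or below p
    b = min(a + 1, n - 1)  # index of the next element, clamped at the end
    t = p - a              # how far p lies from a towards b
    return (1 - t) * ordered[a] + t * ordered[b]


print(interpolated_quantile([5, 87, 45, 32, 1], 0.5))    # 32.0
print(interpolated_quantile([5, 87, 45, 32, 1], 0.875))  # 66.0
```

With five elements, $q = 0.875$ gives $p = 3.5$, halfway between 45 and 87, hence 66.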


For example, given the list of 6 numbers: 5, 87, 45, 32, 1, 38

The first step is to sort all the elements: 1, 5, 32, 38, 45, 87

If we want to calculate the first quartile ($Q_1$, with $q = 0.25$), the position is calculated as:

$$ \large \begin{align} p &= q \cdot (n - 1) \\ &= 0.25 \cdot (6 -1) \\ &= 0.25 \cdot 5 \\ p &= 1.25 \end{align} $$

Having $p = 1.25$, we already know that the point lies between the elements with zero-based indices 1 ($a$) and 2 ($b$), whose values are 5 and 32, respectively. Thus:

$$ \large \begin{align} t &= p - a \\ &= 1.25 - 1 \\ t &= 0.25 \end{align} $$

$$ \large \begin{align} I &= (1 - t) \cdot v_a + t \cdot v_b \\ &= (1 - 0.25) \cdot 5 + 0.25 \cdot 32 \\ &= 0.75 \cdot 5 + 0.25 \cdot 32 \\ &= 3.75 + 8 \\ I &= 11.75 \end{align} $$
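As a further check on the same sorted list, the third quartile ($Q_3$, with $q = 0.75$) can be worked out with the same steps, taking $t = p - a$ as the fractional part of the position. Here $p$ falls between zero-based indices 3 and 4, whose values are 38 and 45:

$$ \large \begin{align} p &= 0.75 \cdot (6 - 1) = 3.75 \\ t &= 3.75 - 3 = 0.75 \\ I &= (1 - 0.75) \cdot 38 + 0.75 \cdot 45 \\ &= 9.5 + 33.75 \\ I &= 43.25 \end{align} $$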

InterQuartile Range and Outliers


The InterQuartile Range (or IQR) tells us how spread out the “middle fifty” percent of a data set is. It is the difference between the third and the first quartiles:

$$ \large IQR = Q_3 - Q_1 $$
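A minimal Python sketch of the IQR, using the standard `statistics.quantiles` with `method="inclusive"` (an assumption chosen here because it interpolates over positions 0 to n - 1). The 1.5 × IQR cutoff for flagging outliers is a widely used convention rather than part of the definition above, and the second sample, with the extra value 200, is hypothetical.

```python
from statistics import quantiles


def iqr_summary(values, k=1.5):
    """Return (Q1, Q3, IQR, outliers) for a sample.

    k=1.5 is the conventional fence multiplier; it is an assumption, not a rule.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    # Anything outside the fences is flagged as a potential outlier.
    outliers = [x for x in values if x < lower_fence or x > upper_fence]
    return q1, q3, iqr, outliers


print(iqr_summary([5, 87, 45, 32, 1, 38]))       # (11.75, 43.25, 31.5, [])
print(iqr_summary([5, 87, 45, 32, 1, 38, 200]))  # (18.5, 66.0, 47.5, [200])
```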

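Despite being a measure of dispersion, this range also underpins other useful summaries of the data, such as the InterQuartile Mean (or IQM), the mean of the values lying between the first and third quartiles. These measures are effective because they limit the influence of outliers, which are extreme observations that can distort our analysis. For sorted values $x_i$ and $n$ divisible by 4: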

$$ \large IQM = \frac{2}{n} \sum_{i=\frac{n}{4}+1}^{\frac{3n}{4}} x_i $$
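The IQM formula can be sketched in Python as follows. The function name `interquartile_mean` and the sample data are hypothetical, and this version only handles sample sizes that are a multiple of 4, which is what the index bounds in the formula require.

```python
def interquartile_mean(values):
    """IQM of a sample whose size is a multiple of 4."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0 or n % 4 != 0:
        raise ValueError("this sketch needs a non-empty sample whose size is a multiple of 4")
    # 1-based indices n/4 + 1 .. 3n/4 correspond to the slice [n/4 : 3n/4].
    middle = ordered[n // 4 : 3 * n // 4]
    return 2 / n * sum(middle)


sample = [100, 4, 7, 1, 5, 9, 3, 6]  # hypothetical data with one extreme value
print(interquartile_mean(sample))     # 5.5
print(sum(sample) / len(sample))      # 16.875
```

The plain mean is pulled up by the single value 100, while the IQM only averages the middle half (4, 5, 6, 7).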

Box Plot


A box plot is a graphical representation of our measures of position. Its anatomy shows all the important values covered here:

[Figure: anatomy of a box plot]

A very important point about the box plot concerns its minimum and maximum values. They don't represent the extreme values of the sample ($q=0$ and $q=1$). Instead, they are calculated as the ends of the whiskers: