Spread Of A Data Set

What are measures of spread?

Measures of spread describe how similar or varied the set of observed values is for a particular variable (data item).

Measures of spread include the range, quartiles and the interquartile range, variance and standard deviation.

When can we measure spread?

The spread of the values can be measured for quantitative data, as the variables are numeric and can be arranged into a logical order with a low end value and a high end value.

Why do we measure spread?

Summarising the dataset can help us understand the data, especially when the dataset is large. As discussed in the Measures of Central Tendency page, the mode, median, and mean summarise the data into a single value that is typical or representative of all the values in the dataset, but this is only part of the 'picture' that summarises a dataset. Measures of spread summarise the data in a way that shows how scattered the values are and how much they differ from the mean value.

For example:

Dataset A	Dataset B
4, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 8	1, 2, 3, 4, 5, 6, 6, 7, 8, 9, 10, 11

The mode (most frequent value), median (middle value*) and mean (arithmetic average) of both datasets is 6.
(*Note: the median of an even-numbered dataset is calculated by taking the mean of the middle two observations.)

If we only looked at the measures of central tendency, we might assume that the datasets are the same. However, if we look at the spread of the values, we can see that Dataset B is more dispersed than Dataset A. Used together, the measures of central tendency and measures of spread help us to better understand the data.
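The claim that both datasets share the same mode, median and mean can be checked with a short sketch. It assumes Python's standard `statistics` module; the variable names `dataset_a` and `dataset_b` are illustrative and not part of the original page.

```python
from statistics import mean, median, mode

# The two example datasets from the table above.
dataset_a = [4, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 8]
dataset_b = [1, 2, 3, 4, 5, 6, 6, 7, 8, 9, 10, 11]

for name, values in (("Dataset A", dataset_a), ("Dataset B", dataset_b)):
    # For an even number of values, median() averages the two middle values.
    print(name, "mode:", mode(values),
          "median:", median(values),
          "mean:", mean(values))
```

All three measures come out as 6 for both datasets, so the measures of central tendency alone cannot tell the two apart.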

What does each measure of spread tell us?

The range is the difference between the smallest value and the largest value in a dataset.

Calculating the Range

Dataset A

4, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 8

The range is 4, the difference between the highest value (8) and the lowest value (4).

Dataset B

1, 2, 3, 4, 5, 6, 6, 7, 8, 9, 10, 11

The range is 10, the difference between the highest value (11) and the lowest value (1).

[Figure: number lines from 0 to 13 showing the values of Dataset A and Dataset B]

On a number line, you can see that the range of values for Dataset B is larger than that of Dataset A.
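The range calculation amounts to a single subtraction. The following is a minimal sketch; the helper name `value_range` is hypothetical and chosen only for illustration.

```python
dataset_a = [4, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 8]
dataset_b = [1, 2, 3, 4, 5, 6, 6, 7, 8, 9, 10, 11]

def value_range(values):
    """Return the difference between the largest and smallest value."""
    return max(values) - min(values)

range_a = value_range(dataset_a)
range_b = value_range(dataset_b)
print("Range of Dataset A:", range_a)
print("Range of Dataset B:", range_b)
```

Because the range depends only on the two extreme values, a single unusually large or small observation changes it directly.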

Quartiles divide an ordered dataset into four equal parts, and refer to the values of the point between the quarters. A dataset may also be divided into quintiles (five equal parts) or deciles (ten equal parts).

[Figure: Quartiles. Each quarter of the ordered dataset contains 25% of values.]

The lower quartile (Q1) is the point between the lowest 25% of values and the highest 75% of values. It is also called the 25th percentile. The second quartile (Q2) is the middle of the dataset. It is also called the 50th percentile, or the median. The upper quartile (Q3) is the point between the lowest 75% and highest 25% of values. It is also called the 75th percentile.

Calculating Quartiles

Dataset A
4	5	5	Q1	5	6	6	Q2	6	6	7	Q3	7	7	8

As the quartile point falls between two values, the mean (average) of those values is the quartile value:
Q1 = (5+5) / 2 = 5
Q2 = (6+6) / 2 = 6
Q3 = (7+7) / 2 = 7

Dataset B
1	2	3	Q1	4	5	6	Q2	6	7	8	Q3	9	10	11

As the quartile point falls between two values, the mean (average) of those values is the quartile value:
Q1 = (3+4) / 2 = 3.5
Q2 = (6+6) / 2 = 6
Q3 = (8+9) / 2 = 8.5
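The quartile calculation above can be sketched by splitting the ordered data into a lower and an upper half and taking the median of each. The function name `quartiles` is hypothetical, and leaving the middle value out of both halves when the count is odd is an assumption, since the page only works through even-sized datasets. Other quartile conventions exist (for example, the interpolation used by `statistics.quantiles`) and can give slightly different values.

```python
from statistics import median

dataset_a = [4, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 8]
dataset_b = [1, 2, 3, 4, 5, 6, 6, 7, 8, 9, 10, 11]

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method.

    Assumes at least two values. For an odd count, the middle value
    is left out of both halves (an assumption, not from the source).
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]    # lowest half of the values
    upper = ordered[-half:]   # highest half of the values
    return median(lower), median(ordered), median(upper)

print("Dataset A quartiles:", quartiles(dataset_a))
print("Dataset B quartiles:", quartiles(dataset_b))
```

With twelve values, each half holds six values, so every quartile is the mean of two neighbouring observations, as in the worked example.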

The interquartile range (IQR) is the difference between the upper (Q3) and lower (Q1) quartiles, and describes the middle 50% of values when ordered from lowest to highest. The IQR is often seen as a better measure of spread than the range as it is not affected by outliers.

[Figure: Interquartile Range. Each quarter of the ordered dataset contains 25% of values.]

Calculating the Interquartile Range

The IQR for Dataset A is 2
IQR = Q3 - Q1
= 7 - 5
= 2

The IQR for Dataset B is 5
IQR = Q3 - Q1
= 8.5 - 3.5
= 5
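To illustrate why the IQR is described as unaffected by outliers, the sketch below compares the range and the IQR of Dataset B with a modified copy. The copy, in which the largest value is replaced by 100, is a made-up illustration and not data from the original page. The helper `iqr` is hypothetical and uses the same median-of-halves quartiles as the worked example.

```python
from statistics import median

def iqr(values):
    """Interquartile range (Q3 - Q1) using median-of-halves quartiles."""
    ordered = sorted(values)
    half = len(ordered) // 2
    return median(ordered[-half:]) - median(ordered[:half])

dataset_b = [1, 2, 3, 4, 5, 6, 6, 7, 8, 9, 10, 11]
# Hypothetical variant: the largest value replaced by an extreme outlier.
with_outlier = dataset_b[:-1] + [100]

for name, values in (("Dataset B", dataset_b), ("With outlier", with_outlier)):
    print(name, "range:", max(values) - min(values), "IQR:", iqr(values))
```

The range jumps from 10 to 99 while the IQR stays at 5, because the quartiles depend only on the values near the middle of each half.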

The variance and the standard deviation are measures of the spread of the data around the mean. They summarise how close each observed data value is to the mean value. In datasets with a small spread all values are very close to the mean, resulting in a small variance and standard deviation. Where a dataset is more dispersed, values are spread further away from the mean, leading to a larger variance and standard deviation.

The smaller the variance and standard deviation, the more the mean value is indicative of the whole dataset. Therefore, if all values of a dataset are the same, the standard deviation and variance are zero.

The standard deviation of a normal distribution enables us to calculate confidence intervals. In a normal distribution, about 68% of the values are within one standard deviation either side of the mean and about 95% of the values are within two standard deviations of the mean.

The population variance σ² (pronounced sigma squared) of a discrete set of numbers is expressed by the following formula:

$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$$

where:
X_i represents the ith unit, starting from the first observation to the last
μ represents the population mean
N represents the number of units in the population

The variance of a sample s² (pronounced s squared) is expressed by a slightly different formula:

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

where:
x_i represents the ith unit, starting from the first observation to the last
x̅ represents the sample mean
n represents the number of units in the sample

The standard deviation is the square root of the variance. The standard deviation for a population is represented by σ, and the standard deviation for a sample is represented by s.
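The difference between the population and sample versions can be seen with Python's standard `statistics` module, where `pvariance` and `pstdev` divide by the number of values and `variance` and `stdev` divide by one less than that. Treating the example datasets first as whole populations and then as samples is an assumption made only for illustration.

```python
from statistics import pstdev, pvariance, stdev, variance

dataset_a = [4, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 8]
dataset_b = [1, 2, 3, 4, 5, 6, 6, 7, 8, 9, 10, 11]

for name, values in (("Dataset A", dataset_a), ("Dataset B", dataset_b)):
    # Population versions: sum of squared deviations divided by N.
    print(name, "population variance:", round(pvariance(values), 2),
          "population SD:", round(pstdev(values), 2))
    # Sample versions: sum of squared deviations divided by n - 1.
    print(name, "sample variance:", round(variance(values), 2),
          "sample SD:", round(stdev(values), 2))

# When every value is identical, there is no spread at all.
identical = [6, 6, 6, 6]
print("Identical values:", pvariance(identical), pstdev(identical))
```

The sample figures are always a little larger than the population figures for the same values, and the gap shrinks as the number of values grows.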

Calculating the Population Variance σ² and Standard Deviation σ

Dataset A

Calculate the population mean (μ) of Dataset A.
(4 + 5 + 5 + 5 + 6 + 6 + 6 + 6 + 7 + 7 + 7 + 8) / 12
mean (μ) = 6

Calculate the deviation of the individual values from the mean by subtracting the mean from each value in the dataset

= -2, -1, -1, -1, 0, 0, 0, 0, 1, 1, 1, 2

Square each individual deviation value

= 4, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 4

Calculate the mean of the squared deviation values

= (4 + 1 + 1 + 1 + 0 + 0 + 0 + 0 + 1 + 1 + 1 + 4) / 12

Variance σ² = 1.17

Calculate the square root of the variance

Standard deviation σ = 1.08

Dataset B

Calculate the population mean (μ) of Dataset B.
(1 + 2 + 3 + 4 + 5 + 6 + 6 + 7 + 8 + 9 + 10 + 11) / 12
mean (μ) = 6

Calculate the deviation of the individual values from the mean by subtracting the mean from each value in the dataset

= -5, -4, -3, -2, -1, 0, 0, 1, 2, 3, 4, 5

Square each individual deviation value

= 25, 16, 9, 4, 1, 0, 0, 1, 4, 9, 16, 25

Calculate the mean of the squared deviation values

= (25 + 16 + 9 + 4 + 1 + 0 + 0 + 1 + 4 + 9 + 16 + 25) / 12

Variance σ² = 9.17

Calculate the square root of the variance

Standard deviation σ = 3.03
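The four steps of the worked calculation (mean, deviations, squared deviations, mean of the squares) can be followed one at a time in code. This is a minimal sketch; the function name `population_variance_steps` is hypothetical and returns every intermediate result so each step can be inspected.

```python
from math import sqrt

dataset_a = [4, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 8]
dataset_b = [1, 2, 3, 4, 5, 6, 6, 7, 8, 9, 10, 11]

def population_variance_steps(values):
    """Return the mean, deviations, squared deviations, variance and SD."""
    mu = sum(values) / len(values)              # step 1: population mean
    deviations = [x - mu for x in values]       # step 2: value minus mean
    squared = [d ** 2 for d in deviations]      # step 3: square each deviation
    variance = sum(squared) / len(values)       # step 4: mean of the squares
    return mu, deviations, squared, variance, sqrt(variance)

for name, values in (("Dataset A", dataset_a), ("Dataset B", dataset_b)):
    mu, deviations, squared, var, sd = population_variance_steps(values)
    print(name, "mean:", mu)
    print("  deviations:", deviations)
    print("  squared deviations:", squared)
    print("  variance:", round(var, 2), "standard deviation:", round(sd, 2))
```

Squaring the deviations before averaging stops positive and negative deviations from cancelling out, which is why the deviations themselves always sum to zero but the variance does not.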

The larger variance and standard deviation in Dataset B further demonstrate that Dataset B is more dispersed than Dataset A.
