STANDARD DEVIATION AND VARIANCE

DEFINITION

Standard deviation and variance are measures of spread (or dispersion or variability).
They indicate how far the values are spread out from the mean of the observations.

For a set of observations, the variance measures the average of the squared deviations, where a deviation is the difference between a value and the mean.

Since the variance is based on squared deviations, it is expressed in squared units. Another measure is therefore needed that expresses the spread in the same units as the observations themselves. This measure is the standard deviation, which is the square root of the variance for a set of values.

Variance:

Let us consider a simple set of observations \(2,\ 4,\ 6\). The mean, \(\overline{x}\), of these observations is \(\frac{2+4+6}{3}=4\). Now, let us consider how far each value is from the mean.

The first observation \(\left(2\right)\) is 2 units less than the mean \(\left(2-4=-2\right)\). So, the deviation of the first observation from the mean is \(-2\).

The deviation of the second observation \(\left(4\right)\) is \(4-4=0\).

The deviation of the third observation \(\left(6\right)\) is \(6-4=2\).

So, the deviations of the observations from the mean are \(-2\), \(0\) and \(2\).

Let us now consider the average deviation, which is

\[\frac{-2+0+2}{3}=\frac{0}{3}=0\]

This value tells us nothing about the spread of the values. In fact, the average of the deviations of the observations from the mean is always equal to zero, because the negative and positive deviations cancel each other out.
Since the average of the deviations is of no use, we consider the average of the squared deviations:

\[{\left(2-4\right)}^2={\left(-2\right)}^2=4\]
\[{\left(4-4\right)}^2=0^2=0\]
\[{\left(6-4\right)}^2=2^2=4\]

The squared deviations of the observations are 4, 0 and 4.
Now, we will consider the average of these squared deviations, which is

\[\frac{4+0+4}{3}=\frac{8}{3}\approx 2.67\]

This single value is the average of the squared deviations of the observations.
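
The arithmetic for the observations 2, 4 and 6 can be reproduced with a few lines of Python. This is only an illustrative sketch; the variable names are not part of the original text.

```python
# Deviations from the mean for the observations 2, 4, 6.
observations = [2, 4, 6]
n = len(observations)
mean = sum(observations) / n                        # 4.0

# Plain deviations: the negative and positive values cancel out.
deviations = [x - mean for x in observations]       # [-2.0, 0.0, 2.0]
mean_deviation = sum(deviations) / n                # 0.0

# Squared deviations do not cancel, so their average carries information.
squared_deviations = [d ** 2 for d in deviations]   # [4.0, 0.0, 4.0]
mean_squared_deviation = sum(squared_deviations) / n

print(mean_deviation)                     # 0.0
print(round(mean_squared_deviation, 2))   # 2.67
```
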
In general, for a given set of observations \(x_1,\ x_2,\ \dots,\ x_n\) with mean \(\overline{x}\), the squared deviation of each observation from the mean is \({\left(x_i-\overline{x}\right)}^2\).
The average of these squared deviations is

\[\frac{\sum{{\left(x-\overline{x}\right)}^2}}{n}\]

For the variance, however, we divide the sum of squared deviations by \(n-1\) rather than by \(n\). Thus, the variance for a set of observations is:

\[s^2=\frac{\sum{{\left(x-\overline{x}\right)}^2}}{n-1}\]

Though dividing by \(n\) may seem more natural, dividing by \(n-1\) makes the sample variance an unbiased estimator of the population variance. This means that, averaged over all possible samples, the sample variance equals the population variance.
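
The unbiasedness claim can be checked exactly on a tiny case. The sketch below treats 2, 4, 6 as the whole population and assumes that every ordered sample of size 2, drawn with replacement, is equally likely. Both the population and the sampling scheme are assumptions chosen for illustration, and `mean_squared_deviation` is a hypothetical helper name.

```python
from fractions import Fraction
from itertools import product

population = [2, 4, 6]

def mean_squared_deviation(values, divisor):
    """Sum of squared deviations from the mean of `values`, over `divisor`."""
    mean = Fraction(sum(values), len(values))
    return sum((x - mean) ** 2 for x in values) / divisor

# Population variance: divide by the population size.
population_variance = mean_squared_deviation(population, len(population))

# Assumption: all 3 * 3 = 9 ordered samples of size 2 are equally likely.
samples = list(product(population, repeat=2))

# Average of the sample variances when dividing by n - 1 and by n.
avg_with_n_minus_1 = sum(mean_squared_deviation(s, len(s) - 1) for s in samples) / len(samples)
avg_with_n = sum(mean_squared_deviation(s, len(s)) for s in samples) / len(samples)

print(population_variance)   # 8/3
print(avg_with_n_minus_1)    # 8/3
print(avg_with_n)            # 4/3
```

In this small case, dividing by \(n-1\) reproduces the population variance on average, while dividing by \(n\) gives a value that is too small.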

The standard deviation, \(s\), is simply the square root of the variance.

\[s=\sqrt{\frac{\sum{{\left(x-\overline{x}\right)}^2}}{n-1}}\]
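
A minimal Python sketch of these two formulas is given below. The function names `sample_variance` and `sample_std_dev` are illustrative, and the results are compared with `variance` and `stdev` from Python's standard `statistics` module, which also divide by \(n-1\).

```python
import math
import statistics

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two observations are needed")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

def sample_std_dev(values):
    """Square root of the sample variance."""
    return math.sqrt(sample_variance(values))

data = [2, 4, 6]
print(sample_variance(data))   # 4.0
print(sample_std_dev(data))    # 2.0

# Cross-check against the standard library.
print(math.isclose(sample_variance(data), statistics.variance(data)))   # True
print(math.isclose(sample_std_dev(data), statistics.stdev(data)))       # True
```
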

The standard deviation is measured in the same units as the observations, whereas the variance is measured in squared units.
When each \(x_i\) is repeated \(f_i\) times, i.e. when each \(x_i\) has a frequency \(f_i\), as follows:

Observations \( \left( x_{i} \right) \) | Frequencies \( \left( f_{i} \right) \) |
---|---|
\( x_{1} \) | \( f_{1} \) |
\( x_{2} \) | \( f_{2} \) |
... | ... |
\( x_{k} \) | \( f_{k} \) |
Total | \( \sum f=n \) |

then the squared deviation of each \(x_i\) must be counted \(f_i\) times, which contributes \(f_i{\left(x_i-\overline{x}\right)}^2\) to the sum of squared deviations. Here \(n=\sum{f}\) is the total number of observations and \(\overline{x}=\frac{\sum{fx}}{n}\).

Therefore, the variance \(s^2\) and standard deviation \(s\) can be calculated as follows:

\[s^2=\frac{\sum{f{\left(x-\overline{x}\right)}^2}}{n-1}\]
\[s=\sqrt{\frac{\sum{f{\left(x-\overline{x}\right)}^2}}{n-1}}\]
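
The frequency form should agree with writing every observation out in full. The sketch below checks this on a small made-up data set; the values, the frequencies and the function name `grouped_variance` are assumptions for illustration only.

```python
import math
import statistics

def grouped_variance(values, freqs):
    """Sample variance when values[i] occurs freqs[i] times."""
    n = sum(freqs)   # total number of observations
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    sum_sq_dev = sum(f * (x - mean) ** 2 for x, f in zip(values, freqs))
    return sum_sq_dev / (n - 1)

# Hypothetical data: 1 occurs twice, 2 occurs three times, 5 occurs once.
values = [1, 2, 5]
freqs = [2, 3, 1]

# The same data with every observation listed individually.
expanded = [x for x, f in zip(values, freqs) for _ in range(f)]

grouped = grouped_variance(values, freqs)
ungrouped = statistics.variance(expanded)
print(math.isclose(grouped, ungrouped))   # True
```
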

Using the above formulas can become tedious when there is a large number of observations.
The formulas can therefore be rewritten in a form that is easier to compute. We start from

\[s^2=\frac{\sum{{\left(x-\overline{x}\right)}^2}}{n-1}\]

On expanding the brackets, we get
\[s^2=\frac{1}{n-1}\left[\sum{\left(x^2-2x\overline{x}+{\overline{x}}^2\right)}\right]\]
\[\ \ \ \ =\frac{1}{n-1}\left[\sum{x^2}-\sum{2x\overline{x}}+\sum{{\overline{x}}^2}\right]\]
\[\ \ \ \ =\frac{1}{n-1}\left[\sum{x^2}-2\overline{x}\sum{x}+{\overline{x}}^2\sum{1}\right]\]
\[\ \ \ \ =\frac{1}{n-1}\left[\sum{x^2}-2\overline{x}\sum{x}+n{\overline{x}}^2\right]\]

Now, we use \(\overline{x}=\frac{\sum{x}}{n}\Rightarrow \sum{x}=n\overline{x}\)

\[s^2=\frac{1}{n-1}\left[\sum{x^2}-2\overline{x}\cdot n\overline{x}+n{\overline{x}}^2\right]\]
\[\ \ \ \ =\frac{1}{n-1}\left[\sum{x^2}-2n{\overline{x}}^2+n{\overline{x}}^2\right]\]
\[\ \ \ \ =\frac{1}{n-1}\left[\sum{x^2}-n{\overline{x}}^2\right]\]

Therefore,
\[s^2=\frac{\sum{x^2}-n{\overline{x}}^2}{n-1}\]
\[s=\sqrt{\frac{\sum{x^2}-n{\overline{x}}^2}{n-1}}\]
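
As a quick check of the simplified formula, take the observations \(2,\ 4,\ 6\) used earlier, for which \(n=3\) and \(\overline{x}=4\):

\[\sum{x^2}=2^2+4^2+6^2=56\]
\[s^2=\frac{\sum{x^2}-n{\overline{x}}^2}{n-1}=\frac{56-\left(3\times 4^2\right)}{3-1}=\frac{8}{2}=4\]

This agrees with the first formula, which gives \(\frac{4+0+4}{3-1}=4\).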

For a frequency distribution, where \(n=\sum{f}\), the following formulas can be used:
\[s^2=\frac{\sum{fx^2}-n{\overline{x}}^2}{n-1}\]
\[s=\sqrt{\frac{\sum{fx^2}-n{\overline{x}}^2}{n-1}}\]

Example 1

The following are the monthly sales (in million ₹) for a year:

\(2,\ 4,\ 5,\ 3,\ 9,\ 7,\ 8,\ 6,\ 13,\ 11,\ 12,\ 10\)

We are required to find the variance and standard deviation. Let us consider the first formula, with \(\overline{x}=7.5\) as computed below the table.

\(x\) | \( x-\overline{x} \) | \( \left( x-\overline{x} \right) ^{2} \) |
---|---|---|
\(2\) | \(2-7.5=-5.5\) | \( {\left(-5.5\right)}^{2}=30.25 \) |
\(4\) | \(4-7.5=-3.5\) | \( {\left(-3.5\right)}^{2}=12.25 \) |
\(5\) | \(5-7.5=-2.5\) | \( {\left(-2.5\right)}^{2}=6.25 \) |
\(3\) | \(3-7.5=-4.5\) | \( {\left(-4.5\right)}^{2}=20.25 \) |
\(9\) | \(9-7.5=1.5\) | \( 1.5^{2}=2.25 \) |
\(7\) | \(7-7.5=-0.5\) | \( {\left(-0.5\right)}^{2}=0.25 \) |
\(8\) | \(8-7.5=0.5\) | \( 0.5^{2}=0.25 \) |
\(6\) | \(6-7.5=-1.5\) | \( {\left(-1.5\right)}^{2}=2.25 \) |
\(13\) | \(13-7.5=5.5\) | \( 5.5^{2}=30.25 \) |
\(11\) | \(11-7.5=3.5\) | \( 3.5^{2}=12.25 \) |
\(12\) | \(12-7.5=4.5\) | \( 4.5^{2}=20.25 \) |
\(10\) | \(10-7.5=2.5\) | \( 2.5^{2}=6.25 \) |
\( \sum x=90 \) | | \( \sum \left( x-\overline{x} \right) ^{2}=143 \) |

\[\overline{x}=\frac{\sum{x}}{n}=\frac{90}{12}=7.5\]

\[s^2=\frac{\sum{{\left(x-\overline{x}\right)}^2}}{n-1}\]
\[\ \ \ \ =\frac{143}{12-1}=\frac{143}{11}\]
\[\ \ \ \ =13\ {\left(\text{million rupees}\right)}^2\]

\[s=\sqrt{s^2}=\sqrt{13}\]
\[s\approx 3.6056\ \text{million rupees}\]

Let us now consider the second (simpler) formula:

\( x \) | \( x^{2} \) |
---|---|
\(2\) | \( 2^{2}=4 \) |
\(4\) | \( 4^{2}=16 \) |
\(5\) | \( 5^{2}=25 \) |
\(3\) | \( 3^{2}=9 \) |
\(9\) | \( 9^{2}=81 \) |
\(7\) | \( 7^{2}=49 \) |
\(8\) | \( 8^{2}=64 \) |
\(6\) | \( 6^{2}=36 \) |
\(13\) | \( 13^{2}=169 \) |
\(11\) | \( 11^{2}=121 \) |
\(12\) | \( 12^{2}=144 \) |
\(10\) | \( 10^{2}=100 \) |
\( \sum x=90 \) | \( \sum x^{2}=818 \) |

\[\overline{x}=\frac{\sum{x}}{n}=\frac{90}{12}=7.5\]

\[s^2=\frac{\sum{x^2-n{\overline{x}}^2}}{n-1}\]
\[\ \ \ \ =\frac{818-\left(12\times {7.5}^2\right)}{12-1}\]
\[\ \ \ \ =\frac{818-675}{11}\]
\[\ \ \ \ =13\ {\left(\text{million rupees}\right)}^2\]
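
The figures in Example 1 can be reproduced with a short Python sketch that applies both formulas to the same data. The variable names are illustrative only.

```python
import math

sales = [2, 4, 5, 3, 9, 7, 8, 6, 13, 11, 12, 10]   # monthly sales, million rupees
n = len(sales)
mean = sum(sales) / n                                # 7.5

# First formula: sum of squared deviations from the mean.
sum_sq_dev = sum((x - mean) ** 2 for x in sales)     # 143.0
var_first = sum_sq_dev / (n - 1)

# Second formula: sum of squares minus n times the squared mean.
sum_sq = sum(x ** 2 for x in sales)                  # 818
var_second = (sum_sq - n * mean ** 2) / (n - 1)

std_dev = math.sqrt(var_first)
print(var_first, var_second, round(std_dev, 4))      # 13.0 13.0 3.6056
```
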

Example 2

Following is the record of the number of products sold every day by a retailer for the past 30 days.

No. of products \( \left( x \right) \) | Frequencies \( \left( f \right) \) |
---|---|
\(2\) | \(4\) |
\(3\) | \(3\) |
\(4\) | \(6\) |
\(5\) | \(8\) |
\(6\) | \(3\) |
\(7\) | \(6\) |
Total | \( \sum f=n=30 \) |

We are required to find the variance and standard deviation.

\( x \) | \( f \) | \( fx \) | \( x^{2} \) | \( fx^{2} \) |
---|---|---|---|---|
\(2\) | \(4\) | \(8\) | \(4\) | \(16\) |
\(3\) | \(3\) | \(9\) | \(9\) | \(27\) |
\(4\) | \(6\) | \(24\) | \(16\) | \(96\) |
\(5\) | \(8\) | \(40\) | \(25\) | \(200\) |
\(6\) | \(3\) | \(18\) | \(36\) | \(108\) |
\(7\) | \(6\) | \(42\) | \(49\) | \(294\) |
Total | \( \sum f=30 \) | \( \sum fx=141 \) | | \( \sum fx^{2}=741 \) |

\[\overline{x}=\frac{\sum{fx}}{n}=\ \frac{141}{30}=4.7\]

\[s^2=\frac{\sum{fx^2}-n{\overline{x}}^2}{n-1}\]
\[\ \ \ \ =\frac{741-\left(30\times {4.7}^2\right)}{30-1}\]
\[\ \ \ \ =\frac{741-662.7}{29}\]
\[\ \ \ \ =2.7\]

\[s=\sqrt{s^2}=\sqrt{2.7}\approx 1.64\]
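
The calculation in Example 2 can be checked with the following Python sketch, which applies the frequency form of the simplified formula. The variable names are illustrative only.

```python
import math

products = [2, 3, 4, 5, 6, 7]   # number of products sold in a day (x)
days = [4, 3, 6, 8, 3, 6]       # number of days with that count (f)

n = sum(days)                                               # 30
sum_fx = sum(f * x for x, f in zip(products, days))         # 141
sum_fx2 = sum(f * x ** 2 for x, f in zip(products, days))   # 741

mean = sum_fx / n
variance = (sum_fx2 - n * mean ** 2) / (n - 1)
std_dev = math.sqrt(variance)

print(round(mean, 2), round(variance, 2), round(std_dev, 2))   # 4.7 2.7 1.64
```
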