FORMULA AND DERIVATION
For a set of values x1, x2, ..., xn of a variable x, the sample variance can be measured as follows:
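In the usual notation, assuming the n - 1 denominator conventionally used for a sample and writing \bar{x} for the mean of the n values (the symbol s_x^2 is chosen here for illustration), this can be written as:

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
s_x^2 = \frac{1}{n-1}\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2
```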
Similarly, the sample variance of the values of a variable y can be calculated as follows:
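By symmetry, and under the same assumed conventions (n - 1 denominator, \bar{y} for the mean of the y values, s_y^2 as an illustrative symbol), the expression would be:

```latex
s_y^2 = \frac{1}{n-1}\sum_{i=1}^{n} \left(y_i - \bar{y}\right)^2
```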
The variance measures how spread out the observations are around the mean. But we also need to know how the values of y vary with respect to the values of x. For this, we calculate a measure called the covariance.
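For n paired observations (x_i, y_i), the sample covariance is commonly written as below. The n - 1 denominator and the symbol s_{xy} are assumptions made here for illustration.

```latex
s_{xy} = \frac{1}{n-1}\sum_{i=1}^{n} \left(x_i - \bar{x}\right)\left(y_i - \bar{y}\right)
```

A positive value indicates that above-average values of x tend to occur with above-average values of y; a negative value indicates the opposite.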
However, the covariance depends on the scale of the data: two sets of data that are correlated to the same extent can have very different covariances. The covariance therefore does not, on its own, indicate the strength of the relationship between the variables. It is also expressed in the product of the units of the two variables, which makes it difficult to compare the correlation of two or more sets of data.
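The dependence on units can be seen in a small Python sketch. The function names and the height and weight figures are hypothetical, chosen only for illustration, and the n - 1 denominator is an assumed convention.

```python
def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def sample_covariance(xs, ys):
    """Sample covariance using the n - 1 denominator (assumed convention)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long samples with at least 2 values")
    x_bar, y_bar = mean(xs), mean(ys)
    total = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return total / (len(xs) - 1)


# Hypothetical data: heights in metres and weights in kilograms.
heights_m = [1.5, 1.6, 1.7, 1.8]
weights_kg = [50.0, 58.0, 66.0, 74.0]

# The same heights expressed in centimetres.
heights_cm = [h * 100 for h in heights_m]

cov_m = sample_covariance(heights_m, weights_kg)
cov_cm = sample_covariance(heights_cm, weights_kg)

# The relationship is unchanged, yet the covariance is 100 times larger.
print(cov_m, cov_cm)
```

The underlying relationship is identical in both cases, but changing the unit of one variable rescales the covariance by the same factor.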
It is, therefore, necessary to have a measure that is independent of the units of measurement, so that we get the same value for any set of data correlated in the same manner and to the same extent. To calculate such a measure, we divide the covariance by the product of the standard deviations of the two variables, and arrive at a measure called the co-efficient of correlation, denoted by r.
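Written out, with s_{xy} denoting the sample covariance and s_x, s_y the sample standard deviations (symbol names assumed here for illustration), the definition reads:

```latex
r = \frac{s_{xy}}{s_x \, s_y}
```

Because the numerator and the denominator carry the same units, the units cancel and r is a pure number.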
This equation can be written in terms of sums of squares, as follows:
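One common way to write it, assuming the notation S_{xx}, S_{yy} and S_{xy} for the sums of squared deviations and cross-products (these symbols are introduced here for illustration), is:

```latex
S_{xx} = \sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2, \qquad
S_{yy} = \sum_{i=1}^{n} \left(y_i - \bar{y}\right)^2, \qquad
S_{xy} = \sum_{i=1}^{n} \left(x_i - \bar{x}\right)\left(y_i - \bar{y}\right)

r = \frac{S_{xy}}{\sqrt{S_{xx}\, S_{yy}}}
```

This agrees with dividing the covariance by the product of the standard deviations, because the common denominator of the covariance and the variances cancels.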
The value of r obtained is simply a number (independent of the units of measurement) between -1 and 1 inclusive, i.e. -1 ≤ r ≤ 1.
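Both properties, the bounds and the independence from units, can be checked with a short Python sketch. The function name `pearson_r` and all data values are hypothetical, and the n - 1 denominator is an assumed convention (it cancels in the ratio).

```python
import math


def pearson_r(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long samples with at least 2 values")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    sd_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    sd_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("r is undefined when one variable is constant")
    return cov / (sd_x * sd_y)


xs = [1, 2, 3, 4, 5]

# Exactly linear data sit at the two ends of the range.
r_up = pearson_r(xs, [2 * x + 1 for x in xs])     # increasing line
r_down = pearson_r(xs, [10 - 3 * x for x in xs])  # decreasing line

# Rescaling x (a change of units) leaves r unchanged.
ys = [2, 1, 4, 3, 6]
r_plain = pearson_r(xs, ys)
r_scaled = pearson_r([100 * x for x in xs], ys)

print(r_up, r_down, r_plain, r_scaled)
```

Floating-point rounding can leave the results for exactly linear data a tiny distance from 1 and -1, so comparisons are best made with a small tolerance.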
Different values of r can be interpreted as follows:
r = 1: perfect positive correlation
0 < r < 1, close to 1: strong positive correlation
0 < r < 1, close to 0: weak positive correlation
r = –1: perfect negative correlation
–1 < r < 0, close to –1: strong negative correlation
–1 < r < 0, close to 0: weak negative correlation
r = 0: no linear correlation
[Figure: scatter plots illustrating each of the cases above]
A school wants to analyse whether conducting more classes leads to better results. It gathers the following information on the number of classes conducted and the class average marks.
Let us first calculate the means of both variables.
We can now calculate the sum of squares.
We can find the correlation co-efficient as follows:
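The three steps (means, sums of squares, co-efficient) can be sketched in Python. The figures below are hypothetical placeholders for illustration only and are not the school's data; the variable names are likewise assumptions.

```python
import math

# Hypothetical figures for illustration only; NOT the school's actual data.
classes = [10, 12, 14, 16, 18]    # number of classes conducted
avg_marks = [55, 60, 62, 70, 73]  # class average marks

n = len(classes)

# Step 1: means of both variables.
mean_x = sum(classes) / n
mean_y = sum(avg_marks) / n

# Step 2: sums of squared deviations and of cross-products.
s_xx = sum((x - mean_x) ** 2 for x in classes)
s_yy = sum((y - mean_y) ** 2 for y in avg_marks)
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(classes, avg_marks))

# Step 3: correlation co-efficient from the sums of squares.
r = s_xy / math.sqrt(s_xx * s_yy)

print(mean_x, mean_y, s_xx, s_yy, s_xy, r)
```

With these placeholder values r comes out positive and close to 1, which is the pattern the interpretation of this example relies on.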
Since r is positive, there is a direct relationship between the average marks and the number of classes conducted, i.e. as the number of classes conducted increases, the average marks tend to increase too.
Since r is close to 1, the correlation between the variables is strong.
The following are the prices of six supercars of the same model that have been used for different numbers of years.
In order to determine the co-efficient of correlation, we need to calculate the sum of squares.
Since r is negative, the price of the car and the number of years it has been used move in opposite directions, i.e. there is a negative or inverse relationship.