Correlation
Types, Test for significance
- When two continuous variables are observed together (concomitantly), their joint distribution is a bivariate distribution; the case assumed here is the bivariate normal distribution.
- If there are more than two such variables, their joint distribution is known as a multivariate normal distribution.
- In case of bivariate or multivariate normal distributions, we may be interested in discovering and measuring the magnitude and direction of the relationship between two or more variables.
- For this purpose we use the statistical tool known as correlation.
- Definition:
If a change in one variable is accompanied by a change in the other variable, the two variables are said to be correlated, and the degree of association (or extent of the relationship) is known as correlation.
- It studies the relation or association between two variables.
- Two independent variables are not interrelated.
- The measure of correlation is called the correlation coefficient (r) or correlation index, which summarizes in one figure the direction and degree of correlation.
- The correlation coefficient ranges from -1 to +1 (i.e. -1 ≤ r ≤ 1); its absolute value never exceeds unity.
- If r = +1 then we say that there is a perfect positive correlation between x and y
- If r = -1 then we say that there is a perfect negative correlation between x and y
- If r = 0 then the two variables x and y are called uncorrelated variables
- The correlation coefficient has no unit of measurement.
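The limiting values of r and its freedom from units can be checked numerically. The following is a minimal sketch, assuming small made-up data sets; the helper name `pearson_r` and all values are illustrative and not part of the original notes.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares of deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [1.0, 2.0, 3.0, 4.0, 5.0]        # hypothetical observations
y_up = [2 * v + 1 for v in x]        # points on an exactly increasing line
y_down = [10 - 3 * v for v in x]     # points on an exactly decreasing line

r_up = pearson_r(x, y_up)            # perfect positive correlation
r_down = pearson_r(x, y_down)        # perfect negative correlation

# Re-expressing x in different units (e.g. cm to m) leaves r unchanged
r_rescaled = pearson_r([v / 100 for v in x], y_up)
```

Points lying exactly on a rising line give r = +1, points on a falling line give r = -1, and a change of units does not alter the value.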
Types of Correlation
Positive correlation
- If the two variables deviate in the same direction, i.e., if an increase (or decrease) in one variable results in a corresponding increase (or decrease) in the other variable, the correlation is said to be direct or positive.
- Ex:
- Heights and weights
- Household income and expenditure
- Amount of rainfall and yield of crops
- Prices and supply of commodities
- Feed and milk yield of an animal
- Soluble nitrogen and total chlorophyll in the leaves of paddy.
Negative correlation
- If the two variables constantly deviate in the opposite direction, i.e., if an increase (or decrease) in one variable results in a corresponding decrease (or increase) in the other variable, the correlation is said to be inverse or negative.
- Ex:
- Price and demand of a good
- Volume and pressure of perfect gas
- Sales of woolen garments and the day temperature
- Yield of crop and plant infestation
No or Zero Correlation
- If there is no relationship between the two variables, so that a change in the value of one variable is not accompanied by any systematic change in the other, the variables are said to have no (zero) correlation.
Simple, Partial and Multiple Correlations
- Simple correlation: When only two variables are studied.
- Partial correlation: More than two variables are studied, but the relationship between only two of them is considered, with the effect of the other influencing variables held constant.
- Multiple correlation: Three or more variables are studied simultaneously.
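For three variables, the partial correlation between the first two with the third held constant can be obtained from the three simple correlations using the standard first-order formula r12.3 = (r12 - r13 r23) / sqrt((1 - r13²)(1 - r23²)). The sketch below assumes illustrative correlation values; the function name `partial_r` is hypothetical.

```python
import math

def partial_r(r12, r13, r23):
    """First-order partial correlation of variables 1 and 2, holding 3 constant."""
    denominator = math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))
    return (r12 - r13 * r23) / denominator

# Illustrative simple correlations (assumed values, not from the notes)
r12_3 = partial_r(r12=0.8, r13=0.6, r23=0.5)

# When r12 equals r13 * r23, the association between variables 1 and 2
# is fully accounted for by variable 3 and the partial correlation is zero
r_explained = partial_r(r12=0.3, r13=0.6, r23=0.5)
```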
Linear and Nonlinear Correlation
- If the amount of change in one variable tends to bear a constant ratio to the amount of change in the other variable, the correlation is said to be linear.
- If the amount of change in one variable does not bear a constant ratio to the amount of change in the other variable, the correlation is said to be nonlinear.
- In most practical situations we find a nonlinear relationship between variables.
- In the absence of any relationship between the variables, the value of the correlation coefficient will be zero.
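The converse of the last point does not hold: r measures only the linear part of a relationship, so r = 0 does not by itself show that the variables are unrelated. A minimal sketch with made-up data, in which y is completely determined by x through y = x² and yet r is zero (the helper `pearson_r` is an illustrative name):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [-2.0, -1.0, 0.0, 1.0, 2.0]   # hypothetical values, symmetric about zero
y = [v ** 2 for v in x]           # exact nonlinear (quadratic) relationship

r_quadratic = pearson_r(x, y)     # the positive and negative products cancel
```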
Methods of studying Correlation
- Scatter Diagram
- Karl Pearson's Coefficient of Correlation
- Spearman's Rank Correlation
- Regression Lines
Scatter diagram
- It is the simplest diagrammatic representation of bivariate data. For the bivariate distribution (x_i, y_i), i = 1, 2, …, n, if the values of the variables X and Y are plotted along the X-axis and Y-axis respectively in the xy-plane, the diagram of dots so obtained is known as a scatter diagram.
- From the scatter diagram, if the points are very close to each other, we should expect a fairly good amount of correlation between the variables and if the points are widely scattered, a poor correlation is expected. This method, however, is not suitable if the number of observations is fairly large.
Positive Correlation
- If the plotted points show an upward trend along a straight line, then we say that the two variables are positively correlated.
Negative Correlation
- If the plotted points show a downward trend along a straight line, then we say that the two variables are negatively correlated.
No Correlation
- If the plotted points are spread over the whole of the graph sheet with no trend, then we say that the two variables are not correlated.
Karl Pearson's Coefficient of Correlation
- Prof. Karl Pearson, a British biometrician, suggested a measure of correlation between two variables. It is known as Karl Pearson's coefficient of correlation. It is useful for measuring the degree of linear relationship between the two variables X and Y.
- It is usually denoted by r_xy or 'r'.
i) Direct Method:
ii) Deviation method
- Where
- σx = S.D. of x and σy = S.D. of y
- n = number of items
- dx = x - A, dy = y - B
- A = assumed value of x and B = assumed value of y
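For reference, the standard textbook forms of the two computations, written with the symbols defined above, are given below. They are offered as the usual expressions for Pearson's r and are not necessarily identical in layout to the forms used in the original notes.

```latex
% Direct method
r = \frac{\operatorname{Cov}(x, y)}{\sigma_x \, \sigma_y}
  = \frac{\sum xy - \dfrac{(\sum x)(\sum y)}{n}}
         {\sqrt{\left[\sum x^2 - \dfrac{(\sum x)^2}{n}\right]
                \left[\sum y^2 - \dfrac{(\sum y)^2}{n}\right]}}

% Deviation method, with d_x = x - A and d_y = y - B
r = \frac{\sum d_x d_y - \dfrac{(\sum d_x)(\sum d_y)}{n}}
         {\sqrt{\left[\sum d_x^2 - \dfrac{(\sum d_x)^2}{n}\right]
                \left[\sum d_y^2 - \dfrac{(\sum d_y)^2}{n}\right]}}
```

Both expressions give the same value of r, because subtracting a constant from every x or every y changes neither the covariance nor the standard deviations.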
Test for significance of correlation coefficient
- The variables x and y are assumed to follow a bivariate normal distribution. If the population correlation coefficient of x and y is denoted by ρ, it is often of interest to test whether ρ is zero or different from zero on the basis of the observed correlation coefficient 'r'.
- If 'r' is the sample correlation coefficient based on a sample of 'n' pairs of observations from a bivariate normal population, then Prof. Fisher proved that, for testing the null hypothesis H0: ρ = 0 against the alternative hypothesis H1: ρ ≠ 0, the test statistic
$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}}$$
follows Student's t-distribution with (n - 2) d.f. under H0.
- If the absolute calculated value of t exceeds the table value of t with (n - 2) d.f. at the specified level of significance, the null hypothesis is rejected; that is, the correlation between the two variables is significant. Otherwise, the null hypothesis is not rejected.
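The test can be sketched in a few lines using the standard statistic t = r·sqrt(n - 2) / sqrt(1 - r²). The values below are assumptions for illustration: r = 0.95 is a made-up sample correlation, n = 12 matches the sample size of the worked example, and 2.23 is the 5% table value for 10 d.f. quoted in that example. The function name `t_statistic` is hypothetical.

```python
import math

def t_statistic(r, n):
    """Test statistic for H0: rho = 0, referred to t with n - 2 d.f."""
    if n <= 2:
        raise ValueError("need more than two pairs of observations")
    if abs(r) >= 1:
        raise ValueError("the statistic is undefined when |r| = 1")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

r_sample = 0.95   # assumed sample correlation (illustrative only)
n_pairs = 12      # number of pairs of observations
t_table = 2.23    # two-sided 5% table value of t for 10 d.f.

t_calc = t_statistic(r_sample, n_pairs)
reject_h0 = abs(t_calc) > t_table   # True means the correlation is significant
```

The comparison uses the absolute value of t so that a strong negative correlation is also detected against the two-sided alternative.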
Example
- From a paddy field, 12 plants were selected at random. The length of panicles in cm (x) and the number of grains per panicle (y) of the selected plants were recorded. The results are given in the following table. Calculate the correlation coefficient and test its significance.
Solution:
a) Direct Method:
- Where, n = number of observations
- Testing the correlation coefficient:
- Null hypothesis H0: Population correlation coefficient ρ = 0
- Under H0, the test statistic becomes
- The critical (table) value of t for 10 d.f. at the 5% level of significance is 2.23
- Since the calculated value (9.6) is greater than the table value (2.23), the null hypothesis is rejected; it can be inferred that there exists a significant positive correlation between x and y.
b) Indirect Method:
- Here A = 127 and B = 24
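That the direct and indirect (deviation) methods must agree can be checked with a short sketch. The data and the assumed values `a` and `b` below are made up for illustration and are not the values from the table of this example; the function names are hypothetical.

```python
import math

def r_direct(xs, ys):
    """Direct method: r computed from raw sums of x, y, xy, x^2 and y^2."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    sum_yy = sum(y * y for y in ys)
    numerator = sum_xy - sum_x * sum_y / n
    denominator = math.sqrt((sum_xx - sum_x ** 2 / n) * (sum_yy - sum_y ** 2 / n))
    return numerator / denominator

def r_deviation(xs, ys, a, b):
    """Deviation method: the same formula applied to dx = x - a, dy = y - b."""
    dx = [x - a for x in xs]
    dy = [y - b for y in ys]
    return r_direct(dx, dy)

# Hypothetical measurements (not the data of the worked example)
x = [20.5, 22.1, 23.4, 24.0, 25.2, 26.3]
y = [98, 110, 118, 121, 133, 140]

r_a = r_direct(x, y)
r_b = r_deviation(x, y, a=24, b=120)   # assumed values chosen near the means
```

Choosing the assumed values close to the means keeps the deviations small, which is the practical reason for using the indirect method in hand calculation.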