Lesson 03 of 18

Correlation Analysis

Types of correlation, Karl Pearson's coefficient, Spearman's rank correlation, scatter diagrams, and significance testing with agricultural examples

Does more rainfall always mean higher crop yield? Does increasing fertiliser dose proportionally increase grain weight? These questions ask about the relationship between two variables — and correlation analysis is the statistical tool that measures the strength and direction of such relationships in agricultural research.


  • When there are two continuous variables which are concomitant, their joint distribution is known as the bivariate normal distribution. The word concomitant here means the two variables occur together or change together; for instance, the height and weight of plants measured simultaneously.

  • If there are more than two such variables, their joint distribution is known as a multivariate normal distribution. For example, if we measure plant height, number of tillers, and grain yield together, these three variables form a multivariate distribution.

  • In the case of bivariate or multivariate normal distributions, we may be interested in discovering and measuring the magnitude and direction of the relationship between two or more variables.
  • For this purpose we use the statistical tool known as correlation. Correlation helps us answer the question: “Do these variables move together, and if so, how strongly?”

  • Definition:

    If a change in one variable is accompanied by a change in the other variable, the two variables are said to be correlated, and the degree of association (or extent of the relationship) is known as correlation.


  • It studies the relation or association between two variables.
  • Two independent variables are not interrelated; their correlation is zero.
  • The measure of correlation is called the correlation coefficient (r) or correlation index, which summarizes in one figure the direction and degree of correlation. The correlation coefficient is a single number that captures both how strong the relationship between two variables is and in which direction it runs.

  • The correlation coefficient ranges between -1 and +1 (i.e., -1 ≤ r ≤ 1). It never exceeds unity in absolute value.
  • If r = +1 then we say that there is a perfect positive correlation between x and y. All data points fall exactly on an upward-sloping straight line.
  • If r = -1 then we say that there is a perfect negative correlation between x and y. All data points fall exactly on a downward-sloping straight line.
  • If r = 0 then the two variables x and y are called uncorrelated variables. There is no linear relationship between them.
  • No unit of measurement. The correlation coefficient is a pure number (dimensionless), meaning it does not depend on the units in which the variables are measured. Whether you measure yield in kg/ha or quintals/ha, the correlation coefficient remains the same.
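These properties can be checked numerically. Below is a minimal Python sketch (using NumPy, with invented rainfall and yield figures) showing that r lies within [-1, +1] and is unchanged when yield is converted from kg/ha to quintals/ha:

```python
import numpy as np

# Hypothetical data: rainfall (mm) and crop yield (kg/ha) for 6 plots
rainfall = np.array([450.0, 500.0, 550.0, 600.0, 650.0, 700.0])
yield_kg_ha = np.array([1800.0, 2100.0, 2050.0, 2400.0, 2600.0, 2750.0])

# Pearson's r taken from the 2x2 correlation matrix
r_kg = np.corrcoef(rainfall, yield_kg_ha)[0, 1]

# The same yields expressed in quintals/ha (1 quintal = 100 kg)
r_q = np.corrcoef(rainfall, yield_kg_ha / 100.0)[0, 1]

print(round(r_kg, 4), round(r_q, 4))  # the two values are identical
```

Both printed values agree, illustrating that rescaling a variable (a change of units) leaves the correlation coefficient untouched.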

Types of Correlation

Positive

  • If the two variables deviate in the same direction, i.e., if an increase (or decrease) in one variable results in a corresponding increase (or decrease) in the other variable, the correlation is said to be direct or positive.
  • Ex:
    • Heights and weights
    • Household income and expenditure
    • Amount of rainfall and yield of crops
    • Prices and supply of commodities
    • Feed and milk yield of an animal
    • Soluble nitrogen and total chlorophyll in the leaves of paddy.

Negative correlation

  • If the two variables constantly deviate in opposite directions, i.e., if an increase (or decrease) in one variable results in a corresponding decrease (or increase) in the other variable, the correlation is said to be inverse or negative.
  • Ex:
    • Price and demand of a good
    • Volume and pressure of a perfect gas
    • Sales of woollen garments and day temperature
    • Yield of crop and plant infestation

No or Zero Correlation

  • If there is no relationship between the two variables, i.e., a change in the value of one variable produces no corresponding change in the other, the two variables are said to have no (or zero) correlation. In this case, knowing the value of one variable provides no useful information for predicting the value of the other.
Types of correlation

Simple, Partial and Multiple Correlations

  • Simple correlation: When only two variables are studied. For example, studying the relationship between fertilizer dose and crop yield alone.
  • Partial correlation: More than two variables are studied, but only two are considered as influencing each other, the effect of the other influencing variables being held constant. For example, studying the relationship between fertilizer and yield while holding rainfall constant.
  • Multiple correlation: Three or more variables are studied simultaneously. This gives a comprehensive picture of how several variables together relate to a response variable.
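The text does not give a formula for partial correlation, but the standard first-order expression is r_xy.z = (r_xy − r_xz·r_yz) / √[(1 − r_xz²)(1 − r_yz²)]. A Python sketch with hypothetical fertilizer, yield, and rainfall figures (the helper name partial_corr is our own):

```python
import numpy as np

def partial_corr(x, y, z):
    """First-order partial correlation r_xy.z: the correlation of x and y
    with the linear effect of z held constant (standard formula)."""
    r_xy = np.corrcoef(x, y)[0, 1]
    r_xz = np.corrcoef(x, z)[0, 1]
    r_yz = np.corrcoef(y, z)[0, 1]
    return (r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz**2) * (1 - r_yz**2))

# Hypothetical plot data: fertilizer dose (kg/ha), yield (t/ha), rainfall (mm)
fert = np.array([20.0, 40.0, 60.0, 80.0, 100.0, 120.0])
yld = np.array([1.5, 2.2, 2.6, 3.1, 3.3, 3.8])
rain = np.array([480.0, 520.0, 500.0, 560.0, 540.0, 580.0])

# Simple correlation of fertilizer and yield, then the same with rainfall fixed
print(round(np.corrcoef(fert, yld)[0, 1], 3),
      round(partial_corr(fert, yld, rain), 3))
```

Comparing the two printed values shows how much of the simple fertilizer-yield correlation survives once rainfall is held constant.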

Linear and Nonlinear Correlation

  • If the amount of change in one variable tends to bear a constant ratio to the amount of change in the other variable, the correlation is said to be linear. Graphically, this relationship plots as a straight line.
  • If the amount of change in one variable does not bear a constant ratio to the amount of change in the other variable, the correlation is said to be nonlinear. The relationship curve may be parabolic, exponential, or of some other form.
  • In most practical situations we find a nonlinear relationship between variables. For instance, crop yield does not increase indefinitely with fertilizer: beyond an optimum point, additional fertilizer may actually reduce yield, creating a curvilinear relationship.
  • In the absence of any relationship between the variables, the correlation coefficient will be zero.

Methods of studying Correlation

  • Scatter Diagram
  • Karl Pearson’s Coefficient of Correlation
  • Spearman’s Rank Correlation
  • Regression Lines

Each method has its own strengths: the scatter diagram is visual and intuitive, Karl Pearson’s method gives a precise numerical measure for linear relationships, Spearman’s method works with ranked data, and regression lines help in prediction.


Scatter diagram

  • It is the simplest diagrammatic representation of bivariate data. For the bivariate distribution (xi, yi), i = 1, 2, …, n, if the values of the variables X and Y are plotted along the X-axis and Y-axis respectively in the xy-plane, the diagram of dots so obtained is known as a scatter diagram.
  • From the scatter diagram, if the points lie very close to each other, we should expect a fairly good amount of correlation between the variables; if the points are widely scattered, a poor correlation is expected. This method, however, is not suitable if the number of observations is fairly large. While not precise, the scatter diagram gives a quick visual impression of the nature and strength of the relationship.
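A scatter diagram is quick to produce in software. A minimal matplotlib sketch using invented panicle-length and grain-count data (the output file name is arbitrary):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render straight to a file
import matplotlib.pyplot as plt

# Hypothetical bivariate data: panicle length (cm) vs grains per panicle
x = [20.1, 21.5, 22.0, 23.4, 24.2, 25.0, 25.8, 26.3]
y = [95, 102, 108, 115, 118, 126, 130, 134]

plt.scatter(x, y)
plt.xlabel("Panicle length (cm)")
plt.ylabel("Grains per panicle")
plt.title("Scatter diagram (upward trend: positive correlation)")
plt.savefig("scatter_diagram.png")
```

An upward drift of the dots from lower-left to upper-right, as here, is the visual signature of positive correlation.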

Positive Correlation

  • If the plotted points show an upward trend along a straight line, we say that the two variables are positively correlated.
Positive correlation scatter diagram

Negative Correlation

  • When the plotted points show a downward trend along a straight line, we say that the two variables are negatively correlated.
Negative correlation scatter diagram

No Correlation

  • If the plotted points are spread over the whole graph sheet, we say that the two variables are not correlated.
No correlation scatter diagram

Karl Pearson’s Coefficient of Correlation

  • Prof. Karl Pearson, a British biometrician, suggested a measure of correlation between two variables, known as Karl Pearson’s coefficient of correlation. It is useful for measuring the degree of linear relationship between two variables X and Y. This is the most widely used method for computing correlation in agricultural and biological research.
  • It is usually denoted by rxy or ‘r’.

i) Direct Method:

r = Σ(x − x̄)(y − ȳ) / √[Σ(x − x̄)² · Σ(y − ȳ)²]

or, in the raw-score (computational) form,

r = [nΣxy − (Σx)(Σy)] / √{[nΣx² − (Σx)²][nΣy² − (Σy)²]}
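The direct method translates into a few lines of Python. The function below implements the raw-score form of Pearson's formula, applied to made-up numbers:

```python
import math

def pearson_r(x, y):
    """Karl Pearson's r by the direct (raw-score) method:
    r = [nΣxy − ΣxΣy] / √([nΣx² − (Σx)²][nΣy² − (Σy)²])."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sx2 = sum(a * a for a in x)
    sy2 = sum(b * b for b in y)
    return (n * sxy - sx * sy) / math.sqrt((n * sx2 - sx**2) * (n * sy2 - sy**2))

# Invented illustration data
x = [2.0, 4.0, 6.0, 8.0, 10.0]
y = [3.0, 7.0, 5.0, 10.0, 12.0]
print(round(pearson_r(x, y), 4))  # 0.9105
```

The value near +0.91 indicates a strong positive linear relationship in this toy data set.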

ii) Deviation method

r = [Σdxdy/n − (Σdx/n)(Σdy/n)] / (σx · σy)
  • Where
    • σx = S.D. of x and σy = S.D. of y
    • n = number of items
    • dx = x - A, dy = y - B
    • A = assumed value (working mean) of x and B = assumed value of y

The deviation method simplifies calculations by using assumed means (A and B) to reduce the size of numbers being worked with. This was especially useful before calculators and computers became widespread.
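Because dx and dy are simply shifted copies of x and y, the deviation method returns exactly the same r as the direct method for any choice of A and B. A Python sketch with invented data, using the dx, dy, σx, σy notation defined above:

```python
import math

def pearson_r_deviation(x, y, A, B):
    """Assumed-mean method: r = [Σdxdy/n − (Σdx/n)(Σdy/n)] / (σx·σy),
    where dx = x − A, dy = y − B and σ denotes the population S.D."""
    n = len(x)
    dx = [xi - A for xi in x]
    dy = [yi - B for yi in y]
    cov = sum(a * b for a, b in zip(dx, dy)) / n - (sum(dx) / n) * (sum(dy) / n)
    sd_x = math.sqrt(sum(a * a for a in dx) / n - (sum(dx) / n) ** 2)
    sd_y = math.sqrt(sum(b * b for b in dy) / n - (sum(dy) / n) ** 2)
    return cov / (sd_x * sd_y)

x = [2.0, 4.0, 6.0, 8.0, 10.0]
y = [3.0, 7.0, 5.0, 10.0, 12.0]
# The result does not depend on the assumed values A and B
print(round(pearson_r_deviation(x, y, A=6, B=7), 4))  # 0.9105
print(round(pearson_r_deviation(x, y, A=0, B=0), 4))  # 0.9105
```

Choosing A and B close to the actual means keeps the deviations small, which is precisely why the method eased hand computation.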


Test for significance of correlation coefficient

  • If “r” is the observed correlation coefficient in a sample of “n” pairs of observations from a bivariate normal population, then Prof. Fisher worked out the sampling distribution of “r” under the null hypothesis

H0: ρ = 0

This null hypothesis states that the population correlation coefficient (denoted by the Greek letter rho, ρ) is zero, meaning there is no true linear relationship between the two variables in the population. The sample correlation “r” that we compute could simply be due to sampling fluctuation.


  • The variables x and y follow a bivariate normal distribution. If the population correlation coefficient of x and y is denoted by ρ, then it is often of interest to test whether ρ is zero or different from zero, on the basis of the observed correlation coefficient “r”.
  • Thus if “r” is the sample correlation coefficient based on a sample of “n” observations, then the appropriate test statistic for testing the null hypothesis H0: ρ = 0 against the alternative hypothesis H1: ρ ≠ 0 is
t = r√(n − 2) / √(1 − r²)

  • This statistic follows Student’s t-distribution with (n - 2) d.f. The degrees of freedom are (n - 2) because two parameters (the means of X and Y) have been estimated from the data.
  • If the calculated value of t exceeds the table value of t with (n - 2) d.f. at the specified level of significance, the null hypothesis is rejected; that is, there is significant correlation between the two variables. Otherwise, the null hypothesis is not rejected.
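As a sketch, here is the decision rule in Python for a hypothetical sample with r = 0.75 and n = 12, using the 5% table value 2.23 for 10 d.f.:

```python
import math

def t_statistic(r, n):
    """Test statistic for H0: ρ = 0, namely t = r√(n − 2) / √(1 − r²),
    referred to Student's t-table with (n − 2) degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

# Hypothetical sample: r = 0.75 from n = 12 pairs of observations
t_calc = t_statistic(0.75, 12)
t_table = 2.23  # tabulated t for 10 d.f. at the 5% level of significance
print(round(t_calc, 2), t_calc > t_table)  # 3.59 True -> reject H0
```

Since the calculated t exceeds the tabulated value, this hypothetical sample correlation would be declared significant at the 5% level.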

Example

  • From a paddy field, 12 plants were selected at random. The length of panicles in cm (x) and the number of grains per panicle (y) of the selected plants were recorded. The results are given in the following table. Calculate the correlation coefficient and test its significance.
Example data — panicle length and grains per panicle

Solution:

a) Direct Method:

Correlation coefficient — direct method solution
  • Where, n = number of observations

  • Testing the correlation coefficient:
  • Null hypothesis H0: Population correlation coefficient “ρ” = 0
  • Under H0, the test statistic becomes
Significance test — t-statistic calculation
  • The critical (table) value of t for 10 d.f. at the 5% level of significance is 2.23.
  • Since the calculated value (9.6) exceeds the table value (2.23), it can be inferred that there exists a significant positive correlation between x and y. This means the relationship between panicle length and number of grains is not due to chance: longer panicles genuinely tend to have more grains.
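The quoted table value can be cross-checked with SciPy's t-distribution quantile function (assuming SciPy is available):

```python
from scipy import stats

# Worked example above: n = 12 plants, so d.f. = n - 2 = 10
t_table = stats.t.ppf(1 - 0.05 / 2, df=10)  # two-tailed 5% critical value
print(round(t_table, 2))  # 2.23

t_calc = 9.6  # calculated value reported in the example
print(t_calc > t_table)  # True: reject H0, the correlation is significant
```

The quantile function reproduces the tabulated 2.23, confirming the decision to reject H0.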

b) Indirect Method:

  • Here A = 127 and B = 24
Indirect method — deviation table
Indirect method — correlation calculation
Indirect method — final result

Summary Table

Concept | Key Point | Exam Tip
Correlation coefficient (r) | Measures strength and direction of linear relationship | Range: -1 to +1; unit-free
r = +1 | Perfect positive correlation | All points on upward line
r = -1 | Perfect negative correlation | All points on downward line
r = 0 | No linear relationship | Variables are uncorrelated
Positive example | Rainfall and crop yield | Both increase together
Negative example | Price and demand | One up, other down
Simple correlation | Two variables only | Most common in exams
Partial correlation | Two variables, others held constant | Isolates one relationship
Multiple correlation | Three or more variables | Comprehensive analysis
Karl Pearson's r | Most widely used method | For linear relationships
Test statistic | t = r√(n-2)/√(1-r²) | d.f. = n - 2
Scatter diagram | Visual method | Quick impression of relationship

TIP

Mnemonic for correlation types: “SiPMu” — Simple (2 variables), Partial (2 active, rest constant), Multiple (3+ variables).


Summary Cheat Sheet

Concept / Topic | Key Details
Correlation coefficient (r) | Measures strength and direction of linear relationship
Range of r | -1 to +1; never exceeds unity; no unit (dimensionless)
r = +1 | Perfect positive correlation; all points on upward line
r = -1 | Perfect negative correlation; all points on downward line
r = 0 | Variables are uncorrelated; no linear relationship
Positive correlation | Both variables deviate in same direction (e.g., rainfall and yield)
Negative correlation | Variables deviate in opposite direction (e.g., price and demand)
Simple correlation | Only two variables studied
Partial correlation | Two variables studied; others held constant
Multiple correlation | Three or more variables studied simultaneously
Linear correlation | Change bears a constant ratio; plots as straight line
Nonlinear correlation | Change does not bear constant ratio; curved relationship
Scatter diagram | Simplest visual method for bivariate data
Karl Pearson's r | Most widely used; measures degree of linear relationship
Bivariate normal | Joint distribution of two continuous concomitant variables
Test statistic for r | t = r√(n-2)/√(1-r²); follows t-distribution with (n-2) d.f.
Significance test | If t_calc > t_table, reject H₀ (significant correlation)
Direct method | Uses raw values of X and Y directly
Deviation method | Uses assumed means A and B to simplify calculations
Positive example | Heights-weights, feed-milk yield, rainfall-crop yield
Negative example | Price-demand, yield-pest infestation
Practical situations | Most relationships are nonlinear in practice