Lesson 03 of 18

Correlation Analysis

Types of correlation, Karl Pearson's coefficient, Spearman's rank correlation, scatter diagrams, and significance testing with agricultural examples

Does more rainfall always mean higher crop yield? Does increasing fertiliser dose proportionally increase grain weight? These questions ask about the relationship between two variables — and correlation analysis is the statistical tool that measures the strength and direction of such relationships in agricultural research.


  • When there are two continuous variables which are concomitant, their joint distribution is known as the bivariate normal distribution. The word concomitant here means the two variables occur together or change together; for instance, the height and weight of plants measured simultaneously.

  • If there are more than two such variables, their joint distribution is known as a multivariate normal distribution. For example, if we measure plant height, number of tillers, and grain yield together, these three variables form a multivariate distribution.

  • In the case of bivariate or multivariate normal distributions, we may be interested in discovering and measuring the magnitude and direction of the relationship between two or more variables.
  • For this purpose we use the statistical tool known as correlation. Correlation helps us answer the question: “Do these variables move together, and if so, how strongly?”

  • Definition:

    If a change in one variable is accompanied by a change in the other variable, the two variables are said to be correlated, and the degree of association (or extent of the relationship) is known as correlation.


  • It studies the relation or association between two variables.
  • Two independent variables are not interrelated; their correlation is zero.
  • The measure of correlation is called the correlation coefficient (r) or correlation index, which summarizes in one figure the direction and degree of correlation. The correlation coefficient is a single number that captures both how strong the relationship between two variables is and in which direction it runs.

  • The correlation coefficient ranges between -1 and +1 (i.e., -1 ≤ r ≤ 1). It never exceeds unity in absolute value.
  • If r = +1 then we say that there is a perfect positive correlation between x and y. All data points fall exactly on an upward-sloping straight line.
  • If r = -1 then we say that there is a perfect negative correlation between x and y. All data points fall exactly on a downward-sloping straight line.
  • If r = 0 then the two variables x and y are called uncorrelated variables. There is no linear relationship between them.
  • No unit of measurement. The correlation coefficient is a pure number (dimensionless), meaning it does not depend on the units in which the variables are measured. Whether you measure yield in kg/ha or quintals/ha, the correlation coefficient remains the same.
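These properties can be checked numerically. Below is a minimal Python sketch (using NumPy, with invented rainfall and yield figures) showing that r lies within [-1, +1] and is unchanged when yield is converted from kg/ha to quintals/ha:

```python
import numpy as np

# Hypothetical data: rainfall (mm) and crop yield (kg/ha) for 6 plots
rainfall = np.array([450.0, 500.0, 550.0, 600.0, 650.0, 700.0])
yield_kg_ha = np.array([1800.0, 2100.0, 2050.0, 2400.0, 2600.0, 2750.0])

# Pearson's r taken from the 2x2 correlation matrix
r_kg = np.corrcoef(rainfall, yield_kg_ha)[0, 1]

# The same yields expressed in quintals/ha (1 quintal = 100 kg)
r_q = np.corrcoef(rainfall, yield_kg_ha / 100.0)[0, 1]

print(round(r_kg, 4), round(r_q, 4))  # the two values are identical
```

Both printed values agree, illustrating that rescaling a variable (a change of units) leaves the correlation coefficient untouched.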

Types of Correlation

Positive

  • If the two variables deviate in the same direction, i.e., if an increase (or decrease) in one variable results in a corresponding increase (or decrease) in the other variable, the correlation is said to be direct or positive.
  • Ex:
    • Heights and weights
    • Household income and expenditure
    • Amount of rainfall and yield of crops
    • Prices and supply of commodities
    • Feed and milk yield of an animal
    • Soluble nitrogen and total chlorophyll in the leaves of paddy.

Negative correlation

  • If the two variables constantly deviate in opposite directions, i.e., if an increase (or decrease) in one variable results in a corresponding decrease (or increase) in the other variable, the correlation is said to be inverse or negative.
  • Ex:
    • Price and demand of a good
    • Volume and pressure of a perfect gas
    • Sales of woollen garments and day temperature
    • Yield of crop and plant infestation

No or Zero Correlation

  • If there is no relationship between the two variables, i.e., a change in the value of one variable produces no corresponding change in the other, the two variables are said to have no (or zero) correlation. In this case, knowing the value of one variable provides no useful information for predicting the value of the other.
Types of correlation

Simple, Partial and Multiple Correlations

  • Simple correlation: When only two variables are studied. For example, studying the relationship between fertilizer dose and crop yield alone.
  • Partial correlation: More than two variables are studied, but only two are considered as influencing each other, the effect of the other influencing variables being held constant. For example, studying the relationship between fertilizer and yield while holding rainfall constant.
  • Multiple correlation: Three or more variables are studied simultaneously. This gives a comprehensive picture of how several variables together relate to a response variable.
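The text does not give a formula for partial correlation, but the standard first-order expression is r_xy.z = (r_xy − r_xz·r_yz) / √[(1 − r_xz²)(1 − r_yz²)]. A Python sketch with hypothetical fertilizer, yield, and rainfall figures (the helper name partial_corr is our own):

```python
import numpy as np

def partial_corr(x, y, z):
    """First-order partial correlation r_xy.z: the correlation of x and y
    with the linear effect of z held constant (standard formula)."""
    r_xy = np.corrcoef(x, y)[0, 1]
    r_xz = np.corrcoef(x, z)[0, 1]
    r_yz = np.corrcoef(y, z)[0, 1]
    return (r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz**2) * (1 - r_yz**2))

# Hypothetical plot data: fertilizer dose (kg/ha), yield (t/ha), rainfall (mm)
fert = np.array([20.0, 40.0, 60.0, 80.0, 100.0, 120.0])
yld = np.array([1.5, 2.2, 2.6, 3.1, 3.3, 3.8])
rain = np.array([480.0, 520.0, 500.0, 560.0, 540.0, 580.0])

# Simple correlation of fertilizer and yield, then the same with rainfall fixed
print(round(np.corrcoef(fert, yld)[0, 1], 3),
      round(partial_corr(fert, yld, rain), 3))
```

Comparing the two printed values shows how much of the simple fertilizer-yield correlation survives once rainfall is held constant.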

Linear and Nonlinear Correlation

  • If the amount of change in one variable tends to bear a constant ratio to the amount of change in the other variable, the correlation is said to be linear. Graphically, this relationship plots as a straight line.
  • If the amount of change in one variable does not bear a constant ratio to the amount of change in the other variable, the correlation is said to be nonlinear. The relationship curve may be parabolic, exponential, or of some other form.
  • In most practical situations we find a nonlinear relationship between variables. For instance, crop yield does not increase indefinitely with fertilizer: beyond an optimum point, additional fertilizer may actually reduce yield, creating a curvilinear relationship.
  • In the absence of any relationship between the variables, the correlation coefficient will be zero.

Methods of studying Correlation

  • Scatter Diagram
  • Karl Pearson’s Coefficient of Correlation
  • Spearman’s Rank Correlation
  • Regression Lines

Each method has its own strengths: the scatter diagram is visual and intuitive, Karl Pearson’s method gives a precise numerical measure for linear relationships, Spearman’s method works with ranked data, and regression lines help in prediction.


Scatter diagram

  • It is the simplest diagrammatic representation of bivariate data. For the bivariate distribution (xi, yi), i = 1, 2, …, n, if the values of the variables X and Y are plotted along the X-axis and Y-axis respectively in the xy-plane, the diagram of dots so obtained is known as a scatter diagram.
  • From the scatter diagram, if the points lie very close to each other, we should expect a fairly good amount of correlation between the variables; if the points are widely scattered, a poor correlation is expected. This method, however, is not suitable if the number of observations is fairly large. While not precise, the scatter diagram gives a quick visual impression of the nature and strength of the relationship.
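A scatter diagram is quick to produce in software. A minimal matplotlib sketch using invented panicle-length and grain-count data (the output file name is arbitrary):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render straight to a file
import matplotlib.pyplot as plt

# Hypothetical bivariate data: panicle length (cm) vs grains per panicle
x = [20.1, 21.5, 22.0, 23.4, 24.2, 25.0, 25.8, 26.3]
y = [95, 102, 108, 115, 118, 126, 130, 134]

plt.scatter(x, y)
plt.xlabel("Panicle length (cm)")
plt.ylabel("Grains per panicle")
plt.title("Scatter diagram (upward trend: positive correlation)")
plt.savefig("scatter_diagram.png")
```

An upward drift of the dots from lower-left to upper-right, as here, is the visual signature of positive correlation.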

Positive Correlation

  • If the plotted points show an upward trend along a straight line, we say that the two variables are positively correlated.
Positive correlation scatter diagram

Negative Correlation

  • When the plotted points show a downward trend along a straight line, we say that the two variables are negatively correlated.
Negative correlation scatter diagram

No Correlation

  • If the plotted points are spread over the whole graph sheet, we say that the two variables are not correlated.
No correlation scatter diagram

Karl Pearson’s Coefficient of Correlation

  • Prof. Karl Pearson, a British biometrician, suggested a measure of correlation between two variables, known as Karl Pearson’s coefficient of correlation. It is useful for measuring the degree of linear relationship between two variables X and Y. This is the most widely used method for computing correlation in agricultural and biological research.
  • It is usually denoted by rxy or ‘r’.

i) Direct Method:

r = Σ(x − x̄)(y − ȳ) / √[Σ(x − x̄)² · Σ(y − ȳ)²]

or, in the raw-score (computational) form,

r = [nΣxy − (Σx)(Σy)] / √{[nΣx² − (Σx)²][nΣy² − (Σy)²]}
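The direct method translates into a few lines of Python. The function below implements the raw-score form of Pearson's formula, applied to made-up numbers:

```python
import math

def pearson_r(x, y):
    """Karl Pearson's r by the direct (raw-score) method:
    r = [nΣxy − ΣxΣy] / √([nΣx² − (Σx)²][nΣy² − (Σy)²])."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sx2 = sum(a * a for a in x)
    sy2 = sum(b * b for b in y)
    return (n * sxy - sx * sy) / math.sqrt((n * sx2 - sx**2) * (n * sy2 - sy**2))

# Invented illustration data
x = [2.0, 4.0, 6.0, 8.0, 10.0]
y = [3.0, 7.0, 5.0, 10.0, 12.0]
print(round(pearson_r(x, y), 4))  # 0.9105
```

The value near +0.91 indicates a strong positive linear relationship in this toy data set.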

ii) Deviation method

r = [Σdxdy/n − (Σdx/n)(Σdy/n)] / (σx · σy)
  • Where
    • σx = S.D. of x and σy = S.D. of y
    • n = number of items
    • dx = x - A, dy = y - B
    • A = assumed value (working mean) of x and B = assumed value of y

The deviation method simplifies calculations by using assumed means (A and B) to reduce the size of numbers being worked with. This was especially useful before calculators and computers became widespread.
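Because dx and dy are simply shifted copies of x and y, the deviation method returns exactly the same r as the direct method for any choice of A and B. A Python sketch with invented data, using the dx, dy, σx, σy notation defined above:

```python
import math

def pearson_r_deviation(x, y, A, B):
    """Assumed-mean method: r = [Σdxdy/n − (Σdx/n)(Σdy/n)] / (σx·σy),
    where dx = x − A, dy = y − B and σ denotes the population S.D."""
    n = len(x)
    dx = [xi - A for xi in x]
    dy = [yi - B for yi in y]
    cov = sum(a * b for a, b in zip(dx, dy)) / n - (sum(dx) / n) * (sum(dy) / n)
    sd_x = math.sqrt(sum(a * a for a in dx) / n - (sum(dx) / n) ** 2)
    sd_y = math.sqrt(sum(b * b for b in dy) / n - (sum(dy) / n) ** 2)
    return cov / (sd_x * sd_y)

x = [2.0, 4.0, 6.0, 8.0, 10.0]
y = [3.0, 7.0, 5.0, 10.0, 12.0]
# The result does not depend on the assumed values A and B
print(round(pearson_r_deviation(x, y, A=6, B=7), 4))  # 0.9105
print(round(pearson_r_deviation(x, y, A=0, B=0), 4))  # 0.9105
```

Choosing A and B close to the actual means keeps the deviations small, which is precisely why the method eased hand computation.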


Test for significance of correlation coefficient

  • If “r” is the observed correlation coefficient in a sample of “n” pairs of observations from a bivariate normal population, then Prof. Fisher worked out the sampling distribution of “r” under the null hypothesis

H0: ρ = 0

This null hypothesis states that the population correlation coefficient (denoted by the Greek letter rho, ρ) is zero, meaning there is no true linear relationship between the two variables in the population. The sample correlation “r” that we compute could simply be due to sampling fluctuation.


  • The variables x and y follow a bivariate normal distribution. If the population correlation coefficient of x and y is denoted by ρ, then it is often of interest to test whether ρ is zero or different from zero, on the basis of the observed correlation coefficient “r”.
  • Thus if “r” is the sample correlation coefficient based on a sample of “n” observations, then the appropriate test statistic for testing the null hypothesis H0: ρ = 0 against the alternative hypothesis H1: ρ ≠ 0 is
t = r√(n − 2) / √(1 − r²)

  • This statistic follows Student’s t-distribution with (n - 2) d.f. The degrees of freedom are (n - 2) because two parameters (the means of X and Y) have been estimated from the data.
  • If the calculated value of t exceeds the table value of t with (n - 2) d.f. at the specified level of significance, the null hypothesis is rejected; that is, there is significant correlation between the two variables. Otherwise, the null hypothesis is not rejected.
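As a sketch, here is the decision rule in Python for a hypothetical sample with r = 0.75 and n = 12, using the 5% table value 2.23 for 10 d.f.:

```python
import math

def t_statistic(r, n):
    """Test statistic for H0: ρ = 0, namely t = r√(n − 2) / √(1 − r²),
    referred to Student's t-table with (n − 2) degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

# Hypothetical sample: r = 0.75 from n = 12 pairs of observations
t_calc = t_statistic(0.75, 12)
t_table = 2.23  # tabulated t for 10 d.f. at the 5% level of significance
print(round(t_calc, 2), t_calc > t_table)  # 3.59 True -> reject H0
```

Since the calculated t exceeds the tabulated value, this hypothetical sample correlation would be declared significant at the 5% level.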

Example

  • From a paddy field, 12 plants were selected at random. The length of panicles in cm (x) and the number of grains per panicle (y) of the selected plants were recorded. The results are given in the following table. Calculate the correlation coefficient and test its significance.
Example data — panicle length and grains per panicle

Solution:

a) Direct Method:

Correlation coefficient — direct method solution
  • Where, n = number of observations

  • Testing the correlation coefficient:
  • Null hypothesis H0: Population correlation coefficient “ρ” = 0
  • Under H0, the test statistic becomes
Significance test — t-statistic calculation
  • The critical (table) value of t for 10 d.f. at the 5% level of significance is 2.23.
  • Since the calculated value (9.6) exceeds the table value (2.23), it can be inferred that there exists a significant positive correlation between x and y. This means the relationship between panicle length and number of grains is not due to chance: longer panicles genuinely tend to have more grains.
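The quoted table value can be cross-checked with SciPy's t-distribution quantile function (assuming SciPy is available):

```python
from scipy import stats

# Worked example above: n = 12 plants, so d.f. = n - 2 = 10
t_table = stats.t.ppf(1 - 0.05 / 2, df=10)  # two-tailed 5% critical value
print(round(t_table, 2))  # 2.23

t_calc = 9.6  # calculated value reported in the example
print(t_calc > t_table)  # True: reject H0, the correlation is significant
```

The quantile function reproduces the tabulated 2.23, confirming the decision to reject H0.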

b) Indirect Method:

  • Here A = 127 and B = 24
Indirect method — deviation table
Indirect method — correlation calculation
Indirect method — final result

Summary Table

Concept | Key Point | Exam Tip
Correlation coefficient (r) | Measures strength and direction of linear relationship | Range: -1 to +1; unit-free
r = +1 | Perfect positive correlation | All points on upward line
r = -1 | Perfect negative correlation | All points on downward line
r = 0 | No linear relationship | Variables are uncorrelated
Positive example | Rainfall and crop yield | Both increase together
Negative example | Price and demand | One up, other down
Simple correlation | Two variables only | Most common in exams
Partial correlation | Two variables, others held constant | Isolates one relationship
Multiple correlation | Three or more variables | Comprehensive analysis
Karl Pearson's r | Most widely used method | For linear relationships
Test statistic | t = r√(n-2)/√(1-r²) | d.f. = n - 2
Scatter diagram | Visual method | Quick impression of relationship

TIP

Mnemonic for correlation types: “SiPMu” — Simple (2 variables), Partial (2 active, rest constant), Multiple (3+ variables).


Summary Cheat Sheet

Concept / Topic | Key Details
Correlation coefficient (r) | Measures strength and direction of linear relationship
Range of r | -1 to +1; never exceeds unity; no unit (dimensionless)
r = +1 | Perfect positive correlation; all points on upward line
r = -1 | Perfect negative correlation; all points on downward line
r = 0 | Variables are uncorrelated; no linear relationship
Positive correlation | Both variables deviate in same direction (e.g., rainfall and yield)
Negative correlation | Variables deviate in opposite direction (e.g., price and demand)
Simple correlation | Only two variables studied
Partial correlation | Two variables studied; others held constant
Multiple correlation | Three or more variables studied simultaneously
Linear correlation | Change bears a constant ratio; plots as straight line
Nonlinear correlation | Change does not bear constant ratio; curved relationship
Scatter diagram | Simplest visual method for bivariate data
Karl Pearson's r | Most widely used; measures degree of linear relationship
Bivariate normal | Joint distribution of two continuous concomitant variables
Test statistic for r | t = r√(n-2)/√(1-r²); follows t-distribution with (n-2) d.f.
Significance test | If t_calc > t_table, reject H₀ (significant correlation)
Direct method | Uses raw values of X and Y directly
Deviation method | Uses assumed means A and B to simplify calculations
Positive example | Heights-weights, feed-milk yield, rainfall-crop yield
Negative example | Price-demand, yield-pest infestation
Practical situations | Most relationships are nonlinear in practice