Lesson
07 of 18
Translate

🏹Regression Analysis

Simple linear regression, direct and deviation methods, regression coefficient properties, and comparison with correlation — with agricultural examples

If a soil scientist knows the nitrogen content of a field, can she predict the expected wheat yield? Correlation tells us that nitrogen and yield are related, but regression goes further — it gives us an equation to make that prediction. This predictive power makes regression one of the most practical tools in agricultural research.


  • The term ‘regression’ literally means ‘stepping back towards the average’. This concept originated from the observation that extreme values in one generation tend to move (“regress”) back towards the population average in the next generation.

  • It was first used by a British Biometrician Sir Francis Galton. Galton noticed that very tall parents tended to have children who were tall but closer to the average height than their parents. He called this phenomenon “regression towards mediocrity.”

  • The relationship between the independent and dependent variables may be expressed as a function. Such functional relationship between two variables is termed as regression. While correlation tells us the strength and direction of a relationship, regression goes further by providing an equation that allows us to predict the value of one variable based on the other.
  • In regression analysis independent variable is also known as regressor or predictor or explanatory variable while dependent variable is also known as regressed or explained variable.

  • When only two variables are involved the functional relationship is known as simple regression.
  • If the relationship between two variables is a straight line, it is known as simple linear regression, otherwise it is called as simple non-linear regression. In agricultural research, simple linear regression is widely used to study relationships such as the effect of fertilizer dose on crop yield or the effect of plant density on grain weight.

Direct Method

The regression equation of Y on X is given as

Y = a + bX

  • Where,
    • Y = dependent variable — the variable we want to predict.
    • X = independent variable — the variable we use to make the prediction.
    • a = intercept — the value of Y when X is zero. It represents the baseline value of the dependent variable.
    • b = the regression coefficient (or slope) of the line — it tells us how much Y changes for every one-unit increase in X.
    • a and b are also called as Constants, the constants a and b can be estimated with by applying the “least squares method”. The method of least squares finds the values of a and b that minimize the sum of squared differences between the observed and predicted values of Y, giving us the “best fit” line.

  • Range of regression is varying between -∞ to +∞. Unlike the correlation coefficient which is bounded between -1 and +1, the regression coefficient has no such bounds because it depends on the units and scale of the variables.
  • This gives,
Regression formula — estimating constant b
Regression formula — estimating constant b
  • And
Regression formula — estimating constant a
Regression formula — estimating constant a
  • Where b is called the estimate of regression coefficient of Y on X and it measures the change in Y for a unit change in X.

  • Similarly, the regression equation of X on Y is given as

X = a1 + b1Y

  • Where X = dependent variable and Y = independent variable

Note that there are always two regression equations for a pair of variables — the regression of Y on X, and the regression of X on Y. These two lines are generally different (unless the correlation is perfect, i.e., r = +1 or r = -1).

Regression formula — X on Y equation
Regression formula — X on Y equation
  • And
Regression formula — estimating b1 for X on Y
Regression formula — estimating b1 for X on Y
  • Where b1 is known as the estimate of regression coefficient of X on Y and ‘a’ is intercept

Deviation Method

  • The regression equation of Y on X is
Regression of Y on X — deviation method
Regression of Y on X — deviation method

The deviation method expresses the variables as deviations from their respective means, which simplifies the computation by eliminating the need to calculate the intercept separately.

  • The regression equation of X on Y
Regression of X on Y — deviation method
Regression of X on Y — deviation method

Properties of Regression Coefficient

These properties are important for understanding how regression coefficients behave and how they relate to the correlation coefficient:

  • Correlation coefficient is the geometric mean of the two regression coefficients i.e.

r = ± √ b.b1

This relationship connects regression and correlation. The sign of r is determined by the sign of the regression coefficients (both will have the same sign).


  • If one of the regression coefficients is greater than unity, the other must be less than unity. This is a direct consequence of the fact that r cannot exceed 1 in absolute value. Since r² = b x b₁, if b > 1, then b₁ must be < 1 (and vice versa) to keep their product ≤ 1.
  • Arithmetic mean of the regression coefficients is greater than the correlation coefficient “r”.
  • Regression coefficients are independent of the change of origin but not of scale. This means shifting all X values by a constant does not change the regression coefficient, but multiplying all X values by a constant does change it.

  • Units of “b” are same as that of the dependent variable. More precisely, the unit of b is the unit of Y per unit of X. For example, if Y is yield in kg/ha and X is fertilizer in kg/ha, then b has units of kg yield per kg fertilizer.
  • Regression is only a one-way relationship between Y (dependent variable) and X (independent variable). Unlike correlation (which is symmetric), regression is directional — the regression of Y on X is different from the regression of X on Y.
  • The range of b is from -∞ to ∞. -∞ for negative b and +∞ for positive b.

👀 Note:

  • Both the lines regression pass through the point (X, Y). In other words, the mean values (X, Y) can be obtained as the point of intersection of the two regression lines. This is a very useful property — it means the two regression lines always cross at the point of averages.

  • If r = 0, the two variables are uncorrelated, the lines of regression become perpendicular to each other. This makes intuitive sense — when there is no relationship, the regression of Y on X is a horizontal line (Y = mean of Y), and the regression of X on Y is a vertical line (X = mean of X), which are perpendicular.
  • If r = ±1, in this case the two lines of regression either coincide or they are parallel to each other. When correlation is perfect, both regression lines become identical — they merge into a single straight line on which all data points lie.
  • If the regression coefficients are positive, r is positive and if the regression coefficients are negative, r is negative.

Distinguish between Correlation and Regression

CorrelationRegression
Correlation is the relationship between two or more variables. Where the change in one variable affects a change in another variable.Regression mathematical measure of the average relationship between two or more variables. Where one variable is dependent and other variable is independent.
Correlation coefficient measures extent of relationship between two variables.Regression coefficient estimates the change in one variable for a unit change in another related variable.
Correlation is a two-way relationshipRegression is a one-way relationship i.e. Y = a + bx
Change in X is independent of change in YChange in Y cannot change X
Correlation coefficient is independent of units of the variables.Regression coefficient is expressed in the units of dependent variable.
Correlation coefficient always lies between -1 and +1Regression coefficient lies between -∞ and +∞
In correlation both the variables are in dependentA regression variable will be in dependent and other variable will be independent.
Correlation coefficient calculated by r = Cov(x,y) / σxσyThe regression coefficient of y on x: byx = Cov(x,y) / V(x)

IMPORTANT

In summary: Correlation measures the strength and direction of a linear relationship (symmetric, unit-free, range -1 to +1). Regression establishes a predictive equation between a dependent and independent variable (directional, has units, range -∞ to +∞). Both tools are complementary and are often used together in agricultural research.


Summary Table

ConceptKey PointExam Tip
RegressionFunctional relationship; prediction equationY = a + bX
PioneerSir Francis Galton”Regression towards mediocrity”
Range of b-∞ to +∞Unlike r which is -1 to +1
Regression coefficient (b)Change in Y per unit change in XHas units (Y-units per X-unit)
Two regression linesY on X and X on Y (generally different)Coincide only when r = ±1
Intersection pointBoth lines pass through (X̄, Ȳ)Point of averages
r = 0Lines become perpendicularNo relationship
r = ±1Lines coincidePerfect relationship
r from regressionr = ±√(b × b₁)Sign matches regression coefficients
Least squares methodMinimises sum of squared errorsBest fit line
FeatureCorrelationRegression
PurposeMeasures strength of relationshipPredicts one variable from another
DirectionSymmetric (rxy = ryx)Directional (Y on X ≠ X on Y)
UnitsUnit-freeSame as dependent variable
Range-1 to +1-∞ to +∞
Number of equationsOne coefficientTwo regression equations

Summary Cheat Sheet

Concept / TopicKey Details
RegressionFunctional relationship for prediction; Y = a + bX
PioneerSir Francis Galton — “regression towards mediocrity”
Regression coefficient (b)Change in Y per unit change in X; has units
Range of b-∞ to +∞ (unlike r which is -1 to +1)
Least squares methodFinds a, b by minimising sum of squared errors
Two regression linesY on X and X on Y — generally different
Intersection pointBoth lines pass through (X̄, Ȳ) — point of averages
When r = 0Two lines become perpendicular
When r = ±1Two lines coincide (merge into single line)
r from regressionr = ±√(b × b₁); sign matches regression coefficients
If b > 1Then b₁ must be < 1 (and vice versa)
AM of regression coefficientsAlways greater than correlation coefficient r
Independent ofChange of origin but not of scale
Regression isOne-way relationship (directional); not symmetric like correlation
CorrelationMeasures strength; symmetric, unit-free, range -1 to +1
RegressionEstablishes prediction equation; directional, has units
Simple regressionOnly two variables involved
Simple linearStraight-line relationship: Y = a + bX
Simple non-linearCurved relationship between two variables
Intercept (a)Value of Y when X = 0 (baseline)
Slope (b)Rate of change — how much Y changes per unit X
Deviation methodExpresses variables as deviations from their means
Positive coefficientsr is positive; negative coefficients → r is negative
🔐

Pro Content Locked

Upgrade to Pro to access this lesson and all other premium content.

Pro Popular
199 /mo

₹2388 billed yearly

  • All Agriculture & Banking Courses
  • AI Lesson Questions (100/day)
  • AI Doubt Solver (50/day)
  • Glows & Grows Feedback (30/day)
  • AI Section Quiz (20/day)
  • 22-Language Translation (30/day)
  • Recall Questions (20/day)
  • AI Quiz (15/day)
  • AI Quiz Paper Analysis
  • AI Step-by-Step Explanations
  • Spaced Repetition Recall (FSRS)
  • AI Tutor
  • Immersive Text Questions
  • Audio Lessons — Hindi & English
  • Mock Tests & Previous Year Papers
  • Summary & Mind Maps
  • XP, Levels, Leaderboard & Badges
  • Generate New Classrooms
  • Voice AI Teacher (AgriDots Live)
  • AI Revision Assistant
  • Knowledge Gap Analysis
  • Interactive Revision (LangGraph)

🔒 Secure via Razorpay · Cancel anytime · No hidden fees

Lesson Doubts

Ask questions, get expert answers

Lesson Doubts is a Pro feature.Upgrade