Frequently Asked Questions

What is the difference between simple and multiple linear regression?

Simple regression has one predictor; multiple regression has two or more. The bigger conceptual difference is that simple-regression slopes describe a marginal relationship between Y and X, while multiple-regression slopes describe a partial relationship — the predicted change in Y per one-unit change in that predictor, holding every other predictor in the model constant.

Why does a coefficient change when I add another predictor?

Because the new predictor takes over some of the variation in Y that the original predictor was previously absorbing. If experience and education are correlated, regressing wage on experience alone gives a coefficient that includes education's effect. Adding education separates the two, so experience's coefficient typically shrinks. This is the whole point of multiple regression.

Is a higher R² always better in multiple regression?

Not necessarily. Adding any predictor, useful or not, can never lower R², so a higher R² is not by itself evidence of a better model. Use adjusted R² to compare models with different numbers of predictors; it penalizes adding predictors that do not improve the fit enough to justify the extra parameter. Better still, look at out-of-sample prediction error.

Can multiple regression prove that one predictor causes Y?

No. Multiple regression can show that a predictor is associated with Y after controlling for the other variables in the model, but causation requires either a randomized experiment or strong assumptions about what other confounders might exist outside the model. A controlled predictor with a significant coefficient is suggestive evidence, not proof.

Simple vs. Multiple Regression: What Changes When You Add Predictors

Simple linear regression has one predictor. Multiple regression has two or more. That sounds like a tiny change, but the moment a second predictor enters the model, the interpretation of every coefficient shifts in a way that catches most students off guard. This guide pins down what each model claims, what changes when you go from one predictor to several, and how to read the coefficients of a multiple-regression output correctly.

What Simple Linear Regression Says

Simple linear regression fits a straight line that predicts a quantitative response Y from a single predictor X:

Ŷ = b₀ + b₁X

The two coefficients have clean meanings:

b₀ (intercept). The predicted Y when X = 0. Often not directly meaningful (it can be the predicted GPA when SAT = 0), but mathematically necessary to anchor the line.
b₁ (slope). The predicted change in Y for a one-unit increase in X. Positive slope means Y rises with X; negative slope means Y falls with X.

A worked example: regressing exam score (Y) on hours studied (X) on a class of 30 students might give Ŷ = 50 + 4 × hours. The intercept 50 is the predicted score for a student who studied zero hours. The slope 4 says each additional hour of study is associated with a 4-point predicted increase in score.
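A minimal sketch of how such a fit could be computed, assuming NumPy is available. The data are synthetic (generated from the line 50 + 4 × hours plus random noise), so the fitted numbers will land near, not exactly on, 50 and 4. The variable names are illustrative only.

```python
import numpy as np

# Synthetic data (an assumption for illustration): 30 students whose
# scores follow 50 + 4 * hours plus random noise.
rng = np.random.default_rng(0)
hours = rng.uniform(0, 10, size=30)
score = 50 + 4 * hours + rng.normal(0, 5, size=30)

# np.polyfit with deg=1 returns [slope, intercept] of the least-squares line.
b1, b0 = np.polyfit(hours, score, deg=1)

print(f"intercept b0 = {b0:.2f}")
print(f"slope     b1 = {b1:.2f}")

# Predicted score for a student who studies 6 hours.
predicted = b0 + b1 * 6
print(f"predicted score at 6 hours = {predicted:.2f}")
```

The slope returned here is the same quantity as the sample covariance of hours and score divided by the sample variance of hours.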

The crucial qualifier is "associated with," not "causes." Regression on observational data describes patterns; it does not establish causation. For more on that distinction, see correlation vs. causation.

What Multiple Regression Says

Multiple regression generalizes the line to a plane (with two predictors) or a higher-dimensional fit. With p predictors:

Ŷ = b₀ + b₁X₁ + b₂X₂ + ... + bₚXₚ

The intercept is still the predicted Y when every predictor is zero. Each slope bⱼ now has a more careful interpretation:

> bⱼ is the predicted change in Y for a one-unit increase in Xⱼ, holding every other predictor in the model constant.

That last clause is the entire difference between simple and multiple regression. The slope in a simple regression is a marginal relationship — Y and X together, ignoring everything else. The slope in a multiple regression is a partial relationship — Y and Xⱼ, after accounting for the other predictors.
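One way to see what "holding every other predictor constant" means numerically is to do the holding-constant step by hand. The sketch below uses synthetic data and hypothetical variable names. It removes the part of Y and the part of X₁ that X₂ can predict, then regresses the leftovers on each other. In ordinary least squares that residual-on-residual slope matches b₁ from the full model.

```python
import numpy as np

# Synthetic data (an assumption for illustration); x1 is correlated with x2.
rng = np.random.default_rng(1)
n = 500
x2 = rng.normal(size=n)
x1 = 0.6 * x2 + rng.normal(size=n)
y = 2.0 + 1.5 * x1 - 1.0 * x2 + rng.normal(size=n)

def ols(X, y):
    """Least-squares coefficients for y on the columns of X (intercept added)."""
    X = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Full multiple regression: [b0, b1, b2].
b0, b1, b2 = ols(np.column_stack([x1, x2]), y)

# Strip x2 out of both y and x1, then regress what is left of y
# on what is left of x1.
a0, a1 = ols(x2, y)
c0, c1 = ols(x2, x1)
y_resid = y - (a0 + a1 * x2)
x1_resid = x1 - (c0 + c1 * x2)
partial_slope = ols(x1_resid, y_resid)[1]

print(f"b1 from the multiple regression:  {b1:.4f}")
print(f"slope of residuals on residuals: {partial_slope:.4f}")
```

The two printed values agree up to floating-point error, which is why the multiple-regression slope is called a partial slope.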

Figure: A 3D regression plane fit through a scatter of points, with two predictor axes and one response axis.

A Worked Example: Why the Coefficient Changes

Suppose you regress hourly wage (Y) on years of experience (X₁) and education in years (X₂).

A simple regression of wage on experience alone might give:

Ŷ = 12 + 0.8 × experience

A multiple regression of wage on experience and education might give:

Ŷ = 5 + 0.5 × experience + 1.2 × education

The same predictor — experience — has a different coefficient in the two models. Why?

In the simple regression, the 0.8 captures the total association between experience and wage, including the fact that experienced workers also tend to be more educated. Education is "hiding" inside experience's coefficient.

In the multiple regression, education is in the model on its own, soaking up that part of the variation. What is left for experience is the association with wage after controlling for education: only $0.50 more per hour for each additional year of experience, holding education fixed. The 1.2 on education says that one more year of education predicts a $1.20 higher hourly wage for workers with the same experience.

This is the "all else equal" interpretation, and it is the foundation of why multiple regression is useful: it lets you ask whether one predictor still matters after accounting for others.
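The wage example can be imitated with simulated data. The sketch below is an illustration under stated assumptions, not a real wage dataset. Wages are generated from the multiple-regression equation above, and education is made to rise with experience at a rate chosen so that the simple regression comes out near 12 + 0.8 × experience. All names and noise levels are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 2000

# Synthetic workers (assumed numbers). Education rises with experience,
# and wage follows 5 + 0.5 * experience + 1.2 * education plus noise.
experience = rng.uniform(0, 30, size=n)
education = 35 / 6 + 0.25 * experience + rng.normal(0, 2, size=n)
wage = 5 + 0.5 * experience + 1.2 * education + rng.normal(0, 3, size=n)

def fit(y, *predictors):
    """Least-squares coefficients [intercept, slopes...] for y on the predictors."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

simple = fit(wage, experience)             # wage on experience alone
multi = fit(wage, experience, education)   # wage on experience and education
link = fit(education, experience)          # education on experience

print(f"simple slope on experience:  {simple[1]:.3f}")
print(f"partial slope on experience: {multi[1]:.3f}")
print(f"partial slope on education:  {multi[2]:.3f}")

# The simple slope is experience's partial slope plus education's partial
# slope times the slope of education on experience.
rebuilt = multi[1] + multi[2] * link[1]
print(f"rebuilt simple slope:        {rebuilt:.3f}")
```

The last line shows where the extra 0.3 in the simple slope comes from: it is education's coefficient passed through the link between education and experience.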

Three Things That Shift With More Predictors

Beyond the "holding others constant" clause, three things change when you go from simple to multiple regression.

The intercept. With one predictor, b₀ is the predicted Y at X = 0. With several, b₀ is the predicted Y when every predictor is zero. That combination of zero values may be far from any real observation.
R² interpretation. Simple regression's R² is the square of the correlation between X and Y. Multiple regression's R² is still "the proportion of variance in Y explained by the predictors," but it never decreases when you add a predictor, even a useless one. Adjusted R² penalizes for the number of predictors and is the better statistic for comparing models with different numbers of predictors.
Significance tests. Each predictor in a multiple regression gets its own t-test for whether its slope differs from zero, and the model gets an F-test for whether any predictor is useful. The two tests can disagree — F can reject "no predictor is useful" while no individual t-test reaches significance, especially when predictors are correlated with each other.
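The R² point in the list above can be checked directly. This sketch uses synthetic data, and the "useless" predictor is pure random noise by construction. It computes R² and adjusted R² before and after adding that predictor, using the usual formula adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1).

```python
import numpy as np

# Synthetic data (an assumption for illustration).
rng = np.random.default_rng(7)
n = 60
x1 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + rng.normal(size=n)
noise_predictor = rng.normal(size=n)   # unrelated to y by construction

def r_squared(X, y):
    """Return (R^2, adjusted R^2) for a least-squares fit of y on the columns of X."""
    X = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    ss_res = resid @ resid
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1 - ss_res / ss_tot
    p = X.shape[1] - 1                 # predictors, not counting the intercept
    adj = 1 - (1 - r2) * (len(y) - 1) / (len(y) - p - 1)
    return r2, adj

r2_small, adj_small = r_squared(x1, y)
r2_big, adj_big = r_squared(np.column_stack([x1, noise_predictor]), y)

print(f"one predictor:    R^2 = {r2_small:.4f}, adjusted = {adj_small:.4f}")
print(f"plus noise column: R^2 = {r2_big:.4f}, adjusted = {adj_big:.4f}")
```

R² for the larger model cannot be lower than for the smaller one. Adjusted R² may move in either direction, depending on whether the gain in fit outweighs the penalty for the extra parameter.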

Multicollinearity: The Trap of Correlated Predictors

Multiple regression can only separate the predictors' contributions when they carry separate information. When two predictors are highly correlated with each other (height and arm span, hours studied and number of problems completed), the model has trouble assigning credit between them. The symptoms:

Large standard errors on the affected coefficients. The slopes are "right" on average but very imprecise from sample to sample.
Coefficients that flip sign or change wildly when you add or drop one of the correlated predictors.
A model that fits well overall (high R², significant F-test) while no individual predictor is statistically significant.

A common diagnostic is the variance inflation factor (VIF) for each predictor; values above about 5 or 10 are flags. The fix is usually to drop one of the redundant predictors, combine them into a single index, or accept the loss of individual interpretability if the model is being used purely for prediction.
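As a rough sketch of the diagnostic, the VIF for predictor j can be computed as 1 / (1 − R²ⱼ), where R²ⱼ comes from regressing predictor j on all the other predictors. The data below are synthetic and the variable names hypothetical: arm span is built as a noisy copy of height, and age is generated independently of both.

```python
import numpy as np

# Synthetic predictors (an assumption for illustration).
rng = np.random.default_rng(3)
n = 300
height = rng.normal(170, 10, size=n)
arm_span = height + rng.normal(0, 2, size=n)   # nearly a copy of height
age = rng.normal(40, 12, size=n)               # unrelated to the other two

X = np.column_stack([height, arm_span, age])
names = ["height", "arm_span", "age"]

def vif(X, j):
    """Variance inflation factor for column j: 1 / (1 - R^2 of column j on the rest)."""
    target = X[:, j]
    others = np.delete(X, j, axis=1)
    design = np.column_stack([np.ones(len(target)), others])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ coef
    r2 = 1 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
    return 1 / (1 - r2)

vifs = {name: vif(X, j) for j, name in enumerate(names)}
for name, value in vifs.items():
    print(f"VIF for {name}: {value:.1f}")
```

In this setup height and arm span should come out well above the 5 to 10 flag range, while age stays close to 1, the value for a predictor that is uncorrelated with the rest.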

When to Reach for Each

Simple regression is the right tool when you have one predictor and one response and the question is genuinely about that pair — what does GPA look like as a function of study hours, ignoring everything else.

Multiple regression is the right tool when several predictors plausibly matter and you want to isolate the contribution of one given the others. It is also the right tool whenever you suspect a confounder is muddying a simple relationship. Adding the confounder to the model lets you ask the partial-coefficient question.

A common mistake: running a separate simple regression for each predictor and treating those slopes as if they were partial slopes. They are not — they are marginal slopes, each contaminated by whatever the others do. Multiple regression has to be run as one model.

Getting Help

Once a multiple regression has been fit, the real challenge is reading the output. The guide on reading regression output walks through coefficients, standard errors, t-statistics, p-values, and R² in a single example. To dig into what R² actually measures and where it misleads, the guide on interpreting R-squared is the companion piece.

Conclusion

Multiple regression is not a different idea from simple regression; it is the same line-of-best-fit principle extended to several predictors at once. The math is bigger, but the conceptual change comes down to a single clause: every multiple-regression slope is interpreted holding the other predictors constant. That clause is what makes multiple regression useful, and forgetting it is what produces most misreadings of regression output. If a coefficient changes when a new predictor is added, that is the model telling you the original number was lumping two effects together.
