Multiple Regression
Multiple regression answers questions like this:
What is the relationship between job satisfaction and the combination of
three variables: age, years of work experience, and years of education?
Simple regression is used to develop prediction equations, such as the
prediction of college GPA from SAT scores.
Key points:
- Regression requires interval or ratio data for both variables.
- Good predictor variables correlate well with the DV and very little
with each other.
- Regression is a form of correlation: you use it to predict some DV Y
from some IV X. The equation is Y = bX + a, where b is the slope of the
regression line and a is the Y-intercept (the letters are swapped
relative to the y = mx + b of algebra I). Unlike r, the slope b is not
limited to the range -1 to +1; its size depends on the units of X and Y.
- Slope: b = rxy * (sy/sx), where rxy is the correlation coefficient
between X and Y, and sy and sx are the standard deviations of Y and X.
(rxy is the same as r.)
- rxy is usually the Pearson correlation.
- Null hypothesis: the population correlation is zero (rho = 0), i.e.
there is no linear relationship between X and Y.
- Fat scattergrams have low correlation. The lowest possible absolute
value is zero, which means either no relationship at all or a nonlinear
relationship that r does not detect.
- When comparing a positive and a negative correlation coefficient of
equal size, we compare the absolute values of r and say that the two
relationships are THE SAME strength; only the direction differs.
- If r is the correlation between two variables, then r**2 is the
proportion of the variance in Y that can be predicted from X.
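The key points above can be checked numerically. The sketch below is a minimal Python illustration, assuming small paired lists of scores and sample standard deviations (n - 1 in the denominator); the function names and data are hypothetical, not taken from any statistics package. It computes r, the slope b = rxy * (sy/sx), and the intercept a, using the fact that the least-squares line passes through the point (mean of X, mean of Y).

```python
import math


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def sample_sd(values):
    """Sample standard deviation, using n - 1 in the denominator."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fit_line(x, y):
    """Return (b, a, r) for the prediction equation Y = bX + a."""
    r = pearson_r(x, y)
    b = r * (sample_sd(y) / sample_sd(x))  # slope: b = rxy * (sy/sx)
    a = mean(y) - b * mean(x)  # the line passes through (mean X, mean Y)
    return b, a, r


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5]  # hypothetical IV scores
    y = [2, 4, 5, 4, 5]  # hypothetical DV scores
    b, a, r = fit_line(x, y)
    print(f"b = {b:.3f}, a = {a:.3f}, r = {r:.3f}, r**2 = {r**2:.3f}")

    # Rescaling Y changes the slope but not r, so b is not bounded by +/- 1.
    b10, _, r10 = fit_line(x, [10 * v for v in y])
    print(f"b = {b10:.3f}, r = {r10:.3f}")
```

With this data the first line prints b = 0.600, a = 2.200, r = 0.775, r**2 = 0.600, so X predicts 60% of the variance in Y. Multiplying every Y score by 10 leaves r at 0.775 but raises the slope to 6.000.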
Standard error of estimate s(y.x)
- Standard error of estimate: s(y.x) = sy * sqrt(1 - r**2)
- There is an INVERSE relationship between s(y.x) and the size of rxy:
the stronger the correlation, the smaller the error of prediction.
- s(y.x) can never exceed sy, because sqrt(1 - r**2) is never greater
than 1.0 (it equals 1.0 only when r = 0).
- An approximate 95% confidence interval around a predicted Y score is
two s(y.x) on either side of the predicted score (the large-sample
normal value of the multiplier is 1.96).
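A short Python sketch of the standard error of estimate and the rough two-s(y.x) band, assuming a hypothetical sy of 10 and a hypothetical predicted score of 50. The function names are illustrative only. The band ignores the extra uncertainty from estimating the regression line itself, so it is a rule of thumb rather than an exact prediction interval.

```python
import math


def std_error_of_estimate(sy, r):
    """s(y.x) = sy * sqrt(1 - r**2)."""
    return sy * math.sqrt(1 - r ** 2)


def rough_interval(y_hat, sy, r, multiplier=2.0):
    """Predicted score plus or minus `multiplier` standard errors of estimate.

    multiplier=2.0 follows the rule of thumb in these notes; 1.96 is the
    large-sample normal value. Treat the result as approximate.
    """
    se = std_error_of_estimate(sy, r)
    return y_hat - multiplier * se, y_hat + multiplier * se


if __name__ == "__main__":
    sy = 10.0  # hypothetical standard deviation of Y
    # As r grows, s(y.x) shrinks: the inverse relationship noted above.
    for r in (0.0, 0.5, 0.8, 1.0):
        print(f"r = {r:.1f}: s(y.x) = {std_error_of_estimate(sy, r):.2f}")
    print(rough_interval(50.0, sy, 0.8))
```

The loop shows s(y.x) falling from 10.00 (equal to sy when r = 0) through 8.66 and 6.00 down to 0.00 when r = 1. With r = 0.8 the band around a predicted score of 50 runs from about 38 to about 62.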
Types of correlation
- Simple correlation rxy, the correlation between variables X and Y.
- Partial correlation is the relationship between Y and X2 with X1 held
constant, i.e. the correlation between Y and X2 after the influence of
X1 has been removed from both of them. The symbol is r(Y2.1).
- Multiple correlation is the relationship between the DV (i.e. Y) and
both IVs taken together (i.e. X1 and X2). Its symbol is R(Y.12), but it
is usually reported squared, as R**2(Y.12).
- R**2(Y.12) is the proportion of the variance in the DV predicted by
both IVs taken together.
- If you add a third variable (X3) to the multiple regression equation,
it should ideally predict even more of the variance in the DV (i.e. Y)
while correlating very little with either of the other IVs.
- If you have three variables, the proportion of the variance in
Variable 3 that can be predicted from Variable 1 is r(13)**2.
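For two IVs, both the partial correlation and R**2(Y.12) can be computed from the three simple correlations rY1, rY2 and r12 with the standard textbook formulas r(Y2.1) = (rY2 - rY1*r12) / sqrt((1 - rY1**2)(1 - r12**2)) and R**2(Y.12) = (rY1**2 + rY2**2 - 2*rY1*rY2*r12) / (1 - r12**2). The Python sketch below uses illustrative function names and hypothetical correlation values; it shows why good predictors should correlate very little with each other.

```python
import math


def partial_r(r_y1, r_y2, r_12):
    """r(Y2.1): correlation of Y and X2 with X1 held constant.

    Requires |r_y1| < 1 and |r_12| < 1.
    """
    return (r_y2 - r_y1 * r_12) / math.sqrt((1 - r_y1 ** 2) * (1 - r_12 ** 2))


def r_squared_two_ivs(r_y1, r_y2, r_12):
    """R**2(Y.12): proportion of variance in Y predicted by X1 and X2 together.

    Requires |r_12| < 1.
    """
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)


if __name__ == "__main__":
    # Hypothetical correlations: each IV correlates 0.5 with Y.
    for r_12 in (0.0, 0.5):
        print(
            f"r12 = {r_12}: R**2(Y.12) = {r_squared_two_ivs(0.5, 0.5, r_12):.3f}, "
            f"r(Y2.1) = {partial_r(0.5, 0.5, r_12):.3f}"
        )
```

When the two IVs are uncorrelated, their contributions simply add and R**2(Y.12) is 0.500. When they correlate 0.5 with each other, they overlap in what they predict, and R**2(Y.12) drops to 0.333 even though each IV still correlates 0.5 with Y.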
Lorraine Sherry
lsherry@carbon.cudenver.edu
Updated December 6, 1996