Factor Analysis
Key Concepts
- Factor analysis is a device for ordering and simplifying correlations
between related variables. It is used to reduce a large set of
intercorrelated variables to a small set of factors that can be used to
represent a construct. The factors represent underlying, unobservable
constructs.
- It uses correlation matrices to reduce data - there are no independent
or dependent variables here.
- Do not attribute causality to the factors. You are lucky if you can
name them meaningfully. It's best to follow up with other tests to get at
the underlying constructs that you find here.
- If the analysis converges, the ideal situation is to get 3 or 4
meaningful factors that explain most of the variance, and that you can
name.
- Factor analysis has its roots in correlations and in solid geometry.
I am using the explanation from Dennis Child (1973), The Essentials of
Factor Analysis. London: Holt, Rinehart, and Winston.
- From here on, each variable will be associated with a vector
that comes from the regression line of its scattergram.
The Correlation Matrix
- This is a lower triangular table (LTT) that lists the correlation of
each variable with every other variable. The correlation of each
variable with itself is 1.0, and runs down the main diagonal.
- Visualize a vector that represents the regression line of best
fit for each variable, as the spokes of an umbrella. Let the handle of
the umbrella be variable #1. Then the correlation of each variable with
variable #1 is given by the magnitude of its vector times the cosine of
the angle it makes with the umbrella handle. The bigger the angle, or the
smaller the magnitude, the smaller the correlation.
- To do factor analysis, you've got to get rid of
the magnitude component - i.e., make it equal to 1.0.
- If you convert your scores (the value of each variable) to standard
scores by dividing the difference between the individual score and the
sample mean by the sample standard deviation
z = (X_i - Xbar) / s
then you get rid of the magnitude (all vectors now have a magnitude of
1.0).
- From here on, you only work with cosines. Remember from geometry: the
projection of one vector of unit magnitude onto another vector of unit
magnitude is equal to the cosine of the angle between them. This is
very important.
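The cosine idea above can be checked numerically. Here is a minimal sketch in Python with NumPy, assuming a small made-up data set (the numbers and the helper names `standardize` and `cosine` are mine, not from Child). It standardizes the scores and confirms that the cosine between two standardized score vectors matches the ordinary correlation.

```python
import numpy as np

def standardize(scores):
    """Convert raw scores (rows = cases, columns = variables) to z-scores."""
    scores = np.asarray(scores, dtype=float)
    return (scores - scores.mean(axis=0)) / scores.std(axis=0, ddof=1)

def cosine(u, v):
    """Cosine of the angle between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical data: 6 cases measured on 3 variables.
raw = np.array([
    [2.0, 10.0, 5.0],
    [4.0, 14.0, 4.0],
    [5.0, 13.0, 6.0],
    [7.0, 19.0, 3.0],
    [8.0, 22.0, 2.0],
    [10.0, 24.0, 1.0],
])

z = standardize(raw)
corr = np.corrcoef(raw, rowvar=False)  # full symmetric correlation matrix

# Cosine between the standardized vectors of variables 0 and 1.
cos_01 = cosine(z[:, 0], z[:, 1])

# After z-scoring, every column has the same length, sqrt(n - 1).
# Dividing by that length gives vectors of magnitude exactly 1.0,
# and their pairwise dot products (cosines) are the correlations.
unit = z / np.sqrt(z.shape[0] - 1)
cosines = unit.T @ unit
```

In this sketch `cosines` and `corr` agree to rounding error, which is the sense in which you "only work with cosines" from here on.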
Principal Components
- This begins the process of finding factors by resolving vectors. It
is known as extracting components. It will extract as many factors
as there are variables.
- Go back to the umbrella analogy again. With a "good guess", you set
up a test vector - that is the umbrella handle. You want the
loadings - the cosines of the angles between each vector and the
test vector - to be maximized (that is, to give the greatest possible
sum of squares). Computer programs run iterations to do this maximization.
- You have now found your first factor. It is the resultant of
all your vectors.
- That is OK for one dimension, one factor - but one factor cannot
capture all the variance. You need to set up a second test vector
at a right angle to the first factor. Where? Start with a good
guess, and iterate until the squared cosines of the angles between each
vector and the second test vector give the greatest possible sum.
Naturally, the loadings overall won't be as large as they were for the
first factor.
- You have now found your second factor.
- Keep going until you have enough factors to account for all the
variance. There will be as many factors as you had variables to start.
But only the first few should account for most of the variance.
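A computer does not actually need the guess-and-iterate picture. A minimal sketch, assuming a made-up 4-variable correlation matrix `R` and a helper name of my own (`principal_component_loadings`): it gets the same components from an eigendecomposition of the correlation matrix. This is one standard route to principal components, not necessarily the exact procedure SPSS runs internally.

```python
import numpy as np

def principal_component_loadings(corr):
    """Return (eigenvalues, loadings), largest component first.

    Column j of `loadings` holds the loading of every variable on
    component j: the eigenvector scaled by the square root of its eigenvalue.
    """
    corr = np.asarray(corr, dtype=float)
    eigenvalues, eigenvectors = np.linalg.eigh(corr)  # ascending order
    order = np.argsort(eigenvalues)[::-1]             # re-sort, largest first
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    # Clip guards against tiny negative eigenvalues caused by rounding.
    loadings = eigenvectors * np.sqrt(np.clip(eigenvalues, 0.0, None))
    return eigenvalues, loadings

# Hypothetical correlation matrix for 4 variables.
R = np.array([
    [1.0, 0.6, 0.5, 0.1],
    [0.6, 1.0, 0.4, 0.2],
    [0.5, 0.4, 1.0, 0.1],
    [0.1, 0.2, 0.1, 1.0],
])

eigenvalues, loadings = principal_component_loadings(R)
```

There are as many components as variables (4 here), and with all of them kept, `loadings @ loadings.T` reproduces `R`. The sign of each column is arbitrary, so a whole column may come out flipped.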
Interpreting principal components
- The squares of the loadings for each variable, taken across all the
factors, have to sum to 1.0. This is a reality check.
- Factors are extracted based on the size of the correlations, using
linear combinations of the loadings of each variable.
- The squares of the loadings for each factor will *not* generally sum
to 1.0. That sum is the factor's eigenvalue. The first factor, being
the best-fitting vector, has the greatest sum and thus accounts for the
greatest part of the total variance.
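Both reality checks are easy to run. A minimal sketch, assuming a made-up 3-variable correlation matrix and keeping all three components: rows of the squared loading matrix are summed per variable, columns per factor.

```python
import numpy as np

# Hypothetical correlation matrix for 3 variables.
R = np.array([
    [1.0, 0.6, 0.5],
    [0.6, 1.0, 0.4],
    [0.5, 0.4, 1.0],
])

eigenvalues, eigenvectors = np.linalg.eigh(R)   # ascending order
order = np.argsort(eigenvalues)[::-1]           # largest first
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]
loadings = eigenvectors * np.sqrt(eigenvalues)  # variables x factors

squared = loadings ** 2
per_variable = squared.sum(axis=1)  # each entry should be 1.0
per_factor = squared.sum(axis=0)    # each entry is that factor's eigenvalue

# Share of the total variance (one unit per variable) for each factor.
percent_variance = 100.0 * per_factor / R.shape[0]
```

Here `per_variable` is all ones, `per_factor` matches `eigenvalues`, and `percent_variance` adds up to 100 with the first factor taking the largest share.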
Communality
- A good factor has a lot of common variance, i.e., it accounts
for a large part of the intercorrelations between several variables.
- The sum of all the common factor variance of a single variable is
known as the communality (h squared), i.e., the variance shared in
common with other variables.
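As a small worked example, assuming made-up loadings of 4 variables on 2 retained factors (the function name `communalities` is mine): the communality of each variable is the sum of its squared loadings across the factors you kept.

```python
import numpy as np

def communalities(loadings, n_factors):
    """Sum of squared loadings of each variable on the first n_factors factors."""
    loadings = np.asarray(loadings, dtype=float)
    return (loadings[:, :n_factors] ** 2).sum(axis=1)

# Hypothetical loadings: rows = variables, columns = retained factors.
L = np.array([
    [0.80, 0.10],
    [0.70, 0.30],
    [0.60, -0.20],
    [0.20, 0.90],
])

h2 = communalities(L, 2)  # 0.65, 0.58, 0.40, 0.85
```

The third variable, with h squared of 0.40, shares the least variance with the others in this made-up table.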
SPSS tables do not look like Child's tables
- SPSS prints out a table of initial statistics that shows you what
percentage of the total variance each factor accounts for, in order of
importance. For all of your factors, they sum to 100%.
- SPSS prints out a factor matrix that shows the loadings of each
variable on each factor. That's just an intermediate result.
- SPSS prints out final statistics, selecting only those factors
that have an eigenvalue of 1.0 or higher.
- SPSS prints out a scree plot that graphs the eigenvalue for
each factor. That gives you a visual picture of the relative importance
of your factors.
Eigenvalues
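- Practice:
An eigenvalue tells you how much variance a factor accounts for, measured
in units of the variance of a single standardized variable. The "cutoff
point" is 1.0: any factor should account for at least the variance of a
single variable. If not, its eigenvalue is less than 1.0 and it is
dropped.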
- Theory:
Eigenvalues, or "principal roots", are the roots of the characteristic
equation of the correlation matrix. The same mathematics describes the
physics of vibrating strings, where the eigenvalues come from the wave
equation. The first eigenvalue is the most important, and corresponds to
the fundamental frequency of a vibrating string. Subsequent eigenvalues
correspond to higher harmonics.
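The cutoff rule can be sketched in a few lines. This assumes a made-up correlation matrix with two clean pairs of variables (correlated 0.8 and 0.6 within each pair, 0 between pairs), and a helper name of my own, `retained_by_cutoff`.

```python
import numpy as np

def retained_by_cutoff(corr, cutoff=1.0):
    """Return (eigenvalues in descending order, number of factors kept)."""
    eigenvalues = np.linalg.eigvalsh(np.asarray(corr, dtype=float))[::-1]
    n_kept = int((eigenvalues >= cutoff).sum())
    return eigenvalues, n_kept

# Hypothetical correlation matrix: variables 1-2 and 3-4 form two pairs.
R = np.array([
    [1.0, 0.8, 0.0, 0.0],
    [0.8, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.6],
    [0.0, 0.0, 0.6, 1.0],
])

# Eigenvalues come out as 1.8, 1.6, 0.4, 0.2, so 2 factors are kept.
eigenvalues, n_kept = retained_by_cutoff(R)
```

Plotting `eigenvalues` against factor number gives the scree picture: a steep drop after the second factor in this example.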
Next Steps
- You could actually stop right here, because SPSS lists out the factor
loadings for each variable, and then sums up the percent of the
total variance associated with each factor.
- The problem is - some variables may have large loadings on more than
one factor. That's why we want to rotate the axes of the factors in
N-dimensional space (3-D for 3 factors, 4-D for 4 factors, etc.). This is
done with matrices, and is easy with a computer. You can't visualize this
for more than 3 factors.
Rotation
- Varimax rotation means you keep your axes orthogonal (90 degrees
apart).
- Oblimin rotation means you don't have to stick with 90 degrees.
- Nobody seems to agree which is best - do both and keep the one that
works for you.
- A good rotation should separate the variables so that highly
correlated variables load a lot on one factor and very little on the
others. In reality, this doesn't always happen.
- In practice - you and your advisor decide on a cutoff point (I use
0.50). Circle or highlight any loadings of 0.50 or higher (ignoring the
sign). Look at the
corresponding test items (the variables) and use your crystal ball to see
if you can find a pattern or common threads. If so, you can name the
factors.
Lorraine Sherry
http://www.cudenver.edu/~lsherry/factor.html
Updated March 17, 1997