Validity
Part I: Threats to Validity
Key points:
Internal Validity
Internal validity deals with cause and effect. Did the experimental
treatment - and only the experimental treatment - have (or cause)
an effect?
External Validity
External validity refers to the generalizability of findings from a study
- to what populations, settings, conditions, etc., can the findings be
generalized?
Threats to Internal Validity
History:
Specific events occur between a first and second
measurement, that affect the experiment's outcome.
Note: This is different from Time - a threat to
external validity - where some event happens to all groups.
Intra-group History:
This is related to History. Specific events that affect the
experiment's outcome happen to one group and not to another.
Maturation:
Normal biological, psychological, or
physiological processes occur within the subjects as time passes. This is
especially important with young children.
Testing:
The effects of taking a pretest affect subsequent
performance on a highly related (or the same) posttest.
Instrumentation:
Changes in the instrument (test,
questionnaire) or the evaluators (different observers) produce changes
independent of any true experimental effects.
Regression:
The tendency of persons with extreme high or
low scores on the first test to have less extreme results the second time
around, on average. (Always a threat with pre/post tests.)
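A rough sketch of why this happens, assuming the usual linear-regression prediction for a retest score (predicted = mean + r * (first score - mean), with r the test-retest correlation and the same mean and spread at both testings). The function name and the numbers are made up for illustration; they are not from any real test.

```python
def expected_retest_score(first_score, mean, test_retest_r):
    """Predicted second score under simple linear regression,
    assuming both testings share the same mean and spread."""
    return mean + test_retest_r * (first_score - mean)


# Hypothetical scale: mean 100, test-retest correlation 0.8.
high = expected_retest_score(130, mean=100, test_retest_r=0.8)
low = expected_retest_score(70, mean=100, test_retest_r=0.8)

# The extreme high scorer is expected to drop toward the mean and the
# extreme low scorer to rise toward it, with no treatment at all.
print(high, low)
```

Unless r is exactly 1, the predicted second score always sits closer to the mean than the first one did, which is the regression effect.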
Selection:
Any and all factors that make one group of
subjects different from another group of subjects at
the beginning of an experiment. Random assignment, not random
selection, strengthens internal validity. If subjects are randomly
assigned to groups, it is not expected that selection will be a threat to
internal validity.
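Random assignment can be sketched as "shuffle, then deal into groups". This is only a minimal illustration: the function name, the subject IDs, and the seed are assumptions made for the example, not part of any particular study design.

```python
import random


def randomly_assign(subjects, n_groups=2, seed=None):
    """Shuffle a copy of the subjects, then deal them into groups round-robin."""
    rng = random.Random(seed)  # a seed only makes the example repeatable
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    return [shuffled[i::n_groups] for i in range(n_groups)]


subjects = [f"S{i:02d}" for i in range(1, 21)]  # hypothetical subject IDs
treatment, control = randomly_assign(subjects, n_groups=2, seed=42)
print("Treatment:", treatment)
print("Control:  ", control)
```

Because chance alone decides who lands in which group, pre-existing differences are expected to balance out on average. Note that this says nothing about how the 20 subjects were picked in the first place; that is random selection, which bears on external validity.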
Mortality:
A differential loss of subjects from the groups
being compared in an experiment. Just randomly throwing out subjects to
keep the N equal will not fix this threat.
Interaction:
Connection of two or more threats such that they
result in an interaction effect; particularly likely is the interaction
between selection and one of the other threats.
Threats to External Validity
Unrepresentative sample:
The sample does not represent or
"mirror" the population - usually results from an inability to randomly
select the sample from the population you want to generalize to.
Reactive Effects:
Do study conditions (other than the treatment) cause subjects to react or
behave differently than they would if they weren't being studied?
- Pretest Sensitization: Like testing (threat to internal
validity), the pretest modifies the subject so he/she behaves differently
than un-pretested subjects.
- Hawthorne Effect: The fact that subjects know they are
being studied affects the results.
- Novelty Effect: This occurs when the subjects' responses
are partly a function of the newness or novelty of the experimental
approach.
- Experimenter Effect: One experimenter administers the
treatment differently from the way another experimenter does (this ties in
with inter-rater reliability).
Time:
A historical event at the time of the study that
happens to all subjects alters the results (they would be different if
the experiment were conducted at a different time).
Note: If the
event happened to only one group, then it is History (specifically,
Intra-group History), a threat to internal validity, not external
validity.
Other Threats:
- Multiple-Treatment Interference: Multiple treatments are
applied sequentially and subjects experience cumulative effects that
cannot be sorted out. This is why you would use a counterbalanced design
such as a Latin square, which varies the order of treatments across
groups.
- Nongeneralizability of the Dependent Variable: This is when
the instrument used to measure the DV is not representative of the
population of such measures (two anxiety scales may give different results
for the same sample).
- Ambiguous Independent Variable: This occurs when the IV is not
clearly and operationally defined, so the study cannot be replicated
exactly. Like "The DARE Program" - just what does it mean? How was it
conducted?
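For the multiple-treatment case above, one simple way to build a counterbalanced ordering is a cyclic Latin square, where each group gets the treatment list rotated by one more position. The sketch below assumes four treatments labelled A to D; the function name and labels are illustrative only.

```python
def latin_square(treatments):
    """Cyclic Latin square: row i is the treatment list rotated left by i."""
    n = len(treatments)
    return [[treatments[(row + col) % n] for col in range(n)]
            for row in range(n)]


orders = latin_square(["A", "B", "C", "D"])  # hypothetical treatment labels
for group, order in enumerate(orders, start=1):
    print(f"Group {group}: {' -> '.join(order)}")
```

Every treatment appears exactly once in each position across the groups, so position effects are spread evenly. A cyclic square does not balance which treatment immediately precedes which (A is always followed by B here), so it helps with order effects more than with carryover from one specific treatment to the next.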
Part II: Fundamental Concepts of Validity
What is validity?
Validity asks how well a test measures what it purports to measure.
Kinds of validity
- Content validity: How representative of a content domain are
items or tasks in a test? Are they a good sample of the total
subject-matter content? Do they represent the tasks to be performed in the
target environment?
- Criterion-related validity: There are two kinds.
- Concurrent validity: You want to develop a test, or replace a
test, that is currently in use. You substitute the new measure for
one that is already available, that measures the same trait. The old
measure is the criterion.
- Predictive validity: You either predict future performance, or
you correlate performance on two measures separated by a decent time
interval (for example, SAT scores vs. GPA on report cards a year later).
If you see "expect", you are probably dealing with predictive validity.
- Construct validity: How well does an instrument measure some
theoretical construct of interest (usually in quotes)? Do students who
score high on "xyz" actually have a high performance on some activity?
This usually requires several different measures to get at the theoretical
construct; usually a panel of experts is called in to judge the test for
appropriateness, accuracy, importance, etc.
Validity, reliability, and variability
Do not mix these up!
- The reliability (rxx) is the degree of linear relationship
between two parallel forms of the same test, or the same test given
before and after some intervening time interval, or two halves of the
same test. It refers to the consistency, generalizability, stability, or
dependability of a test score.
- The validity coefficient of a measure cannot exceed the square root
of its reliability. Maximum validity coefficient is sqrt(rxx).
- The variability (or shared variance) is the square of a
correlation coefficient. It is r**2, the coefficient of determination,
which tells you the proportion of the variance that two correlated
variables share.
- Do not confuse validity with variability - and be sure the square root
goes in the proper direction!!! Read the questions carefully: both of
these quantities start with "V".
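To keep the direction of the square root straight, here is a small sketch. It assumes the classical test theory result that the square root of the reliability is the ceiling on any validity coefficient, and that squaring a correlation gives the proportion of shared variance. The function names and the two coefficients are hypothetical.

```python
import math


def max_validity(reliability):
    """Ceiling on the validity coefficient: the square root of rxx."""
    return math.sqrt(reliability)


def shared_variance(r):
    """Coefficient of determination: proportion of variance two measures share."""
    return r ** 2


rxx = 0.81  # hypothetical reliability of the test
rxy = 0.60  # hypothetical observed validity coefficient

print(max_validity(rxx))     # about 0.90: validity cannot be higher than this
print(shared_variance(rxy))  # about 0.36: 36% of the variance is shared
```

Square root goes up (0.81 to 0.90), squaring goes down (0.60 to 0.36), since both coefficients lie between 0 and 1.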
Restriction of range
This comes up when you are dealing with criterion-related validity and
you select a group of people by pass/fail depending on some sort of a cut
score. That means you throw out the people who fail - anyone who scores
less than the cut score. By chopping the bottom tail off the distribution
of scores, you restrict the range. This lowers the observed reliability
of the test and "attenuates the validity coefficient", since both are
correlations and correlations shrink when the spread of scores shrinks.
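The shrinkage can be seen in a small simulation. This is only a sketch on simulated data (not results from any study): test scores are drawn from a standard normal distribution, the criterion is built to correlate about 0.6 with the test, and the cut score of 0.0 is an arbitrary choice that fails roughly the bottom half.

```python
import math
import random


def pearson_r(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(0)  # fixed seed so the example is repeatable

# Simulated scores: criterion = 0.6 * test + noise, so the correlation
# in the full group is roughly 0.6.
test = [rng.gauss(0, 1) for _ in range(5000)]
criterion = [0.6 * t + 0.8 * rng.gauss(0, 1) for t in test]

cut_score = 0.0  # hypothetical pass/fail cut
passed = [(t, c) for t, c in zip(test, criterion) if t >= cut_score]

r_full = pearson_r(test, criterion)
r_restricted = pearson_r([t for t, _ in passed], [c for _, c in passed])

print(f"Everyone:        r = {r_full:.2f}")
print(f"Passed cut only: r = {r_restricted:.2f}")
```

The correlation among those who passed comes out noticeably lower than in the full group, even though the relationship between test and criterion was built the same way for everyone.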
Criterion measures
A really adequate criterion measure (or test) should produce scores that
correlate highly with the actual or target behavior. This is usually very
difficult to do. One example is having dentists demonstrate good
coordination skills and then using those scores to predict how well they
can carve a good tooth reconstruction.
Lorraine Sherry
http://www.cudenver.edu/~lsherry/validity.html
Updated March 15, 1997