Klimczak, A.K., & Wedman, J. (1996). Instructional design project indicators: An empirical basis. Performance Improvement Quarterly, 9 (4), 5-18.
Lorraine Sherry
March, 1997
The second research question was essentially: Are there any significant differences between the mean perceived importance ratings of the individual success indicators that were reported by subjects representing four independent groups of stakeholders: designers, sponsors, trainers, and learners? Here, they did have a tacit research hypothesis. For the estimates of perceived importance on their success indicators, they were looking to reject:
H0: mu (item 1) = mu (item 2) = ... = mu (item 7).
Also, since they were dealing with four different groups, they were looking for a main effect by group, i.e., to reject:
H0: mu (designers) = mu (trainers) = mu (sponsors) = mu (learners).
They also wanted to know if there was any interaction of group x item.
| Group | Subjects | Item 1 | Item 2 | Item 3 | Item 4 | Item 5 | Item 6 | Item 7 |
| 1 | 1..16 | * | * | * | * | * | * | * |
| 2 | 17..32 | * | * | * | * | * | * | * |
| 3 | 33..48 | * | * | * | * | * | * | * |
| 4 | 49..64 | * | * | * | * | * | * | * |
Thus, the design was a 4 x 7 two-way ANOVA where J = 4 was the number of groups and K = 7 was the number of success indicators, giving 28 cells in all. The descriptive statistics from the questionnaire responses are reported in Table 5, with means and standard deviations for each cell (group and success indicator). Looking at Table 5 in the article, you can see that the ANOVA design is clearly described (once you eliminate the "all groups" and "all indicators" entries). They also report the means for all groups and all indicators, but they did not include these in the ANOVA.
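The degrees of freedom for this layout can be worked out directly from J, K, and the group size. The following is a minimal sketch, assuming 16 subjects per group (as in the layout table above) and assuming every subject rated all seven indicators, so that groups is a between-subjects factor and indicators is a within-subjects factor. The variable names are illustrative only.

```python
# Degrees-of-freedom partition for one between-subjects factor (groups)
# and one within-subjects factor (indicators).
J = 4    # stakeholder groups
n = 16   # subjects per group (assumed equal, as in the layout table)
K = 7    # success indicators rated by every subject (assumption)

N = J * n  # 64 subjects in all

df = {
    "between subjects": N - 1,
    "groups": J - 1,
    "subjects within groups": N - J,
    "within subjects": N * (K - 1),
    "indicators": K - 1,
    "groups x indicators": (J - 1) * (K - 1),
    "indicators x subjects within groups": (K - 1) * (N - J),
    "total": N * K - 1,
}

for source, value in df.items():
    print(f"{source:38s}{value:5d}")
```

The between-subjects and within-subjects pieces should each add up to their subtotal, and the two subtotals should add up to the total.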
The design certainly addresses Research Question 2, which was the authors' intent. Using a two-way ANOVA was certainly preferable to using seven (or four) one-way ANOVAs, because it enabled them to test for an interaction, which turned out to be significant. There was no reason to use an ANCOVA, since there were no covariates. Adding state or gender to a design with 28 cells would probably have resulted in unfilled cells, so this didn't seem to be appropriate. Table 2 summarizes the two-way ANOVA:
| SV | SS | df | MS | F ratio | Signif? |
| Between Subjects | 386.41 | 64-1=63 | | | |
| Groups (G) | 16.85 | 4-1=3 | 5.62 | 0.91 | no |
| Subjects within Groups (s:G) | 369.56 | 64-4=60 | 6.16 | | |
| Within subjects | 763.93 | 384 | | | |
| Indicators (I) | 144.04 | 7-1=6 | 24.01 | 16.03 | p < .01 |
| Groups x Indicators (IxG) | 80.84 | (4-1)*(7-1)=18 | 4.49 | 3.00 | p < .01 |
| Ixs:G | 539.05 | (6)*(60)=360 | 1.50 | | |
| Total | 1150.34 | (7)*(64)-1=447 | | | |
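The summary table can be checked by recomputing each mean square as SS/df and each F ratio as the effect mean square over its error term. This is a minimal sketch, assuming the usual error terms for this kind of design: subjects within groups (s:G) for the group effect, and Ixs:G for the indicator effect and the interaction. The sums of squares and degrees of freedom are copied from the table; the dictionary names are illustrative.

```python
# Sums of squares and degrees of freedom copied from the summary table.
ss = {"G": 16.85, "s:G": 369.56, "I": 144.04, "IxG": 80.84, "Ixs:G": 539.05}
df = {"G": 3, "s:G": 60, "I": 6, "IxG": 18, "Ixs:G": 360}

# Mean square = SS / df for every source of variation.
ms = {source: ss[source] / df[source] for source in ss}

# F ratio = effect mean square / error mean square (assumed error terms).
f_ratio = {
    "G": ms["G"] / ms["s:G"],        # between-subjects effect
    "I": ms["I"] / ms["Ixs:G"],      # within-subjects effect
    "IxG": ms["IxG"] / ms["Ixs:G"],  # interaction
}

for source in ss:
    print(f"MS({source}) = {ms[source]:.2f}")
for effect, value in f_ratio.items():
    print(f"F({effect}) = {value:.2f}")
```

The same dictionaries can be used to confirm that the between-subjects and within-subjects sums of squares add up to the subtotals and total reported in the table.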
There was no significant difference among the groups. No group main effect means that, averaged over the seven success indicators, the four groups gave similar overall importance ratings. All groups ranked "job performance" as the most important success indicator. When the authors investigated the responses of different groups to different questions, however, they could see that the importance attached to particular indicators did depend on the group, which is the group x indicators interaction shown in the ANOVA table.
One of the strong points of the paper is that the authors took the time to explain the interaction: stakeholder perspectives can potentially be a confounding influence when evaluating ID project success. For example, sponsors placed higher importance on "project budget" and lower importance on "job performance", whereas trainers, being less critical of the project than sponsors, placed higher importance on "job performance" and lowest importance on "project budget". (From a practical standpoint, trainers would very much like to keep their jobs!)
A graph of the cell means to illustrate the interaction would have been a useful addition to the article. Also, the follow-up one-way ANOVAs to the interaction were not such a good idea.
There was a significant main effect among the seven success indicators at the p < .01 level, so the authors used a Tukey pairwise comparison as the necessary follow-up multiple comparison procedure (MCP). This was sensible. However, if it were me, I'd have chosen Newman-Keuls rather than Tukey, because for simple pairwise comparisons Newman-Keuls is more powerful than Tukey, although it also has a higher Type I error rate. Since the calculated p value was so low, the Type I error rate wasn't an issue here, so I'd have gone for more power. There was also a significant difference between groups on a single success indicator ("bottom line"): the Tukey MCP revealed that designers differed significantly from sponsors and trainers in terms of the importance attached to this indicator.
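To get a feel for why a follow-up MCP is needed at all, it helps to count the pairwise comparisons involved. The sketch below is only illustrative: the indicator labels are placeholders, and the familywise error figure assumes independent tests each run at alpha = .05, which real pairwise comparisons on the same subjects are not. It is not a reconstruction of the authors' Tukey analysis.

```python
from itertools import combinations

# Placeholder labels; the article names the seven indicators itself.
indicators = [f"indicator {i}" for i in range(1, 8)]
groups = ["designers", "sponsors", "trainers", "learners"]

indicator_pairs = list(combinations(indicators, 2))  # all pairs of indicators
group_pairs = list(combinations(groups, 2))          # all pairs of groups

# Rough familywise Type I error rate if every indicator pair were tested
# separately at alpha = .05 (assumes independent tests, for illustration).
alpha = 0.05
familywise = 1 - (1 - alpha) ** len(indicator_pairs)

print(len(indicator_pairs), "pairwise comparisons among indicators")
print(len(group_pairs), "pairwise comparisons among groups")
print(f"uncorrected familywise error rate: about {familywise:.2f}")
```

With this many comparisons, an uncorrected approach would inflate the chance of at least one false positive considerably, which is the problem that Tukey and Newman-Keuls address in different ways.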
To validate the review and subsequent clustering, the authors mailed the results of the literature review to six ID professionals (i.e. independent SMEs) who were asked to provide additions to and elaborations on the review results. The authors also conducted three independent focus group interviews, each with four to eight stakeholders involved in ID projects, to build on the list of project success indicators generated through the literature review. Combining the information from the three sources, namely, the literature review, the professionals' responses, and the focus group comments, they identified 31 tentative success indicators.
As with any qualitative method scheme, this procedure took several iterations until the final set of success factors was isolated. I have been through this, and I know how much work it takes! In a qualitative methods dissertation, one doctoral student took four complete iterations through all the data to isolate three independent themes relating to girls' reactions to reading stories. These authors did a good job in the two iterations that they described.
There are six criteria upon which the validity of a qualitative research study rests, which I thought the authors addressed adequately:
It might have been nice if they had done a factor analysis on these seven success indicators to find out if they were truly independent or whether they clustered in any way.
Few women were represented in the sample. Gender couldn't be used as another independent variable because they wouldn't have been able to fill all cells in the design. Here is the gender information for the four independent groups of stakeholders: designers: 7 M, 9 F; trainers: 15 M, 1 F; sponsors: 15 M, 1 F; learners: 13 M, 3 F. This is far from equal representation; we do not know if this imbalance also held true in the population from which the sample of volunteers was selected.
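A quick tally of the counts above shows how thin the group x gender cells would have been. This is a minimal sketch using only the figures quoted in this critique; the variable names are illustrative.

```python
# (men, women) per stakeholder group, as quoted above.
gender_counts = {
    "designers": (7, 9),
    "trainers": (15, 1),
    "sponsors": (15, 1),
    "learners": (13, 3),
}

total_men = sum(men for men, _ in gender_counts.values())
total_women = sum(women for _, women in gender_counts.values())

# Smallest group x gender cell if gender were added as a factor.
smallest_cell = min(min(pair) for pair in gender_counts.values())

print(f"men: {total_men}, women: {total_women}")
print(f"smallest group x gender cell: {smallest_cell} subject(s)")
```

With a cell of that size, crossing gender with the seven indicators as well would leave too few observations per cell to support a stable estimate.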
The volunteers also came from different states, but the authors did not indicate what states they came from. Knowing the ID field, I am sure Florida respondents would have answered differently from California or Massachusetts respondents, especially on the budget issue, since Florida instructional designers boast about being the low bidders on ID projects. Since selection was not random, there is certain to be some nonrepresentative sample bias.
Internal validity deals with cause and effect, so it was not an issue here. External validity means that you want to generalize to the population from which the respondents were selected. This is questionable, because the subjects were not randomly selected (i.e. nonrepresentative sample). There is also some question of the nongeneralizability of the dependent variable (the ratings on the seven success indicators) because their results did not replicate those of other researchers. They explain this in the discussion section. Usually "enjoyment" ranks first for learners; in this study it ranked next to last in terms of importance as an indicator of project success. The discrepancy between the themes in the literature and the results of the current study sparked lengthy discussion and conclusions sections. I think that a lot of this is related to external validity questions.
The authors say that the literature does not appear to focus on the most important success indicators, because it is primarily concerned with learning outcomes rather than project success indicators. In the conclusions, they say that the common evaluation frameworks (listed on p. 14 of the article) fail to include important success indicators that were used in their own study - specifically, "intended lifespan", "knowledge sharing", and "project budget". They also say that the Kirkpatrick model stated that results is the most important success indicator, followed by behavior, whereas their study showed that behavior was more important than results. Kirkpatrick's model had already been challenged by two other researchers in 1989, and the current study upheld the view of the challengers.
No test reliability information was given. Internal consistency reliability refers to the consistency of results produced by a measure. That is the only type of reliability that would have been appropriate for a single instrument given once. Since the items on the test were from different domains (seven hopefully different success indicators and 23 success factors), and the authors wanted to measure seven different constructs for this report, we wouldn't expect very high test reliability, although it would have been appropriate to estimate it. However, mixing up these assorted factors did have one salutary effect: it made the test longer and enabled the researchers to get two different journal articles out of a single instrument! Whether this actually increased reliability is doubtful.
In all, I thought the report was up to the high standards of PIQ, which is a reputable, refereed journal. The discussion was in depth, and the description of the qualitative methods used for Question 1 was given in substantial detail. The ANOVA was described nicely, and the complete summary table was given, rather than just the findings that were significant. The fact that these results backed up the challenges to the established models seven years ago further strengthened the study.