Research Review II
Critical Review of a Published Quasi-Experimental Research Study

Wenger, M.J., & Payne, D.G. (1994). Effects of a graphical browser on readers' efficiency in reading hypertext. Technical Communication 41(2), 224-233.

Lorraine Sherry

March 25, 1996

This quasi-experiment was performed to determine whether the use of a graphical browser had any effect on comprehension and retention of hypertext. Though the literature suggests that graphical browsers in hypertext may be useful for alleviating many of the cognitive demands supposedly associated with using hypertext, most articles are speculative in nature, and are not solidly grounded in empirical research. The few experimental investigations that do exist have shown little or no effect for browsers, and have been limited to describing outcomes on only a small set of measures. The present study replicates these results, indicating that browsers have no effect on recall of text material, comprehension of text, or recall of text structure. However, the browser did increase the amount of text read by users and reduced the number of nodes repeated during reading.

With the current popularity of the World Wide Web (WWW) and graphical browsers such as Netscape, designers are expending a great deal of time and effort in an attempt to alleviate readers' cognitive load of hypertext navigation (see Marra, in press). This cognitive loading results in an oft-cited problem of "getting lost in hyperspace" which may limit efficient and meaningful usage of hypertext systems (see Marchionini and Shneiderman, 1988, cited in this article; also Nielsen, 1990; Rivlin et al., 1994; Kahn, 1995). Most articles written by designers and software developers speculate that the use of graphical browsers, with navigation aids based on a structural analysis of the hypertext, may help readers process and recall the information contained in a hypertext document. The current article disputes this argument and, in a well-constructed quasi-experiment, adds to the literature the first experimental investigation disproving these conjectures.

The title of the study refers directly to the problem under investigation, i.e., is there any effect of a graphical browser on readers' efficiency in reading hypertext? The problem is clearly stated: "[t]he specific question the current study addresses is whether providing a graphical representation of the structure of a hypertext document would affect the performance of readers" (p. 224). The problem is highly significant; in fact, two entire issues of Communications of the ACM (February 1994 and August 1995) were devoted specifically to designing hypermedia applications, with such titles as "Navigating in hyperspace: Designing a structure-based toolbox" (Rivlin et al., 1994), "Visual cues for local and global coherence in the WWW" (Kahn, 1995), and "Hypermedia and cognition: Designing for comprehension" (Thuring et al., 1995). Moreover, since the field of technical communications has no empirical research base of its own, this peer-reviewed article represents one of the first attempts by the technical communications community to create one. It has just been followed up in the latest issue of Technical Communication, 43 (1) by the same authors' correlational study, "Human information processing correlates of reading hypertext".

All important terms are clearly defined, with examples given as necessary, since the intended audience consists of technical communicators rather than software developers and HCI experts. The rationale and theoretical framework are also clear - "[t]he study described here is part of a larger empirical effort to learn more about the ways in which human information processing interacts with aspects of hypertext" (p. 224).

After the introduction and definition of terms, Wenger and Payne reiterated two conceptual assumptions commonly found in the literature:

  1. that a graphical representation of hypertext structure might give readers a more robust memory for the structure of the information; and
  2. that the use of a browser might improve the efficiency with which readers navigate the hypertext document.

Next, they discussed in detail the theoretical background for their study, ending with the observation that, though successful performance in hypertext-reading tasks depends on the reader's ability to abstract and retain information about the document's structure, the innate nonlinear characteristic of hypertext may not support the type of information processing that readers must do in order to abstract and retain that structural information. This is the crux of the problem, and also the basis for their hypothesis, namely, "[i]f this is the case, then it seems reasonable to believe that giving readers a representation of the structure of the information might serve to alleviate some of these possible problems" (p. 226).

In their discussion, the authors cite 40 references from the fields of cognitive science, HCI, software development, human factors, and technical communication, including usability. This eclectic reference list contains some of the seminal articles from each of these fields, with the literature review well-organized and clustered by topic. Major contributions by Marshall and Irish, Conklin, Trigg, Nielsen, Parunak, Shneiderman, Jonassen, and Halasz (see citations in the original article) are particularly important. In a literature review that spans several somewhat related fields, it is not uncommon to come across articles with at least 30 references. I assume that the literature on revising and structuring text (see Horn, 1985, on Information Mapping; and the ongoing debate about text revision by Britton et al. and Graves and Slater, 1989-1991) is familiar to the technical communications audience. The only references that I would add would be three "classics" for the non-initiate: Draper and Norman's (1986) User-Centered System Design, since the article has implications for usability; Ben Shneiderman's (1987) Designing the User Interface, with many principles, heuristics, and guidelines for the HCI designer; and Kintsch's (1989) Learning from text, since the concept of a mental model of hypertext structure is central to the authors' inquiry.

Wenger and Payne then cite some examples which illustrate the ways in which browsers have been used successfully by other authors. However, in searching for controlled studies on the effects of browsers, they found only two journal articles that examined the way a graphical representation affected retention - and both of these showed no statistically reliable effects on memory for hypertext. The question then arises, are there ways in which browsers that graphically represent the structure of a hypertext document can actually improve readers' efficiency in reading such a document? This leads directly to the authors' two hypothesis statements, namely, "[i]f browsers can relieve readers of some of the additional cognitive demands imposed by hypertext, then there are two possible ways in which reader performance can be improved" (p. 227) - in brief,

  1. if providing a graphical representation of information structure leads to easier abstraction and retention of structure, then readers who use a browser should test better for recall, comprehension, and retention of the hypertext structure; and
  2. if the browser reminds the reader which nodes have been visited and which ones have not, then readers using a browser should visit a higher proportion of the unread nodes and repeat fewer of the visited ones.

Note that the study is limited to using a browser to (hopefully!) improve recall, comprehension, and hypertext structure retention, and to minimize the number of nodes revisited - it does not address the issue of cognitive flexibility at all, nor was there any follow-up testing on retention. However, it does lead to an experimental methodology that lends itself to a browser (treatment) / no browser (control) situation, in which subjects read three expertly edited and structured hypertext documents and are assessed on their reading efficiency.

One problem with text comprehension that has been cited in the literature is the readability of the text. Wenger and Payne - technical communication experts themselves - guarded against this by extracting text passages from an introductory biology book, condensing each to about 1000 words, and structuring it according to a Descriptive Rhetorical Form having a central topic with related subtopics. They cited the authors who had developed this Descriptive Rhetorical Form, but they did not give any information about reliability or validity concerning the use of text that has been edited in this fashion as a text-comprehension instrument. However, it is clear that they wished to increase experimental validity by increasing the degree to which their texts were free of systematic error. Also, the use of equivalent text forms with the same time interval, which is intended to consistently measure the same behavior by the same subjects, is considered to be a rigorous method of assessing reliability.

Another issue concerning the readability of these particular text segments is the context in which they are used. The subjects in the study are introductory level psychology students who may have no interest in biology. College-level biology texts are filled with advanced concepts such as the structure and replication process of DNA, protein synthesis, the metabolic process, subtle differences among the various amino acids, and sophisticated topics taken from organic chemistry. Hence, using these text passages out of context may adversely affect text readability.

Having edited each passage, they then segmented it into nodes by dividing the text at paragraph boundaries, labeling each paragraph with a node name (similar to Horn's (1985) "chunking" of structured writing), alphabetizing the node names, and then linking them hierarchically. This, in effect, randomized the order of the paragraphs within the text. Subjects could start reading the text at any link they chose, thus minimizing the possibility of imposing a predetermined path upon them.

There are five dependent variables, namely

  1. recall of text (write down as many node titles as you can remember in two minutes),
  2. comprehension of text (respond to a set of multiple choice questions based on the text),
  3. recall of text structure (when presented with 10 pairs of node titles, state whether these nodes were linked in the text),
  4. repeated nodes (count the number of nodes each subject visited for ten seconds or more, and determine whether these were repeated or not), and
  5. coverage (count the number of nodes actually read).

Treatments were described clearly and in detail. Administration and scoring of the tests for the browser/no browser modes were also described clearly. Testing was objective--either counting occurrences of an event, or scoring multiple choice questions--so there was no evidence of reactivity bias or subjectivity in assessing performance. A counterbalanced design--a Latin Square--was used for the sequencing of the text segments among the randomly assigned subjects, to control for multiple-treatment interaction. Among the strengths of this study, which have also been identified in previous studies of text comprehension (see Graves & Slater, 1991), were

  1. all three text passages were approximately the same length (912, 1101, and 969 words, respectively);
  2. each subject read only the carefully edited version of the text, not the original; and
  3. all subjects were tested on all three passages.
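
The counterbalancing described above can be sketched in a few lines of Python. This is only an illustration: the article does not report the authors' actual assignment procedure, so the cyclic square, the text labels, and the function names below are assumptions.

```python
def latin_square(items):
    """Return a cyclic Latin square: each item appears once per row and column."""
    n = len(items)
    return [[items[(row + col) % n] for col in range(n)] for row in range(n)]


def assign_orders(n_subjects, items):
    """Give each subject one row of the square, cycling through the rows."""
    square = latin_square(items)
    return [square[s % len(square)] for s in range(n_subjects)]


texts = ["Text 1", "Text 2", "Text 3"]  # hypothetical labels for the three passages
orders = assign_orders(60, texts)        # 60 subjects, as in the study

# Each text is read first, second, and third by the same number of subjects.
for position in range(3):
    counts = {t: sum(1 for o in orders if o[position] == t) for t in texts}
    print(position + 1, counts)
```

Note that a single cyclic square balances only the position of each text in the sequence; it does not balance which text immediately precedes which, so it is a simpler scheme than a fully balanced design.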

Since the subjects exhibited the usual individual differences in reading rate, the authors also collected baseline reading rates from three unrelated, expertly edited paragraphs and used them as covariates in later statistical analyses of average reading time per node. This was not construed as a pre-test, but rather, as a measurement of a confounding or "nuisance" variable. Subjects were instructed to read as much of the text as possible; the goal was complete coverage of all passages. We are told that experimental sessions lasted approximately 60 minutes per subject. We also know that the software recorded the elapsed time from the presentation of the node, so the authors were able to get an approximation of the time actually spent reading the text and separate it from the time required to make decisions regarding topic sequencing. Also, the authors were very careful to set a criterion of 10 seconds as the cutoff for determining when a node had actually been read versus when the subjects were browsing, and configured the recording software accordingly.
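
To make the 10-second criterion concrete, here is a minimal Python sketch of how coverage and repeated nodes might be derived from a log of node visits. The log format, the node names, and the rule that each re-reading of an already-read node counts as one repeat are assumptions for illustration; the article does not describe the authors' recording software at this level of detail.

```python
READ_THRESHOLD_S = 10.0  # cutoff described in the review: 10 seconds or more counts as "read"


def reading_measures(visits, total_nodes, threshold=READ_THRESHOLD_S):
    """Compute coverage and repeated-node counts from (node_name, seconds) visits."""
    read_counts = {}
    for node, seconds in visits:
        if seconds >= threshold:  # shorter visits are treated as browsing
            read_counts[node] = read_counts.get(node, 0) + 1
    nodes_read = len(read_counts)
    repeated = sum(count - 1 for count in read_counts.values())
    return {
        "coverage": nodes_read / total_nodes,  # proportion of available nodes read
        "repeated_nodes": repeated,            # re-readings of nodes already read
    }


# Hypothetical visit log for one subject and one 10-node text.
log = [("Cells", 24.0), ("DNA", 4.5), ("Enzymes", 31.2), ("Cells", 12.8)]
print(reading_measures(log, total_nodes=10))
```

In this invented log, the 4.5-second visit is discarded as browsing, two distinct nodes count as read, and the second long visit to the same node counts as one repeat.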

Subjects were also instructed in the use of the computer that was used to display the text, and were given a 15-minute practice session to use the mouse and receive descriptions of the post-tests. In addition, those subjects who were using the graphical browser developed by the authors were instructed in the use of the browser and given a chance to practice using each of the browser's functions. The browser displayed both the structure of the text and readers' progress through the text, with visited nodes highlighted in blue, and still unread nodes highlighted in white. Instructions were repeated at the beginning of each new presentation of text, to eliminate any instrument decay effects.

The population of interest was appropriate--sixty undergraduate students, all selected from an introductory psychology course, so there were no subgroups within the sample. We do not know if selection was random or if students volunteered to be subjects, so there could be some threat to external validity here. All subjects provided a set of data on their computer expertise (presented in Table 1 in the article), which showed that there were no computer novices in the sample. The authors did not state whether the students were randomly assigned to the treatments--this is why I call the investigation a quasi-experiment. However, the presentation of all three text passages to all treated subjects via a balanced Latin Square would help strengthen internal validity by getting rid of multiple-treatment interaction. In addition, any pre-existing differences in reading ability were dealt with by measuring the baseline reading rate and then taking this into account in the data analysis. All subjects completed the experiment, so there was no mortality.

"The study was conducted as a 2 (Condition: Browser, No Browser) x 3 (Text: 1,2,3) mixed-factorial design, with Text manipulated within the subjects" (p. 228). Here, a "factor" is simply one of the independent variables in the treatment. We can only assume that 30 subjects were placed in each condition; that information was not explicitly stated. In their analysis, Wenger and Payne collapsed the data across the Text factor. My first impression is that they did this so they could use an ANOVA. However, since the individual differences in baseline reading rate--which showed up in the pre-tests on a different set of text passages--represented a nuisance variable, the authors sensibly decided to use an ANCOVA instead. The baseline reading rate was measured as reliably as the dependent variables, using the same genre of edited text passages; it was also highly correlated with all of the dependent variables. An ANCOVA statistically removes that portion of the subject's dependent variable score variance (i.e., recall, comprehension, and structural recall; and repeated nodes and coverage of text) that is systematically associated with the pre-existing variable (i.e., the individual differences in baseline reading rate).

A second confounding variable was the fact that subjects were free to quit when they believed they had read all available nodes. Thus, the authors presented both raw and conditionalized data on the recall, comprehension, and structural recall scores to take into account the number of nodes actually read by each subject.
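
The covariance adjustment described above can be illustrated with a small Python sketch. It shows only the core idea of an ANCOVA with one covariate (adjusting each group mean to a common covariate value using the pooled within-group slope), not the authors' full analysis; the numbers are invented and the function names are assumptions.

```python
from statistics import mean


def pooled_slope(groups):
    """Pooled within-group regression slope of the outcome on the covariate."""
    sxy = sxx = 0.0
    for xs, ys in groups.values():
        mx, my = mean(xs), mean(ys)
        sxy += sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sxx += sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def adjusted_means(groups):
    """Group means of the outcome, adjusted to the grand mean of the covariate."""
    slope = pooled_slope(groups)
    grand_x = mean([x for xs, _ in groups.values() for x in xs])
    return {
        name: mean(ys) - slope * (mean(xs) - grand_x)
        for name, (xs, ys) in groups.items()
    }


# Invented data: (baseline reading rate, outcome score) for each condition.
data = {
    "browser":    ([1.0, 2.0, 3.0], [3.0, 5.0, 7.0]),
    "no_browser": ([2.0, 3.0, 4.0], [5.0, 7.0, 9.0]),
}
print({name: mean(ys) for name, (_, ys) in data.items()})  # raw means differ
print(adjusted_means(data))                                # adjusted means coincide
```

In this made-up example the raw group means differ (5.0 versus 7.0), but the whole difference is explained by the covariate, so both adjusted means equal 6.0. This mirrors the situation in which an overall test is significant only because of the covariate.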

We must remember that our audience consists of technical communicators, not researchers or statisticians. In a recent issue of Technical Communication (42, 2), several authors commented that the only way that a research base could be established and adopted by their practitioners was to couch research articles in clear, readable language, with a minimum of research-specific and statistics-specific technical jargon. Thus, the authors summarized their data quite clearly in Table 3, which shows the means and standard errors of the means for each dependent variable, for each condition: Browser or No Browser. Looking through the table, we can see that the differences for the first three dependent variables (recall, comprehension, and structural recall) are insignificant between the two conditions. However, there appear to be significant differences for the repeated nodes and coverage variables.

The authors present the actual results of their data analysis quite tersely, stating that

"While the overall ANCOVA was significant, F (6, 170) = 12.38, p < 0.001, this was due to the significant impact of the covariate (baseline reading rate). All main effects and interactions for the experimental factors (Condition and Text), after we partialed out the variability associated with the covariate (baseline reading rate), were nonsignificant (all Fs < 1.5, all ps > 0.25)" (p. 230).

Using the usual p < 0.05 cutoff, I can verify that my first impression was correct--namely, that the first three dependent variables showed insignificant differences, once the baseline reading rate was statistically accounted for. Wenger and Payne also confirmed these results by t tests--all ts < 0.1, all ps > 0.25. Clearly, a probability of more than 25% is simply not good enough to reject the null hypothesis. Next, they report that subjects reading with the browser repeated fewer nodes than those reading without the browser (t = 2.35, p < 0.05), and subjects reading with the browser read a higher proportion of the available nodes than those reading without the browser (t = 3.93, p < 0.001). Here, there were clear and significant differences, especially for the fifth dependent variable, i.e., coverage.

At this point we have to return to the first hypothesis, namely, that a graphical representation of the hypertext structure of an online document might give readers a more robust memory for the structure of the information. Note that this is not the null hypothesis--it is a hypothesis that there might actually be some increase in the reader's efficiency, no matter how slight. They did not set out to prove the null hypothesis, so they are quite correct in saying that a p value of more than 25% will not permit them to reject the null hypothesis. This is important because of the amount of speculation that has already appeared in the literature concerning the much-touted benefits of a graphical browser.

Now, the fact that their results show an insignificant effect might also be due to a small sample size. With 60 subjects, 30 per condition, we are on the borderline of what might constitute a reasonable sample. However, it does not seem reasonable that increasing the sample size to anything that could be obtained from volunteers in an introductory psychology course--even doubling the sample size--would allow the researchers to reduce the probability of the null hypothesis to 5% or lower.

One weak point of the article is that Wenger and Payne did not do a power analysis to find out just what sample size they would need in order to begin to see some significant effect on text recall, text structure recall, or text comprehension. However, I do feel that their sample size was adequate because they did get a significant effect (p < 0.05 or better) for the remaining dependent variables, namely, text coverage and number of repeated nodes. So it does look as if they are really dealing with a nonsignificant effect, which is in itself a very important finding--especially since it replicates the less rigorous studies of other researchers.
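
As a rough illustration of what such a power analysis involves, the following Python sketch estimates the number of subjects needed per condition for a two-sided comparison of two group means. It uses the standard normal approximation rather than an exact calculation based on the t distribution, and the effect sizes are Cohen's conventional benchmarks, not values taken from the article.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate subjects per group for a two-sided, two-sample comparison.

    effect_size is the standardized mean difference (Cohen's d).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the significance level
    z_power = z.inv_cdf(power)          # quantile for the desired power
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


for d in (0.2, 0.5, 0.8):  # conventional small, medium, and large effects
    print(d, n_per_group(d))
```

Under this approximation, about 63 subjects per condition would be needed to detect a medium effect (d = 0.5) with 80% power at the 0.05 level, and several hundred per condition for a small effect.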

One strong point of this article is that, though they were dealing with a non-research based audience, they presented their information like good researchers, and did not confound results with implications or interpretations. In the Results section, they simply presented their results numerically, and then reiterated the same information in text form. For example, they state that "subjects reading with the browser answered the same proportion of the structural recall questions correctly as did the subjects reading without the browser" (p. 230). Regarding the significant differences, they state that "[t]his result, in combination with the data on number of repeated nodes, indicates that the benefit of the browser to subjects was to increase their efficiency by decreasing the number of repeated nodes and increasing the proportion of available nodes read" (p. 231).

In the Discussion and Conclusions section, they presented the limitations of their study, their interpretation of their results, and their relevance to hypertext designers. Their conclusions are logical, and are confined to the evidence at hand, namely, that the graphical representation of the text did not improve readers' performance on measures of text recall, text comprehension, or recall of text structure. The practical meaning of these results is clearly stated:

"This latter result is of importance since it has been the focus of a great deal of speculation in the published literature on hypertext. Instead, the benefit of the browser was to improve the efficiency with which the subjects moved through the documents, reducing the number of repeated nodes and increasing the total proportion of available nodes read" (p. 231).

Wenger and Payne are quite honest about the limitations of their study, which are threefold. First, subjects were instructed to read as much of the text as possible, rather than reading to comprehend and retain as much of the text information as possible and then perform some task later on that would require them to relate different portions of text. This is one of the issues posed in the Graves and Slater vs. Britton et al. debate. Second, since there are many types of graphical browsers available, some more optimal and some less optimal (e.g. Netscape vs. Netcom), the lack of an effect on the measures of recall, comprehension, and structure recall might be attributable to some element of their rather simplistic browser design. And third, the text passages were short in comparison with typical textbook material that university students normally encounter in their assignments. Hence, if all the necessary structural information could be easily abstracted and memorized, then no manipulation of the text, including the browser treatment, would have enhanced performance.

The authors are logical and organized, and unbiased to the point of being rather conservative. Results are clearly presented, and definitely considered in the light of findings from other studies: the current study replicates both the failure of two other research teams (Reynolds & Dansereau, 1990; and Tripp & Roby, 1990; both cited in the original article) to find an effect of browsers on retention and their success at finding a positive effect on a measure of readers' efficiency in covering all the text presented. Wenger and Payne also discuss the question of external validity of controlled laboratory research on undergraduates, and whether the characteristics of their subjects are at all similar to typical technical professional users of hypertext.

This article is one of a series of empirical studies by the same authors, in which they continue to investigate the processes involved in reading hypertext. Their original work (Wenger & Payne, 1992, cited in the original article) indicates that those processes may differ from those used in reading linear text not necessarily in quantity but rather in kind. This type of research has far-reaching implications for hypertext designers. Their recent work on the idea that spatial and relational processing play important roles in reading and using hypertext is a direct follow-up correlational study to the quasi-experiment described here.

In general, this is not only one of the strongest empirical research papers that I have read, but it is also one with important implications. The field of technical communication is changing rapidly; boundaries between the communities of its practitioners and those of the HPT/EPSS community and the software development community are rapidly disappearing. With the advent of the WWW and the everyday use of hypertext by exponentially increasing numbers of users, there is a great deal of speculation and money being spent on bigger, more powerful browsers, with more bells and whistles. Papers such as this lend a much needed balance to this type of activity, and furnish an excellent initial foundation on which the technical communications community may begin to build their own research base.

References

Britton, B.K., Van Dusen, L., Gulgoz, A., & Glynn, S.M. (1989). Instructional texts rewritten by five expert teams: Revisions and retention improvements. Journal of Educational Psychology, 81 (2), 226-239.

Graves, M.F., & Slater, W.H. (1991). A response to "Instructional texts rewritten by five expert teams". Journal of Educational Psychology, 83 (1), 147-148.

Horn, R.E. (1985). Results with structured writing using the Information Mapping writing service standards. In T.M. Duffy & R. Waller (Eds.), Designing Usable Texts. New York: Academic Press, Inc.

Kahn, P. (1995). Visual cues for local and global coherence in the WWW. Communications of the ACM, 38 (8), 67-69.

Kintsch, W. (1989). Learning from text. In L.B. Resnick (Ed.), Knowing, Learning, and Instruction. Hillsdale, NJ: Erlbaum.

Marra, R. (In Press). Human Computer Interface Design. To appear in Komers, P. A. M., Grabinger, S., & Dunlap, J. C. (Eds.), Hypermedia and Multimedia: Fundamental Concepts and Development Guidelines. Mahwah, NJ: Erlbaum.

Marchionini, G., & Shneiderman, B. (1988, January). Finding facts versus browsing knowledge in hypertext systems. IEEE Computer, 21, 70-80.

Nielsen, J. (1990). The art of navigating through hypertext. Communications of the ACM, 33 (3), 296-310.

Norman, D. (1986). Cognitive engineering. In S. Draper and D. Norman (Eds.), User-Centered System Design. Hillsdale, NJ: Erlbaum.

Rivlin, E., Botafogo, R., & Shneiderman, B. (1994). Navigating in hyperspace: Designing a structure-based toolbox. Communications of the ACM, 37 (2), 87-96.

Shneiderman, B. (1987). Designing the User Interface. Reading, MA: Addison-Wesley.

Thuring, M., Hannemann, J., & Haake, J. (1995). Hypermedia and cognition: Designing for comprehension. Communications of the ACM, 38 (8), 57-66.

Copyright © 1996 Lorraine Sherry
lsherry@carbon.cudenver.edu
Updated March 25, 1996