A Scrollbar-based Visualization for Document Navigation

Donald Byrd

Center for Intelligent Information Retrieval
Computer Science Department
University of Massachusetts
Amherst, MA 01003

8 February 1999

Abstract

We are interested in questions of control for best-match text-retrieval systems, specifically questions as to whether simple visualizations that nonetheless go beyond the minimal ones generally available can significantly help users. Recently, we have been investigating ways to help users decide–given a set of documents retrieved by a query–which documents and passages are worth closer examination.

We built a document viewer incorporating a visualization centered around a novel content-displaying scrollbar and color term highlighting, and studied whether the visualization is helpful to non-expert searchers. Participants’ reaction to the visualization was very positive, while the objective results were inconclusive.

Introduction

The advent of the World Wide Web has resulted in an explosion of text searching by end users as opposed to expert intermediaries. Most of the searching on the Web is via best-match systems, especially those of the so-called "search engines". However, it is clear that, for a great many users, current best-match text-retrieval systems leave much to be desired. If anything, experts (primarily librarians and intelligence analysts) are even more dissatisfied with best-match systems than "ordinary" users are. As user-interface designers and researchers, we have long felt that much of the problem is a question of control.

We have recently been investigating the "review of results" aspect of the task. Once the user has run a search and a number–often a very large number–of documents have been retrieved, how can they decide where to focus their attention? Which documents and passages are worth closer examination? We believe that, with appropriate visualizations, result lists could make it much easier to decide which documents are really likely to be relevant, and document viewers could make it much easier to decide whether the document being shown is in fact relevant. This is hardly a new idea, but we believe that the issues involved are subtle and that the optimal visualizations have not yet been seen. We devised a visualization centered around a novel content-displaying scrollbar and color term highlighting, built a document viewer incorporating the visualization, and studied whether the visualization is helpful to non-expert searchers.

In this paper, we discuss the state of the art of visualizations of text-document content, describe our new visualization, document viewer, and study, and show how it could work with a previous visualization of our own. We then report on a preliminary user study with our new visualization. Participants’ reaction to the visualization was very positive, while the objective results were inconclusive. Finally, we attempt to draw conclusions from our experience and we make suggestions for future research.

Visualizations for Text Retrieval

Several aspects of the information involved in a text-retrieval program can be visualized. A minimal list of sensible visualizations, with the "phases" of the task they apply to, might look like this (phases are named in the terms of Shneiderman et al 1997, 1998):

Name    Information                                                Phase

VQ      the query alone                                            formulation
VQDB    the query in relation to the database(s)                   formulation
VQR     the query in relation to individual retrieved documents    review of results
VRR     the retrieved documents in relation to each other          review of results

Each of these visualizations can be done in many ways. First, even for a given visualization, different pieces of information might be visualized. For instance, VQ might show query structure, term weights, etc. VQR might show the numbers of occurrences of each query term (as in the commercial system CALVIN), the contributions of each term to the document’s score (as in Shneiderman et al 1997, Fig. 2), or the progression of appearances of terms in the document (as in the current research, or Hearst’s TileBars: see Hearst 1995). At its most basic, it might give term-occurrence information in Boolean form simply by listing the terms that appear in the document at all (as in PRISE: see Harman 1992).

Second, there are various graphical ways to realize a visualization of given information, varying in complexity, clarity, etc. For example, VRR might be realized in either 2-D or 3-D. In VQ, relative weights of terms might be shown in a pie chart or a histogram. But the possibilities go far beyond these simple questions: see any of several books by Tufte (1983, e.g.) for extensive discussion.

Third, while the term "visualization" suggests a passive display, visualizations can also be interactive, with affordances to let the user control the system. It is certainly possible (and it may well be desirable) to offer control of the full query-expressing power of a modern IR system in the framework of VQ and VQDB.

Fourth, for performance reasons, one might prefer to visualize an approximation to the desired information. For VQDB, for example, one can show the query against the actual databases to be used, or against a "proxy" query-formulation database. The former is obviously preferable, but the latter is often much more practical, especially in a client/server situation (and most especially on the World Wide Web). This is basically the "query previews" idea of Doan et al (1996).

Finally, note that some of these visualizations might be more or less tightly integrated: for example, VQR and VRR could be shown on a single display, as in LyberWorld (Hemmje et al 1994).

A number of visualizations in text-retrieval systems are shown in a special digital-libraries issue of Communications of the ACM (CACM 1995).

Visualizations of Text-Document Content

IR researchers have actually proposed many VQR’s, i.e., ways to visualize the content of text documents as it relates to a query, for example:

- the document lens (Robertson and Mackinlay 1995)

- TileBars (Hearst 1995)

- multiple bargraphs for term contributions to document score (Veerasamy & Belkin 1996)

- our own single bars for term contributions to document score (Shneiderman et al 1997)

- VOIR (Golovchinsky 1997)

- dynamic document viewers (Boguraev et al 1998)

- thumbnails (Ogden et al 1998)

- multiple fisheye views (Ogden et al 1998)

These VQR-type visualizations can be cleanly divided into those which show where features occur within the document and those which do not. Our own earlier visualization mentioned above is of the latter type, but the present research is concerned with the former.

TileBars, Scrollbars, and Other Visualizations That Show Feature Locations

Among the best-known visualizations of text-document content in IR is "TileBars" (Hearst 1995, Rao et al 1995). Rao makes the thought-provoking observation: "The TileBars interface allows the user to make informed decisions about which documents and passages to view, based on the distributional behavior of the query terms in the documents. The goal is to simultaneously and compactly indicate (i) the relative length of the document, (ii) the frequency of the term sets in the document, and (iii) the distribution of the term sets with respect to the document and to one another." TileBars are displayed in a result list, one for each document retrieved.

Helping users make informed decisions about which documents to view is indeed important; so is helping them make informed decisions about which passages to view. But these are essentially independent questions. If you are going to show where terms are in a document and your visualization is as compact as TileBars are, you can certainly do it in a result list, and that way the user gets help with both types of decisions at the same time. But we feel that seeing term locations in an overview is not that helpful; we will return to this point later. If, on the other hand, you are going to show where terms are within each individual document, there’s already a place to do it: in the scrollbar.

Scrollbars are of course implemented in the standard user-interface toolkits for nearly all modern operating systems: see for example Apple (1992), or Microsoft (1995). Scrollbars are nearly always used to visualize and control the portion of a document that is displayed in an adjacent and much larger area. When they are used in this way, they are without exception filled with a neutral pattern that conveys no information about the document’s content. However, we know of several systems that display an overview of a document’s content in a small greatly-elongated window that functions somewhat like a scrollbar in terms of both what it shows and how it is used.

The first is described by Ball and Eick (1996): see the smaller window in their Fig. 3 and the comments on it in the text (p. 35). The second is Microsoft’s WinDiff text-file-comparison utility for MS Windows. Besides displaying the exact text in each file in a large window, WinDiff (version 4.0) shows overviews of both files in narrow vertical strips to the left of the window, with colored bars marking differences. Clicking in either strip jumps the text display to that point. But no documentation we know of even mentions the strips.

The navigation aid these two "widgets" provide may be very useful, but overall they are far less powerful than standard scrollbars. Nor do their non-standard appearances make them any easier to learn to use. A third project, however, actually shows document content inside a standard scrollbar, much as we do. This work is described in two U.S. patents (Wroblewski et al 1994, 1995). These patents are the only available description of their authors’ work that we know of, and, curiously, we know of no software that incorporates their ideas. Wroblewski and his colleagues do not fill their scrollbars with a neutral pattern: instead, they display what they call an "enhanced scrollbar", where the enhancements include "maps of significant task-specific attributes of the data file....displayed in the scroll bar field of the display along with the scroll bar".

TileBars, in contrast, are even more remote from standard scrollbars. See for example the Berkeley Digital Library Project TileBar demo, available as of 8 Feb. 1999 at http://galaxy.cs.berkeley.edu/tilebars/ . The view of actual document content does not appear until the user clicks on the TileBar; even then, the view replaces the entire contents of the window, including the TileBar, and it has a conventional scrollbar, which however allows scrolling only within the current segment of the document. So the TileBar widget bears only a casual resemblance to a scrollbar.

Many visualizations that show where features occur within the document are examples of generalized fisheye views (Furnas 1986). Ogden et al’s multiple fisheye view is obvious, but the document lens is also a clear-cut case. It is still less obvious that TileBars or scrollbars that show feature locations have anything to do with fisheye views, but, if one considers space occupied as just one way to display salience, the basic idea is the same. The scrollbar or the entire TileBar is an independent view of the document, with a degree-of-interest function whose value is zero for non-features, and with color or intensity replacing area as the way of displaying salience.
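
For reference, Furnas’s formulation (paraphrased here from Furnas 1986; it is not a formula used by any of the systems discussed) assigns each element x a degree of interest relative to the current focus y of roughly the form

    DOI(x | focus = y) = API(x) - D(x, y)

where API(x) is the a priori importance of x and D(x, y) is its distance from the focus. On this reading, a content-displaying scrollbar or TileBar takes API(x) to be nonzero only for query-term occurrences, shows the whole document regardless of the current focus, and renders the resulting interest as color or intensity rather than as allotted screen area.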

Our Visualization

A typical screen display of our document viewer is shown in Fig. 1. The visualization has the following elements:

• Occurrences of each different word in the query and its variants are highlighted in a different color.

• The vertical scrollbar contains small icons in the same colors: this is the central feature.

• An area at the bottom of the window contains a "legend" relating the words and colors.

(Unfortunately, the black-and-white rendering in the figure loses much of the clarity of the original. On a standard color monitor, it is obvious that the word "smoking" appears six times in the window and the word "government" appears once. From the scrollbar, it is also obvious that the latter is the only recognized variant of "govern" in the entire document.)

The scrollbar icons show where in the document occurrences of the corresponding query words, or variants of them, are. The idea is to help the user find as quickly as possible the parts of the documents that are most likely to be relevant. The icons could be of any size and shape, but we use 3-by-3 pixel squares. The horizontal positions of the icons as well as their vertical positions correspond to the positions of the words in the text area. In effect, the scrollbar contains a miniature view of highlighted words in the entire document.

Note that, despite its unusual appearance, the vertical scrollbar works just like any vertical scrollbar: the top of the scrollbar corresponds to the beginning of the document, and the bottom of the scrollbar corresponds to the end of the document. The icons are simply superimposed on the neutral pattern that normally fills scrollbars. To make the colors as easy to see as possible in at least part of the scrollbar, our "thumb" or "car" is plain white instead of the usual (platform-dependent) color and/or pattern.

This visualization is of course yet another instance of VQR, showing the query in relation to individual retrieved documents. We have previously implemented the term-score-contribution bars form of VQR mentioned above (Shneiderman et al 1997). Now, calling that visualization "VQRa" and the present one "VQRb", it is particularly interesting to compare our work to Hearst’s TileBars. VQRa consists of stacked colored bar segments; the size of each segment represents a term’s contribution to the total belief score. Such a set of bar segments requires very little space, and–as with TileBars–a set is displayed with each document in a result list.

For allowing users to make informed decisions about which documents to view, we believe our VQRa is better than TileBars because it considers term weights, not raw term occurrences, and thereby shows why the documents were retrieved and ranked as they were. For allowing users to make informed decisions about which passages to view, we believe our VQRb is better than TileBars because it shows where terms occur in the text in the best possible way, via the scrollbar, so users can examine documents as efficiently as possible. In fact, VQRb should help the user determine whether the document discusses the desired concepts with far more confidence than either VQRa or TileBars do. If the document really does discuss those concepts, VQRb should also help determine whether it discusses the concepts in relation to each other with at least as much speed and confidence as TileBars, and with much more confidence than VQRa.

We designed the experiment described later to begin shedding light on whether VQRb is actually useful.

Implementation

CIIR’s InQuery retrieval engine is written in C; more recently, CIIR has developed JITRS (for Java InQuery Text Retrieval System), a Java class library that uses the JNI (Java Native Interface) package to let Java programs communicate with InQuery on a client/server basis. We implemented a document viewer incorporating the content-displaying scrollbar in Java, using JITRS for retrieval and the "Swing" package (part of Sun’s Java Foundation Classes) for the user interface. Swing is an object-oriented GUI toolkit, and the ability it offers to override scrollbar methods greatly eased implementation.
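
As a rough illustration of the approach (the paper gives no source code, so the class and field names below are hypothetical, not the actual CIIR implementation), a Swing scrollbar can draw term-occurrence icons by subclassing JScrollBar and overriding its painting method:

    import javax.swing.JScrollBar;
    import java.awt.Color;
    import java.awt.Graphics;
    import java.util.List;

    // One mark per occurrence of a query term: the color assigned to that
    // term and the occurrence's position, normalized to the range 0.0-1.0.
    class TermMark {
        final Color color;       // color assigned to this query term
        final double vertical;   // occurrence's offset within the document
        final double horizontal; // occurrence's indent within its line

        TermMark(Color color, double vertical, double horizontal) {
            this.color = color;
            this.vertical = vertical;
            this.horizontal = horizontal;
        }
    }

    class ContentScrollBar extends JScrollBar {
        private final List<TermMark> marks;

        ContentScrollBar(List<TermMark> marks) {
            super(JScrollBar.VERTICAL);
            this.marks = marks;
        }

        @Override
        protected void paintComponent(Graphics g) {
            super.paintComponent(g); // draw the usual track and thumb first
            for (TermMark m : marks) {
                g.setColor(m.color);
                // 3-by-3 pixel icon, positioned proportionally to the
                // occurrence's place in the document
                int x = (int) (m.horizontal * (getWidth() - 3));
                int y = (int) (m.vertical * (getHeight() - 3));
                g.fillRect(x, y, 3, 3);
            }
        }
    }

In the actual viewer, the marks would come from the same pass that highlights query-term occurrences in the text area; the sketch covers only the painting side and omits details such as the plain-white thumb described above.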

The Experiment

We compared an experimental system incorporating our full visualization to a control system with no visualization except highlighting of query words in the text in a single color. To minimize irrelevant differences between the two systems, the code for the control system’s scrollbar was identical to that for the experimental system’s, except that the control system skipped drawing the icons.

Participants

There were six participants, four male and two female, all college students. All were adult native speakers of English, at most 30 years old, with at least some experience with computers, and with normal color perception. All had experience with online searching (averaging over three years), but none had professional training or professional experience as a searcher.

Tasks

The study was modeled to a considerable extent after the TREC 6 Interactive track experiment (Byrd et al 1997, Harman 1997). Each participant did the same 10 tasks in the same order; the tasks involved identifying relevant documents in a given database.

Specifically, for each task, we gave the participant a description of an information need, plus–since we were interested only in the document viewer–a fixed query and a fixed number of documents to retrieve. The combination of fixed database, fixed query, and fixed number of documents to retrieve means that, effectively, a result list was predefined for each query. We asked participants to consider each result list and to judge the relevance of as many documents as possible in five minutes.

The number of documents in each result list was 30. Why that number? First, because 30 is generally agreed to be about the largest number of documents that users of best-match interactive IR systems will bother with, at least on the Web. Second, because it is large enough to make the chance of a ceiling effect minimal with only five minutes per search.

Database and Topics. For the usual reasons (so we could use TREC relevance judgments, etc.), we chose to use part of the TREC document collection with information needs from the TREC topic collection. Note that the content-displaying scrollbar is not likely to be of much use with short documents, since a user can browse through such documents very quickly with no more aid than conventional single-color highlighting of query terms. But we wanted to encourage users to rely on our scrollbar icons as much as possible, so we needed long documents. The Federal Register consists of official U.S. government documents. In general, these documents are long; the longest are well over a megabyte. Also, they tend to contain large amounts of bureaucratese and/or trivial details, and they have no titles that a program can recognize as such and display, even though most contain something a human being can recognize as a title. All these factors make Federal Register a very difficult place to find information and a potentially fruitful test collection for a document viewer. For this study, we chose the 1989 Federal Register (FR89), which is one of the TREC Volume 1 document collections. FR89 contains about 26,000 documents whose raw text totals over 260 megabytes.

Queries. Wanting short and unstructured queries, we started with the TREC topic titles, and made minor changes in two cases.

Although FR89 contains many long documents, not all queries retrieve them. We selected queries whose top 30 documents against FR89 had an average length of over 1000 words.

Additional criteria for the queries we chose were as follows (they are collected in the sketch after this list):

• Maximum length of any retrieved document not too high. This is mostly because our document viewer takes quite a while to display a long document. We set a limit of 50,000 words.

• Neither too few nor too many non-stopped terms. If there’s only one term, our multiple-color feature wouldn't be used; if there are too many, distinguishing the colors would be very hard. We deemed 2 through 5 terms to be acceptable.

• Top-30 precision neither too high nor too low, to avoid ceiling and floor effects. Our queries had a minimum of 0.10 and a maximum of almost 0.65.
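
Taken together, the length and query criteria amount to a simple screening predicate. The sketch below collects them in one place; the class and field names are hypothetical, and the precision cutoffs are illustrative only (the paper reports the observed range of 0.10 to almost 0.65, not the exact thresholds used), so it should be read as an illustration rather than the procedure we actually ran.

    import java.util.List;

    // Hypothetical sketch of the query-screening criteria described above.
    class CandidateQuery {
        int nonStoppedTerms;        // query terms remaining after stop-word removal
        List<Integer> top30Lengths; // lengths, in words, of the top 30 documents
        double top30Precision;      // fraction of the top 30 judged relevant

        boolean acceptable() {
            double avgLen = top30Lengths.stream()
                    .mapToInt(Integer::intValue).average().orElse(0);
            int maxLen = top30Lengths.stream()
                    .mapToInt(Integer::intValue).max().orElse(0);
            return avgLen > 1000           // long documents on average
                && maxLen <= 50000         // but none too long for the viewer
                && nonStoppedTerms >= 2    // enough terms to need multiple colors...
                && nonStoppedTerms <= 5    // ...but few enough to distinguish them
                && top30Precision >= 0.10  // avoid floor effects (illustrative bound)
                && top30Precision <= 0.65; // avoid ceiling effects (illustrative bound)
        }
    }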

The queries we ended up with, together with the original TREC topic numbers, are listed in the table below. Note that two of the queries differ slightly from the corresponding topic titles: we omitted a word from the title of topic 182 to reduce the number of terms to five, and we replaced "U.S." with "American" in the title of topic 106 to sidestep a problem with InQuery.

      TREC number   Query (original topic title, if different)

  1.  95            computer-aided crime detection
  2.  106           American control of insider trading (U.S. control of insider trading)
  3.  108           Japanese protectionist measures
  4.  115           impact of the 1986 immigration law
  5.  119           actions against international terrorists
  6.  123           research into & control of carcinogens
  7.  125           anti-smoking actions by government
  8.  174           hazardous waste cleanup
  9.  182           commercial overfishing food fish deficit (commercial overfishing creates food fish deficit)
 10.  188           beachfront erosion

Procedure

We ran the experiment in our usability laboratory on campus. A "facilitator" was in the room with the participant all of the time except while the participant was doing the tutorials. The same person acted as facilitator for all participants.

First, each participant filled out a questionnaire to give us basic demographic information. Then they took a standard psychometric test from ETS (Ekstrom et al 1976), a test of structural visualization (VZ-2, the Paper Folding test): the mean score was 14.8 of a possible 20.

Next, the participant was given a tutorial to learn one system and then worked on the first five topics. After a short break, they were given a tutorial on the other system and then worked on the other five topics. Each search had a 5-minute time limit, and the participant was instructed to stop working if they had not finished in 5 minutes. A countdown timer displayed on-screen ran continuously, even while the user was waiting for the system to show a document: we will discuss the implications of this later.

We gave the participant a short questionnaire after each search. After all the searches were finished we gave them a final questionnaire, then "debriefed" them.

We ran each participant through the entire study in a single essentially continuous period of about two and a half hours. Half did the first five searches with the experimental system, and the other half did the first five with the control system: thus, there were two conditions. (We considered randomizing the order in which participants were given the searches, to minimize order effects. However, this would introduce significant complications, especially since we did not want participants to switch systems repeatedly, and we decided–as the TREC 6 designers did–that the benefit did not justify the added complexity.) With six participants, this design gives 6 x 5 = 30 data points per cell, enough for a meaningful analysis of variance.

Results

We analyzed the participants’ relevance judgments by comparing them to the official TREC judgments. We then performed an ANOVA (ANalysis Of VAriance) using query, participant, and system as factors. For dependent variables, we used

- Number of documents judged

- Number of documents correctly judged

- Accuracy

The effects of query and of participant were significant. However, we found no system-dependent effects: the differences between the experimental system and the control system were no larger than would be expected by chance. We did observe a slight increase in accuracy with the experimental system, but not enough to be statistically significant.

We also asked participants whether they preferred FancyV (the full visualization) or SimpleV (the very limited one), and how strong their preference was on a five-point scale ("not at all" to "extremely"). Combining the two questions gives nine possible values. Using -4 = extremely strong preference for SimpleV, 0 = no preference, and +4 = extremely strong preference for FancyV, we got one -2, two +3's, and three +4's, for a mean of 2.67: a fairly strong preference for FancyV.
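
For clarity, the reported mean follows directly from the six responses:

    (-2 + 3 + 3 + 4 + 4 + 4) / 6 = 16 / 6 ≈ 2.67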

Two participants who preferred FancyV commented that, while the visualization wasn’t always useful, it didn’t get in the way when it wasn’t.

Discussion

It is not surprising that we found no system effects of statistical significance: six participants is a very small number. In addition, there were some problems with our design and/or implementation:

• We carefully chose queries with acceptable top-30 precision. Unfortunately, for at least one query, all relevant documents were close enough to the bottom of the 30 that none of the participants ever looked at a relevant document.

• Once started, the countdown timer ran continuously, even while the participant was waiting for the system to show a document they had requested. This was a serious problem because the system’s performance with long documents left much to be desired, so that participants spent a significant part of the time–often over a minute of the five available–doing nothing.

Finally, it is quite possible that the visualization simply has too long a learning curve to see any effect in at most 25 minutes of real use after a short training period.

On the other hand, the strong preference participants had for the visualization is very encouraging; user satisfaction is important independent of any objective criteria.

Conclusions

At the beginning of this paper, we distinguished two types of searchers, end users and experts. There is reason to believe that optimal visualizations for the content of retrieved text documents will make life noticeably easier for searchers of both kinds. In this initial study, we tested only ordinary users; in a follow-up study, we expect to test both types, as we did for TREC 6 (Byrd et al 1997).

Other interfaces that might be interesting to study include one intermediate to the two discussed here (no scrollbar icons, but query terms still highlighted in the text in different colors) and one a step "past" the control system (no visualization at all).

Like many visualizations, ours does not scale well in all respects. In particular, as we have mentioned, it is difficult to distinguish more than about five colors. This could be alleviated by using larger icons, though of course there are drawbacks to that. Another solution, and one commonly used in situations like this, is to cluster the query terms, either manually (as with TileBars) or automatically.

Finally, note that displaying in scrollbars indications of the locations of interesting features is in no way limited to text. Nor is it limited to showing the results of searches: an outline or HTML editor could display icons at the positions of important hierarchic levels. All that is required is that the system be able to identify interesting features. Non-icon-based displays could be useful in such applications as signal-processing programs. We believe that displaying indications of document content in scrollbars in whatever form has great potential to make programs of many types easier to use.

References

Apple Computer, Inc. (1992). Macintosh Human Interface Guidelines. Reading, Massachusetts: Addison-Wesley.

Ball, Thomas A., and Stephen G. Eick (1996). Software visualization in the large. IEEE Computer 29, 4 (April 1996), pp. 33-43.

Boguraev, Branimir, Christopher Kennedy, Rachel Bellamy, Sascha Brawer, Yin Yin Wong, and Jason Swartz (1998). Dynamic Presentation of Document Content for Rapid On-Line Skimming. Spring Symposium, AAAI.

Byrd, Donald, Russell Swan, and James Allan (1997). TREC-6 Interactive Track Report, Part I: Experimental Procedure and Initial Results. Tech Report IR-117, University of Massachusetts Computer Science Dept.

CACM (1995). Special issue on digital libraries. CACM 38,4.

Doan, Khoa, Catherine Plaisant, and Ben Shneiderman (1996). Query Previews in Networked Information Systems. Proc. Third Forum on Research and Technology Advances in Digital Libraries, ADL ’96. IEEE CS Press, 120—129. Also available as TR 95-16 at http://www.cs.umd.edu/projects/hcil/Research/tech-report-list.html#1996.

Ekstrom, R.B., J. W. French, H. H. Harman, and D. Dermen. (1976). Manual for Kit of Factor-Referenced Cognitive Tests 1976. Princeton, New Jersey: Educational Testing Service.

Furnas, George W. (1986). Generalized Fisheye Views. Proceedings of SIGCHI ’86, pp. 16—23.

Harman, Donna (1992). User-Friendly Systems Instead of User-Friendly Front Ends. JASIS 43, pp. 164—174.

Harman, Donna (1997). The Fifth Text REtrieval Conference. Gaithersburg, MD: NIST.

Hearst, M.A. (1995). TileBars: Visualization of term distribution information in full text information access. Proc. ACM SIGCHI Conference on Human Factors in Computing Systems.

Hemmje, M., Kunkel, C. and Willett, A. (1994). LyberWorld - A Visualization User Interface Supporting Fulltext Retrieval. Proceedings of the 17th Annual Int. Conference on Research and Development in Information Retrieval (SIGIR ’94, Dublin, 3-6 July 1994), pp. 249—257.

Microsoft Corporation (1995). The Windows Interface Guidelines for Software Design. Microsoft Press.

Ogden, William, Mark Davis, and Sean Rice (1999). Document thumbnail visualizations for rapid relevance judgments: When do they pay off? To appear in Proceedings of TREC-7.

Rao, Ramana, Jan Pedersen, Marti Hearst, Jock Mackinlay, Stuart Card, Larry Masinter, Per-Kristian Halvorsen, and George Robertson (1995). Rich Interaction in the Digital Library. CACM 38, 4, pp. 29—39.

Shneiderman, Ben, Donald Byrd, and W.B. Croft (1997). Clarifying Search: A User-Interface Framework for Text Searches. D-Lib Magazine, January 1997. Available at http://www.dlib.org.

Shneiderman, Ben, Donald Byrd, and W.B. Croft (1998). Sorting out Searching: A User-Interface Framework for Text Searches. CACM 41,4 (April 1998), pp. 95—98.

Tufte, Edward (1983). The Visual Display of Quantitative Information. Cheshire, Connecticut: Graphics Press.

Veerasamy, Aravindan, and Nicholas J Belkin. (1996). Evaluation of a Tool for Visualization of Information Retrieval Results. Proceedings of SIGIR ’96, pp. 85—92.

Wroblewski, David A., William C. Hill, and Timothy P. McCandless (1994). Computer display unit with attribute enhanced scroll bar, U.S. patent number 5339391.

Wroblewski, David A., William C. Hill, and Timothy P. McCandless (1995). Attribute-enhanced scroll bar system and method, U.S. patent number 5479600.

Fig. 1. A typical screen display of our document viewer.