2013 Stéfan Sinclair, Stan Ruecker, and Milena Radzikowska, “Information Visualization for Humanities Scholars”
Information Visualization for Humanities Scholars
Stéfan Sinclair, Stan Ruecker, and Milena Radzikowska
Information visualization for humanities scholars needs to accommodate a mix of evidence and argumentation. The humanities approach consists not of converging toward a single interpretation that cannot be challenged but rather of examining the objects of study from as many reasonable and original perspectives as possible to develop convincing interpretations (for a fuller argumentation of this approach in a digital context, see Drucker). In this sense, we can evaluate a visualization system by determining how well it supports this interpretive activity: a visualization that produces a single output for a given body of material is of limited usefulness; a visualization that provides many ways to interact with the data, viewed from different perspectives, is better; a visualization that contributes to new and emergent ways of understanding the material is best.
In this context, there is an important difference between static and interactive visualizations. A static visualization aims to produce a single perspective on available information. Conventional pie charts, bar charts, and graphs are good examples: these eighteenth-century inventions of William Playfair provide the reader with useful ways to understand information (Tufte), but they are fundamentally tools for display (nevertheless subject to a variety of interpretations of the visual information). Interactive visualizations, on the other hand, aim to explore available information, often as part of a process that is both sequential and iterative. That is, some steps come before others, but the researcher may revisit previous steps at a later stage and make different choices, informed by the outcomes produced in the interim. A pie chart, by contrast, is a static, synchronic object: the visual subdivision of the whole into parts can be useful, but the format does not readily lend itself to experimentation.
The Web site Many Eyes, designed “for shared visualization and discovery,” includes a wide range of information-visualization interfaces that work with user-provided data (see Danis, Viegas, Wattenberg, and Kriss). On the single-output side are graphics like the word cloud and the more compact and visually striking Wordle, originally designed and developed by Jonathan Feinberg, a cluster of words in which the size of each word corresponds to its relative frequency in a document (fig. 1). Although Wordles are sometimes more decorative than functional, even according to the Wordle site (see also Harris’s criticism of word clouds), some people use them to produce eye-catching summary images for documents or collections of documents (such as conference proceedings), and others use them to consider possible keywords for documents. Although each Wordle is a static visualization, the tool allows users to modify the layout and styling parameters in ways that can draw attention to different features or produce different aesthetic effects. For fiction, they often show the prevalence of proper nouns, which typically need to be removed along with function words (prepositions, articles, and conjunctions) to increase the value of the image (depending on one’s purposes). Wordles (and Wordle-inspired visualizations) are noteworthy for how widely they are used, especially in popular contexts such as advertising (see Smith on how word clouds can quickly convey the differences in vocabulary in toy advertising for girls and for boys).
Figure 1. The Many Eyes Wordle shows word frequency by word size. This version shows the text of a draft of this article, with function words omitted.
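The sizing rule behind a word cloud can be sketched in a few lines. This is a minimal illustration, not the code used by Wordle or Many Eyes: the stop list `FUNCTION_WORDS`, the function name `word_sizes`, and the linear scaling between `min_pt` and `max_pt` are all assumptions made for the example.

```python
import re
from collections import Counter

# A tiny, illustrative stop list (assumption); real tools use much longer ones.
FUNCTION_WORDS = {
    "the", "a", "an", "and", "or", "but", "of", "in", "on", "to",
    "for", "with", "by", "at", "is", "are", "was", "it", "that", "this",
}


def word_sizes(text, min_pt=10, max_pt=72, top_n=50):
    """Map the most frequent content words to font sizes in points."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in FUNCTION_WORDS)
    top = counts.most_common(top_n)
    if not top:
        return {}
    highest = top[0][1]
    # Linear scaling: the most frequent word is drawn at max_pt.
    return {w: min_pt + (max_pt - min_pt) * c / highest for w, c in top}


if __name__ == "__main__":
    for word, size in word_sizes("The cat and the hat and the cat").items():
        print(f"{word}: {size:.0f} pt")
```

Removing proper nouns, as suggested above for fiction, would amount to adding them to the stop list before counting.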
The Wordle tool provides a very convenient and fast way of generating a view of a document; it can answer a question like, “What words appear most frequently in this text?” Although it is possible to compare two or more Wordles to get a sense of how similar or different documents are in terms of their most commonly occurring words, the interface is not conducive to document comparison. The Bubblelines interface in Voyant Tools, developed by Stéfan Sinclair and Geoffrey Rockwell, is an example of a visualization tool that is better suited to document comparison. Each document is represented as a separate line, and the interface allows the user to experiment with different search terms to compare across each document (fig. 2).
Figure 2. The Bubblelines interface in Voyant Tools shows the distribution of user-selected terms within multiple documents. This example compares the inaugural addresses given by George W. Bush in 2005 and by Barack Obama in 2009. We see here that Bush makes more frequent use of freedom, Obama of new. Perhaps surprisingly, the term hope appears more in Bush’s speech than in Obama’s.
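The data behind a display of this kind can be approximated by counting how often a term occurs in successive segments of each document. The sketch below is an assumption about one reasonable way to compute such a distribution, not a description of how Voyant Tools does it; the function name `term_distribution` and the `bins` parameter are hypothetical.

```python
import re


def term_distribution(text, term, bins=10):
    """Count occurrences of `term` in each of `bins` equal segments of the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = [0] * bins
    for i, word in enumerate(words):
        if word == term.lower():
            # Integer arithmetic maps word index i to a segment in [0, bins).
            counts[i * bins // len(words)] += 1
    return counts


if __name__ == "__main__":
    # Toy documents (invented for the example), one output line per document.
    documents = {
        "doc one": "hope a b c d e f g h hope",
        "doc two": "a b c d hope hope e f g h",
    }
    for name, text in documents.items():
        print(name, term_distribution(text, "hope", bins=5))
```

Each list could then be drawn as one line, with a bubble per segment whose radius reflects the count.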
The New York Times has published a collection of specialized interactive visualization interfaces that are designed to allow users to compare terms in several documents (they are specialized because they do not allow users to provide texts). Like the Wordles, these are noteworthy exemplars of widely seen and used visualization interfaces. “Inaugural Words: 1789 to the Present” provides word clouds for inaugural speeches, and “The State of the Union Address” compares the words of George W. Bush across seven years of speeches. One of the more interactive visualizations in Many Eyes is the word tree, which allows people to study text sequences (phrases) by navigating a tree with phrase-frequency information indicated by type size (fig. 3).
Figure 3. The Many Eyes word tree combines a concordance with a word cloud. The user selects words as the starting points or end points in the concordance. This example shows the text of this essay and a search for the word can, which appears twenty-eight times.
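The structure underlying a word tree can be illustrated as a nested dictionary in which each node records how often a continuation occurs after the chosen starting word. This is a simplified sketch under stated assumptions: the function `build_word_tree` and its `depth` parameter are hypothetical, sentence boundaries are ignored, and only continuations after the root (not phrases leading up to it) are collected.

```python
import re


def build_word_tree(text, root, depth=3):
    """Nested dict: each node holds a count and the words that follow it."""
    words = re.findall(r"[a-z']+", text.lower())
    tree = {"count": 0, "next": {}}
    for i, word in enumerate(words):
        if word != root:
            continue
        tree["count"] += 1
        node = tree
        # Walk down the tree along the words that follow this occurrence.
        for follower in words[i + 1 : i + 1 + depth]:
            node = node["next"].setdefault(follower, {"count": 0, "next": {}})
            node["count"] += 1
    return tree


if __name__ == "__main__":
    tree = build_word_tree("we can see and we can read and you can see", "can")
    for follower, node in tree["next"].items():
        print(f"can {follower}: {node['count']}")
```

In a display, the count at each node would determine the type size of the corresponding word.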
In proliferating the perspectives on data, visualizations can be useful to humanities scholars by providing additional insight into small amounts of text or data, thus supporting what John Unsworth calls “scholarly primitives,” especially for showing patterns that result from filtering, sorting, grouping, and otherwise visually rearranging the material. Visualizations can provide similar insights across large amounts of information that would otherwise be too abundant to grasp or process (e.g., the Google Books Ngram Viewer, based on over 5 million books as of December 2010; see also Michel et al.). Visualizations can also produce comparisons between a pair of documents or of one target document against many other documents.
In the context of the humanities, the availability of data to be visualized has been steadily increasing with the proliferation of digital collections (see Greenstein). In addition, relatively sophisticated tools are being developed for working with the various digital materials (see, e.g., the DiRT list). Some of the tools perform tasks behind the scenes, such as cleaning, searching, or sorting. Others consist of visualization components in the interface, which make the results from the analytic operations accessible in meaningful ways.
Taken individually, many of the available tools for digital humanities scholarship seem relatively simple (e.g., TAPoRware), intended to perform a single function well (as per the Unix model of piping data through a variety of tools), and the visualization component may either supplement that function or make it possible (the TAPoR recipe “Compile Textual Data and Visualize the Results Using Excel” describes such a process).
Many of the simpler tools feature an open-ended search mechanism (similar to a search engine like Google). When browsing a collection of items like a library database, for instance, the most common strategy for finding out what is in a collection is to do a series of keyword searches and look at the search results. This approach is most effective when the goal is a single document for which the searcher knows the title, author, or some other identifying piece of information (see Seal, Girdlestone, and Warwick).
In the humanities, however, an equally important task is to locate (or discover) new material, with no prior knowledge of the kinds of details used for retrieval. Using a search-based system, the task becomes a guessing game, where the searcher begins with some likely candidates for search terms and then reviews the results and tries again until some relatively sufficient set of items is assembled for further study. Adding visualizations to the browsing task can supplement this function in several ways, giving the searcher the option to assemble groups of documents visually instead of working sequentially through a list. These kinds of interfaces are often referred to as exploratory interfaces (see Shen et al.).
Browsing by Grouping
The literature on human-computer interaction provides a wide range of experimental systems that enable grouping by browsing. The scatter/gather browser developed by Peter Pirolli and his colleagues is a good example; it uses dots to represent documents for users to collect and disperse (Pirolli, Schank, Hearst, and Diehl). Ben B. Bederson’s PhotoMesa similarly allows the user to interact with a page display, as seen in the thumbnails of digital images in figure 4, visually grouping them by keywords. Kerry Rodden, Wojciech Basalaj, David Sinclair, and Kenneth Wood address the question of how similarity among the grouped images supports browsing tasks. In 2007, Jonathan Harris and Sepandar Kamvar produced a suite of dynamic displays involving posts about emotions scraped from blogs and visualized as colored dots arranged, in one of the displays, by attraction to the current cursor location.
Figure 4. Bederson’s PhotoMesa provides functionality similar to Pirolli et al.’s scatter/gather browser but uses thumbnail images to represent items.
We have developed two experimental interfaces that deal with browsing collections or documents through visual grouping. The first is the Mandala browser (fig. 5), which allows researchers to open a document or multiple documents at the same time and iteratively construct visual Boolean queries that draw on the underlying data (such as the XML-encoded version of Romeo and Juliet in fig. 5). The user may perform conventional searches, but there is also a variety of automatic and semiautomatic mechanisms for exploring the data.
Figure 5. The Mandala browser, originally developed by Oksana Cheypesh, Constanza Pacher, Sandra Gabriele, Stéfan Sinclair, and Stan Ruecker, is a universal space for researchers investigating structured text collections or individual text documents encoded with XML (although additional formats are supported, such as PDF, plain text, and comma-separated values). This screenshot shows the browser loaded with Romeo and Juliet, where each dot represents a speech in the play. Multiple search terms can be defined that attract the items (speeches) toward the label (dots on the periphery represent speeches that do not match any queries). In this example, the user has defined several magnets to identify different speakers (Romeo, Juliet, Nurse, Mercutio), as well as several magnets to match terms that indicate family relations in the text (mother, father, sister, brother, cousin). Dots that are displayed close to the magnet represent speeches that only match that query (e.g., speeches by Romeo or speeches that only contain mother), whereas dots between magnets match more than one query (e.g., speeches by Romeo that contain mother).
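The behavior described in this caption, where a dot sits beside one magnet when it matches a single query and between magnets when it matches several, can be imitated with a very simple placement rule. The sketch below is not Mandala's layout algorithm, which is not specified here; it assumes magnets spaced evenly on a circle and places each item at the mean position of the magnets it matches. The function names, the dictionary-based speech records, and the sample speeches are all invented for illustration.

```python
import math


def magnet_positions(labels, radius=1.0):
    """Place magnets evenly on a circle (an assumed layout)."""
    n = len(labels)
    return {
        label: (radius * math.cos(2 * math.pi * k / n),
                radius * math.sin(2 * math.pi * k / n))
        for k, label in enumerate(labels)
    }


def place_item(item, magnets, positions):
    """Return (position, matched): the mean position of all matching magnets.

    Position is None when no query matches, standing in for the periphery.
    """
    matched = [name for name, predicate in magnets.items() if predicate(item)]
    if not matched:
        return None, matched
    x = sum(positions[m][0] for m in matched) / len(matched)
    y = sum(positions[m][1] for m in matched) / len(matched)
    return (x, y), matched


if __name__ == "__main__":
    # Each magnet is a named Boolean test over a speech record.
    magnets = {
        "Romeo": lambda s: s["speaker"] == "Romeo",
        "mother": lambda s: "mother" in s["text"].lower().split(),
    }
    positions = magnet_positions(list(magnets))
    speeches = [  # placeholder text, not quotations from the play
        {"speaker": "Romeo", "text": "placeholder line about my mother"},
        {"speaker": "Romeo", "text": "placeholder line"},
        {"speaker": "Nurse", "text": "placeholder line"},
    ]
    for speech in speeches:
        print(speech["speaker"], place_item(speech, magnets, positions))
```

Adding or removing a magnet only changes the `magnets` dictionary, which mirrors the step-by-step construction of a query.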
For example, someone might be interested in studying a selection of plays by Shakespeare and looking at invocations of the natural elements to see how they are used to help define supernatural figures. The hypothesis might be that Lear on the heath is transformed not just into a madman but also into a supernatural creature. The researcher could iteratively construct a query space that first asks for all the speeches by selected characters, such as Puck, Ariel, the three weird sisters, and Lear. Then key vocabulary could be added to the query, with terms such as wind, water, and air. The point would be to collect the set of speeches from a specific set of characters that could serve as a baseline for the comparison. It might turn out that there is very little actual invocation of the elements, even by the acknowledged supernatural characters, which would invalidate one of the felicity conditions of this line of thought, and the researcher could stop there. Alternatively, the subsets that appear might prove fruitful, and the study could extend to other figures with a high degree of agency, such as Prospero in The Tempest or the Duke in Measure for Measure.
The significance of the Mandala browser is that the queries are built up in single steps, with aggregated visual feedback at each step, and the sets and subsets of results are usually easy to see and understand. It is possible to carry out this kind of inquiry using digital copies of all the plays, a word processor, the search function, and a fairly significant amount of cutting and pasting (Brown et al.). In Mandala, however, the query space consists of one small dot for each speech and one colored magnet for each part of the query, making the entire investigation possible in a single screen that changes dynamically as the user progresses through the different stages. The Mandala browser provides the crucial mechanism for the user to read text represented by the items. Our various user studies with humanists have repeatedly indicated that researchers want to be able to return to the text. Mandala is designed to be open-ended (fed by user-provided data), but we have experimented in particular with Shakespeare’s plays, the Orlando project archive, the early modern English witchcraft trial documents from Chadwyck-Healey, the course descriptions from the 2008 calendar at the University of Alberta, and a variety of interview transcripts.
The second of our experimental interfaces for collection browsing through visual grouping is a family of transferred interfaces that we call the showcase browsers (fig. 6; see Chow and Ruecker). Like Mandala, each showcase browser is based on rich-prospect browsing principles; that is, it shows on the default screen one meaningful representation of every item in the collection, combined with tools for visually grouping the items.
Figure 6. A screenshot of a showcase browser of biodiversity projects in Edmonton. The projects can be grouped and subgrouped by any combination of the criteria shown on the buttons on the left.
Our experiments with the showcase family of browsers have included browsers for the following items: pill identification, conference delegates, historical buildings, crayon drawings, research faculty, and wasp wing features. There are some particular requirements for the information that is useful for this kind of browser. There should be a single image that can be used to represent every item in the collection. The metadata should consist of faceted items, where each collection item has a single value for each facet, although it is possible to construct sets of items that meet multiple matches. Eye color was used for the conference delegate browser, for example, since most people have a single eye color. In the rare cases of people who have one blue eye and one green one, we have to address the design question, Should these people have their images multiplied when necessary, so that they go into both groups, or should they form a new, third group? We typically opt for the creation of a new group, which happens automatically, since the system constructs the groups based on the available metadata. If there was an error in the metadata, for instance, and someone had an eye color of “square,” then an eye color group with the group heading “square” would appear with one item in it (we have repeatedly found that visualization interfaces can also serve as useful checks for the integrity of data).
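The automatic construction of groups from metadata, including the unexpected “square” group, can be illustrated with a short sketch. The record layout, the facet name `eye_color`, and the function `group_by_facet` are assumptions made for the example rather than details of the showcase browsers.

```python
from collections import defaultdict


def group_by_facet(items, facet):
    """Group item names by whatever values the facet takes in the metadata."""
    groups = defaultdict(list)
    for item in items:
        # Groups are never declared in advance: each distinct value creates one.
        groups[item.get(facet, "(missing)")].append(item["name"])
    return dict(groups)


if __name__ == "__main__":
    delegates = [  # invented records
        {"name": "A", "eye_color": "blue"},
        {"name": "B", "eye_color": "brown"},
        {"name": "C", "eye_color": "blue"},
        {"name": "D", "eye_color": "square"},          # a data-entry error
        {"name": "E", "eye_color": "blue and green"},  # forms its own group
    ]
    for value, names in group_by_facet(delegates, "eye_color").items():
        print(f"{value}: {names}")
```

Because the headings come straight from the data, a one-item group with an implausible heading is immediately visible, which is how such an interface doubles as an integrity check.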
Although we originally conceived of the showcase browser as a browser for images and their associated metadata, we have subsequently developed a version where the photos of pills or people or buildings have been replaced with tiles containing text, something like the statistics on the reverse side of sports trading cards. In this case, the tiles show the bibliographic data for the items in a text collection. Our current design, as seen in figure 7, is dynamic and allows the researcher to add or subtract metadata from the tiles, so that one researcher might decide to use tiles that show author name, title of work, and date of publication, whereas another might omit authors and dates and show instead the genre and word count (persistence of data and user-rights management can be challenging aspects of some visualization interfaces, especially Web-based ones). The number of kinds of metadata is determined by what is available in the archive.
Figure 7. This image shows a screenshot of a digital image browser for text collections (designer Ian Craig; programmer Alejandro Giacometti).
Using this kind of faceted browser, the display of the complete set of items is successively reduced as the user chooses how to group the items and then selects the group of interest. Typically, instead of removing unselected items entirely, we create a subset of them at a much smaller scale, located at the bottom of the screen (the trail of remaining smaller items is an important representation of intellectual work that has been accomplished, as argued by Vannevar Bush in 1945 in his description of the links followed in the Memex machine, a precursor to the Web). The experience is one of visually sifting from a large body of data to a smaller one until the remaining items represent the subcollection of interest.
The showcase design would not facilitate the kind of task described for Mandala, where the goal was to find the speeches in several plays that might support the hypothesis that Lear is transformed on the heath into a supernatural figure. An appropriate research task for the showcase design would be the creation of a work set consisting of a smaller number of items from a large collection, in which the bibliographic characteristics of the items are the distinguishing features. Someone interested in sifting through a collection for plays by women authors in the eighteenth century, for example, could locate this subset by subdividing the collection by genre, then selecting the group containing plays. This group could be subdivided by the sex of the authors; then, the user could choose the subset by women. Finally, the user could divide the plays by women according to date of first publication or performance, depending on what kinds of dates the archive made available. Choosing those plays from the eighteenth century would complete the process.
This example described steps that proceeded from genre to sex to date, but since the data is faceted, the user is free to carry out the steps in any order. At each intermediate stage, the resulting subsets will be significantly different (and potentially generative of different insights), depending on the sequence the user follows. From the perspective of the interface designer, providing this variety of choices means that it is not necessary to predict what the most likely sequence will be. The user is not navigating a hierarchy of information in any conventional sense but is instead working through a series of faceted subdivisions that are dynamically grouped at each step. The interface is enabling iterative steps that can fit into a fluid interpretive process.
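The point about ordering can be made concrete with a small sketch: applying the same facet selections in different sequences yields the same final work set but different intermediate subsets. The collection below is invented, and the facet names and the function `sift` are hypothetical.

```python
def sift(items, steps):
    """Apply (facet, value) selections in order; return the subset after each step."""
    trail = []
    current = list(items)
    for facet, value in steps:
        current = [item for item in current if item.get(facet) == value]
        trail.append(current)
    return trail


def titles(subset):
    return [item["title"] for item in subset]


if __name__ == "__main__":
    works = [  # invented records standing in for a bibliographic collection
        {"title": "Work A", "genre": "play", "author_sex": "female", "century": 18},
        {"title": "Work B", "genre": "play", "author_sex": "male", "century": 18},
        {"title": "Work C", "genre": "novel", "author_sex": "female", "century": 18},
        {"title": "Work D", "genre": "play", "author_sex": "female", "century": 19},
    ]
    genre_first = [("genre", "play"), ("author_sex", "female"), ("century", 18)]
    date_first = list(reversed(genre_first))
    for order in (genre_first, date_first):
        print([titles(subset) for subset in sift(works, order)])
```

The first intermediate subset is all plays in one run and all eighteenth-century works in the other, so the user encounters different neighbors of the final set depending on the path taken.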
The experience of navigating a collection using a showcase browser is typically somewhat different from the experience of navigating a collection using a menu hierarchy. Both systems work by successively subdividing the collection, but in a showcase browser the order of the subdivision is chosen by the user. It is therefore optimal to use a showcase browser for collections and their associated metadata where there is a one-to-one relationship between the subcategories in the metadata and the individual collection items. Most literary works, for instance, can be categorized under a single genre: Middlemarch is a novel. However, if a work has more than one genre, then it is necessary to duplicate that work so that it will appear under both genre categories (or create new, hybrid categories): Oklahoma! is both a musical and a western. We see how decisions by designers of a tool can have a critical impact on ontological and epistemological aspects of research.
Revealing Features
Browsing a collection can be a fascinating and useful scholarly activity, but it is typically only one early step in a longer scholarly process of reading and interpretation. Once the user has settled on relevant results after having done a retrieval task or browsed a collection, what remains is a list of works. A typical approach to this list is to begin reading at the top and continue through to the end, perhaps making notes along the way and beginning the process of interpretation that may lead to a research result. Designers of humanities visualizations, however, hope to produce systems that can also assist in these subsequent steps. We already have some evidence that visualizations of this kind have a role to play (Rockwell and Bradley).
Grouping tools like the Mandala browser can also be used in this context, since one of the useful mental processes in interpretation is to group similar things together and see what kinds of patterns, if any, emerge from the groupings. The activity is usually iterative, leading to many dead ends before something worth pursuing further begins to emerge.
The Mandala browser, however, is suitable only for some of these explorations, since it proceeds by small increments defined by the researcher. Someone looking for material enclosed in a particular XML tag and combining it with a search term can benefit from the visualization of Mandala. But Mandala is not currently designed to go beyond Boolean combinations of specific tags and specific search strings. The system cannot produce a set of magnets, for example, that represent a semantic class rather than a single word. This situation is common to many tool sets: they enable some processes but not others, and often it is best to think of how to use several tool sets in combination.
The researcher who is interested in seeing all the speeches in Shakespeare that mention the natural elements either has to be lucky enough to find them grouped by the collection developers into one tag or has to list them individually. If the terms are used to define individual magnets (it is possible to define magnets that correspond to one of several terms), the display quickly becomes overwhelmingly complex, and there is always the chance that some key elements might be missed.
Other systems circumvent this problem. Two strategies in particular hold significant promise: clustering and classification. We experimented extensively with both in the MONK project (Metadata Offer New Knowledge). Clustering requires some fairly sophisticated logic in the system, allowing the reader to identify a set of texts and then to have the interface show how the texts (in variable units, such as paragraphs) group together into automatically identified clusters. There are dozens of available algorithms for performing this kind of clustering, which emphasize different features of the works being clustered. Mallet, developed by Andrew Kachites McCallum, is a widely used topic-modeling tool, although it is less often used in the digital humanities, partly because it is neither especially user-friendly nor designed for humanistic inquiry. As Martin Mueller, Jean-Frédéric de Pasquale and Jean-Guy Meunier, and others have pointed out, for a system to be useful to literary scholars, the clustering must involve more than just an automatic identification of topics, because, in literary studies, how something is said is often as significant as what is said.
In the classification tool set (fig. 8), MONK provides an interface that uses a variety of supervised classification algorithms to find similarities between works or parts of works that have been suggested by the user (supervised classification is a process in which the user identifies characteristics of a subset of items and the system uses this training set to classify the remaining items). The result is a tool set (“Search by Example”) with which the user can collect a set of documents that have some common feature of interest, then see how the system views the similar features of those works or parts of works and also what other works in the collection share those features. In one case study, for example, Catherine Plaisant and her colleagues identified a set of the poems by Emily Dickinson that had erotic content (i.e., “hot” versus “not hot”), then used a classification system to see what those poems had in common with one another (e.g., an unusual prevalence of possessive pronouns) and also what other poems the system thought should belong in the set. Other projects are looking at subjects such as the sentimental in novels (Steger), language in literary works versus newspapers (Horton), and reports of curses and spells in reports from early modern witchcraft trials (Uszkalo).
Figure 8. The MONK workbench, designed by Milena Radzikowska and Stan Ruecker and programmed by Amit Kumar, Andrew Macdonald, and Stéfan Sinclair, combines tools into tool sets that allow the user to carry out a variety of multistep research tasks with literary texts. In the classification tool set shown here, the user creates a set of passages representing some phenomenon of interest, and the system suggests similar documents while also making the set of distinguishing features available for examination.
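The general idea of supervised classification, training on a user-labeled subset and then labeling the remaining items, can be shown with a small word-count classifier. The sketch below uses multinomial naive Bayes with add-one smoothing, chosen only because it is compact; it is not claimed to be among the algorithms MONK uses. The labels, the training passages, and the function names are invented for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


def train(labeled_texts):
    """Collect per-label word counts and document counts from (text, label) pairs."""
    word_counts = {}
    doc_counts = Counter()
    for text, label in labeled_texts:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    return word_counts, doc_counts


def classify(text, word_counts, doc_counts):
    """Return the label with the highest log-probability under naive Bayes."""
    vocabulary = set().union(*word_counts.values())
    total_docs = sum(doc_counts.values())
    scores = {}
    for label, counts in word_counts.items():
        total_words = sum(counts.values())
        score = math.log(doc_counts[label] / total_docs)
        for word in tokenize(text):
            # Add-one smoothing keeps unseen words from zeroing the score.
            score += math.log((counts[word] + 1) / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


if __name__ == "__main__":
    training_set = [  # invented passages, not quotations
        ("my heart and my hand are thine", "hot"),
        ("thy lips are mine", "hot"),
        ("the bee visits the clover", "not hot"),
        ("the garden sleeps in the frost", "not hot"),
    ]
    model = train(training_set)
    print(classify("my hand in thine", *model))
```

Inspecting which words have the largest count differences between labels would be one simple way to surface the distinguishing features mentioned in the caption.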
It is worth saying a few words about the tool-set design of MONK. Although individual tools can be helpful, often a scholarly process can take advantage of more than one tool in sequence. Several digital humanities projects have therefore made use of the idea of recipes or tool sets, which guide the researcher through a process where some steps are facilitated by tools and others are manual, and different kinds of data are accessed or produced with each step. Examples of these kinds of projects include TAPoR (Text Analysis Portal for Research), MONK, and JiTR (Just-in-Time Research). Similarly, some interfaces are designed such that multiple modular tools can interact, even if they do not appear in sequential steps; the Voyant Tools interface is an example, as with this corpus from Shakespeare. Both tool sets and modular tools are conducive to iterative, interactive processes that depart from a simple model of data input and output.
Time and Space
Another general class of phenomena deals with patterns involving spatial organization, chronological sequence, or both. Some standard approaches to visualizing this kind of information exist. For spatial data where a map is involved, geographic information systems provide considerable flexibility in plotting data of various kinds (see, e.g., F. Black, MacDonald, and J. Black), and there is a growing interest in areas such as literary cartography. For chronological data, the timeline is a venerable visual format, whether manifested statically or interactively. Examples include the Simile project from MIT, ThemeRiver from Pacific Northwest Laboratories, and the somewhat more unusual SpiraClock from L’Ecole des Mines de Nantes. The Scholars’ Lab has recently released Neatline, an Omeka-based tool that “allows scholars, students, and curators to tell stories with maps and timelines.” The nondeterministic and interpretive aspects of Neatline are essential and make it an excellent example of how tools developed by digital humanists can differ from those developed in more purely scientific disciplines.
Yet not all spatial data can or should be related to geography, and in some cases the sequential data and the spatial arrangement are related to each other. In Watching the Script (fig. 9), for example, a reader can read the speeches of a play and also think about the blocking of a particular theatrical performance (Sinclair, Ruecker, Gabriele, and Sapp). This interface represents characters as colored circles that are positioned on a stylized stage and speeches as scrolling text. Student directors can arrange the blocking, adding annotations about their reasons for putting each character in a given location. Readers can choose which portions of the play or which characters to watch and also control the playback speed.
Figure 9. Watching the Script, designed by Sandra Gabriele and programmed by Stéfan Sinclair and others, draws on XML-encoded plays to allow students, directors, and actors to see a stylized reproduction of a play or screenplay that includes the blocking information.
Recently, we have been working on reimplementing this interface in a prototype called the Simulated Environment for Theatre (SET), which uses a gaming engine to provide a more realistic 3-D experience (fig. 10). Building on the design of previous visualizations for theater, such as Michael Best’s Scenario, SET allows the user to choose the viewpoint for seeing the stage, originating from anywhere in the theater (Roberts-Smith et al., “Visualizing Theatrical Text”). The development of SET was in large part motivated by feedback from domain experts in theater who, while recognizing the value of our initial interface, felt that the potential for the interface was limited from a practitioner’s point of view, especially because of its focus on text rather than on time and space.
Figure 10. The 3-D SET interface provides an alternative that focuses more on the line of action in the play than on the text.
Typographic Form as Interactive Visualization
Although many visualizations introduce additional graphic elements, it is also possible to use only the organization of words to convey meaning. A classic example is W. Bradford Paley’s TextArc, which combines words plotted in sequence around the periphery with the words that appear more than once plotted in the interior of the circle according to their average position, sometimes called a weighted centroid (fig. 11). The TextArc interface is less about allowing the user to generate different visualizations based on specified parameters and more about representing the text in a novel way. TextArc is a time-based visualization in which the system reads the original text and produces a curving line that navigates within the textual space depending on where each word is plotted.
Figure 11. TextArc is a visual collocation tool. This image shows TextArc loaded with Alice’s Adventures in Wonderland.
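The layout rule described above, sequence on the periphery and repeated words at their average position, can be illustrated with a minimal sketch. It is not Paley’s implementation: the function name `textarc_layout`, the unit circle, and the assumption that the text has already been split into words are all illustrative choices.

```python
import math
from collections import defaultdict


def textarc_layout(words, radius=1.0):
    """Sketch of a TextArc-like layout (hypothetical helper, not TextArc's code).

    Every word occurrence is placed in reading order around a circle.
    Each word that occurs more than once also gets an interior position:
    the plain average (centroid) of its peripheral positions.
    """
    n = len(words)
    periphery = []                    # (word, (x, y)) for every occurrence
    occurrences = defaultdict(list)   # word -> list of its peripheral points

    for i, word in enumerate(words):
        angle = 2 * math.pi * i / n   # position in the text mapped to an angle
        point = (radius * math.cos(angle), radius * math.sin(angle))
        periphery.append((word, point))
        occurrences[word].append(point)

    interior = {}
    for word, points in occurrences.items():
        if len(points) > 1:           # only repeated words move inside
            x = sum(p[0] for p in points) / len(points)
            y = sum(p[1] for p in points) / len(points)
            interior[word] = (x, y)

    return periphery, interior
```

In this sketch, a word whose occurrences are spread evenly through the text drifts toward the center of the circle, while a word concentrated in one passage stays close to that stretch of the periphery.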
Among our projects, we have a variety of interfaces where text is privileged as a visual indicator of data, including a series of repetition graphs, designed by Piotr Michura, that show patterns of repetition in the context of the rest of the document (fig. 12). By plotting the words horizontally in sequence but shifting down one row for every repetition, the graph produces a visual thumbnail of a book in which repeated phrases show as sharp, steep curves (Michura, Ruecker, Radzikowska, and Fiorentino). This interface is a radically different view of the text that can help provoke new interpretive insights. The glyphs in figure 12 are actual words, and, although they are illegible at this scale, they convey graphical meaning. The first text is an excerpt from Shakespeare’s Hamlet, and the second is from Gertrude Stein’s Making of Americans; Stein’s propensity for repetition and Shakespeare’s relatively high vocabulary richness are neatly summarized and confirmed by the graphs.
Figure 12. The repetition grid, designed by Piotr Michura, works at two scales. With the text at a legible size, it provides the user with details of reuse of individual words in a document. Zoomed out to a macro scale, it creates a thumbnail sketch of patterns of repetition in a document.
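The plotting rule for the repetition graph can be read in more than one way. The following minimal sketch takes one possible reading: each word advances one column, and the baseline drops one row whenever the current word has already appeared. The function name `repetition_rows` and the case-insensitive matching are assumptions made for illustration, not details of Michura’s design.

```python
def repetition_rows(words):
    """Assign a (column, row) grid position to each word in sequence.

    Assumed reading of the rule: columns follow reading order, and the
    row increases by one every time a word repeats an earlier word.
    """
    seen = set()
    row = 0
    placed = []
    for column, word in enumerate(words):
        key = word.lower()        # assumption: ignore case when matching
        if key in seen:
            row += 1              # a repetition shifts the line down a row
        else:
            seen.add(key)
        placed.append((word, column, row))
    return placed
```

Under this reading, a passage with no repeated words stays on a single flat row, whereas a phrase repeated word for word drops one row per word and so traces a steep diagonal.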
Interactive Glyphs
To this point, our discussion of visualizations for humanities researchers has focused on displays where image and text are treated as primary visual objects. Another set of our experiments is closer to conventional scientific visualization, in that these experiments deal with various visual objects that stand in place of numeric data. These kinds of visualizations allow the user to see the forests that might otherwise appear to be a lot of individual trees.
For the literary scholar interested in studying patterns, there are two related but distinct needs. First, an interesting pattern of some sort must be identified. Stan Ruecker, for example, described how, in reading the letters of Lady Mary Wortley Montagu, he was struck by the repeated introduction of absurd fantasies. Montagu had a wide range of correspondents, and, in her letters to many of the principal ones, she introduced an absurd motif. In more than one instance she proposed leaving society to run off together to a desert island. In writing to her daughter from Italy, she developed at some length the idea that the local people had identified her as a witch, and so maintained that she was considering taking up witchcraft. The presence of these absurd fantasies suggests several questions: What does it mean that this writer chooses to suggest them? What purpose might they serve? What implications do they have for her relationship with her correspondent, and what does her correspondent’s response reveal about the nature of that relationship? Looking further afield, is Montagu the only letter writer of her period who includes absurd fantasies? Do writers from other periods use them too?
The last two questions lead to the second need of the scholar interested in identifying and studying patterns: knowing the scope of a pattern. Is it idiosyncratic to a particular writer, genre, historical period, and so on, or is it a widespread phenomenon? And if it is widespread, is it subject to meaningful local variations? To take another, more general example, many writers employ extended metaphors. In the case of metaphor, the use is general, spread across genres and historical periods. But the localized use of particular metaphors by a given writer can be significant enough to be of scholarly interest, especially when understood against a comparative background.
A visualization intended to show patterns of interest should therefore provide these two features: a way to find possible patterns to investigate and a means to locate those patterns against a baseline provided by a set of other relevant works. What constitutes the set of relevant works will vary from project to project and needs to be selectable by the scholar. For some projects, such as tracing absurd fantasy in Montagu’s letters, it may be important to compare a pattern in one work with any instances of that pattern in other works of the same genre and century. For other projects, it may be more useful to look at a pattern in one work against a comparative background of other works by the same author. Still another project might be interested in identifying patterns across an entire collection, without reference to a particular work or author.
An example of this kind of complex visualization is the MONK project’s lexical-bibliographic glyph, which we placed as the central element in a scrolling list of significant words, lemmas, or parts of speech (fig. 13). The system identifies the list of words and associates them with a background of bibliographic data. As the scholar scrolls through the list, the system produces for each item a circular object, an information glyph. The size of the circle in the middle of the glyph is related to the relative significance of the item and therefore changes from item to item. Plotted in segmented rings around this central circle are the bibliographic data, which break down the collection by work, author, date, genre, and so on, depending on which rings the user thinks might be useful for comparison.
Figure 13. The lexical-bibliographic glyph, designed by Carlos Fiorentino as part of the MONK project, allows the user to simultaneously view combinations of data from the micro level of the word, lemma, or part of speech with the macro level represented by the bibliographic information associated with the works. In this screenshot, the user can see the relative frequency of the selected term “say” in the works of Stein, Blake, and Dickinson and then further explore the associated metadata.
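To make the structure of such a glyph concrete, the following minimal sketch computes the geometry for one item: a central radius scaled by a significance score and, for each bibliographic facet the user selects, a ring divided into arcs. It is not the MONK implementation. The function name `build_glyph`, the record fields, the 0-to-1 significance score, and the choice to size each arc by its share of the item’s occurrences are all assumptions.

```python
from collections import Counter


def build_glyph(item, significance, works, facets, max_radius=40.0):
    """Compute glyph geometry for one word, lemma, or part of speech.

    significance: assumed score between 0 and 1 that sets the central circle.
    works: list of dicts holding bibliographic fields plus a "count" of
           how often the item occurs in that work (hypothetical schema).
    facets: the rings to draw, e.g. ["author", "genre"], chosen by the user.
    """
    total = sum(work["count"] for work in works)
    rings = {}
    for facet in facets:
        # Group the item's occurrences by the value of this facet.
        counts = Counter()
        for work in works:
            counts[work[facet]] += work["count"]

        # Lay the groups out as consecutive arcs, in degrees.
        start = 0.0
        segments = []
        for value, count in counts.items():
            sweep = 360.0 * count / total if total else 0.0
            segments.append((value, start, start + sweep))
            start += sweep
        rings[facet] = segments

    return {
        "item": item,
        "center_radius": max_radius * significance,
        "rings": rings,
    }
```

Calling `build_glyph` once per entry in the scrolling list, with the same `facets`, would yield glyphs whose centers vary from item to item while the rings remain comparable. Changing `facets` corresponds to the scholar choosing a different comparative background.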
The conceptual complexity and analytic potential of a visualization interface such as the lexical-bibliographic glyph come at the cost of accessibility; we are a long way from the relatively intuitive Wordle visualization. We believe that through careful visual design and iterative user testing it is possible to create complex visualization tools that are useful to a wide range of scholars, although any one tool is unlikely to satisfy all types of humanities scholars. Rather, we should strive to create a full ecology of visualization tools that are sensitive to the particularities of humanities research and that offer a range of analytic functionality, from the most basic and recognizable to the most sophisticated and defamiliarizing.
A primary index to the quality of visualizations for humanities scholars is the quality and originality of scholarship that the systems support. In each of the projects mentioned here, we have been working with humanities researchers in an effort to produce a useful visual form of the data. Because humanities scholarship is often exploratory, we have also come to believe that interactive formats are in most cases preferable to static ones: they allow the person using the system to add and subtract elements, experiment with different forms, pursue hunches or insights, and so on. It is therefore important that the expectations of the scholar correspond to the affordances of the visualization. It is equally important for the scholar to know enough about the visualization tools to understand that the interpretive work is being guided and biased by the data and software. Failing that, we need to have methodologies that are sufficiently well tested and understood for scholars to be able to use the tools with confidence. The question remains whether humanistic inquiry lends itself to well-trodden methodologies when originality and idiosyncrasy are the norm.
Works Cited
Bederson, Ben B. “PhotoMesa: A Zoomable Image Browser Using Quantum Treemaps and Bubblemaps.” Proceedings of the Fourteenth Annual ACM Symposium on User Interface Software and Technology. New York: Assn. of Computing Machinery, 2001. 71–80. ACM Digital Library. Web. 6 Sept. 2012.
Black, Fiona A., Bertrum H. MacDonald, and J. Malcolm W. Black. “Geographic Information Systems: A New Research Method for Book History.” Book History 1.1 (1998): 11–31. Print.
Brown, Susan, Stan Ruecker, Milena Radzikowska, Matt Patey, Stéfan Sinclair, Jeffery Antoniuk, Sharon Farnel, and Isobel Grundy. “Visualizing Varieties of Association in Orlando.” Journal of the Chicago Colloquium on Digital Humanities and Computer Science 1.1 (2009): n. pag. Web. 6 Sept. 2012. <https://letterpress.uchicago.edu/index.php/jdhcs/article/view/7/52>.
Bush, Vannevar. “As We May Think.” Atlantic Monthly July 1945: n. pag. Web. 6 Sept. 2012. <http://www.theatlantic.com/magazine/archive/1945/07/as-we-may-think/303881/>.
Chow, Rosan, and Stan Ruecker. “Transferability—A Wonder on the Ground of Design Research.” Proceedings of Wonderground. Instituto de Artes Visuais, Design e Marketing. IADE, 2006. Web. 6 Sept. 2012. <http://www.iade.pt/drs2006/wonderground/proceedings/fullpapers/DRS2006_0318.pdf>.
Danis, C. M., F. B. Viegas, M. Wattenberg, and J. Kriss. “Your Place or Mine? Visualization as a Community Component.” Proceedings of the Twenty-Sixth Annual SIGCHI Conference on Human Factors in Computing Systems. New York: Assn. for Computing Machinery, 2008. 275–84. Hint.fm. Web. 6 Sept. 2012.
Drucker, Johanna. SpecLab: Digital Aesthetics and Projects in Speculative Computing. Chicago: U of Chicago P, 2009. Print.
Greenstein, Daniel. “Digital Libraries and Their Challenges.” Library Trends 49.2 (2000): 290–303. Print.
Harris, Jacob. “Word Clouds Considered Harmful.” Nieman Journalism Lab. Nieman Foundation for Journalism, Harvard U, 13 Oct. 2011. Web. 6 Sept. 2012. <http://www.niemanlab.org/2011/10/word-clouds-considered-harmful/>.
Harris, Jonathan, and Sepandar Kamvar. We Feel Fine: An Exploration of Human Emotion in Six Movements. Harris and Kamvar, 2007. Web. 6 Sept. 2012. <http://wefeelfine.org/>.
Horton, Tom. “Introducing Nora: A Text-Mining Tool for Literary Scholars.” AHRC ICT Methods Network Workshop on Historical Text Mining. Lancaster U, United Kingdom. 20 July 2006. Address.
Michel, Jean-Baptiste, Yuan Kui Shen, Aviva Presser Aiden, Adrian Veres, Matthew K. Gray, the Google Books Team, Joseph P. Pickett, Dale Hoiberg, Dan Clancy, Peter Norvig, Jon Orwant, Steven Pinker, Martin A. Nowak, and Erez Lieberman Aiden. “Quantitative Analysis of Culture Using Millions of Digitized Books.” Science 14 Jan. 2011: 176–82. Print.
Michura, Piotr, Stan Ruecker, Milena Radzikowska, and Carlos Fiorentino. “The Novel as a List of Words.” The Potential and Limitations of a List: An International Transdisciplinary Workshop. Center for Theoretical Study, Charles U and Philosophical Inst. of the Acad. of the Sciences of the Czech Republic, Prague. 8 Nov. 2007. Address.
Mueller, Martin. “Digital Shakespeare, or towards a Literary Informatics.” Shakespeare 4 (2008): 300–17. Print.
Pasquale, Jean-Frédéric de, and Jean-Guy Meunier. “Categorisation Techniques in Computer-Assisted Reading and Analysis of Texts (CARAT) in the Humanities.” Computers and the Humanities 37.1 (2003): 111–18. Print.
Pirolli, Peter, Patricia Schank, Marti Hearst, and Christine Diehl. “Scatter/Gather Browsing Communicates the Topic Structure of a Very Large Text Collection.” Proceedings of the SIGCHI Conference on Human Factors in Computing Systems: Common Ground. New York: Assn. for Computing Machinery, 1996. 213–20. Patricia Schank. Web. 11 Nov. 2012. <http://codeguild.com/patti/downloads/PirolliSchankEtAl-CHI1996.pdf>.
Plaisant, Catherine, James Rose, Bei Yu, Loretta Auvil, Matthew G. Kirschenbaum, Martha Nell Smith, Tanya Clement, and Greg Lord. “Exploring Erotics in Emily Dickinson’s Correspondence with Text Mining and Visual Interfaces.” Proceedings of the 6th ACM/IEEE-CS Joint Conference on Digital Libraries. New York: Assn. for Computing Machinery, 2006. 141–50. Human-Computer Interaction Lab. Web. 11 Nov. 2012. <http://hcil2.cs.umd.edu/trs/2006-01/2006-01.pdf>.
Roberts-Smith, Jennifer, Teresa Dobson, Sandra Gabriele, Stan Ruecker, and Stéfan Sinclair, with Matt Bouchard, Shawn DeSouza-Coelho, Diane Jakacki, Annemarie Kong, David Lam, and Omar Rodriguez-Arenas. “Visualizing Theatrical Text: From Watching the Script to the Simulated Environment for Theatre (SET).” Digital Humanities Quarterly. Forthcoming.
Roberts-Smith, Jennifer, Sandra Gabriele, Stan Ruecker, and Stéfan Sinclair, with Matt Bouchard, Shawn DeSouza-Coelho, Annemarie Kong, David Lam, and Omar Rodriguez. “The Text and the Line of Action: Re-conceiving Watching the Script.” Proceedings of the INKE 2009: Birds of a Feather Conference. University of Victoria. Web. 11 Nov. 2012. <http://journals.uvic.ca/index.php/INKE/article/view/170/155>.
Rockwell, Geoffrey, and John Bradley. “Watching Scepticism: Computer Assisted Visualization and Hume’s Dialogues.” Research in Humanities Computing. Ed. Giorgio Perissinotto. Vol. 5. Oxford: Clarendon, 1996. 32–47. Print.
Rodden, Kerry, Wojciech Basalaj, David Sinclair, and Kenneth Wood. “Does Organisation by Similarity Assist Image Browsing?” Proceedings of Human Factors in Computing Systems (CHI 2001). New York: Assn. for Computing Machinery, 2001. 190–97. Print.
Ruecker, Stan. “Intimacy through Private Language in the Letters of Lady Mary Wortley Montagu.” British Society for Eighteenth Century Studies Conf. Oxford. 4 Jan. 2001. Address.
Seal, J., M. Girdlestone, and C. Warwick. “Querying Keywords: Analysis of a Survey to Look at On-line Research Methods of the Actual and Potential Users of the Perdita Project.” Digital Evidence: Selected Papers from DRH2000, Digital Resources for the Humanities Conference. Ed. Michael Fraser, Nigel Williamson, and Marilyn Deegan. London: Office of Humanities Communication, 2001. 277–94. Print.
Shen, R., N. Srinivas, S. N. Vemuri, W. Fan, R. S. Torres, and E. A. Fox. “Exploring Digital Libraries: Integrating Browsing, Searching, and Visualization.” Proceedings of the 6th ACM/IEEE-CS Joint Conference on Digital Libraries. New York: Assn. for Computing Machinery, 2006. 1–10. Print.
Sinclair, Stéfan, Stan Ruecker, Sandra Gabriele, and Anthony Sapp. “Digital Scripts on a Virtual Stage: The Design of New Online Tools for Drama Students.” Proceedings of the Fifth IASTED International Conference on Web-Based Education. Anaheim: ACTA, 2006. 155–59. Print.
Smith, Crystal. “Word Cloud: How Toy Ad Vocabulary Reinforces Gender Stereotypes.” The Achilles Effect. Smith, 28 Mar. 2011. Web. 6 Sept. 2012. <http://www.achilleseffect.com/2011/03/word-cloud-how-toy-ad-vocabulary-reinforces-gender-stereotypes/>.
Steger, Sara. “Formulaic Emotion: Reading Victorian Deathbed Scenes from a Distance.” Digital Humanities 2009. College Park: Maryland Inst. for Technology in the Humanities, 2009. 32–34. Print.
Tufte, Edward. The Visual Display of Quantitative Information. Cheshire: Graphics, 2001. Print.
Unsworth, John. “Scholarly Primitives: What Methods Do Humanities Researchers Have in Common, and How Might Our Tools Reflect This?” Symposium on Humanities Computing: Formal Methods, Experimental Practice. King’s College, London. 13 May 2000. Institute for Advanced Technology in the Humanities. Web. 11 Nov. 2012. <http://jefferson.village.virginia.edu/~jmu2m/Kings.5-00/primitives.html>.
Uszkalo, Kirsten. “The Devil and Mother Shipton: Serendipitous Associations and the MONK Project.” Digital Humanities 2009. College Park: Maryland Inst. for Technology in the Humanities, 2009. 35–37. Print.