William A. Kretzschmar, Jr.
We are all familiar with maps as geographic tools. In the past we may have used a printed atlas to find the locations of cities or to plot routes to get from one place to another. Indeed, map and road map have become common metaphors for the arrangement and relations of things (e.g., “map out the consequences”) or for directions (“Do I have to draw you a road map?” as an ironic comment on how to perform a simple task). Now, we may well go online to do the same things that we used to do with paper maps. MapQuest, Google Earth, and Bing Maps are as familiar as the old atlas on the shelf. Less common but still familiar to academics are specialized atlases that combine information with maps, such as historical atlases that document cultural or demographic changes (e.g., the extension of the Roman Empire at different points in time).1 Maps with additional, nongeographic information are also available on computers, like those that show weather systems or election returns. A geographic information system (GIS) associates information with maps by means of a computer.2
To begin with a concrete example of a humanities GIS, we can consider the GIS prepared for the documentary archive site Salem Witch Trials. Figure 1 shows a map of Salem Village that plots where the accusers and the accused lived. This is no static map, however. Clicking on the time line at the bottom of the map changes the information plotted. When first accessed, the map has no locations plotted on it; different days show different people relevant to the day. The first day on the time line, for instance, shows a cluster of people, including Sarah Good, Sarah Osborn, and Tituba, whose arrest warrants were issued on that date. Figure 1 shows a map for the day Martha Cory was put in jail, 21 March 1692. Clicking on her name brings up another screen with links to her biography, documents like the warrant for her arrest, and even the image Giles and Martha Cory at Table, shown in figure 2. The central idea illustrated by the Salem Witch Trials GIS is the use of a map to access many different kinds of information. The place in the community where an accuser or an accused witch lived may merit the creation of a map, but the locations can also help organize information related to each person. Maps provide a graphic means to associate data, texts, and more graphics into the complex cultural matrices that we expect to find in our study of the humanities.
During the past few decades such systems have advanced rapidly, in parallel with advancing computer power. Sometimes a GIS may be used for mainly cartographic tasks, but increasingly GIS implementations allow users to access information stored in databases, often to help them make decisions by processing data in association with maps. After the national censuses in 2000 and 2010, for instance, states had to draw up congressional districts to match their new population distribution, and GIS proved to be a crucial tool. In the commercial world, planners often use GIS to help decide where to locate a new business, looking at population data, traffic patterns, and locations of established businesses. GIS tools can also be used for image enhancement, whether in analysis of geographic satellite pictures or the pixels of other images, since the locations processed may be spatially related and not necessarily geographic. In academic research, GIS can be used for studies of ecology (e.g., Dale), archaeology (e.g., Wheatley and Gillings), or public health (e.g., Lawson). Any study that associates data of any kind with geographic or spatial locations will profit from use of a GIS, including language and literary study.
When we choose to represent information on a map, we are engaged in scientific modeling (see Kretzschmar). Models are real and concrete entities, representations separate from the reality that generates them. They are useful as replacements for things or phenomena that cannot be directly observed because they are too large (weather systems) or too small (atoms) or for things that we wish to manipulate and understand without interfering with the source. Models need not be physical copies but may consist of formulas or other deliberate statements of fact, inference, or relation that describe or predict an aspect of reality (as in econometric or climatological models). Model makers must necessarily determine the attributes to include and the scale of the representation. Road maps may have only the major highways or also show local streets; weather maps may show satellite images, radar, or isobars and fronts as schematic lines. In so doing model makers must be systematic in the creation of the representation (e.g., show all the roads for the scale and area represented). The making of any model is a deliberate act of the maker, in part a reflection of the maker’s theoretical foundations and assumptions about what is represented. The more explicitly these ideas are formulated and made known, the more usable the model will be for others besides the maker, and vice versa.
Now let us apply this general treatment of models to linguistic and literary maps. A descriptive model of language data simply associates the data with locations without trying to make generalizations or draw conclusions, as in figure 3, a classic map that shows how people interviewed for the Linguistic Atlas of New England pronounced bureau or dresser in response to a question about the large piece of bedroom furniture with drawers (Kurath et al.). Figure 4 shows a more schematic model, generated online in a GIS from the Linguistic Atlas of the Middle and South Atlantic States (LAMSAS), which is still descriptive because each symbol indicates the presence or absence of the response chest of drawers in the LAMSAS database. The homemade GIS software uses Python scripts and produces maps in a few seconds as users ask for them. Most GIS applications in language and literary study produce descriptive maps.
A predictive model, on the other hand, will process the data for the purpose of making generalizations and drawing conclusions. Figure 5 uses the same data as figure 4 but includes density-estimation statistics to render a prediction about the likelihood that the response chest of drawers might have been elicited at any location in the region at the time of the survey (in quartiles, with the darkest squares the most likely locations). GIS software, in this case MapInfo, was used to generate the LAMSAS density-estimation maps. A library of density-estimation maps is available on the site, but the calculations and manipulation required to make the maps are too complex for us to make them interactive and immediate like the simple descriptive plot of figure 4. While predictive applications of GIS are more common in the social and natural sciences, where use of statistics is a larger part of the research, they are playing an increasing role in humanities computing.
Both descriptive and predictive models are valuable tools in the study of language and literature. Yet there is room for confusion between these two poles. Traditional isoglosses may appear to be descriptive lines that show the limit of occurrence for some linguistic feature, as in figure 6. In practice, however, isoglosses have been used as predictive tools,3 because the isoglosses have allowed for scattered occurrences on the wrong side of the isogloss and thus have made quantitative predictions about the preponderance of the feature in the area marked (as for the triangle icons in figure 6; see Kretzschmar). In such cases, the model is not so much inaccurate as it is based on an unstated assumption. Frankly predictive models used in computer mapping of language data typically apply techniques of spatial statistics to establish the distribution of features among a sample of speakers and thus estimate (and thereby predict) the distribution of features in the population more generally. GIS does not automatically produce good maps, but if the developer employs a consistent, comprehensive model, the resulting GIS can make good maps automatically.
The first step in developing a GIS is to geocode the data, that is, to assign a location to it. For the Literary Map of Alabama, for example, each Alabama author is labeled (or coded) in the database with the county where the author had an Alabama connection. Clicking on a county on a map of Alabama yields a list of authors connected to it, and each author’s name links to an author information page. In another form of geocoding, the Kansas City Literary Map, prepared by the Johnson County Library, has small numbered icons on a map of Kansas City. Clicking on an icon shows the name of an author and work, with a short passage associated with the location, and allows users to check on the availability of the book in the library. For the Linguistic Atlas Project, the location of speakers’ communities is geocoded with latitude and longitude coordinates, a common practice for GIS that allows a database to be used with many different kinds of maps, provided that the base map has registration for latitude and longitude (i.e., the ability to locate such coordinates on the map). Geocoding thus uses various methods to associate information with specific locations on maps.
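The two geocoding strategies just described, coding a record with a named region and coding it with latitude and longitude, can be sketched in a few lines of Python. This is only a minimal illustration and not the code of any project named here: the author names, county labels, community names, coordinates, and function names are all invented placeholders.

```python
from collections import defaultdict
from dataclasses import dataclass

# Strategy 1: geocode by named region (here, a county label).
# All names below are invented placeholders, not real project data.
AUTHOR_COUNTIES = [
    ("Author A", "County X"),
    ("Author B", "County Y"),
    ("Author C", "County X"),
]


def index_by_region(records):
    """Group record names under the region each one is coded with."""
    index = defaultdict(list)
    for name, region in records:
        index[region].append(name)
    return dict(index)


# Strategy 2: geocode by latitude and longitude, so that the same
# records can be placed on any base map registered for coordinates.
@dataclass(frozen=True)
class Community:
    name: str
    lat: float
    lon: float


COMMUNITIES = [
    Community("Community 1", 33.95, -83.38),
    Community("Community 2", 40.71, -74.01),
]


def within_box(places, south, north, west, east):
    """Return the places whose coordinates fall inside a bounding box."""
    return [p for p in places
            if south <= p.lat <= north and west <= p.lon <= east]


by_county = index_by_region(AUTHOR_COUNTIES)
in_box = within_box(COMMUNITIES, 30.0, 35.0, -90.0, -80.0)
print(by_county["County X"])
print([p.name for p in in_box])
```

Clicking a county on a map corresponds to a lookup in the region index; the coordinate form is more flexible because any map window can be queried with a bounding box.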
GIS uses two kinds of map displays, raster and vector maps. Raster maps work like a television or computer monitor by breaking the map region into pixels or into larger identical units (tessellations) that completely occupy the map. Data can be associated with any point on the map, and it is possible to show continuous variation across the region. Satellite images, like those on Google Earth, use pixels, each of which has its own color characteristics. Other information can also be associated with each pixel, such as its status as part of a state or other political entity or its status as part of a lake, road, building, or other natural or man-made feature on the land. A similar effect can be realized by using a spreadsheet’s rows and columns. Paulina Bounds has imported information scanned from paper maps drawn by her research subjects to create figure 7. Each cell of the spreadsheet shows the number of preprinted maps that had a mark (part of a boundary line, part of a shaded region), whether preprinted or drawn on the map by a research subject. The preprinted border of Poland, represented on the maps by a line, shows clearly on the spreadsheet because of the double-digit numbers in the raster display. The spreadsheet also shows clearly the four areas where the research subjects thought there were Polish dialects. Figure 5, LAMSAS’s density-estimation map, also uses a raster model, in which three thousand small boxes, each 0.2 degree of latitude by 0.2 degree of longitude, cover the survey area. The likelihood of eliciting chest of drawers was calculated for each box based on whether the response was found at the nearest neighboring speaker locations, and the result of the calculation determined the shading for the box in the map display.
The spreadsheet and density-estimation maps can show continuous variation across the pattern of boxes imposed on the survey region, whether by the native grid from a spreadsheet or deliberate application of tessellation.
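The raster logic of the density-estimation map can be sketched as follows. This is a simplified illustration under stated assumptions, not the MapInfo procedure used for LAMSAS: the speaker records are invented, distance is computed on raw degrees as if the map were flat, the estimate is just the share of the k nearest speakers who gave the response, and the shading uses four equal bands of that share as a stand-in for the quartile shading of figure 5.

```python
import math

# Hypothetical sample: (latitude, longitude, gave_response) per speaker.
SPEAKERS = [
    (34.0, -84.0, True),
    (34.2, -84.1, True),
    (34.1, -83.7, False),
    (34.6, -83.9, False),
    (34.7, -84.3, True),
]


def nearest_share(lat, lon, speakers, k=3):
    """Share of the k nearest speakers who gave the response."""
    ranked = sorted(speakers,
                    key=lambda s: math.hypot(s[0] - lat, s[1] - lon))
    nearest = ranked[:k]
    return sum(1 for s in nearest if s[2]) / len(nearest)


def density_grid(speakers, south, west, rows, cols, cell=0.2, k=3):
    """Estimate the response share at the center of every grid box."""
    grid = []
    for r in range(rows):
        row = []
        for c in range(cols):
            lat = south + (r + 0.5) * cell
            lon = west + (c + 0.5) * cell
            row.append(nearest_share(lat, lon, speakers, k))
        grid.append(row)
    return grid


SHADES = " .:#"  # lightest to darkest, one character per band


def shade(p):
    """Map a share between 0 and 1 to one of four shades."""
    return SHADES[min(int(p * 4), 3)]


grid = density_grid(SPEAKERS, south=33.9, west=-84.4, rows=5, cols=4)
for row in reversed(grid):  # print the northern row first
    print("".join(shade(p) for p in row))
```

Every box receives a value whether or not a speaker lives in it, which is what allows a raster display to show continuous variation across the survey region.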
Vector maps, on the other hand, are composed of particular points, lines, or shapes on a map. The Literary Map of Alabama shows counties, each of which is a shape with its own boundaries. We most commonly think of maps as having different kinds of objects on them, such as towns, roads, or states. A Literary Map of Manhattan (fig. 8) geocodes locations on a city street map and associates the book symbols with short passages from a work connected with that place. Sometimes vector maps are produced automatically from data locations, by algorithm. Figure 9 is a map of the occurrences of the word gutter as a variant response in LAMSAS for gully, or “washed out place in a field,” in another homemade GIS, this time programmed in Visual Basic. If gutter was used in a community, a line was represented as reaching out toward each of its nearest neighboring communities. If no nearby communities used gutter, the result looks like a star or asterisk, but the vector technique gives us a visual idea of networks of neighboring communities using gutter in Pennsylvania and West Virginia. Maps with isoglosses, like that in figure 6, are predictive uses of the vector technique.
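The algorithm behind a map like figure 9 can be approximated in a short sketch. The original program was written in Visual Basic and its details are not given here, so what follows is an assumption-laden reconstruction in Python: the community names and coordinates are invented, each community that uses the word draws a half-line to the midpoint toward each of its two nearest neighbors, and two half-lines that meet form a visible link.

```python
import math

# Hypothetical communities: name -> (x, y, uses_word). All values invented.
COMMUNITIES = {
    "A": (0.0, 0.0, True),
    "B": (1.0, 0.0, True),
    "C": (0.0, 1.5, False),
    "D": (5.0, 5.0, True),
    "E": (6.0, 5.0, False),
}


def nearest_neighbors(name, communities, k=2):
    """Names of the k communities closest to the named one."""
    x, y, _ = communities[name]
    others = [n for n in communities if n != name]
    others.sort(key=lambda n: math.hypot(communities[n][0] - x,
                                         communities[n][1] - y))
    return others[:k]


def spokes(communities, k=2):
    """Half-lines from each word-using community toward its neighbors."""
    segments = []
    for name, (x, y, uses) in communities.items():
        if not uses:
            continue
        for other in nearest_neighbors(name, communities, k):
            ox, oy, _ = communities[other]
            midpoint = ((x + ox) / 2, (y + oy) / 2)
            segments.append((name, other, (x, y), midpoint))
    return segments


def full_links(segments):
    """Pairs whose half-lines meet because both communities use the word."""
    drawn = {(a, b) for a, b, _, _ in segments}
    return sorted({tuple(sorted(p)) for p in drawn if (p[1], p[0]) in drawn})


segments = spokes(COMMUNITIES)
links = full_links(segments)
print(len(segments), "half-lines; linked pairs:", links)
```

In this toy data community D uses the word but none of its neighbors do, so its half-lines meet nothing and it would appear on the map as an isolated star.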
The last essential GIS concept is layers. Graphic computer displays depend on the idea of overlays, of superimposing different layers of graphic information to produce a composite picture. The standard PC Windows or Mac display, for example, superimposes user-selectable icons on a user-selectable background and then opens windows on top of one another. Computer mapping uses the layer principle to establish a base map and to add user-selectable layers, each of which contains particular information. The sequence of displays in figure 10 illustrates the layering process, using screenshots taken from a LAMSAS GIS to make self-organizing maps (SOMs, or neural networks).4 The first screenshot shows only the box used to add layers; the second screenshot shows the result of adding the first layer, a base map of eastern states. The next screenshot adds the county boundaries for the states; the final screenshot adds a small box for the location of each LAMSAS community. This sequence shows that the map presented as a composite is actually a set of three layers, any of which may be processed in some way to create a specific visualization.
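The layer principle is easy to imitate in miniature. The sketch below is a toy, not the LAMSAS software: it assumes each layer is simply a set of grid positions with a display character, and it draws the visible layers in order so that later layers overwrite earlier ones. The layer contents and function names are invented for illustration.

```python
def make_layer(name, cells, visible=True):
    """A layer maps (row, col) grid positions to a display character."""
    return {"name": name, "cells": cells, "visible": visible}


def composite(layers, rows, cols, blank=" "):
    """Draw visible layers in order; later layers overwrite earlier ones."""
    canvas = [[blank] * cols for _ in range(rows)]
    for layer in layers:
        if not layer["visible"]:
            continue
        for (r, c), ch in layer["cells"].items():
            canvas[r][c] = ch
    return ["".join(row) for row in canvas]


# Three layers, echoing the three-layer sequence described for figure 10.
base = make_layer("base map",
                  {(r, c): "." for r in range(3) for c in range(5)})
counties = make_layer("county boundaries", {(r, 2): "|" for r in range(3)})
communities = make_layer("communities", {(0, 1): "o", (2, 3): "o"})

layers = [base, counties, communities]
full = composite(layers, rows=3, cols=5)

counties["visible"] = False  # the user switches one layer off
without_counties = composite(layers, rows=3, cols=5)

print("\n".join(full))
print()
print("\n".join(without_counties))
```

Because the composite is recomputed from the layers each time, any single layer can be switched off or reprocessed without touching the others.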
To make a graphic display interactive, the information on the layers can be adjusted by a program, for example according to the result of a statistic. The SOM process uses statistics to try to find groups of speakers with similar responses in the LAMSAS survey. Figure 11 shows SOM output that identified a set of speakers, this time with the county boundaries omitted. The set of speakers selected in the SOM output is represented by a lighter shade of gray.
Information on layers of the graphic display can also be accessed and may be used as a key to retrieve other information. In figure 12, for example, the information tool of the SOM GIS has been used to click on the westernmost community location in Virginia to display information from the LAMSAS database of speaker and interview characteristics. The layers of the map display are thus under the control of the developer and the user, available to be used in complex visualizations of complex data.
Management of the layered display can thus be used to create different visualizations of the data contained in databases, since that data is processed by statistics or other manipulations of the programmer and user. That is, the GIS is not identical with any statistic or manipulation that might be programmed into it. There is not just one all-purpose system that provides the right or best or only statistics or manipulations needed by GIS developers and users; developers must customize the sets of graphic tools that GIS packages provide.
Data processing in GIS often consists of implementation of spatial statistics, commonly used by practitioners of technical geography. In the last twenty years statistical techniques for analysis of geographic patterns, such as point pattern analysis and spatial autocorrelation, have developed at a rapid rate: a recent search for “spatial” and “statistics” on Amazon recovered 3,815 relevant titles. Use of spatial statistics has been a regular feature of several different GIS implementations for LAMSAS.
There are two fundamental approaches to technical geography. First, it is possible statistically to analyze a geographic pattern under study to see if the pattern exhibits the property of complete spatial randomness (CSR). If a pattern is not spatially random, it may be possible to specify whether the pattern is more uniform than CSR (such as the location of the black and white squares of a chessboard) or more clustered than CSR (such as the occurrence of a high proportion of modern human populations in urban centers). It is possible to consider either the dispersion of locations with respect to the study area (a regular or clustered pattern across the whole study area) or the arrangement of locations only with respect to one another (a regular or clustered pattern in any part of the study area). The GIS illustrated in figure 6 is an example of spatial autocorrelation; the join-count statistics indicate that the display is more clustered than CSR. The second approach to data processing in GIS involves the calculation of distances between different locations. In real geographic space, a familiar example is the calculation of the most direct route between two places on a map, as on driving directions from MapQuest. Distance can also be interpreted in ways other than miles or kilometers. Various statistics have been used to derive abstract, nongeographic distances for GIS applications, notably multidimensional scaling (MDS) or neural network analysis (as illustrated here with SOM in figure 11). In such analysis, the notion of similarity depends on complex mathematical calculations that position data points in abstract, nonrepresentational space. In such cases analysts must use care not to interpret results just in terms of a visualization in the two or three dimensions that they can immediately perceive but to take account of the mathematics that generate the visualization.
Developers need to decide which statistical model or technique has the best fit for the particular project, but GIS systems do not require fancy statistics to create useful visualizations.
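A join-count comparison of the kind mentioned above can be illustrated on a small grid. The sketch assumes a square grid in which two cells are joined when they share an edge, and it compares the observed number of joins between two cells that both show a feature with the number expected if the same count of feature cells were placed at random (sampling without replacement). A full join-count test would also need the variance of that count to judge significance; that step is omitted here, and the grids are invented.

```python
def rook_joins(cells):
    """Pairs of grid cells that share an edge (rook adjacency)."""
    cells = set(cells)
    joins = set()
    for (r, c) in cells:
        for neighbor in ((r + 1, c), (r, c + 1)):
            if neighbor in cells:
                joins.add(((r, c), neighbor))
    return joins


def join_count(values):
    """Observed and expected present-present joins.

    values maps (row, col) to True where the feature occurs.
    """
    joins = rook_joins(values)
    observed = sum(1 for a, b in joins if values[a] and values[b])
    n = len(values)
    n1 = sum(1 for v in values.values() if v)
    # Expectation under random placement of the n1 feature cells.
    expected = len(joins) * n1 * (n1 - 1) / (n * (n - 1))
    return observed, expected


SIZE = 4
# Clustered pattern: the feature fills the two western columns.
clustered = {(r, c): c < 2 for r in range(SIZE) for c in range(SIZE)}
# Uniform pattern: the feature alternates like chessboard squares.
chessboard = {(r, c): (r + c) % 2 == 0
              for r in range(SIZE) for c in range(SIZE)}

clustered_result = join_count(clustered)
chessboard_result = join_count(chessboard)
print("clustered :", clustered_result)
print("chessboard:", chessboard_result)
```

The clustered grid yields more present-present joins than random placement would lead us to expect, and the chessboard yields fewer, which corresponds to the contrast between patterns more clustered and more uniform than CSR.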
While GIS software is the current gold standard for computer mapping of literary or language (or any other) data, it is certainly possible to make computer maps with many different kinds of software. Perhaps the best-known full-featured commercial GIS systems come from Esri (ArcInfo, ArcView, MapObjects, and others) and MapInfo (MapInfo, MapBasic, and others). MapInfo was bought in 2007 by Pitney Bowes; Atlas GIS, another commercial product formerly in wide use, was absorbed by Esri. One popular, well-supported, full-featured GIS package that is available as freeware is GRASS GIS.5 The learning curve is steep for these products (there are full university courses, even sequences of courses, to teach their use), but they will do whatever the user needs to do. Technical geographers routinely write their own statistical programs to work with GIS packages, using programming languages such as Visual Basic or Python (as for the Linguistic Atlas Project). The well-known statistical program SAS offers its own GIS package. There are also many GIS programs that offer fewer features but may be easier to learn and use. Microsoft Map, for example, once distributed with the Office suite, is now the add-on Microsoft MapPoint. There is a much more limited mapping extension for Apache OpenOffice Calc (EuroOffice Chart Map). Google Earth and Bing Maps (formerly Microsoft Virtual Earth) are GIS systems on the Web that have some free services but require developers to purchase additional licensed software (Google Earth Pro, Bing Maps Platform) to make their own GIS applications. Educational and not-for-profit institutions may be able to get such licenses for free.
One good example of a literary GIS that employs Google Earth is Mapping the Lakes: A Literary GIS, prepared at Lancaster University as a pilot project with funding from the British Academy.6 Another promising literary GIS site is Google Lit Trips, where contributors have offered their GIS annotations for literary works designed for readers with different educational levels; the “lit trips” require that users download files for manipulation within Google Earth.7
Unfortunately, all these options either require or are facilitated by programming by the user. The mapping tools available on the Linguistic Atlas Project site, for example, have been painstakingly programmed as scripts that run on a server below the Web interface and are therefore not portable to other projects. Users will need to invest considerable time and resources in the preparation of GIS tools that really meet their needs. To that end, committed developers may benefit from learning a scripting language like Perl or Python or the high-level language used to build applications in a database package like Access. The high-level languages offered by database packages are optimized for database management (for example, by use of Structured Query Language [SQL]), which allows users to focus more on analysis and less on foundational programming for how to get the data into a searchable structure. Even mapping data in spreadsheets requires time to associate the data in rows and columns with the mapping utility. In the absence of any user-friendly turnkey system for mapping language and literary data, users must be prepared to customize whatever software they use to display their results.
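As a hedged illustration of how SQL lets a developer concentrate on the question rather than on data handling, the following sketch uses Python's standard sqlite3 module with an in-memory database. The table layout, community names, coordinates, and the helper function are assumptions made for the example; they do not describe the actual Linguistic Atlas Project database.

```python
import sqlite3

# An in-memory database with a hypothetical two-table layout.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE communities (
        id INTEGER PRIMARY KEY, name TEXT, lat REAL, lon REAL
    );
    CREATE TABLE responses (
        community_id INTEGER REFERENCES communities(id),
        item TEXT, response TEXT
    );
    """
)
conn.executemany(
    "INSERT INTO communities VALUES (?, ?, ?, ?)",
    [(1, "Community 1", 40.1, -79.9),
     (2, "Community 2", 38.6, -80.4),
     (3, "Community 3", 36.9, -76.3)],
)
conn.executemany(
    "INSERT INTO responses VALUES (?, ?, ?)",
    [(1, "gully", "gutter"),
     (2, "gully", "gutter"),
     (3, "gully", "gully"),
     (3, "dresser", "chest of drawers")],
)


def locations_for(connection, item, response):
    """Coordinates of every community where a response was recorded."""
    return connection.execute(
        "SELECT c.name, c.lat, c.lon FROM communities AS c "
        "JOIN responses AS r ON r.community_id = c.id "
        "WHERE r.item = ? AND r.response = ? ORDER BY c.name",
        (item, response),
    ).fetchall()


gutter_sites = locations_for(conn, "gully", "gutter")
print(gutter_sites)
```

The query returns exactly the list of coordinates that a mapping routine needs, so the same database can feed any number of different map displays.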
One low-tech option for a homemade GIS is to map data in fixed-character-based displays. Lee Pederson, when developing the mapping programs for the Linguistic Atlas of the Gulf States in the 1980s, did not have available the range of computer graphic resources that we now take for granted. He followed the example set by Alan Thomas in Areal Analysis of Dialect Data by Computer and created the graphic plotter grid (Pederson, “Graphic Plotter Grid”). Instead of plotting symbols on a base map, which required graphics software, Pederson made a grid using the regular character locations on the screen or print positions on the printout page. Figure 13 shows the fatwood and rich pine responses from the atlas question about the small pieces of wood that one uses to start a fire, kindling. The plotter grid makes a recognizable picture of the southern United States near the Gulf of Mexico. In the 1980s, computer monitors still primarily used fixed-width character displays (still an alternative to today’s more familiar proportional fonts); thus the computer screen or output to the printer could be thought of as a set of columns and rows of symbols, according to the raster model for a GIS. Pederson’s 70 x 34 grid of character locations provided 2,380 possible points. In the map, each location displayed or printed a dot or letter corresponding to one of the 911 speakers in the survey, and spaces were used to show places where there was no informant, such as the ocean or gaps in the less-dense pattern of interviewing in Florida and Texas. The completely filled eastern area of the region represents the dense interviewing there; individual speakers were shown not at their exact location of residence but as close as the fixed spacing and density of speakers would allow. The arrangement is an elegant solution that does not require complex graphics handling to produce a picture for computer maps.
Pederson made his fixed-character-based GIS displays interactive by means of several programs written in BASIC. Text handling and formatting is still easier today than working with computer graphics and thus is the better choice for beginning programmers in the humanities looking to make their own GIS applications.
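A character grid in the spirit of the plotter grid takes very little code. The sketch below is not Pederson's BASIC program: it assumes a simple linear mapping from latitude and longitude to a 70 x 34 grid of character positions, the bounding box and sample points are invented, and where two points fall on the same position the later one simply overwrites the earlier, whereas the plotter grid moved speakers to the nearest free position.

```python
def plotter_grid(points, south, north, west, east, rows=34, cols=70):
    """Place one character per point on a fixed rows x cols text grid."""
    grid = [[" "] * cols for _ in range(rows)]
    for lat, lon, symbol in points:
        # Row 0 is the northern edge; column 0 is the western edge.
        r = min(int((north - lat) / (north - south) * rows), rows - 1)
        c = min(int((lon - west) / (east - west) * cols), cols - 1)
        grid[r][c] = symbol
    return ["".join(row) for row in grid]


# Hypothetical points: (latitude, longitude, symbol to print).
POINTS = [
    (37.0, -100.0, "A"),  # northwest corner of the box
    (25.0, -80.0, "B"),   # southeast corner of the box
    (31.0, -90.0, "."),   # center of the box
]

lines = plotter_grid(POINTS, south=25.0, north=37.0, west=-100.0, east=-80.0)
print("\n".join(lines))
```

The output is ordinary text, so it can be displayed in any fixed-width font, saved to a file, or printed without any graphics software.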
One of the most important functions for GIS programs is access to data for visualization. The Web offers good tools for this purpose. A fine example is The Map of Early Modern London, which allows the user to navigate the city and click on red stars to get information on a location. The initial map shows a grid (fig. 14), and one click takes the user into the city, where small stars indicate points of interest (fig. 15). A further click yields a descriptive passage (fig. 16).
North Carolina Maps, at Carolina Digital Library, is another good example; users can view historical maps of different parts of North Carolina, overlaid on their modern counterparts from Google Earth. The historical maps can be turned on and off or faded, so that users can see what has happened to the location in modern times in comparison to when the historical maps were made.
The Linguistic Atlas Project offers a suite of tools for access to data. The greatest degree of interactivity is available in the sections devoted to LAMSAS and to African American English and Gullah. Figure 17, for instance, shows a map of New York State with speaker locations identified; each speaker label is clickable and retrieves information about the speaker, as shown in figure 18. Each speaker screen also has clickable parts, for the legends to the different categories of information and for retrieval of the responses offered by the particular speaker to different questions. If users begin with data in tables, whether about the speakers or about what they said, whether historical data for a cultural study or metadata for a linguistic survey, it is possible to turn a map display of the results on and off at will. The GIS provides opportunities for users to see what they want, as they require it.
Most of the Web functions illustrated here just take advantage of the native linking and display tools of HTML (i.e., the ability to call and open files, CSS for display), and there is no need to describe those. The data is linked from its native storage and formatted as needed for display. They are nonetheless GIS implementations, even without using special GIS software, because they associate information with geographic locations. It is possible to get the impression that computer maps of language and literary data ought to have complex graphic representations, but this need not be so. The worst kind of GIS for literary or linguistic study is the one that the analyst cannot make and so cannot use to look at or show information. The best sort of GIS is the one that the analyst can make and use, whether the maps are low-tech creations or layered graphic images in a commercial GIS. The real hallmark of GIS as a research tool is effective association of information with geographic locations, as the result of using a computer in the best way that the developer can manage. Spatial thinking may be a new way for humanists to imagine culture. Every project in the humanities works with cultural events or products that occurred or were made or talk about activity somewhere. Spatial location is the single aspect of humanities research that can tie together all the rich information from the complexities of culture that we need to talk about. We should be thinking about GIS as a means to help us make those connections.
2. According to Wikipedia, “The acronym GIS is sometimes used for ‘geographical information science’ or ‘geospatial information studies’ to refer to the academic discipline or career of working with geographic information systems” (“GIS”). In this essay, GIS is used with its mainstream definition, as set forth by Esri (the industry-leading maker of GIS software): “A geographic information system (GIS) integrates hardware, software, and data for capturing, managing, analyzing, and displaying all forms of geographically referenced information. GIS allows us to view, understand, question, interpret, and visualize data in many ways that reveal relationships, patterns, and trends in the form of maps, globes, reports, and charts. A GIS helps you answer questions and solve problems by looking at your data in a way that is quickly understood and easily shared” (“What Is GIS?”).
7. As described on the Google Lit Trips site, “Google Lit Trips are free downloadable files that mark the journeys of characters from famous literature on the surface of Google Earth. At each location along the journey there are placemarks with pop-up windows containing a variety of resources including relevant media, thought provoking discussion starters, and links to supplementary information about ‘real world’ references made in that particular portion of the story” (Burg). While the original intent of Google Lit Trips is pedagogical, nothing prevents interested users from making a more scholarly presentation.
Bounds, Paulina. “Perception of Dialects in Poland.” Diss. U of Georgia, forthcoming.
Burg, Jerome. “About GLT.” Google Lit Trips. Google Lit Trips, n.d. Web. 11 Dec. 2012.
Dale, Mark R. T. Spatial Pattern Analysis in Plant Ecology. Cambridge: Cambridge UP, 1999. Print.
Daniels, Stephen, Dydia DeLyser, J. Nicholas Entrikin, and Douglas Richardson, eds. Envisioning Landscapes, Making Worlds: Geography and the Humanities. London: Routledge, 2011. Print.
Dear, Michael, Jim Ketchum, Sarah Luria, and Douglas Richardson, eds. GeoHumanities: Art, History, Text at the Edge of Place. London: Routledge, 2011. Print.
“GIS.” Wikipedia. Wikimedia, n.d. Web. 11 Dec. 2012.
“Introduction.” Mapping the Lakes: A Literary GIS. Lancaster U, n.d. Web. 11 Dec. 2012.
Kretzschmar, William A., Jr. “Isoglosses and Predictive Modeling.” American Speech 67 (1992): 227–49. Print.
Kurath, Hans. A Word Geography of the Eastern United States. Ann Arbor: U of Michigan P, 1949. Print.
Kurath, Hans, et al. Linguistic Atlas of New England. 3 vols. in 6. Providence: Brown U–Amer. Council of Learned Socs., 1939–43. Print.
Kurath, Hans, and Raven Ioor McDavid, Jr. The Pronunciation of English in the Atlantic States. 1961. Tuscaloosa: U of Alabama P, 1982. Print.
Labov, William. A National Map of the Regional Dialects of American English. Linguistics Laboratory, Dept. of Linguistics, U of Pennsylvania, 15 July 1997. Web. 11 Dec. 2012. <http://www.ling.upenn.edu/phono_atlas/NationalMap/NationalMap.html>.
Lawson, Andrew B. Statistical Methods in Spatial Epidemiology. New York: Wiley, 2001. Print.
Pederson, Lee. “A Graphic Plotter Grid.” Journal of English Linguistics 19 (1986): 25–41. Print.
———, ed. Linguistic Atlas of the Gulf States. 7 vols. Athens: U of Georgia P, 1986–92. Print.
Thomas, Alan R. Areal Analysis of Dialect Data by Computer: A Welsh Example. Cardiff: U of Wales P, 1980. Print.
“What Is GIS?” Esri. Esri, n.d. Web. 11 Dec. 2012. <www.esri.com/what-is-gis/overview>.
Wheatley, David, and Mark Gillings. Spatial Technology and Archaeology. London: Taylor, 2002. Print.