2013 Daniel Powell, with Constance Crompton and Ray Siemens, “Glossary of Terms, Tools, and Methods”
Daniel Powell, with Constance Crompton and Ray Siemens
A social-networking platform for academics to share, track, and communicate research. Founded in 2008, Academia.edu has over 1.4 million users and contains nearly 1.4 million papers. The site allows for the real-time access of research relevant to users’ interests in an open-source format.
A proprietary, browser-independent, vector-graphics animation platform. Using the Player plug-in, Flash content will appear identically across various browsers and devices.
A language game developed by Warren Sack. Agonistics performs linguistic analysis in real time to construct a visualization of interlocutors’ discourse positions in relation to one another using avatars. The program places the person who is most central to the discussion in the middle of the visualization. Agonistics is designed to provide an interface with what Sack refers to as “Very Large Scale Conversations,” such as electronic discussion lists, newsgroups, or comments. For original proposal materials, see http://people.ucsc.edu/~wsack/Agonistics/Proposal/; for the Whitney Museum of American Art’s online gallery space devoted to Agonistics, see http://artport.whitney.org/gatepages/artists/sack/.
A free concordance program available for Windows, Mac OS X, and Linux operating systems. AntConc has evolved from a simple concordance program into a powerful tool for textual analysis. It is able to perform the following types of linguistic analyses: concordance, concordance plot, clusters, n-grams, collocates, word frequency, keyword list. Available at http://www.antlab.sci.waseda.ac.jp/antconc_index.html.
A free, open-source Web-server software package. This software allows a user’s computer and a Web server to communicate with each other. The Apache HTTP Server is the most widely used Web server in the world.
API (application programming interface)
A proprietary, full-featured GIS system produced by Esri. ArcInfo is now part of the ArcGIS product line and refers to the highest level of functionality in the ArcGIS for Desktop line of software, including Desktop Basic (formerly ArcView), Desktop Standard (formerly ArcEditor), and Desktop Advanced (formerly ArcInfo). ArcGIS for Desktop Advanced includes the ability to view and manipulate a wide variety of GIS data and geographic features, as well as highly advanced functionality in the areas of spatial analysis, geoprocessing, data management, and display. See GIS.
ARTFL (Project for American and French Research on the Treasury of the French Language)
A collaborative project between the University of Chicago and the French National Center for Scientific Research (CNRS) laboratory Analyse et Traitement Informatique de la Langue Française (ATILF). Its collection of full-text databases consists of several distinct modules, both public and subscriber-based, including Diderot’s Encyclopédie and Supplément à l’Encyclopédie, ARTFL-FRANTEXT, ECCO-TCP, Bibliothèque Bleu de Troyes, Dictionnaires d’autrefois, Perseus under PhiloLogic, and Opera del Vocabolario Italiano.
An ARTFL database of French-language texts. Consisting of over 3,500 texts, ranging from classic works of French literature to technical writing and spanning the eighteenth, nineteenth, and twentieth centuries, FRANTEXT is the largest collection of digitized French resources in North America.
BASIC (beginner’s all-purpose symbolic instruction code)
The application of computing and information technologies to the study and preservation of biological data. An inherently interdisciplinary field, bioinformatics research originates in computer science and more traditional scientific fields. Major bioinformatics research fields include sequence analysis of DNA, databases and data mining of scientific literature, 3-D visualization, and genome annotation.
A type of search relying on Boolean logic. Boolean searching allows users to combine words and phrases into more precise search statements. Typical Boolean operators include AND, NOT, and OR; when used between keywords, AND and NOT narrow search results, while OR broadens them.
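The three operators can be illustrated with set operations over a small inverted index. The following is a minimal sketch, not the implementation of any particular search engine; the sample documents and the names `DOCS` and `build_index` are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample documents, keyed by an arbitrary id.
DOCS = {
    "a": "women writers of the early modern period",
    "b": "early modern drama",
    "c": "modern women poets",
}

def build_index(docs):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

index = build_index(DOCS)

# AND keeps documents containing both terms (set intersection).
and_hits = index["women"] & index["modern"]
# OR keeps documents containing either term (set union).
or_hits = index["drama"] | index["poets"]
# NOT removes documents containing the excluded term (set difference).
not_hits = index["modern"] - index["women"]
```

Here AND and NOT return fewer documents than the single term "modern" would, while OR returns more than either of its terms alone.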
Brown University Women Writers Project (Women Writers Project, WWP)
An established and long-term digital research and archiving project devoted to early modern women’s writing and its electronic preservation and encoding. Founded in 1988, the WWP has had a great influence on the development of both the TEI Guidelines and the planning of long-term digital projects. WWP also publishes Women Writers Online (WWO), a full-text collection of early women’s writing in English.
A software application allowing users to locate and retrieve information from networked information services. Now most frequently used to refer to a Web browser, the term refers to a specialized computer program for viewing, interacting with, and navigating Web pages. These programs use HTTP to request Web pages and then render the HTML they receive.
Within the Voyant set of textual analysis tools, a visualization designed to display patterns of word repetition in one or more documents. See the explanation at Hermeneuti.ca (http://hermeneuti.ca/voyeur/tools/Bubblelines).
A University of North Carolina initiative designed to build and preserve digital collections in North Carolina while developing standards and practices to guide libraries and archives in promoting scholarly communication. The CDLA publishes content and serves as an institutional repository for work originating at the University of North Carolina, Chapel Hill.
Provides a comprehensive digital directory of bibliographic and biographical content about women writers, as well as access to full-text transcriptions of works by women. A Celebration of Women Writers is a sister site of The Online Books Page.
A way of analyzing data that classifies a set of information into two or more mutually exclusive groups based on combinations of internal variables. Cluster analysis is useful for discovering structures and patterns within data based solely on a selected category of similarity and difference. In practice, cluster analysis of a corpus of texts usually groups them together according to the similarities and differences of the frequencies of the most frequent words. Cluster analysis has been shown to be highly reliable in authorship attribution and genre identification. The statistical software program Minitab facilitates cluster analysis.
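Grouping texts by the frequencies of their most frequent words can be sketched in a few lines. This is an illustrative sketch only and does not reproduce Minitab's procedure; the choices of Manhattan distance and single-linkage merging are assumptions, and the toy texts and function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus: two "the"-heavy texts and two "and"-heavy texts.
TEXTS = {
    "t1": "the the the a of",
    "t2": "the the the of a",
    "t3": "and and and but or",
    "t4": "and and but but or",
}

def profiles(texts, n_words):
    """Relative frequencies of the corpus's n most frequent words, per text."""
    tokens = {name: text.lower().split() for name, text in texts.items()}
    totals = Counter(w for toks in tokens.values() for w in toks)
    vocab = [w for w, _ in totals.most_common(n_words)]
    return {name: [toks.count(w) / len(toks) for w in vocab]
            for name, toks in tokens.items()}

def distance(p, q):
    """Manhattan distance between two frequency profiles (an assumed metric)."""
    return sum(abs(a - b) for a, b in zip(p, q))

def cluster(texts, k=2, n_words=3):
    """Single-linkage agglomerative clustering down to k groups."""
    prof = profiles(texts, n_words)
    groups = [{name} for name in texts]
    while len(groups) > k:
        # Merge the two groups whose closest members are nearest to each other.
        g1, g2 = min(
            combinations(groups, 2),
            key=lambda pair: min(distance(prof[a], prof[b])
                                 for a in pair[0] for b in pair[1]),
        )
        groups.remove(g1)
        groups.remove(g2)
        groups.append(g1 | g2)
    return groups

print(cluster(TEXTS, k=2, n_words=3))
```

With real texts, the number of frequent words and the distance measure both affect the grouping, so results should be compared across several settings.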
An open-source set of tools designed to aid students and scholars working in networked archives and federated repositories of humanities materials. It allows users to collect, annotate, and tag online objects; produce interlinked exhibits and scholarly essays; and share such collections and exhibits with other users. Housed at NINES at the University of Virginia, Collex builds on MIT’s SIMILE project to bring scholarly and reliable folksonomic tagging into archival practice. The Collex software underlies both the NINES and 18thConnect federated research systems.
An open-source theme and plug-in for the WordPress content management system that allows readers to comment paragraph by paragraph in the margin of a text. It can be applied to a fixed document (e.g., essay, book) or to a constantly updated blog. Recently, CommentPress has evolved into Digress.it, a more robust version of the application. Users of CommentPress must have a WordPress Web site, and users of Digress.it must register on Digress.it for a hosted account.
An ontology creation and visualization tool designed to allow users to define and link concepts and resources relevant to a particular area of study. ConceptVISTA facilitates user creation of a conceptual universe that connects data, knowledge, people, methods, organizations, and so on.
A proprietary concordance program. Concordance is a comprehensive application with a number of powerful features, including multiple language support, user-definable alphabets, user-definable contexts, multiple-pane viewing, the ability to statistically analyze selected texts, and the ability to export concordance results as text, HTML, or Web Concordance files.
content management system
A software program or suite of applications designed to enable the creation, editing, review, organization, and publication of content to the Web from a central interface. Popular content management systems include WordPress, Drupal, and Joomla!.
A popular, intermediate-level programming language developed in the late 1970s. C++ greatly influenced the development of C# (“C sharp”) and Java, two programming languages widely used today.
CPU (central processing unit)
The principal operating part of a computer, the CPU carries out the instructions of a computer program and performs the basic mathematical, logical, and input-output operations of a computer system.
cross-archive data mining
The ability to search for information simultaneously across multiple digital archives, especially if those archives provide access to full-text materials. NINES and 18thConnect model how federated databases can facilitate cross-archival search.
CSR (complete spatial randomness)
In statistics and probability theory, a process in which events occur within a given study area in an entirely indiscriminate fashion. In complete spatial randomness, events are distributed independently at random and uniformly over the area in question; there are thus no regions of the area where events are less or more likely to occur. CSR is a basic test undertaken in geographic analysis to determine how widely a given data set varies from a mathematically random norm.
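One common way to compare a point pattern against CSR is a quadrat count: divide the study area into equal cells, count the events in each, and compare the variance of the counts with their mean. The sketch below assumes a unit-square study area and uses the variance-to-mean ratio as the summary statistic; both are illustrative choices, and the function names are hypothetical. Under CSR the ratio tends toward 1, clustered patterns give larger values, and evenly spaced patterns give smaller ones.

```python
import random
import statistics

def simulate_csr(n, seed=0):
    """Draw n points independently and uniformly over the unit square."""
    rng = random.Random(seed)
    return [(rng.random(), rng.random()) for _ in range(n)]

def quadrat_counts(points, grid=4):
    """Count points falling in each cell of a grid x grid partition."""
    counts = [0] * (grid * grid)
    for x, y in points:
        col = min(int(x * grid), grid - 1)  # clamp points lying on the edge
        row = min(int(y * grid), grid - 1)
        counts[row * grid + col] += 1
    return counts

def dispersion_index(counts):
    """Variance-to-mean ratio of the quadrat counts."""
    return statistics.variance(counts) / statistics.mean(counts)

# A simulated CSR pattern, for comparison with observed data.
csr_counts = quadrat_counts(simulate_csr(200), grid=4)
print(dispersion_index(csr_counts))

# A fully clustered pattern: every event falls in the same quadrat.
clustered = quadrat_counts([(0.1, 0.1)] * 4, grid=2)
print(dispersion_index(clustered))
```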
CSS (cascading style sheets)
A way of specifying the appearance of HTML or XML in a browser. CSS allows the separation of structural content from presentation. For a further introduction to CSS, see Eric A. Meyer, Cascading Style Sheets: The Definitive Guide (2nd ed.; Sebastopol: O’Reilly, 2004; print) 1–22.
A collection of information organized in such a way that a computer program can quickly select desired data. The structure of a database is dependent on the type of relationship being described. A database differs from a file of that same information in that it describes how the data relate to one another instead of presenting an unordered collection of the same content.
A folksonomic Web service for storing, sharing, and discovering bookmarked content on the Web. Users tag bookmarks with freely chosen index terms, leading to a folksonomic system of user-generated metadata. Delicious is an example of a Web 2.0 application.
A news Web site on which stories are posted by user-contributors who rank stories. Much like Delicious, Digg is a Web 2.0 application oriented toward the sharing and discovery of online content.
A weeklong digital humanities training and discussion institute consisting of intensive course work, seminars, lectures, and colloquia. Taking place at the University of Victoria, the DHSI brings together faculty and staff members; students from the arts, humanities, library, and archives communities; as well as independent scholars and cultural institution workers.
A digital collection encompassing the entire corpus of Barth’s publications, in both the original German and numerous translations. Texts in the collection have been transcribed by hand, tagged with appropriate metadata, and integrated into a robust research and display environment.
DiRT (Digital Research Tools)
A collaborative wiki designed to collect information about tools and resources that can help scholars conduct research more effectively or creatively. DiRT has been folded into Project Bamboo, a multi-institutional, interdisciplinary effort bringing together arts and humanities scholars to develop new methodologies through the sharing of computational tools and applications. The beta Bamboo DiRT site divides tools, services, and collections into various categories such as “Analyze Texts,” “Manage Bibliographic Information,” and “Visualize Data.” Each category lists applications, platforms, or resources with a short description and URL at which they can be found.
A semantic markup language and schema designed especially for use with texts concerned with computer hardware and software. Much like the TEI Guidelines, DocBook is a semantic language; that is, it encodes what content is rather than how it is formatted. DocBook and the TEI Guidelines are the two most widespread standards for the encoding of textual information in XML.
Documenting the American South (DocSouth)
A digital publishing initiative designed to provide online access to texts, images, and audio files related to southern history, literature, and culture. Sponsored by the library system of the University of North Carolina, Chapel Hill, DocSouth includes sixteen thematic collections of primary sources, among them “The Church and the Southern Black Community,” “North American Slave Narratives,” and “North Carolinians and the Great War.”
The term given to the collapse of the dot-com bubble in 2000–01. The dot-com bubble, a rapid rise in equity markets fueled by investments in Internet-based companies, was driven by market confidence that Web companies would turn future profits, individual speculation and investment in stocks, and widely available venture capital. The collapse of the bubble caused many Internet-based companies to fail and dramatically reduced the value of others.
A free and open-source content management system distributed under the GNU General Public License. Drupal, WordPress, and Joomla! are the most common content management systems used to manage Web content.
D2K (Data to Knowledge)
A platform developed by the National Center for Supercomputing Applications (NCSA) as a rapid, flexible, data-mining machine-learning system that integrates data-mining methods for prediction, discovery, and deviation detection with information-visualization tools.
A standard set of vocabulary terms used to describe a wide range of resources. This set of elements comprises a basic, standardized, shared system of metadata widely used by libraries, governments, international organizations, and businesses. See metadata.
A publishing tool designed to publish SGML and XML on the Web. Introduced in 1990, DynaWeb was among the first platforms able to present markup language in easily accessible, print-like form on the Web. Use of DynaWeb has steadily dropped since the corporation went out of business in 2002.
EEBO-TCP (Early English Books Online-Text Creation Partnership)
A partnership between EEBO and ProQuest to create standardized, accurate XML-encoded electronic editions of early print books. The EEBO corpus consists of the works represented in the English Short Title Catalogue, the Thomason Tracts, and the Early English Books Tract Supplement. EEBO-TCP seeks to provide accurate, publicly accessible full-text transcriptions of these early printed texts. EEBO also exists as a stand-alone project that provides access to the same texts through PDF images of microfilmed pages. Thus all EEBO texts are available as PDF images; those that have been transcribed through EEBO-TCP are available as both PDF images and full-text documents.
Much like EEBO, ECCO is a digital collection consisting of all significant English and foreign language titles published in the United Kingdom during the long eighteenth century (1660–1815). ECCO makes available over 200,000 volumes of content as PDF images; ECCO-TCP also makes available full-text transcriptions of over 2,000 texts, drawn from ECCO.
Born-digital, first-generation digital objects created on a computer and usually meant to be read on one; alternatively, literature that takes advantage of the capabilities and contexts provided by stand-alone or networked computing devices. This broad collection of work often leverages the capabilities of hypertext linking, interactivity, game play, and multimedia presented by executable code. See N. Katherine Hayles, Electronic Literature: What Is It?
A collection of Microsoft Excel spreadsheets with macros designed for textual analysis. Available analyses include John F. Burrows’s Delta, Zeta, and Iota; Hugh Craig’s Zeta; parallel word list; and cluster analysis.
An interactive drama created by Michael Mateas and Andrew Stern. In the program, the user plays the longtime friend of Grace and Trip, a married couple in their early thirties. During an evening of drinks, a domestic conflict develops, and users interact with the couple by typing to speak. Using incorporated language-processing software, the story develops based on user responses and actions. The narrative proceeds differently each time the game is played.
A social-networking service launched in 2004. Users, once registered, can create a personal profile, add other users as friends, exchange messages, join groups, and post and share images. Facebook is the most popular social-networking site in the English-speaking world.
Facebook markup language (FBML)
A deprecated markup language used to build custom applications within Facebook and to integrate Facebook materials and information into outside applications. FBML was a subset of HTML that also contained several Facebook-specific tags and elements. See the Facebook development page for Facebook markup language.
An interface designed to explore and visualize features in text collections. FeatureLens allows users to interpret the results of text-mining algorithms run on individual texts through a visual exploration of patterns occurring between texts in a collection.
A database system in which several databases appear to function as a single database to users. Federated databases allow users to access numerous individual databases through a central interface. NINES is an example of a federated database.
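The idea of a single interface over several independent databases can be sketched as one function that passes a query to each source and merges the results. This is a simplified illustration and not how NINES is implemented; the archive names, records, and the `federated_search` function are hypothetical.

```python
# Hypothetical archives, each standing in for a separately maintained database.
ARCHIVES = {
    "archive_a": ["Poems of 1798", "Letters, 1801"],
    "archive_b": ["Collected poems", "Travel journal"],
}

def federated_search(archives, term):
    """Query every archive through one interface and merge the hits.

    Each hit records which archive it came from, so the user sees one
    result list while the underlying sources stay separate.
    """
    term = term.lower()
    hits = []
    for archive_name, records in archives.items():
        for record in records:
            if term in record.lower():
                hits.append((archive_name, record))
    return hits

print(federated_search(ARCHIVES, "poems"))
```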
FedoraCommons (Fedora Extensible Digital Object Repository Architecture)
A modular, digital-assets-management architecture for storing, managing, and accessing digital objects. Not to be confused with the Linux operating system named Fedora, FedoraCommons provides an extremely flexible underlying architecture for the formation of digital repositories containing any type of digital content.
A Web site for sharing photos and videos created by Ludicorp in 2004. Widely used to host images embedded in blogs, Flickr holds more than 6 billion images. Flickr, much like Delicious and Digg, is a Web 2.0 application that uses folksonomic tagging to organize content for collection and discovery.
A user-defined system for classifying and organizing information, derived from the terms folk and taxonomy. A folksonomic system allows users to create and assign tags to information; these tags serve as user-derived metadata for digital objects and are usually searchable by users for purposes of discovery. See Delicious and Collex.
GIS (geographic information system)
Tools used to gather, transform, manipulate, analyze, and produce all types of geographic data. Data may be expressed as three-dimensional models, two-dimensional displays, tables, or lists. See Jo Guldi’s series on the spatial turn in the humanities.
Google scans the full text of books and other print materials, converts it to text using optical character recognition, stores it in its database, and makes it available for searching. Materials in the public domain are available in full and for download; for materials in copyright, various access levels are available. Google Books currently contains over 20 million items.
A graphing tool developed by Google to chart the yearly count of selected n-grams (letter combinations), words, or phrases as found in over 5.2 million books digitized by Google through 2008. Results are displayed as a normalized line chart; only matches found in over 40 books are indexed in the database.
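A normalized yearly count can be sketched as the share of all n-grams of the same length in a given year that match the query. The sketch below is an illustration under stated assumptions, not Google's method: it treats n-grams as sequences of whitespace-separated words, and the tiny corpus and function names are hypothetical.

```python
from collections import Counter

# Hypothetical corpus: publication year mapped to that year's texts.
CORPUS = {
    1900: ["the cat sat", "the dog"],
    1901: ["a cat and a cat"],
}

def ngrams(tokens, n):
    """Return every run of n consecutive tokens, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def yearly_share(corpus, phrase):
    """Share of each year's n-grams that match the phrase (normalized count)."""
    n = len(phrase.split())
    shares = {}
    for year, texts in sorted(corpus.items()):
        counts = Counter()
        for text in texts:
            counts.update(ngrams(text.lower().split(), n))
        total = sum(counts.values())
        shares[year] = counts[phrase.lower()] / total if total else 0.0
    return shares

print(yearly_share(CORPUS, "cat"))
```

Dividing by the yearly total is what makes years with different amounts of published text comparable on one chart.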
Google’s free, Web-based office suite and data storage service. Now known as Google Drive, the service allows users to create, edit, and share documents, spreadsheets, and slide-based presentations. Google Drive incorporates Google Docs and also allows users to store, share, and sync any file on Google servers.
A virtual globe and map application that allows users to view satellite imagery, maps, terrain, 3-D buildings, and so on. Images and data are updated regularly. Since its release in 2005, Google Earth has been downloaded more than 1 billion times.
GRASS GIS (Geographic Resources Analysis Support System)
Free GIS software used for geographic data management and analysis, map production, spatial modeling, and visualization. Released under the GNU General Public License and available for multiple operating systems, GRASS has been under continual development since 1982.
HTML (hypertext markup language)
An authoring language used to create documents on the World Wide Web. HTML defines the structure and layout of a document using a variety of tags and attributes. Web browsers read HTML documents and transform them into the Web pages users encounter online; HTML is not displayed directly but is used by a browser to interpret the content of a page. See Chuck Musciano and Bill Kennedy, HTML: The Definitive Guide (3rd ed.; Sebastopol: O’Reilly, 1998; print) 1–15.
HTTP (hypertext transfer protocol)
This protocol governs communications between a Web server and users running browsers to view Web pages. HTTP defines how communications between servers and browsers are formatted and transmitted, as well as what actions Web servers and browsers should take in response to various commands. HTTP is fundamental to data communications on the World Wide Web.
A user-friendly, Web-based program for text exploration and text analysis. HyperPo creates hypertextual links between different representations of the same text. Users can compile frequency lists of words, characters, or series of words; use color coding to reflect repeating patterns; and look at keywords in context, collocations, and distribution lists.
The New York Times visualization of United States presidential inaugural addresses since 1789. Users can browse through word clouds of each address, click on particular words to see them in context of their original address, or compare them across historical addresses. Full-text transcriptions of the addresses are also available.
A Java-based software application used for text analysis, developed by the University of Newcastle’s Centre for Literary and Linguistic Computing. Intelligent Archive facilitates the management of individual texts in a local or virtual repository, supports user-created collections, and performs word-frequency analysis for chosen texts. Intelligent Archive was designed as an organizational tool for research, to be used before intensive computational analysis is undertaken.
A graphic representation designed to be manipulated by human users. In an interactive visualization, computer-generated graphic illustrations of information can change with user input. A very basic example is the movement of a mouse cursor on a computer screen that occurs when a user moves the physical mouse device.
A not-for-profit, open-access digital library. It contains over three million books that are in the public domain, as well as music, moving images, audio files, software, and archived Web pages. Digital material can be downloaded and uploaded by users. Internet Archive oversees one of the largest book-digitization projects in the world.
A publicly accessible electronic scholarly edition of selected poetry by the dadaist artist, performer, and poet Elsa von Freytag-Loringhoven. The edition consists of digital surrogates and transcriptions of several manuscript versions of twelve of her poems. The project relies on text encoded in XML using the TEI Guidelines, high-resolution digital images, and the Versioning Machine interface.
IRC (Internet relay chat)
A protocol for real-time text messaging using the Internet. IRC was the first online process whereby users were able to engage in real-time text exchange over the Internet.
A pedagogical environment for interpreting cultural materials, particularly texts. Within the collaborative play space, players take on alternative identities to expand and alter a discursive on-screen field, the documentary manifestation of a set of ideas. Ivanhoe is text-based, and players rewrite texts in a variety of ways and in interaction with one another. The game is not compatible with current browsers, but numerous scholarly articles have documented and theorized its play.
JGAAP (Java Graphical Authorship Attribution Program)
A Java-based, modular program for textual analysis, categorization, and authorship attribution. JGAAP allows individuals unfamiliar with machine learning and quantitative analysis to investigate stylometric questions. JGAAP is freely available and open-source. For more information, visit http://evllabs.com/jgaap/w/index.php/Main_Page.
JiTR (Just in Time Research)
A recombinant research environment for document management, large-scale linguistic research, and cultural analysis. Building on the Web 2.0 phenomenon of “remixing” content, JiTR allows users to “mash up” digital content for research purposes. With JiTR, texts can be collected in a central location, organized, remixed, and processed.
JSP (Java server pages)
A server-side extension to Java. Java server pages dynamically generate Web pages based on HTML and XML to preserve design and display elements across new instantiations even as content changes. See The Java EE 5 Tutorial at http://docs.oracle.com/javaee/5/tutorial/doc/bnagx.html.
A project of the Johnson County Library in Kansas, this map contains annotations keyed to literary works mentioning a particular location. Kansas City Literary Map users can read short passages from a given work and check whether the work is available at the Johnson County Library.
KWIC (keyword in context)
A type of concordance output that sorts and aligns words within a textual sample alphabetically and in conjunction with surrounding text. Instead of isolating search terms in a list of individual words, KWIC allows users to see the results of a search within a limited context, providing a fuller meaning. KWIC is also the name of a concordance program (KWIC Concordance for Windows) designed to analyze texts and provide word frequency lists, concordance, and collocation tables (http://www.chs.nihon-u.ac.jp/eng_dpt/tukamoto/kwic_e.html).
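A keyword-in-context display can be sketched by collecting a few words on either side of each match, sorting the matches, and padding the left context so the keywords line up in a column. This is a minimal illustration and not the behavior of KWIC Concordance for Windows; sorting by the right-hand context is an assumption, and the function names and sample sentence are hypothetical.

```python
def kwic(text, keyword, width=2):
    """Return (left context, keyword, right context) for each match."""
    tokens = text.split()
    lines = []
    for i, token in enumerate(tokens):
        # Ignore case and surrounding punctuation when matching.
        if token.strip(".,;:!?\"'").lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    # Sort alphabetically by the words that follow the keyword.
    return sorted(lines, key=lambda line: line[2].lower())

def format_kwic(lines):
    """Right-align the left context so the keywords form a column."""
    pad = max((len(left) for left, _, _ in lines), default=0)
    return [f"{left:>{pad}}  {kw}  {right}" for left, kw, right in lines]

SAMPLE = "The cat sat. A cat ran away from the cat flap."
for row in format_kwic(kwic(SAMPLE, "cat")):
    print(row)
```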
LAMP (Linux, Apache HTTP Server, MySQL, and PHP)
A set of free, open-source software programs used to build a general-purpose Web server. Linux is the operating system, Apache is the Web server, MySQL is the relational database management system, and PHP (or Perl or Python) is the programming language.
In GIS systems using digital images, a level at which you can place an object or image file. Each layer may contain a certain type of visual information, such as roads, bodies of water, topography, satellite imagery, and so forth. These layers are combined to form an interactive composite image.
In language studies, the citational form of a set of words; a headword. Fly, flew, flying, and flies, for example, are forms of the same lexeme, with fly as the lemma.
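In text analysis, grouping word forms under their lemma lets a count treat fly, flew, flying, and flies as one item. The sketch below uses a hand-written lookup table built from the example above; real lemmatizers rely on much larger dictionaries or trained models, and the names `LEMMAS`, `lemmatize`, and `lemma_counts` are hypothetical.

```python
from collections import Counter

# Hand-written lookup table covering only the forms in the example.
LEMMAS = {
    "fly": "fly",
    "flew": "fly",
    "flying": "fly",
    "flies": "fly",
}

def lemmatize(token):
    """Return the lemma for a known form, or the lowercased token otherwise."""
    word = token.lower()
    return LEMMAS.get(word, word)

def lemma_counts(tokens):
    """Count tokens after mapping each one to its lemma."""
    return Counter(lemmatize(token) for token in tokens)

counts = lemma_counts("Time flies and birds flew while flying".split())
print(counts["fly"])
```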
Linguistic Atlas of New England
A project undertaken from 1931 to 1933 to document linguistic patterns in New England, edited by the linguist Hans Kurath. In the absence of electronic recording, responses to a 750-item questionnaire were recorded in phonetic transcription. These responses were then overlaid on a map of New England, providing a visualization of language patterns with two layers: the geographic map and the language responses.
Linguistic Atlas of the Gulf States
A project undertaken from 1968 to 1983 to document linguistic patterns in Florida, Georgia, Tennessee, Alabama, Mississippi, Louisiana, Arkansas, and Texas. A total of 914 speakers were recorded to produce a layered visualization of dialect patterns displayed on geographic maps.
A social-networking site for professional networking. LinkedIn profiles summarize work history, education, and professional achievements. LinkedIn also allows users to develop “connections” with colleagues, clients, and partners.
LION (Literature Online)
A virtual library containing over 350,000 literary texts, full-text journals, author biographies, and other reference and critical sources related to the study of English-language literature. Launched in 1996 and owned by ProQuest/Chadwyck-Healey, LION is available by subscription.
A project of the Alabama Center for the Book highlighting the state’s literary heritage. An important component of the project is a map of the state divided into counties; clicking on each county results in a list of authors, with the name of each leading to the author’s biographical profile, central works, and connection to Alabama. Authors are selected by specialists in Alabama literature, with research on each undertaken by volunteers and project staff.
A project of the New York Times to map where fictional New Yorkers went about their activities. Short quotations, keyed to particular literary characters and locations, were submitted by readers of the New York Times Book Review section; those submissions were then pinned to their corresponding locations on a map of Manhattan. Read the project announcement and review.
A combination blog and social-networking site founded in 1999. Users are able to write entries for their personal journal, restrict visibility, upload multimedia, customize the appearance of their journal via HTML and CSS, “friend” other users, join communities based on common interests, and comment on the entries of other users. LiveJournal has over 1.8 million active users and was an early example of a Web 2.0 site.
A free and open-source information retrieval software library supported by the Apache Software Foundation. Lucene facilitates full-text indexing and search of any Web content but is primarily used for searching local, single-site Web applications such as Twitter. Lucene is file-format agnostic and works with PDF, HTML, and word-processor files as long as their text can be extracted.
A way of programming computers that allows for the evolution of computational behavior based on empirical data or past experience. Machine learning focuses particularly on the ability of computers to learn to recognize complex patterns and make intelligent decisions based on those patterns, an ability that is especially valuable in computational textual analysis. See Ethem Alpaydin, Introduction to Machine Learning (Cambridge: MIT P, 2004; print).
A rich-prospect browsing interface that is designed to explore any XML document or collection of documents. Mandala allows users to construct visual queries drawing on underlying textual data. Within the interface, all the documents or document sections loaded into the system appear as dots around the periphery; users create colorful magnets with assigned values that then draw outside dots into the center space. A separate window provides text that matches any particular chosen dot.
An IBM-developed Web site where users can upload data, create interactive or static visualizations, and carry on discussions. The site is designed not only to facilitate individual discovery through data visualization but also to spur discussion and collaboration between individuals engaged in similar types of knowledge production. Many Eyes provides numerous types of visualizations, divided into categories, including scatter plots, network diagrams, bar charts, bubble charts, line graphs, word trees, tag clouds, and tree maps. Along with Voyant Tools, Many Eyes is one of the most useful Web-based visualization and analysis platforms publicly available.
A proprietary GIS software package for geospatial data analysis. MapInfo Professional supports GIS layering, Web publication, 3-D visualization, geographic database integration, and drag and drop of maps into other applications. Along with ArcGIS, MapInfo Professional is one of the most popular proprietary GIS systems.
A GIS project devoted to mapping the streets, sites, and significant boundaries of early modern London. The project uses a sixteenth-century map as its platform, onto which encyclopedia-style articles, scholarly commentary, editions, and excerpts are mapped.
MARC (machine-readable cataloging)
An international set of standards for the representation and communication of bibliographic information in machine-readable form. Developed by the Library of Congress in the 1960s, MARC standards constitute the foundation of most library cataloging systems in use today. For another set of standards, see Dublin Core.
A term coined by Vannevar Bush to refer to a mechanized device to store, access, and organize massive amounts of information. Bush formulated his idea of the memex in a 1945 article published in The Atlantic Monthly (“As We May Think”). The idea of the memex influenced the development of hypertext, personal computing, the Internet, the World Wide Web, and online knowledge collections such as Wikipedia.
A desktop and Web-based program for managing and sharing research papers, discovering research, and engaging in collaboration. Mendeley allows users to generate citations and bibliographic information from locally stored materials; read and annotate PDFs; import and organize documents and bibliographic information; share documents, annotations, notes, and bibliographies with colleagues; and back up local information to the Web.
Data describing other data. Metadata provide information about one or more aspects of data, such as type, date, creator, location, and so on. Most often encountered in library and archival contexts, metadata facilitate the organization, discovery, and use of a wide range of resources. For further information, consult the National Information Standards Organization’s publication Understanding Metadata.
In Willard McCarty’s formulation, a set of computational techniques shared among the disciplines of the humanities and related social sciences, including database design, text analysis, numerical analysis, imaging, music information retrieval, and communication. For an illustration of this methodological commons, as well as further analysis of the role humanities computing has to play in such a system, see Willard McCarty, “Humanities Computing” (Encyclopedia of Library and Information Science; New York: Dekker, 2003; print).
Microsoft Office Document Imaging (MODI)
For users running Microsoft Word on a Windows operating system, MODI provides the ability to scan paper documents and convert them to digital images in either TIFF or MDI (Microsoft Document Imaging) format. As part of the process, MODI can perform OCR on these images to extract text from them. MODI is not available with Microsoft Word for Mac. Read more about MODI here: http://office.microsoft.com/en-us/help/about-microsoft-office-document-imaging-HP001077103.aspx.
Also known as the Universal Digital Library or the Universal Library. A book digitization project led by Carnegie Mellon University with several research partners in India and China. The project’s goals were to scan, enable full-text searching and indexing through OCR, and make available for distribution over one million books. In 2007, the project achieved its goal of digitizing over one million volumes; since that time, the initiative has been largely integrated into Google Books and Internet Archive, each of which hosts a portion of the project.
A proprietary and well-established statistical analysis program developed in the 1970s. Minitab allows for basic statistical calculations, as well as regression analyses, table and graph production, multivariate analyses, forecasting tools, and variation analysis.
MMOG (massively multiplayer online game)
A term that describes online role-playing games that usually feature a persistent and evolving virtual world and allow online cooperation and competition on a large scale. World of Warcraft is one of the largest and most popular MMOGs in the world.
MONK project (Metadata Offer New Knowledge)
An easy-to-use concordance program. In addition to providing full-text search capability for uploaded texts, MonoConc Pro supports regular-expression searches, tag searches, and the comparison of corpora based on chosen variables.
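The core operation of a concordance program can be illustrated with a keyword-in-context (KWIC) search driven by a regular expression. The following is a minimal sketch in Python and does not reflect how MonoConc Pro is implemented; the function name `kwic` and its parameters are hypothetical, and tokenization is simplified to splitting on whitespace.

```python
import re

def kwic(text, pattern, width=3):
    """Return (left context, keyword, right context) for each matching token."""
    tokens = text.split()  # simplification: whitespace tokenization only
    regex = re.compile(pattern, re.IGNORECASE)
    lines = []
    for i, token in enumerate(tokens):
        if regex.fullmatch(token):
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return lines

# Print each hit with its keyword aligned in a central column.
for left, keyword, right in kwic("to be or not to be", r"be", width=2):
    print(f"{left:>10} [{keyword}] {right}")
```

A real concordancer would also handle punctuation, sorting of context, and much larger texts; the sketch only shows the alignment of a search term with its surrounding words.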
A set of analytic techniques used to visualize similarities or dissimilarities in data. Multidimensional scaling is increasingly used to represent nonspatial information in spatial terms, often within GIS applications.
A social-networking service founded in 2003. During the period 2005–08, Myspace was the most popular social-networking site in the world. It was surpassed in popularity by Facebook in 2008 and has seen a steady decline in users since then. Myspace has historically allowed users to upload multimedia content such as images, video, and music and is now best known for its role as a musical entertainment and connection hub.
An open-source relational database management system (RDBMS). MySQL is the most widely used open-source RDBMS in the world. Many of the World Wide Web’s most heavily used Web sites and applications use MySQL.
A free program for visualizing social networks for those working in the Windows operating system. NetDraw accepts several kinds of data sets, such as UCInet, Pajek, and VNA text files. It also allows for the manipulation of nodal attributes, layout and appearance, and the exporting of images.
A broad term used to refer to the digital creation, distribution, and execution of content, as well as interactive user feedback and communities that form around such content. New media creation and criticism have often been identified with artistic production and the social democratization and justice movements. See The New Media Reader, edited by Noah Wardrip-Fruin and Nick Montfort (Cambridge: MIT P, 2003; print), 3–25.
In linguistics, a sequence of n items from a given sequence of text or speech. The items in an n-gram can be letters, phonemes, syllables, or words. A bigram sequence of the phrase “to be or not to be,” for instance, would break down as follows: to be, be or, or not, not to, to be. N-grams are regularly used in natural language processing and speech recognition.
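The bigram breakdown of “to be or not to be” can be reproduced with a few lines of Python. This is an illustrative sketch; the function name `ngrams` is hypothetical, and the items are assumed to be words obtained by splitting on whitespace.

```python
def ngrams(items, n):
    """Return every run of n consecutive items as a tuple."""
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]

words = "to be or not to be".split()
for gram in ngrams(words, 2):
    print(" ".join(gram))
```

Passing a list of letters or syllables instead of words yields character-level or syllable-level n-grams with the same function.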
OCR (optical character recognition)
The use of computer technologies to convert scanned images of typewritten, printed, or handwritten text into machine-readable text. This conversion allows for the computerization of material texts into formats for digital storage, search, and display. Adobe Acrobat Professional supports OCR processes, as does Microsoft Office for Windows (see Microsoft Office Document Imaging).
OHCO (ordered hierarchy of content objects)
A phrase coined to answer the question, “What is text?” Texts are, in this view, composed of objects (e.g., chapters, paragraphs, sentences) organized hierarchically so that they “nest” within one another. These objects do not overlap, and they organize text into units based on meaning and communication. This concept is integral to TEI encoding with XML. See Steven J. DeRose, David G. Durand, and Allen H. Renear, “What Is Text, Really?” (Journal of Computing in Higher Education 1.2 (1990): 3–26; print).
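Because XML elements must nest without overlapping, a small XML document makes the OHCO view concrete. The sketch below is illustrative only: the element names (`book`, `chapter`, `paragraph`, `sentence`) are invented for the example and are not TEI elements, and the helper `outline` is hypothetical.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<book>
  <chapter>
    <paragraph><sentence>First.</sentence><sentence>Second.</sentence></paragraph>
  </chapter>
  <chapter>
    <paragraph><sentence>Third.</sentence></paragraph>
  </chapter>
</book>"""

def outline(element, depth=0):
    """List (depth, tag) pairs in document order, one per content object."""
    rows = [(depth, element.tag)]
    for child in element:
        rows.extend(outline(child, depth + 1))
    return rows

root = ET.fromstring(SAMPLE)
for depth, tag in outline(root):
    print("  " * depth + tag)  # indentation mirrors the nesting
```

The output is an indented tree in which every object sits inside exactly one parent, which is the ordered hierarchy the phrase describes.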
A free, open-source Web-publishing platform for the display of library, museum, archives, and scholarly collections and exhibitions. It is available either as a hosted application (omeka.net) or as a content management system (CMS) downloaded and installed on an outside server (omeka.org). Developed at the Center for History and New Media at George Mason University, Omeka is designed to help nonspecialists digitally present collections-based research. Omeka uses Dublin Core metadata standards to organize content.
A regularly updated index of electronic texts available on the Web. The Online Books Page does not host the texts themselves; instead, it provides links to sites where users can view and download entire texts.
A free and open-source office productivity suite providing programs for word processing, spreadsheets, presentations, graphics, and databases. Its interface is similar to that of the Microsoft Office suite, and the suite is largely interoperable with Word, Excel, PowerPoint, and Access. Currently released under the Apache 2.0 License, OpenOffice is one of the most popular alternatives to the Microsoft Office suite.
A highly developed electronic textbase for research on and discovery of women’s writing in the British Isles. Orlando contains more than 8 million words of text and seeks to produce a full scholarly history of women’s writing in the British Isles by integrating biographical entries, bibliographic listings, and contextual historical material, among other materials. Collaboratively written and in a state of constant growth, Orlando is a powerful tool for navigating an impressive amount of information related to women’s writing in the British Isles.
In machine learning, a type of algorithm allowing machines to detect patterns in given input. In the digital humanities, pattern-recognition analytics often take the form of algorithms that facilitate the classification, clustering, regression, and sequence labeling of textual input. The application of pattern-recognition techniques to text has proven useful in author studies and stylometrics. See cluster analysis and machine learning, as well as Christopher Bishop, Pattern Recognition and Machine Learning (New York: Springer, 2006; print).
A hosted wiki space founded in 2005. PBWorks allows the setup of a collaborative wiki site that may be public or private and is available at a basic level of functionality at no cost to the user.
An online interface for experimenting with principal component analysis (PCA) on Shakespearean dramatic texts. The tool provides a ready-made apparatus for computational-stylistic exploration. The texts can be analyzed as whole plays, sections of plays, or character parts. See principal component analysis.
A widely used, high-level, general-purpose programming language known for its flexibility and power. Perl is used in a variety of Web, text-based, and database applications. For further information, as well as education materials, see the official Perl Web site.
A large-scale digital library project focused on the preservation of and access to classical materials. Perseus contains over thirteen million words of Greek and Roman primary materials, as well as an extensive collection of reference works, translations, commentaries, and dictionaries. The current version of the Perseus project uses the PhiloLogic textual search tool developed by the ARTFL project for exploration and discovery of resources. See PhiloLogic and ARTFL.
A robust full-text search, retrieval, and analysis tool developed by the ARTFL project and the University of Chicago. PhiloLogic is free and supports a wide variety of textual content. The application treats a textbase as a set of coordinated or related database modules, a system that makes PhiloLogic fast and resilient. PhiloLogic is used by Perseus, the ARTFL databases, the Chicago Online Encyclopedia of Mamluk Studies, and several other projects.
A Java-based, zoomable image browser that allows users to view multiple directories of images in a dynamic environment. Images can be sorted by available metadata, while their display is governed by several algorithms designed to maximize use of available space in a 2-D grid. See Benjamin B. Bederson, “PhotoMesa: A Zoomable Image Browser Using Quantum Treemaps and Bubblemaps” (Proceedings of the Fourteenth Annual ACM Symposium on User Interface Software and Technology; New York: Assn. for Computing Machinery, 2001; 71–80; print).
PHP (PHP: hypertext preprocessor)
An IRC bot designed to infer and visualize the social network of a particular set of IRC channels. This network visualization shows the relative strength of users’ connections to one another. Although designed to monitor IRC channels, PieSpy has been adapted to visualize network relationships in several literary works, notably the plays of Shakespeare.
Plain Vanilla ASCII
A phrase used by Project Gutenberg to describe its philosophy of preserving texts in the simplest, easiest-to-use form available. In practice, this means that Project Gutenberg uses a basic form of the American Standard Code for Information Interchange (ASCII) to preserve and disseminate texts. Nearly all software programs and applications are able to interpret and display ASCII characters, ensuring the longevity and usability of Project Gutenberg texts. See Michael Hart, “The History and Philosophy of Project Gutenberg” (Project Gutenberg; Project Gutenberg, 1992; Web).
point pattern analysis
A set of analytic techniques used to study the spatial arrangement of points within a defined area. Point pattern analysis can indicate whether a set of data is clustered, regular, or random in a given space. Point pattern analysis is often used in GIS to detect geographic patterns. See CSR.
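One common way to decide whether points are clustered, regular, or random is the Clark-Evans nearest-neighbor index, which the entry above does not name; it is chosen here only as an illustration. The index divides the observed mean nearest-neighbor distance by the distance expected under complete spatial randomness. The function name `nearest_neighbor_index` and the sample point sets are hypothetical, and edge effects are ignored in this sketch.

```python
import math

def nearest_neighbor_index(points, area):
    """Clark-Evans ratio: observed mean nearest-neighbor distance divided by
    the mean distance expected under complete spatial randomness (CSR)."""
    n = len(points)
    observed = sum(
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ) / n
    expected = 0.5 / math.sqrt(n / area)  # 1 / (2 * sqrt(point density))
    return observed / expected

# Sixteen points in a 4 x 4 study area: evenly spaced versus tightly bunched.
grid = [(x + 0.5, y + 0.5) for x in range(4) for y in range(4)]
cluster = [(2 + 0.01 * x, 2 + 0.01 * y) for x in range(4) for y in range(4)]
print(nearest_neighbor_index(grid, area=16.0))     # well above 1: regular
print(nearest_neighbor_index(cluster, area=16.0))  # well below 1: clustered
```

Values near 1 are consistent with a random arrangement; a GIS package would normally add a significance test and an edge correction.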
A set of nine principles outlined in 1987 at a conference in Poughkeepsie, New York, sponsored by the Association for Computers and the Humanities, the Association for Computational Linguistics, and the Association for Literary and Linguistic Computing. This declaration of principles is a founding document of the Text Encoding Initiative (TEI).
principal components analysis
An analytic technique designed to identify patterns in data and express the data in a way that highlights similarities and differences within them. It is based on the principle of reducing the dimensionality of a set of interrelated variables while retaining as much of the variation in the data as possible. See PCA Online and I. T. Jolliffe, Principal Component Analysis (New York: Springer, 2002).
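The idea of retaining variation while reducing the number of variables can be shown with two variables. The sketch below is a simplified illustration, not a general implementation: it handles only two-dimensional data, uses the closed-form eigenvalues of a 2 x 2 covariance matrix, and the function name `pca_2d` is hypothetical.

```python
import math

def pca_2d(points):
    """Return ((largest, smallest) eigenvalue, first principal direction)
    of the sample covariance matrix of two-variable data."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - mean_y) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / (n - 1)
    # Eigenvalues of [[sxx, sxy], [sxy, syy]] from its trace and determinant.
    half_trace = (sxx + syy) / 2
    det = sxx * syy - sxy ** 2
    spread = math.sqrt(max(half_trace ** 2 - det, 0.0))
    largest, smallest = half_trace + spread, half_trace - spread
    # Eigenvector belonging to the largest eigenvalue.
    if sxy != 0:
        direction = (largest - syy, sxy)
    else:
        direction = (1.0, 0.0) if sxx >= syy else (0.0, 1.0)
    length = math.hypot(*direction)
    return (largest, smallest), (direction[0] / length, direction[1] / length)

(var1, var2), axis = pca_2d([(0, 0), (1, 2), (2, 4)])
print(var1 / (var1 + var2))  # share of total variance kept by one component
print(axis)                  # direction of that component
```

When the two variables are perfectly correlated, as in the sample data, a single component keeps all of the variance, so the second dimension can be dropped without loss.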
A volunteer-based project founded in 1971 to digitize and archive literary texts. Digitized texts are freely available for download in a variety of formats. Many of these items are full-text transcriptions of books in the public domain. The project is the oldest single collection of free electronic texts.
A general-purpose programming language emphasizing readability and easy debugging. Python has found wide use in a variety of Web applications. For further information, as well as tutorials designed for different levels of knowledge, see the official Python Web site.
A graphic stored as a bitmap. Bitmapped images are representations in which each item corresponds to one or more bits of information. When referring to graphics, these representations take the form of rows and columns of dots (pixels); the value of each dot in the matrix is stored as one or more bits of data. The most basic value of each dot is 1 or 0, creating a black-and-white image in which each pixel corresponds to either value. Raster graphics may be stored in various image formats and can only be processed as a whole, as opposed to vector graphics, which consist of objects that can be managed individually within a display. Raster graphics are also difficult to scale: they lose detail when shrunk and become pixelated when enlarged. See vector graphics.
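A one-bit bitmap and the blockiness that results from enlarging it can be sketched in a few lines of Python. The example is illustrative only; the function names `scale_nearest` and `render` are hypothetical, and the enlargement simply repeats each pixel, which is the simplest possible scaling method.

```python
def scale_nearest(bitmap, factor):
    """Enlarge a bitmap by repeating each pixel factor times in each direction."""
    scaled = []
    for row in bitmap:
        wide = [pixel for pixel in row for _ in range(factor)]
        scaled.extend([list(wide) for _ in range(factor)])
    return scaled

def render(bitmap):
    """Draw 1 as '#' and 0 as '.', one text line per row of pixels."""
    return "\n".join("".join("#" if pixel else "." for pixel in row) for row in bitmap)

diagonal = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
]
print(render(diagonal))
print()
print(render(scale_nearest(diagonal, 3)))  # the line becomes a staircase of blocks
```

The enlarged image contains no new information, only larger copies of the original pixels, which is why the diagonal turns into visible steps.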
RoSE (Research-Oriented Social Environment)
A Web-based knowledge exploration system for tracking and integrating humanities bibliographic resources with social-network technologies. RoSE users can track their movements through networks of relationships between documents, ideas, and individuals to discover new resources. Registered users are able to create personal profiles, document profiles, and collections spanning both.
A digital archive of the entire artistic output of the pre-Raphaelite poet and painter Dante Gabriel Rossetti. Texts have been transcribed and encoded for search and analysis, and most of them are accompanied by high-quality digital images. The archive also contains a substantial body of critical commentary, notes, and glosses. The Rossetti Archive is one of the oldest and most established digital humanities archives currently available on the Web and has influenced the development of numerous other projects.
SAS (Statistical Analysis System)
Originally, a software set that allowed users to store, manage, apply, and analyze data. Currently, SAS is a widely used suite of programs for undertaking statistical analyses.
A component of the Shakespeare Suite CD published by the Internet Shakespeare Editions site at the University of Victoria. Scenario enables users to explore scene blocking. Users select a short passage of dramatic text, select characters and props from a set menu, and drag them to their appropriate location onstage. In this way, users create frames of blocking that they can then move through temporally. See SET.
SEASR (Software Environment for the Advancement of Scholarly Research)
A digital research environment for the creation and sharing of humanities research tools. SEASR consists of a series of text-analysis and organization modules that can be used individually or in sequence to work with texts. Researchers can create tools or workflows from various modules for use by the SEASR community.
An online virtual world launched in 2003. Second Life users are able to explore their virtual environment freely and interact with other users through avatars. Second Life has developed an internal economy and currency, and there are numerous examples of organizations creating virtual spaces affiliated with or mirroring their real-world instantiations.
A combination of hardware and software that carries out a specialized service for other programs connected to it through a network. There are a wide variety of servers, including Web servers, which receive requests from browsers for Web pages; database servers, which respond to requests for data corresponding to a search query; and FTP (file transfer protocol) servers, which enable users to employ FTP software to upload and retrieve files. Server can refer to the hardware, the software, or the combination of the two.
SET (Simulated Environment for Theatre)
A 3-D environment for reading, exploring, and directing plays. SET grew out of the project Watch the Script. SET allows for three perspectives on a play: a small column denoting the length of the section under consideration, a reading pane, and a 3-D view of the stage as action unfolds.
SGML (Standard Generalized Markup Language)
A markup language designed to format, store, and access large corpora of documents. The language is declarative, meaning that it describes source documents instead of specifying the particulars of their future display. These descriptive tags can then be processed in a variety of ways. SGML is the parent language of HTML, XHTML, and XML.
Simile (Semantic Interoperability of Metadata and Information in Unlike Environments)
A project funded by the Andrew W. Mellon Foundation, focused on developing and disseminating tools to enhance interoperability among digital assets, schemata and metadata, and services across several research communities. Simile focused heavily on developing resource description framework (RDF) and semantic Web techniques. Visit the updated site and repository.
A popular, interactive, virtual-world simulation game. Players direct individual virtual persons’ (or “Sims’”) daily activities in a virtual suburban setting. The game also contains an artificial-intelligence engine that provides avatars with a certain amount of free will.
SNAC project (Social Networks and Archival Context project)
A project addressing the ongoing challenge of improving access to and discovery of primary historical sources and individuals through aggregating and interconnecting them. The project disentangles description of historical individuals from archival records describing materials, using the Encoded Archival Context–Corporate Bodies, Persons, and Families (EAC-CPF) standard, to regularize the description of individuals as stand-alone pieces of information that can then be associated with a variety of content.
A social-network-analysis tool that integrates visualization and statistics. SocialAction users manipulate filters and parameters to explore networks through a real-time network visualization.
SOM (self-organizing map)
A technique of data visualization relying on the training of an artificial neural network to reduce the dimensions of a data set. SOMs are first trained using input examples and then use those examples to reformulate the visualization. See Teuvo Kohonen, Self-Organizing Maps (3rd ed.; New York: Springer, 2001; print).
The instructions for a program in their original form. These instructions are written in a particular programming language, usually in the form of text. Source code is typically compiled into machine code, or run by an interpreter, so that a computer can execute it. Most applications are distributed as executable files, not as source code. Source code is also the form of a program that human beings can most readily read.
A measure of the degree to which a set of spatial features and their associated data values tend to be clustered together in space or dispersed. This is a measure of the dependency among observations in a given geographic space. Values clustered together in space exhibit positive spatial autocorrelation, while those that are dispersed exhibit negative spatial autocorrelation. See Daniel A. Griffith, Spatial Autocorrelation: A Primer (Washington: Assn. of Amer. Geographers, 1987; print).
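A widely used statistic for spatial autocorrelation is Moran's I, which the entry above does not name; it is used here only as an illustration. The sketch assumes values arranged on a rectangular grid and treats cells that share an edge as neighbors (so-called rook adjacency). The function name `morans_i` and the sample grids are hypothetical, and the function assumes the grid values are not all identical.

```python
def morans_i(grid):
    """Moran's I for a rectangular grid with rook (edge-sharing) neighbors."""
    rows, cols = len(grid), len(grid[0])
    values = [v for row in grid for v in row]
    n = len(values)
    mean = sum(values) / n
    cross = 0.0   # sum over neighbor pairs of (x_i - mean) * (x_j - mean)
    weights = 0   # number of neighbor pairs, counted in both directions
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    cross += (grid[r][c] - mean) * (grid[rr][cc] - mean)
                    weights += 1
    variance_sum = sum((v - mean) ** 2 for v in values)
    return (n / weights) * (cross / variance_sum)

clustered = [[1, 1, 0, 0] for _ in range(4)]                          # like values side by side
checker = [[(r + c) % 2 for c in range(4)] for r in range(4)]         # like values kept apart
print(morans_i(clustered))  # positive: clustered values
print(morans_i(checker))    # negative: dispersed values
```

Positive results correspond to the clustered case described above and negative results to the dispersed case; values near zero suggest no spatial dependency.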
A branch of statistics and geography dealing with the analysis of spatial distributions, patterns, processes, and relationships. Most techniques used in spatial statistics were developed particularly for use with geographic data; as such, they incorporate space directly into their mathematics. For more information, see http://blogs.esri.com/esri/arcgis/2010/07/13/spatial-statistics-resources/.
SpecLab (Speculative Computing Laboratory)
A digital humanities laboratory founded at the University of Virginia in 2000. Rather than the digitization and classification of existing texts, SpecLab pursued “speculative computing”: exploratory research that used humanities tools in a digital context rather than digital tools in humanities contexts. SpecLab incubated several digital projects that have outlasted its three-year existence, including NINES, the Rossetti Archive, Ivanhoe, and Temporal Modelling. See Johanna Drucker, SpecLab: Digital Aesthetics and Projects in Speculative Computing (Chicago: U of Chicago P, 2009; print).
An application designed to visualize time. SpiraClock uses a spiral form and colored sectors to show events as they approach. It is meant to be read clockwise and from outside to inside.
A proprietary software platform for managing and displaying data. Spotfire facilitates the analysis and exploration of data with interactive visualizations, the production of graphic expressions of statistical data, and the exportation of information in various common formats.
A tool for visualizing dramatic structure; in particular, StageGraph facilitates explorations of location and movement. See Stephen Ramsay, “In Praise of Pattern” (Faculty Publications—Department of English; U of Nebraska, Lincoln, 2005; paper 57; Web).
An information visualization, developed by the New York Times, illustrating the frequency of various words in the State of the Union addresses of President George W. Bush. The visualization displays key terms as relatively sized bubbles, as well as the location of an individual word within the bodies of the speeches.
A visualization of information that contains no interactive elements. Static visualizations such as print graphics are often contrasted with digital, interactive visualizations that change according to user input. Conventional pie charts, bar graphs, and scatter plots are examples of this type of information visualization.
TAPoR (Text Analysis Portal for Research)
A project designed to develop a network of human and computing infrastructure by establishing regional centers to develop electronic textual storage and analysis. Since its inception, TAPoR has evolved into a centralized portal for Web-based textual analysis tools such as Wordle, the Voyant suite of tools, and the TAPoRWare suite of tools.
A collection of online and desktop tools designed to assist users in performing computational textual analysis on XML, HTML, and plain text files. Users are able to develop concordances, tokenize texts, analyze collocates, and extract metadata using the TAPoRWare set of tools. Although the tool set is available through the original TAPoRWare site, it is best accessed through the maintained TAPoR site.
A broad term used to define a subfield of geography dealing with the use of technical skills and methods to undertake geographic analysis. As currently constructed, technical geography emphasizes the use of digital tools such as remote sensing, GPS, and computational statistical analysis of geographic data.
TEI (Text Encoding Initiative)
A consortium that collectively develops and maintains standards for the representation of texts in digital form. In practice, the organization is chiefly concerned with producing and maintaining the TEI Guidelines for encoding texts in the humanities, social sciences, and linguistics. The TEI Guidelines, unlike other formats for preserving text, are a primarily semantic system; textual units are encoded according to what they are rather than how they appear.
A textual visualization application designed to show the distribution of words in texts. TextArc represents the entire text as two concentric spirals. Each line of the text is displayed in very small font around the outside; each word is displayed inside that spiral in a more readable size. Every word appearing more than once also appears within these two spirals, with its position governed by its frequency.
Broadly considered, the process of putting text in a special format for preservation or dissemination. In the digital humanities, textual encoding nearly always refers to the practice of transforming plain text content into XML. The TEI Guidelines are often followed when encoding textual materials in the arts, humanities, and social sciences. See TEI.
The process of automatically deriving previously unknown information from written texts using computational techniques. Textual-mining tools facilitate researchers’ discovery of patterns within structured data.
thematic research collection
A term coined by John Unsworth in 2000 to describe a new genre of scholarly production centered on the collecting of digital scholarly resources. According to Unsworth, thematic research collections (TRCs) exhibit eight qualities: necessarily electronic; constituted of heterogeneous datatypes (multimedia); extensive but thematically coherent; structured but open-ended; designed to support research; authored (usually by many authors); interdisciplinary; collections of digital primary resources (see http://people.lis.illinois.edu/~unsworth/MLA.00/).
A visualization application designed to help users identify time-related patterns, trends, and relationships across large collections of documents. Parameters widen or narrow as the visualization is read from left to right (corresponding to time passing), and individual parameters are represented as colored swathes of the entire visualization flow.
A participatory manuscript-transcription project based at University College London. Through the Transcribe Bentham interface, volunteers can transcribe the original and unstudied papers of the philosopher and reformer Jeremy Bentham. The project makes available high-quality digital images of manuscripts, which are then used to produce the transcriptions. These transcriptions are in turn encoded with basic TEI markup by volunteers. Transcribe Bentham is a well-regarded experiment in crowd-sourced academic production.
An online social networking and microblogging service launched in 2006. Users are able to send and read text-based posts (“tweets”) of up to 140 characters. The Twitter Web site is one of the most popular sites in the world, with hundreds of millions of tweets generated daily.
A company providing software services to track, measure, and understand online social and organizational networks. Uberlink sells the VOSON software package, a Web-based application to collect, analyze, and visualize online network data.
A digital archive that develops, collects, catalogs, and preserves electronic literary and linguistic resources. Founded in 1976 by Oxford University Computing Services, it is thought to be the oldest archive of digital academic textual resources. Access to the OTA is free, as is downloading its resources, although some resources can be downloaded only with permission, which must be requested from either the OTA or the original depositors.
A graphic stored as a series of mathematical instructions that are then used to form an image. Since vector graphics are stored as mathematical formulas, their file sizes are smaller than bitmap image files. Because they are mathematically created objects, users can resize and stretch vector graphics without reducing their clarity. See raster graphics.
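As a rough illustration of why resizing costs nothing, the sketch below writes a circle as SVG, one common vector format. The function name `circle_svg` and its parameters are assumptions made for this example only.

```python
def circle_svg(radius, scale=1.0):
    """Describe a circle as SVG markup; scaling only changes the numbers."""
    r = radius * scale
    size = 2 * r
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{size:g}" height="{size:g}">'
        f'<circle cx="{r:g}" cy="{r:g}" r="{r:g}"/></svg>'
    )


if __name__ == "__main__":
    small = circle_svg(10)
    large = circle_svg(10, scale=100)
    # The enlarged image is described by almost the same number of characters.
    print(len(small), len(large))
```

A bitmap of the same circle enlarged a hundredfold would need roughly ten thousand times as many pixels, whereas the vector description grows by only a few characters.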
A framework and interface for displaying multiple versions of the same text that have been encoded according to the TEI Guidelines. The display environment in some ways mirrors the features of print volumes, including annotation, notes, and introductory materials. However, the Versioning Machine environment also allows for the easy comparison and manipulation of multiple versions and transcriptions. The application can be used locally or installed on a server for public access on the Web.
A well-established archive designed to facilitate research and discovery of information related to the Victorian period in England. The Victorian Web offers links to primary sources, peer-reviewed critical commentary, biographical information, and historical contexts for a variety of literary, historical, and artistic topics. Anyone is free to contribute, and the project currently contains over sixty thousand documents and images.
Designed by Microsoft, a programming language and environment based on BASIC. Visual Basic was one of the first products to provide a graphic environment for developing user interfaces simply by dragging and dropping controls (i.e., buttons or dialogue boxes) and then defining their behavior.
Broadly conceived, any graphic expression meant to represent a certain set of information. In the digital humanities, visualization usually refers to data visualization, or the graphic expression of large-scale collections of nonnumerical information such as textual elements, network relationships, or frequency analyses. See Martyn Jessop, “Data Visualization as Scholarly Activity” (Literary and Linguistic Computing 23.3: 281–93; print).
VMP (Vocabulary Management Profile)
A Web-based textual analysis and visualization tool. Using a computational algorithm developed for the project, VMP generates a visualization of how new vocabulary ebbs and flows as a text progresses. This type of discourse analysis allows insight into the structures of organized textual output. See Gilbert Youmans, “A New Tool for Discourse Analysis: The Vocabulary-Management Profile” (Linguistic Society of America 67.4: 763–89; print).
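The general idea of tracking new vocabulary across a text can be sketched as follows. This is a simplified illustration and not the algorithm used by VMP or published by Youmans: it assumes fixed, non-overlapping windows of tokens and a naive tokenizer, and the function name `new_vocabulary_profile` is hypothetical.

```python
import re


def new_vocabulary_profile(text, window=5):
    """Count first-time word types in each consecutive window of tokens."""
    tokens = re.findall(r"[a-z']+", text.lower())
    seen = set()
    profile = []
    for start in range(0, len(tokens), window):
        chunk = tokens[start:start + window]
        # A set collapses a word that repeats inside the same window.
        new_types = {tok for tok in chunk if tok not in seen}
        profile.append(len(new_types))
        seen.update(chunk)
    return profile


if __name__ == "__main__":
    sample = "the cat sat on the mat the cat sat on the mat a dog ran"
    print(new_vocabulary_profile(sample, window=6))
```

Plotting the returned counts against window position would give a curve that rises where a text introduces new material and falls where it repeats itself.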
Voice of the Shuttle (VoS)
A Web resource founded by Alan Liu in 1994 as a suite of static Web sites that has grown into a large digital database of humanities and humanities-related content. VoS organizes content into several areas, including religious studies, media studies, dance, literature, and architecture. VoS still serves as a well-regarded directory of Web content tailored for humanities scholars.
A Web-based suite of textual-analysis tools, intended to be user-friendly, flexible, and powerful. It contains numerous modules able to analyze and visualize text in a variety of ways, including a document reader, a term-frequencies generator, a collocation visualizer, a word cloud visualization, and a scatterplot generator. Users can upload plain text into Voyant or cut and paste text into Voyant’s on-screen input field. Results are exportable, as are some visualizations. For a guide to Voyant, visit http://hermeneuti.ca/voyeur.
VRA Core (Visual Resource Association Core)
A data standard for the description of works of visual culture and the images that document them. VRA Core is an internationally recognized standard for generating metadata for visual objects.
An interactive visualization environment designed to facilitate reading, exploring, and directing plays. The environment integrates the text of the script along with a theatrical model of actor location on a stage. The application is especially concerned with visualizing blocking.
A loosely defined term used to describe second-generation Web sites that facilitate participatory collaboration, interoperability, and information sharing. Web 2.0 highlights user-generated content and dynamic applications built on the Web rather than static contents being presented to users. This transition is largely cultural rather than technical, as reflected in the centrality of virtual communities, social media, and remix culture to the phenomenon. See Tim O’Reilly, What Is Web 2.0.
A Web site whose content can be added to, modified, and deleted by users employing a simplified markup language or text editor within a Web browser. Wikis have become increasingly prevalent on many levels, ranging from small private wikis to collaborative wikis to large collections of wikis such as Wikipedia. Wikis often feature a discussion page where changes can be debated.
A Web archive devoted to the life and works of the English author Wilkie Collins. The site contains nearly all of Collins’s novels, short stories, and plays, as well as photographs, personal letters, and geographic information.
An online open-access archive of the literary work of William Blake. Founded in 1996, the archive contains digitized images of Blake’s work, as well as full-text electronic editions of many of his illuminated works, commercial books, drawings and paintings, and manuscripts. Encoded in XML, the site is a hybrid catalog, database, and series of editions.
A visualization of word frequencies. Usually, the more frequently a word appears in a given text, the larger its size in the resulting visualization. Programs designed to create word clouds are easily accessible; two of the most used are Wordle and the Many Eyes tag cloud.
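The frequency-to-size relationship can be sketched in a few lines. This is an illustrative assumption about how such a mapping might work, using simple linear scaling between a minimum and maximum point size; it does not reproduce how Wordle or Many Eyes compute their layouts, and `word_sizes` is a hypothetical name.

```python
import re
from collections import Counter


def word_sizes(text, min_pt=10, max_pt=40):
    """Map each word to a font size that grows linearly with its frequency."""
    counts = Counter(re.findall(r"[a-z']+", text.lower()))
    if not counts:
        return {}
    low, high = min(counts.values()), max(counts.values())
    span = (high - low) or 1  # avoid dividing by zero when all counts match
    return {
        word: min_pt + (count - low) * (max_pt - min_pt) / span
        for word, count in counts.items()
    }


if __name__ == "__main__":
    for word, size in word_sizes("to be or not to be").items():
        print(f"{word}: {size:.0f}pt")
```

A real word-cloud program would also remove common function words and arrange the words on a canvas, steps omitted here.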
A text-analysis environment containing several categories of preloaded texts, including those of Chaucer, Spenser, the early Greeks, and Shakespeare. For this chosen group of canonical texts, users can perform a variety of analyses, including full-text searching, concordance building, and finding collocates.
A simple text-visualization tool that produces a word cloud, where the size of individual words corresponds to frequency of appearance in a given corpus. The font, layout, and color scheme of the resulting display can be altered by a user. Wordle is also accessible through TAPoR.
A free and open-source blogging tool and CMS (content management system) based on PHP and MySQL. WordPress refers to both the CMS software used to manage materials on Web servers and to the blogging service available at wordpress.com, a free blogging platform.
A proprietary software application for analyzing patterns in texts. WordSmith includes three modules: Concord (a concordancer program), KeyWords (an application that identifies the keywords in one or more texts by analyzing frequency), and WordList (a generator of word lists that are alphabetized or organized by frequency). WordSmith also supports comparing these qualities across multiple texts.
A graphic representation of a KWIC (keyword in context) method of analyzing text. This representation allows for the rapid and interactive exploration of a body of text. A word-tree visualization places a tree-like structure onto a corpus to reflect the frequency of particular terms occurring in particular sequences, allowing users to interact spatially with analytic results that would otherwise be given in table form. See Martin Wattenberg and Fernanda B. Viegas, “The Word Tree, an Interactive Visual Concordance” (IEEE Transactions on Visualization and Computer Graphics 14.6: 1221–28; print). The word-tree visualization is also available on Many Eyes.
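The tabular keyword-in-context data that a word tree draws on can be sketched as below. This is a minimal illustration with a naive tokenizer, not the method described by Wattenberg and Viegas; the function names `kwic` and `first_branches` are assumptions made for this example.

```python
import re
from collections import Counter


def kwic(text, keyword, width=3):
    """Return (left context, keyword, right context) rows for each match."""
    tokens = re.findall(r"\w+", text.lower())
    target = keyword.lower()
    rows = []
    for i, tok in enumerate(tokens):
        if tok == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            rows.append((left, tok, right))
    return rows


def first_branches(rows):
    """Count the word that immediately follows the keyword in each row."""
    return Counter(right.split()[0] for _, _, right in rows if right)


if __name__ == "__main__":
    line = "if love be rough with you, be rough with love"
    rows = kwic(line, "be", width=2)
    for left, key, right in rows:
        print(f"{left:>12} | {key} | {right}")
    print(first_branches(rows))
```

A word tree would draw each counted continuation as a branch from the keyword, sized by its count, and repeat the grouping for the words that follow.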
XML (extensible markup language)
A markup language designed to encode documents in a format that is both human- and machine-readable. XML separates content from structure and is highly customizable. For further information and to learn how to use XML, see Benoît Marchal, XML by Example (Indianapolis: Que, 2000; print).
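A short sketch can show both properties at once: the record below is legible to a person, and a standard parser can read it without any custom code. The `entry`, `term`, and `kind` tag names are invented for this illustration, which is the sense in which XML is customizable; `entry_to_dict` is likewise a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical record using tag names made up for this example.
RECORD = """<entry id="zotero">
  <term>Zotero</term>
  <kind>reference manager</kind>
</entry>"""


def entry_to_dict(xml_text):
    """Read child elements and the id attribute into a plain dictionary."""
    root = ET.fromstring(xml_text)
    data = {child.tag: child.text for child in root}
    data["id"] = root.get("id")
    return data


if __name__ == "__main__":
    print(entry_to_dict(RECORD))
```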
XSL (extensible style sheet language)
A family of languages used to transform and render XML documents. Extensible style sheet language transformations (XSLT) is an XML language that transforms an XML document into another format; extensible style sheet language formatting objects (XSL-FO) specifies the visual formatting of an XML document.
XSLT (extensible style sheet language transformations)
An XML-based language used to transform XML documents into another format or structure, usually other XML documents, HTML documents, PDF documents, or word-processor files.
The world’s largest video-sharing Web site, created in 2005. YouTube is the world’s third most visited Web site and uses Adobe Flash and HTML5 to display a wide variety of user-generated content.
A free and open-source application designed to manage bibliographic references and materials. Developed by the Center for History and New Media at George Mason University, Zotero has numerous features designed to facilitate integration with online research environments, including integration with major Web browsers to automatically detect bibliographic information and import it on command; online syncing; exporting formatted reference lists into major word-processing programs; and sharing collections and items with other registered users. It is available as a browser plug-in (Zotero for Firefox) and as a stand-alone product that is able to interface with several browsers (Zotero Standalone).