Semantic Web Technologies in Cultural History:
The State of the Art

Kimberley Wilcox, MLIS
First published June 15, 2005; updated June 1, 2006
IS 277: Information Retrieval Systems: User-Centered Design
taught by Dr. Phil Agre
UCLA Graduate School of Education and Information Studies

Note:
This paper contains extensive hyperlinks to relevant resources. The links were all functional as of June 15, 2005, although no guarantees are made about their persistence as their creators are not associated with this project in any way.

The author welcomes feedback on this document and may be contacted by email at kwilcox@apu.edu.

Introduction

As envisioned by Tim Berners-Lee, the semantic web represents the next iteration of the World Wide Web, in which networked computers will be able to extract “meaning” (or, more precisely, meaningful data) from each other instead of crude strings of code. Essentially, the semantic web will serve as a medium by which computers can more effectively locate, organize, share, and display information, “with the result of making the Web more useful for humans.” (The Semantic Web: A Primer) This paper will examine the semantic web’s applications and future potential in the domain of cultural history and, when necessary, of the humanities in general.1 After a brief overview of the semantic web and some of its related technologies, this paper will examine the state of ontology development and assess the potential usefulness of semantic web technologies in cultural history. An analysis of the issues associated with the development of cultural history ontologies will show that full ontology development in this domain will be difficult to achieve, despite the potential value of completing such work. After discussing the need for ontology development, this paper will go on to evaluate several current and recent semantic web projects in the humanities, and will, finally, offer some predictions about the future of the semantic web in the domain of cultural history.

An overview of the semantic web

As stated above, the semantic web is intended to provide networked computers with mechanisms for finding and extracting inferences from what would otherwise be meaningless strings of data. Despite the fact that it has interesting implications for the development of artificial intelligence, Berners-Lee makes it quite clear that the semantic web is not AI. Semantic web languages will give computers neither true intelligence nor the ability to interpret meaning or significance. To be useful, semantic web technologies will require the explicit enumeration of data elements, properties, and relationships, since computers can only make inferences about this data within specifically defined limits. To a computer, for example, the sentence, “A dog is a type of mammal” is nothing more than a string of characters and spaces; no logical inferences about animals or biological classifications can be drawn from such a string. Using a semantic language such as RDF (Resource Description Framework, which shall be discussed shortly), however, will make the content of this string machine-processable by establishing hierarchies and relationships. The responsibility for assigning meaningful metadata and intended inferences will still fall to human beings; the semantic web simply provides a standardized framework to enable the seamless integration and exchange of this human-vetted information. This semantic framework and its associated components and languages have only recently begun to fulfill their potential usefulness, and it is imperative that humanists grasp the significance of semantic web technologies at this early point in their development.

Although the semantic web is still in its formative stages, programmers have created a large number of tools and standardized languages designed to demonstrate the potential of semantic technologies. This somewhat bewildering alphabet soup of standards includes, among other things, XML, XML Schema, XSL and XSLT, XHTML, RDF, RDF Schema, RSS, DAML+OIL, and OWL; each of these standards is designed to articulate and expand some aspect of the semantic web. In a nutshell, XML (eXtensible Markup Language) and RDF (Resource Description Framework) provide the grammar for these standards; the schema languages prescribe rules for these grammars; XHTML expresses the wildly popular HTML in XML format; XSL and XSLT provide for the visual (human-interfaced) representation of XML documents; RSS allows for the simple syndication and sharing of web content using semantic techniques; and DAML+OIL and OWL represent attempts to explicitly define classes, properties, and relationships of elements to be used in ontology development (which shall be discussed shortly).

Predictably, each standard has also had its own level of success. DAML+OIL, for example, has been superseded by OWL, while XML, XHTML, and RSS in particular have proven to be quite popular and widely adaptable to a number of different circumstances. Although applications of semantic web technologies in the humanities generally and in cultural history specifically will be discussed later, some observations can be made about the current applications of each of these technologies. RSS has been the most successful semantic web technology, largely because of its role in the spread of blogs and news feeds. Because RSS is based on XML, the eXtensible Markup Language has also met with widespread practical success as web developers realize the benefits of a customizable, self-describing markup framework. XML has gained particular power through its combination with the eXtensible Stylesheet Language, which translates XML into human-friendly visual web documents. XHTML, meanwhile, is predicted to eventually replace HTML as the World Wide Web becomes increasingly semantic.

Perhaps the most interesting and important of these emerging semantic web technologies is OWL, the Web Ontology Language. OWL uses RDF as the basis for establishing the properties and relationships of elements, which can then be used to represent both generalized and domain-specific ontologies. These ontologies, which can broadly be defined as theories of being, form the entire conceptual framework of the semantic web. In the context of the semantic web and its emphasis on the interpretation of meaningful data, ontologies serve as the means of enumerating all of the possible relationships between elements which might be used to organize and classify data. In simple terms, semantic web ontologies explicitly define the structure and hierarchy of a particular domain; when two systems use identical or overlapping ontologies, they are able to recognize connections between data elements despite the fact that computers are not yet capable of recognizing the true meaning of these connections. Recall our earlier example about dogs and mammals: using RDF to define classes and relationships allows the computer to make some inferences about the classification of animals. Consider the following RDF code:

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">

<rdfs:Class rdf:ID="Animal">
<rdfs:comment>Animals form a class.</rdfs:comment>
</rdfs:Class>

<rdfs:Class rdf:ID="Mammal">
<rdfs:comment>Mammals are a type of animal.</rdfs:comment>
<rdfs:subClassOf rdf:resource="#Animal"/>
</rdfs:Class>

<rdfs:Class rdf:ID="Canine">
<rdfs:comment>Canines are a type of mammal.</rdfs:comment>
<rdfs:subClassOf rdf:resource="#Mammal"/>
</rdfs:Class>

</rdf:RDF>

This code instructs the computer to infer that members of the canine class represent a type of mammal, while mammals in turn are a type of animal. Using this markup, elements identified as belonging to the “Canine” class (within a document that references this RDF code, of course) can then be interpreted and grouped with or disambiguated from other classes of animals.
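The kind of inference described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular RDF reasoner works: it uses the standard library's XML parser rather than a real RDF toolkit, it embeds a trimmed copy of the class definitions as a string, and the function names `direct_superclasses` and `all_superclasses` are invented for this example.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS_NS = "http://www.w3.org/2000/01/rdf-schema#"

# A trimmed copy of the Animal/Mammal/Canine schema, embedded for the sketch.
SCHEMA = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <rdfs:Class rdf:ID="Animal"/>
  <rdfs:Class rdf:ID="Mammal">
    <rdfs:subClassOf rdf:resource="#Animal"/>
  </rdfs:Class>
  <rdfs:Class rdf:ID="Canine">
    <rdfs:subClassOf rdf:resource="#Mammal"/>
  </rdfs:Class>
</rdf:RDF>"""


def direct_superclasses(rdf_xml):
    """Map each class ID to the classes it is directly declared a subclass of."""
    root = ET.fromstring(rdf_xml)
    parents = {}
    for cls in root.findall(f"{{{RDFS_NS}}}Class"):
        class_id = cls.get(f"{{{RDF_NS}}}ID")
        parents[class_id] = {
            sub.get(f"{{{RDF_NS}}}resource").lstrip("#")
            for sub in cls.findall(f"{{{RDFS_NS}}}subClassOf")
        }
    return parents


def all_superclasses(class_id, parents):
    """Follow subClassOf links transitively to collect every ancestor class."""
    found = set()
    stack = list(parents.get(class_id, ()))
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(parents.get(current, ()))
    return found


parents = direct_superclasses(SCHEMA)
print(sorted(all_superclasses("Canine", parents)))  # ['Animal', 'Mammal']
```

Nothing in the markup states directly that a canine is an animal; the program arrives at that conclusion only by chaining the two explicitly declared subclass relationships, which is the limited, rule-bound sense of "inference" at issue.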

The need to enumerate all of these potential relationships means, of course, that ontology designers must consider all possible aspects, dimensions, and details of a particular domain before completing a fully-formed ontology. Such planning and organization is far easier in domains which are already fairly ontological and whose practitioners agree upon standardized vocabularies and classifications; this need for agreement and pre-existing structure explains why semantic web ontologies are far more developed in scientific and technical fields than in the humanities and social sciences. Quite simply, the humanities and social sciences have proven too messy and non-standardized for substantial semantic web ontology development to occur.

So are the humanities and social sciences doomed to fall far behind more “structured” domains in the semantic web revolution? What must be done to promote ontology development in these “loose” disciplines? The following section will attempt to identify the steps needed to promote the growth of semantic web technologies in the humanities, using cultural history as a salient example of the complexity and scope of such an endeavor.

The state of ontology development in cultural history

In spite of the growth of a field called humanities computing, the application of semantic web technologies in the domain of history—and especially cultural history—has been limited at best. The semantic web has begun to have a significant impact in a number of related domains, including archives, museums, and digital libraries, as professionals in those fields are beginning to recognize the scholarly potential of semantic web technologies. TEI (the Text Encoding Initiative) is by far the most well-known application of semantic principles in the domain of cultural heritage institutions, although it is not technically a semantic web technology since it is built on SGML (Standard Generalized Markup Language, the precursor to both HTML and XML). Use of the TEI specification to digitize manuscript materials has familiarized librarians, archivists, and museum curators with the application of specialized markup languages, which has facilitated their acceptance of more truly semantic technologies like XHTML and RDF (see, for example, an RDF-formatted archival finding aid created by King’s College London and the RDF Schema for archival description created by the Research Support Libraries Programme).

Like many academic disciplines, history tends to intersect with other domains and knowledge systems. This intersection is particularly apparent in historical specializations like history of science, political history, military history, women's history, and cultural history, each of which shares knowledge with domains like science, government, or anthropology. While these overlapping ideas and domains encourage interdisciplinary study and represent the highest ideal of scholarship, they can make the development of workable ontologies for history a nearly impossible task. A surface-level analysis of the necessary ontologies for full semantic web integration makes the process seem deceptively simple: at first glance, all that would be needed to create an ontology for history is the enumeration of names, dates, and events which would be used to mark up research papers and would then allow computers to identify topically-related resources. Theoretically, this information could be gathered from scholars working in multiple sub-disciplines of history, and any necessary specialized subject knowledge could be borrowed from existing ontologies in other domains.

A relatively simple system like this, consistently applied to current and past books, articles, and papers, would no doubt be of great assistance to researchers seeking materials related to their field of study. But such a system barely scratches the surface of what the semantic web could do for professional historians, nor does it reflect the extensive ambitions of the semantic web's creators. Semantic web development within a discipline requires consideration of “four kinds of ontologies: document ontologies (e.g., the chapters of a book or the footnotes of a paper), metadata ontologies (e.g., the format of a file or the copyright status of a document), domain ontologies (e.g., the components of an automobile or the entries of a schedule), and service ontologies (e.g., the inputs of a software module, the steps of a transaction, or the formats of messages that are passed back and forth between a client and server).” (Agre, Information Studies 277 Course Syllabus) Together, these four types of ontologies are intended to make all of the processes and outputs of a discipline semantically useful. This paper will now consider the specific issues associated with ontology development in each of these four areas, using the domain of cultural history as an example of the complexity involved in such an endeavor. Essentially, the discussion of these four ontology types, especially domain ontologies, will attempt to represent all of the complex processes and outputs of cultural history which could benefit from the application of semantic web technologies. The process of actually developing these four types of ontologies will undoubtedly be fraught with disagreement and ideological objections, and this paper will attempt to identify some of these potential problems.

Document Ontologies

At a superficial level, document ontologies for cultural history (which would represent the structure and status of documents produced in the course of historical research) seem to be the least problematic of the four ontology types. In fact, both formal and informal document ontologies have already been developed which could be adapted for use by cultural historians, the most popular being the Text Encoding Initiative. TEI tags can be used to identify atomic-level document features, such as quotations or footnotes, thereby allowing sophisticated document retrieval. Widespread use of TEI or TEI-like tags in the domain would enable queries like, “show me all of the documents on the web which quote Jacob Burckhardt and include references to Italian Renaissance sculpture in their footnotes.”
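To make the example query concrete, the following Python sketch runs it against three tiny, invented documents. Everything here is an assumption made for illustration: the file names, the placeholder contents, and the heavily simplified TEI-like markup (a `cit` element holding a `quote` and a `bibl`/`author`, and footnotes written as `note place="foot"`), which is not claimed to be valid TEI. The function names are likewise hypothetical.

```python
import xml.etree.ElementTree as ET

# Three hypothetical, heavily simplified TEI-like documents (placeholder text only).
DOCUMENTS = {
    "essay-a.xml": """<text><p>Body text.
      <cit><quote>(placeholder quotation)</quote>
           <bibl><author>Jacob Burckhardt</author></bibl></cit>
      <note place="foot">Compare the treatment of Italian Renaissance sculpture.</note>
    </p></text>""",
    "essay-b.xml": """<text><p>Body text.
      <cit><quote>(placeholder quotation)</quote>
           <bibl><author>Jacob Burckhardt</author></bibl></cit>
      <note place="foot">A footnote on an unrelated subject.</note>
    </p></text>""",
    "essay-c.xml": """<text><p>Body text.
      <cit><quote>(placeholder quotation)</quote>
           <bibl><author>Another Author</author></bibl></cit>
      <note place="foot">More on Italian Renaissance sculpture.</note>
    </p></text>""",
}


def quotes_author(root, author):
    """True if any quotation in the document is attributed to the given author."""
    return any(
        (node.text or "").strip() == author
        for node in root.findall(".//cit/bibl/author")
    )


def footnotes_mention(root, phrase):
    """True if any footnote contains the phrase (case-insensitive)."""
    phrase = phrase.lower()
    return any(
        phrase in "".join(note.itertext()).lower()
        for note in root.findall(".//note[@place='foot']")
    )


def find_documents(documents, author, phrase):
    """Return the names of documents that satisfy both conditions."""
    matches = []
    for name, xml_text in documents.items():
        root = ET.fromstring(xml_text)
        if quotes_author(root, author) and footnotes_mention(root, phrase):
            matches.append(name)
    return sorted(matches)


print(find_documents(DOCUMENTS, "Jacob Burckhardt", "Italian Renaissance sculpture"))
# ['essay-a.xml']
```

A plain keyword search would return all three documents, since each contains one of the two search terms somewhere in its text. Only the structural tags let the query distinguish a quoted author from a passing mention and a footnote from the body.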

Although such granular document markup tags would be useful by themselves, a fully evolved document ontology for cultural history would require quite a few more “high level” tags. Specifically, a document ontology would have to make clear distinctions between document formats (e.g., books, journal articles, conference proceedings, working papers, web pages, etc.) and between primary and secondary sources. To some extent, document format metadata has been employed in library catalogs for decades, and it would be relatively simple to incorporate such distinctions into a document ontology. There are, however, some problems associated with distinguishing primary and secondary sources; namely, historians sometimes disagree about whether something is a primary or secondary source. This disagreement would need to be resolved in the ontology development stage, before confusion is created by mis-labeled documents.

Another, and more fundamental, problem exists with the use of a document ontology in cultural history: who should be required to undertake the enormous task of retroactively applying these markup tags to the millions of primary and secondary sources already available on the World Wide Web,2 and how would such a process be conducted and monitored to ensure the optimal application of semantic web technologies? There are no easy answers to these practical problems, although dividing the work of ontology application between libraries, archives, and museums (which would be responsible for adding document ontology tags to existing electronic primary source documents) and publishers (which would semantically mark up electronic books and journal articles, which tend to be secondary sources) could be a possible solution.

Metadata Ontologies

In a sense, the entire idea of the semantic web is based on the application of metadata (simply, data about data). In both the current World Wide Web and the future semantic web, metadata consists of non-displayed descriptive markup tags which provide computers with instructions for the formatting and processing of information. Currently, the most common applications of metadata within the domain of cultural history have been in the design of electronic article databases and web pages. Both of these applications use markup languages such as HTML or XML to organize, display, and retrieve information which can then be used in the service of historical research.

The semantic web has the potential to revolutionize the state of metadata ontologies used in cultural history, especially in making non-textual information accessible and searchable. Although history in general remains a text-based field, cultural historians have always placed a special emphasis on the study of cultural artifacts such as art, music, domestic objects, and the spoken word which are often difficult to store and access. Would-be metadata ontology designers would do a great service to cultural historians by agreeing upon audio, video, and image metadata standards for the consistent interchange of non-textual media. These ontology developers would also do well to develop metadata which optimizes citation linking and analysis, as citation-following has long been integral to the historical research process. Finally, any metadata ontology for cultural history will also need to accommodate special markup tags for each aspect of the domain ontologies which will be discussed presently.

Service Ontologies

Of the four ontology types discussed in this paper, service ontologies for cultural history have the greatest potential to be impacted by semantic web technologies. Currently, it is difficult to think of online services designed to facilitate the historical research process beyond, perhaps, article databases and blogging services. All of that could change dramatically, however, with the implementation of ontologies and semantic web markup. The semantic web could stimulate the development of all sorts of customized research services, as well as multimedia, mapping, and timeline services (such as the prototype Historical Event Markup and Linking project described below). With the standardization of ontologies for history, publishers and research providers could develop a variety of highly specific software programs or online services designed to help scholars find, organize, and store information from millions of potential sources.

For example, a standardized ontology for cultural history could allow a service to be developed which allows users to search for a name, date, concept, or event and retrieve a set of results that includes oral histories in MP3 format, JPEG-formatted images, digitized manuscripts, hyperlinked maps, customized timelines, encyclopedia entries, term definitions, electronic books and scholarly research articles. Such a service would undoubtedly enrich learning experiences, and could even encourage new forms of inquiry and reveal new historical connections. Similar online services, such as the Perseus Digital Library (described below), are already being developed, although their impact is not yet widespread enough to significantly change the current practice of cultural history. The developers of these services have a unique opportunity to introduce the semantic web to a wide audience, thereby increasing the ubiquity of cultural history ontologies and facilitating the acceptance of semantic web technologies.

Domain Ontologies

The domain of cultural history already includes a number of specialized, tacitly accepted ontologies which guide its practitioners in their research and publication activities. These ontologies have not yet been explicitly enumerated or agreed upon, although in many cases they are maintained by self-enforcement and professional codes of conduct. As with other subdisciplines of history, the practice of cultural history is guided by specialized process ontologies which establish accepted protocols for research, publication, and interpretation, as well as by generally agreed-upon ontologies of time and evidence. These ontologies will be discussed below, with special attention given to the ways in which cultural history differs from other types of history.

Process Ontologies

Like other historical disciplines—and, in fact, most other academic disciplines—cultural history operates under several generally accepted protocols for the processes of research, publication, and interpretation. Currently, these protocols are enforced by a number of different bodies, including publishers and university academic departments. Most of these protocols, however, have traditionally been self-enforced through the acceptance of professional codes of conduct and ethics. Responsibility for the widespread awareness of these codes has traditionally fallen to national and international professional societies, such as the American Historical Association. The continuation of these protocols is self-enforced by individual scholars who agree to follow certain practical and ethical procedures in their work. In general terms, these process ontologies cover three distinct aspects of the scholarly process: research, publication, and interpretation.

The basic research process for cultural historians is similar to that of other disciplines. Professional cultural historians generally begin by identifying some discrete or fairly specific subject which has not yet been extensively researched. After conducting extensive literature searches to ascertain the current state of research on their subject, cultural historians begin the often laborious process of uncovering primary source material relevant to their topic. The real work of historical research then takes place as cultural historians subject these primary source materials to close reading, scrutiny, and analysis, in the hope that they will reveal something new about the historical context of an event or time period. After synthesizing their findings into a thesis, cultural historians document their research and conclusions by writing books, articles, and dissertations. The research and writing process is voluntarily constrained by a number of self-enforced principles, namely that all conclusions must be supported by evidence and that proper credit must be given for all citations of previously published material. Any ontology developed for cultural history should address the research and writing process; doing so would allow scholars to organize and share their ideas at all stages of the research process, as well as providing clear distinctions between working documents and finished research.

The publication and interpretation processes of cultural history are also generally constrained by tacit ontologies. The publication process for cultural history research is well-established and is essentially the same as that for all humanities disciplines. The process of submission, editorial review, printing, and peer review is widely accepted in scholarly communities and is currently enforced by book and journal publishers, although self-publication on the web and in weblogs is becoming increasingly common. An ontology of scholarly publication for cultural history could be borrowed or adapted from nearly any other domain, and would be invaluable in ensuring the seamless distribution and sharing of research findings.

Similarly, an ontology of cultural history processes should address the time-honored and widely accepted processes of scholarly interpretation, feedback, and collaboration. Traditionally, these processes have been facilitated by conferences, which allow for the presentation of new research and cross-subject collaboration with other scholars. These collaborative processes stimulate new scholarly research and encourage cross-disciplinary citations of research. As is the case with scholarly publishing, the World Wide Web is continuing to facilitate this collaborative process by connecting scholars from around the globe. An ontology of the processes of presentation, interpretation, and collaboration within the domain of cultural history would be useful in communicating the status of research, as well as drawing connections between related fields of inquiry.

Ontologies of Time

Perhaps the most complex ontologies which guide the practice of cultural history are those which relate to our understanding of time. Cultural historians’ ontologies of time are actually quite variable and complicated, especially because different cultures tend to have different conceptions of time and history. Beyond the admittedly simplistic distinction between the modern, “Western” conception of time as linear and the ancient, “Eastern” concentric views of time, cultural historians also must deal with the non-discrete nature of historical periods and movements. Whereas political and military historians generally maintain highly detailed accounts of historical events which can be identified with discrete units of time (e.g., millennia, centuries, decades, years, months, weeks, days, hours, minutes, and seconds), cultural historians are often concerned with more conceptual units of time such as eras, movements, and periods. Additionally, there is considerable chronological uncertainty in the study of cultures for which calendar systems have not been developed. Occasionally, cultural history overlaps with archaeology in the use of absolute and relative scientific dating methods to reveal the historical context for an object or location. A prospective cultural history ontology must make provisions for both discrete and conceptual units of time, as well as culturally-based calendar systems. Such an ontology would be large and complex, but could probably be developed without significant scholarly disagreement.
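One way to picture the difference between discrete and conceptual units of time is a small Python sketch. It is a toy model under loud assumptions, not a proposed ontology: the `Period` class, its field names, and the `may_overlap` method are invented here, the `calendar` field is a mere label with no conversion logic, and the boundary years given for the two conceptual periods are illustrative placeholders rather than scholarly claims.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Period:
    """A span of time with deliberately loose boundaries, in years CE."""
    label: str
    earliest_start: int       # earliest year the period could plausibly begin
    latest_end: int           # latest year the period could plausibly end
    conceptual: bool = False  # True for eras and movements, False for dated events
    calendar: str = "Gregorian"  # label only; no calendar conversion is attempted

    def may_overlap(self, other: "Period") -> bool:
        """True if the two spans could share at least one year."""
        return (self.earliest_start <= other.latest_end
                and other.earliest_start <= self.latest_end)


# Boundary years for the conceptual periods are illustrative assumptions.
renaissance = Period("Italian Renaissance", 1300, 1600, conceptual=True)
baroque = Period("Baroque", 1580, 1750, conceptual=True)
sack_of_rome = Period("Sack of Rome", 1527, 1527)

print(renaissance.may_overlap(sack_of_rome))  # True
print(renaissance.may_overlap(baroque))       # True
print(sack_of_rome.may_overlap(baroque))      # False
```

Recording an earliest plausible start and a latest plausible end, instead of a single fixed date range, lets a dated event and a loosely bounded movement be compared with the same test while leaving the boundaries of the movement open to revision.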

Ontologies of Evidence

The same could probably not be said, however, for an ontology of evidence. Although historians agree upon the idea that conclusions and claims must be supported by relevant and convincing evidence, they tend to disagree about the nature of that evidence. An ontology of evidence for cultural history would be difficult to develop, since different schools of history value different kinds of evidence differently. Traditional history tends to value the evidentiary status of written records and documentation over artifacts and material culture since the latter’s meaning is less explicit. In general, though, postmodern cultural history seeks to establish the rich social context for events and to record individual experiences, guided by the idea that “truth” is subjective and depends upon the social and cultural context in which an event takes place. This fundamental difference occasionally leads to friction between historians over the evidentiary value of a particular resource; developers of an ontology of evidence for cultural history would need to work with scholars of all persuasions in order to create a standardized set of classes and properties for the nature of different types of evidence.

What will ontology development in cultural history require?

Would-be developers of ontologies for the processes and ideas of cultural history will have to overcome a number of obstacles before the semantic web gains full and widespread acceptance in their domain. As previously discussed, ontology developers will have to overcome multiple and differing concepts of time, place, and evidence. Ontology development will also require a great deal of cross-discipline communication and cooperation, which will be difficult given the time-consuming nature of regular historical work. For better or worse, historical scholarship remains a largely individual activity; individual scholars define their own research agendas, devote considerable time and energy to tracking down primary source materials in libraries and archives, and eventually become experts in narrow subject fields. This emphasis on individual inquiry and achievement is part of the appeal of the discipline for many historians, but it can impede domain-wide collaboration. Similarly, historians value input from other scholars, but historical research and writing are generally not communal activities.

As a discipline, history is also fairly receptive to a certain amount of revisionism and iconoclasm. This regular re-evaluation of traditional ideas is necessary to keep the discipline “fresh,” but it can be problematic for the development of ontologies for a number of reasons. The reinterpretation of evidence often leads to new ideas which necessitate the revision and restructuring of ontologies. Revisionists also tend to reject rigid definitions of movements, time periods, or classification systems, which might impede the successful introduction of domain ontologies.

Additionally, ontology developers will need to mediate between the differing needs of different disciplines of history. Some of these differing subject needs can be addressed by piggy-backing historical ontologies onto ontologies from other domains, although this introduces additional complexity to the process: political historians will need ontologies of politics and government, while military history will need to include organizational and tactical ontologies, and so on. However, each of these disciplines will eventually face the problem of dealing with the enormous diversity of political, cultural, military, geographic, and calendrical systems throughout all of human history—a problem which makes ontology development in cultural history such a difficult endeavor. Put simply, in order to be truly effective on a global scale an ontology for cultural history must take into account all of the peculiarities of human societies and activities throughout history. The only way to deal with this enormous complexity is to distribute the work of ontology development among communities of scholars from around the world, a solution which may be just as complex as the problem given the linguistic and philosophical barriers to the semantic web’s acceptance posed by differing research agendas.

Notable semantic web projects

There are many historical projects on the web, such as MATRIX, H-NET, the Durham Liber Vitae Project, and HistoricalVoices.org, which use semantic web technologies like XHTML, RSS, and XML+XSLT. Many similar projects are listed on the Humanities Scholarship web site. However, the majority of these projects use XML-based technologies for the display and dissemination of textual data or for the presentation of digitized manuscripts rather than exploring the true potential of ontology development to revolutionize historical research. Thus, it is difficult to find well-planned and innovative projects on the World Wide Web which demonstrate the true value of semantic web technologies in any historical domain, and especially in cultural history.

There exists no completed ontology of cultural history (at least none that could be found in my extensive searching of the World Wide Web), so the projects described here may only be partially related to the domain described in this paper. Each project, however, deals with some sub-discipline of the humanities; therefore, the issues and ideas raised by each project may be relevant to semantic web development efforts in cultural history. The state of ontology development ranges quite a bit among the seven projects described below, from pure speculation to preliminary planning to working demonstration. Each project has been selected from the potential universe of all semantic web projects for its unique contribution to ontology development in humanities disciplines. The discussion of these projects will follow a rough (and rather subjective) trajectory of practicality, completeness, currency and relevance, which is to say that the projects are described in order from least to most semantically significant.

Suda On-Line

This online lexicography of 10th-century Byzantine Greek writings functions both as a digital library of primary sources, encyclopedia entries, translations, annotations, and bibliographies and as “a place where you can see the business of translation and annotation going on right before your eyes.” The SOL uses a distributed network of translators and editors to create an XML-encoded database of searchable primary source texts in a variety of languages. Although the true semantic nature of this resource is questionable (the XML source code is not published), the SOL includes some intriguingly semantic features which demonstrate the real-world potential of semantic web technologies in the humanities. Two particularly useful semantic features include the ability to instantly switch display character sets within a record and the helpful Cross Project Resource Discovery, which allows users to search for additional materials on their topic in 28 additional online bibliographic and ancient/medieval history collections.

Path to the Present

This brand-new (and apparently still-in-development) training tool for U.S. History teachers in Michigan’s public school system demonstrates the variety of semantic web applications in the domain of history education. The site’s home page integrates a number of useful features, all customized for a specialized audience: a dynamically updated calendar, RSS and Atom news feeds, and links to databases of images and bibliographic citations. These databases have not yet been populated, but it appears that the images and citations entered will be marked up with semantic tags to allow search filtering. Although the Path to the Present site is very much an unfinished and untested resource, it represents the rich potential of semantic web technologies to enhance scholarly collaboration and the customized distribution of resources.

Virtual Humanities Lab Weblog

This weblog, maintained by the Brown University Department of Italian Studies, documents the efforts of Italian studies scholars seeking to create a digital library of “user-oriented annotated digital editions, commentaries and interpretations of key texts of the Italian humanist tradition.” (VHL New Texts) The blog is part of Brown’s Virtual Humanities Lab, an NEH-funded experiment in international scholarly collaboration, primary source encoding, and online instruction. While the bulk of the project is still in development (funding was instituted less than a year ago), the project’s weblog is publicly available and provides a unique insight into the difficult process of designing and applying a semantic markup scheme. Most posts and comments are written by project members from around the world, and the topics illuminate the administrative and workflow issues associated with a real-life semantic web project. In true semantic web tradition, each blog post has been tagged with keywords which allow the blog to be filtered by subject. Of particular interest is the filter pertaining to semantic encoding, which chronicles the Virtual Humanities Lab’s ongoing efforts to mark up Italian works in XML.

Humor Ontology

Co-authored by three professors at the University of Alcalá in Madrid,3 this recent (2005) paper describes a proposed semantic web ontology for the “multifaceted communication phenomenon” of humor, particularly within the context of political cartoons. The ontology is based on the CIDOC Conceptual Reference Model (discussed below), which allows for the semantic description of cultural heritage information. However, the creators of the humor ontology have gone beyond the basic CIDOC CRM framework in order to develop a unique domain-specific ontology which captures the essential techniques and characteristics of contemporary graphical humor. Although there is no evidence that the humor ontology has yet been applied in a practical situation, it is nonetheless a valuable contribution to semantic web research in cultural history because of its potential application to non-textual cultural history materials.

Project Notes at Open Blogged History

Although it is purely speculative and consists merely of notes and hand-drawn diagrams, this proposed “open history content management tool” represents the kind of structure which the semantic web might bring to the domain of history. The project’s creator envisions a chemistry-like system in which atomic elements like text, images, and links can be grouped into a variety of “molecular” descriptions and complex molecular structures. Essentially, the proposed system would enable the semantic identification and recombination of related elements regardless of their original context or format—a powerful tool for historical research. Such a system would be especially useful in the study of cultural history, since it would bring together previously dispersed threads of research in a variety of formats. This post was last updated in May 2004 and there is no evidence that the system was ever developed, but the project notes alone serve as an important exploration of the potential of the semantic web to revolutionize historical research.

Perseus Digital Library

As one of the best-known and most comprehensive digital library projects in the humanities, the Perseus Digital Library gives users access to a database of primary and secondary source historical materials. The XHTML-compliant Perseus site is regularly revised and updated, and it is gradually becoming more and more semantic. The newest version of the Perseus site offers automatic extraction and display of XML/TEI encoded primary source materials, such as this chapter of Aeschylus’ Prometheus Bound. Specialized tags like “speaker” and “milestone” allow for the semantic searching of texts, while XML language tags enable quick switching between English and Greek display texts. Overall, the Perseus Digital Library is one of the most successful digital libraries on the World Wide Web, and its shift toward more semantic features demonstrates the practicality and importance of semantic web projects in the domain of history.
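
The kind of tag-based searching described here can be illustrated with a minimal sketch. The fragment below is not taken from Perseus: it is a hypothetical, simplified TEI-style sample invented for illustration, with placeholder text in place of the actual lines of the play, and the function name lines_by_speaker is my own. It assumes speeches are wrapped in "sp" elements containing a "speaker" element and verse lines in "l" elements, and that each line carries an xml:lang attribute; the real Perseus encoding may well be organized differently.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical TEI-style fragment with placeholder text (not real Perseus data).
SAMPLE = """
<div>
  <milestone unit="card" n="1"/>
  <sp>
    <speaker>Kratos</speaker>
    <l xml:lang="en">Placeholder English line one.</l>
    <l xml:lang="grc">Placeholder Greek line one.</l>
  </sp>
  <sp>
    <speaker>Hephaestus</speaker>
    <l xml:lang="en">Placeholder English line two.</l>
    <l xml:lang="grc">Placeholder Greek line two.</l>
  </sp>
</div>
"""


def lines_by_speaker(root, speaker, lang):
    """Return the text of every line spoken by `speaker` in language `lang`."""
    results = []
    for speech in root.iter("sp"):
        name = speech.findtext("speaker", default="").strip()
        if name != speaker:
            continue
        for line in speech.findall("l"):
            if line.get(XML_LANG) == lang:
                results.append(line.text)
    return results


root = ET.fromstring(SAMPLE.strip())

# Semantic search: only the lines attributed to one speaker.
print(lines_by_speaker(root, "Kratos", "en"))

# Language switching: the same query, with a different xml:lang value.
print(lines_by_speaker(root, "Kratos", "grc"))

# Milestones mark reference points that a display layer could use for navigation.
print([m.get("n") for m in root.iter("milestone")])
```

The point of the sketch is simply that a query such as "everything Kratos says" becomes a structural lookup once the speaker is tagged, whereas a plain keyword search would also return every passage in which the name merely appears.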

HEML (Historical Event Markup and Linking project)

The Historical Event Markup and Linking project is an intriguing experiment based on the idea that keyword web searches can retrieve information about an event but cannot easily identify evidence of that event. The goal of the HEML project is to find and mark up documents and non-textual materials related to historical events and display them in unique and useful ways. Essentially, HEML provides a specialized set of tags used to semantically identify dates, locations, names, keywords, chronologies, participants, and evidence; the HEML engine then uses those tags to generate detailed, multi-lingual maps and timelines of events complete with links to primary source material. The creation of these maps and timelines requires the synthesis of disparate source material, and this synthesis is intended to encourage new forms of scholarship and new interpretations of historical events. The HEML project also represents a significant effort at historical ontology development: the project uses ontologies of time to distinguish absolute and relative dates, while ontologies of place allow for the identification of cities, states, countries, continents, and GIS coordinates. Additionally, the HEML ontology recognizes the fact that the same person may have different roles in different events (e.g., Napoleon as emperor vs. Napoleon as military leader), and provides filtering functions which allow for even more specialized searching. The HEML project’s intensive focus on events may diminish its usefulness in the study of social history or historical movements, but it nonetheless demonstrates the potential value of ontology development in the domain of history. Unfortunately, the project appears to have suffered from a lack of funding and has not been updated since 2003. 6/1/06: Project founder Bruce Robertson reports that HEML is in fact actively being updated and will soon “employ OWL reasoning to visualize data from various cultural markup schemes, in particular cidoc-crm.”
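
The event-centered model described here, in which one person can hold different roles in different events, might be sketched roughly as follows. This is not HEML's actual schema or engine: the class names (Participant, Event), the function timeline, and the example.org evidence links are all hypothetical, chosen only to show how role-aware filtering and chronological ordering could work once events are marked up with dates, places, participants, and evidence.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class Participant:
    person: str
    role: str  # the role this person plays in one particular event


@dataclass
class Event:
    label: str
    when: date
    place: str
    participants: list = field(default_factory=list)
    evidence: list = field(default_factory=list)  # links to source material


def timeline(events, person=None, role=None):
    """Return events in chronological order, optionally filtered by person and/or role."""
    def matches(event):
        if person is None and role is None:
            return True
        return any(
            (person is None or p.person == person)
            and (role is None or p.role == role)
            for p in event.participants
        )

    return sorted((e for e in events if matches(e)), key=lambda e: e.when)


# Illustrative records; the evidence URLs are placeholders, not real sources.
EVENTS = [
    Event(
        label="Battle of Austerlitz",
        when=date(1805, 12, 2),
        place="Austerlitz",
        participants=[Participant("Napoleon", "military leader")],
        evidence=["https://example.org/evidence/austerlitz"],
    ),
    Event(
        label="Coronation of Napoleon",
        when=date(1804, 12, 2),
        place="Paris",
        participants=[Participant("Napoleon", "emperor")],
        evidence=["https://example.org/evidence/coronation"],
    ),
]

# All events involving one person, ordered for display on a timeline.
for event in timeline(EVENTS, person="Napoleon"):
    print(event.when.isoformat(), event.label, event.place)

# The same person, restricted to a single role.
for event in timeline(EVENTS, person="Napoleon", role="emperor"):
    print(event.when.isoformat(), event.label)
```

Attaching the role to the participant's appearance in an event, rather than to the person, is what allows the filter to separate Napoleon the emperor from Napoleon the military leader without duplicating the person.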

The future of semantic web technologies in cultural history

Frankly, the future of semantic web technologies in the domain of cultural history is uncertain at best, despite the efforts of a number of forward-thinking scholars to envision more sophisticated methods of information exchange. Many of these writers observe (correctly, I would add) that semantic technologies are currently most often used to mark up and display digitized manuscripts and other text-based materials (cf. “Exhibition: A problem for conceptual modeling in the humanities”), a technique which does not explore the potential usefulness of full semantic web integration. There are, however, a number of fascinating treatises available on the web which do offer a glimpse at the future of the semantic web as it pertains to the study and exchange of cultural information. These treatises, such as David Laurie’s description of the semantic markup of the Alberta (Canada) railway atlas, often focus on the semantic web’s ability to bring together documents and digital objects scattered across the World Wide Web and then display those diverse objects in a manner that promotes new methods of scholarly inquiry.

This type of thinking is especially prevalent in the museum world, which has long struggled with the issue of making physical collections available in digital formats for web audiences. Paolo Galluzzi’s paper titled “The Virtual Museum of the Future,” for example, emphasizes the semantic web’s ability to enable new juxtapositions and interpretations of cultural heritage objects, which will no longer be limited to collections of items grouped by format or time period. Galluzzi also encourages the development of subject ontologies for culture and history, which he foresees as “content-sensitive search engines” and “effective representations of the structures of knowledge embedded in web repositories.” This emphasis on ontology development is shared by Kim Veltman in a paper titled “Towards a Semantic Web for Culture.” Veltman discusses the semantic web’s ability to trace meaning and knowledge organization across times and cultures through the use of “mapping devices” which provide connections across languages, vocabularies, time systems, and mapping systems. The best real-life example of such an ontology is the CIDOC-Conceptual Reference Model, which is currently being developed by the International Council of Museums. The CIDOC-CRM is intended to bridge the gap between computer science and the humanities and to provide a common semantic language for cultural heritage institutions for the markup of non-textual materials. The ontology is primarily intended to address the domains of natural history, ethnography, archaeology, historic monuments, and fine/applied arts; it will also provide the necessary historical, geographical, and theoretical context for the rich description and exchange of information from these domains. Thus, the CIDOC-CRM has significant potential for cultural history, especially if it is combined with a document ontology for text-based materials. But even the CIDOC-CRM’s potential is limited by the fact that semantic web technologies are not yet widely understood or accepted within the cultural history profession.

So how should would-be ontology developers address this lack of understanding? What is the real potential of the semantic web in the domain of cultural history? Any answer at this early date in the semantic web’s development is likely to be incomplete, but I will offer an estimate nonetheless. Full domain-wide agreement on the structure of ontologies for cultural history is unlikely, if only because of the occasionally iconoclastic and revisionist character of the discipline itself. Thus, development of a comprehensive ontology covering all of the document, metadata, service, and domain aspects of cultural history is highly unlikely to occur in the near future, particularly because history by its very nature is focused on the work of interpreting the past rather than adapting to the future; many cultural historians will be content to go about their work using tacit ontologies of process, time, and evidence instead of working to develop formalized ontologies for the networked exchange of knowledge. At the same time, the use of XML-based technologies in digital library projects will probably continue to grow as institutions begin to recognize the unique search and display options these markup systems afford. Eventually, this widespread use of XML (along with the continuing popularity of RSS) will begin to affect the pace of semantic web development in the domain of cultural history. In the long run, the semantic web will become more widely used across all domains of human activity, and the motivated yet currently small group of humanities computing scholars will convince their colleagues of the rich potential promised by the semantic web. At that point, we may finally begin to see the development of working ontologies which will enable the widespread implementation of semantic resources like HEML; as these technologies gain acceptance, the very nature of cultural history will no doubt begin to change in dramatic and exciting ways.

Bibliography of sources consulted

Papers, Abstracts, and Blog Posts

Agre, Phil. 2005. “Information Studies 277—Information Retrieval Systems: User-Centered Designs” [course syllabus] (http://polaris.gseis.ucla.edu/pagre/is277.html). Accessed June 4, 2005.

Ben Porat, Ziva and Wernher Behrendt. 2004. “What can the semantic web do for cultural heritage—today, tomorrow, or never?” (http://eculture.salzburgresearch.at/2004/en/behrendt_E_fin_230904.pdf). Accessed June 3, 2005.

Berners-Lee, Tim. 1998. “Semantic Web Road Map” (http://www.w3.org/DesignIssues/Semantic.html). Accessed June 3, 2005.

---. 1998. “What the Semantic Web can Represent” (http://www.w3.org/DesignIssues/RDFnot.html). Accessed June 3, 2005.

Bruns, Axel. “Semantic Web on Steroids” (http://snurb.info/index.php?q=node/85). Accessed June 3, 2005.

Dumbill, Edd. 2000. “The Semantic Web: A Primer” (http://www.xml.com/pub/a/2000/11/01/semanticweb/). Accessed June 4, 2005.

Galluzzi, Paolo. 2003. “The Virtual Museum of the Future” (http://www.imss.fi.it/mesmuses/galluzzi.html). Accessed June 3, 2005.

García-Barriocanal, Elena, Miguel-Angel Sicilia, and David Palomar. “A Graphical Humor Ontology for Contemporary Cultural Heritage Access” (http://www.cc.uah.es/msicilia/papers/Garcia_ECIS_2005.pdf). Accessed June 3, 2005.

Hall, Michael. 2005. “Humanities Scholarship” (http://www.wam.umd.edu/~mlhall/scholarly.html). Accessed June 4, 2005.

Laurie, David. n.d. “XML Projects in the Humanities” (http://www.ualberta.ca/CNS/news/xml.html). Accessed June 4, 2005.

Liu, Alan. “Transcendental Data: Toward a Cultural History and Aesthetics of the New Encoded Discourse” (http://www.uchicago.edu/research/jnl-crit-inq/features/artsstatements/arts.liu.htm). Accessed June 4, 2005.

Open Blogged History. 2004. “Trailblazer & BareBasics” (http://pzwart2.wdka.hro.nl/~fsnelting/open/archives/000275.html). Accessed June 3, 2005.

Pushkin, Dave. 2001. “The Relevance of Thomas Kuhn to Teaching and Intellectualism” (http://www.ucalgary.ca/hic/hic/website/2001vol1no1/forums/pushkin_forum_2001.pdf). Accessed June 4, 2005.

Renear, Allen H., Jin Ha Lee, Yunseon Choi, and Xin Xiang. 2005. “Exhibition: A Problem for Conceptual Modeling in the Humanities” (http://mustard.tapor.uvic.ca/cocoon/ach_abstracts/xq/pdf.xq?id=205). Accessed June 4, 2005.

Unsworth, John. n.d. “What is Humanities Computing (and What is Not)?” (http://www3.isrl.uiuc.edu/~unsworth/texas-hc.html). Accessed June 4, 2005.

Veltman, Kim. 2004. “Towards a Semantic Web for Culture” (http://jodi.tamu.edu/Articles/v04/i04/Veltman/). Accessed June 4, 2005.

Projects

AHRB Centre for North-East England History and the Centre for Computing in the Humanities, King's College, London. 2003. The Durham Liber Vitae Project (http://www.kcl.ac.uk/humanities/cch/dlv/index.html). Accessed June 3, 2005.

Brown University. 2005. Virtual Humanities Lab (http://www.brown.edu/Departments/Italian_Studies/vhl/). Accessed June 4, 2005.

Crane, Gregory. 2005. Perseus Digital Library (http://www.perseus.tufts.edu/). Accessed June 3, 2005.

H-Net. Humanities and Social Sciences Online (http://www.h-net.org/). Accessed June 3, 2005.

International Council of Museums. 2004. The CIDOC CRM (http://cidoc.ics.forth.gr/). Accessed June 3, 2005.

Michigan State University. 2004. MATRIX—The Center for Humane Arts, Letters, and Social Sciences Online (http://www.matrix.msu.edu/). Accessed June 3, 2005.

NINCH. 2003. National Initiative for a Networked Cultural Heritage (http://www.ninch.org/). Accessed June 4, 2005.

Okemos (Mich.) Public School District. 2005. Path to the Present (http://www.matrix.msu.edu/~okemostah/). Accessed June 3, 2005.

Robertson, Bruce. 2003. Historical Event Markup and Linking Project (http://heml.mta.ca/heml-cocoon/). Accessed June 4, 2005.

Stoa Consortium. 2001. The Suda On Line (http://www.stoa.org/sol/). Accessed June 3, 2005.

General ontology development resources

DARPA Agent Markup Language (DAML) Program. 2004. DAML Ontology Library (http://www.daml.org/ontologies/). Accessed June 3, 2005.

SchemaWeb. 2005. RDF Schemas Directory (http://www.schemaweb.info/default.aspx). Accessed June 3, 2005.

Footnotes

1: A more generalized domain focus may be necessary in cases where semantic web applications have not been developed for a discipline as specific as cultural history.

2: Semantic markup would only need to be applied to documents in electronic form and any electronic records which refer to physical objects, since the semantic web is logically confined to the realm of electronic information.

3: 1/29/07: Miguel Ángel Bernabé reported that the location of the University of Alcalá was incorrectly stated and should be changed to Madrid.