The volume of archaeological reports produced since the introduction of PG16 has
increased significantly, as a result of the growing number of archaeological investigations
conducted by academic and commercial archaeology. It is highly desirable to be able to
search effectively within and across such reports in order to find information that promotes
quality research. Disseminating information via semantic technologies offers
the opportunity to improve archaeological practice, not only by enabling access to
information but also by changing how information is structured and the way research is
conducted.
This thesis presents a method for automatic semantic indexing of archaeological grey-literature
reports using rule-based Information Extraction techniques in combination with
domain-specific ontological and terminological resources. This semantic annotation of
contextual abstractions from archaeological grey-literature is driven by Natural Language
Processing (NLP) techniques which are used to identify “rich” meaningful pieces of text,
thus overcoming barriers in document indexing and retrieval imposed by the use of natural
language. The semantic annotation system (OPTIMA) performs the NLP tasks of Named
Entity Recognition, Relation Extraction, Negation Detection and Word Sense
Disambiguation using hand-crafted rules and terminological resources for associating
contextual abstractions with classes of the ISO Standard (ISO 21127:2006) CIDOC
Conceptual Reference Model (CRM) for cultural heritage and its archaeological extension,
CRM-EH, together with concepts from English Heritage thesauri and glossaries.
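
The kind of rule-based annotation described above can be illustrated with a minimal sketch. It is not the OPTIMA rule set: the gazetteer entries, the negation cues, the 30-character look-back window and the function name annotate are all hypothetical and chosen only for illustration. The class labels (E19 Physical Object, E49 Time Appellation, E57 Material) are CIDOC CRM classes. The sketch matches gazetteer terms in text, associates each match with a class, and flags matches preceded by a negation cue in the same sentence.

```python
import re

# Hypothetical mini-gazetteer: term -> CIDOC CRM class label (illustrative only).
GAZETTEER = {
    "pottery": "E19 Physical Object",
    "coin": "E19 Physical Object",
    "roman": "E49 Time Appellation",
    "medieval": "E49 Time Appellation",
    "flint": "E57 Material",
}

# Assumed negation cues and look-back window (in characters); both are guesses.
NEGATION_CUES = re.compile(r"\b(no|not|without|absence of)\b", re.IGNORECASE)
NEGATION_WINDOW = 30

# One whole-word, case-insensitive pattern built from the gazetteer terms.
TERM_PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, GAZETTEER)) + r")\b", re.IGNORECASE
)


def annotate(text):
    """Return one annotation per gazetteer match, with a simple negation flag."""
    annotations = []
    for match in TERM_PATTERN.finditer(text):
        # Look back a fixed window, but not past the previous full stop.
        left = text[max(0, match.start() - NEGATION_WINDOW):match.start()]
        left = left.rsplit(".", 1)[-1]
        annotations.append({
            "text": match.group(0),
            "start": match.start(),
            "end": match.end(),
            "crm_class": GAZETTEER[match.group(0).lower()],
            "negated": bool(NEGATION_CUES.search(left)),
        })
    return annotations


if __name__ == "__main__":
    sample = "No pottery was recovered. A Roman coin was found in the ditch."
    for annotation in annotate(sample):
        print(annotation)
```

In this sketch the negation flag is kept alongside the annotation rather than used to discard the match, so that a report stating that no pottery was recovered is not indexed as evidence of pottery.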
The results demonstrate that the techniques can deliver semantic annotations of
archaeological grey literature documents with respect to the domain conceptual models.
Such semantic annotations have proven capable of supporting semantic query, document
study and cross-searching via web-based applications. The research outcomes have
provided semantic annotations for the Semantic Technologies for Archaeological
Resources (STAR) project, which explored the potential of semantic technologies in the
integration of archaeological digital resources. The thesis represents the first discussion of
the use of CIDOC CRM and CRM-EH in the semantic annotation of grey-literature
documents using rule-based Information Extraction techniques driven by a supplementary
exploitation of domain-specific ontological and terminological resources. It is anticipated
that the methods can be generalised in the future to the broader field of Digital Humanities.