63 research outputs found

    Negation detection and word sense disambiguation in digital archaeology reports for the purposes of semantic annotation

    The paper presents the role and contribution of Natural Language Processing techniques, in particular Negation Detection and Word Sense Disambiguation, in the Semantic Annotation of archaeological grey literature. Archaeological reports contain a great deal of information that conveys facts and findings in different ways. This kind of information is highly relevant to the research and analysis of archaeological evidence but at the same time can be a hindrance for the accurate indexing of documents with respect to positive assertion

    A knowledge-based approach to information extraction for semantic interoperability in the archaeology domain

    The paper presents a method for automatic semantic indexing of archaeological grey-literature reports using empirical (rule-based) Information Extraction techniques in combination with domain-specific knowledge organization systems. Performance is evaluated via the Gold Standard method. The semantic annotation system (OPTIMA) performs the tasks of Named Entity Recognition, Relation Extraction, Negation Detection and Word Sense Disambiguation using hand-crafted rules and terminological resources for associating contextual abstractions with classes of the standard ontology (ISO 21127:2006) CIDOC Conceptual Reference Model (CRM) for cultural heritage and its archaeological extension, CRM-EH, together with concepts from English Heritage thesauri and glossaries. Relation Extraction performance benefits from a syntax-based definition of relation extraction patterns derived from domain-oriented corpus analysis. The evaluation also shows clear benefit in the use of assistive NLP modules relating to word-sense disambiguation, negation detection and noun phrase validation, together with controlled thesaurus expansion. The semantic indexing results demonstrate the capacity of rule-based Information Extraction techniques to deliver interoperable semantic abstractions (semantic annotations) with respect to the CIDOC CRM and archaeological thesauri. Major contributions include recognition of relevant entities using shallow parsing NLP techniques driven by a complementary use of ontological and terminological domain resources, and empirical derivation of context-driven relation extraction rules for the recognition of semantic relationships from phrases of unstructured text. The semantic annotations have proven capable of supporting semantic query, document study and cross-searching via the ontology framework.

    Semantic Indexing via Knowledge Organization Systems: Applying the CIDOC-CRM to Archaeological Grey Literature

    The volume of archaeological reports being produced since the introduction of PPG16 has significantly increased, as a result of the increased volume of archaeological investigations conducted by academic and commercial archaeology. It is highly desirable to be able to search effectively within and across such reports in order to find information that promotes quality research. Dissemination of information via semantic technologies offers the opportunity to improve archaeological practice, not only by enabling access to information but also by changing how information is structured and the way research is conducted. This thesis presents a method for automatic semantic indexing of archaeological grey-literature reports using rule-based Information Extraction techniques in combination with domain-specific ontological and terminological resources. This semantic annotation of contextual abstractions from archaeological grey literature is driven by Natural Language Processing (NLP) techniques which are used to identify “rich” meaningful pieces of text, thus overcoming barriers in document indexing and retrieval imposed by the use of natural language. The semantic annotation system (OPTIMA) performs the NLP tasks of Named Entity Recognition, Relation Extraction, Negation Detection and Word Sense Disambiguation using hand-crafted rules and terminological resources for associating contextual abstractions with classes of the ISO Standard (ISO 21127:2006) CIDOC Conceptual Reference Model (CRM) for cultural heritage and its archaeological extension, CRM-EH, together with concepts from English Heritage thesauri and glossaries. The results demonstrate that the techniques can deliver semantic annotations of archaeological grey literature documents with respect to the domain conceptual models. Such semantic annotations have proven capable of supporting semantic query, document study and cross-searching via web-based applications.
The research outcomes have provided semantic annotations for the Semantic Technologies for Archaeological Resources (STAR) project, which explored the potential of semantic technologies in the integration of archaeological digital resources. The thesis represents the first discussion of the employment of CIDOC CRM and CRM-EH in semantic annotation of grey-literature documents using rule-based Information Extraction techniques driven by a supplementary exploitation of domain-specific ontological and terminological resources. It is anticipated that the methods can be generalised in the future to the broader field of Digital Humanities.

    Knowledge-Based Named Entity Recognition of Archaeological Concepts in Dutch

    The advancement of Natural Language Processing (NLP) allows the process of deriving information from large volumes of text to be automated, making text-based resources more discoverable and useful. Attention is turned to one of the most important but traditionally difficult-to-access resources in archaeology: the largely unpublished reports generated by commercial or “rescue” archaeology, commonly known as “grey literature”. The paper presents the development and evaluation of a Named Entity Recognition system for Dutch archaeological grey literature, targeted at extracting mentions of artefacts, archaeological features, materials, places and time entities. The role of domain vocabulary is discussed for the development of a KOS-driven NLP pipeline, which is evaluated against a Gold Standard, human-annotated corpus.

    A pilot investigation of Information Extraction in the semantic annotation of archaeological reports

    The paper discusses a prototype investigation of semantic annotation, a form of metadata assigning conceptual entities to textual instances, in this case in archaeological grey literature. The use of Information Extraction (IE), a Natural Language Processing (NLP) technique, is central to the annotation process, while the use of Knowledge Organization Systems (KOS) is explored for the association of semantic annotation with both ontological and terminological references. The annotation process follows a rule-based information extraction approach using the GATE NLP toolkit, together with the CIDOC CRM ontology, its CRM-EH archaeological extension and English Heritage thesauri and glossaries. Results from an initial evaluation are reported and suggest that these information extraction techniques can be applied to archaeological grey literature reports. Further work is discussed, drawing on the evaluation and on consideration of the characteristics of the archaeology domain. Copyright © 2012 Inderscience Enterprises Ltd

    Automatic metadata generation in an archaeological digital library: Semantic annotation of grey literature

    This paper discusses the automatic generation of rich metadata from excavation reports from the Archaeology Data Service library of grey literature (OASIS). The work is part of the STAR project, in collaboration with English Heritage. An extension of the CIDOC CRM ontology for the archaeological domain acts as a core ontology. Rich metadata is automatically extracted from grey literature, directed by the CRM, via a three-phase process of semantic enrichment employing the GATE toolkit augmented with bespoke rules and knowledge resources. The paper demonstrates the potential of combining knowledge-based resources (ontologies and thesauri) in information extraction, and techniques for delivering the automatically extracted metadata as XML annotations coupled with the grey literature reports and as RDF graphs decoupled from content. Examples from two consuming applications are discussed: the Andronikos web portal, which serves the annotated XML files for visual inspection, and the STAR project research demonstrator, which offers unified search across archaeological excavation data and grey literature via the core ontology CRM-EH.

    D16.4: Final Report on Natural Language Processing

    This document is a deliverable (D16.4) of the ARIADNE project (“Advanced Research Infrastructure for Archaeological Dataset Networking in Europe”), which is funded under the European Community's Seventh Framework Programme. It presents the final results of the work carried out in Task 16.2, “Natural Language Processing (NLP)”. The report addresses one of the most important but traditionally difficult-to-access resources in archaeology: the largely unpublished reports generated by commercial or “rescue” archaeology, commonly known as “grey literature”. It explores both rule-based and machine learning NLP methods, the use of archaeological thesauri in NLP, and various Information Extraction (IE) methods in their own language

    Enabling European Archaeological Research: The ARIADNE E-Infrastructure

    In the last 20 years, e-infrastructures have become ever more important for the conduct and progress of research in all branches of scientific enterprise. Increasingly collaborative, distributed and data-intensive research requires the sharing of resources (data, tools, computing facilities) via e-infrastructure as well as support for effective co-operation among research groups (ESF 2011; ESFRI 2016). Moreover there is the expectation that with large datasets ('big data'), e-infrastructure and advanced computing techniques, new scientific questions can be tackled. The archaeological research community has been an early adopter of various digital methods and tools for data acquisition, organisation, analysis and presentation of research results of individual projects. The provision of e-infrastructure and services for data sharing, discovery, access and re-use for the heritage sector is, however, lagging behind other research fields, such as the natural and life sciences. The consequence is a high level of fragmentation of archaeological data and limited capability for collaborative research across institutional and national as well as disciplinary boundaries (Aspöck and Geser 2014). This situation is being addressed by ARIADNE: the Advanced Research Infrastructure for Archaeological Dataset Networking in Europe. This e-infrastructure initiative is being promoted by a consortium of archaeological institutes, data archives and technology developers, and funded under the European Commission's Seventh Framework Programme (ARIADNE 2014a; Niccolucci and Richards 2013). ARIADNE enables archaeological data providers, large and small, to register and connect their resources (datasets, collections) to the e-infrastructure, and a data portal provides search, access and other services across the integrated resources. 
The portal puts into operation a proof of concept exemplar first developed under the ARENA (Archaeological Records of Europe Networked Access) project (Kenny and Richards 2005; Kilbride 2004), itself inspired by a proposal made by Hansen (1993). ARIADNE integrates resource discovery metadata using various controlled vocabularies, e.g. the W3C Data Catalogue Vocabulary (adapted for describing archaeological datasets), subject thesauri, gazetteers, chronologies, and the CIDOC Conceptual Reference Model (CRM). Based on this integration the data portal offers several ways to search and access resources made available by data providers located in different countries. ARIADNE thus acts as a broker between data providers and users and offers additional web services for products such as high-resolution images, Reflectance Transformation Imaging (RTI), 3D objects and landscapes. Employing such services in research projects or for content deposited in digital archives will greatly enhance the ability of researchers to publish, access and study archaeological content online. ARIADNE therefore represents a substantial advance for archaeology; in particular it provides a common platform where dispersed data resources can be uniformly described, discovered and accessed. It is also an essential step towards the even more ambitious goal of offering archaeologists integrated data, tools and computing resources for web-based research that creates new knowledge (e-archaeology). The next section describes the current landscape of data repositories and services for archaeologists in Europe, and the issues that make interoperability between them difficult to realise. The results of the ARIADNE user surveys undertaken to match expectations and requirements for the e-infrastructure and data portal services are then presented. 
The main part of the article describes ARIADNE's overall architecture, core services (data registration, discovery and access) and other extant or experimental services. A further section presents the ongoing evaluation of the data integration and set of services. Finally, the article summarises some lessons already learned in the integration of data resources and services, and considers the prospects for the wider engagement of the archaeological research community in sharing data through the ARIADNE e-infrastructure and portal.

    Enabling European archaeological research: The ARIADNE E-infrastructure

    Research e-infrastructures, digital archives and data services have become important pillars of scientific enterprise, which in recent decades has become ever more collaborative, distributed and data-intensive. The archaeological research community has been an early adopter of digital tools for data acquisition, organisation, analysis and presentation of research results of individual projects. However, the provision of e-infrastructure and services for data sharing, discovery, access and re-use has lagged behind. This situation is being addressed by ARIADNE: the Advanced Research Infrastructure for Archaeological Dataset Networking in Europe. This EU-funded network has developed an e-infrastructure that enables data providers to register and provide access to their resources (datasets, collections) through the ARIADNE data portal, facilitating discovery, access and other services across the integrated resources. This article describes the current landscape of data repositories and services for archaeologists in Europe, and the issues that make interoperability between them difficult to realise. The results of the ARIADNE surveys on users' expectations and requirements are also presented. The main section of the article describes the architecture of the e-infrastructure, core services (data registration, discovery and access) and various other extant or experimental services. The ongoing evaluation of the data integration and services is also discussed. Finally, the article summarises lessons learned, and outlines the prospects for the wider engagement of the archaeological research community in sharing data through ARIADNE.