
    Information Extraction Techniques for the Purposes of Semantic Indexing of Archaeological Resources

    The paper describes the use of Information Extraction (IE), a Natural Language Processing (NLP) technique, to assist ‘rich’ semantic indexing of diverse archaeological text resources. Such resources, often unpublished documents available online, are commonly referred to as ‘Grey Literature’. Established document indexing techniques cannot satisfy user information needs that extend beyond the limits of a simple term-matching search. The research aims at semantic-aware ‘rich’ indexing of diverse natural language resources, capable of supporting information retrieval from the online publications and datasets associated with the Semantic Technologies for Archaeological Resources (STAR) project in the UoG Hypermedia Research Unit. The study proposes using knowledge resources and conceptual models to assist an Information Extraction process that provides ‘rich’ semantic indexing of archaeological documents and resolves linguistic ambiguities in indexed terms. CIDOC CRM-EH, the English Heritage extension of the CIDOC CRM standard core ontology for cultural heritage, and the English Heritage (EH) thesauri for archaeological concepts drive the Information Extraction process and support a semantic framework in which indexed terms enable semantic-aware access to online resources. The paper describes the semantic indexing of archaeological concepts (periods and finds) in a corpus of 535 grey literature documents, using a rule-based Information Extraction technique facilitated by the General Architecture for Text Engineering (GATE) toolkit and expressed as Java Annotation Patterns Engine (JAPE) rules. Illustrative examples demonstrate the different stages of the process. Initial results suggest that combining information extraction with knowledge resources and standard core conceptual models can support semantic-aware and linguistically disambiguated term indexing.
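    The thesaurus-driven step described above can be pictured as a gazetteer lookup that tags period and find terms with an ontology class. The sketch below is plain Python, not GATE or JAPE, and the mini-thesaurus is hypothetical (real EH thesauri are far larger and hierarchical). It shows only the lookup stage; resolving ambiguous terms from context is what the paper's JAPE rules add on top.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical mini-thesaurus: lower-case term -> CIDOC CRM class label.
THESAURUS = {
    "bronze age": "E49.Time Appellation",
    "roman": "E49.Time Appellation",
    "medieval": "E49.Time Appellation",
    "pottery": "E19.Physical Object",
    "brooch": "E19.Physical Object",
    "coin": "E19.Physical Object",
}


@dataclass(frozen=True)
class Annotation:
    start: int       # character offset where the match begins
    end: int         # character offset just past the match
    text: str        # matched surface text
    crm_class: str   # ontology class assigned from the thesaurus


def annotate(text: str, thesaurus: dict[str, str]) -> list[Annotation]:
    """Case-insensitive, whole-word thesaurus lookup.

    Terms are tried longest first, so a multi-word term such as
    'bronze age' wins over any shorter term starting at the same place.
    """
    terms = sorted(thesaurus, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, terms)) + r")\b", re.IGNORECASE
    )
    return [
        Annotation(m.start(), m.end(), m.group(0), thesaurus[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]


if __name__ == "__main__":
    for a in annotate("A Roman coin and Bronze Age pottery were recovered.", THESAURUS):
        print(a)
```

    A lookup like this happily tags "Roman" even where it is not a period reference, which is exactly the kind of ambiguity that context-sensitive rules are meant to resolve.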

    Migration on request, a practical technique for preservation

    Maintaining a digital object in a usable state over time is a crucial aspect of digital preservation. Existing preservation methods have many drawbacks. This paper describes advanced data migration techniques that can support preservation more accurately and cost-effectively. To ensure that preserved works can be rendered on current computer systems over time, “traditional migration” has been used to convert data into current formats; when the new format in turn becomes obsolete, another conversion is performed, and so on. Traditional migration has inherent problems, because errors introduced during one transformation propagate through all later transformations. CAMiLEON’s software longevity principles can be applied to a migration strategy, offering improvements over traditional migration. This new approach is named “Migration on Request.” Migration on Request shifts the burden of preservation onto a single tool, which is maintained over time. Because each conversion always starts from the original format, potential errors are significantly reduced.
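    The contrast between chained migration and Migration on Request can be illustrated with a deliberately simple toy model, which is an assumption and not CAMiLEON's tool. In this model, an object is the set of features it uses, and each conversion keeps only what the target format supports. Chaining conversions loses anything an intermediate format could not hold, whereas converting from the preserved original does not.

```python
from __future__ import annotations


def convert(obj: frozenset[str], target: frozenset[str]) -> frozenset[str]:
    """One lossy conversion: keep only the features the target format supports."""
    return obj & target


def traditional_migration(original: frozenset[str], chain: list[frozenset[str]]) -> frozenset[str]:
    """Convert through each successive format, always from the previous output,
    so a feature lost at one step stays lost for every later step."""
    current = original
    for fmt in chain:
        current = convert(current, fmt)
    return current


def migration_on_request(original: frozenset[str], target: frozenset[str]) -> frozenset[str]:
    """Keep the original and convert it directly to whichever format is current."""
    return convert(original, target)


# Hypothetical digital object and three generations of formats.
ORIGINAL = frozenset({"text", "tables", "footnotes", "images"})
FORMAT_A = frozenset({"text", "tables", "images"})                # drops footnotes
FORMAT_B = frozenset({"text", "footnotes", "images"})             # drops tables
FORMAT_C = frozenset({"text", "tables", "footnotes", "images"})   # current format

if __name__ == "__main__":
    print(sorted(traditional_migration(ORIGINAL, [FORMAT_A, FORMAT_B, FORMAT_C])))
    print(sorted(migration_on_request(ORIGINAL, FORMAT_C)))
```

    In this toy, the chained route ends with only text and images, even though the final format could represent everything. The toy models information loss only. Real conversion errors can also corrupt content rather than just drop it.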

    Geographical information retrieval with ontologies of place

    Geographical context is required for many information retrieval tasks in which the target of the search may be documents, images or records that are referenced to geographical space only by means of place names. Often the match between the query name and the names associated with candidate sources of information is imprecise. There is therefore a need for geographical information retrieval facilities that can rank candidate information by geographical closeness of place as well as by semantic closeness to the information of interest. Here we present an ontology of place that combines limited coordinate data with semantic and qualitative spatial relationships between places. This parsimonious model of geographical place supports maintenance of knowledge of place names that relate to extensive regions of the Earth at multiple levels of granularity. The ontology has been implemented with a semantic modelling system linking non-spatial conceptual hierarchies with the place ontology. A hierarchical spatial distance measure is combined with Euclidean distance between place centroids to create a hybrid spatial distance measure. This is integrated with thematic distance, based on classification semantics, to create an integrated semantic closeness measure that can be used for relevance ranking of retrieved objects.
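    The combination of measures described above can be sketched as follows. The specific choices are all assumptions for illustration, since the abstract does not give formulas: hierarchical distance is taken as the edge count through the lowest common ancestor, each term is normalised by a fixed maximum, and the terms are mixed with linear weights. The place hierarchy, planar centroids (km) and thematic classes are made up.

```python
from __future__ import annotations

import math


def ancestors(node: str, parent: dict[str, str]) -> list[str]:
    """Path from node up to the root of its hierarchy, node first."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def tree_distance(a: str, b: str, parent: dict[str, str]) -> int:
    """Number of edges between a and b through their lowest common ancestor."""
    up_a, up_b = ancestors(a, parent), ancestors(b, parent)
    steps_from_b = {n: i for i, n in enumerate(up_b)}
    for i, n in enumerate(up_a):
        if n in steps_from_b:
            return i + steps_from_b[n]
    raise ValueError(f"{a!r} and {b!r} share no ancestor")


def hybrid_spatial_distance(a, b, parent, centroid, w_hier=0.5, max_edges=4, max_km=500.0):
    """Weighted mix of normalised hierarchical and centroid (Euclidean) distance."""
    hier = tree_distance(a, b, parent) / max_edges
    eucl = math.dist(centroid[a], centroid[b]) / max_km
    return w_hier * hier + (1 - w_hier) * eucl


def semantic_closeness(spatial: float, thematic: float, w_space: float = 0.5) -> float:
    """1.0 means same place and same theme; lower values mean less relevant."""
    return 1.0 - (w_space * spatial + (1 - w_space) * thematic)


# Hypothetical hierarchies and made-up planar centroids (km).
PLACE_PARENT = {"Cardiff": "Wales", "Swansea": "Wales", "Bristol": "England",
                "Wales": "UK", "England": "UK"}
CENTROID = {"Cardiff": (0.0, 0.0), "Swansea": (-60.0, 0.0), "Bristol": (40.0, 0.0)}
THEME_PARENT = {"castle": "fortification", "hillfort": "fortification",
                "church": "religious building",
                "fortification": "monument", "religious building": "monument"}

if __name__ == "__main__":
    for place in ("Swansea", "Bristol"):
        d = hybrid_spatial_distance("Cardiff", place, PLACE_PARENT, CENTROID)
        print(place, round(d, 3))
```

    With these weights, Swansea (0.31) ranks spatially closer to Cardiff than Bristol (0.54), even though Bristol's made-up centroid is nearer. This is because Bristol lies in a different branch of the place hierarchy. The weights and normalisation constants are arbitrary knobs here.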

    Natural Language Processing for Under-resourced Languages: Developing a Welsh Natural Language Toolkit

    Language technology is becoming increasingly important across a variety of application domains, and such applications have become commonplace for large, well-resourced languages. However, there is a danger that small, under-resourced languages are being increasingly pushed to the technological margins. Under-resourced languages face significant challenges in delivering the underlying language resources necessary to support such applications. This paper describes the development of a natural language processing toolkit for an under-resourced language, Cymraeg (Welsh). Rather than creating the Welsh Natural Language Toolkit (WNLT) from scratch, the approach involved adapting and enhancing the language processing functionality provided for other languages within an existing framework, and making use of external language resources where available. The paper begins by introducing the GATE NLP framework, which was used as the development platform for the WNLT. It then describes each of the core modules of the WNLT in turn, detailing the extensions and adaptations required for Welsh language processing. An evaluation of the WNLT is then reported. Following this, two demonstration applications are presented. The first is a simple text mining application that analyses wedding announcements. The second is a Twitter NLP application that extends the core WNLT pipeline. As a relatively small-scale project, the WNLT makes use of existing external language resources where possible, rather than creating new resources. This approach of adaptation and reuse can provide a practical and achievable route to developing language resources for under-resourced languages.

    Knowledge-Based Named Entity Recognition of Archaeological Concepts in Dutch

    Advances in Natural Language Processing (NLP) allow the process of deriving information from large volumes of text to be automated, making text-based resources more discoverable and useful. Attention here turns to one of the most important, but traditionally difficult to access, resources in archaeology: the largely unpublished reports generated by commercial or “rescue” archaeology, commonly known as “grey literature”. The paper presents the development and evaluation of a Named Entity Recognition system for Dutch archaeological grey literature, targeted at extracting mentions of artefacts, archaeological features, materials, places and time entities. The role of domain vocabulary is discussed for the development of a KOS-driven (Knowledge Organization System) NLP pipeline, which is evaluated against a human-annotated Gold Standard corpus.
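    Evaluation against a human-annotated Gold Standard is usually reported as precision, recall and F1. Below is a minimal sketch of the strict, exact-match variant, where a prediction counts only if its span and label both match. Whether the paper uses exact or partial matching is not stated in this abstract, and the spans and labels in the usage example are hypothetical.

```python
from __future__ import annotations

Span = tuple[int, int, str]  # (start offset, end offset, entity label)


def prf1(gold: set[Span], predicted: set[Span]) -> tuple[float, float, float]:
    """Exact-match precision, recall and F1 over labelled entity spans."""
    tp = len(gold & predicted)  # predictions matching a gold span and label exactly
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = {(0, 5, "ARTEFACT"), (10, 18, "MATERIAL"), (25, 31, "TIME")}
    pred = {(0, 5, "ARTEFACT"), (10, 18, "PLACE"), (25, 31, "TIME"), (40, 44, "FEATURE")}
    print(prf1(gold, pred))
```

    Here the mislabelled span (MATERIAL predicted as PLACE) and the spurious FEATURE both count against precision, giving P = 0.5, R = 2/3 and F1 = 4/7.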

    An agent-directed-marine navigation simulator


    Excavating grey literature: A case study on the rich indexing of archaeological documents via natural language‐processing techniques and knowledge‐based resources

    PURPOSE: This paper discusses the use of information extraction (IE), a natural language‐processing (NLP) technique, to assist “rich” semantic indexing of diverse archaeological text resources. The focus of the research is semantic‐aware “rich” indexing of diverse natural language resources, capable of supporting information retrieval from online publications and datasets associated with the Semantic Technologies for Archaeological Resources (STAR) project. DESIGN/METHODOLOGY/APPROACH: The paper proposes the use of the English Heritage extension (CRM‐EH) of the standard core ontology in cultural heritage, CIDOC CRM, together with domain thesauri, to drive and enhance an Ontology‐Oriented Information Extraction process. The process of semantic indexing is based on a rule‐based Information Extraction technique, facilitated by the General Architecture for Text Engineering (GATE) toolkit and expressed as Java Annotation Patterns Engine (JAPE) rules. FINDINGS: Initial results suggest that the combination of information extraction with knowledge resources and standard conceptual models is capable of supporting semantic‐aware term indexing. Further work is required to exploit the technique and to adopt formal evaluation methods that assess its performance in measurable terms. ORIGINALITY/VALUE: The value of the paper lies in the semantic indexing of 535 unpublished online documents, often referred to as “Grey Literature”, from the Archaeology Data Service OASIS corpus (Online AccesS to the Index of archaeological investigationS), with respect to the CRM ontological concepts E49.Time Appellation and E19.Physical Object.

    D16.4: Final Report on Natural Language Processing

    This document is a deliverable (D16.4) of the ARIADNE project (“Advanced Research Infrastructure for Archaeological Dataset Networking in Europe”), which is funded under the European Community's Seventh Framework Programme. It presents the final results of the work carried out in Task 16.2 “Natural Language Processing (NLP)”. The report addresses one of the most important, but traditionally difficult to access, resources in archaeology: the largely unpublished reports generated by commercial or “rescue” archaeology, commonly known as “grey literature”. It explores rule-based and machine learning NLP methods, the use of archaeological thesauri in NLP, and various Information Extraction (IE) methods applied by project partners to grey literature in their own languages.

    Coral record of Younger Dryas Chronozone warmth on the Great Barrier Reef

    Author Posting. © American Geophysical Union, 2020. This article is posted here by permission of American Geophysical Union for personal use, not for redistribution. The definitive version was published in Paleoceanography and Paleoclimatology 35(12), (2020): e2020PA003962, doi:10.1029/2020PA003962. The Great Barrier Reef (GBR) is an internationally recognized and widely studied ecosystem, yet little is known about its sea surface temperature (SST) evolution since the Last Glacial Maximum (LGM) (~20 kyr BP). Here, we present the first paleo‐application of Isopora coral‐derived SST calibrations to a suite of 25 previously published fossil Isopora from the central GBR spanning ~25–11 kyr BP. The resultant multicoral Sr/Ca‐ and δ18O‐derived SST anomaly (SSTA) histories are placed within the context of published relative sea level, reef sequence, and coralgal reef assemblage evolution. Our new calculations indicate SSTs were cooler on average by ~5–5.5°C at Noggin Pass (~17°S) and ~7–8°C at Hydrographer's Passage (~20°S) (Sr/Ca‐derived) during the LGM, in line with previous estimates (Felis et al., 2014, https://doi.org/10.1038/ncomms5102). We focus on contextualizing the Younger Dryas Chronozone (YDC, ~12.9–11.7 kyr BP), whose Southern Hemisphere expression, particularly in Australia, is elusive and poorly constrained. Our record does not indicate cooling during the YDC; instead, near‐modern temperatures were reached during this interval on the GBR, supporting an asymmetric hemispheric expression of this climate event. Building on a previous study (Felis et al., 2014, https://doi.org/10.1038/ncomms5102), these fossil Isopora SSTA data from the GBR provide new insights into the deglacial reef response since the LGM, including near‐modern warmth during the YDC.

    This work was funded by National Science Foundation (NSF) award OCE 13‐56948 to B. K. L., with NSF GRFP support DGE‐11‐44155 to L. D. B., and by the Australian Research Council (grant no. DP1094001) and ANZIC IODP. Partial support for B. K. L.'s work on this project also came from the Vetlesen Foundation via a gift to the Lamont‐Doherty Earth Observatory. T. F. received funding from the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation), Project number 180346848, through Priority Program 527 “IODP.” A. T. received support from the UK Natural Environment Research Council (NE/H014136/1 and NE/H014268/1). M. T. thanks the Ministry of Earth Sciences for support (NCPOR contribution no. J‐84/2020‐21). L. D. B. would also like to thank Kassandra Costa for her input regarding error analysis.
    2021-06-1
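    As a reading aid for the Sr/Ca-derived anomalies, coral Sr/Ca thermometers are commonly written as a linear calibration. The form below is a generic one. The coefficients a and b are placeholders: the Isopora-specific values are not given in this abstract, and the authors' actual SSTA procedure may differ. It also assumes that the calibration and seawater Sr/Ca are the same for the modern and fossil corals.

```latex
\begin{aligned}
\mathrm{Sr/Ca} &= a + b\,\mathrm{SST}, \qquad b < 0,\\
\mathrm{SST} &= \frac{\mathrm{Sr/Ca} - a}{b},\\
\mathrm{SSTA} &= \mathrm{SST}_{\text{fossil}} - \mathrm{SST}_{\text{modern}}
  = \frac{\mathrm{Sr/Ca}_{\text{fossil}} - \mathrm{Sr/Ca}_{\text{modern}}}{b}.
\end{aligned}
```

    The intercept a cancels in the anomaly. Because coral Sr/Ca falls as temperature rises (b < 0), a higher fossil Sr/Ca than the modern reference yields a negative SSTA, that is, cooler conditions such as those reported for the LGM.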