Search CORE

1,378 research outputs found

A Faceted Query Engine Applied to Archaeology

Authors: Doerr, Giddy, Ross, Wynar
Publication venue: Council for British Archaeology
Publication date: 01/01/2007

Can BERT Dig It? -- Named Entity Recognition for Information Retrieval in the Archaeology Domain

Authors: Brandsen Alex, Lambers Karsten, Verberne Suzan, Wansleeben Milco
Publication date: 14/06/2021

The amount of archaeological literature is growing rapidly. Until recently, these data were only accessible through metadata search. We implemented a text retrieval engine for a large archaeological text collection (~658 million words). In archaeological IR, domain-specific entities such as locations, time periods, and artefacts play a central role. This motivated the development of a named entity recognition (NER) model to annotate the full collection with archaeological named entities. In this paper, we present ArcheoBERTje, a BERT model pre-trained on Dutch archaeological texts. We compare the model's quality and output on a Named Entity Recognition task to a generic multilingual model and a generic Dutch model. We also investigate ensemble methods for combining multiple BERT models, and for combining the best BERT model with a domain thesaurus using Conditional Random Fields (CRF). We find that ArcheoBERTje significantly outperforms both the multilingual and the Dutch model, with a smaller standard deviation between runs, reaching an average F1 score of 0.735. The model also outperforms ensemble methods combining the three models. Combining ArcheoBERTje predictions with explicit domain knowledge from the thesaurus did not increase the F1 score. We quantitatively and qualitatively analyse the differences between the vocabulary and output of the BERT models on the full collection and provide some valuable insights into the effect of fine-tuning for specific domains. Our results indicate that for a highly specific text domain such as archaeology, further pre-training on domain-specific data increases the model's quality on NER by a much larger margin than shown for other domains in the literature, and that domain-specific pre-training makes the addition of domain knowledge from a thesaurus unnecessary.

arXiv.org e-Print Archive

Leiden University Scholarly Publications

A knowledge-based approach to information extraction for semantic interoperability in the archaeology domain

Authors: Tudhope Douglas, Vlachidis Andreas
Publication venue: Wiley
Publication date: 01/01/2015

The paper presents a method for automatic semantic indexing of archaeological grey-literature reports using empirical (rule-based) Information Extraction techniques in combination with domain-specific knowledge organization systems. Performance is evaluated via the Gold Standard method. The semantic annotation system (OPTIMA) performs the tasks of Named Entity Recognition, Relation Extraction, Negation Detection and Word Sense Disambiguation using hand-crafted rules and terminological resources for associating contextual abstractions with classes of the standard ontology (ISO 21127:2006) CIDOC Conceptual Reference Model (CRM) for cultural heritage and its archaeological extension, CRM-EH, together with concepts from English Heritage thesauri and glossaries.

Relation Extraction performance benefits from a syntax-based definition of relation extraction patterns derived from domain-oriented corpus analysis. The evaluation also shows a clear benefit from the use of assistive NLP modules for word-sense disambiguation, negation detection and noun phrase validation, together with controlled thesaurus expansion.

The semantic indexing results demonstrate the capacity of rule-based Information Extraction techniques to deliver interoperable semantic abstractions (semantic annotations) with respect to the CIDOC CRM and archaeological thesauri. Major contributions include the recognition of relevant entities using shallow-parsing NLP techniques driven by a complementary use of ontological and terminological domain resources, and the empirical derivation of context-driven relation extraction rules for recognizing semantic relationships in phrases of unstructured text. The semantic annotations have proven capable of supporting semantic query, document study and cross-searching via the ontology framework.

Crossref

University of South Wales Research Explorer

UWE Bristol Research Repository

UCL Discovery

Googling the Grey: Open Data, Web Services, and Semantics

Authors: Andrew Baines, Christine Borgman, Cindy Stankowski, Cori Hayden, David Schloen, Dean R. Snow, Eric C. Kansa, Eric Kansa, Francis P. McManamon, Geoffrey C. Bowker, George P. Nicholas, Jennifer Trant, Karl-Heinz Lampe, Keith Kintigh, Kimberly Christen, Margie M. Burton, Martin Doerr, Michael Brown, Robin Boast, Sarah Whitcher Kansa, Sergey Brin, Tim Brody, Timothy J. Barringer
Publication venue: Springer Science and Business Media LLC

Crossref

HEALTH GeoJunction: place-time-concept browsing of health publications

Authors: MacEachren Alan M, Pezanowski Scott, Stryker Michael S, Turton Ian J
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: The volume of health science publications is escalating rapidly. Thus, keeping up with developments is becoming harder, as is the task of finding important cross-domain connections. When geographic location is a relevant component of research reported in publications, these tasks are more difficult because standard search and indexing facilities have limited or no ability to identify geographic foci in documents. This paper introduces HEALTH GeoJunction, a web application that supports researchers in the task of quickly finding scientific publications that are relevant geographically and temporally as well as thematically.

Results: HEALTH GeoJunction is a geovisual analytics-enabled web application providing: (a) web services using computational reasoning methods to extract place-time-concept information from bibliographic data for documents; and (b) visually-enabled place-time-concept query, filtering, and contextualizing tools that apply to both the documents and their extracted content. This paper focuses specifically on strategies for visually-enabled, iterative, facet-like, place-time-concept filtering that allows analysts to quickly drill down to scientific findings of interest in PubMed abstracts and to explore relations among abstracts and extracted concepts in place and time. The approach enables analysts to: find publications without knowing all relevant query parameters, recognize unanticipated geographic relations within and among documents in multiple health domains, identify the thematic emphasis of research targeting particular places, notice changes in concepts over time, and notice changes in places where concepts are emphasized.

Conclusions: PubMed is a database of over 19 million biomedical abstracts and citations maintained by the National Center for Biotechnology Information; achieving quick filtering is an important contribution due to the database size. Including geography in filters is important due to rapidly escalating attention to geographic factors in public health. The implementation of mechanisms for iterative place-time-concept filtering makes it possible to narrow searches efficiently and quickly from thousands of documents to a small subset that meets place-time-concept constraints. Support for a more-like-this query creates the potential to identify unexpected connections across diverse areas of research. Multi-view visualization methods support understanding of the place, time, and concept components of document collections and enable comparison of filtered query results to the full set of publications.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Terminology web services

Authors: Binding Ceri, Tudhope Douglas
Publication date: 01/01/2010

University of South Wales Research Explorer

Searching Data: A Review of Observational Data Retrieval Practices in Selected Disciplines

Authors: Aloia N., Beran B., Borgman C.L., Carlson J., Fielding N.G., Honor L.B., Ingwersen P., Maier D., Meyer E.T., Pasquetto I.V., Zimmerman A.S.
Publication venue: Wiley
Publication date: 03/04/2019

A cross-disciplinary examination of the user behaviours involved in seeking and evaluating data is surprisingly absent from the research data discussion. This review explores the data retrieval literature to identify commonalities in how users search for and evaluate observational research data. Two analytical frameworks rooted in information retrieval and science and technology studies are used to identify key similarities in practices as a first step toward developing a model describing data retrieval.

arXiv.org e-Print Archive

Maastricht University Research Portal

Crossref

International Migration, Integration and Social Cohesion online publications

UvA-DARE

Methods, data and tools for facilitating a 3D cultural heritage space

Authors: Farella E. M., Ioannidis G., Maietti F., Medici M., Münster S., Remondino F., Rigon S., Sánchez A., Stan A.
Publication date: 01/01/2024

In recent years, the cultural heritage (CH) sector has experienced a rapid evolution due to the introduction of increasingly powerful digital technologies and ICT (Information and Communication Technologies) solutions. As in many other domains, digital data, Artificial Intelligence (AI), and Extended Reality (XR) are opening up extraordinary opportunities for expanding heritage knowledge capabilities while boosting research on innovative solutions for its valorisation and preservation. Aware of the fundamental and strategic role of CH in the history and identity of European countries, the European Commission has assumed a central role in fuelling the policy debate and bringing stakeholders together to take a step forward in CH digitization and use, primarily through initiatives, programs, and recommendations. Within this framework, the ongoing European 5DCulture project (https://www.5dculture.eu/) has been funded to enrich the offer of 3D CH digital assets in the common European Data Space for Cultural Heritage by creating high-quality 3D content, and to foster its re-use in several sectors, from tourism to education. Through eight re-use scenarios around historic buildings and cityscapes, archaeology, and fashion, the project aims to deliver a set of digital tools and to increase the capacity of CH institutions to re-use their 3D digital assets more effectively.

Archivio della ricerca - Fondazione Bruno Kessler

Archivio istituzionale della ricerca - Università di Ferrara

Usability evaluation for online professional search in the Dutch archaeology domain

Authors: Brandsen A., Lambers K., Verberne S., Wansleeben M.
Publication date: 01/01/2021

Digital Archaeolog

arXiv.org e-Print Archive

Leiden University Scholarly Publications