135 research outputs found

Spatial Analysis of Late Mesolithic and Neolithic surface scatters a test case of the Roerstreek (Middle Limburg)

Author: Wansleeben M.
Publication venue: 'Leiden University Press'
Publication date: 01/01/1987

Leiden University Scholarly Publications

The Meuse Valley Project: GIS and site location statistics

Author: Verhart L.B.M.
Wansleeben M.
Publication venue: 'Leiden University Press'
Publication date: 01/01/1992

Leiden University Scholarly Publications

Setting a Standard for the Exchange of Archaeological Data in the Netherlands

Author: Sueur C.
Verhagen J.W.H.P.
Wansleeben M.
Publication venue: Archaeolingua
Publication date: 01/01/2011

VU Research Portal

Knowledge-Based Named Entity Recognition of Archaeological Concepts in Dutch

Author: Tudhope D
Vlachidis A
Wansleeben M
Publication venue: Metadata and Semantic Research MTSR 2020
Publication date: 18/03/2021

The advancement of Natural Language Processing (NLP) allows the process of deriving information from large volumes of text to be automated, making text-based resources more discoverable and useful. The attention is turned to one of the most important, but traditionally difficult to access, resources in archaeology: the largely unpublished reports generated by commercial or “rescue” archaeology, commonly known as “grey literature”. The paper presents the development and evaluation of a Named Entity Recognition system for Dutch archaeological grey literature, targeted at extracting mentions of artefacts, archaeological features, materials, places and time entities. The role of domain vocabulary is discussed for the development of a KOS-driven NLP pipeline, which is evaluated against a Gold Standard, human-annotated corpus

UCL Discovery

Can BERT Dig It? -- Named Entity Recognition for Information Retrieval in the Archaeology Domain

Author: Brandsen Alex
Lambers Karsten
Verberne Suzan
Wansleeben Milco
Publication date: 14/06/2021

The amount of archaeological literature is growing rapidly. Until recently, these data were only accessible through metadata search. We implemented a text retrieval engine for a large archaeological text collection (~658 million words). In archaeological IR, domain-specific entities such as locations, time periods, and artefacts play a central role. This motivated the development of a named entity recognition (NER) model to annotate the full collection with archaeological named entities. In this paper, we present ArcheoBERTje, a BERT model pre-trained on Dutch archaeological texts. We compare the model's quality and output on a Named Entity Recognition task to a generic multilingual model and a generic Dutch model. We also investigate ensemble methods for combining multiple BERT models, and combining the best BERT model with a domain thesaurus using Conditional Random Fields (CRF). We find that ArcheoBERTje outperforms both the multilingual and Dutch model significantly with a smaller standard deviation between runs, reaching an average F1 score of 0.735. The model also outperforms ensemble methods combining the three models. Combining ArcheoBERTje predictions and explicit domain knowledge from the thesaurus did not increase the F1 score. We quantitatively and qualitatively analyse the differences between the vocabulary and output of the BERT models on the full collection and provide some valuable insights into the effect of fine-tuning for specific domains. Our results indicate that for a highly specific text domain such as archaeology, further pre-training on domain-specific data increases the model's quality on NER by a much larger margin than shown for other domains in the literature, and that domain-specific pre-training makes the addition of domain knowledge from a thesaurus unnecessary

arXiv.org e-Print Archive

Leiden University Scholarly Publications

Setting a Standard for the Exchange of Archaeological Data in the Netherlands

Author: Sueur C.
Verhagen J.W.H.P.
Wansleeben M.
Publication venue: Budapest
Publication date: 01/01/2011

The introduction and growth of a commercial market for archaeology has enormously increased the amount of archaeological fieldwork done in the Netherlands. This is combined with an increasing use of digital techniques to record, store and analyse excavation and survey data. The result has been a proliferation of data formats: the various companies doing archaeological fieldwork have all developed their own databases and GIS/CAD systems for daily use. Because of this, a national metadata standard for describing archaeological data storage was introduced in 2007. However, this standard does not yet solve the problems of data exchange between archaeological companies, heritage managers and non-archaeological parties. In this paper, we will sketch the potential of exchange standards for three main categories of data: borehole data, the national sites and monuments records, and finds that are submitted for storage in repositories

VU Research Portal

Usability evaluation for online professional search in the Dutch archaeology domain

Author: Brandsen A.
Lambers K.
Verberne S.
Wansleeben M.
Publication date: 01/01/2021
Field of study: Digital Archaeology

arXiv.org e-Print Archive

Leiden University Scholarly Publications

Creating a Dataset for Named Entity Recognition in the Archaeology Domain

Author: Brandsen A.
Lambers K.
Verberne S.
Wansleeben M.
Publication date: 01/01/2020
Field of study: Digital Archaeology, Computer Science

In this paper, we present the development of a training dataset for Dutch Named Entity Recognition (NER) in the archaeology domain. This dataset was created as there is a dire need for semantic search within archaeology, in order to allow archaeologists to find structured information in collections of Dutch excavation reports, currently totalling around 60,000 (658 million words) and growing rapidly. To guide this search task, NER is needed. We created rigorous annotation guidelines in an iterative process, then instructed five archaeology students to annotate a number of documents. The resulting dataset contains ~31k annotations between six entity types (artefact, time period, place, context, species & material). The inter-annotator agreement is 0.95, and when we used this data for machine learning, we observed an increase in F1 score from 0.51 to 0.70 in comparison to a machine learning model trained on a dataset created in prior work. This indicates that the data is of high quality, and can confidently be used to train NER classifiers

Leiden University Scholarly Publications

User Requirement Solicitation for an Information Retrieval System Applied to Dutch Grey Literature in the Archaeology Domain

Author: Alex Brandsen
Karsten Lambers
Milco Wansleeben
Suzan Verberne
Publication venue: 'Ubiquity Press, Ltd.'
Publication date: 01/03/2019

In this paper, we present the results of user requirement solicitation for a search system of grey literature in archaeology, specifically Dutch excavation reports. This search system uses Named Entity Recognition and Information Retrieval techniques to create an effective and effortless search experience. Specifically, we used Conditional Random Fields to identify entities, with an average accuracy of 56%. This is a baseline result, and we identified many possibilities for improvement. These entities were indexed in ElasticSearch and a user interface was developed on top of the index. This proof of concept was used in user requirement solicitation and evaluation with a group of end users. Feedback from this group indicated that there is a dire need for such a system, and that the first results are promising

Directory of Open Access Journals

Leiden University Scholarly Publications

User Requirement Solicitation for an Information Retrieval System Applied to Dutch Grey Literature in the Archaeology Domain

Author: Brandsen A.
Lambers K.
Verberne S.
Wansleeben M.
Publication venue: 'Ubiquity Press, Ltd.'
Publication date: 18/03/2019

Leiden University Scholarly Publications