Good Applications for Crummy Entity Linkers? The Case of Corpus Selection in Digital Humanities

Author: Beelen Kaspar
Kamps Jaap
Marx Maarten
Olieman Alex
van Lange Milan
Publication date: 01/01/2017

Over the last decade we have made great progress in entity linking (EL) systems, but performance may vary depending on the context and, arguably, there are even principled limitations preventing a "perfect" EL system. This also suggests that there may be applications for which current "imperfect" EL is already very useful, and makes finding the "right" application as important as building the "right" EL system. We investigate the Digital Humanities use case, where scholars spend a considerable amount of time selecting relevant source texts. We developed WideNet, a semantically enhanced search tool that leverages the strengths of (imperfect) EL without getting in the way of its expert users. We evaluate this tool in two historical case studies aiming to collect a set of references to historical periods in parliamentary debates from the last two decades; the first targeted the Dutch Golden Age, and the second World War II. The case studies conclude with a critical reflection on the utility of WideNet for this kind of research, after which we outline how such a real-world application can help to improve EL technology in general. Comment: Accepted for presentation at SEMANTiCS '1

arXiv.org e-Print Archive

International Migration, Integration and Social Cohesion online publications

UvA-DARE

Entity-Centric Text Mining for Historical Documents

Author: Coll Ardanuy Maria
Publication date: 07/07/2017

Georg-August-University Göttingen

Semantic Enrichment of a Multilingual Archive with Linked Open Data

Author: De Wilde Max
Hengchen Simon
Publication date: 01/01/2017

This paper introduces MERCKX, a Multilingual Entity/Resource Combiner & Knowledge eXtractor. A case study involving the semantic enrichment of a multilingual archive is presented with the aim of assessing the relevance of natural language processing techniques such as named-entity recognition and entity linking for cultural heritage material. In order to improve the indexing of historical collections, we map entities to the Linked Open Data cloud using a language-independent method. Our evaluation shows that MERCKX outperforms similar tools on the task of place disambiguation and linking, achieving over 80% precision despite lower recall scores. These results are encouraging for small and medium-sized cultural institutions since they demonstrate that semantic enrichment can be achieved with limited resources. Peer reviewed

DI-fusion

Helsingin yliopiston digitaalinen arkisto

DARIAH and the Benelux

Author: Backes Marianne
Chambers Sally
Hoogerwerf Maarten
Van der West Jan
Publication venue: Department of Applied Linguistics, Translators and Interpreters, University of Antwerp
Publication date: 01/01/2015

Ghent University Academic Bibliography

Data Centric Domain Adaptation for Historical Text with OCR Errors

Author: Lladós J.
Lopresti D.
Marz Luisa
Poerner Nina
Roth Benjamin
Schweter Stefan
Schütze Hinrich
Uchida S.
Publication venue: Springer Science and Business Media LLC
Publication date: 02/09/2021

We propose new methods for in-domain and cross-domain Named Entity Recognition (NER) on historical data for Dutch and French. For the cross-domain case, we address domain shift by integrating unsupervised in-domain data via contextualized string embeddings, and OCR errors by injecting synthetic OCR errors into the source domain, thereby approaching domain adaptation in a data-centric way. We propose a general approach to imitate OCR errors in arbitrary input data. Both our cross-domain and our in-domain results outperform several strong baselines and establish state-of-the-art results. We publish preprocessed versions of the French and Dutch Europeana NER corpora.
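
The idea of injecting synthetic OCR errors into clean text can be illustrated with a minimal sketch. This is not the authors' method: the confusion table `OCR_CONFUSIONS`, the function name `inject_ocr_noise`, and the `error_rate` parameter are all hypothetical and chosen only to show character-level noise injection.

```python
import random
from typing import Optional

# Hypothetical look-alike substitutions of the kind often seen in OCR output.
# These pairs are illustrative assumptions, not taken from the paper.
OCR_CONFUSIONS = {
    "e": ["c", "o"],
    "l": ["1", "i"],
    "o": ["0", "c"],
    "s": ["f", "5"],
    "n": ["u", "m"],
    "m": ["rn"],
}


def inject_ocr_noise(text: str, error_rate: float = 0.1,
                     seed: Optional[int] = None) -> str:
    """Replace confusable characters with a look-alike at the given rate."""
    rng = random.Random(seed)  # local generator keeps runs reproducible
    noisy = []
    for ch in text:
        substitutes = OCR_CONFUSIONS.get(ch)
        # Only characters listed in the table can be corrupted.
        if substitutes and rng.random() < error_rate:
            noisy.append(rng.choice(substitutes))
        else:
            noisy.append(ch)
    return "".join(noisy)


if __name__ == "__main__":
    print(inject_ocr_noise("de minister van financien", error_rate=0.3, seed=42))
```

Under these assumptions, a clean training corpus could be passed through `inject_ocr_noise` before model training so that the source domain resembles noisy historical text; a realistic setup would estimate the confusion table from aligned OCR and ground-truth data rather than hard-coding it.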

Open Access LMU

Automated speech and audio analysis for semantic access to multimedia

Author: Huijbregts Marijn
Jong Franciska de
Ordelman Roeland
Publication venue: Springer Verlag
Publication date: 01/01/2006

The deployment and integration of audio processing tools can enhance the semantic annotation of multimedia content and, as a consequence, improve the effectiveness of conceptual access tools. This paper gives an overview of the various ways in which automatic speech and audio analysis can contribute to increased granularity of automatically extracted metadata. A number of techniques will be presented, including the alignment of speech and text resources, large vocabulary speech recognition, keyword spotting and speaker classification. The applicability of techniques will be discussed from a media-crossing perspective. The added value of the techniques and their potential contribution to the content value chain will be illustrated by the description of two (complementary) demonstrators for browsing broadcast news archives.

University of Twente Research Information

DHBeNeLux: incubator for digital humanities in Belgium, the Netherlands and Luxembourg

Author: Chambers Sally
Jones Catherine
Kestemont Mike
Koolen Marijn
van Zundert Joris
Publication date: 01/01/2017

Digital Humanities BeNeLux is a grassroots initiative to foster knowledge networking and dissemination in digital humanities in Belgium, the Netherlands, and Luxembourg. This special issue highlights a selection of the work presented at the DHBenelux 2015 Conference, as an anthology of the digital humanities research currently being done in the Benelux area and beyond. The introduction describes why this grassroots initiative came about, how DHBenelux currently supports community building and knowledge exchange for digital humanities in the Benelux area, and how this integrates regional digital humanities into the larger international digital humanities environment.

Ghent University Academic Bibliography

Institutional Repository Universiteit Antwerpen

From media crossing to media mining

Author: Jong Franciska de
Publication date: 01/01/2006

This paper reviews how the concept of Media Crossing has contributed to the advancement of the application domain of information access and explores directions for a future research agenda. These will include themes that could help to broaden the scope and to incorporate the concept of medium-crossing in a more general approach that not only uses combinations of medium-specific processing, but also exploits more abstract medium-independent representations, partly based on the foundational work on statistical language models for information retrieval. Three examples of successful applications of media crossing will be presented, with a focus on the aspects that could be considered a first step towards a generalized form of media mining.

University of Twente Research Information

PoliMedia - Improving Analyses of Radio, TV & Newspaper Coverage of Political Debates

Author: D. Juric
J. Bignell
M. Kemman
R.A. Santen Van
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2013

Analysing media coverage across several types of media outlets is a challenging task for academic researchers. The PoliMedia project aimed to showcase the potential of cross-media analysis by linking the digitised transcriptions of the debates at the Dutch Parliament (Dutch Hansard) with three media outlets: 1) newspapers in their original layout of the historical newspaper archive at the National Library, 2) radio bulletins of the Dutch National Press Agency (ANP) and 3) newscasts and current affairs programs from the Netherlands Institute for Sound and Vision. In this paper we describe in general terms how these links were created and we introduce the PoliMedia search user interface developed for scholars to navigate the links. In evaluation, the linking algorithm achieved a recall of 67% and a precision of 75%. Moreover, in an eye-tracking evaluation we found that the interface enabled scholars to perform known-item and exploratory searches for qualitative analysis.

Crossref

EUR Research Repository

Erasmus University Digital Repository

Open Repository and Bibliography - Luxembourg