15,296 research outputs found
Good Applications for Crummy Entity Linkers? The Case of Corpus Selection in Digital Humanities
Over the last decade we have made great progress in entity linking (EL)
systems, but performance may vary depending on the context and, arguably, there
are even principled limitations preventing a "perfect" EL system. This also
suggests that there may be applications for which current "imperfect" EL is
already very useful, and makes finding the "right" application as important as
building the "right" EL system. We investigate the Digital Humanities use case,
where scholars spend a considerable amount of time selecting relevant source
texts. We developed WideNet, a semantically enhanced search tool which
leverages the strengths of (imperfect) EL without getting in the way of its
expert users. We evaluate this tool in two historical case-studies aiming to
collect a set of references to historical periods in parliamentary debates from
the last two decades; the first targeted the Dutch Golden Age, and the second
World War II. The case-studies conclude with a critical reflection on the
utility of WideNet for this kind of research, after which we outline how such a
real-world application can help to improve EL technology in general. Comment: Accepted for presentation at SEMANTiCS '1
Semantic Enrichment of a Multilingual Archive with Linked Open Data
This paper introduces MERCKX, a Multilingual Entity/Resource Combiner & Knowledge eXtractor. A case study involving the semantic enrichment of a multilingual archive is presented with the aim of assessing the relevance of natural language processing techniques such as named-entity recognition and entity linking for cultural heritage material. In order to improve the indexing of historical collections, we map entities to the Linked Open Data cloud using a language-independent method. Our evaluation shows that MERCKX outperforms similar tools on the task of place disambiguation and linking, achieving over 80% precision despite lower recall scores. These results are encouraging for small and medium-sized cultural institutions since they demonstrate that semantic enrichment can be achieved with limited resources. Peer reviewed
Data Centric Domain Adaptation for Historical Text with OCR Errors
We propose new methods for in-domain and cross-domain Named Entity Recognition (NER) on historical data for Dutch and French. For the cross-domain case, we address domain shift by integrating unsupervised in-domain data via contextualized string embeddings, and we address OCR errors by injecting synthetic OCR errors into the source domain, a form of data-centric domain adaptation. We propose a general approach to imitate OCR errors in arbitrary input data. Our cross-domain as well as our in-domain results outperform several strong baselines and establish state-of-the-art results. We publish preprocessed versions of the French and Dutch Europeana NER corpora.
Automated speech and audio analysis for semantic access to multimedia
The deployment and integration of audio processing tools can enhance the semantic annotation of multimedia content and, as a consequence, improve the effectiveness of conceptual access tools. This paper gives an overview of the various ways in which automatic speech and audio analysis can contribute to increased granularity of automatically extracted metadata. A number of techniques will be presented, including the alignment of speech and text resources, large vocabulary speech recognition, keyword spotting and speaker classification. The applicability of the techniques will be discussed from a media-crossing perspective. The added value of the techniques and their potential contribution to the content value chain will be illustrated by the description of two (complementary) demonstrators for browsing broadcast news archives.
DHBeNeLux : incubator for digital humanities in Belgium, the Netherlands and Luxembourg
Digital Humanities BeNeLux is a grassroots initiative to foster knowledge networking and dissemination in digital humanities in Belgium, the Netherlands, and Luxembourg. This special issue highlights a selection of the work that was presented at the DHBenelux 2015 Conference, as an anthology of the digital humanities work currently being done in the Benelux area and beyond. The introduction describes why this grassroots initiative came about, how DHBenelux currently supports community building and knowledge exchange for digital humanities in the Benelux area, and how this integrates regional digital humanities into the larger international digital humanities environment.
From media crossing to media mining
This paper reviews how the concept of Media Crossing has contributed to the advancement of the application domain of information access and explores directions for a future research agenda. These include themes that could help to broaden the scope and to incorporate the concept of media crossing in a more general approach that not only uses combinations of medium-specific processing, but also exploits more abstract medium-independent representations, partly based on the foundational work on statistical language models for information retrieval. Three examples of successful applications of media crossing will be presented, with a focus on the aspects that could be considered a first step towards a generalized form of media mining.
PoliMedia - Improving Analyses of Radio, TV & Newspaper Coverage of Political Debates
Analysing media coverage across several types of media outlets is a
challenging task for academic researchers. The PoliMedia project aimed to
showcase the potential of cross-media analysis by linking the digitised transcriptions
of the debates at the Dutch Parliament (Dutch Hansard) with three
media outlets: 1) newspapers in their original layout of the historical newspaper
archive at the National Library, 2) radio bulletins of the Dutch National Press
Agency (ANP) and 3) newscasts and current affairs programs from the Netherlands
Institute for Sound and Vision. In this paper we describe generally how
these links were created and we introduce the PoliMedia search user interface
developed for scholars to navigate the links. An evaluation found that the
linking algorithm had a recall of 67% and precision of 75%. Moreover, in an
eye tracking evaluation we found that the interface enabled scholars to perform
known-item and exploratory searches for qualitative analysis.