Good Applications for Crummy Entity Linkers? The Case of Corpus Selection in Digital Humanities
Over the last decade we have made great progress in entity linking (EL) systems, but performance may vary depending on the context and, arguably, there are even principled limitations preventing a "perfect" EL system. This suggests that there may be applications for which current "imperfect" EL is already very useful, which makes finding the "right" application as important as building the "right" EL system. We investigate the Digital Humanities use case, where scholars spend a considerable amount of time selecting relevant source texts. We developed WideNet, a semantically-enhanced search tool that leverages the strengths of (imperfect) EL without getting in the way of its expert users. We evaluate this tool in two historical case studies that aim to collect references to historical periods in parliamentary debates from the last two decades: the first targets the Dutch Golden Age and the second World War II. The case studies conclude with a critical reflection on the utility of WideNet for this kind of research, after which we outline how such a real-world application can help to improve EL technology in general.
Comment: Accepted for presentation at SEMANTiCS '1
Index-Driven Digitization and Indexation of Historical Archives
The promise of digitizing historical archives lies in indexing them at the level of their content. Unfortunately, this kind of indexation does not scale if done manually. In this article we present a method to bootstrap the deployment of a content-based information system for digitized historical archives, relying on historical indexing tools. Such indexes were commonly prepared to search within homogeneous records while the archive was still current, and they were as widespread as they were disconnected, that is to say situated in the very records they were meant to index. We first present a conceptual model to describe and manipulate historical indexing tools. We then introduce a methodological framework for using them to guide digitization campaigns and to index digitized historical records. Finally, we exemplify the approach with a case study on the indexation system of the X Savi alle Decime in Rialto, a Venetian magistracy in charge of levying a tax on real estate in early modern Venice and of keeping the related records.