
    A Method for Record Linkage with Sparse Historical Data

    Massive digitization of archival material, coupled with automatic document processing techniques and data visualisation tools, offers great opportunities for reconstructing and exploring the past. An unprecedented wealth of historical data (e.g. names of persons, places, transaction records) can indeed be gathered through the transcription and annotation of digitized documents, thereby fostering large-scale studies of past societies. Yet the transformation of hand-written documents into well-represented, structured and connected data is not straightforward and requires several processing steps. In this regard, a key issue is entity record linkage, a process aiming at linking different mentions in texts which refer to the same entity. Also known as entity disambiguation, record linkage is essential in that it makes it possible to identify genuine individuals, to aggregate multi-source information about single entities, and to reconstruct networks across documents and document series. In this paper we present an approach to automatically identify coreferential entity mentions of type Person in a data set derived from Venetian apprenticeship contracts from the early modern period (16th-18th c.). Taking advantage of a manually annotated sub-part of the document series, we compute distances between pairs of mentions, combining various similarity measures based on (sparse) context information and person attributes.
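The idea of combining several similarity measures into a single pairwise distance can be sketched as follows. This is a minimal illustration, not the paper's actual method: the field names (`name`, `profession`) and the weighting are hypothetical, and the paper combines richer context and attribute measures.

```python
from difflib import SequenceMatcher

def name_similarity(a: str, b: str) -> float:
    """String similarity between two person-name mentions, in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def combined_distance(m1: dict, m2: dict, weights=(0.6, 0.4)) -> float:
    """Weighted distance between two mention records.

    Combines a name-string similarity with a simple attribute match;
    the keys and weights here are illustrative placeholders.
    """
    w_name, w_attr = weights
    s_name = name_similarity(m1["name"], m2["name"])
    s_attr = 1.0 if m1.get("profession") == m2.get("profession") else 0.0
    similarity = w_name * s_name + w_attr * s_attr
    return 1.0 - similarity
```

Mention pairs whose combined distance falls below a threshold (tuned on the manually annotated sub-part) would then be linked as the same person.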

    Reconciling and Using Historical Person Registers as Linked Open Data in the AcademySampo Portal and Data Service

    This paper presents a method for extracting and reassembling a genealogical network automatically from a biographical register of historical people. The method is applied to a dataset of short textual biographies about all 28 000 Finnish and Swedish academic people educated in 1640-1899 in Finland. The aim is to connect and disambiguate the relatives mentioned in the biographies in order to build a continuous genealogical network, which can be used in Digital Humanities for data and network analysis of historical academic people and their lives. An artificial neural network approach is presented for solving a supervised learning task: disambiguating relatives mentioned in the register descriptions using basic biographical information, enhanced with an ontology of vocations and additional, occasionally sparse, genealogical information. Evaluation results of the record linkage are promising and provide novel insights into the problem of historical person register reconciliation. The outcome of the work has been used in practice as part of the in-use AcademySampo portal and linked open data service, a new member in the Sampo series of cultural heritage applications for Digital Humanities.
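The supervised pairwise setup can be sketched as follows: each (biography, mentioned relative) candidate pair is turned into a feature vector and scored. The keys (`surname`, `vocation`, `birth_year`) and the linear scorer are assumptions for illustration; the paper's feature set is richer and its scorer is a neural network.

```python
def pair_features(p1: dict, p2: dict) -> list:
    """Features for one (biography, mentioned relative) candidate pair.

    Field names are hypothetical; the paper additionally exploits a
    vocation ontology and sparse genealogical information.
    """
    return [
        1.0 if p1["surname"] == p2["surname"] else 0.0,
        1.0 if p1.get("vocation") == p2.get("vocation") else 0.0,
        # plausible parent-child age gap: 15-60 years
        1.0 if 15 <= abs(p1["birth_year"] - p2["birth_year"]) <= 60 else 0.0,
    ]

def score(features, weights=(0.5, 0.3, 0.2)) -> float:
    """Stand-in linear scorer; the paper trains a neural network instead."""
    return sum(w * f for w, f in zip(weights, features))
```

Candidate pairs scoring above a learned threshold would be accepted as links, stitching individual biographies into a continuous genealogical network.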

    Avoiding disclosure of individually identifiable health information: a literature review

    Achieving data and information dissemination without harming anyone is a central task of any entity in charge of collecting data. In this article, the authors examine the literature on data and statistical confidentiality. Rather than comparing the theoretical properties of specific methods, they emphasize the main themes that emerge from the ongoing discussion among scientists regarding how best to achieve the appropriate balance between data protection, data utility, and data dissemination. They cover the literature on de-identification and reidentification methods with emphasis on health care data. The authors also discuss the benefits and limitations of the most common access methods. Although there is abundant theoretical and empirical research, their review reveals a lack of consensus on fundamental questions for empirical practice: how to assess disclosure risk, how to choose among disclosure methods, how to assess reidentification risk, and how to measure utility loss.
    Keywords: public use files, disclosure avoidance, reidentification, de-identification, data utility

    Analysing Timelines of National Histories across Wikipedia Editions: A Comparative Computational Approach

    Portrayals of history are never complete, and each description inherently exhibits a specific viewpoint and emphasis. In this paper, we aim to automatically identify such differences by computing timelines and detecting temporal focal points of written history across languages on Wikipedia. In particular, we study articles related to the history of all UN member states and compare them in 30 language editions. We develop a computational approach that identifies focal points quantitatively, and find that Wikipedia narratives about national histories (i) are skewed towards more recent events (recency bias) and (ii) are distributed unevenly across the continents, with significant focus on the history of European countries (Eurocentric bias). We also establish that national historical timelines vary across language editions, although average interlingual consensus is rather high. We hope that this paper provides a starting point for a broader computational analysis of written history on Wikipedia and elsewhere.
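A crude version of the timeline step can be sketched as follows: count year mentions in an article and take the most-mentioned years as temporal focal points. This is only a proxy for illustration; the paper's extraction handles date parsing and normalisation far more carefully.

```python
import re
from collections import Counter

def year_mentions(text: str, lo: int = 1000, hi: int = 2021) -> Counter:
    """Count four-digit year mentions (1000-2099 pattern) in a text."""
    years = (int(y) for y in re.findall(r"\b(1\d{3}|20\d{2})\b", text))
    return Counter(y for y in years if lo <= y <= hi)

def focal_points(counts: Counter, k: int = 3) -> list:
    """Top-k most-mentioned years: a simple notion of temporal focus."""
    return [y for y, _ in counts.most_common(k)]
```

Comparing such per-article year histograms across the 30 language editions is one way the recency and Eurocentric skews could be made visible.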

    An application of the variable-r method to subpopulation growth rates in a 19th century agricultural population

    This paper presents an analysis of the differential growth rates of the farming and non-farming segments of a rural Scottish community during the 19th and early 20th centuries using the variable-r method allowing for net migration. Using this method, I find that the farming population of Orkney, Scotland, showed less variability in their reproduction and growth rates than the non-farming population during a period of net population decline. I conclude by suggesting that the variable-r method can be used in general cases where the relative growth of subpopulations or subpopulation reproduction is of interest.
    Keywords: agricultural population, Scotland, subpopulation growth rates, variable-r method
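The core variable-r identity can be sketched as follows, in a standard form from the demographic literature; the paper's exact migration adjustment may differ. Here $N(x)$ is the population aged $x$, $\ell(x)$ is cohort survivorship, $r(a)$ the age-specific growth rate, and $m(a)$ the age-specific net migration rate:

```latex
% Closed population: those aged y are those aged x, survived from x to y
% and discounted by the intervening age-specific growth rates.
N(y) = N(x)\,\exp\!\left(-\int_{x}^{y} r(a)\,da\right)\frac{\ell(y)}{\ell(x)}

% Allowing for net migration, the exponent uses growth net of migration:
N(y) = N(x)\,\exp\!\left(-\int_{x}^{y} \bigl(r(a)-m(a)\bigr)\,da\right)\frac{\ell(y)}{\ell(x)}
```

Applying the identity separately to the farming and non-farming segments is what lets their growth and reproduction rates be compared directly.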