20 research outputs found

    Source Oriented Harmonization of Aggregate Historical Census Data: a Flexible and Accountable Approach in RDF

    Historical censuses are one of the most challenging datasets to compare over time. While many (successful) efforts have been made by researchers to harmonize these types of data, the lack of a generic workflow thwarts other researchers in their endeavors to do the same. In order to use historical census data for longitudinal analysis, a common process, currently often loosely referred to as harmonization, is inevitable. This process becomes even more challenging when dealing with aggregate data. Current approaches, whether focusing on micro or aggregate data, mainly provide specific, goal-oriented solutions to this problem. The nature of our data calls for an approach which allows different interpretations and preserves the link to the underlying sources at all times. To realize this we need a flexible, bottom-up harmonization process which allows us to iteratively discover the peculiarities of these types of data and provide different interpretations of the same data in an accountable way. In this article, we propose an approach which we refer to as source-oriented harmonization. We use the Resource Description Framework (RDF) as the technological backbone of our efforts and aim to make the process of harmonization more graspable for others to stimulate similar efforts.

    The Aggregate Dutch Historical Censuses

    Historical censuses have an enormous potential for research. In order to fully use this potential, harmonization of these censuses is essential. During the last decades, enormous efforts have been undertaken in digitizing the published aggregated outcomes of the Dutch historical censuses (1795-1971). Although accessibility has improved enormously, researchers must cope with hundreds of heterogeneous and disconnected Excel tables. As a result, the census is still for the most part an untapped source of information. The authors describe the main harmonization challenges of the census and how they work toward one harmonized dataset. They propose a specific approach and model for creating an interlinked census dataset in the Semantic Web using the Resource Description Framework technology.

    Integrating Historical Census Data in the Semantic Web

    Historical censuses are one of the most consulted, reliable and large scale statistical data sources available, describing the demographic, social and economic history of a nation. To answer their research questions, researchers often need to query long time series of census data. However, such longitudinal queries are typically hampered by the scarce integration of the historical censuses, demanding manual and knowledge intensive harmonization and restructuring in order to obtain meaningful comparisons over time. The challenges are even harder if provenance microdata is lost. In this paper we describe the methodology followed in CEDAR, a project of the Computational Humanities Programme, to provide solutions to these data issues in the Dutch historical censuses (1795-1971). Our proposal builds on top of Linked Data and the Resource Description Framework (RDF) technology, allowing us to transform the original census tables into a graph of Linked Census Data. With such a graph, every census data item can be interlinked on the Web with other hubs of historical socioeconomic and demographic information. By following the Linked Data principles, our aim is two-fold. On the one hand, we show how the integration of our own historical census data is improved by linking them to the network of Linked Historical Datasets on the Web. On the other hand, we envisage new historical classifications (like demographical structures, housing types, occupational classes and statuses, or religious denominations) coming out of our harmonization process, which are not yet published on the Web in a standard manner and could improve the interoperability of other datasets.

    From Napoleon Conquests to the Big Brother Sabotage: Harmonization of the Dutch Historical Censuses in the Semantic Web

    Around the turn of the 18th century, the first integral population enumeration was held in the Netherlands during the Batavian Republic. It took over 30 years before the first official census was, by royal decree, organized and conducted in 1829, and was meant to be held from then onwards every ten years. The Dutch historical censuses are the only large scale, reliable statistical datasets available about the (demographic, social and economic) history of the Netherlands, covering an all-encompassing geographical area for over two centuries (1795–1971). Not surprisingly, the currently preserved and digitized historical censuses are the historical statistics most consulted by researchers. However, the 2,288 census tables are highly disconnected and scarcely integrated in their current form. Meaningful information is still hidden in these missing table links, meaning that this wealth of information is not reaped to its full potential. In this paper we describe the lessons learnt in CEDAR, a project of the Computational Humanities Programme, to provide solutions to these integration problems. Our system leverages semantic technologies and Linked Data practices, which allow us to convert the census tables into a graph of fine-grained Linked Census Data. Using the distributed architecture of the Web, we interlink this graph with other online historical socioeconomic and demographic Linked Datasets. We use the information provided by these external links to guide the harmonization process in our dataset. At the same time, we investigate which historical classifications are not yet online following Web standards, and we use our census tables (on demographic structures, housing types, occupational classes and statuses, and religious denominations) to stress the need to publish these historical classifications on the Web. Such historical hubs could enormously increase the interoperability of other datasets. Finally, we propose a querying pipeline on the resulting harmonized census dataset to enhance the data exploration work of historians and social scientists and help answer their research questions.

    The CEDAR Project: Publishing and Consuming Harmonized Census Data

    This paper discusses the use of semantic technologies to increase quality, machine-processability, format translatability and cross-querying of complex tabular datasets often found in many research areas of the Humanities. In particular, we are interested in enabling longitudinal studies of social processes in the past. We use the historical Dutch censuses as a case study: census data is currently digitized, but it is notoriously difficult to compare, aggregate and query in a uniform fashion. We describe an approach to achieve these goals, emphasizing open problems and trade-offs.

    CEDAR: The Dutch Historical Censuses as Linked Open Data

    Here, we describe the CEDAR dataset, a five-star Linked Open Data representation of the Dutch historical censuses. These were conducted in the Netherlands once every 10 years from 1795 to 1971. We produce a linked dataset from a digitized sample of 2,288 tables. It contains more than 6.8 million statistical observations about the demography, labour and housing of Dutch society in the 18th, 19th and 20th centuries. The dataset is modeled using the RDF Data Cube, Open Annotation, and PROV vocabularies. These are used to represent the multidimensionality of the data, to express rules of data harmonization, and to keep track of the provenance of all data points and their transformations, respectively. We link observations within the dataset to well-known standard classification systems in social history, such as the Historical International Standard Classification of Occupations (HISCO) and the Amsterdamse Code (AC). The three contributions of the dataset are (1) easier access to integrated census data for historical researchers; (2) richer connections to related Linked Data resources; and (3) novel concept schemes of historical relevance, like classifications of historical religions and historical house types.

    CEDAR: Linked Open Census Data: Project Statement

    Census Data Open Linked. From fragment to fabric - Dutch census data in a web of global cultural and historic information (CEDAR) is an ongoing (2011-2015) Dutch multidisciplinary national research project. It is funded by the Royal Netherlands Academy of Arts and Sciences (KNAW) as part of the Computational Humanities Programme. Its participants are Data Archiving and Networked Services (DANS), the VU University Amsterdam, the International Institute of Social History (IISH) and the Erasmus University Rotterdam.