Over the course of its four-year project timeline, the CENDARI project has
collected archival descriptions and metadata in various formats from a broad
range of cultural heritage institutions. These data were drawn together and
stored in a single repository. The repository contains curated data that were
manually established by the CENDARI team, as well as data acquired from small,
‘hidden’ archives in spreadsheet format or from large aggregators with
advanced data exchange tools in place. While the acquisition
and curation of heterogeneous data in a single repository presents a technical
challenge in itself, the ingestion of data into the CENDARI repository also
makes it possible to process and index them through data extraction, entity
recognition, semantic enhancement and other transformations. In this way, the
CENDARI project was able to act as a bridge between cultural heritage
institutions and historical researchers, insofar as it drew together holdings
from a broad range of institutions and enabled the browsing of this
heterogeneous content within a single search space. This paper describes the
various ways in which the CENDARI project acquired data from cultural heritage
institutions, along with the necessary technical background. By illustrating
diverse data creation and acquisition strategies, multiple formats and
technical solutions, and the assets and drawbacks of a repository, this “White
Book” aims to provide guidance, advice and best practices for archivists and
cultural heritage institutions collaborating or planning to
collaborate with infrastructure projects.

The CENDARI White Book of Archives.
http://www.cendari.eu/thematic-research-guides/white-book-archives
Available from: http://hdl.handle.net/2262/7568