Data Stories in CLARIAH: Developing a Research Infrastructure for Storytelling with Heritage and Culture Data

Abstract

Online stories, from blog posts to journalistic articles to scientific publications, are commonly illustrated with media (e.g. images, audio clips) or statistical summaries (e.g. tables and graphs). Such “illustrations” are the result of a process of acquiring, parsing, filtering, mining, representing, refining and interacting with data [3]. Unfortunately, this process is typically taken for granted and seldom mentioned in the story itself. Moreover, although a wide variety of interactive data visualisation techniques have recently been developed (see e.g. [6]), in many cases the illustrations in such publications are static, which prevents different audiences from engaging with the data and analyses as they desire. In this paper, we share our experiences with the concept of “data stories”, which tackles both issues and enhances opportunities for outreach, reporting on scientific inquiry, and FAIR data representation [9].
In journalism, data stories are becoming widely accepted as the output of a process that is in many respects similar to that of a computational scholar: gaining insights by analyzing data sets using (semi-)automated methods and presenting these insights using (interactive) visualizations and other textual outputs based on data [4] [7] [5] [6]. In the context of scientific output, data stories can be regarded as digital “publications enriched with or linking to related research results, such as research data, workflows, software, and possibly connections among them” [1]. However, as infrastructure for (peer-reviewed) enhanced publications is at an early stage of development (see e.g. [2]), scholarly data stories are currently often produced as blog posts discussing a relevant topic. These may be accompanied by illustrations that are not limited to a single graph or image but are characterized by different forms of interactivity: readers can, for instance, change the perspective or zoom level of graphs, or cycle through images or audio clips.
Having experimented successfully with various types and uses of data stories in the CLARIAH project, we are working towards a more generic, stable and sustainable infrastructure to create, publish, and archive data stories. This includes providing environments for the reproduction of data stories and the verification of data via “close reading”. From an infrastructure perspective, this involves the provisioning of services for persistent storage of data (e.g. triple stores), data registration and search (registries), data publication (SPARQL endpoints, search APIs), data visualization, and (versioned) query creation. These services can be used by environments for developing data stories, with or without support for additional data analysis steps. For data stories that make use of data analysis, for example via Jupyter Notebooks [8], the infrastructure also needs to take computational requirements (load balancing) and restrictions (security) into account. In addition, when data sets are restricted for copyright or privacy reasons, an authentication and authorization infrastructure (AAI) is required. The large and rich data sets in (European) heritage archives, which are increasingly made interoperable using FAIR principles, are fertile ground for data stories. We therefore hope to present our experiences with data stories, share our strategy for a more generic solution, and receive feedback on shared challenges.
