
    Yavaa: supporting data workflows from discovery to visualization

    Recent years have witnessed an increasing number of data silos being opened up, both within organizations and to the general public: scientists publish their raw data as supplements to articles or even as standalone artifacts to enable others to verify and extend their work. Governments pass laws to open up formerly protected data treasures to improve accountability and transparency, and to enable new business ideas based on this public good. Even companies share structured information about their products and services to advertise their use and thus increase revenue. Exploiting this wealth of information holds many challenges for users, though. Oftentimes data is provided as tables whose seemingly endless rows of daunting numbers are barely accessible. Information visualization (InfoVis) can mitigate this gap. However, the visualization options offered are generally very limited, and next to no support is given in applying any of them. The same holds true for data wrangling: only very few options exist to adjust the data to the needs at hand, and barely any safeguards are in place to prevent even the most obvious mistakes. When it comes to data from multiple providers, the situation gets even bleaker. Only recently have tools emerged that allow searching for datasets across institutional borders in a reasonable way. Easy-to-use ways to combine these datasets are still missing, though. Finally, results generally lack proper documentation of their provenance, so even the most compelling visualizations can be called into question when their origin remains unclear. The foundations for a vivid exchange and exploitation of open data are set, but the barrier to entry remains relatively high, especially for non-expert users. This thesis aims to lower that barrier by providing tools and assistance, reducing the amount of prior experience and skills required. It covers the whole workflow, from identifying suitable datasets, through possible transformations, to exporting the result in the form of suitable visualizations.
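The provenance documentation mentioned above can be illustrated with a minimal sketch: every operation applied to the data is logged alongside its parameters, so the final result carries a trace of how it came about. All names here (Workflow, apply) are hypothetical illustrations, not part of the Yavaa system.

```python
from dataclasses import dataclass, field

@dataclass
class Workflow:
    """Toy provenance log: each transformation of the data is recorded."""
    data: list
    log: list = field(default_factory=list)

    def apply(self, name, fn, **params):
        # Run the transformation and remember what was done and with which parameters.
        self.data = fn(self.data, **params)
        self.log.append({"op": name, "params": params})
        return self

wf = Workflow(data=[{"year": 2020, "value": 3}, {"year": 2021, "value": 5}])
wf.apply("filter",
         lambda rows, min_value: [r for r in rows if r["value"] >= min_value],
         min_value=4)

print(wf.data)  # [{'year': 2021, 'value': 5}]
print(wf.log)   # [{'op': 'filter', 'params': {'min_value': 4}}]
```

Because the log records operations rather than results, the same trace can be replayed on updated input data or exported as documentation of the result's provenance.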

    Open Data Kit Goes Semantic - A Contribution to the Interpretability and Interoperability of Citizen Science Data

    In citizen science, data collection via mobile applications plays an increasingly important role. In recent years, a whole range of software frameworks has emerged that enable the simple creation of surveys and the collection of data through smartphone applications. While these frameworks support the creation and execution of such data-collection surveys, data export is usually limited to standard tabular formats such as CSV or Excel. Since only little metadata is collected alongside the actual data, the semantics of the data (what was measured or observed, and how?) often remain uncaptured. This hampers the reuse of such citizen science data beyond its initial context, as interpretability and integration with other data are impaired. Our contribution presents a method and an accompanying implementation that allow researchers to easily enrich their campaign surveys semantically and to export the collected data flexibly. Besides classical formats such as XML, export to RDF (Linked Open Data) is also supported, which enables linking the data with its machine-readable meaning. The implementation was realized as an extension of the widely used data-collection framework Open Data Kit 1 (ODK1) and is freely available. It thus contributes to the interoperability and interpretability of citizen science data.
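The core idea of the RDF export can be sketched in a few lines: each survey answer becomes a set of triples that tie the value to a machine-readable predicate. This is a stdlib-only illustration; the helper, the example predicates under example.org, and the field names are assumptions, not the actual ODK extension's API (sosa:Observation, however, is a real class from the W3C SOSA ontology).

```python
def row_to_turtle(row_id, answers, prefix="http://example.org/obs/"):
    """Serialize one survey record as RDF triples in Turtle syntax.

    Each answered field is mapped to a predicate URI, so the exported data
    carries its meaning instead of being a bare CSV cell.
    """
    subject = f"<{prefix}{row_id}>"
    lines = [f"{subject} a <http://www.w3.org/ns/sosa/Observation> ;"]
    triples = []
    for field_name, value in answers.items():
        predicate = f"<{prefix}field/{field_name}>"
        triples.append(f'    {predicate} "{value}"')
    return "\n".join(lines + [" ;\n".join(triples) + " ."])

print(row_to_turtle("r1", {"species": "Apis mellifera", "count": "3"}))
```

A real export would additionally map field names to terms from established vocabularies (e.g. Darwin Core for species observations) rather than minting ad-hoc predicates.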

    FAIR for Digital Twins

    We argue for transferring the experiences made in science regarding data handling and exchange to address current data management challenges in industry. Improving the interoperability of data exchange in business settings as well will lower costs and open up new opportunities, both in advertising one's own products and services and in finding suitable suppliers and partners.

    Analysis of Consistency between Wikidata and Wikipedia Categories

    Wikipedia categories play a significant role in organizing articles by topic. They form a hierarchy, which groups related articles into larger collections. Wikidata provides a corresponding item for each category and allows the membership of other items in a specific category to be defined by a SPARQL query or by specifying classes and properties. This provides us with multiple, redundant sources of category membership, which may deviate quite substantially. In this paper, we investigate inconsistencies between Wikipedia and Wikidata category members and analyze possible reasons. We propose a candidate category generation and evaluation workflow that traverses the category hierarchy of Wikipedia in all available languages and compares the results with information obtained from Wikidata. This workflow can be executed either online, using the publicly available endpoints, or offline, based on the provided dumps. Furthermore, we formulate concrete suggestions to harmonize category membership definitions between Wikipedia and Wikidata.
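The comparison at the heart of such a workflow reduces to a set operation over the two membership sources. The sketch below is illustrative only: the function names are assumptions, the endpoint call is omitted, and the real workflow traverses the full category hierarchy across languages. P31 ("instance of") is a real Wikidata property used here as an example of a class-based membership definition.

```python
def members_query(class_qid):
    """SPARQL text selecting all items that are instances (wdt:P31) of a
    given class, one possible Wikidata-side membership definition."""
    return f"SELECT ?item WHERE {{ ?item wdt:P31 wd:{class_qid} . }}"

def inconsistencies(wikipedia_members, wikidata_members):
    """Items listed under the Wikipedia category but missing on the Wikidata
    side, and vice versa."""
    wp, wd = set(wikipedia_members), set(wikidata_members)
    return {"only_wikipedia": wp - wd, "only_wikidata": wd - wp}

diff = inconsistencies({"Q1", "Q2", "Q3"}, {"Q2", "Q3", "Q4"})
print(diff)  # {'only_wikipedia': {'Q1'}, 'only_wikidata': {'Q4'}}
```

Running the query against the public Wikidata SPARQL endpoint (or the dumps, offline) would yield the `wikidata_members` set; the Wikipedia side comes from traversing the category graph.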

    Showcase: Biodiversity Exploratories Information System Report of our current stage of migration to the new BEXIS2 instance

    The development of the Biodiversity Exploratories Information System (here BExIS1) started more than 10 years ago, and since then it has been functioning as the data management platform and information system for the SPP 1374 “Biodiversitäts-Exploratorien” (BE). Since then, more than 1,000 datasets with a total of more than 20,000 variables and several million data rows have been uploaded to the system. A dataset consists of metadata, a data structure, and the research data itself. Besides the storage of tabular data, it is also possible to upload unstructured data (files) to the system. Stored data is subject to common operations such as updating, editing, and deletion. These actions lead to various changes inside the datasets, which are reflected as versions or archived data. Of course, a lot of additional information needed to run such a system is stored as well. This mostly concerns authorization and authentication, but plenty of personal information is included, too. Migrating data from one system to another is always a major and sometimes thrilling task: data needs to be transferred without alterations. BEXIS2 uses a different schema to store and handle its data. Furthermore, some functionality is implemented differently from BExIS1; examples include the concept of variables and the user management. These changes need to be considered during data migration, and the transferred data needs to be adjusted to adhere to the changed model. We implemented a system to transfer data from a BExIS1 instance to a BEXIS2 instance. BExIS1 data is accessed directly or by using BExIS1 functionalities. The data is then stored in BEXIS2 by making use of BEXIS2 API calls and auxiliary information. The main auxiliary source is an instance-specific mapping of BExIS1 variables to the different BEXIS2 variable concept.
    We show our implementation of the system to migrate a production-level BExIS1 instance into a newly set up BEXIS2 instance. We show the steps necessary and the obstacles encountered, together with our implemented solutions. We intend to make our implementation available to other BExIS1 instance users to facilitate and ease the migration to the new BEXIS2 system.
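The instance-specific variable mapping described above can be sketched roughly as follows. Everything here is an assumption for illustration: the mapping table, field names, and data types are invented, and the real migration writes to BEXIS2 via its API rather than returning dictionaries.

```python
# Hypothetical instance-specific mapping from BExIS1 variable names to the
# BEXIS2 variable concept (a target identifier plus a data type).
VARIABLE_MAP = {
    "plot_id":    {"bexis2_id": 101, "datatype": "string"},
    "tree_count": {"bexis2_id": 102, "datatype": "integer"},
}

def migrate_row(bexis1_row):
    """Translate one BExIS1 data row into the target structure, converting
    values to the mapped data type. Unmapped variables are reported instead
    of being silently dropped, so the mapping can be completed."""
    converters = {"string": str, "integer": int}
    migrated, unmapped = {}, []
    for name, value in bexis1_row.items():
        target = VARIABLE_MAP.get(name)
        if target is None:
            unmapped.append(name)
            continue
        migrated[target["bexis2_id"]] = converters[target["datatype"]](value)
    return migrated, unmapped

row, missing = migrate_row({"plot_id": "AEG01", "tree_count": "42", "notes": "x"})
print(row)      # {101: 'AEG01', 102: 42}
print(missing)  # ['notes']
```

Reporting unmapped variables instead of dropping them matters for a migration that must transfer data "without alterations": every gap in the mapping surfaces before any data is lost.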

    Large-Scale Data Management for Earth Observation Data - Challenges and Opportunities

    Earth observation (EO) has attracted growing interest in research and industry, as it covers a wide range of applications, from land monitoring, climate change detection, and emergency management to atmosphere monitoring, among others. Due to the sheer size and heterogeneity of the data, EO poses tremendous challenges to the payload ground segment, which must receive, store, process, and preserve the data for later investigation by end users. In this paper, we describe the challenges of large-scale data management based on observations from a real system employed for EO at the German Remote Sensing Data Center. We outline research opportunities, which can serve as starting points to spark new research efforts in the management of large volumes of scientific data.

    Digital Knowledge Exchange for Circularity of Materials

    In the project "Methods and Technologies for an intelligent Circularity of Materials" (MaTiC-M), we aim to support the communication between stakeholders, or make it possible in the first place, using digital technologies, in particular knowledge graphs that capture the most relevant information. In our talk, we will give an overview of the project goals as well as its current status.