    The EDIT Platform for Cybertaxonomy - an integrated software environment for biodiversity research data management

    The Platform for Cybertaxonomy [1], developed as part of the EU Network of Excellence EDIT (European Distributed Institute of Taxonomy), is an open-source software framework covering the full breadth of the taxonomic workflow, from fieldwork to publication [2]. It provides tools for customized access to taxonomic data, for editing and data management, and for collaborative teamwork. At the core of the platform is the Common Data Model (CDM) [3], a comprehensive information model covering all relevant data domains: names and classifications, descriptive data (morphological and molecular), media, geographic information, literature, specimens, persons, and external resources [4]. The model adheres to community standards developed by the Biodiversity Information Standards organization TDWG [5]. Beyond its role as a software suite supporting the taxonomic workflow, the platform is a powerful information broker for a broad range of taxonomic data, providing stable and open interfaces, including a Java programmer’s library and the CDM REST Service Layer.

    In the context of the DFG-funded "Additivity" project ("Achieving additivity of structured taxonomic character data by persistently linking them to preserved individual specimens", DFG project number 310530378), we are developing components for capturing and processing formal descriptions of specimens, as well as algorithms for aggregating data from individual specimens in order to compute species-level descriptions [6]. Well-defined and agreed-upon descriptive vocabularies referring to structures, characters, and character states are instrumental in ensuring the consistency and comparability of measurements. This will be addressed with a new EDIT Platform module for specifying vocabularies based on existing ontologies for descriptive data.

    To ensure that these vocabularies can be re-used in different contexts, we are planning an interface to the Terminology Service developed by the German Federation for Biological Data (GFBio) [7]. The Terminology Service provides a standards-aware, harmonized semantic access point for distributed or locally stored ontologies required for biodiversity research data management, archiving, and publication processes [8]. The interface will build on a new OWL export function of the CDM library, which provides EDIT Platform vocabularies in a format that can be read by the import module of the Terminology Service. In addition, the EDIT Platform will be able to import semantic concepts from the Terminology Service via its API, keeping a persistent link to the original concept. With an active pipeline between the EDIT Platform and the GFBio Terminology Service, terminologies originating from the taxonomic research process can be re-used in other research contexts as well as for the semantic annotation and integration of existing research data processed by the GFBio archiving and data publication infrastructure.
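    For illustration, a minimal sketch of how the specimen-to-species aggregation step mentioned above might be computed for one quantitative character; all class and method names (SpecimenMeasurement, QuantitativeSummary, aggregate) are hypothetical and do not reflect the CDM library's actual API.

    import java.util.DoubleSummaryStatistics;
    import java.util.List;

    // Hypothetical types for illustration; not the CDM library's actual model.
    record SpecimenMeasurement(String specimenId, String character, double value) {}
    record QuantitativeSummary(String character, long sampleSize, double min, double max, double mean) {}

    public class CharacterAggregator {

        // Aggregate per-specimen measurements of one character (e.g. "leaf length [mm]")
        // into a species-level summary: sample size, observed range, and mean.
        static QuantitativeSummary aggregate(String character, List<SpecimenMeasurement> measurements) {
            DoubleSummaryStatistics stats = measurements.stream()
                    .filter(m -> m.character().equals(character))
                    .mapToDouble(SpecimenMeasurement::value)
                    .summaryStatistics();
            return new QuantitativeSummary(character, stats.getCount(),
                    stats.getMin(), stats.getMax(), stats.getAverage());
        }

        public static void main(String[] args) {
            List<SpecimenMeasurement> specimens = List.of(
                    new SpecimenMeasurement("specimen-001", "leaf length [mm]", 42.0),
                    new SpecimenMeasurement("specimen-002", "leaf length [mm]", 55.5),
                    new SpecimenMeasurement("specimen-003", "leaf length [mm]", 48.2));
            // Prints e.g.: leaf length [mm]: n=3, range 42.0-55.5, mean 48.6
            System.out.println(aggregate("leaf length [mm]", specimens));
        }
    }

    Because each summary is derived from persistently linked specimen records, re-running the aggregation after new specimens are added keeps the species-level description additive, which is the core idea of the project.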
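    A sketch of what the planned OWL export might produce for a single vocabulary term, here serialized with Apache Jena; the term URI and the SKOS modelling are assumptions for illustration, not the CDM's actual export format.

    import org.apache.jena.rdf.model.Model;
    import org.apache.jena.rdf.model.ModelFactory;
    import org.apache.jena.vocabulary.RDF;
    import org.apache.jena.vocabulary.SKOS;

    public class VocabularyOwlExport {
        public static void main(String[] args) {
            // Hypothetical term URI; the CDM export may mint different identifiers.
            String termUri = "http://terms.cybertaxonomy.org/character/leafLength";

            Model model = ModelFactory.createDefaultModel();
            model.setNsPrefix("skos", SKOS.getURI());

            // Express an EDIT Platform vocabulary term as a SKOS concept so that
            // a terminology import module can ingest it.
            model.createResource(termUri)
                    .addProperty(RDF.type, SKOS.Concept)
                    .addProperty(SKOS.prefLabel, "leaf length", "en")
                    .addProperty(SKOS.definition, "Length of the leaf blade in mm", "en");

            model.write(System.out, "TURTLE");
        }
    }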
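    Conversely, a sketch of how a concept might be fetched from the Terminology Service and stored with a persistent link; the endpoint URL and response handling are assumptions based on a generic REST API, not GFBio's documented interface.

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class TerminologyImport {
        public static void main(String[] args) throws Exception {
            // Hypothetical endpoint; consult the GFBio Terminology Service
            // documentation for the actual API.
            String conceptUri = "https://terminologies.gfbio.org/terms/ENVO/ENVO_00000015";

            HttpClient client = HttpClient.newHttpClient();
            HttpRequest request = HttpRequest.newBuilder(URI.create(conceptUri))
                    .header("Accept", "application/json")
                    .GET()
                    .build();
            HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());

            // Cache the concept's representation locally, but keep the original
            // URI as the persistent link so the term stays resolvable and re-usable.
            System.out.println("persistent link: " + conceptUri);
            System.out.println("payload: " + response.body());
        }
    }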
    KEYWORDS: taxonomic computing, descriptive data, terminology, inference

    REFERENCES:
    1. EDIT Platform for Cybertaxonomy. http://www.cybertaxonomy.org (accessed 17 May 2018).
    2. Ciardelli, P., Kelbert, P., Kohlbecker, A., Hoffmann, N., Güntsch, A. & Berendsohn, W. G., 2009. The EDIT Platform for Cybertaxonomy and the Taxonomic Workflow: Selected Components. In: Fischer, S., Maehle, E. & Reischuk, R. (Eds.), INFORMATIK 2009 – Im Focus das Leben. GI-Edition: Lecture Notes in Informatics (LNI) – Proceedings 154. Köllen Verlag, Bonn, pp. 625-638.
    3. Müller, A., Berendsohn, W. G., Kohlbecker, A., Güntsch, A., Plitzner, P. & Luther, K., 2017. A Comprehensive and Standards-Aware Common Data Model (CDM) for Taxonomic Research. Proceedings of TDWG 1: e20367. https://doi.org/10.3897/tdwgproceedings.1.20367
    4. EDIT Common Data Model. https://dev.e-taxonomy.eu/redmine/projects/edit/wiki/CommonDataModel (accessed 17 May 2018).
    5. Biodiversity Information Standards TDWG. http://www.tdwg.org/ (accessed 17 May 2018).
    6. Henning, T., Plitzner, P., Güntsch, A., Berendsohn, W. G., Müller, A. & Kilian, N., 2018. Building compatible and dynamic character matrices – Current and future use of specimen-based character data. Botany Letters. https://doi.org/10.1080/23818107.2018.1452791
    7. Diepenbroek, M., Glöckner, F., Grobe, P., Güntsch, A., Huber, R., König-Ries, B., Kostadinov, I., Nieschulze, J., Seeger, B., Tolksdorf, R. & Triebel, D., 2014. Towards an Integrated Biodiversity and Ecological Research Data Management and Archiving Platform: The German Federation for the Curation of Biological Data (GFBio). In: Plödereder, E., Grunske, L., Schneider, E. & Ull, D. (Eds.), Informatik 2014 – Big Data – Komplexität meistern. GI-Edition: Lecture Notes in Informatics (LNI) – Proceedings 232. Köllen Verlag, Bonn, pp. 1711-1724.
    8. Karam, N., Müller-Birn, C., Gleisberg, M., Fichtmüller, D., Tolksdorf, R. & Güntsch, A., 2016. A Terminology Service Supporting Semantic Annotation, Integration, Discovery and Analysis of Interdisciplinary Research Data. Datenbank-Spektrum 16(3), 195–205. https://doi.org/10.1007/s13222-016-0231-8

    Long-Term Reusability of Biodiversity and Collection Data using a National Federated Data Infrastructure

    GFBio, the German Federation for Biological Data, is a data infrastructure and network set up by several research institutions in Germany. It fosters the archiving and long-term reusability of research data and provides open and free access via a joint web portal at www.gfbio.org. As part of the working procedures, data are semantically enriched and provided via a visualization and analysis tool. The main aim of the infrastructure is to make research data from the biological domain reusable and accessible in the long run, following the FAIR principles. To achieve this, several workflows and best practices have been established. The archiving of biodiversity and collection research data follows the reference model for an Open Archival Information System (OAIS, ISO 14721).

    The challenges in making data reusable are, on the one hand, the heterogeneity of the data and, on the other hand, their often implicit but differing semantics, which make data integration a difficult process. One approach we take to address these challenges is the use of data management plans. These plans describe the research data, the tools used to acquire them, the content and exchange formats, the metadata required to describe the data, and finally the costs and resources data providers need in order to deliver structured Submission Information Packages (SIPs) in the sense of OAIS.

    Archiving a data package as an Archival Information Package (AIP) is not sufficient to make it reusable in the future. Changes in semantic meaning over time (content obsolescence), changes in formats (format obsolescence), and changes in storage media technology (hardware obsolescence) are the major factors to be considered here. According to the FAIR principles, and in our understanding, data are best preserved if they are visible and available for use. The biodiversity and collection data centers involved in GFBio therefore include a curation layer (cf. the "data management" functional entity of OAIS) in the archiving pipeline, connecting their in-house management systems for sample and observation data with their asset management systems for all kinds of multimedia. This layer allows continuous quality control and review of incoming information packages, so data providers can maintain their data continuously if they wish.

    The data are stored as AIPs sensu OAIS at the specialized data centers and are accessed by GFBio's core system. Dissemination Information Packages (DIPs) can be generated from the archived data at any time and disseminated using content standards for data and metadata such as EML, ABCD, and MIxS. Data are available via the GFBio website and, in parallel, via other web portals from the biological domain, e.g. INSDC and GBIF. The GFBio data centers now strive for certification of their archiving processes under the CoreTrustSeal and for certification of the FAIRness of single data records. Established data flows and documentation on best practices are available at www.gfbio.org/data-centers.
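    To illustrate the package flow described above, a minimal sketch of the SIP → AIP → DIP life cycle; the types, the validation rule in the curation step, and the EML-like rendering are simplified assumptions for illustration, not GFBio's actual implementation.

    import java.time.Instant;
    import java.util.List;
    import java.util.Map;

    // Simplified OAIS package types; a real archive also records fixity,
    // provenance, and representation information.
    record SubmissionPackage(String providerId, Map<String, String> metadata, List<String> dataFiles) {}
    record ArchivalPackage(String aipId, Instant ingested, SubmissionPackage content) {}

    public class OaisPipeline {

        // Ingest: the curation layer validates the SIP and wraps it as an AIP.
        static ArchivalPackage ingest(SubmissionPackage sip) {
            if (!sip.metadata().containsKey("title")) {
                throw new IllegalArgumentException("SIP rejected: missing required metadata");
            }
            return new ArchivalPackage("AIP-" + sip.providerId(), Instant.now(), sip);
        }

        // Dissemination: render a DIP on demand, here as a minimal EML-like record.
        static String disseminate(ArchivalPackage aip) {
            return "<eml:eml><dataset><title>" + aip.content().metadata().get("title")
                    + "</title></dataset></eml:eml>";
        }

        public static void main(String[] args) {
            SubmissionPackage sip = new SubmissionPackage("provider-001",
                    Map.of("title", "Soil microbe observations 2017"),
                    List.of("observations.csv"));
            ArchivalPackage aip = ingest(sip);
            System.out.println(disseminate(aip)); // DIPs can be regenerated at any time
        }
    }

    Separating the archived AIP from the DIP rendering is what lets the data centers re-disseminate the same package in several content standards (EML, ABCD, MIxS) without touching the archival copy.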