11 research outputs found

    Sample data processing in an additive and reproducible taxonomic workflow by using character data persistently linked to preserved individual specimens

    We present the model and implementation of a workflow that blazes a trail in systematic biology for the re-usability of character data (data on any kind of characters of pheno- and genotypes of organisms) and their additivity from specimen to taxon level. We take into account that any taxon characterization is based on a limited set of sampled individuals and characters, and that consequently any new individual and any new character may affect the recognition of biological entities and/or the subsequent delimitation and characterization of a taxon. Taxon concepts thus frequently change during the knowledge generation process in systematic biology. Structured character data are therefore not only needed for the knowledge generation process but also for easily adapting characterizations of taxa. We aim to facilitate the construction and reproducibility of taxon characterizations from structured character data of changing sample sets by establishing a stable and unambiguous association between each sampled individual and the data processed from it. Our workflow implementation uses the European Distributed Institute of Taxonomy (EDIT) Platform, a comprehensive taxonomic data management and publication environment, to: (i) establish a reproducible connection between sampled individuals and all samples derived from them; (ii) stably link sample-based character data with the metadata of the respective samples; (iii) record and store structured specimen-based character data in formats allowing data exchange; (iv) reversibly assign sample metadata and character datasets to taxa in an editable classification and display them; and (v) organize data exchange via standard exchange formats and enable the link between the character datasets and samples in research collections, ensuring high visibility and instant re-usability of the data. The workflow implemented will contribute to organizing the interface between phylogenetic analysis and revisionary taxonomic or monographic work.
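    The central idea of the workflow, keeping every character observation persistently tied to the identifier of the individual it was sampled from so that taxon characterizations can be rebuilt whenever the sample set changes, can be illustrated with a minimal sketch in plain Java. All class names and identifiers below are illustrative assumptions, not the actual EDIT Platform (CDM) classes.

```java
import java.util.*;
import java.util.stream.*;

// Hypothetical sketch (not the actual CDM classes): each observation keeps a
// persistent reference to the sampled individual it was measured on, so taxon
// characterisations can be recomputed whenever the sample set changes.
public class AdditiveCharacterisationSketch {

    record Specimen(String stableId, String collectionCode) {}          // persistently identified individual
    record Observation(Specimen specimen, String character, String state) {}

    // Re-derive a taxon characterisation from the observations of the
    // specimens currently assigned to the taxon.
    static Map<String, Set<String>> characterise(Collection<Observation> obs,
                                                 Set<String> specimenIdsOfTaxon) {
        return obs.stream()
                  .filter(o -> specimenIdsOfTaxon.contains(o.specimen().stableId()))
                  .collect(Collectors.groupingBy(Observation::character,
                           Collectors.mapping(Observation::state, Collectors.toSet())));
    }

    public static void main(String[] args) {
        Specimen a = new Specimen("B 10 0456789", "B");   // illustrative identifiers
        Specimen b = new Specimen("B 10 0456790", "B");
        List<Observation> obs = List.of(
            new Observation(a, "leaf margin", "entire"),
            new Observation(b, "leaf margin", "serrate"),
            new Observation(a, "corolla colour", "blue"));

        // Adding or removing a specimen simply changes the input set; the
        // characterisation stays reproducible from the stored observations.
        System.out.println(characterise(obs, Set.of("B 10 0456789", "B 10 0456790")));
    }
}
```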

    The EDIT Platform for Cybertaxonomy - an integrated software environment for biodiversity research data management

    The Platform for Cybertaxonomy [1], developed as part of the EU Network of Excellence EDIT (European Distributed Institute of Taxonomy), is an open-source software framework covering the full breadth of the taxonomic workflow, from fieldwork to publication [2]. It provides a number of tools for full, customized access to taxonomic data, editing and management, and collaborative teamwork. At the core of the platform is the Common Data Model [3], offering a comprehensive information model covering all relevant data domains: names and classifications, descriptive data (morphological and molecular), media, geographic information, literature, specimens, persons, and external resources [4]. The model adheres to community standards developed by the Biodiversity Information Standards organization TDWG [5]. Apart from its role as a software suite supporting the taxonomic workflow, the platform is a powerful information broker for a broad range of taxonomic data, providing solid and open interfaces including a Java programmer's library and a CDM REST service layer. In the context of the DFG-funded "Additivity" project ("Achieving additivity of structured taxonomic character data by persistently linking them to preserved individual specimens", DFG project number 310530378), we are developing components for capturing and processing formal descriptions of specimens as well as algorithms for aggregating data from individual specimens in order to compute species-level descriptions [6]. Well-defined and agreed descriptive vocabularies referring to structures, characters and character states are instrumental in ensuring the consistency and comparability of measurements. This will be addressed with a new EDIT Platform module for specifying vocabularies based on existing ontologies for descriptive data. To ensure that these vocabularies can be re-used in different contexts, we are planning an interface to the Terminology Service developed by the German Federation for Biological Data (GFBio) [7]. The Terminology Service provides a semantic-standards-aware and harmonised access point for distributed or locally stored ontologies required for biodiversity research data management, archiving and publication processes [8]. The interface will work with a new OWL export function of the CDM library, which provides EDIT Platform vocabularies in a format that can be read by the import module of the Terminology Service. In addition, the EDIT Platform will be equipped with the ability to import semantic concepts from the Terminology Service via its API while keeping a persistent link to the original concept. With an active pipeline between the EDIT Platform and the GFBio Terminology Service, terminologies originating from the taxonomic research process can be re-used in different research contexts as well as for the semantic annotation and integration of existing research data processed by the GFBio archiving and data publication infrastructure.
    KEYWORDS: taxonomic computing, descriptive data, terminology, inference
    REFERENCES:
    1. EDIT Platform for Cybertaxonomy. http://www.cybertaxonomy.org (accessed 17 May 2018).
    2. Ciardelli, P., Kelbert, P., Kohlbecker, A., Hoffmann, N., Güntsch, A. & Berendsohn, W. G., 2009. The EDIT Platform for Cybertaxonomy and the Taxonomic Workflow: Selected Components, in: Fischer, S., Maehle, E., Reischuk, R. (Eds.): INFORMATIK 2009 – Im Focus das Leben. GI-Edition: Lecture Notes in Informatics (LNI) – Proceedings 154. Köllen Verlag, Bonn, pp. 625–638.
    3. Müller, A., Berendsohn, W. G., Kohlbecker, A., Güntsch, A., Plitzner, P. & Luther, K., 2017. A Comprehensive and Standards-Aware Common Data Model (CDM) for Taxonomic Research. Proceedings of TDWG 1: e20367. https://doi.org/10.3897/tdwgproceedings.1.20367.
    4. EDIT Common Data Model. https://dev.e-taxonomy.eu/redmine/projects/edit/wiki/CommonDataModel (accessed 17 May 2018).
    5. Biodiversity Information Standards TDWG. http://www.tdwg.org/ (accessed 17 May 2018).
    6. Henning, T., Plitzner, P., Güntsch, A., Berendsohn, W. G., Müller, A. & Kilian, N., 2018. Building compatible and dynamic character matrices – Current and future use of specimen-based character data. Bot. Lett. https://doi.org/10.1080/23818107.2018.1452791.
    7. Diepenbroek, M., Glöckner, F., Grobe, P., Güntsch, A., Huber, R., König-Ries, B., Kostadinov, I., Nieschulze, J., Seeger, B., Tolksdorf, R. & Triebel, D., 2014. Towards an Integrated Biodiversity and Ecological Research Data Management and Archiving Platform: The German Federation for the Curation of Biological Data (GFBio), in: Plödereder, E., Grunske, L., Schneider, E., Ull, D. (Eds.): Informatik 2014 – Big Data Komplexität meistern. GI-Edition: Lecture Notes in Informatics (LNI) – Proceedings 232. Köllen Verlag, Bonn, pp. 1711–1724.
    8. Karam, N., Müller-Birn, C., Gleisberg, M., Fichtmüller, D., Tolksdorf, R. & Güntsch, A., 2016. A Terminology Service Supporting Semantic Annotation, Integration, Discovery and Analysis of Interdisciplinary Research Data. Datenbank-Spektrum, 16(3), 195–205. https://doi.org/10.1007/s13222-016-0231-8.
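    The planned coupling between the EDIT Platform and the GFBio Terminology Service rests on keeping a persistent link from each locally used term to the original concept. A minimal, hypothetical sketch of such a term record is given below; the class names and the concept URI are illustrative only, not the actual CDM or Terminology Service API.

```java
import java.net.URI;

// Hypothetical sketch: importing a concept from a terminology service while
// keeping a persistent link (URI) to the original concept, as described above.
// Class and field names are illustrative, not the actual CDM or GFBio API.
public class TermImportSketch {

    record ImportedTerm(String label, URI sourceConcept) {}

    // A locally stored term that remembers where it came from.
    static ImportedTerm importConcept(String label, String conceptUri) {
        return new ImportedTerm(label, URI.create(conceptUri));
    }

    public static void main(String[] args) {
        // illustrative concept URI (placeholder, not a guaranteed resolvable term)
        ImportedTerm leaf = importConcept("leaf", "http://example.org/ontology/leaf");
        // The persistent link allows the term to be resolved and re-used in
        // other contexts, e.g. for semantic annotation of research data.
        System.out.println(leaf.label() + " -> " + leaf.sourceConcept());
    }
}
```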

    EDIT Platform Web Services in the Biodiversity Infrastructure Landscape

    No full text
    The EDIT Platform for Cybertaxonomy is a standards-based suite of software components supporting the taxonomic research workflow from field work to publication in journals and dynamic web portals (FUB, BGBM 2011). The underlying Common Data Model (CDM) covers the main biodiversity informatics foci such as names, classifications, descriptions, literature, multimedia, as well as specimens and observations and their derived objects. Today, more than 30 instances of the platform are serving data to the international biodiversity research communities. An often overlooked feature of the platform is its well-defined web service layer, which provides capable functions for machine access and integration into the growing service-based biodiversity informatics landscape (FUB, BGBM 2010). All platform instances have a pre-installed and open service layer serving three different use cases: The CDM REST API provides a platform-independent RESTful (read-only) interface to all resources represented in the CDM. In addition, a set of portal services has been designed to meet the special functional requirements of CDM data portals and their advanced navigation capabilities. While the "raw" REST API already has all functions for searching and browsing the entire information space spanned by the CDM, the integration of CDM services into external infrastructures and workflows requires an additional set of streamlined service endpoints with a special focus on documentation and version stability. To this end, the platform provides a set of "catalogue services" with optimized functions for (fuzzy) name, taxon, and occurrence data searches (FUB, BGBM 2013, FUB, BGBM 2014). A good example of the integration of EDIT Platform catalogue services into broader workflows is the "Taxonomic Data Refinement Workflow" implemented in the context of the EU 7th Framework Programme project BioVeL (Hardisty et al. 2016). The workflow uses the service layer of an EDIT Platform-based instance of the Catalogue of Life (CoL) for resolving taxonomic discrepancies between specimen datasets (Mathew et al. 2014). The same service is also part of the Unified Taxonomic Information Service (UTIS), which provides an easy-to-use interface for running simultaneous searches across multiple taxonomic checklists (FUB, BGBM 2016).
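    Because the service layer is read-only and RESTful, integrating a platform instance into an external workflow amounts to issuing HTTP requests against its documented endpoints. The sketch below shows such a client call with the standard Java HTTP client; the host and endpoint path are placeholders and must be replaced with the URLs documented for a concrete platform instance.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Minimal sketch of a client-side call to a read-only CDM web service.
// The host and endpoint path below are placeholders; consult the service
// documentation of a concrete platform instance for the actual URLs.
public class CdmServiceClientSketch {
    public static void main(String[] args) throws Exception {
        String query = URLEncoder.encode("Campanula persicifolia", StandardCharsets.UTF_8);
        // placeholder URL, not a real platform instance
        URI uri = URI.create("https://example.org/cdmserver/name_search?query=" + query);

        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(uri)
                .header("Accept", "application/json")   // request a JSON representation
                .GET()
                .build();

        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```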

    A Comprehensive and Standards-Aware Common Data Model (CDM) for Taxonomic Research

    No full text
    The EDIT Common Data Model (CDM) (FUB, BGBM 2008) is the centrepiece of the EDIT Platform for Cybertaxonomy (FUB, BGBM 2011, Ciardelli et al. 2009). Building on modelling efforts reaching back to the 1990s, it aims to combine existing standards relevant to the taxonomic domain (but often designed for data exchange) with the requirements of modern taxonomic tools. Modelled in the Unified Modelling Language (UML) (Booch et al. 2005), it offers an object-oriented view of the information domain managed by expert taxonomists that is implemented independently of the operating system and database management system (DBMS) used. Having been used in various national and international research projects with diverse foci over the past decade, the model has evolved and become the common base of a variety of taxonomic projects, such as floras, faunas and checklists (see FUB, BGBM 2016 for a number of data portals created and made publicly available by different projects). The CDM is strictly oriented towards the needs of the taxonomic expert community. Where requirements are complex, it tries to reflect them reasonably rather than introducing ambiguity or reduced functionality via (over-)simplification. Where simplification is possible, it tries to stay or become simple. Simplification on the model level is achieved by implementing business rules via constraints rather than via typification and subclassing. Simplification on the user interface level is achieved by numerous options for customisation. Being used as a generic model for a variety of application types and use cases, it is adaptable and extendable by users and developers. It uses a combination of static and dynamic typification to allow efficient handling both of complex but well-defined data domains, such as taxonomic classifications and nomenclature, and of less well-defined, flexible domains like factual and descriptive data. Additionally, it allows the creation of more than 30 types of user-defined vocabularies, such as those for taxonomic rank, nomenclatural status, name-to-name relationships, geographic area, presence status, etc. A strong focus is set on good scientific practice by making the source of almost all data citable in detail and offering data lineage to trace data back to its roots. It is also easy to reflect multiple opinions in parallel, e.g. differing taxonomic concepts (Berendsohn 1995, Berendsohn & al., this session) or several descriptive treatments obtained from different regional floras or faunas. The CDM attempts to comprehensively cover the data used in the taxonomic domain: nomenclature, taxonomy (including concepts), taxon distribution data, descriptive data of all kinds, including morphological data referring to taxa and/or specimens, images and multimedia data of various kinds, and a complex system covering specimens and specimen derivatives down to DNA samples and sequences (Kilian et al. 2015, Stöver and Müller 2015) that mirrors the complexity of knowledge accumulation in the taxonomic research process. In the context of the EDIT Platform, several applications have been developed based on the CDM and on the library that provides the API and web service interfaces for the CDM (see Kohlbecker & al. and Güntsch & al., this session). In some areas the CDM is still evolving: although the basic structures are present, questions of application development feed back into modelling decisions. A "no-shortcuts" approach to modelling has at times delayed application development in the past, but it now pays off: the Platform can rapidly adapt to changing requirements from different projects and taxonomic specialists.
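    The combination of static and dynamic typification mentioned above can be pictured as follows: instead of one subclass per controlled vocabulary, a single term class carries a runtime term type, so new user-defined vocabularies require no code changes. The sketch below uses illustrative names only and is not the actual CDM implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of "dynamic typification": one generic term class with a
// runtime type lets users define new vocabularies (ranks, statuses,
// relationship types, ...) without subclassing. Names are hypothetical.
public class DynamicTypificationSketch {

    enum TermType { RANK, NOMENCLATURAL_STATUS, NAME_RELATIONSHIP, PRESENCE_STATUS }

    record DefinedTerm(TermType type, String label) {}

    static class TermVocabulary {
        private final TermType type;
        private final List<DefinedTerm> terms = new ArrayList<>();
        TermVocabulary(TermType type) { this.type = type; }
        DefinedTerm addTerm(String label) {
            DefinedTerm t = new DefinedTerm(type, label);   // term inherits the vocabulary's type
            terms.add(t);
            return t;
        }
        List<DefinedTerm> getTerms() { return List.copyOf(terms); }
    }

    public static void main(String[] args) {
        TermVocabulary ranks = new TermVocabulary(TermType.RANK);
        ranks.addTerm("genus");
        ranks.addTerm("species");
        ranks.addTerm("subspecies");
        System.out.println(ranks.getTerms());
    }
}
```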

    The Additivity Project: Achieving additivity of structured taxonomic character data by persistently linking them to individual specimens

    No full text
    Herbarium specimens have always played a central role in the classical disciplines of plant sciences, and the global digitisation efforts now open new horizons. To make full use of the possibilities inherent in specimen-based taxonomic descriptions, corresponding workflows are needed. A crucial step in the comparative analysis of organisms is the preparation of a character matrix to record and compare the morphological variation of taxa on the basis of individual specimens. This project focuses on the optimisation of the taxonomic research process with respect to the delimitation and characterisation ("descriptions") of taxa (Henning et al. 2018). The angiosperm order Caryophyllales provides exemplar use cases through cooperation with the Global Caryophyllales Initiative (Borsch et al. 2015). The workflow for sample data handling (Kilian et al. 2015), implemented on the EDIT Platform for Cybertaxonomy (http://www.cybertaxonomy.org, Ciardelli et al. 2009), has been extended to support the additive characterisation of taxa via specimen character data. The Common Data Model (CDM), which already supports persistent inter-linking of specimens and their metadata (Plitzner et al. 2017), has been adapted to facilitate specimen descriptions with characters constructed from the combination of structure and property terms and their corresponding states. Semantic web technology is used to establish and continuously elaborate expert-community-coordinated exemplar vocabularies with term ontologies and explanations for characters and states (GFBio Terminology Service, Karam et al. 2016). Character data are recorded and stored in structured form in character state matrices for individual specimens instead of taxa, which allows the generation of taxon characterisations by aggregating the data sets of the individual specimens included. Separating characters into structures and properties, which are based on concepts in public ontologies, guarantees high visibility and instant re-usability of these character data. Since taxon concepts evolve during the iterative knowledge generation process in systematic biology, additivity of character data from specimen to taxon level greatly facilitates the construction and reproducibility of taxon characterisations from changing specimen and character data sets.
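    The decomposition of characters into structure and property terms that reference public ontology concepts can be sketched as follows; all class names and URIs are illustrative placeholders rather than the project's actual vocabulary entries.

```java
import java.net.URI;
import java.util.List;

// Sketch of the structure/property decomposition described above: a character
// is the combination of a structure term and a property term, each of which
// can point to a concept in a public ontology. Names and URIs are illustrative.
public class StructurePropertyCharacterSketch {

    record Term(String label, URI concept) {}                         // e.g. "leaf", "shape"
    record CharacterDefinition(Term structure, Term property, List<String> states) {
        String label() { return structure.label() + " " + property.label(); }
    }

    public static void main(String[] args) {
        Term leaf  = new Term("leaf",  URI.create("http://example.org/ontology/leaf"));
        Term shape = new Term("shape", URI.create("http://example.org/ontology/shape"));

        // a categorical state vocabulary assigned to the composed character
        CharacterDefinition leafShape = new CharacterDefinition(leaf, shape,
                List.of("ovate", "lanceolate", "linear"));

        System.out.println(leafShape.label() + " with states " + leafShape.states());
    }
}
```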

    The Additivity Project - Use Cases and User Interface

    No full text
    Herbarium specimens are central to botanical science and of rising importance thanks to increasing accessibility and broadened usability. Alongside the many new uses of specimen data sit a range of traditional uses supporting the collection of morphological data and their application to taxonomy and systematics (Henning et al. 2018). Technical workflows are needed to support the sustainable collection of this traditional information and to maintain the high quality of the morphological data. Data exchange and re-usability require the use of accepted, community-approved controlled vocabularies that are accessible (web-based ontologies and term vocabularies) and reliable (long-term availability, unique identifiers). The same applies to datasets, which must be stored accessibly and sustainably while maintaining all data relationships that facilitate convenient re-use. This project aims to construct a comprehensive workflow to optimise the delimitation and characterisation ("descriptions") of taxa (see the complementary talk by Plitzner et al.). It is implemented on the open-source software framework of the EDIT Platform for Cybertaxonomy (http://www.cybertaxonomy.org, Ciardelli et al. 2009), extending the workflow for sample data processing developed in a preceding project (Kilian et al. 2015). The principal goals of this new software component are: (i) specimen-level recording and storage of character data in structured character matrices; (ii) generating taxon characterisations by aggregating the individual specimen-based datasets; (iii) using and developing community-coordinated, ontology-based exemplar vocabularies; and (iv) persistently linking character datasets with source specimens for high visibility and re-usability. The angiosperm order Caryophyllales provides an exemplar use case through cooperation with the Global Caryophyllales Initiative (Borsch et al. 2015). A basic set of morphological terms and vocabularies has been obtained from various online sources (ontologies, glossaries) and can be used, searched and expanded in the EDIT Platform. The terms are categorised into structures, properties and states. Different editors have been developed to combine structure and property terms into characters and to assign a customised state vocabulary (categorical) or suitable values and units (numerical) to them. The workflow is built around a data set defining the taxonomic environment of individual use cases. A data set is specified by the characters and a taxonomic group, which can be filtered by area or rank. The data set can be opened in a tabular representation (character matrix) to enter preselected state terms or values for the individual specimens. The matrix provides several features for basic comparison and analysis and allows the entry of alternative datasets (e.g. from literature). Finally, the aggregation of data subsets to potential taxonomic units, by adding up the values and summarising character states, allows the convenient testing of taxonomic hypotheses (see the sketch below). The term additivity is used here to describe this set of workflows and processes adding value to herbarium specimens and accumulating the specimen data for a taxon description.
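    As an illustration of the aggregation step referenced above, the following sketch summarises specimen-level measurements of a numerical character into a taxon-level range; it is a simplified stand-in, not the platform's actual aggregation algorithm, and the specimen identifiers and values are invented.

```java
import java.util.DoubleSummaryStatistics;
import java.util.Map;

// Sketch of the aggregation step: specimen-level measurements of a numerical
// character are summarised into a taxon-level range, mirroring the "adding up
// values" described above. Purely illustrative data and calculation.
public class NumericalAggregationSketch {

    public static void main(String[] args) {
        // leaf length measurements (cm) for the specimens assigned to one taxon
        Map<String, Double> leafLengthBySpecimen = Map.of(
                "specimen-001", 4.2,
                "specimen-002", 5.1,
                "specimen-003", 3.8);

        DoubleSummaryStatistics stats = leafLengthBySpecimen.values().stream()
                .mapToDouble(Double::doubleValue)
                .summaryStatistics();

        // taxon-level characterisation derived from the specimen data
        System.out.printf("leaf length %.1f-%.1f cm (mean %.1f, n=%d)%n",
                stats.getMin(), stats.getMax(), stats.getAverage(), stats.getCount());
    }
}
```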

    The CDM Applied: Unit-Derivation, from Field Observations to DNA Sequences

    No full text
    Specimens form the falsifiable evidence used in plant systematics. Derivatives of specimens (including the specimen as the organism in the field), such as tissue and DNA samples, play an increasing role in research. The EDIT Platform for Cybertaxonomy is a specialist's tool that allows documenting and sustainably storing all data used in the taxonomic work process, from field data to DNA sequences. The types of data stored can be very heterogeneous, consisting of specimens, images, text data, primary data files, taxon assignments, etc. The EDIT Platform organizes the linking of such data by using a generic data model to represent the research process. Each step in the process is regarded as a derivation step and generates a derivative of the previous step: a field unit may have a specimen as its derivative, or a specimen may have a tissue sample as its derivative. Each derivation step also produces metadata recording who performed the derivation, when, and how. The Platform's Common Data Model (CDM) and the applications built on the CDM library thus represent the first comprehensive implementation of the largely theoretical models developed in the late 1990s (Berendsohn et al. 1999). In a pilot project, research data on the genus Campanula (Kilian et al. 2015, FUB, BGBM 2012) were gathered and used to create a hierarchy of derivatives reaching from field data to DNA sequences. Additionally, LibrAlign, an open-source library for multiple sequence alignments (Stöver and Müller 2015), was used to integrate an alignment editor into the EDIT Platform that allows generating consensus sequences as derivatives of DNA sequences. The persistent storage of each link in the derivation process, and the level of detail with which the data and metadata are stored, will speed up the research process, ease the reproducibility of research results and enhance the sustainability of collections.
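    The derivation hierarchy described above, in which each unit is a derivative of its parent and each derivation step records who performed it, when, and how, can be sketched with a few plain Java classes; the names are illustrative assumptions, not the actual CDM types.

```java
import java.time.LocalDate;

// Sketch of a derivation chain: field unit -> specimen -> tissue sample -> DNA
// sample, each step carrying its own derivation metadata. Illustrative only.
public class DerivationChainSketch {

    record DerivationEvent(String actor, LocalDate date, String method) {}

    static class DerivedUnit {
        final String label;
        final DerivedUnit derivedFrom;   // null for the field unit
        final DerivationEvent event;     // null for the field unit
        DerivedUnit(String label, DerivedUnit derivedFrom, DerivationEvent event) {
            this.label = label;
            this.derivedFrom = derivedFrom;
            this.event = event;
        }
    }

    public static void main(String[] args) {
        DerivedUnit fieldUnit = new DerivedUnit("field unit (gathering)", null, null);
        DerivedUnit specimen  = new DerivedUnit("herbarium specimen", fieldUnit,
                new DerivationEvent("collector", LocalDate.of(2014, 6, 12), "pressing and mounting"));
        DerivedUnit tissue    = new DerivedUnit("tissue sample", specimen,
                new DerivationEvent("lab technician", LocalDate.of(2015, 2, 3), "leaf tissue sampling"));
        DerivedUnit dna       = new DerivedUnit("DNA sample", tissue,
                new DerivationEvent("lab technician", LocalDate.of(2015, 2, 10), "DNA extraction"));

        // Walk the chain back from the DNA sample to its field origin.
        for (DerivedUnit u = dna; u != null; u = u.derivedFrom) {
            System.out.println(u.label + (u.event == null ? "" : "  <- " + u.event));
        }
    }
}
```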

    The Platform for Cybertaxonomy: Standards, services and tools

    No full text
    The Platform for Cybertaxonomy (http://www.cybertaxonomy.org) is a standards-based open-source software framework covering the breadth of the taxonomic workflow, from fieldwork to publication (Ciardelli et al. 2009). It provides coupled tools for full, customized access to taxonomic data, editing and management, and collaborative teamwork. At the core of the platform is the Common Data Model (CDM, Müller et al. 2017), offering a comprehensive information model covering all relevant data domains: names and classifications, descriptive data (morphological and molecular), media, geographic information, literature, specimens, types, persons, and external resources. Platform-compliant software interacts via services and includes the following components: the CDM Server; the Taxonomic Editor rich client; web-based editors; Drupal-based, highly configurable portal software; map services and a map viewer; the Xper2 descriptive data editor; a specimen search tool; and import and export modules. Recent platform-based developments include software components for deriving formal species-level descriptions from measurements on individual specimens (Henning et al. 2018), as well as a registration system for nomenclatural acts of algae (PhycoBank, https://www.phycobank.org/). Currently, about 30 portals with regional and taxonomic foci are using the Platform for Cybertaxonomy as their technical basis for capturing, managing, and publishing biodiversity data over the World Wide Web. Prominent examples are the Euro+Med PlantBase, the International Caryophyllales Network, and the Flora of Greece.