
    No Pain No Gain: Standards mapping in Latimer Core development

    Latimer Core (LtC) is a newly proposed Biodiversity Information Standards (TDWG) data standard that supports the representation and discovery of natural science collections by structuring data about the groups of objects that those collections and their subcomponents encompass (Woodburn et al. 2022). It is designed to be applicable to a range of use cases, including high-level collection registries, rich textual narratives and semantic networks of collections, as well as more granular, quantitative breakdowns of collections to aid collection discovery and digitisation planning.

    As a standard that is (in this first version) focused on natural science collections, LtC has significant intersections with existing data standards and models (Fig. 1) that represent individual natural science objects and occurrences and their associated data (e.g., Darwin Core (DwC), Access to Biological Collection Data (ABCD), the Conceptual Reference Model of the International Committee on Documentation (CIDOC-CRM)). LtC's scope also overlaps with standards for more generic concepts such as metadata, organisations, people and activities (e.g., Dublin Core, the World Wide Web Consortium (W3C) ORG and PROV ontologies, Schema.org). LtC represents just one element of this extended network of data standards for the natural sciences and related concepts. Mapping between LtC and intersecting standards is therefore crucial for avoiding duplication of effort during standard development, and for ensuring that data stored using the different standards are as interoperable as possible, in alignment with the FAIR (Findable, Accessible, Interoperable, Reusable) principles. In particular, it is vital to make robust associations between records representing groups of objects in LtC and records (where available) that represent the objects within those groups.

    During LtC development, efforts were made to identify and align with relevant standards and vocabularies, and to adopt existing terms from them where possible. During expert review, a more structured approach was proposed and implemented using the Simple Knowledge Organization System (SKOS) mappingRelation vocabulary. This exercise helped to better describe the nature of the mappings between new LtC terms and related terms in other standards, and to validate decisions around the borrowing of existing terms for LtC. A further exercise used elements of the Simple Standard for Sharing Ontological Mappings (SSSOM) to begin developing a more comprehensive set of metadata around these mappings. At present, these mappings (Suppl. material 1 and Suppl. material 2) are provisional and not considered comprehensive, but they should be further refined and expanded over time.

    Even with the support provided by the SKOS and SSSOM standards, the LtC experience has shown the mapping process to be far from straightforward. Standards vary in how they are structured: DwC, for example, is a 'bag of terms' with informal classes and no structural constraints, while more structured standards and ontologies like ABCD and PROV take different approaches to defining and documenting structure. The various standards use different metadata schemas and serialisations (e.g., Resource Description Framework (RDF), XML) for their documentation, and different approaches to providing persistent, resolvable identifiers for their terms. There are also many subtle nuances involved in assessing the alignment between the concepts that the source and target terms represent, particularly when judging whether a match is exact enough to allow an existing term to be adopted. These factors make the mapping process quite manual and labour-intensive. Approaches and tools such as decision trees (Fig. 2) representing the logic involved, and further exploration of the SSSOM standard, could help to streamline this process.

    In this presentation, we will discuss the LtC experience of the standards mapping process, the challenges faced and methods used, and the potential to contribute this experience to collaborative standards mapping within the anticipated TDWG Standards Mapping Interest Group.
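
    To make the SKOS mapping approach concrete, the following minimal sketch (Python, using the rdflib library) records SKOS mapping relations between LtC terms and candidate terms in another standard. The term IRIs and the choice of match types are illustrative assumptions, not the published LtC mappings.

```python
# Minimal sketch of recording SKOS mappingRelation assertions with rdflib.
# The term IRIs and match types below are illustrative assumptions, not
# the actual published LtC mappings (see Suppl. material 1 and 2 for those).
from rdflib import Graph, Namespace
from rdflib.namespace import SKOS

LTC = Namespace("http://rs.tdwg.org/ltc/terms/")  # assumed namespace IRI
DWC = Namespace("http://rs.tdwg.org/dwc/terms/")  # Darwin Core namespace

g = Graph()
g.bind("skos", SKOS)
g.bind("ltc", LTC)
g.bind("dwc", DWC)

# An exact match would justify adopting the existing term outright; a
# related match documents overlap without identity of meaning.
g.add((LTC.MeasurementOrFact, SKOS.exactMatch, DWC.MeasurementOrFact))
g.add((LTC.ObjectGroup, SKOS.relatedMatch, DWC.Occurrence))

print(g.serialize(format="turtle"))
```

    An SSSOM rendering of the same mappings would add structured metadata per mapping (e.g., mapping_justification, author and date) as columns of a TSV mapping set, making the mappings themselves shareable and machine-readable.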

    Participative decision making and the sharing of benefits: laws, ethics, and data protection for building extended global communities

    Transdisciplinary and cross-cultural cooperation and collaboration are needed to build extended, densely interconnected information resources. These are the prerequisites for the successful implementation and execution of, for example, an ambitious monitoring framework accompanying the post-2020 Global Biodiversity Framework (GBF) of the Convention on Biological Diversity (CBD; SCBD 2021). Data infrastructures that meet the requirements and preferences of concerned communities can focus and attract community involvement, thereby promoting participatory decision making and the sharing of benefits. Community acceptance, in turn, drives the development of the data resources and data use. Earlier this year, the alliance for biodiversity knowledge (2021a) conducted forum-based consultations seeking community input on designing the next generation of digital specimen representations and consequently enhanced infrastructures. The multitude of connections that arise from extending the digital specimen representations through linkages in all "directions" will form a powerful network of information for research and application. Yet, with the power of an extended, accessible data network comes the responsibility to protect sensitive information (e.g., the locations of threatened populations, culturally context-sensitive traditional knowledge, or businesses' fundamental data and infrastructure assets). In addition, existing legislation regulates access and the fair and equitable sharing of benefits. Current negotiations on 'Digital Sequence Information' under the CBD suggest such obligations might increase and become more complex in the context of extensible information networks. For example, in the case of data and resources funded by taxpayers in the EU, access should follow the general principle of being "as open as possible; as closed as is legally necessary" (cp. EC 2016). At the same time, the international regulations of the CBD Nagoya Protocol (SCBD 2011) need to be taken into account.

    Summarizing the main outcomes from the consultation discussions in the forum thread "Meeting legal/regulatory, ethical and sensitive data obligations" (alliance for biodiversity knowledge 2021b), we propose a framework of ten guidelines and functionalities to achieve community building and drive application:

    1. Substantially contribute to the conservation and protection of biodiversity (cp. EC 2020).
    2. Use language that is CBD conformant.
    3. Show the importance of the digital and extensible specimen infrastructure for the continuing design and implementation of the post-2020 GBF, as well as the mobilisation and aggregation of data for its monitoring elements and indicators.
    4. Strive to openly publish as much data and metadata as possible online.
    5. Establish a powerful and well-thought-out layer of user and data access management, ensuring the security of 'sensitive data'.
    6. Encrypt data and metadata where necessary at the level of an individual specimen or digital object; provide access via digital cryptographic keys (see the sketch below).
    7. Link obligations, rights and cultural information regarding use to the digital key (e.g., CARE principles (Carroll et al. 2020), Local Contexts labels (Local Contexts 2021), licenses, permits, use and loan agreements).
    8. Implement a transactional system that records every transaction.
    9. Amplify workforce capacity across the digital realm, its work areas and workflows.
    10. Do no harm (EC 2020): reduce the social and ecological footprint of the implementation, aiming for a long-term sustainable infrastructure across its life cycle, including the development, implementation and management stages.

    Balancing the needs for open access with protection, accountability and sustainability, the framework is designed to function as a robust interface between the (research) infrastructure implementing the extensible network of digital specimen representations and the myriad applications and operations in the real world. With the legal, ethical and data protection layers of the framework in place, the infrastructure will provide legal clarity and security for data providers and users, specifically in the context of access and benefit sharing under the CBD and its Nagoya Protocol. Forming layers of protection, the characteristics and functionalities of the framework are envisioned to be flexible and fine-grained, adjustable to fulfill the needs and preferences of a wide range of stakeholders and communities, while remaining focused on the protection and rights of the natural world. Respecting different value systems and national policies, the framework is expected to allow a divergence of views to coexist and to balance differing interests. Thus, the infrastructure of the digital extensible specimen network will be fair and equitable to many providers and users. This foundation has the capacity and potential to bring together the diverse global communities using, managing and protecting biodiversity.
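
    As a minimal illustration of guidelines 6-8, the Python sketch below encrypts a sensitive record at the level of an individual digital object and gates access behind a cryptographic key, using the cryptography library. The record fields, identifiers and key-handling workflow are assumptions made for illustration, not a specification from the consultation.

```python
# Sketch of object-level encryption (guideline 6) with key-mediated access.
# Field names and the key-handling workflow are illustrative assumptions.
import json
from cryptography.fernet import Fernet

# One key per digital object; in practice keys would live in a key-management
# system and carry linked obligations, rights and cultural labels (guideline 7).
object_key = Fernet.generate_key()
cipher = Fernet(object_key)

sensitive_record = {
    "specimen_id": "urn:example:specimen:123",  # hypothetical identifier
    "locality": "exact coordinates of a threatened population",
}

# Stored form: only the encrypted token is published openly.
token = cipher.encrypt(json.dumps(sensitive_record).encode("utf-8"))

# Access: a holder of the key recovers the record; each decrypt event could
# be written to the transactional log envisioned in guideline 8.
restored = json.loads(cipher.decrypt(token).decode("utf-8"))
assert restored == sensitive_record
```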

    Highlights and outcomes of the 2021 Global Community Consultation

    International collaboration between collections, aggregators, and researchers within the biodiversity community and beyond is becoming increasingly important in our efforts to support biodiversity, conservation and the life of the planet. The social, technical, logistical and financial aspects of an equitable biodiversity data landscape – from workforce training and mobilization of linked specimen data, to data integration, use and publication – must be considered globally and within the context of a growing biodiversity crisis. In recent years, several initiatives have outlined paths forward that describe how digital versions of natural history specimens can be extended and linked with associated data. In the United States, Webster (2017) presented the "extended specimen", which was expanded upon by Lendemer et al. (2019) through the work of the Biodiversity Collections Network (BCoN). At the same time, a "digital specimen" concept was developed by DiSSCo in Europe (Hardisty 2020). Both concepts depict a digital proxy of an analog natural history specimen, with slight variation in how the model would be executed. The digital nature of such a proxy provides greater capabilities: machine processability; linkages with associated data; globally accessible, information-rich biodiversity data; improved tracking, attribution and annotation; additional opportunities for data use and cross-disciplinary collaboration; and innumerable other advantages, forming the basis for FAIR (Findable, Accessible, Interoperable, Reusable) data and the equitable sharing of benefits worldwide.

    Recognizing the need to align the two closely related concepts, and to provide a place for open discussion around various topics of the Digital Extended Specimen (DES; the current working name for the joined concepts), we initiated a virtual consultation on the discourse platform hosted by the Alliance for Biodiversity Knowledge through GBIF. This platform provided a forum for threaded discussions around topics related and relevant to the DES. The goals of the consultation align with those of the Alliance for Biodiversity Knowledge: expand participation in the process, build support for further collaboration, identify use cases, identify significant challenges and obstacles, and develop a comprehensive roadmap towards achieving the vision of a global specification for data integration. In early 2021, Phase 1 launched with five topics: Making FAIR data for specimens accessible; Extending, enriching and integrating data; Annotating specimens and other data; Data attribution; and Analyzing/mining specimen data for novel applications. This round of discussion was productive and engaged dozens of contributors, with hundreds of posts and thousands of views. During Phase 1, several deeper, more technical or additional topics of relevance were identified, and these formed the foundation for Phase 2, which began in May 2021 with the following topics: Robust access points and data infrastructure alignment; Persistent identifier (PID) scheme(s); Meeting legal/regulatory, ethical and sensitive data obligations; Workforce capacity development and inclusivity; Transactional mechanisms and provenance; and Partnerships to collaborate more effectively. In Phase 2, fruitful progress was made towards solutions to some of these complex functional and technical long-term goals.

    Simultaneously, our commitment to open participation was reinforced through increased efforts to involve new voices from allied and complementary fields. Among a wealth of ideas expressed, the community highlighted the need for:

    - unambiguous persistent identifiers and a dedicated agent to assign them;
    - a fully linked system that includes robust publishing mechanisms;
    - strong social structures that build trustworthiness of the system;
    - appropriate attribution of both legacy and new work;
    - a system that is inclusive, removed from colonial practices, and supportive of creative uses of biodiversity data;
    - a truly global data infrastructure;
    - a balance between open access and legal obligations and ethical responsibilities; and
    - the partnerships necessary for success.

    These two consultation periods, and the myriad activities surrounding the online discussions, produced a wide variety of perspectives, strategies and approaches to converging the digital and extended specimen concepts and to progressing plans for the DES, steps necessary to improve access to research-ready data and to advance our understanding of the diversity and distribution of life. Discussions continue, and we hope to include your contributions to the DES in future implementation plans.

    Digital Extended Specimens: Enabling an Extensible Network of Biodiversity Data Records as Integrated Digital Objects on the Internet

    The early twenty-first century has witnessed massive expansions in the availability and accessibility of digital data in virtually all domains of the biodiversity sciences. Led by an array of asynchronous digitization activities spanning ecological, environmental, climatological, and biological collections data, these initiatives have resulted in a plethora of mostly disconnected and siloed data, leaving researchers with the tedious and time-consuming manual task of finding and connecting them in usable ways, integrating them into coherent data sets, and making them interoperable. The focus to date has been on elevating analog and physical records to digital replicas in local databases, and from there to ever-growing aggregations of essentially disconnected, discipline-specific information. In the present article, we propose a new interconnected network of digital objects on the Internet—the Digital Extended Specimen (DES) network—that transcends existing aggregator technology, augments the DES with third-party data through machine algorithms, and provides a platform for more efficient research and robust interdisciplinary discovery.

    Building the Digital Extended Specimen: A case study of invasive European frog-bit (Hydrocharis morsus-ranae L.)

    The Extended Specimen was first described by Webster (2017), who defined a "constellation of specimen preparations and data types", centered around an occurrence of an organism, that captures the breadth of empirical facts about an organism's phenotype, genotype, and ecology in space and time. The Extended Specimen Network was embraced by the collections community in the Biodiversity Collections Network Extended Specimen Report (Lendemer et al. 2020) and the National Academies of Sciences, Engineering, and Medicine Future of Collections report (National Academies of Sciences, Engineering, and Medicine 2020). Several global discussions are underway to build a common definition of the Digital Extended Specimen (DES) and to elucidate next steps in building the infrastructure to support Digital Extended Specimens and their network of associated data, including efforts among the Distributed System of Scientific Collections (DiSSCo), the Biodiversity Collections Network (BCoN), GBIF's Alliance for Biodiversity Knowledge, TDWG's Task Group on Minimum Information about a Digital Specimen (MIDS), and others. At the foundation of the DES is the occurrence of an organism in time and space, represented by physical specimens or observations serving as tokens of reality. Tokens are translated into digital records, which can be extended through a network of linkages between them and with derived and associated data, e.g., project methodologies, environmental conditions, habitat characteristics, and associated taxa. For digital records to be integrated with the larger network of Digital Extended Specimens, they must become FAIR digital objects that are Findable, Accessible, Interoperable, and Reusable (Wilkinson et al. 2016). By translating the Digital Extended Specimen concept to the local project scale, we provide opportunities to move beyond a theoretical understanding of the DES and towards a practical framework for its implementation.

    Here we present and discuss the power, limits, and open questions of the Digital Extended Specimen framework by applying it to the case study of an invasive aquatic plant in the Laurentian Great Lakes region. European frog-bit (Hydrocharis morsus-ranae L.; EFB) is native to western and northern Eurasia and invasive in North America and India. Dense mats of EFB may hinder commercial and recreational use of waterways and decrease light, dissolved oxygen, and native species diversity. We describe a multi-taxonomic study that examined EFB along with associated plant species, animal species, and environmental characteristics (Monfils et al. 2021). The integration of such diverse types of empirical data is a necessary prerequisite for determining the factors associated with EFB establishment, the impacts of EFB on native coastal wetland ecosystems, and suitable management regimes for the conservation of native biodiversity. Data gathered from this study are housed in a local database, in which we consider both physical specimens and recorded observations as tokens of concrete occurrences of EFB, defining the base units. These tokens are linked to their collection events, which provide environmental and sampling context, as well as to co-occurrences of other taxa, including plants, invertebrates, fish, anurans, reptiles, and birds. Digitally linked, these extensions of each digital representation of a collected token provide not only empirical evidence of an EFB occurrence, but also directly connect it with all additionally sampled, derived, and associated information. Through this network of extensions we gain a more holistic understanding of EFB's species associations, habitats, and ecosystem impacts at the level of populations and communities. The application of the Digital Extended Specimen framework at the project level illustrates how the DES can be used in a real-world context and highlights challenges in translating the concept from theory to practice.
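
    To ground the token-to-record translation described above, here is a minimal Python sketch of how linked records in such a local database might be structured. All class and field names are hypothetical illustrations, not a published DES schema.

```python
# Sketch of a digital token linked to its collection event and co-occurring
# taxa, as described above. Names are illustrative, not a published schema.
from dataclasses import dataclass, field

@dataclass
class CollectionEvent:
    event_id: str
    date: str
    habitat: str          # e.g., coastal wetland characteristics
    water: dict           # environmental measurements (illustrative)

@dataclass
class DigitalToken:
    token_id: str         # would be a persistent, resolvable identifier
    kind: str             # "physical specimen" or "observation"
    taxon: str
    event: CollectionEvent                      # link to sampling context
    associated_taxa: list = field(default_factory=list)  # co-occurrence links

event = CollectionEvent("evt-001", "2021-07-15",
                        habitat="Great Lakes coastal wetland",
                        water={"dissolved_oxygen_mg_l": 4.2})

efb = DigitalToken("urn:example:token:efb-42", "physical specimen",
                   "Hydrocharis morsus-ranae", event,
                   associated_taxa=["urn:example:token:typha-07"])

# Following links outward from one token reconstructs its extended context.
print(efb.taxon, "recorded at", efb.event.habitat)
```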

    The Digital Extended Specimen will Enable New Science and Applications

    Specimens have long been viewed as critical to research in the natural sciences because each specimen captures the phenotype (and often the genotype) of a particular individual at a particular point in space and time. In recent years there has been considerable focus on digitizing the many physical specimens currently held in the world's natural history research collections. As a result, a growing number of specimens are now each represented by their own "digital specimen": a findable, accessible, interoperable and reusable (FAIR) digital representation of the physical specimen, which contains data about it. At the same time, there has been growing recognition that each digital specimen can be extended, and made more valuable for research, by linking it to data and samples derived from the curated physical specimen itself (e.g., computed tomography (CT) scan imagery, DNA sequences or tissue samples); to directly related specimens and data about the organism's life (e.g., specimens of parasites collected from it, photos or recordings of the organism in life, the immediate surrounding ecological community); and to the wide range of associated specimen-independent data sets and model-based contextualisations (e.g., taxonomic information, conservation status, bioclimatological region, remote sensing images, environmental-climatological data, traditional knowledge, genome annotations). The resulting connected network of extended digital specimens will enable new research on a number of fronts, and indeed this has already begun. The new types of research enabled fall into four distinct but overlapping categories.

    First, because the digital specimen is a surrogate—acting on the Internet for a physical specimen in a natural science collection—it is amenable to analytical approaches that are simply not possible with physical specimens. For example, digital specimens can serve as training, validation and test sets for predictive process-based or machine learning algorithms, which are opening new doors of discovery and forecasting. Such sophisticated and powerful analytical approaches depend on extended digital specimen data being FAIR and as open as possible. From these analytical approaches can be derived the biodiversity monitoring outputs that are critically needed by the biodiversity community, because they are central to conservation efforts at all levels of analysis, from genetics to species to ecosystem diversity.

    Second, linking specimens to closely associated specimens (potentially across multiple disparate collections) allows for the coordinated co-analysis of those specimens. For example, linking specimens of parasites or pathogens to specimens of the hosts from which they were collected allows for a powerful new understanding of coevolution, including pathogen range expansion and shifts to new hosts. Similarly, linking specimens of pollinators, their food plants, and their predators can help untangle complex food webs and multi-trophic interactions.

    Third, linking derived data to their associated voucher specimens increases information richness, density, and robustness, thereby allowing for novel types of analyses and strengthening validation through linked independent data, thus improving confidence levels and risk assessment. For example, digital representations of specimens that incorporate images, CT scans, or vocalizations may capture important information that would otherwise be lost during preservation, such as coloration or behavior. In addition, permanently linking genetic and genomic data to the specimen of the individual from which they were derived—something that is currently done inconsistently—allows for detailed studies of the connections between genotype and phenotype. Furthermore, persistent links between physical specimens, additional information, and associated transactions are the building blocks for documenting and preserving chains of custody. Such links will also facilitate the cleaning, updating, and maintenance of digital specimens and their derived and associated datasets, with ever-expanding research questions and applied uses materializing over time. The resulting high-quality data resources are needed for fact-based decision-making and forecasting based on monitoring, forensics and prediction workflows in conservation, sustainable management and policy-making.

    Finally, linking specimens to diverse but associated datasets allows for detailed, often transdisciplinary, studies of topics ranging from local adaptation, through the forces driving range expansion and contraction (critically important to our understanding of the consequences of climate change), to social vectors in disease transmission.

    A network of extended digital specimens will enable new and critically important research and applications in all of these categories, as well as science and uses that we cannot yet envision.
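
    As a toy illustration of the second category (co-analysis through specimen-to-specimen links), the Python sketch below builds a small typed link graph between host, parasite, pollinator and plant specimen records using the networkx library. The identifiers and relation names are invented for illustration.

```python
# Toy sketch of specimen-to-specimen linking: a typed directed graph in which
# edges record how specimen records relate. All identifiers are invented.
import networkx as nx

g = nx.DiGraph()
g.add_edge("parasite:tick-17", "host:rodent-03", relation="collected_from")
g.add_edge("pollinator:bee-88", "plant:aster-12", relation="feeds_on")
g.add_edge("predator:bird-05", "pollinator:bee-88", relation="preys_on")

# Co-analysis becomes graph traversal: find every specimen record that links
# (directly or indirectly) to a given host, across collections, without
# manual record-by-record lookup.
linked_to_host = nx.ancestors(g, "host:rodent-03")
print(sorted(linked_to_host))  # -> ['parasite:tick-17']
```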

    Modelling exploration of the future of European beech (Fagus sylvatica L.) under climate change – Range, abundance, genetic diversity and adaptive response

    We explored the impacts of climate change on the geographic distribution of European beech by applying state-of-the-art statistical and process-based models, and we assessed possible climate change impacts both on adaptive capacity in the centre of the species' distribution and on adaptive responses of functional traits at the leading and trailing edges of the current distribution. The species area models agree that, in a future climate, beech has the potential to expand its northern range edge and to lose habitat at the southern edge of its distribution. In the centre of the distribution, where beech is projected to maintain or increase its population size, changes in local population size have only a small effect on genetic diversity. Thus, an adaptive response of functional traits in small populations at the leading and trailing edges of the distribution is possible based on the genetic diversity available in the local population, even within a period of 2-3 generations. We conclude that the adaptive responses of key functional traits should not be ignored in climate change impact assessments of beech. Adaptation to the local environment may lead to genetically and phenotypically structured populations across the species area within just a few generations, depending on the forest management system applied. We recommend taking local differentiation into account in a future generation of process-based species area models.

    Joint statement by CETAF, SPNHC and BHL on DATA within scientific publications: clarification of [non]copyrightability

    The EU and other states have made legislative efforts to clarify the status of data mining in copyrightable works, but the situation remains obscure and confusing, especially in a globalised field where international legislation can contribute to opacity. The present paper aims to assert a common position of three communities representing biodiversity sciences and data specialists on this issue, and to propose common best-practice guidelines so that they become universally accepted rules.

    As scientific data users, we take the standpoint that scientific data are not copyrightable and, furthermore, that they can be accessed, shared and reused freely. Thus, once legal access has been gained to copyrighted publications, the data within those scholarly publications can be considered open data that are freely extractable. This set of recommendations has been reached specifically for scientific use and societal benefit.