40 research outputs found

    Community next steps for making globally unique identifiers work for biocollections data

    Biodiversity data is being digitized and made available online at a rapidly increasing rate but current practices typically do not preserve linkages between these data, which impedes interoperation, provenance tracking, and assembly of larger datasets. For data associated with biocollections, the biodiversity community has long recognized that an essential part of establishing and preserving linkages is to apply globally unique identifiers at the point when data are generated in the field and to persist these identifiers downstream, but this is seldom implemented in practice. There has neither been coalescence towards one single identifier solution (as in some other domains), nor even a set of recommended best practices and standards to support multiple identifier schemes sharing consistent responses. In order to further progress towards a broader community consensus, a group of biocollections and informatics experts assembled in Stockholm in October 2014 to discuss community next steps to overcome current roadblocks. The workshop participants divided into four groups focusing on: identifier practice in current field biocollections; identifier application for legacy biocollections; identifiers as applied to biodiversity data records as they are published and made available in semantically marked-up publications; and cross-cutting identifier solutions that bridge across these domains. The main outcome was consensus on key issues, including recognition of differences between legacy and new biocollections processes, the need for identifier metadata profiles that can report information on identifier persistence missions, and the unambiguous indication of the type of object associated with the identifier. Current identifier characteristics are also summarized, and an overview of available schemes and practices is provided.

    Biodiversity and Biocollections: Problem of Correspondence

    This text is an English translation of the sections of the original Russian paper that consider collection-related issues. The full citation of the original paper is as follows: Pavlinov I.Ya. 2016. [Bioraznoobrazie i biokollektsii: problema sootvetstvia]. In: Pavlinov I.Ya. (comp.). Aspects of Biodiversity. Archives of Zoological Museum of Lomonosov Moscow State University, Vol. 54, Pp. 733–786. The orientation of biology, as a natural science, towards the study and explanation of the similarities and differences between organisms led in the second half of the 20th century to the recognition of a specific subject area of biological exploration, viz. biodiversity (BD). One of the important general scientific prerequisites for this shift was the understanding that (at the level of ontology) the structured diversity of living nature is a fundamental property of it, equivalent to the subjection of some of its manifestations to certain laws. At the level of epistemology, this led to the acknowledgement that the “diversificationary” approach to the description of living beings is as justifiable as the previously dominant “unificationary” one. This general trend has led to a significant increase in the attention paid to BD. From a pragmatic perspective, its leitmotif was the conservation of BD as a renewable resource, while from a scientific perspective the leitmotif was the study of BD as a specific natural phenomenon. These two points of view are united by recognition of the need for scientific substantiation of BD conservation strategy, which implies the need for a detailed study of BD itself. At the level of ontology, one of the key problems in the study of BD (leaving aside the question of its genesis) is the determination of its structure, which is interpreted as a manifestation of the structure of the Earth’s biota itself.
With this, it is acknowledged that the subject area of empirical explorations is not BD as a whole (“Umgebung”) but its particular manifestations (“Umwelts”). It is proposed here to recognize, within the latter: fragments of BD (especially taxa and ecosystems), hierarchical levels of BD (primarily within- and interorganismal ones), and aspects of BD (above all taxonomic and meronomic ones). Attention is drawn to a new interpretation of bioinformatics as a discipline that studies the information support of BD explorations. Biocollections are an important part of this support. The scientific value of collections lies in the fact that they make possible both the empirical inference and the testing (verification) of knowledge about BD. This makes biocollections, in their epistemological status, equivalent to experiments, and so makes studies of BD properly scientific. It is emphasized that the natural objects (naturalia) that are permanently kept in collections contain primary (objective) information about BD, while information retrieved from them is secondary (subjective). A collection, as an information resource, serves as a research sample in the studies of BD. The collection pool, as the totality of all collection materials kept in repositories according to certain standards, can be treated as a general sample, and every single collection as a local sample. The main characteristic of a collection-as-sample is its representativeness; so the basic strategy for developing the collection pool is to maximize its representativeness as a means of ensuring that the structure of the biocollection pool corresponds to that of BD itself. The most fundamental characteristic of a collection, as an information resource, is its scientific significance.
Three main groups of more particular characteristics are distinguished: the “proper” characteristics of every collection, which are its meaningfulness, informativeness, reliability, adequacy, documentation, systematicity, volume, structure, uniqueness, stability, and lability; the “external” characteristics of a collection, which are its resolution, usability, and ethical component; and the “service” characteristics of a collection, which are its museofication, storage system security, inclusion in metastructure, and cost. In the contemporary world, development of the biocollection pool, as a specific resource for BD research, requires considerable organizational effort, including work on its “information support” aimed at demonstrating why biocollections need to exist.

    Two species? - Limits of the species concepts in the pygmy grasshoppers of the Tetrix bipunctata complex (Orthoptera, Tetrigidae)

    Today, integrative taxonomy is often considered the gold standard when it comes to species recognition and delimitation. Using the Tetrix bipunctata complex, we here present a case where even integrative taxonomy may reach its limits. The Tetrix bipunctata complex consists of two morphs, bipunctata and kraussi, which are easily distinguished by a single character, the length of the hind wing. Both morphs are widely distributed in Europe and reported to occur in sympatry over a large area, where they may occasionally also live in syntopy. The pattern has led to disparate classifications: at one extreme, the morphs were treated merely as forms or subspecies of a single species; at the other, as separate species. For this paper, we revisited the morphology by using multivariate ratio analysis (MRA) of 17 distance measurements, checked the distributional data based on verified specimens and examined micro-habitat use. We were able to confirm that hind wing length is, indeed, the only morphological difference between bipunctata and kraussi. We were also able to exclude a mere allometric scaling. The morphs are, furthermore, largely sympatrically distributed, with syntopy occurring regularly. However, a microhabitat niche difference can be observed. Ecological measurements in a shared habitat confirm that kraussi prefers a drier and hotter microhabitat, which possibly also explains the generally lower altitudinal distribution. Based on these results, we can exclude classification as subspecies, but the taxonomic classification as species remains unclear. Even with different approaches to classify the Tetrix bipunctata complex, this case is, therefore, not settled. We recommend continuing to record kraussi and bipunctata separately.

    Towards a biodiversity knowledge graph

    One way to think about "core" biodiversity data is as a network of connected entities, such as taxa, taxonomic names, publications, people, species, sequences, images, and collections that form the "biodiversity knowledge graph". Many questions in biodiversity informatics can be framed as paths in this graph. This article explores this further, and sketches a set of services and tools we would need in order to construct the graph.

    MSB-ECA: Phylogenetically-informed modeling of the regional context of community assembly

    TOLKIN – Tree of Life Knowledge and Information Network: Filling a Gap for Collaborative Research in Biological Systematics

    The development of biological informatics infrastructure capable of supporting growing data management and analysis environments is an increasing need within the systematic biology community. Although significant progress has been made in recent years on developing new algorithms and tools for analyzing and visualizing large phylogenetic data and trees, implementation of these resources is often carried out by bioinformatics experts, using one-off scripts. Therefore, a gap exists in providing data management support for a large set of non-technical users. The TOLKIN project (Tree of Life Knowledge and Information Network) addresses this need by supporting capabilities to manage, integrate, and provide public access to molecular, morphological, and biocollections data and research outcomes through a collaborative web application. This data management framework allows aggregation and import of sequences and of underlying documentation about their source, including vouchers, tissues, and DNA extraction. It combines features of LIMS and workflow environments by supporting management at the level of individual observations, sequences, and specimens, as well as assembly and versioning of data sets used in phylogenetic inference. As a web application, the system provides multi-user support that obviates current practices of sharing data sets as files or spreadsheets via email.

    Using standard keywords in publications to facilitate updates of new fungal taxonomic names

    The combination of manual curation and the reliance on updates from submitters to the public sequence databases is currently inefficient and impedes the comprehensive and timely release of records with new taxonomic names. This should be improved by making several steps during data release more efficient. This article focuses on one such step by proposing a standard way for publications to flag papers with novel taxonomic information. As a result, the potential for automated searches of publication aggregators is improved, as well as the accurate curation of taxonomic information. This paper resulted from discussions in Group 5 under the auspices of the International Commission on the Taxonomy of Fungi (ICTF). http://www.imafungus.org

    Ten Simple Rules for Digital Data Storage

    Data is the central currency of science, but the nature of scientific data has changed dramatically with the rapid pace of technology. This change has led to the development of a wide variety of data formats, dataset sizes, data complexity, data use cases, and data sharing practices. Improvements in high throughput DNA sequencing, sustained institutional support for large sensor networks, and sky surveys with large-format digital cameras have created massive quantities of data. At the same time, the combination of increasingly diverse research teams and data aggregation in portals (e.g. for biodiversity data, GBIF or iDigBio) necessitates increased coordination among data collectors and institutions. As a consequence, “data” can now mean anything from petabytes of information stored in professionally-maintained databases, through spreadsheets on a single computer, to hand-written tables in lab notebooks on shelves. All remain important, but data curation practices must continue to keep pace with the changes brought about by new forms and practices of data collection and storage.