18 research outputs found

    EFFECTIVELY SEARCHING SPECIMEN AND OBSERVATION DATA WITH TOQE, THE THESAURUS OPTIMIZED QUERY EXPANDER

    Get PDF
    Today’s specimen and observation data portals lack a flexible mechanism, able to link up thesaurus-enabled data sources such as taxonomic checklist databases and expand user queries to related terms, significantly enhancing result sets. The TOQE system (Thesaurus Optimized Query Expander) is a REST-like XML web-service implemented in Python and designed for this purpose. Acting as an interface between portals and thesauri, TOQE allows the implementation of specialized portal systems with a set of thesauri supporting its specific focus. It is both easy to use for portal programmers and easy to configure for thesaurus database holders who want to expose their system as a service for query expansions. Currently, TOQE is used in four specimen and observation data portals. The documentation is available from http://search.biocase.org/toqe/

    Badgers: generating data quality deficits with Python

    Full text link
    Generating context specific data quality deficits is necessary to experimentally assess data quality of data-driven (artificial intelligence (AI) or machine learning (ML)) applications. In this paper we present badgers, an extensible open-source Python library to generate data quality deficits (outliers, imbalanced data, drift, etc.) for different modalities (tabular data, time-series, text, etc.). The documentation is accessible at https://fraunhofer-iese.github.io/badgers/ and the source code at https://github.com/Fraunhofer-IESE/badgersComment: 17 pages, 16 figure

    Enriched biodiversity data as a resource and service

    Get PDF
    Background: Recent years have seen a surge in projects that produce large volumes of structured, machine-readable biodiversity data. To make these data amenable to processing by generic, open source “data enrichment” workflows, they are increasingly being represented in a variety of standards-compliant interchange formats. Here, we report on an initiative in which software developers and taxonomists came together to address the challenges and highlight the opportunities in the enrichment of such biodiversity data by engaging in intensive, collaborative software development: The Biodiversity Data Enrichment Hackathon. Results: The hackathon brought together 37 participants (including developers and taxonomists, i.e. scientific professionals that gather, identify, name and classify species) from 10 countries: Belgium, Bulgaria, Canada, Finland, Germany, Italy, the Netherlands, New Zealand, the UK, and the US. The participants brought expertise in processing structured data, text mining, development of ontologies, digital identification keys, geographic information systems, niche modeling, natural language processing, provenance annotation, semantic integration, taxonomic name resolution, web service interfaces, workflow tools and visualisation. Most use cases and exemplar data were provided by taxonomists. One goal of the meeting was to facilitate re-use and enhancement of biodiversity knowledge by a broad range of stakeholders, such as taxonomists, systematists, ecologists, niche modelers, informaticians and ontologists. The suggested use cases resulted in nine breakout groups addressing three main themes: i) mobilising heritage biodiversity knowledge; ii) formalising and linking concepts; and iii) addressing interoperability between service platforms. 
Another goal was to further foster a community of experts in biodiversity informatics and to build human links between research projects and institutions, in response to recent calls to further such integration in this research domain. Conclusions: Beyond deriving prototype solutions for each use case, areas of inadequacy were discussed and are being pursued further. It was striking how many possible applications for biodiversity data there were and how quickly solutions could be put together when the normal constraints on collaboration were broken down for a week. Conversely, mobilising biodiversity knowledge from its silos in heritage literature and natural history collections will continue to require formalisation of the concepts (and the links between them) that define the research domain, as well as increased interoperability between the software platforms that operate on these concepts.

    A common, automated, pre-publication registration model for higher plants (International Plant Names Index, IPNI), fungi (Index Fungorum, MycoBank) and animals (ZooBank)

    No full text
    <p>A common, automated, pre-publication registration model for higher plants (International Plant Names Index, IPNI), fungi (Index Fungorum, MycoBank) and animals (ZooBank) [pdf, 650 KB]</p> <p>Use the reference from http://dx.doi.org/10.6084/m9.figshare.784947 *only*</p>

    Interoperability model between PLAZI and the CDM Platform

    No full text
    <p>Interoperability model between PLAZI and the CDM Platform.</p>

    Methods used in the development of Common Data Models for health data – A Scoping Review Protocol

    No full text
    Common Data Models (CDMs) are essential tools for data harmonization, which can lead to significant improvements in healthcare. CDMs harmonize data from disparate sources and ease collaboration across institutions, which leads to the generation of larger standardized data repositories across different entities. This Scoping Review (Sc-R) on methods used in the development of CDMs for healthcare aims to obtain a broad overview of the approaches used in developing CDMs, i.e., Common Data Elements (CDEs) or Common Data Sets (CDS), for different disease domains at an international level. To obtain an overview of the state of the art, the literature databases PubMed, Web of Science, Science Direct, and Scopus are searched for publications from the last five years, starting from 2017, with associated keywords. The included articles will be evaluated methodically, and a list of the different types of methods will be created. The methods will then be categorized into groups.

    Tracking biogeographical change from its footprints in botanical literature

    No full text
    <p>Early results from an investigation into the usefulness of botanical literature to provide historical information on the distributions of plants. Based upon the case of <em>Chenopodium vulvaria</em>, a small weed of waste places.</p>

    B-HIT - A Tool for Harvesting and Indexing Biodiversity Data.

    Get PDF
    With the rapidly growing number of data publishers, the process of harvesting and indexing information to offer advanced search and discovery becomes a critical bottleneck in globally distributed primary biodiversity data infrastructures. The Global Biodiversity Information Facility (GBIF) implemented a Harvesting and Indexing Toolkit (HIT), which largely automates data harvesting activities for hundreds of collection and observational data providers. The team of the Botanic Garden and Botanical Museum Berlin-Dahlem has extended this well-established system with a range of additional functions, including improved processing of multiple taxon identifications, the ability to represent associations between specimen and observation units, new data quality control and new reporting capabilities. The open source software B-HIT can be freely installed and used for setting up thematic networks serving the demands of particular user groups
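    The harvest-and-index cycle such a toolkit automates can be sketched in a few lines. The fetcher, endpoint and record fields below are hypothetical stand-ins, not B-HIT's actual interfaces; a real deployment would page through provider responses in BioCASe or a similar protocol:

```python
def harvest(providers, fetch):
    """Pull record batches from each provider URL and build a simple
    index keyed by scientific name."""
    index = {}
    for url in providers:
        for record in fetch(url):
            name = record.get("scientificName")
            if not name:  # basic quality control: skip unnamed records
                continue
            index.setdefault(name.strip(), []).append(record)
    return index

# Stand-in fetcher returning canned records for illustration.
def fake_fetch(url):
    return [{"scientificName": "Chenopodium vulvaria", "source": url},
            {"scientificName": None, "source": url}]

idx = harvest(["https://example.org/provider1"], fake_fetch)
```

    The index built this way is what makes advanced search and discovery fast, since queries no longer have to contact each distributed provider at search time.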

    HIV-PDI: A Protein Drug Interaction Resource for Structural Analyses of HIV Drug Resistance: 2. Examples of Use and Proof-of-Concept

    No full text
    The HIV-PDI resource was designed and implemented to address the problem of drug resistance, with a central focus on the 3D structure of the target-drug interaction. Clinical and biological data, structural and physico-chemical information and 3D interaction data concerning the targets (HIV protease) and the drugs (ARVs) were meticulously integrated and combined with tools dedicated to studying HIV mutations and their consequences for the efficacy of drugs. Here, the capabilities of the HIV-PDI resource are demonstrated for several different scenarios, ranging from retrieving information associated with patients to analyzing structural data relating cognate proteins and ligands. HIV-PDI allows such diverse data to be correlated, especially data linking antiretroviral drug (ARV) resistance to a given treatment with changes in three-dimensional interactions between a drug molecule and the mutated protease. Our work is based on the assumption that ARV resistance results from a loss of affinity between the mutated HIV protease and a drug molecule due to subtle changes in the nature of the protein-ligand interaction. Therefore, a set of patients whose resistance to first-line treatment was corrected by a second-line treatment was selected from the HIV-PDI database for detailed study, and several queries regarding these patients were processed via its graphical user interface. Considering the protease mutations found in the selected set of patients, our retrospective analysis was able to establish in most cases that the first-line treatment was not suitable, and it predicted a second-line treatment which agreed perfectly with the clinician's prescription. The present study demonstrates the capabilities of HIV-PDI. We anticipate that this decision support tool will help clinicians and researchers find suitable HIV treatments for individual patients. The HIV-PDI database is thereby a useful data collection system that allows interpretation on the basis of all available information, thus supporting decision-making.