32 research outputs found

    Making sense of big data in health research: Towards an EU action plan.

    Medicine and healthcare are undergoing profound changes. Whole-genome sequencing and high-resolution imaging technologies are key drivers of this rapid and crucial transformation. Technological innovation combined with automation and miniaturization has triggered an explosion in data production that will soon reach exabyte proportions. How are we going to deal with this exponential increase in data production? The potential of "big data" for improving health is enormous but, at the same time, we face a wide range of challenges to overcome urgently. Europe is very proud of its cultural diversity; however, exploitation of the data made available through advances in genomic medicine, imaging, and a wide range of mobile health applications or connected devices is hampered by numerous historical, technical, legal, and political barriers. European health systems and databases are diverse and fragmented. There is a lack of harmonization of data formats, processing, analysis, and data transfer, which leads to incompatibilities and lost opportunities. Legal frameworks for data sharing are evolving. Clinicians, researchers, and citizens need improved methods, tools, and training to generate, analyze, and query data effectively. Addressing these barriers will contribute to creating the European Single Market for health, which will improve health and healthcare for all Europeans

    ELIXIR: Data for Life – Coordinating life science data and services across Europe 

    ELIXIR unites Europe’s leading life science organisations in managing and safeguarding the increasing volume of data being generated by publicly funded research. It coordinates, integrates and sustains bioinformatics resources across its 22 member states, plus EMBL-EBI (European Molecular Biology Laboratory - European Bioinformatics Institute), and enables end users to access services and data that are vital for their research. ELIXIR's remit spans the full breadth of life science data, including data related to human health, food production (agriculture, farming, aquaculture) and the environment (e.g. pollution remediation, ecology), all of clear socio-economic benefit. As a result, ELIXIR contributes to the delivery of several sustainable development goals. This poster will introduce ELIXIR and describe the contribution it can make to coordinating data and services relevant to biodiversity. The poster will set the context for how molecularly-derived biodiversity occurrence data can significantly enhance resources such as the Global Biodiversity Information Facility (GBIF)  and the Ocean Biogeographic Information System (OBIS), e.g. by filling in acute gaps in our knowledge of species across realms

    Mainstreaming Molecular Biodiversity: A call for a unified and interoperable framework

    Over the past 20 years, immense progress has been made in enhancing the effectiveness, affordability, and deployability of molecular methods for biodiversity assessment and monitoring. From the micro- to macroscopic scale, methods such as amplicon sequencing of phylogenetic marker genes, metagenomics, and metatranscriptomics have greatly impacted biology and ecology, and are steadily being integrated into national and international biodiversity policy. Over the next decade, technologies such as miniaturised and autonomous DNA sequencing platforms will amplify this momentum, ushering in an unprecedented volume of deeply minable biodiversity information. While production-grade resources exist to standardise, archive, and exchange raw molecular data (e.g. the resources of the International Nucleotide Sequence Database Collaboration (INSDC) for DNA and RNA sequences), there are still no equivalent frameworks for biodiversity information derived from molecular methods. Research infrastructures in both the biodiversity and molecular biology domains must fill this gap with great urgency to channel molecular advances into efforts to understand and sustain Earth's imperilled biosphere. This session seeks to accelerate the implementation of global standards to link molecular biodiversity data to taxonomy-based systems. Only with these in place can we realise a robust, distributed, yet fully interoperating, network of infrastructures, projects, and researchers addressing molecular biodiversity. This introductory series of flash talks will present the rationale and goals of the session, alongside a joint vision from representatives of several convening stakeholders. A contribution from ELIXIR, an intergovernmental organisation of distributed infrastructures for biological data, will demonstrate the high readiness of biological data resources such as the European Nucleotide Archive (ENA) to mobilise molecular data along new standards. 
An intervention from the SILVA rRNA database project - itself an ELIXIR Core Data Resource - will note the actionability of interfacing molecular-based phylogenies with Linnaean systems hosted by partners such as the Global Biodiversity Information Facility (GBIF). Two more contributions will emphasise the essential role (and thus critical need) of molecular biodiversity standards in bridging research and operations. The first will focus on the nation-scale Metagenomics-Based Ecosystem Biomonitoring (EcoBiomics) project in Canada, which is using 'omic approaches to better assess, monitor, and remediate microbial and invertebrate biodiversity in soil and aquatic ecosystems, thus sustaining ecosystem resilience and service provision upon which society and economies depend. The second will underscore the need for international and stable standards to advance the long-term mission of the Global Omics Observatory Network (GLOMICON), and its contribution to the Global Ocean Observing System's Essential Ocean Variables (GOOS EOVs) under the Intergovernmental Oceanographic Commission of the United Nations Educational, Scientific, and Cultural Organization (IOC-UNESCO). Collectively, these contributions will make the case for a concerted effort to expedite the principled creation of operational information standards in molecular biodiversity. We invite all stakeholders to join us in implementing these standards in the coming years

    Towards Connecting Molecular Data and the Biodiversity Research Community: An ENA and ELIXIR biodiversity community perspective

    Global and regional efforts for generating molecular sequencing data are fundamental for characterising and monitoring the Earth’s biodiversity. However, exploiting the full potential of molecular data for biodiversity monitoring and conservation is still a challenge. There is still the need to fully connect the generation and archiving of sequence data with other biodiversity infrastructures, thereby promoting Findability, Accessibility, Interoperability and Reusability (FAIR) of data. Here we present the ongoing activities and future plans of the European Life-Science Infrastructure (ELIXIR) and the European Molecular Biology Laboratory European Bioinformatics Institute’s (EMBL-EBI) European Nucleotide Archive (ENA, the European node of the International Nucleotide Sequence Database Collaboration - INSDC) towards an enriched set of sequence data connected to the wider biodiversity research community. ELIXIR has an emerging Biodiversity Community that was originally created as a focus group in 2019, to better align the work in biodiversity across the ELIXIR Nodes and with global initiatives in the biodiversity domain. This group has been working on understanding the capabilities, interests and ongoing projects that exist across the Nodes, developing connections with external partners in the biodiversity area (e.g. Global Biodiversity Information Facility, GBIF; LifeWatch ERIC) and developing a longer term strategy for support of biodiversity by ELIXIR. A recent opinion piece by the group (Waterhouse et al. 2021) highlights opportunities for infrastructure developments in the area of biodiversity and provides recommendations for closer integration of molecular data with biodiversity research.
These recommendations include the alignment of taxonomies across domains and the general adoption of standardized metadata. ELIXIR and EMBL-EBI are involved in several biodiversity genomics initiatives, including the Earth BioGenome Project (EBP), the Darwin Tree of Life Project (DToL), the European Reference Genome Atlas (ERGA), and BIOSCAN Europe, where support is being provided for data curation, submission and visibility, and for the definition of standards for the associated metadata (e.g. Lawniczak et al. 2022). Moreover, EMBL-EBI is a partner of UniEuk, an initiative that is working towards building a flexible universal taxonomic framework for eukaryotes. ELIXIR and EMBL-EBI are also part of the Biodiversity Community Integrated Knowledge Library (BiCIKL), a Horizon 2020 project that is working towards establishing FAIR practices in the biodiversity domain, and thereby developing tools and workflows for connecting data along the biodiversity research cycle (Penev et al. 2022). These projects and community efforts are contributing to improving metadata standards and pushing the development of tools and workflows to support enriched metadata and increased linkage with other biodiversity infrastructures. Overall, we need to continue to work towards a strong foundation of interlinked knowledge to be able to effectively respond to global challenges such as biodiversity loss and ecosystem change.

    Improving FAIRness of eDNA and Metabarcoding Data: Standards and tools for European Nucleotide Archive data deposition

    The advancements in sequencing technologies have promoted the generation of molecular data for cataloguing and describing biodiversity. The analysis of environmental DNA (eDNA) through the application of metabarcoding techniques enables comprehensive descriptions of communities and their function, being fundamental for understanding and preserving biodiversity. Metabarcoding is becoming widely used and standard methods are being generated for a growing range of applications with high scalability. The generated data can be made available in its unprocessed form, as raw data (the sequenced reads) or as interpreted data, including sets of sequences derived after bioinformatics processing (Amplicon Sequence Variants (ASVs) or Operational Taxonomic Units (OTUs)) and occurrence tables (tables that describe the occurrences and abundances of species or OTUs/ASVs). However, for this data to be Findable, Accessible, Interoperable and Reusable (FAIR), and therefore fully available for meaningful interpretation, it needs to be deposited in public repositories together with enriched sample metadata, protocols and analysis workflows (ten Hoopen et al. 2017). Metabarcoding raw data and associated sample metadata is often stored and made available through the International Nucleotide Sequence Database Collaboration (INSDC) archives (Arita et al. 2020), of which the European Nucleotide Archive (ENA, Burgin et al. 2022) is its European database, but it is often deposited with minimal information, which hinders data reusability. Within the scope of the Horizon 2020 project, Biodiversity Community Integrated Knowledge Library (BiCIKL), which is building a community of interconnected data for biodiversity research (Penev et al. 2022), we are working towards improving the standards for molecular ecology data sharing, developing tools to facilitate data deposition and retrieval, and linking between data types. 
Here we will present the ENA data model, showcasing how metabarcoding data can be shared, while providing enriched metadata, and how this data is linked with existing data in other research infrastructures in the biodiversity domain, such as the Global Biodiversity Information Facility (GBIF), where data is deposited following the guidelines published in Abarenkov et al. (2023). We will also present the results of our recent discussions on standards for this data type and discuss future plans towards continuing to improve data sharing and interoperability for molecular ecology

    Going Molecular: Sequence-based spatiotemporal biodiversity evidence in GBIF

    The Global Biodiversity Information Facility (GBIF) was established by governments in 2001, largely through the initiative and leadership of the natural history collections community, following the 1999 recommendation by a working group under the Megascience Forum (predecessor of the Global Science Forum) of the Organization for Economic Cooperation and Development (OECD). Over 20 years, GBIF has helped develop standards and convened a global community of data-publishing institutions, aggregating over one billion specimen occurrence records freely and openly available for use in research and policy making. These GBIF-mediated data range from vouchered museum specimens to observation records generated by humans and machines. New data are being generated from integrated remote sensing, ecological sampling, and molecular sequencing that have strong geospatial components but lack traditional vouchers. GBIF is working with partners to develop best practices for bringing these data into the GBIF architecture. Following discussions during the second Global Biodiversity Information Conference in 2018, GBIF and the European Bioinformatics Institute (EMBL-EBI), supported by ELIXIR, have extended collaboration to share species occurrence records known only from their genetic material. When these data providers contribute data coordinates along with the sequences to the European Nucleotide Archive (ENA), the records will appear on GBIF maps and in spatial searches. This collaboration enables significant new molecular data streams to become discoverable through GBIF.org: by mid-March 2019, over 7.8m individual occurrence records via the ENA, and over 13.2m records as standardized Darwin Core sampling-event datasets via MGnify, a resource that provides taxonomic and functional annotations on sequences derived from environmental sequencing projects.
Sequence-based occurrence records published by ENA and MGnify boost representation of microbial diversity which was underrepresented at GBIF. The ELIXIR-ENA-MGnify-GBIF partnership is working on further refinement of the dynamic data linkages, frequency of updates and other improvements. The API-based tool that connects GBIF data infrastructures is open to new data contributors and for indexes of molecular occurrences. Indexing of these data streams is dependent on the presence of a name (any rank) with the sequence. Under the current Codes of nomenclature, animals, fungi, plants, and algae cannot be described based on exclusively sequence data. Yet, a significant volume of biodiversity data has only been represented by DNA sequences. Barcoding and sequence clustering procedures vary among taxa and research communities, but clusters can be related to a taxon with a Latin name. Many DNA similarity clusters do not contain a sequence from a formally described taxon; however these sequence clusters provide provisional molecular names for nomenclatural communication. In the best cases, curated libraries of reference sequences, their metadata, clusters, alignments, and links to individuals and physical material become de facto naming conventions for certain taxonomic groups, and co-exist with Latin names. Integration of molecular names into the taxonomic backbone of GBIF started with Fungi and UNITE, a data management and identification environment for fungal ITS barcodes with 87,000+ fungal species hypotheses demarcating 800,000+ sequence specimens as of March 2019. Checklist publication of all names in UNITE through GBIF.org including Linnaean names and stable, DOI-trackable molecular sequence based ‘species hypotheses’, enables indexing of fungal metabarcoding data worldwide, such as BIOWIDE. 
As names are currently essential to indexing the world’s occurrence data, GBIF will develop similar linkages with names in the Barcode of Life data system (BOLD) and in SILVA - a resource for high-quality ribosomal RNA sequence data and taxonomy, and welcomes other reference systems to this development. Expanding the molecular data streams (Fig. 1) allows GBIF to address spatial, temporal and taxonomic gaps and biases, and to support large-scale data-intensive research openly and worldwide

    Deliverable D8.3 Web interface for ELIXIR Contextual Data ClearingHouse

    This deliverable report describes the work steps towards building a web interface for reporting errors and gaps in sequenced material source annotations, as part of Task 8.3 of BiCIKL. A beta version of the web interface has been published and is available to registered users of the PlutoF platform.

    Identification and characterisation of the angiotensin converting enzyme-3 (ACE3) gene: a novel mammalian homologue of ACE

    BACKGROUND: Mammalian angiotensin converting enzyme (ACE) plays a key role in blood pressure regulation. Although multiple ACE-like proteins exist in non-mammalian organisms, to date only one other ACE homologue, ACE2, has been identified in mammals. RESULTS: Here we report the identification and characterisation of the gene encoding a third homologue of ACE, termed ACE3, in several mammalian genomes. The ACE3 gene is located on the same chromosome downstream of the ACE gene. Multiple sequence alignment and molecular modelling have been employed to characterise the predicted ACE3 protein. In mouse, rat, cow and dog, the predicted protein has mutations in some of the critical residues involved in catalysis, including the catalytic Glu in the HEXXH zinc binding motif which is Gln, and ESTs or reverse-transcription PCR indicate that the gene is expressed. In humans, the predicted ACE3 protein has an intact HEXXH motif, but there are other deletions and insertions in the gene and no ESTs have been identified. CONCLUSION: In the genomes of several mammalian species there is a gene that encodes a novel, single domain ACE-like protein, ACE3. In mouse, rat, cow and dog ACE3, the catalytic Glu is replaced by Gln in the putative zinc binding motif, indicating that in these species ACE3 would lack catalytic activity as a zinc metalloprotease. In humans, no evidence was found that the ACE3 gene is expressed and the presence of deletions and insertions in the sequence indicate that ACE3 is a pseudogene

    Enabling Community Curation of Biological Source Annotations of Molecular Data Through PlutoF and the ELIXIR Contextual Data Clearinghouse

    The advancements in sequencing technologies have greatly contributed to the documentation of Earth’s biodiversity. However, for exploring the full potential of molecular resources for biodiversity, there needs to be a good linkage between sequence data and its biological source, contributing to a network of connected data in the biodiversity research cycle. This requires a foundation of well-structured and accessible annotations in the molecular sequence repositories. The International Nucleotide Sequence Database Collaboration (INSDC), of which the European Nucleotide Archive (ENA) is the European node, holds a large amount of annotations associated with sequence data, relating to its biological source (e.g., specimens in natural history collections). However, for a number of records, these annotations may be incomplete (e.g., missing voucher information), ambiguous or even inaccurate. Therefore, we have implemented a workflow that allows third-party annotations to be attached to sequence and sample records using two existing services, the PlutoF platform and the ELIXIR Contextual Data ClearingHouse. This work was developed within the scope of the BiCIKL (Biodiversity Community Integrated Knowledge Library) project, which aims to establish open science practices in the biodiversity domain. PlutoF is an online data management platform that also provides computing services for biology-related research. PlutoF features allow registered users to enter their own data and access public data at INSDC. Users can enter and manage a range of data, such as taxonomic classifications, occurrences, etc. This platform also includes a module that allows the addition of third-party annotations (on material source, taxonomic identification, etc.) linked to specimens or sequence records. This module was already in use by the UNITE community for annotation of INSDC rDNA Internal Transcribed Spacer sequence datasets (Abarenkov et al. 2021).
These UNITE annotations are displayed in the National Center for Biotechnology Information (NCBI) records through links to the PlutoF platform. However, there was the need for an automated solution that allowed third-party annotations on any sequence or sample record at INSDC. This was implemented through the operation of the ELIXIR Contextual Data ClearingHouse (hereafter Clearinghouse). The Clearinghouse holds a simple RESTful Application Programming Interface (API) to support the submission of additions and improvements to current metadata attributes, such as information on material sources, on records publicly available in the ELIXIR data resources. The Clearinghouse enables the submission of these corrected metadata from databases (such as the PlutoF platform) to the primary data repositories. The workflow developed is shown in Fig. 1 and consists of the following steps: i) users annotate sequence metadata that is regularly downloaded from INSDC using NCBI’s E-utilities; ii) an annotation proposal is created and a verification notification is sent to an assigned reviewer; iii) the reviewer evaluates the annotation proposal and accepts it or rejects it with comments; iv) if the annotation proposal is accepted, the annotated fields that may be mapped to ENA fields are then pushed to the Clearinghouse using its RESTful API. The annotations, when received at ENA, are then reviewed before being displayed. This workflow is implemented through a web interface in PlutoF, which allows user-friendly and effortless reporting of corrections or additions to biological source metadata in sequence records. Overall, we expect this tool to contribute to the enrichment of metadata associated with sequence records, and therefore increase the links between the molecular and biodiversity resources, and enable sequencing data to deliver their full potential for biodiversity conservation.
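
The review-and-push steps (ii to iv) of the workflow described in this abstract can be illustrated with a minimal Python sketch. It is not the PlutoF or Clearinghouse implementation: the `Proposal` class, the `FIELD_MAP` entries, the record identifier and the `send` callback are all hypothetical names introduced here for illustration, and the real Clearinghouse API call is deliberately left as a caller-supplied function because the abstract does not specify it.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PROPOSED = "proposed"
    ACCEPTED = "accepted"
    REJECTED = "rejected"


# Hypothetical mapping from annotation field names to repository field names.
# The actual mapping to ENA fields is not given in the abstract.
FIELD_MAP = {"voucher": "specimen_voucher", "country": "country"}


@dataclass
class Proposal:
    """Step ii: an annotation proposal assigned to a reviewer."""
    record_id: str
    annotations: dict
    reviewer: str
    status: Status = Status.PROPOSED
    comments: str = ""


def review(proposal, accept, comments=""):
    """Step iii: the reviewer accepts the proposal or rejects it with comments."""
    proposal.status = Status.ACCEPTED if accept else Status.REJECTED
    proposal.comments = comments
    return proposal


def mappable_fields(proposal):
    """Step iv (first half): keep only accepted fields that have a mapping."""
    if proposal.status is not Status.ACCEPTED:
        return {}
    return {FIELD_MAP[k]: v for k, v in proposal.annotations.items() if k in FIELD_MAP}


def push(proposal, send):
    """Step iv (second half): hand mapped fields to `send`, a stand-in for the API call."""
    payload = mappable_fields(proposal)
    if payload:
        send(proposal.record_id, payload)
    return payload
```

In this sketch an unreviewed or rejected proposal yields an empty payload and nothing is sent, while fields without a mapping (for example a free-text note) are dropped before submission, which mirrors the "fields that may be mapped to ENA fields" condition in step iv.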