33 research outputs found

    Making sense of big data in health research: Towards an EU action plan.

    Medicine and healthcare are undergoing profound changes. Whole-genome sequencing and high-resolution imaging technologies are key drivers of this rapid and crucial transformation. Technological innovation combined with automation and miniaturization has triggered an explosion in data production that will soon reach exabyte proportions. How are we going to deal with this exponential increase in data production? The potential of "big data" for improving health is enormous but, at the same time, we face a wide range of challenges to overcome urgently. Europe is very proud of its cultural diversity; however, exploitation of the data made available through advances in genomic medicine, imaging, and a wide range of mobile health applications or connected devices is hampered by numerous historical, technical, legal, and political barriers. European health systems and databases are diverse and fragmented. There is a lack of harmonization of data formats, processing, analysis, and data transfer, which leads to incompatibilities and lost opportunities. Legal frameworks for data sharing are evolving. Clinicians, researchers, and citizens need improved methods, tools, and training to generate, analyze, and query data effectively. Addressing these barriers will contribute to creating the European Single Market for health, which will improve health and healthcare for all Europeans.

    ELIXIR: Data for Life – Coordinating life science data and services across Europe 

    ELIXIR unites Europe’s leading life science organisations in managing and safeguarding the increasing volume of data being generated by publicly funded research. It coordinates, integrates and sustains bioinformatics resources across its 22 member states, plus EMBL-EBI (European Molecular Biology Laboratory - European Bioinformatics Institute), and enables end users to access services and data that are vital for their research. ELIXIR's remit spans the full breadth of life science data, including data related to human health, food production (agriculture, farming, aquaculture) and the environment (e.g. pollution remediation, ecology), all of clear socio-economic benefit. As a result, ELIXIR contributes to the delivery of several sustainable development goals. This poster will introduce ELIXIR and describe the contribution it can make to coordinating data and services relevant to biodiversity. The poster will set the context for how molecularly-derived biodiversity occurrence data can significantly enhance resources such as the Global Biodiversity Information Facility (GBIF) and the Ocean Biogeographic Information System (OBIS), e.g. by filling in acute gaps in our knowledge of species across realms.

    Mainstreaming Molecular Biodiversity: A call for a unified and interoperable framework

    Over the past 20 years, immense progress has been made in enhancing the effectiveness, affordability, and deployability of molecular methods for biodiversity assessment and monitoring. From the micro- to macroscopic scale, methods such as amplicon sequencing of phylogenetic marker genes, metagenomics, and metatranscriptomics have greatly impacted biology and ecology, and are steadily being integrated into national and international biodiversity policy. Over the next decade, technologies such as miniaturised and autonomous DNA sequencing platforms will amplify this momentum, ushering in an unprecedented volume of deeply minable biodiversity information. While production-grade resources exist to standardise, archive, and exchange raw molecular data (e.g. the resources of the International Nucleotide Sequence Database Collaboration (INSDC) for DNA and RNA sequences), there are still no equivalent frameworks for biodiversity information derived from molecular methods. Research infrastructures in both the biodiversity and molecular biology domains must fill this gap with great urgency to channel molecular advances into efforts to understand and sustain Earth's imperilled biosphere. This session seeks to accelerate the implementation of global standards to link molecular biodiversity data to taxonomy-based systems. Only with these in place can we realise a robust, distributed, yet fully interoperating, network of infrastructures, projects, and researchers addressing molecular biodiversity. This introductory series of flash talks will present the rationale and goals of the session, alongside a joint vision from representatives of several convening stakeholders. A contribution from ELIXIR, an intergovernmental organisation of distributed infrastructures for biological data, will demonstrate the high readiness of biological data resources such as the European Nucleotide Archive (ENA) to mobilise molecular data along new standards. 
An intervention from the SILVA rRNA database project - itself an ELIXIR Core Data Resource - will note the actionability of interfacing molecular-based phylogenies with Linnaean systems hosted by partners such as the Global Biodiversity Information Facility (GBIF). Two more contributions will emphasise the essential role (and thus critical need) of molecular biodiversity standards in bridging research and operations. The first will focus on the nation-scale Metagenomics-Based Ecosystem Biomonitoring (EcoBiomics) project in Canada, which is using 'omic approaches to better assess, monitor, and remediate microbial and invertebrate biodiversity in soil and aquatic ecosystems, thus sustaining ecosystem resilience and service provision upon which society and economies depend. The second will underscore the need for international and stable standards to advance the long-term mission of the Global Omics Observatory Network (GLOMICON), and its contribution to the Global Ocean Observing System's Essential Ocean Variables (GOOS EOVs) under the Intergovernmental Oceanographic Commission of the United Nations Educational, Scientific, and Cultural Organization (IOC-UNESCO). Collectively, these contributions will make the case for a concerted effort to expedite the principled creation of operational information standards in molecular biodiversity. We invite all stakeholders to join us in implementing these standards in the coming years.

    Towards Connecting Molecular Data and the Biodiversity Research Community: An ENA and ELIXIR biodiversity community perspective

    Global and regional efforts for generating molecular sequencing data are fundamental to characterise and monitor the Earth’s biodiversity. However, exploiting the full potential of molecular data for biodiversity monitoring and conservation is still a challenge. There is still the need to fully connect the generation and archiving of sequence data with other biodiversity infrastructures, thereby promoting Findability, Accessibility, Interoperability and Reusability (FAIR) of data. Here we present the ongoing activities and future plans of the European Life-Science Infrastructure (ELIXIR) and the European Molecular Biology Laboratory European Bioinformatics Institute’s (EMBL-EBI) European Nucleotide Archive (ENA, the European node of the International Nucleotide Sequence Database Collaboration - INSDC) towards an enriched set of sequence data connected to the wider biodiversity research community. ELIXIR has an emerging Biodiversity Community that was originally created as a focus group in 2019, to better align the work in biodiversity across the ELIXIR Nodes and with global initiatives in the biodiversity domain. This group has been working on understanding the capabilities, interests and ongoing projects that exist across the Nodes, developing connections with external partners in the biodiversity area (e.g. Global Biodiversity Information Facility, GBIF; LifeWatch ERIC) and developing a longer-term strategy for support of biodiversity by ELIXIR. A recent opinion piece by the group (Waterhouse et al. 2021) highlights opportunities for infrastructure developments in the area of biodiversity and provides recommendations for closer integration of molecular data with biodiversity research. 
These recommendations include the alignment of taxonomies across domains and the general adoption of standardized metadata. ELIXIR and EMBL-EBI are involved in several biodiversity genomics initiatives, including the Earth BioGenome Project (EBP), the Darwin Tree of Life Project (DToL), the European Reference Genome Atlas (ERGA), and BIOSCAN Europe, where support is being provided for data curation, submission and visibility, and for the definition of standards for the associated metadata (e.g. Lawniczak et al. 2022). Moreover, EMBL-EBI is a partner of UniEuk, an initiative that is working towards building a flexible universal taxonomic framework for eukaryotes. ELIXIR and EMBL-EBI are also part of the Biodiversity Community Integrated Knowledge Library (BiCIKL), a Horizon 2020 project that is working towards establishing FAIR practices in the biodiversity domain, and thereby developing tools and workflows for connecting data along the biodiversity research cycle (Penev et al. 2022). These projects and community efforts are contributing to improving metadata standards and pushing the development of tools and workflows to support enriched metadata and increased linkage with other biodiversity infrastructures. Overall, we need to continue to work towards a strong foundation of interlinked knowledge to be able to effectively respond to global challenges such as biodiversity loss and ecosystem change.

    Improving FAIRness of eDNA and Metabarcoding Data: Standards and tools for European Nucleotide Archive data deposition

    The advancements in sequencing technologies have promoted the generation of molecular data for cataloguing and describing biodiversity. The analysis of environmental DNA (eDNA) through the application of metabarcoding techniques enables comprehensive descriptions of communities and their function, being fundamental for understanding and preserving biodiversity. Metabarcoding is becoming widely used and standard methods are being generated for a growing range of applications with high scalability. The generated data can be made available in its unprocessed form, as raw data (the sequenced reads) or as interpreted data, including sets of sequences derived after bioinformatics processing (Amplicon Sequence Variants (ASVs) or Operational Taxonomic Units (OTUs)) and occurrence tables (tables that describe the occurrences and abundances of species or OTUs/ASVs). However, for this data to be Findable, Accessible, Interoperable and Reusable (FAIR), and therefore fully available for meaningful interpretation, it needs to be deposited in public repositories together with enriched sample metadata, protocols and analysis workflows (ten Hoopen et al. 2017). Metabarcoding raw data and associated sample metadata are often stored and made available through the International Nucleotide Sequence Database Collaboration (INSDC) archives (Arita et al. 2020), of which the European Nucleotide Archive (ENA, Burgin et al. 2022) is the European database, but they are often deposited with minimal information, which hinders data reusability. Within the scope of the Horizon 2020 project, Biodiversity Community Integrated Knowledge Library (BiCIKL), which is building a community of interconnected data for biodiversity research (Penev et al. 2022), we are working towards improving the standards for molecular ecology data sharing, developing tools to facilitate data deposition and retrieval, and linking between data types. 
Here we will present the ENA data model, showcasing how metabarcoding data can be shared, while providing enriched metadata, and how this data is linked with existing data in other research infrastructures in the biodiversity domain, such as the Global Biodiversity Information Facility (GBIF), where data is deposited following the guidelines published in Abarenkov et al. (2023). We will also present the results of our recent discussions on standards for this data type and discuss future plans towards continuing to improve data sharing and interoperability for molecular ecology.

    Going Molecular: Sequence-based spatiotemporal biodiversity evidence in GBIF

    The Global Biodiversity Information Facility (GBIF) was established by governments in 2001, largely through the initiative and leadership of the natural history collections community, following the 1999 recommendation by a working group under the Megascience Forum (predecessor of the Global Science Forum) of the Organization for Economic Cooperation and Development (OECD). Over 20 years, GBIF has helped develop standards and convened a global community of data-publishing institutions, aggregating over one billion specimen occurrence records freely and openly available for use in research and policy making. These GBIF-mediated data range from vouchered museum specimens to observation records generated by humans and machines. New data are being generated from integrated remote sensing, ecological sampling, and molecular sequencing that have strong geospatial components but lack traditional vouchers. GBIF is working with partners to develop best practices for bringing this data into the GBIF architecture. Following discussions during the second Global Biodiversity Information Conference in 2018, GBIF and the European Bioinformatics Institute (EMBL-EBI), supported by ELIXIR, have extended collaboration to share species occurrence records known only from their genetic material. When these data providers contribute data coordinates along with the sequences to the European Nucleotide Archive (ENA), the records will appear on GBIF maps and in spatial searches. This collaboration enables significant new molecular data streams to become discoverable through GBIF.org: by mid-March 2019, over 7.8m individual occurrence records via the ENA, and over 13.2m records as standardized Darwin Core sampling-event datasets via MGnify, a resource that provides taxonomic and functional annotations on sequences derived from environmental sequencing projects. 
Sequence-based occurrence records published by ENA and MGnify boost representation of microbial diversity, which was underrepresented at GBIF. The ELIXIR-ENA-MGnify-GBIF partnership is working on further refinement of the dynamic data linkages, frequency of updates and other improvements. The API-based tool that connects GBIF data infrastructures is open to new data contributors and for indexes of molecular occurrences. Indexing of these data streams is dependent on the presence of a name (any rank) with the sequence. Under the current Codes of nomenclature, animals, fungi, plants, and algae cannot be described based exclusively on sequence data. Yet, a significant volume of biodiversity data has only been represented by DNA sequences. Barcoding and sequence clustering procedures vary among taxa and research communities, but clusters can be related to a taxon with a Latin name. Many DNA similarity clusters do not contain a sequence from a formally described taxon; however, these sequence clusters provide provisional molecular names for nomenclatural communication. In the best cases, curated libraries of reference sequences, their metadata, clusters, alignments, and links to individuals and physical material become de facto naming conventions for certain taxonomic groups, and co-exist with Latin names. Integration of molecular names into the taxonomic backbone of GBIF started with Fungi and UNITE, a data management and identification environment for fungal ITS barcodes with 87,000+ fungal species hypotheses demarcating 800,000+ sequence specimens as of March 2019. Checklist publication of all names in UNITE through GBIF.org, including Linnaean names and stable, DOI-trackable molecular sequence-based ‘species hypotheses’, enables indexing of fungal metabarcoding data worldwide, such as BIOWIDE. 
As names are currently essential to indexing the world’s occurrence data, GBIF will develop similar linkages with names in the Barcode of Life data system (BOLD) and in SILVA, a resource for high-quality ribosomal RNA sequence data and taxonomy, and welcomes other reference systems to this development. Expanding the molecular data streams (Fig. 1) allows GBIF to address spatial, temporal and taxonomic gaps and biases, and to support large-scale data-intensive research openly and worldwide.
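
The two conditions stated in this abstract (a name at any rank is needed for indexing, and coordinates are needed for a record to appear on maps and in spatial searches) can be sketched as a simple check. This is only an illustration under stated assumptions: `SequenceRecord`, `is_indexable` and `is_mappable` are hypothetical names, the sample values are invented, and nothing here reflects the actual GBIF or ENA data model or API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SequenceRecord:
    """Hypothetical, simplified view of a sequence-derived occurrence record."""
    accession: str                      # invented identifier, for illustration only
    name: Optional[str] = None          # Latin or molecular name, any rank
    latitude: Optional[float] = None    # decimal degrees
    longitude: Optional[float] = None   # decimal degrees


def is_indexable(rec: SequenceRecord) -> bool:
    """A record needs a non-empty name (any rank) to be indexed."""
    return bool(rec.name and rec.name.strip())


def is_mappable(rec: SequenceRecord) -> bool:
    """A record needs a name plus valid coordinates to be placed on a map."""
    if not is_indexable(rec):
        return False
    if rec.latitude is None or rec.longitude is None:
        return False
    return -90.0 <= rec.latitude <= 90.0 and -180.0 <= rec.longitude <= 180.0


records = [
    SequenceRecord("REC1", name="Fungi", latitude=55.7, longitude=12.6),
    SequenceRecord("REC2", name="Fungi"),                 # named, no coordinates
    SequenceRecord("REC3", latitude=55.7, longitude=12.6),  # coordinates, no name
]
for rec in records:
    print(rec.accession, is_indexable(rec), is_mappable(rec))
```

A name at kingdom rank is enough for the first check, which mirrors the "any rank" wording; the coordinate range test is an added sanity check, not something the abstract specifies.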

    Deliverable D8.3 Web interface for ELIXIR Contextual Data ClearingHouse

    This deliverable report includes a description of the work steps towards building a web interface for the reporting of errors and gaps in sequenced material source annotations as part of Task 8.3 of BiCIKL. A beta version of the web interface has been published and is available to registered users of the PlutoF platform.

    Identification and characterisation of the angiotensin converting enzyme-3 (ACE3) gene: a novel mammalian homologue of ACE

    BACKGROUND: Mammalian angiotensin converting enzyme (ACE) plays a key role in blood pressure regulation. Although multiple ACE-like proteins exist in non-mammalian organisms, to date only one other ACE homologue, ACE2, has been identified in mammals. RESULTS: Here we report the identification and characterisation of the gene encoding a third homologue of ACE, termed ACE3, in several mammalian genomes. The ACE3 gene is located on the same chromosome downstream of the ACE gene. Multiple sequence alignment and molecular modelling have been employed to characterise the predicted ACE3 protein. In mouse, rat, cow and dog, the predicted protein has mutations in some of the critical residues involved in catalysis, including the catalytic Glu in the HEXXH zinc binding motif which is Gln, and ESTs or reverse-transcription PCR indicate that the gene is expressed. In humans, the predicted ACE3 protein has an intact HEXXH motif, but there are other deletions and insertions in the gene and no ESTs have been identified. CONCLUSION: In the genomes of several mammalian species there is a gene that encodes a novel, single domain ACE-like protein, ACE3. In mouse, rat, cow and dog ACE3, the catalytic Glu is replaced by Gln in the putative zinc binding motif, indicating that in these species ACE3 would lack catalytic activity as a zinc metalloprotease. In humans, no evidence was found that the ACE3 gene is expressed and the presence of deletions and insertions in the sequence indicate that ACE3 is a pseudogene.
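
The motif comparison described in this abstract can be illustrated with a short pattern search. This is a minimal sketch, not the authors' method (they report multiple sequence alignment and molecular modelling): `classify_motif` is a hypothetical helper, and the fragments are invented toy strings, not real ACE3 sequences. HEXXH is read in one-letter amino acid codes as His, Glu, any two residues, His, and the Glu to Gln substitution would appear as HQXXH.

```python
import re

# H, then E (Glu) or Q (Gln), then any two residues, then H.
MOTIF = re.compile(r"H([EQ])..H")


def classify_motif(protein_seq: str) -> str:
    """Classify the first HEXXH-like site as 'intact', 'Glu->Gln' or 'absent'."""
    match = MOTIF.search(protein_seq.upper())
    if match is None:
        return "absent"
    return "intact" if match.group(1) == "E" else "Glu->Gln"


# Toy fragments invented for illustration only.
toy_fragments = {
    "intact_example": "AAVHHEMGHIQY",
    "substituted_example": "AAVHHQMGHIQY",
    "no_motif_example": "AAVGGSMGAIQY",
}
for label, seq in toy_fragments.items():
    print(label, classify_motif(seq))
```

On full-length proteins a short pattern like this can match by chance outside the active site, so a real analysis would check the position of the hit against an alignment rather than take the first match.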
