15 research outputs found

    Supplemental Information 2: Example dataset description

    Access to consistent, high-quality metadata is critical to finding, understanding, and reusing scientific data. However, while there are many relevant vocabularies for the annotation of a dataset, none sufficiently captures all the necessary metadata. This prevents uniform indexing and querying of dataset repositories. To provide a practical guide for producing a high-quality description of biomedical datasets, the W3C Semantic Web for Health Care and the Life Sciences Interest Group (HCLSIG) identified Resource Description Framework (RDF) vocabularies that could be used to specify common metadata elements and their value sets. The resulting guideline covers elements of description, identification, attribution, versioning, provenance, and content summarization. This guideline reuses existing vocabularies, and is intended to meet key functional requirements including indexing, discovery, exchange, query, and retrieval of datasets, thereby enabling the publication of FAIR data. The resulting metadata profile is generic and could be used by other domains with an interest in providing machine-readable descriptions of versioned datasets.
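
    As a rough illustration of what a machine-readable, versioned dataset description can look like, the sketch below builds a handful of RDF triples and checks them against a small list of required elements. The choice of properties (Dublin Core Terms and PAV), the `REQUIRED` list, and every `example.org` URI are assumptions made for this example; they are not taken from the guideline itself.

```python
from typing import NamedTuple

DCT = "http://purl.org/dc/terms/"
PAV = "http://purl.org/pav/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DATASET = "http://purl.org/dc/dcmitype/Dataset"


class Literal(NamedTuple):
    """A plain string value, as opposed to a URI."""
    value: str


# Assumed minimum set of properties for this sketch only
# (not the guideline's official list of required elements).
REQUIRED = [
    RDF_TYPE,
    DCT + "title",
    DCT + "description",
    DCT + "publisher",
    PAV + "version",
]


def missing_elements(triples, subject):
    """List the REQUIRED properties that `subject` does not carry."""
    present = {p for s, p, _ in triples if s == subject}
    return [p for p in REQUIRED if p not in present]


def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples as N-Triples lines."""
    lines = []
    for s, p, o in triples:
        if isinstance(o, Literal):
            escaped = o.value.replace("\\", "\\\\").replace('"', '\\"')
            obj = f'"{escaped}"'
        else:
            obj = f"<{o}>"
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


# Hypothetical dataset URIs, for illustration only.
dataset = "http://example.org/dataset/v1"
triples = [
    (dataset, RDF_TYPE, DATASET),
    (dataset, DCT + "title", Literal("Example dataset")),
    (dataset, DCT + "description", Literal("A hypothetical versioned dataset.")),
    (dataset, DCT + "publisher", "http://example.org/org"),
    (dataset, DCT + "isVersionOf", "http://example.org/dataset"),
]

print(to_ntriples(triples))
print("missing:", missing_elements(triples, dataset))
```

    Because the version string is deliberately left out, the check reports `pav:version` as missing; a repository could run the same kind of check before indexing a description.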

    pubmed2ensembl: A Resource for Mining the Biological Literature on Genes

    The last two decades have witnessed a dramatic acceleration in the production of genomic sequence information and publication of biomedical articles. Although genome sequence data and publications are two of the most heavily relied-upon sources of information for many biologists, very little effort has been made to systematically integrate data from genomic sequences directly with the biological literature. For a limited number of model organisms, dedicated teams manually curate publications about genes; however, for species with no such dedicated staff, many thousands of articles are never mapped to genes or genomic regions. To overcome the lack of integration between genomic data and biological literature, we have developed pubmed2ensembl (http://www.pubmed2ensembl.org), an extension to the BioMart system that links over 2,000,000 articles in PubMed to nearly 150,000 genes in Ensembl from 50 species. We use several curated (e.g., Entrez Gene) and automatically generated (e.g., gene names extracted through text-mining on MEDLINE records) sources of gene-publication links, allowing users to filter and combine different data sources to suit their individual needs for information extraction and biological discovery. In addition to extending the Ensembl BioMart database to include published information on genes, we also implemented a scripting language for automated BioMart construction and a novel BioMart interface that allows text-based queries to be performed against PubMed and PubMed Central documents in conjunction with constraints on genomic features. Finally, we illustrate the potential of pubmed2ensembl through typical use cases that involve integrated queries across the biomedical literature and genomic data. By allowing biologists to find the relevant literature on specific genomic regions or sets of functionally related genes more easily, pubmed2ensembl offers a much-needed, genome-informatics-inspired solution to accessing the ever-increasing biomedical literature.

    Density functional theory based screening of ternary alkali-transition metal borohydrides: A computational material design project

    The dissociation of molecules, even of the simplest one, the hydrogen molecule, cannot be described accurately within density functional theory because none of the currently available functionals accounts for strong on-site correlation. This problem led to a discussion of properties that the local Kohn-Sham potential has to satisfy in order to correctly describe strongly correlated systems. We derive an analytic expression for the nontrivial form of the Kohn-Sham potential in between the two fragments for the dissociation of a single bond. We show that the numerical calculations for a one-dimensional two-electron model system indeed approach and reach this limit. It is shown that the functional form of the potential is universal, i.e., independent of the details of the two fragments. We acknowledge funding by the Spanish MEC (Grant No. FIS2007-65702-C02-01), “Grupos Consolidados UPV/EHU del Gobierno Vasco” (Grant No. IT-319-07), and the European Community through e-I3 ETSF project (Grant Agreement No. 211956). Peer reviewed.

    Genome-wide associations for birth weight and correlations with adult disease

    Birth weight (BW) has been shown to be influenced by both fetal and maternal factors and in observational studies is reproducibly associated with future risk of adult metabolic diseases including type 2 diabetes (T2D) and cardiovascular disease. These life-course associations have often been attributed to the impact of an adverse early life environment. Here, we performed a multi-ancestry genome-wide association study (GWAS) meta-analysis of BW in 153,781 individuals, identifying 60 loci where fetal genotype was associated with BW ($P < 5 \times 10^{-8}$). Overall, approximately 15% of variance in BW was captured by assays of fetal genetic variation. Using genetic association alone, we found strong inverse genetic correlations between BW and systolic blood pressure ($R_g = -0.22$, $P = 5.5 \times 10^{-13}$), T2D ($R_g = -0.27$, $P = 1.1 \times 10^{-6}$) and coronary artery disease ($R_g = -0.30$, $P = 6.5 \times 10^{-9}$). In addition, using large-cohort datasets, we demonstrated that genetic factors were the major contributor to the negative covariance between BW and future cardiometabolic risk. Pathway analyses indicated that the protein products of genes within BW-associated regions were enriched for diverse processes including insulin signalling, glucose homeostasis, glycogen biosynthesis and chromatin remodelling. There was also enrichment of associations with BW in known imprinted regions ($P = 1.9 \times 10^{-4}$). We demonstrate that life-course associations between early growth phenotypes and adult cardiometabolic disease are in part the result of shared genetic effects and identify some of the pathways through which these causal genetic effects are mediated. For a full list of the funders, please visit the publisher's website and look at the supplementary material provided. Some of the funders are: British Heart Foundation, Cancer Research UK, Medical Research Council, National Institutes of Health, Royal Society and Wellcome Trust.

    The health care and life sciences community profile for dataset descriptions

    Access to consistent, high-quality metadata is critical to finding, understanding, and reusing scientific data. However, while there are many relevant vocabularies for the annotation of a dataset, none sufficiently captures all the necessary metadata. This prevents uniform indexing and querying of dataset repositories. To provide a practical guide for producing a high-quality description of biomedical datasets, the W3C Semantic Web for Health Care and the Life Sciences Interest Group (HCLSIG) identified Resource Description Framework (RDF) vocabularies that could be used to specify common metadata elements and their value sets. The resulting guideline covers elements of description, identification, attribution, versioning, provenance, and content summarization. This guideline reuses existing vocabularies, and is intended to meet key functional requirements including indexing, discovery, exchange, query, and retrieval of datasets, thereby enabling the publication of FAIR data. The resulting metadata profile is generic and could be used by other domains with an interest in providing machine-readable descriptions of versioned datasets.

    ReSurveyEurope: A database of resurveyed vegetation plots in Europe

    AIMS: We introduce ReSurveyEurope, a new data source of resurveyed vegetation plots in Europe, compiled by a collaborative network of vegetation scientists. We describe the scope of this initiative, provide an overview of currently available data, governance, data contribution rules, and accessibility. In addition, we outline further steps, including potential research questions. RESULTS: ReSurveyEurope includes resurveyed vegetation plots from all habitats. Version 1.0 of ReSurveyEurope contains 283,135 observations (i.e., individual surveys of each plot) from 79,190 plots sampled in 449 independent resurvey projects. Of these, 62,139 (78%) are permanent plots, that is, marked in situ, or located with GPS, which allow for high spatial accuracy in resurvey. The remaining 17,051 (22%) plots are from studies in which plots from the initial survey could not be exactly relocated. Four data sets, which together account for 28,470 (36%) plots, provide only presence/absence information on plant species, while the remaining 50,720 (64%) plots contain abundance information (e.g., percentage cover or cover–abundance classes such as variants of the Braun-Blanquet scale). The oldest plots were sampled in 1911 in the Swiss Alps, while most plots were sampled between 1950 and 2020. CONCLUSIONS: ReSurveyEurope is a new resource to address a wide range of research questions on fine-scale changes in European vegetation. The initiative is devoted to an inclusive and transparent governance and data usage approach, based on slightly adapted rules of the well-established European Vegetation Archive (EVA). ReSurveyEurope data are ready for use, and proposals for analyses of the data set can be submitted at any time to the coordinators. Still, further data contributions are highly welcome.

    FALDO: a semantic standard for describing the location of nucleotide and protein feature annotation

    BACKGROUND: Nucleotide and protein sequence feature annotations are essential to understand biology on the genomic, transcriptomic, and proteomic level. For querying biological annotations with Semantic Web technologies, there was no standard that described this potentially complex location information as subject-predicate-object triples. DESCRIPTION: We have developed an ontology, the Feature Annotation Location Description Ontology (FALDO), to describe the positions of annotated features on linear and circular sequences. FALDO can be used to describe nucleotide features in sequence records, protein annotations, and glycan binding sites, among other features in coordinate systems of the aforementioned “omics” areas. Using the same data format to represent sequence positions that are independent of file formats allows us to integrate sequence data from multiple sources and data types. The genome browser JBrowse is used to demonstrate accessing multiple SPARQL endpoints to display genomic feature annotations, as well as protein annotations from UniProt mapped to genomic locations. CONCLUSIONS: Our ontology allows users to uniformly describe – and potentially merge – sequence annotations from multiple sources. Data sources using FALDO can prospectively be retrieved using federated SPARQL queries against public SPARQL endpoints and/or local private triple stores.
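
    To make the idea of location information as subject-predicate-object triples concrete, here is a minimal sketch that describes one feature's region using FALDO terms (`faldo:location`, `faldo:begin`, `faldo:end`, `faldo:position`, `faldo:reference`). The `example.org` URIs, the coordinates, and the helper names `region_triples` and `to_ntriples` are hypothetical and exist only for this illustration; the modelling is simplified and should not be read as the authors' reference encoding.

```python
FALDO = "http://biohackathon.org/resource/faldo#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"
EX = "http://example.org/"  # hypothetical namespace for this sketch


def region_triples(feature, reference, begin, end, forward=True):
    """Return (subject, predicate, object) triples locating a feature."""
    strand = "ForwardStrandPosition" if forward else "ReverseStrandPosition"
    region = feature + "/region"
    triples = [
        (feature, FALDO + "location", region),
        (region, RDF_TYPE, FALDO + "Region"),
    ]
    # Each boundary is its own node carrying a coordinate, a strand type,
    # and the sequence the coordinate refers to.
    for name, coord in (("begin", begin), ("end", end)):
        pos = f"{region}/{name}"
        triples += [
            (region, FALDO + name, pos),
            (pos, RDF_TYPE, FALDO + "ExactPosition"),
            (pos, RDF_TYPE, FALDO + strand),
            (pos, FALDO + "position", coord),
            (pos, FALDO + "reference", reference),
        ]
    return triples


def to_ntriples(triples):
    """Serialize triples as N-Triples; integers become xsd:integer literals."""
    lines = []
    for s, p, o in triples:
        if isinstance(o, int):
            obj = f'"{o}"^^<{XSD_INTEGER}>'
        else:
            obj = f"<{o}>"
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


# Hypothetical gene on a hypothetical chromosome, forward strand.
triples = region_triples(EX + "gene1", EX + "chr1", 100, 200)
print(to_ntriples(triples))
```

    Since each position node names its own reference sequence, the same pattern can describe features regardless of the file format the coordinates originally came from.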

    The Semanticscience Integrated Ontology (SIO) for biomedical research and knowledge discovery

    The Semanticscience Integrated Ontology (SIO) is an ontology to facilitate biomedical knowledge discovery. SIO features a simple upper level composed of essential types and relations for the rich description of arbitrary (real, hypothesized, virtual, fictional) objects, processes and their attributes. SIO specifies simple design patterns to describe and associate qualities, capabilities, functions, quantities, and informational entities including textual, geometrical, and mathematical entities, and provides specific extensions in the domains of chemistry, biology, biochemistry, and bioinformatics. SIO provides an ontological foundation for the Bio2RDF linked data for the life sciences project and is used for semantic integration and discovery for SADI-based semantic web services. SIO is freely available to all users under a Creative Commons Attribution license. See website for further information: http://sio.semanticscience.org
