4 research outputs found
ESIP Community Ontology Repository (COR)
Presents the Community Ontology Repository (COR), an instance of the Marine Metadata Interoperability (MMI) Ontology Registry and Repository (ORR), developed with NSF funding support for use by ESIP members.
Additional file 1: of NCBO Ontology Recommender 2.0: an enhanced approach for biomedical ontology recommendation
Ontology Recommender traffic summary. Summary of traffic received by the Ontology Recommender for the period 2014–2016, compared to the other most used BioPortal services. (PDF 27 kb)
SAP – a CEDAR-based pipeline for semantic annotation of biomedical metadata
The exponential growth in the volume of biomedical data held in public data repositories has created tremendous opportunity to evaluate novel research hypotheses in silico. But such search and analysis of disparate data presupposes a consistent semantic representation of the metadata that annotate the research data. Semantic grouping of data is the cornerstone of efficient searches and meta-analyses. Existing metadata are either granularly defined as tag–value pairs (e.g., sample organism="homo sapiens") or implicitly found in long textual descriptions (e.g., in a study design overview). Current practice is to manually map metadata strings to ontological terms before any data analysis can begin. But manual semantic annotation is time-consuming and requires domain and ontology expertise, and therefore may not scale with metadata growth.
Under the umbrella of the Center for Enhanced Data Annotation and Retrieval (CEDAR) metadata enrichment effort, we are building the Semantic Annotation Pipeline (SAP), which automates semantic annotation of biomedical data stored in public data repositories. The pipeline has two major segments: 1) Reformat the metadata stored in the data repository to CEDAR JSON-LD format, using templates created in the CEDAR repository, and 2) Add semantic annotations to the CEDAR formatted metadata. We are employing Apache’s UIMA ConceptMapper to efficiently map metadata text segments to ontology terms.
We are using the GEO microarray data repository to build and evaluate SAP. Our initial focus is on annotating a specific set of experiment metadata including experiment design, and sample characteristics such as organism, disease, and treatment. We plan to evaluate the SAP annotations against manually curated data, including GEO datasets found in the GEO repository. We intend to show that SAP can ease the process of semantic annotation of metadata, and the enriched metadata can support efficient search and meta-analyses of biological and biomedical data.
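The two pipeline segments described in this abstract (reformatting tag–value metadata, then attaching ontology terms) can be illustrated with a minimal Python sketch. It is not the SAP implementation: the lookup table `TERM_INDEX`, the functions `annotate` and `annotate_record`, and the output keys are hypothetical, the output is only a simplified JSON-LD-style structure rather than the CEDAR template model, and a plain dictionary lookup stands in for the dictionary-based matching that UIMA ConceptMapper performs.

```python
import json

# Hypothetical lookup table: normalized metadata string -> ontology term IRI.
# The two NCBI Taxonomy IRIs are real; the table itself is illustrative only.
TERM_INDEX = {
    "homo sapiens": "http://purl.obolibrary.org/obo/NCBITaxon_9606",
    "mus musculus": "http://purl.obolibrary.org/obo/NCBITaxon_10090",
}


def annotate(value):
    """Return a JSON-LD-style node for one metadata value.

    The original string is always kept as a label; an "@id" is added
    only when the normalized string is found in TERM_INDEX.
    """
    node = {"rdfs:label": value}
    iri = TERM_INDEX.get(value.strip().lower())
    if iri is not None:
        node["@id"] = iri
    return node


def annotate_record(record):
    """Annotate every tag-value pair of a flat metadata record."""
    return {field: annotate(value) for field, value in record.items()}


if __name__ == "__main__":
    # Assumed example record, shaped like the tag-value pairs in the abstract.
    sample = {"sample organism": "homo sapiens", "treatment": "untreated"}
    print(json.dumps(annotate_record(sample), indent=2))
```

In this sketch, values with no match keep their label and simply lack an `@id`, so unannotated fields remain visible for manual curation.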
FAIR LINCS Data and Metadata powered by the CEDAR Framework
The Library of Integrated Network-based Signatures (LINCS) program generates a wide variety of cell-based perturbation-response signatures using diverse assay technologies. For example, LINCS includes large-scale transcriptional profiling of genetic and small molecule perturbations, and various proteomics and imaging datasets. We have developed data processing pipelines and supporting informatics infrastructure to access, standardize and harmonize, register, and publish LINCS datasets and metadata from all Data and Signature Generating Centers (DSGCs). Metadata standards specifications provide a foundation for harmonizing and integrating LINCS data. Here we introduce a CEDAR-based LINCS Community Metadata Environment, an end-to-end metadata management framework that supports authoring, curation, validation, management, and sharing of LINCS metadata, while building upon the existing LINCS metadata standards and data-release workflows. Following this initial validation, our goal is to create reusable metadata modules with user-friendly templates for each of the LINCS metadata categories and to make our suite of tools compatible with the CEDAR metadata technologies. This should further simplify metadata handling in the LINCS consortium and facilitate a global metadata repository at CEDAR. As other projects apply the same approach, many more datasets will become cross-searchable and can be linked, optimizing the metadata pathway from submission to discovery.