5 research outputs found

    Comparison of automated and human assignment of MeSH terms on publicly-available molecular datasets

    Publicly available molecular datasets can be used for independent verification or investigative repurposing, but this depends on the presence, consistency and quality of descriptive annotations. Annotation and indexing of molecular datasets using well-defined controlled vocabularies or ontologies enables accurate and systematic data discovery, yet the majority of molecular datasets available through public data repositories lack such annotations. A number of automated annotation methods have been developed; however, few systematic evaluations of the quality of annotations supplied by these methods have been performed using annotations from standing public data repositories. Here, we compared manually assigned Medical Subject Heading (MeSH) annotations associated with experiments by data submitters in the PRoteomics IDEntification (PRIDE) proteomics data repository to automated MeSH annotations derived through the National Center for Biomedical Ontology Annotator and National Library of Medicine MetaMap programs. These programs were applied to free-text annotations for experiments in PRIDE. As many submitted datasets were referenced in publications, we used the manually curated MeSH annotations of those linked publications in MEDLINE as a “gold standard”. Annotator and MetaMap exhibited recall performance 3-fold greater than that of the manual annotations. We connected PRIDE experiments in a network topology according to shared MeSH annotations and found 373 distinct clusters, many of which were found to be biologically coherent by network analysis. The results of this study suggest that both Annotator and MetaMap are capable of annotating public molecular datasets with a quality comparable to, and often exceeding, that of the actual data submitters, highlighting a continuing need to improve and apply automated methods to molecular datasets in public data repositories to maximize their value and utility.
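
    The recall comparison described in this abstract can be illustrated with a minimal sketch. It assumes recall is computed per experiment as the fraction of gold-standard MeSH terms that an annotation source recovers; the abstract does not spell out the exact procedure. The function name and all term sets below are invented for illustration and are not data from the study.

```python
def recall(predicted, gold):
    """Fraction of gold-standard terms that also appear in the predicted set."""
    gold = set(gold)
    if not gold:
        return 0.0  # no gold terms to recover; treated as zero by convention here
    return len(set(predicted) & gold) / len(gold)


# Hypothetical toy annotations for one experiment (not taken from PRIDE or MEDLINE).
gold = {"Proteomics", "Humans", "Mass Spectrometry", "Liver"}
manual = {"Humans"}                                   # submitter-assigned terms
automated = {"Humans", "Proteomics", "Liver", "Mice"}  # terms from an automated tool

manual_recall = recall(manual, gold)
automated_recall = recall(automated, gold)

print(f"manual recall:    {manual_recall:.2f}")
print(f"automated recall: {automated_recall:.2f}")
```

    Note that the extra automated term ("Mice") does not lower recall; it would only affect precision, which this sketch does not measure.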

    Extending TCGA queries to automatically identify analogous genomic data from dbGaP [version 1; referees: 2 approved, 1 approved with reservations]

    Data sharing is critical to advancing genomic research, both by reducing the need to collect new data through reuse and combination of existing data and by promoting reproducible research. The Cancer Genome Atlas (TCGA) is a popular resource for individual-level genotype-phenotype cancer-related data. The Database of Genotypes and Phenotypes (dbGaP) contains many datasets similar to those in TCGA. We have created a software pipeline that will allow researchers to discover relevant genomic data from dbGaP, based on matching TCGA metadata. The resulting research provides an easy-to-use tool to connect these two data sources.

    Integrative Approach to Pain Genetics Identifies Pain Sensitivity Loci across Diseases

    Identifying human genes relevant for the processing of pain requires difficult-to-conduct and expensive large-scale clinical trials. Here, we examine a novel integrative paradigm for data-driven discovery of pain gene candidates, taking advantage of the vast amount of existing disease-related clinical literature and gene expression microarray data stored in large international repositories. First, thousands of diseases were ranked according to a disease-specific pain index (DSPI), derived from Medical Subject Heading (MeSH) annotations in MEDLINE. Second, gene expression profiles of 121 of these human diseases were obtained from public sources. Third, genes with expression variation significantly correlated with DSPI across diseases were selected as candidate pain genes. Finally, selected candidate pain genes were genotyped in an independent human cohort and prospectively evaluated for significant association between variants and measures of pain sensitivity. The strongest signal was with rs4512126 (5q32, ABLIM3, P = 1.3×10⁻¹⁰) for the sensitivity to cold pressor pain in males, but not in females. Significant associations were also observed with rs12548828, rs7826700 and rs1075791 on 8q22.2 within NCALD (P = 1.7×10⁻⁴, 1.8×10⁻⁴, and 2.2×10⁻⁴ respectively). Our results demonstrate the utility of a novel paradigm that integrates publicly available disease-specific gene expression data with clinical data curated from MEDLINE to facilitate the discovery of pain-relevant genes. This data-derived list of pain gene candidates enables additional focused and efficient biological studies validating additional candidates.
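
    The third step of this abstract (selecting genes whose expression tracks DSPI across diseases) can be sketched roughly as follows. This is a simplification under stated assumptions: it uses a plain Pearson correlation with an arbitrary cutoff on |r| in place of the significance testing the abstract refers to, and the gene names, DSPI values and expression values are all made up.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant series has no defined correlation; return 0 here
    return cov / (sd_x * sd_y)


# Hypothetical pain index for four diseases (one value per disease).
dspi = [0.1, 0.4, 0.5, 0.9]

# Hypothetical per-disease expression values for three placeholder genes.
expression = {
    "GENE_A": [1.0, 2.1, 2.4, 4.0],  # rises with DSPI
    "GENE_B": [3.0, 2.9, 3.1, 3.0],  # essentially flat
    "GENE_C": [5.0, 3.9, 3.2, 1.1],  # falls as DSPI rises
}

THRESHOLD = 0.9  # arbitrary illustrative cutoff, not a significance test

candidates = sorted(
    gene for gene, values in expression.items()
    if abs(pearson(values, dspi)) >= THRESHOLD
)
print(candidates)
```

    Both positively and negatively correlated genes pass the cutoff because the filter uses the absolute value of the coefficient.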

    Identification of potential signature whistles from free-ranging common dolphins (Delphinus delphis) in South Africa

    Conveying identity is important for social animals to maintain individually based relationships. Communication of identity information relies on both signal encoding and perception. Several delphinid species use individually distinctive signature whistles to transmit identity information, best described for the common bottlenose dolphin (Tursiops truncatus). In this study, we investigate signature whistle use in wild common dolphins (Delphinus delphis). Acoustic recordings were analysed from 11 encounters from three locations in South Africa (Hout Bay, False Bay, and Plettenberg Bay) during 2009, 2016 and 2017. The frequency contours of whistles were visually categorised, with 29 signature whistle types (SWTs) identified through contour categorisation and a bout analysis approach developed specifically to identify signature whistles in bottlenose dolphins (SIGID). Categorisation verification was conducted using an unsupervised neural network (ARTwarp) at both a 91% and 96% vigilance parameter. For this, individual SWTs were analysed type by type and then in a ‘global’ analysis whereby all 497 whistle contours were categorised simultaneously. Overall, the analysis demonstrated high stereotypy in the structure and temporal production of whistles, consistent with signature whistle use. We suggest that individual identity information may be encoded in these whistle contours. However, the large group sizes and high degree of vocal activity characteristic of this dolphin species generate a cluttered acoustic environment with high potential for masking from conspecific vocalisations. Therefore, further investigation into the mechanisms of identity perception in such acoustically cluttered environments is required to demonstrate the function of these stereotyped whistle types in common dolphins. This research (2014–2018) was supported by Sea Search Research and Conservation NPO and Grant 93689 to Simon Elwen from the South African National Research Foundation. For the data collected in Plettenberg Bay in 2009, funding was provided by the Russel Trust.
    http://link.springer.com/journal/10071