Before and After: Comparison of Legacy and Harmonized TCGA Genomic Data Commons’ Data
We present a systematic analysis of the effects of synchronizing a large-scale, deeply characterized, multi-omic dataset to the current human reference genome, using updated software, pipelines, and annotations. For each of 5 molecular data platforms in The Cancer Genome Atlas (TCGA), namely mRNA and miRNA expression, single nucleotide variants, DNA methylation, and copy number alterations, comprehensive sample, gene, and probe-level studies were performed to quantify the degree of similarity between the ‘legacy’ GRCh37 (hg19) TCGA data and its GRCh38 (hg38) version as ‘harmonized’ by the Genomic Data Commons. We offer gene lists to elucidate differences that remained after controlling for confounders, and strategies to mitigate their impact on biological interpretation. Our results demonstrate that the hg19 and hg38 TCGA datasets are very highly concordant, promote informed use of either legacy or harmonized omics data, and provide a rubric that encourages similar comparisons as new data emerge and reference data evolve. Gao et al. performed a systematic analysis of the effects of synchronizing the large-scale, widely used, multi-omic dataset of The Cancer Genome Atlas to the current human reference genome. For each of the five molecular data platforms assessed, they demonstrated a very high concordance between the ‘legacy’ GRCh37 (hg19) TCGA data and its GRCh38 (hg38) version as ‘harmonized’ by the Genomic Data Commons.
Informatics and Standards for Nanomedicine Technology
There are several issues to be addressed concerning the management and effective use of information (or data) generated from nanotechnology studies in biomedical research and medicine. These data are large in volume, diverse in content, and are beset with gaps and ambiguities in the description and characterization of nanomaterials. In this work, we have reviewed three areas of nanomedicine informatics: information resources; taxonomies, controlled vocabularies, and ontologies; and information standards. Informatics methods and standards in each of these areas are critical for enabling collaboration; data sharing; unambiguous representation and interpretation of data; semantic (meaningful) search and integration of data; and for ensuring data quality, reliability, and reproducibility. In particular, we have considered four types of information standards in this article: standard characterization protocols, common terminology standards, minimum information standards, and standard data communication (exchange) formats. Currently, because of gaps and ambiguities in the data, it is also difficult to apply computational methods and machine learning techniques to analyze, interpret, and recognize patterns in data that are high dimensional in nature, and also to relate variations in nanomaterial properties to variations in their chemical composition, synthesis, characterization protocols, and so on. Progress toward resolving the issues of information management in nanomedicine using informatics methods and standards discussed in this article will be essential to the rapidly growing field of nanomedicine informatics. This article is a U.S. Government work, and as such, is in the public domain in the United States of America.
Current situation on the availability of nanostructure-biological activity data
The recent developments in nanotechnology have not only increased the number of nanoproducts on the market, but also raised concerns about the safety of engineered nanomaterials (ENMs) for human health and the environment. As the production and use of ENMs are increasing, we are approaching the point at which it is impossible to individually assess the toxicity of a vast number of ENMs. Therefore, it is desirable to use time-effective computational methods, such as quantitative structure-activity relationship (QSAR) models, to predict the toxicity of ENMs. However, the accuracy of nano-(Q)SARs is directly tied to the quality of the data from which the model is estimated. Although the amount of available nanotoxicity data is insufficient for generating robust nano-(Q)SAR models in most cases, there are a handful of studies that provide appropriate experimental data for (Q)SAR-like modelling investigations. The aim of this study is to review the available literature data that are particularly suitable for nano-(Q)SAR modelling. We hope that this paper can serve as a starting point for those who would like to know more about the current availability of experimental data on the health effects of ENMs for future modelling purposes.
Before and After: Comparison of Legacy and Harmonized TCGA Genomic Data Commons’ Data
We present a systematic analysis of the effects of synchronizing a large-scale, deeply characterized, multi-omic dataset to the current human reference genome, using updated software, pipelines, and annotations. For each of 5 molecular data platforms in The Cancer Genome Atlas (TCGA), namely mRNA and miRNA expression, single nucleotide variants, DNA methylation, and copy number alterations, comprehensive sample, gene, and probe-level studies were performed, towards quantifying the degree of similarity between the ‘legacy’ GRCh37 (hg19) TCGA data and its GRCh38 (hg38) version as ‘harmonized’ by the Genomic Data Commons. We offer gene lists to elucidate differences that remained after controlling for confounders, and strategies to mitigate their impact on biological interpretation. Our results demonstrate that the hg19 and hg38 TCGA datasets are very highly concordant, promote informed use of either legacy or harmonized omics data, and provide a rubric that encourages similar comparisons as new data emerge and reference data evolve.
Supplementary Table S1. from NCI Cancer Research Data Commons: Resources to Share Key Cancer Data
A list of web resources for CRDC data commons.
Minimum information reporting on bio-nano experimental literature
Studying the interactions between nanoengineered materials and biological systems plays a vital role in the development of biological applications of nanotechnology and the improvement of our fundamental understanding of the bio–nano interface. A significant barrier to progress in this multidisciplinary area is the variability of published literature with regard to characterizations performed and experimental details reported. Here, we suggest a ‘minimum information standard’ for experimental literature investigating bio–nano interactions. This standard consists of specific components to be reported, divided into three categories: material characterization, biological characterization, and details of experimental protocols. Our intention is for these proposed standards to improve reproducibility, increase quantitative comparisons of bio–nano materials, and facilitate meta-analyses and in silico modelling.