19 research outputs found

    Methods for visual mining of genomic and proteomic data atlases

    Background: As the volume, complexity and diversity of the information that scientists work with on a daily basis continue to rise, so too does the requirement for new analytic software. This software must resolve the tension between the need to allow for a high level of scientific reasoning and the requirement for an intuitive, easy-to-use tool that does not demand specialist, and often arduous, training. Information visualization provides a solution to this problem, as it allows for direct manipulation of and interaction with diverse and complex data. The challenge facing bioinformatics researchers is how to apply this knowledge to data sets that are continually growing in a field that is rapidly changing.
    Results: This paper discusses an approach to the development of visual mining tools capable of supporting the mining of massive data collections used in systems biology research, and also discusses lessons learned from providing tools for both local researchers and the wider community. Example tools were developed to enable the exploration and analysis of both proteomics- and genomics-based atlases. These atlases represent large repositories of raw and processed experiment data generated to support the identification of biomarkers through mass spectrometry (the PeptideAtlas) and the genomic characterization of cancer (The Cancer Genome Atlas). Specifically, the tools are designed to allow for the visual mining of thousands of mass spectrometry experiments, to assist in designing informed targeted protein assays, and the interactive analysis of hundreds of genomes, to explore the variations across different cancer genomes and cancer types.
    Conclusions: The mining of massive repositories of biological data requires the development of new tools and techniques. Visual exploration of the large-scale atlas data sets allows researchers to mine data to find new meaning and make sense of it at scales from single samples to entire populations. Providing linked, task-specific views that allow a user to start from points of interest (from diseases to single genes) enables targeted exploration of thousands of spectra and genomes. As the composition of the atlases changes, and our understanding of the biology increases, new tasks will continually arise. It is therefore important to provide the means to make the data available in a suitable manner in as short a time as possible. We have done this through the use of common visualization workflows, into which we rapidly deploy visual tools. These visualizations follow common metaphors where possible to assist users in understanding the displayed data. Rapid development of tools and task-specific views allows researchers to mine large-scale data almost as quickly as it is produced. Ultimately these visual tools enable new inferences, new analyses and further refinement of the large-scale data being provided in atlases such as PeptideAtlas and The Cancer Genome Atlas.

    The Somatic Genomic Landscape of Glioblastoma

    We describe the landscape of somatic genomic alterations based on multi-dimensional and comprehensive characterization of more than 500 glioblastoma tumors (GBMs). We identify several novel mutated genes as well as complex rearrangements of signature receptors including EGFR and PDGFRA. TERT promoter mutations are shown to correlate with elevated mRNA expression, supporting a role in telomerase reactivation. Correlative analyses confirm that the survival advantage of the proneural subtype is conferred by the G-CIMP phenotype, and MGMT DNA methylation may be a predictive biomarker for treatment response only in classical subtype GBM. Integrative analysis of genomic and proteomic profiles challenges the notion of therapeutic inhibition of a pathway as an alternative to inhibition of the target itself. These data will facilitate the discovery of therapeutic and diagnostic target candidates, the validation of research and clinical observations and the generation of unanticipated hypotheses that can advance our molecular understanding of this lethal cancer.

    Multiplatform Analysis of 12 Cancer Types Reveals Molecular Classification within and across Tissues of Origin

    Recent genomic analyses of pathologically-defined tumor types identify “within-a-tissue” disease subtypes. However, the extent to which genomic signatures are shared across tissues is still unclear. We performed an integrative analysis using five genome-wide platforms and one proteomic platform on 3,527 specimens from 12 cancer types, revealing a unified classification into 11 major subtypes. Five subtypes were nearly identical to their tissue-of-origin counterparts, but several distinct cancer types were found to converge into common subtypes. Lung squamous, head & neck, and a subset of bladder cancers coalesced into one subtype typified by TP53 alterations, TP63 amplifications, and high expression of immune and proliferation pathway genes. Of note, bladder cancers split into three pan-cancer subtypes. The multi-platform classification, while correlated with tissue-of-origin, provides independent information for predicting clinical outcomes. All datasets are available for data-mining from a unified resource to support further biological discoveries and insights into novel therapeutic strategies.

    CloudForest: A Scalable and Efficient Random Forest Implementation for Biological Data

    Comparison between CloudForest and scikit-learn in terms of prediction performance (a) and training time (b) for a TCGA dataset with varying numbers of missing values (x-axis). For scikit-learn, missing values are imputed before RF analysis, whereas CloudForest natively handles missing values without imputation. The time necessary for imputation for scikit-learn is not included in the training times depicted.

    CloudForest: A Scalable and Efficient Random Forest Implementation for Biological Data

    Comparison between CloudForest and other RF implementations in terms of prediction performance (a) and training time (b). The RFs consisted of 500 trees and were trained using the same standard parameter settings for all implementations.

    The Condition-Dependent Transcriptional Landscape of Burkholderia pseudomallei

    Burkholderia pseudomallei (Bp), the causative agent of the often-deadly infectious disease melioidosis, contains one of the largest prokaryotic genomes sequenced to date, at 7.2 Mb with two large circular chromosomes (1 and 2). To comprehensively delineate the Bp transcriptome, we integrated whole-genome tiling array expression data of Bp exposed to >80 diverse physical, chemical, and biological conditions. Our results provide direct experimental support for the strand-specific expression of 5,467 Sanger protein-coding genes, 1,041 operons, and 766 non-coding RNAs. A large proportion of these transcripts displayed condition-dependent expression, consistent with them playing functional roles. The two Bp chromosomes exhibited dramatically different transcriptional landscapes: Chr 1 genes were highly and constitutively expressed, while Chr 2 genes exhibited mosaic expression where distinct subsets were expressed in a strongly condition-dependent manner. We identified dozens of cis-regulatory motifs associated with specific condition-dependent expression programs, and used the condition compendium to elucidate key biological processes associated with two complex pathogen phenotypes: quorum sensing and in vivo infection. Our results demonstrate the utility of a Bp condition-compendium as a community resource for biological discovery. Moreover, the observation that significant portions of the Bp virulence machinery can be activated by specific in vitro cues provides insights into Bp's capacity as an “accidental pathogen”, where genetic pathways used by the bacterium to survive in environmental niches may have also facilitated its ability to colonize human hosts.