
    Prediction with Dimension Reduction of Multiple Molecular Data Sources for Patient Survival

    Predictive modeling from high-dimensional genomic data is often preceded by a dimension reduction step, such as principal components analysis (PCA). However, the application of PCA is not straightforward for multi-source data, wherein multiple sources of 'omics data measure different but related biological components. In this article we utilize recent advances in the dimension reduction of multi-source data for predictive modeling. In particular, we apply exploratory results from Joint and Individual Variation Explained (JIVE), an extension of PCA for multi-source data, to the prediction of differing response types. We conduct simulations to illustrate the practical advantages and interpretability of our approach. As an application example, we consider predicting survival for Glioblastoma Multiforme (GBM) patients from three data sources measuring mRNA expression, miRNA expression, and DNA methylation. We also introduce a method to estimate JIVE scores for new samples that were not used in the initial dimension reduction, and study its theoretical properties; this method is implemented in the R package R.JIVE on CRAN, in the function 'jive.predict'.
    Comment: 11 pages, 9 figures
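    The "reduce, then predict" pipeline, including scoring samples that were not used to fit the reduction, can be sketched for the single-source PCA case that JIVE generalizes. This is a minimal sketch under stated assumptions: it does not implement JIVE's joint/individual decomposition or reproduce 'jive.predict', the helper names are hypothetical, the data are random numbers, and the response is continuous (a survival outcome would need a Cox-type model instead of least squares).

```python
import numpy as np

def fit_pca_loadings(X: np.ndarray, r: int):
    """Column means and top-r principal loadings (features x r) of X (samples x features)."""
    mu = X.mean(axis=0)
    # Rows of Vt are the principal directions of the centred data.
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:r].T

def scores_for_new_samples(X_new: np.ndarray, mu: np.ndarray, V: np.ndarray) -> np.ndarray:
    """Least-squares scores for samples not used to fit the loadings (hypothetical helper)."""
    # Solve V @ s ~= x - mu for each new sample; with orthonormal V this is a projection.
    S, *_ = np.linalg.lstsq(V, (X_new - mu).T, rcond=None)
    return S.T

def fit_linear_predictor(S: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Ordinary least squares of y on the scores, with an intercept."""
    A = np.column_stack([np.ones(len(S)), S])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

def predict(S: np.ndarray, beta: np.ndarray) -> np.ndarray:
    """Linear prediction from scores using the fitted coefficients."""
    return np.column_stack([np.ones(len(S)), S]) @ beta

# Toy data (random numbers, not genomic measurements).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 200))
y_train = X_train[:, :5].sum(axis=1)
mu, V = fit_pca_loadings(X_train, r=3)
beta = fit_linear_predictor(scores_for_new_samples(X_train, mu, V), y_train)
X_test = rng.normal(size=(10, 200))
y_hat = predict(scores_for_new_samples(X_test, mu, V), beta)
```

    The loadings are fixed once from the training data; new samples only receive scores, so the predictor fitted on training scores can be applied to them unchanged.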

    Unconventional machine learning of genome-wide human cancer data

    Recent advances in high-throughput genomic technologies, coupled with exponential increases in computer processing and memory, have allowed us to interrogate the complex aberrant molecular underpinnings of human disease from a genome-wide perspective. While the deluge of genomic information is expected to increase, a bottleneck in conventional high-performance computing is rapidly approaching. Inspired in part by recent advances in physical quantum processors, we evaluated several unconventional machine learning (ML) strategies on actual human tumor data. Here we show for the first time the efficacy of multiple annealing-based ML algorithms for classification of high-dimensional, multi-omics human cancer data from the Cancer Genome Atlas. To assess algorithm performance, we compared these classifiers to a variety of standard ML methods. Our results indicate the feasibility of using annealing-based ML to provide competitive classification of human cancer types and associated molecular subtypes, and superior performance with smaller training datasets, thus providing compelling empirical evidence for the potential future application of unconventional computing architectures in the biomedical sciences.
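    The abstract does not state the annealing formulation. One common way to pose classification for an annealer is to select a subset of weak classifiers by minimizing a quadratic binary objective; the sketch below assumes that QBoost-style formulation and solves it with classical simulated annealing (single-bit flips, Metropolis acceptance, geometric cooling), not with quantum hardware. All names, the objective, and the toy data are illustrative assumptions.

```python
import math
import random

def qubo_energy(w, H, y, lam):
    """Squared loss of the averaged weak-classifier vote plus an L0 penalty (assumed objective)."""
    K = len(w)
    total = 0.0
    for h_row, yi in zip(H, y):
        margin = sum(wj * hj for wj, hj in zip(w, h_row)) / K
        total += (margin - yi) ** 2
    return total + lam * sum(w)

def simulated_annealing(H, y, lam=0.01, steps=2000, t0=1.0, t_end=1e-3, seed=0):
    """Search binary weights w in {0,1}^K that minimize qubo_energy."""
    rng = random.Random(seed)
    K = len(H[0])
    w = [rng.randint(0, 1) for _ in range(K)]
    e = qubo_energy(w, H, y, lam)
    best_w, best_e = w[:], e
    for step in range(steps):
        t = t0 * (t_end / t0) ** (step / max(steps - 1, 1))  # geometric cooling
        j = rng.randrange(K)
        w[j] ^= 1  # propose flipping one weak classifier in or out
        e_new = qubo_energy(w, H, y, lam)
        if e_new <= e or rng.random() < math.exp((e - e_new) / t):
            e = e_new
            if e < best_e:
                best_w, best_e = w[:], e
        else:
            w[j] ^= 1  # reject: undo the flip
    return best_w, best_e

def classify(w, h_row):
    """Strong classifier: sign of the selected weak classifiers' vote."""
    return 1 if sum(wj * hj for wj, hj in zip(w, h_row)) >= 0 else -1

# Toy weak classifiers: one correct, one inverted, one constant.
y = [1, -1, 1, -1, 1, -1, 1, -1]
H = [[yi, -yi, 1] for yi in y]
best_w, best_e = simulated_annealing(H, y)
```

    On this toy problem the only local minimum keeps the correct weak classifier and drops the other two, so the annealer should settle on `[1, 0, 0]`.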

    Posterior Contraction Rates of the Phylogenetic Indian Buffet Processes

    By expressing prior distributions as general stochastic processes, nonparametric Bayesian methods provide a flexible way to incorporate prior knowledge and constrain the latent structure in statistical inference. The Indian buffet process (IBP) is one such example: it defines a prior distribution over binary feature matrices with infinitely many features, assuming that subjects are exchangeable. The phylogenetic Indian buffet process (pIBP), a variant of the IBP, models non-exchangeability among subjects by describing their relationships through a stochastic process on a rooted tree, similar to those used in phylogenetics. In this paper, we study the theoretical properties of the IBP and pIBP under a binary factor model. We establish posterior contraction rates for both the IBP and pIBP and substantiate the theoretical results through simulation studies. This is the first work to address the frequentist properties of the posterior behavior of the IBP and pIBP. We also demonstrate its practical usefulness by applying the pIBP prior to a real-data example from cancer genomics, where exchangeability among subjects is violated.
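    To make the exchangeable IBP prior concrete, the sketch below draws a binary feature matrix with the standard sequential ("customers and dishes") construction of the one-parameter IBP: customer i takes each existing dish k with probability m_k / i, then tries Poisson(alpha / i) new dishes. It is a minimal illustration only; it does not implement the pIBP's tree-based dependence or the binary factor model, and the function name is hypothetical.

```python
import numpy as np

def sample_ibp(n_customers: int, alpha: float, rng: np.random.Generator) -> np.ndarray:
    """Draw a binary feature matrix Z (customers x dishes) from the one-parameter IBP."""
    counts = []  # counts[k] = m_k, number of earlier customers who took dish k
    rows = []
    for i in range(1, n_customers + 1):
        row = []
        # Existing dish k is taken with probability m_k / i.
        for k in range(len(counts)):
            take = int(rng.random() < counts[k] / i)
            row.append(take)
            counts[k] += take
        # Then Poisson(alpha / i) brand-new dishes are taken.
        n_new = int(rng.poisson(alpha / i))
        row.extend([1] * n_new)
        counts.extend([1] * n_new)
        rows.append(row)
    Z = np.zeros((n_customers, len(counts)), dtype=int)
    for i, row in enumerate(rows):
        Z[i, : len(row)] = row  # dishes introduced later are 0 for earlier customers
    return Z

rng = np.random.default_rng(0)
Z = sample_ibp(10, alpha=3.0, rng=rng)
```

    Because every customer is treated the same way up to their arrival order, the induced distribution over equivalence classes of Z is exchangeable; the pIBP replaces this independence across subjects with dependence along a tree.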

    Applicability of in vivo staging of regional amyloid burden in a cognitively normal cohort with subjective memory complaints: the INSIGHT-preAD study.

    BACKGROUND: Current methods of amyloid PET interpretation based on the binary classification of global amyloid signal fail to identify early phases of amyloid deposition. A recent analysis of 18F-florbetapir PET data from the Alzheimer's Disease Neuroimaging Initiative cohort suggested a hierarchical four-stage model of regional amyloid deposition that resembles neuropathologic estimates and can be used to stage an individual's amyloid burden in vivo. Here, we evaluated the validity of this in vivo amyloid staging model in an independent cohort of older people with subjective memory complaints (SMC). We further examined its potential association with subtle cognitive impairments in this population at elevated risk for Alzheimer's disease (AD).
    METHODS: The monocentric INSIGHT-preAD cohort includes 318 cognitively intact older individuals with SMC. All individuals underwent 18F-florbetapir PET scanning and extensive neuropsychological testing. We projected the regional amyloid uptake signal onto the previously proposed hierarchical staging model of in vivo amyloid progression. We determined adherence to this model across all cases and tested the association between increasing in vivo amyloid stage and cognitive performance using ANCOVA models.
    RESULTS: In total, 156 participants (49%) showed evidence of regional amyloid deposition, and all but 2 of these (99%) adhered to the hierarchical regional pattern implied by the in vivo amyloid progression model. According to a conventional binary classification based on global signal (SUVR_Cereb = 1.10), individuals in stages III and IV were classified as amyloid-positive (except one in stage III), but 99% of individuals in stage I and even 28% of individuals in stage II were classified as amyloid-negative. Neither in vivo amyloid stage nor conventional binary amyloid status was significantly associated with cognitive performance in this preclinical cohort.
    CONCLUSIONS: The proposed hierarchical staging scheme of PET-evidenced amyloid deposition generalizes well to data from an independent cohort of older people at elevated risk for AD. Future studies will determine the prognostic value of the staging approach for predicting longitudinal cognitive decline in older individuals at increased risk for AD.
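    The adherence check described in the methods can be read as a prefix rule: a scan fits the hierarchical model only if no later stage is positive while an earlier one is negative, and the stage is then the number of positive stages. The sketch below assumes per-stage positivity is already decided (for example by thresholding each stage's regional SUVR) and contrasts it with the global binary read-out; the per-stage thresholds, the ">=" direction of the global cut-off, and all function names are hypothetical.

```python
from typing import Optional, Sequence

GLOBAL_CUTOFF = 1.10  # SUVR_Cereb cut-off quoted in the abstract

def in_vivo_stage(stage_positive: Sequence[bool]) -> Optional[int]:
    """Stage 0-4 if positivity follows the order I -> II -> III -> IV, else None.

    stage_positive[k] says whether the regions defining stage k+1 are amyloid-positive.
    """
    n_pos = 0
    for positive in stage_positive:
        if not positive:
            break
        n_pos += 1
    # Any positive stage after the first negative one violates the hierarchy.
    if any(stage_positive[n_pos:]):
        return None
    return n_pos

def stage_from_suvr(stage_suvr: Sequence[float], thresholds: Sequence[float]) -> Optional[int]:
    """Hypothetical helper: threshold the mean SUVR of each stage's region set."""
    return in_vivo_stage([s > t for s, t in zip(stage_suvr, thresholds)])

def global_status(global_suvr: float, cutoff: float = GLOBAL_CUTOFF) -> str:
    """Conventional binary read-out from the global signal (>= is an assumption)."""
    return "amyloid-positive" if global_suvr >= cutoff else "amyloid-negative"

# A stage I scan that the global read-out would call negative.
stage = stage_from_suvr([1.25, 1.05, 1.00, 0.98], [1.2] * 4)
status = global_status(1.04)
```

    This mirrors the pattern reported in the results: a scan can carry regional evidence of early deposition (stage I) while still falling below the global cut-off.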

    Analysis of nucleosome positioning landscapes enables gene discovery in the human malaria parasite Plasmodium falciparum.

    Background: Plasmodium falciparum, the deadliest malaria-causing parasite, has an extremely AT-rich (80.7%) genome. Because of this high AT content, sequence-based annotation of genes and functional elements remains challenging. To better understand the regulatory network controlling gene expression in the parasite, a more complete genome annotation and analysis tools adapted to AT-rich genomes are needed. Recent studies of genome-wide nucleosome positioning in eukaryotes have shown that nucleosome landscapes exhibit regular, characteristic patterns at the 5' and 3' ends of protein-coding and non-coding genes. In addition, nucleosome-depleted regions can be found near transcription start sites. These distinctive nucleosome landscape patterns may be exploited to identify novel genes. In this paper, we propose a computational approach to discover novel putative genes based exclusively on nucleosome positioning data in the AT-rich genome of P. falciparum.
    Results: Using binary classifiers trained on nucleosome landscapes at gene boundaries from two independent nucleosome positioning data sets, we detected a total of 231 regions containing putative genes in the genome of Plasmodium falciparum, of which 67 high-confidence genes were found in both data sets. Eighty-eight of these 231 newly predicted genes exhibited transcription signal in RNA-Seq data, indicative of active transcription. In addition, 20 of 21 selected gene candidates were further validated by RT-PCR, and 28 of the 231 genes showed significant matches using BLASTN against an expressed sequence tag (EST) database. Furthermore, 108 (47%) of the 231 putative novel genes overlapped with previously identified but unannotated long non-coding RNAs. Collectively, these results provide experimental validation for 163 predicted genes (70.6%). Finally, 73 of the 231 genes were found to be potentially translated, based on their signal in polysome-associated RNA-Seq, which captures transcripts that are actively being translated.
    Conclusion: Our results clearly indicate that nucleosome positioning data contain sufficient information for novel gene discovery. As distinct nucleosome landscapes around genes are found in many other eukaryotic organisms, this methodology could be used to characterize the transcriptome of any organism, especially when coupled with other DNA-based gene-finding and experimental methods (e.g., RNA-Seq).
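    The core idea, a binary classifier that recognizes gene boundaries from the shape of the nucleosome occupancy profile alone, can be sketched with a swapped-in model. The abstract does not name the classifier type, so this sketch assumes plain logistic regression fitted by gradient descent, a hypothetical 41-bin window, and synthetic profiles in which "boundary" windows carry a nucleosome-depleted dip at the centre. None of this is the authors' pipeline or data.

```python
import numpy as np

WIDTH = 41  # hypothetical window (bins) centred on a candidate gene boundary

def make_profiles(n: int, depleted: bool, rng: np.random.Generator) -> np.ndarray:
    """Synthetic nucleosome occupancy profiles (illustration only, not real data)."""
    x = np.arange(WIDTH) - WIDTH // 2
    profiles = 1.0 + 0.1 * rng.standard_normal((n, WIDTH))
    if depleted:
        # Nucleosome-depleted region: a dip in occupancy at the boundary.
        profiles -= 0.8 * np.exp(-(x / 5.0) ** 2)
    return profiles

def add_bias(X: np.ndarray) -> np.ndarray:
    """Prepend a column of ones for the intercept."""
    return np.column_stack([np.ones(len(X)), X])

def train_logistic(X: np.ndarray, y: np.ndarray, lr: float = 0.1, epochs: int = 500) -> np.ndarray:
    """Logistic regression fitted by batch gradient descent on the log-loss."""
    Xb = add_bias(X)
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(Xb @ w)))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def predict(X: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Label 1 (boundary-like profile) when the linear score is positive."""
    return (add_bias(X) @ w > 0).astype(int)

rng = np.random.default_rng(1)
X_train = np.vstack([make_profiles(100, True, rng), make_profiles(100, False, rng)])
y_train = np.array([1] * 100 + [0] * 100)
w = train_logistic(X_train, y_train)
X_test = np.vstack([make_profiles(50, True, rng), make_profiles(50, False, rng)])
y_test = np.array([1] * 50 + [0] * 50)
accuracy = (predict(X_test, w) == y_test).mean()
```

    The learned weights near the window centre become negative, i.e. lower occupancy at the candidate boundary pushes the prediction toward "gene boundary", which is the signal the abstract proposes to exploit.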