3,967 research outputs found

    Cancer gene prioritization by integrative analysis of mRNA expression and DNA copy number data: a comparative review

    Get PDF
    A variety of genome-wide profiling techniques are available to probe complementary aspects of genome structure and function. Integrative analysis of heterogeneous data sources can reveal higher-level interactions that cannot be detected based on individual observations. A standard integration task in cancer studies is to identify altered genomic regions that induce changes in the expression of the associated genes based on joint analysis of genome-wide gene expression and copy number profiling measurements. In this review, we provide a comparison among various modeling procedures for integrating genome-wide profiling data of gene copy number and transcriptional alterations and highlight common approaches to genomic data integration. A transparent benchmarking procedure is introduced to quantitatively compare the cancer gene prioritization performance of the alternative methods. The benchmarking algorithms and data sets are available at http://intcomp.r-forge.r-project.orgComment: PDF file including supplementary material. 9 pages. Preprin

    Sparse integrative clustering of multiple omics data sets

    Get PDF
    High resolution microarrays and second-generation sequencing platforms are powerful tools to investigate genome-wide alterations in DNA copy number, methylation and gene expression associated with a disease. An integrated genomic profiling approach measures multiple omics data types simultaneously in the same set of biological samples. Such approach renders an integrated data resolution that would not be available with any single data type. In this study, we use penalized latent variable regression methods for joint modeling of multiple omics data types to identify common latent variables that can be used to cluster patient samples into biologically and clinically relevant disease subtypes. We consider lasso [J. Roy. Statist. Soc. Ser. B 58 (1996) 267-288], elastic net [J. R. Stat. Soc. Ser. B Stat. Methodol. 67 (2005) 301-320] and fused lasso [J. R. Stat. Soc. Ser. B Stat. Methodol. 67 (2005) 91-108] methods to induce sparsity in the coefficient vectors, revealing important genomic features that have significant contributions to the latent variables. An iterative ridge regression is used to compute the sparse coefficient vectors. In model selection, a uniform design [Monographs on Statistics and Applied Probability (1994) Chapman & Hall] is used to seek "experimental" points that scattered uniformly across the search domain for efficient sampling of tuning parameter combinations. We compared our method to sparse singular value decomposition (SVD) and penalized Gaussian mixture model (GMM) using both real and simulated data sets. The proposed method is applied to integrate genomic, epigenomic and transcriptomic data for subtype analysis in breast and lung cancer data sets.Comment: Published in at http://dx.doi.org/10.1214/12-AOAS578 the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Unconventional machine learning of genome-wide human cancer data

    Full text link
    Recent advances in high-throughput genomic technologies coupled with exponential increases in computer processing and memory have allowed us to interrogate the complex aberrant molecular underpinnings of human disease from a genome-wide perspective. While the deluge of genomic information is expected to increase, a bottleneck in conventional high-performance computing is rapidly approaching. Inspired in part by recent advances in physical quantum processors, we evaluated several unconventional machine learning (ML) strategies on actual human tumor data. Here we show for the first time the efficacy of multiple annealing-based ML algorithms for classification of high-dimensional, multi-omics human cancer data from the Cancer Genome Atlas. To assess algorithm performance, we compared these classifiers to a variety of standard ML methods. Our results indicate the feasibility of using annealing-based ML to provide competitive classification of human cancer types and associated molecular subtypes and superior performance with smaller training datasets, thus providing compelling empirical evidence for the potential future application of unconventional computing architectures in the biomedical sciences

    Cancer diagnosis using deep learning: A bibliographic review

    Get PDF
    In this paper, we first describe the basics of the field of cancer diagnosis, which includes steps of cancer diagnosis followed by the typical classification methods used by doctors, providing a historical idea of cancer classification techniques to the readers. These methods include Asymmetry, Border, Color and Diameter (ABCD) method, seven-point detection method, Menzies method, and pattern analysis. They are used regularly by doctors for cancer diagnosis, although they are not considered very efficient for obtaining better performance. Moreover, considering all types of audience, the basic evaluation criteria are also discussed. The criteria include the receiver operating characteristic curve (ROC curve), Area under the ROC curve (AUC), F1 score, accuracy, specificity, sensitivity, precision, dice-coefficient, average accuracy, and Jaccard index. Previously used methods are considered inefficient, asking for better and smarter methods for cancer diagnosis. Artificial intelligence and cancer diagnosis are gaining attention as a way to define better diagnostic tools. In particular, deep neural networks can be successfully used for intelligent image analysis. The basic framework of how this machine learning works on medical imaging is provided in this study, i.e., pre-processing, image segmentation and post-processing. The second part of this manuscript describes the different deep learning techniques, such as convolutional neural networks (CNNs), generative adversarial models (GANs), deep autoencoders (DANs), restricted Boltzmann’s machine (RBM), stacked autoencoders (SAE), convolutional autoencoders (CAE), recurrent neural networks (RNNs), long short-term memory (LTSM), multi-scale convolutional neural network (M-CNN), multi-instance learning convolutional neural network (MIL-CNN). For each technique, we provide Python codes, to allow interested readers to experiment with the cited algorithms on their own diagnostic problems. The third part of this manuscript compiles the successfully applied deep learning models for different types of cancers. Considering the length of the manuscript, we restrict ourselves to the discussion of breast cancer, lung cancer, brain cancer, and skin cancer. The purpose of this bibliographic review is to provide researchers opting to work in implementing deep learning and artificial neural networks for cancer diagnosis a knowledge from scratch of the state-of-the-art achievements

    Network inference and community detection, based on covariance matrices, correlations and test statistics from arbitrary distributions

    Get PDF
    In this paper we propose methodology for inference of binary-valued adjacency matrices from various measures of the strength of association between pairs of network nodes, or more generally pairs of variables. This strength of association can be quantified by sample covariance and correlation matrices, and more generally by test-statistics and hypothesis test p-values from arbitrary distributions. Community detection methods such as block modelling typically require binary-valued adjacency matrices as a starting point. Hence, a main motivation for the methodology we propose is to obtain binary-valued adjacency matrices from such pairwise measures of strength of association between variables. The proposed methodology is applicable to large high-dimensional data-sets and is based on computationally efficient algorithms. We illustrate its utility in a range of contexts and data-sets

    A global transcriptional network connecting noncoding mutations to changes in tumor gene expression.

    Get PDF
    Although cancer genomes are replete with noncoding mutations, the effects of these mutations remain poorly characterized. Here we perform an integrative analysis of 930 tumor whole genomes and matched transcriptomes, identifying a network of 193 noncoding loci in which mutations disrupt target gene expression. These 'somatic eQTLs' (expression quantitative trait loci) are frequently mutated in specific cancer tissues, and the majority can be validated in an independent cohort of 3,382 tumors. Among these, we find that the effects of noncoding mutations on DAAM1, MTG2 and HYI transcription are recapitulated in multiple cancer cell lines and that increasing DAAM1 expression leads to invasive cell migration. Collectively, the noncoding loci converge on a set of core pathways, permitting a classification of tumors into pathway-based subtypes. The somatic eQTL network is disrupted in 88% of tumors, suggesting widespread impact of noncoding mutations in cancer
    • …
    corecore