528 research outputs found

    Machine Learning and Integrative Analysis of Biomedical Big Data.

    Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) are analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues.

    Unsupervised Algorithms for Microarray Sample Stratification

    The amount of data made available by microarrays gives researchers the opportunity to delve into the complexity of biological systems. However, the noisy and extremely high-dimensional nature of this kind of data poses significant challenges. Microarrays allow for the parallel measurement of thousands of molecular objects spanning different layers of interactions. To discover hidden patterns, a wide variety of analytical techniques have been proposed. Here, we describe the basic methodologies for analyzing microarray datasets, focusing on the task of (sub)group discovery.

    Network-based stratification of tumor mutations.

    Many forms of cancer have multiple subtypes with different causes and clinical outcomes. Somatic tumor genome sequences provide a rich new source of data for uncovering these subtypes but have proven difficult to compare, as two tumors rarely share the same mutations. Here we introduce network-based stratification (NBS), a method to integrate somatic tumor genomes with gene networks. This approach allows for stratification of cancer into informative subtypes by clustering together patients with mutations in similar network regions. We demonstrate NBS in ovarian, uterine and lung cancer cohorts from The Cancer Genome Atlas. For each tissue, NBS identifies subtypes that are predictive of clinical outcomes such as patient survival, response to therapy or tumor histology. We identify network regions characteristic of each subtype and show how mutation-derived subtypes can be used to train an mRNA expression signature, which provides similar information in the absence of DNA sequence.
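
The idea of grouping patients whose mutations fall in similar network regions, even when they share no mutated gene, can be illustrated with a small sketch. It is not the authors' implementation: it assumes a random-walk-with-restart style smoothing step, a toy six-gene network, and cosine similarity as the comparison. The function names, the `alpha` value and the network are hypothetical, and the clustering step itself is left out.

```python
import math

def normalize_rows(adj):
    """Row-normalize an adjacency matrix so each non-empty row sums to 1."""
    out = []
    for row in adj:
        total = sum(row)
        out.append([v / total if total else 0.0 for v in row])
    return out

def propagate(profile, adj_norm, alpha=0.7, n_iter=50):
    """Smooth a mutation profile over the network.

    Iterates f <- alpha * (f A) + (1 - alpha) * f0, where f0 is the
    original binary profile. alpha and n_iter are assumed values.
    """
    f0 = list(profile)
    f = list(profile)
    n = len(f0)
    for _ in range(n_iter):
        spread = [sum(f[i] * adj_norm[i][j] for i in range(n)) for j in range(n)]
        f = [alpha * spread[j] + (1 - alpha) * f0[j] for j in range(n)]
    return f

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# Hypothetical toy network: genes 0-2 form one module, genes 3-5 another,
# with no edges between the two modules.
ADJ = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
ADJ_NORM = normalize_rows(ADJ)

# Each patient has a single mutated gene; no two patients share a mutation.
patient_a = [1, 0, 0, 0, 0, 0]
patient_b = [0, 1, 0, 0, 0, 0]
patient_c = [0, 0, 0, 1, 0, 0]

smooth_a = propagate(patient_a, ADJ_NORM)
smooth_b = propagate(patient_b, ADJ_NORM)
smooth_c = propagate(patient_c, ADJ_NORM)

raw_ab = cosine(patient_a, patient_b)        # no shared mutation
smoothed_ab = cosine(smooth_a, smooth_b)     # same module
smoothed_ac = cosine(smooth_a, smooth_c)     # different modules
```

In this toy setting, patients A and B are indistinguishable from strangers before smoothing but become similar afterwards because their mutations sit in the same module, while patient C stays dissimilar. A clustering method could then be run on the smoothed profiles.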

    BayesCCE: a Bayesian framework for estimating cell-type composition from DNA methylation without the need for methylation reference.

    We introduce a Bayesian semi-supervised method for estimating cell counts from DNA methylation by leveraging easily obtainable prior knowledge on the cell-type composition distribution of the studied tissue. We show mathematically and empirically that alternative methods which attempt to infer cell counts without a methylation reference only capture linear combinations of cell counts rather than provide one component per cell type. Our approach allows the construction of components such that each component corresponds to a single cell type, and provides a new opportunity to investigate cell compositions in genomic studies of tissues for which it was not possible before.

    Integrative methods for analyzing big data in precision medicine

    We provide an overview of recent developments in big data analyses in the context of precision medicine and health informatics. With the advance in technologies capturing molecular and medical data, we have entered the era of “Big Data” in biology and medicine. These data offer many opportunities to advance precision medicine. We outline key challenges in precision medicine and present recent advances in data integration-based methods to uncover personalized information from big data produced by various omics studies. We survey recent integrative methods for disease subtyping, biomarker discovery, and drug repurposing, and list the tools that are available to domain scientists. Given the ever-growing nature of these big data, we highlight key issues that big data integration methods will face.

    Wavelet-Based Cancer Drug Recommender System

    The molecular nature of cancer underpins systematic studies of cancer genomes, which provide valuable insights and enable the development of clinical treatments. Above all, these studies are driving the clinical use of genomic information to select treatments that would otherwise not be anticipated for patients with various types of cancer, making precision medicine possible. With this in mind, in this project we combine image processing techniques, for data enhancement, with recommender systems to propose a personalized ranking of anticancer drugs. The system is implemented in Python and tested on a database of drug sensitivity records containing more than 310,000 IC50 values, which describe the response of more than 300 anticancer drugs across 987 cancer cell lines. After several preprocessing tasks, two experiments are carried out. The first uses the original DNA microarray images, and the second uses the same images after a wavelet transform. The experiments confirm that wavelet-transformed DNA microarray images improve the performance of the recommender system, optimizing the search for cancer cell lines with a profile similar to that of the new cell line. In addition, we conclude that DNA microarray images with appropriate wavelet transforms not only provide richer information for the search for similar users but also compress these images efficiently, optimizing computational resources. To the best of our knowledge, this project is innovative in its use of wavelet-transformed DNA microarray images to profile cell lines in a personalized anticancer drug recommender system.
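
The similarity search on wavelet-transformed images described in this abstract can be sketched roughly as follows. The abstract does not say which wavelet, how many decomposition levels, or which similarity measure the project uses, so this sketch assumes a one-level 2D Haar approximation band and cosine similarity. All names (`haar_ll`, `most_similar`, `line_a`, `line_b`) and the 4x4 toy images are hypothetical.

```python
import math

def haar_ll(image):
    """One-level 2D Haar approximation (LL) band.

    Each 2x2 block is replaced by (a + b + c + d) / 2, which quarters the
    number of values. The image must have even height and width.
    """
    rows, cols = len(image), len(image[0])
    return [
        [
            (image[r][c] + image[r][c + 1]
             + image[r + 1][c] + image[r + 1][c + 1]) / 2.0
            for c in range(0, cols, 2)
        ]
        for r in range(0, rows, 2)
    ]

def flatten(matrix):
    """Turn a 2D list into a flat feature vector."""
    return [v for row in matrix for v in row]

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def most_similar(query, library):
    """Name of the library image whose LL band is closest to the query's."""
    q = flatten(haar_ll(query))
    return max(library, key=lambda name: cosine(q, flatten(haar_ll(library[name]))))

# Hypothetical 4x4 "microarray images" for two known cell lines.
library = {
    "line_a": [[9, 8, 1, 0], [8, 9, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1]],
    "line_b": [[1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 9, 8], [0, 1, 8, 9]],
}

# A new cell line whose image resembles line_a, with small perturbations.
query = [[8, 9, 0, 1], [9, 7, 1, 0], [0, 1, 1, 1], [1, 0, 0, 1]]

nearest = most_similar(query, library)
compressed_size = len(flatten(haar_ll(query)))
```

Here the approximation band reduces each 16-value image to 4 values while keeping the coarse intensity pattern, which is the kind of compression the abstract refers to. In a recommender setting, the drug responses of the nearest known cell lines could then be used to rank drugs for the new one.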