    Machine Learning and Integrative Analysis of Biomedical Big Data.

    Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to perform integrative analysis of biomedical data acquired from diverse modalities effectively and efficiently. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues.
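To make two of these challenges concrete (missing data and heterogeneity of scale across omics sources), here is a minimal sketch of "early" integration, in which each omics block is preprocessed separately and the per-sample feature vectors are then concatenated. The choice of mean imputation and per-feature z-scoring, and the function names `impute_and_standardize` and `early_integrate`, are illustrative assumptions, not methods taken from the review; in practice the statistics should be fitted on training samples only to avoid leakage.

```python
from statistics import mean, pstdev

def impute_and_standardize(block):
    """Mean-impute missing values (None) and z-score each feature of one omics block.

    `block` is a list of samples; each sample is a list of floats or None.
    (Hypothetical helper for illustration.)
    """
    out = [row[:] for row in block]
    for j in range(len(block[0])):
        observed = [row[j] for row in block if row[j] is not None]
        mu = mean(observed) if observed else 0.0
        sd = pstdev(observed) if len(observed) > 1 else 0.0
        for row in out:
            value = mu if row[j] is None else row[j]
            # Constant or all-missing features carry no signal; map them to 0.
            row[j] = (value - mu) / sd if sd > 0 else 0.0
    return out

def early_integrate(*blocks):
    """Concatenate per-sample feature vectors from several preprocessed omics blocks."""
    n_samples = len(blocks[0])
    if any(len(b) != n_samples for b in blocks):
        raise ValueError("all blocks must describe the same samples in the same order")
    processed = [impute_and_standardize(b) for b in blocks]
    return [sum((p[i] for p in processed), []) for i in range(n_samples)]

genome = [[1.0, None], [3.0, 4.0]]   # 2 samples x 2 features, one value missing
proteome = [[10.0], [20.0]]          # the same 2 samples x 1 feature
print(early_integrate(genome, proteome))  # → [[-1.0, 0.0, -1.0], [1.0, 0.0, 1.0]]
```

Standardizing each block before concatenation keeps a source measured on a large numeric scale (here the proteome) from dominating distance- or gradient-based learners; more elaborate integration strategies replace these steps rather than the overall pattern.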

    Transcriptional landscape of neuronal and cancer stem cells

    A tumor mass is composed of a heterogeneous cell population that includes a subset of “cancer stem cells” (CSC). Oncogenic signals foster CSC by transforming tissue stem cells or by reprogramming progenitor/differentiated cells towards stemness. Thus, CSC share features of both cancer and stem cells (e.g., self-renewal, a hierarchical developmental program leading to differentiated cells, epithelial/mesenchymal transition), and these features are maintained by the constitutive activation of stemness-promoting signals. CSC may trigger tumor formation, drive resistance to conventional therapeutics and underlie patient relapse. Indeed, stem cell signatures have been associated with poor prognosis in various cancers. This background makes it essential to identify the molecular features of CSC, both to clarify the mechanisms that sustain their survival and to design novel CSC-specific therapeutic strategies. Medulloblastoma (MB) is the most common malignant brain tumor in childhood and a leading cause of cancer-related morbidity and mortality. Current multimodal therapies are effective in about 50% of patients but often cause long-term side effects, including developmental, neurological, neuroendocrine and psychosocial deficits (Northcott PA, Nature Rev Cancer 2012). For many years, MB was treated as a single tumor entity despite its divergent histology, patient outcomes and drug sensitivity, and despite the diversity of its stem cells of origin. Recently, the picture of human MB has changed dramatically as its heterogeneous biology has been addressed by high-throughput gene expression analysis (oligonucleotide microarrays) and by powerful next-generation genomic sequencing. These studies identified four tumor subgroups (WNT, SHH, Group 3 and Group 4) and revealed highly diverse mutational spectra and gene expression profiles. 
However, a quantitative approach has not yet been applied to the transcriptional landscape of medulloblastoma stem cells (MbSC) through RNA next-generation sequencing (RNA-Seq). This is a relevant gap, since RNA-Seq can interrogate the genome-wide transcriptome, including novel transcripts, alternatively spliced isoforms and non-coding RNAs. Lower rhombic lip progenitors of the dorsal brainstem are considered the cells of origin of WNT tumors; in the SHH subgroup, the initiating cells are Prominin1+ CD15+ stem cells from the subventricular zone, which require commitment to Math1+ granule cell progenitors [GCP] of the external granule cell layer [EGL]; while Math1+ or Math1- EGL-GCP or Prominin1+/lineage-negative stem cells sustain the MYC-driven Group 3. MbSC derived from SHH tumors and postnatal normal cerebellar stem cells (NcSC) have been reported to share several features. A key signal for both is Hedgehog. Furthermore, both NcSC and MbSC display up-regulation of stemness genes (e.g., Sox2, Nestin, Nanog, Prom1). Finally, constitutive activation of the Shh pathway through conditional deletion of the inhibitory receptor Ptch1 in NcSC promotes medulloblastoma in vivo, producing a mouse model of the human SHH tumor. Acquisition of stemness features may therefore represent the first step of oncogenic conversion, although cooperation with additional oncogenic signals is needed to enhance MbSC tumorigenicity. To understand MbSC transcriptional programs, we use RNA-Seq to analyze MbSC derived from Ptch1+/- tumors (Ptch1+/- MbSC). Choosing a genetically defined MB model allowed us to study Ptch1+/- MbSC alongside the appropriate NcSC counterpart and to perform statistical analysis on biological replicates. We identify a number of transcripts (annotated genes, novel isoforms and long non-coding RNAs) that characterize MbSC and/or NcSC. 
Some of these genes control stemness or are cancer-related and are conserved in human medulloblastomas. Interestingly, a subset of them, involved in the cellular stress response, is of prognostic relevance, being significantly associated with clinical outcome. Correlating the expression of MbSC-characterizing genes with survival information from our human medulloblastoma database further demonstrates the significance of these findings. Our data suggest that the modulation of normal and cancer stem cell functions observed in vitro is effective in dissecting the transcriptional programs underlying the in vivo behavior of human medulloblastomas.

    NOVEL APPLICATIONS OF MACHINE LEARNING IN BIOINFORMATICS

    Technological advances in next-generation sequencing and biomedical imaging have led to a rapid increase in the dimensionality and acquisition rate of biomedical data, which challenges conventional data analysis strategies. Modern machine learning techniques promise to leverage large data sets to find hidden patterns and make accurate predictions. This dissertation aims to design novel machine learning-based models that transform biomedical big data into valuable biological insights. The research presented focuses on three bioinformatics domains: splice junction classification, gene regulatory network reconstruction, and lesion detection in mammograms. A critical step in defining gene structures and mRNA transcript variants is to accurately identify splice junctions. In the first work, we built the first deep learning-based splice junction classifier, DeepSplice, which outperforms state-of-the-art classification tools in both classification accuracy and computational efficiency. In the second work, to uncover transcription factors governing metabolic reprogramming in non-small-cell lung cancer patients, we developed TFmeta, a machine learning approach that reconstructs relationships between transcription factors and their target genes; it achieves the best performance on benchmark data sets. In the third work, we designed deep learning-based architectures to perform lesion detection in both 2D and 3D whole mammogram images.
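The abstract does not describe DeepSplice's input representation, so the following is only a sketch under stated assumptions: it shows how a candidate splice junction could be turned into a one-hot encoded sequence window, a common input format for convolutional networks on DNA, plus the canonical GT...AG splice-site check that serves as a simple rule-based baseline. The flank sequences, their lengths, and the names `one_hot`, `is_canonical_intron` and `junction_input` are hypothetical.

```python
BASES = "ACGT"

def one_hot(seq):
    """One-hot encode a DNA string as rows of [A, C, G, T]; ambiguous bases such as N become all zeros."""
    return [[1 if base == b else 0 for b in BASES] for base in seq.upper()]

def is_canonical_intron(intron):
    """True if the intron uses the canonical GT...AG splice-site dinucleotides."""
    intron = intron.upper()
    return len(intron) >= 4 and intron.startswith("GT") and intron.endswith("AG")

def junction_input(donor_flank, acceptor_flank):
    """Hypothetical classifier input: donor-side flank followed by acceptor-side flank, one-hot encoded."""
    return one_hot(donor_flank + acceptor_flank)

x = junction_input("CAGGTAAG", "TTTCAGGT")
print(len(x), len(x[0]))                     # 16 4
print(is_canonical_intron("GTAAGTCTTTCAG"))  # True
```

A learned classifier is useful precisely because the GT...AG rule alone cannot separate true junctions from the many sequence positions that carry these dinucleotides by chance; the one-hot matrix lets a model pick up the wider sequence context around each site.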