1,639 research outputs found

    Visualization and analysis of cancer genome sequencing studies

    Large-scale genomics projects such as The Cancer Genome Atlas (TCGA) and the Encyclopedia of DNA Elements (ENCODE) generate data at an unprecedented scale, requiring new computational techniques for analysis and interpretation. In the three studies I present in this thesis, I use these data sources to derive biological insights or to create visualization tools that enable others to obtain insights more easily. First, I examine the distribution of the lengths of copy number variations (CNVs) in the cancer genome. This analysis shows that a small number of genes are altered at a greater frequency than expected from a power law distribution, suggesting that a large number of genomes must be sequenced for a given tumor type to achieve a comprehensive discovery of somatic mutations. Second, I investigate germline CNVs in thousands of TCGA samples using single nucleotide polymorphism (SNP) array data to find variants that may confer increased susceptibility to cancer. This CNV-based genome-wide association study identified many germline CNVs that potentially increase risk in brain, breast, colorectal, renal, or ovarian cancers. Finally, I apply several visualization techniques to create tools for the TCGA and ENCODE projects in order to help investigators better process and synthesize meaning from large volumes of data. Seqeyes combines linear and circular genomic views to explore predicted structural variations and help guide experimental validation. The modEncode browser visualizes chromatin organization by integrating data from a multitude of histone marks and chromosomal proteins. These results present visualization as a useful strategy for rapid identification of salient genomic features from large, heterogeneous genomic datasets.

    Machine Learning and Integrative Analysis of Biomedical Big Data.

    Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) are analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance, and scalability issues.

    Computational pan-genomics: status, promises and challenges

    Many disciplines, from human genetics and oncology to plant breeding, microbiology and virology, commonly face the challenge of analyzing rapidly increasing numbers of genomes. In the case of Homo sapiens, the number of sequenced genomes will approach hundreds of thousands in the next few years. Simply scaling up established bioinformatics pipelines will not be sufficient for leveraging the full potential of such rich genomic data sets. Instead, novel, qualitatively different computational methods and paradigms are needed. We will witness the rapid extension of computational pan-genomics, a new sub-area of research in computational biology. In this article, we generalize existing definitions and understand a pan-genome as any collection of genomic sequences to be analyzed jointly or to be used as a reference. We examine already available approaches to construct and use pan-genomes, discuss the potential benefits of future technologies and methodologies, and review open challenges from the vantage point of the above-mentioned biological disciplines. As a prominent example of a computational paradigm shift, we particularly highlight the transition from the representation of reference genomes as strings to representations as graphs. We outline how this and other challenges from different application domains translate into common computational problems, point out relevant bioinformatics techniques, and identify open problems in computer science. With this review, we aim to increase awareness that a joint approach to computational pan-genomics can help address many of the problems currently faced in various domains.

    PANTHER: Pathway Augmented Nonnegative Tensor factorization for HighER-order feature learning

    Genetic pathways usually encode molecular mechanisms that can inform targeted interventions. It is often challenging for existing machine learning approaches to jointly model genetic pathways (higher-order features) and variants (atomic features), and to present interpretable models to clinicians. In order to build more accurate and more interpretable machine learning models for genetic medicine, we introduce Pathway Augmented Nonnegative Tensor factorization for HighER-order feature learning (PANTHER). PANTHER selects informative genetic pathways that directly encode molecular mechanisms. We apply genetically motivated constrained tensor factorization to group pathways in a way that reflects molecular mechanism interactions. We then train a softmax classifier for disease types using the identified pathway groups. We evaluated PANTHER against multiple state-of-the-art constrained tensor/matrix factorization models, as well as group guided and Bayesian hierarchical models. PANTHER outperforms all state-of-the-art comparison models significantly (p<0.05). Our experiments on large-scale Next Generation Sequencing (NGS) and whole-genome genotyping datasets also demonstrated the wide applicability of PANTHER. We performed feature analysis in predicting disease types, which suggested insights and benefits of the identified pathway groups. Comment: Accepted by the 35th AAAI Conference on Artificial Intelligence (AAAI 2021)

    Innovating Computational Biology and Intelligent Medicine: ICIBM 2019 Special Issue

    The International Association for Intelligent Biology and Medicine (IAIBM) is a nonprofit organization that promotes intelligent biology and medical science. It hosts an annual International Conference on Intelligent Biology and Medicine (ICIBM), which was established in 2012. ICIBM 2019 was held from 9 to 11 June 2019 in Columbus, Ohio, USA. Out of the 105 original research manuscripts submitted to the conference, 18 were selected for publication in a Special Issue in Genes. The selected manuscripts cover a wide range of current topics in biomedical research, including cancer informatics, transcriptomics, computational algorithms, visualization and tools, deep learning, and microbiome research. In this editorial, we briefly introduce each of the manuscripts and discuss their contribution to the advance of science and technology.

    DPVis: Visual Analytics with Hidden Markov Models for Disease Progression Pathways

    Clinical researchers use disease progression models to understand patient status and characterize progression patterns from longitudinal health records. One approach for disease progression modeling is to describe patient status using a small number of states that represent distinctive distributions over a set of observed measures. Hidden Markov models (HMMs) and their variants are a class of models that both discover these states and make inferences of health states for patients. Despite the advantages of using these algorithms for discovering interesting patterns, it remains challenging for medical experts to interpret model outputs, understand complex modeling parameters, and clinically make sense of the patterns. To tackle these problems, we conducted a design study with clinical scientists, statisticians, and visualization experts, with the goal of investigating disease progression pathways of chronic diseases, namely type 1 diabetes (T1D), Huntington's disease, Parkinson's disease, and chronic obstructive pulmonary disease (COPD). As a result, we introduce DPVis, which seamlessly integrates model parameters and outcomes of HMMs into interpretable and interactive visualizations. In this study, we demonstrate that DPVis is successful in evaluating disease progression models, visually summarizing disease states, interactively exploring disease progression patterns, and building, analyzing, and comparing clinically relevant patient subgroups. Comment: to appear in IEEE Transactions on Visualization and Computer Graphics
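
As a rough illustration of how an HMM can infer a patient's health states from a sequence of observed measures, the sketch below runs Viterbi decoding on a toy two-state model. The state names, observation labels, and every probability are hypothetical values chosen for the example; they are not taken from DPVis or its data, and the abstract does not say which inference procedure the tool uses.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state sequence for the observations.

    All probabilities must be strictly positive, because the scores are
    accumulated in log space to avoid numerical underflow.
    """
    # best[t][s]: log-probability of the best path that ends in state s at time t
    best = [{
        s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
        for s in states
    }]
    # back[t][s]: predecessor of state s on that best path
    back = [{}]

    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + math.log(emit_p[s][observations[t]])
            back[t][s] = prev

    # Trace back from the best final state to recover the full path.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


# Hypothetical toy model: two health states, one binary lab measure.
STATES = ["stable", "progressed"]
START_P = {"stable": 0.8, "progressed": 0.2}
TRANS_P = {
    "stable": {"stable": 0.9, "progressed": 0.1},
    "progressed": {"stable": 0.05, "progressed": 0.95},
}
EMIT_P = {
    "stable": {"normal": 0.9, "elevated": 0.1},
    "progressed": {"normal": 0.2, "elevated": 0.8},
}

visits = ["normal", "normal", "elevated", "elevated"]
decoded = viterbi(visits, STATES, START_P, TRANS_P, EMIT_P)
print(decoded)
```

With these toy numbers the decoded path switches from "stable" to "progressed" at the third visit, which is the kind of per-patient state trajectory that a progression-pathway view would then summarize across a cohort.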

    Molecular Beacons: Powerful Tools for Imaging RNA in Living Cells

    Recent advances in RNA functional studies highlight the pivotal role of these molecules in cell physiology. Diverse methods have been implemented to measure the expression levels of various RNA species, using either purified RNA or fixed cells. Although fixed cells make it possible to observe the spatial distribution of RNA, assays capable of monitoring RNA transport in living cells in real time are needed to further understand the role of RNA dynamics in cellular functions. Molecular beacons (MBs) are stem-loop hairpin-structured oligonucleotides equipped with a fluorescence quencher at one end and a fluorescent dye (also called reporter or fluorophore) at the opposite end. Because of this structure, MBs do not fluoresce in the absence of their complementary target sequence. Upon binding to targets, MBs emit fluorescence due to the spatial separation of the quencher and the reporter. Molecular beacons are promising probes for the development of RNA imaging techniques; nevertheless, much work remains to be done to obtain a robust technology for imaging various RNA molecules together in real time and in living cells. The present work concentrates on the requirements for using MBs successfully in cellular studies, summarizing recent advances in this area.

    The Role of Axl in Cancer and Stem Cell Plasticity: in vivo Lineage Tracing and Imaging Mass Cytometry Analysis

    Postponed access: the file will be accessible after 2020-09-02. BMED395; MAMD-MEDB