
    Discovering Higher-order SNP Interactions in High-dimensional Genomic Data

    In this thesis, a multifactor dimensionality reduction method based on associative classification is employed to identify higher-order SNP interactions and so improve understanding of the genetic architecture of complex diseases. The thesis further explores the application of deep learning techniques to provide new clues for interaction analysis. The performance of the deep learning method is maximized by combining deep neural networks with a random forest to identify interactions reliably in the presence of noise.

    Machine Learning and Integrative Analysis of Biomedical Big Data.

    Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) are analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to perform integrative analysis of biomedical data acquired from diverse modalities effectively and efficiently. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: the curse of dimensionality, data heterogeneity, missing data, class imbalance, and scalability issues.

    Extending Tlusty's method to the glycome: Tuning the repertoire of glycan determinants

    We apply Tlusty's information-theoretic analysis of the genetic code to the glycome, using a cognitive paradigm in which external information sources constrain and tune the glycan code error network, in the context of available metabolic energy. The resulting dynamic model suggests the possibility of observing spontaneous symmetry breaking of the glycan code as a function of metabolic energy intensity. These effects may be currently present, or embedded in the evolutionary trajectory, recording large-scale ecosystem resilience shifts in energy availability such as the aerobic transition.

    LINKING ENVIRONMENTAL AND MICROBIAL PROCESSES FROM COMMUNITY TO GLOBAL SCALES

    Life and the environment are inextricably interconnected. From the scale of a single microbe to the entire Earth system, biological and environmental processes have coevolved over billions of years into a complex system of interactions and feedbacks that together produce the geochemical and ecological conditions we observe around us. Community-scale processes result in net biogeochemical fluxes, which vary across regional and global scales in predictable patterns. At the community, regional, and global scale, this dissertation addresses a question central to our understanding of environmental microbial systems: How do microbial community interactions with their environment govern their functional and ecological role in the ecosystem, and how do environmental conditions shape the distribution and functional capacities of microbial genetic diversity? I demonstrate that microbial carbon cycling capacities in warm core ring waters originating from the Gulf Stream during an eddy intrusion event on the Mid-Atlantic Bight continental slope are distinct from those occurring in other shelf and shelf break water masses, illuminating the relationship between marine microbial communities and physical processes at the regional scale. As these eddy intrusion events likely increase in the future, these regional scale interactions have functional and biogeochemical implications in both present and future oceans. At the global scale, I build models to accurately predict genetic diversity of the key marine heterotroph SAR86 from environmental variables, identifying five previously unrecognized ecotypes within the SAR86 clade characterized by distinct environmental distributions, and resulting in the first global-resolution projections of SAR86 ecotype biogeography. 
From the community to the global scale, each level of inquiry demands solutions tailored to its own challenges and opportunities. New approaches are brought to bear at both small and large scales: a more effective method for measuring microbial activities in sediments, which expands the range of environments in which such measurements are feasible, and a data discovery tool that harnesses publicly available sequencing datasets to scale data-driven discovery to ever more complex microbial systems.
Doctor of Philosophy