62,126 research outputs found
Machine Learning and Integrative Analysis of Biomedical Big Data.
Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges as well as exacerbates the ones associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues
Sparse Probit Linear Mixed Model
Linear Mixed Models (LMMs) are important tools in statistical genetics. When
used for feature selection, they allow to find a sparse set of genetic traits
that best predict a continuous phenotype of interest, while simultaneously
correcting for various confounding factors such as age, ethnicity and
population structure. Formulated as models for linear regression, LMMs have
been restricted to continuous phenotypes. We introduce the Sparse Probit Linear
Mixed Model (Probit-LMM), where we generalize the LMM modeling paradigm to
binary phenotypes. As a technical challenge, the model no longer possesses a
closed-form likelihood function. In this paper, we present a scalable
approximate inference algorithm that lets us fit the model to high-dimensional
data sets. We show on three real-world examples from different domains that in
the setup of binary labels, our algorithm leads to better prediction accuracies
and also selects features which show less correlation with the confounding
factors.Comment: Published version, 21 pages, 6 figure
- …