
    Machine Learning and Integrative Analysis of Biomedical Big Data.

    Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) are analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues.

    The challenges of purely mechanistic models in biology and the minimum need for a 'mechanism-plus-X' framework

    Ever since the advent of molecular biology in the 1970s, mechanistic models have become the dogma in the field, where a "true" understanding of any subject is equated to a mechanistic description. This has been to the detriment of the biomedical sciences, where, barring some exceptions, notable new feats of understanding have arguably not been achieved in normal and disease biology, including neurodegenerative disease and cancer pathobiology. I argue for a "mechanism-plus-X" paradigm, where mainstay elements of mechanistic models such as hierarchy and correlation are combined with nomological principles such as general operative rules and generative principles. Depending on the question at hand and the nature of the inquiry, X could range from proven physical laws to speculative biological generalizations, such as the notional principle of cellular synchrony. I argue that the "mechanism-plus-X" approach should ultimately aim to move biological inquiries out of the deadlock of oft-encountered mechanistic pitfalls and reposition biology to its former capacity of illuminating fundamental truths about the world.

    Systems biology in animal sciences

    Systems biology is a rapidly expanding field of research and is applied in a number of biological disciplines. In animal sciences, omics approaches are increasingly used, yielding vast amounts of data, but systems biology approaches to extract an understanding of biological processes and animal traits from these data are not yet frequently used. This paper aims to explain what systems biology is and which areas of animal sciences could benefit from systems biology approaches. Systems biology aims to understand whole biological systems working as a unit, rather than investigating their individual components. Therefore, systems biology can be considered a holistic approach, as opposed to reductionism. The recently developed ‘omics’ technologies enable biological sciences to characterize the molecular components of life with ever increasing speed, yielding vast amounts of data. However, biological functions do not follow from the simple addition of the properties of system components, but rather arise from the dynamic interactions of these components. Systems biology combines statistics, bioinformatics and mathematical modeling to integrate and analyze large amounts of data in order to extract a better understanding of the biology from these huge data sets and to predict the behavior of biological systems. A ‘system’ approach and mathematical modeling in biological sciences are not new in themselves, as they were used in biochemistry, physiology and genetics long before the name systems biology was coined. However, the present combination of mass biological data and of computational and modeling tools is unprecedented and truly represents a major paradigm shift in biology. Significant advances have been made using systems biology approaches, especially in the field of bacterial and eukaryotic cells and in human medicine. Similarly, progress is being made with ‘system approaches’ in animal sciences, providing exciting opportunities to predict and modulate animal traits.

    Pathway-Based Genomics Prediction using Generalized Elastic Net.

    We present a novel regularization scheme called the Generalized Elastic Net (GELnet) that incorporates gene pathway information into feature selection. The proposed formulation is applicable to a wide variety of problems in which the interpretation of predictive features using known molecular interactions is desired. The method naturally steers solutions toward sets of mechanistically interlinked genes. Using experiments on synthetic data, we demonstrate that pathway-guided results maintain, and often improve, the accuracy of predictors even in cases where the full gene network is unknown. We apply the method to predict the drug response of breast cancer cell lines. GELnet is able to reveal genetic determinants of sensitivity and resistance for several compounds. In particular, for an EGFR/HER2 inhibitor, it finds a possible trans-differentiation resistance mechanism missed by the corresponding pathway-agnostic approach.
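
As a rough illustration of how pathway information can enter a regularizer, the sketch below assumes a network-aware elastic net penalty of the form lam1 * sum_j d_j |w_j| + (lam2 / 2) * w^T P w, with P taken to be the graph Laplacian of a gene network. This form, the least-squares loss, the proximal gradient solver and all function names are assumptions made for illustration; the abstract does not spell out the GELnet objective or how it is optimized.

```python
import numpy as np

def graph_laplacian(adjacency):
    """Unnormalized Laplacian L = D - A of a symmetric adjacency matrix."""
    degree = np.diag(adjacency.sum(axis=1))
    return degree - adjacency

def network_elastic_net_penalty(w, d, P, lam1, lam2):
    """Assumed penalty: lam1 * sum_j d_j |w_j| + (lam2 / 2) * w^T P w."""
    return lam1 * np.sum(d * np.abs(w)) + 0.5 * lam2 * (w @ P @ w)

def fit_network_elastic_net(X, y, d, P, lam1, lam2, n_iter=500):
    """Proximal gradient descent for least squares plus the penalty above.

    P is assumed symmetric positive semi-definite (e.g. a graph Laplacian).
    """
    n, p = X.shape
    # Step size 1/L, where L bounds the curvature of the smooth part.
    hessian = X.T @ X / n + lam2 * P
    step = 1.0 / np.linalg.eigvalsh(hessian).max()
    w = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n + lam2 * (P @ w)
        z = w - step * grad
        # Soft-thresholding is the proximal step for the weighted L1 term.
        w = np.sign(z) * np.maximum(np.abs(z) - step * lam1 * d, 0.0)
    return w

if __name__ == "__main__":
    # Hypothetical toy network: genes 0 and 1 interact, gene 2 is isolated.
    A = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 3))
    y = X @ np.array([1.0, 1.0, 0.0]) + 0.1 * rng.normal(size=50)
    w = fit_network_elastic_net(X, y, np.ones(3), graph_laplacian(A), 0.05, 0.5)
    print(w)
```

With a Laplacian as P, the quadratic term penalizes differences between the weights of interacting genes, which is one way to steer solutions toward interlinked gene sets.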

    Integration of molecular network data reconstructs Gene Ontology.

    Motivation: Recently, a shift was made from using Gene Ontology (GO) to evaluate molecular network data to using these data to construct and evaluate GO. Dutkowski et al. provide the first evidence that a large part of GO can be reconstructed solely from topologies of molecular networks. Motivated by this work, we develop a novel data integration framework that integrates multiple types of molecular network data to reconstruct and update GO. We ask how much of GO can be recovered by integrating various molecular interaction data. Results: We introduce a computational framework for integration of various biological networks using penalized non-negative matrix tri-factorization (PNMTF). It takes all network data in a matrix form and performs simultaneous clustering of genes and GO terms, inducing new relations between genes and GO terms (annotations) and between GO terms themselves. To improve the accuracy of our predicted relations, we extend the integration methodology to include additional topological information represented as the similarity in wiring around non-interacting genes. Surprisingly, by integrating topologies of baker’s yeast protein–protein interaction, genetic interaction (GI) and co-expression networks, our method reports as related 96% of GO terms that are directly related in GO. The inclusion of the wiring similarity of non-interacting genes contributes 6% to this large GO term association capture. Furthermore, we use our method to infer new relationships between GO terms solely from the topologies of these networks and validate 44% of our predictions in the literature. In addition, our integration method reproduces 48% of cellular component, 41% of molecular function and 41% of biological process GO terms, outperforming the previous method in the former two domains of GO. Finally, we predict new GO annotations of yeast genes and validate our predictions through GI profiling.
Availability and implementation: Supplementary Tables of new GO term associations and predicted gene annotations are available at http://bio-nets.doc.ic.ac.uk/GO-Reconstruction/. Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
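
To make the tri-factorization idea concrete, here is a minimal sketch of plain non-negative matrix tri-factorization, X ≈ F S G^T, fitted with standard multiplicative updates. It is not the penalized method (PNMTF) from the abstract: the penalty terms, the multi-network integration and the wiring-similarity extension are all omitted, and the function names, the random initialization and the argmax cluster assignment are assumptions made for illustration.

```python
import numpy as np

def nmtf(X, k1, k2, n_iter=200, eps=1e-9, seed=0):
    """Factorize a non-negative matrix X (n x m) as F @ S @ G.T.

    F (n x k1) groups rows, G (m x k2) groups columns, and S (k1 x k2)
    links row groups to column groups. All factors stay non-negative.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    F = rng.random((n, k1)) + eps
    S = rng.random((k1, k2)) + eps
    G = rng.random((m, k2)) + eps
    for _ in range(n_iter):
        # Multiplicative updates for the squared Frobenius error;
        # eps guards against division by zero.
        F *= (X @ G @ S.T) / (F @ S @ G.T @ G @ S.T + eps)
        G *= (X.T @ F @ S) / (G @ S.T @ F.T @ F @ S + eps)
        S *= (F.T @ X @ G) / (F.T @ F @ S @ G.T @ G + eps)
    return F, S, G

def reconstruction_error(X, F, S, G):
    """Frobenius norm of the residual X - F S G^T."""
    return np.linalg.norm(X - F @ S @ G.T)

if __name__ == "__main__":
    # Hypothetical toy relation matrix, e.g. rows = genes, columns = terms.
    rng = np.random.default_rng(1)
    X = rng.random((8, 6))
    F, S, G = nmtf(X, k1=2, k2=3)
    row_clusters = F.argmax(axis=1)  # one cluster label per row
    col_clusters = G.argmax(axis=1)  # one cluster label per column
    print(row_clusters, col_clusters, reconstruction_error(X, F, S, G))
```

Reading cluster labels off the largest entry in each row of F and G is one simple way to obtain the simultaneous clustering of both dimensions; the product F S G^T can then be inspected for high-scoring entries absent from X as candidate new relations.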