
    MULTI-DIMENSIONAL ANALYSIS APPROACHES FOR HETEROGENEOUS SINGLE-CELL DATA

    Improvements in experimental techniques have led to an explosion of information in biology research. The increasing number of measurements comes with challenges in analyzing the resulting data, as well as opportunities to obtain deeper insights into biological systems. Conventional average-based methods are unfit to analyze high-dimensional datasets since they fail to take full advantage of such rich information. More importantly, they are not able to capture the heterogeneity that is prevalent in biological systems. Sophisticated algorithms that are able to utilize all available measurements simultaneously are hence emerging rapidly. These algorithms excel at making full use of information within datasets and revealing detailed heterogeneity. However, existing algorithms have several important disadvantages. First, specific knowledge in statistics or machine learning is required to appropriately interpret and tune parameters in these algorithms for future use. This may result in misuse and misinterpretation. Second, using all measurements with equal weighting runs the risk of noise contamination. In addition, information overload has become more common in biology research, with a large volume of irrelevant measurements. Third, regardless of the quality of measurements, analysis methods that simultaneously use a large number of measurements need to avoid the “curse of dimensionality”, which warns that distance estimation and nearest neighbor estimation are not meaningful in high-dimensional space. However, most current sophisticated algorithms involve distance estimation and/or nearest neighbor estimation. In this dissertation, my goal is to build analysis methods that are complex enough to capture heterogeneity and at the same time output results in a format that is easy to interpret and familiar to biologists and medical researchers. I tackle the dimension reduction problem not by searching for a single best subspace but by dividing the measurements into multiple subspaces and examining them one by one. I demonstrate my methods with three types of datasets: image-based high-throughput screening data, flow cytometry data, and mass cytometry data. From each dataset, I was able to discover new biological insights as well as re-validate well-established findings with my methods.
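The "curse of dimensionality" named in the abstract above can be illustrated with a small numerical sketch. The code below is not taken from the dissertation; it is a minimal illustration under stated assumptions: points are drawn uniformly from a unit hypercube, and the helper name `relative_contrast` is hypothetical. It measures how much the farthest point differs from the nearest one, relative to the nearest distance. As the number of dimensions grows, this ratio tends to shrink, which is one reason nearest-neighbor estimates become less informative.

```python
import math
import random


def relative_contrast(n_points, dim, seed=0):
    """Return (d_max - d_min) / d_min for Euclidean distances from one
    random query point to n_points random points in [0, 1]^dim.

    Hypothetical helper for illustration; uniform data is an assumption.
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))
    d_min, d_max = min(dists), max(dists)
    return (d_max - d_min) / d_min


# Print the contrast for a few dimensionalities; it should fall as dim grows.
for dim in (2, 10, 100, 1000):
    print(dim, round(relative_contrast(500, dim), 3))
```

A small contrast means the nearest and farthest neighbors are almost equally far away, so ranking points by distance carries little information. This is consistent with the motivation for analyzing low-dimensional subspaces one at a time.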

    Machine Learning and Integrative Analysis of Biomedical Big Data.

    Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues.

    Data-driven modelling of biological multi-scale processes

    Biological processes involve a variety of spatial and temporal scales. A holistic understanding of many biological processes therefore requires multi-scale models which capture the relevant properties on all these scales. In this manuscript we review mathematical modelling approaches used to describe the individual spatial scales and how they are integrated into holistic models. We discuss the relation between spatial and temporal scales and its implications for multi-scale modelling. Based upon this overview of state-of-the-art modelling approaches, we formulate key challenges in mathematical and computational modelling of biological multi-scale and multi-physics processes. In particular, we consider the availability of analysis tools for multi-scale models and model-based multi-scale data integration. We provide a compact review of methods for model-based data integration and model-based hypothesis testing. Furthermore, novel approaches and recent trends are discussed, including computation time reduction using reduced order and surrogate models, which contribute to the solution of inference problems. We conclude the manuscript by providing a few ideas for the development of tailored multi-scale inference methods. Comment: This manuscript will appear in the Journal of Coupled Systems and Multiscale Dynamics (American Scientific Publishers)

    Understanding Health and Disease with Multidimensional Single-Cell Methods

    Current efforts in the biomedical sciences and related interdisciplinary fields are focused on gaining a molecular understanding of health and disease, which is a problem of daunting complexity that spans many orders of magnitude in characteristic length scales, from small molecules that regulate cell function to cell ensembles that form tissues and organs working together as an organism. In order to uncover the molecular nature of the emergent properties of a cell, it is essential to measure multiple cell components simultaneously in the same cell. In turn, cell heterogeneity requires multiple cells to be measured in order to understand health and disease in the organism. This review summarizes current efforts towards a data-driven framework that leverages single-cell technologies to build robust signatures of healthy and diseased phenotypes. While some approaches focus on multicolor flow cytometry data and other methods are designed to analyze high-content image-based screens, we emphasize the so-called Supercell/SVM paradigm (recently developed by the authors of this review and collaborators) as a unified framework that captures mesoscopic-scale emergence to build reliable phenotypes. Beyond their specific contributions to basic and translational biomedical research, these efforts illustrate, from a larger perspective, the powerful synergy that might be achieved from bringing together methods and ideas from statistical physics, data mining, and mathematics to solve the most pressing problems currently facing the life sciences. Comment: 25 pages, 7 figures; revised version with minor changes. To appear in J. Phys.: Cond. Mat