From Cellular Characteristics to Disease Diagnosis: Uncovering Phenotypes with Supercells
Cell heterogeneity and the inherent complexity due to the interplay of multiple molecular processes within the cell pose difficult challenges for current single-cell biology. We introduce an approach that identifies a disease phenotype from multiparameter single-cell measurements, based on the concept of "supercell statistics", a single-cell-based averaging procedure followed by a machine learning classification scheme. We are able to assess the optimal tradeoff between the number of single cells averaged and the number of measurements needed to capture phenotypic differences between healthy and diseased patients, as well as between different diseases that are otherwise difficult to diagnose. We apply our approach to two kinds of single-cell datasets, addressing the diagnosis of a premature aging disorder using images of cell nuclei, as well as the phenotypes of two non-infectious uveitides (the ocular manifestations of Behçet's disease and sarcoidosis) based on multicolor flow cytometry. In the former case, one nuclear shape measurement taken over a group of 30 cells is sufficient to classify samples as healthy or diseased, in agreement with usual laboratory practice. In the latter, our method identifies a minimal set of 5 markers that accurately predict Behçet's disease and sarcoidosis. This is the first time that a quantitative phenotypic distinction between these two diseases has been achieved. To obtain this clear phenotypic signature, about one hundred CD8+ T cells need to be measured. Although the molecular markers identified have been reported to be important players in autoimmune disorders, this is the first report showing that CD8+ T cells can be used to distinguish two systemic inflammatory diseases.
Beyond these specific cases, the approach proposed here is applicable to datasets generated by other kinds of state-of-the-art and forthcoming single-cell technologies, such as multidimensional mass cytometry, single-cell gene expression, and single-cell full-genome sequencing techniques.
Authors: Julian Marcelo Candia (University of Maryland, United States; Instituto de Física de Líquidos y Sistemas Biológicos, CONICET – Universidad Nacional de La Plata, Argentina); Ryan Maunu, Meghan Driscoll (University of Maryland, United States); Angélique Biancotto, Pradeep Dagur, J. Philip McCoy Jr., H. Nida Sen, Lai Wei (National Institutes of Health, United States); Amos Maritan (Università di Padova, Italy); Kan Cao (University of Maryland, United States); Robert B. Nussenblatt (National Institutes of Health, United States); Jayanth R. Banavar, Wolfgang Losert (University of Maryland, United States)
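The supercell idea above can be sketched in a few lines: average randomly drawn groups of single cells into "supercell" feature vectors, then train a classifier on those averages. This is a minimal illustration with synthetic data; the group size, marker count, and classifier settings are assumptions, not the paper's actual configuration.

```python
# Minimal sketch of "supercell statistics": average randomly sampled groups of
# cells, then classify the averaged vectors. All data here are synthetic.
import numpy as np
from sklearn.svm import SVC

def make_supercells(cells, group_size, n_supercells, rng):
    """Average `group_size` randomly sampled cells into one supercell vector."""
    idx = rng.integers(0, len(cells), size=(n_supercells, group_size))
    return cells[idx].mean(axis=1)

rng = np.random.default_rng(0)
# Synthetic single-cell data: 500 cells x 5 markers, small per-class mean shift
healthy = rng.normal(0.0, 1.0, size=(500, 5))
diseased = rng.normal(0.4, 1.0, size=(500, 5))

X = np.vstack([make_supercells(healthy, 30, 100, rng),
               make_supercells(diseased, 30, 100, rng)])
y = np.array([0] * 100 + [1] * 100)

clf = SVC(kernel="linear").fit(X, y)
print(clf.score(X, y))
```

Averaging over 30 cells shrinks within-class noise by roughly a factor of sqrt(30), which is why a shift that is invisible at the single-cell level becomes linearly separable at the supercell level.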
Understanding Health and Disease with Multidimensional Single-Cell Methods
Current efforts in the biomedical sciences and related interdisciplinary
fields are focused on gaining a molecular understanding of health and disease,
which is a problem of daunting complexity that spans many orders of magnitude
in characteristic length scales, from small molecules that regulate cell
function to cell ensembles that form tissues and organs working together as an
organism. In order to uncover the molecular nature of the emergent properties
of a cell, it is essential to measure multiple cell components simultaneously
in the same cell. In turn, cell heterogeneity requires multiple cells to be
measured in order to understand health and disease in the organism. This review
summarizes current efforts towards a data-driven framework that leverages
single-cell technologies to build robust signatures of healthy and diseased
phenotypes. While some approaches focus on multicolor flow cytometry data and
other methods are designed to analyze high-content image-based screens, we
emphasize the so-called Supercell/SVM paradigm (recently developed by the
authors of this review and collaborators) as a unified framework that captures
mesoscopic-scale emergence to build reliable phenotypes. Beyond their specific
contributions to basic and translational biomedical research, these efforts
illustrate, from a larger perspective, the powerful synergy that might be
achieved from bringing together methods and ideas from statistical physics,
data mining, and mathematics to solve the most pressing problems currently
facing the life sciences.Comment: 25 pages, 7 figures; revised version with minor changes. To appear in
J. Phys.: Cond. Mat
Special issue on bio-ontologies and phenotypes
The bio-ontologies and phenotypes special issue includes eight papers selected from the 11 presented at the Bio-Ontologies SIG (Special Interest Group) and the Phenotype Day at the ISMB (Intelligent Systems for Molecular Biology) conference in Boston in 2014. The selected papers span a wide range of topics, including the automated re-use and update of ontologies, quality assessment of ontological resources, and the systematic description of phenotype variation by manual, semi-automatic, and fully automatic means.
Understanding Learned Models by Identifying Important Features at the Right Resolution
In many application domains, it is important to characterize how complex
learned models make their decisions across the distribution of instances. One
way to do this is to identify the features and interactions among them that
contribute to a model's predictive accuracy. We present a model-agnostic
approach to this task that makes the following specific contributions. Our
approach (i) tests feature groups, in addition to base features, and tries to
determine the level of resolution at which important features can be
determined, (ii) uses hypothesis testing to rigorously assess the effect of
each feature on the model's loss, (iii) employs a hierarchical approach to
control the false discovery rate when testing feature groups and individual
base features for importance, and (iv) uses hypothesis testing to identify
important interactions among features and feature groups. We evaluate our
approach by analyzing random forest and LSTM neural network models learned in
two challenging biomedical applications.
Comment: First two authors contributed equally to this work. Accepted for presentation at the Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19).
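One building block of the approach above, testing whole feature groups rather than single features, can be illustrated with group-level permutation importance: jointly permute a group's columns and measure the increase in the model's loss. The grouping, model, and permutation count below are illustrative assumptions, not the paper's exact procedure (which adds hypothesis testing and FDR control).

```python
# Hedged sketch of group-level permutation importance: the loss increase when a
# whole feature group is jointly permuted. Data and groups are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def group_importance(model, X, y, group, n_perm, rng):
    """Mean increase in squared-error loss when the columns in `group`
    are jointly permuted, averaged over `n_perm` permutations."""
    base = np.mean((model.predict(X) - y) ** 2)
    deltas = []
    for _ in range(n_perm):
        Xp = X.copy()
        Xp[:, group] = Xp[rng.permutation(len(X))][:, group]
        deltas.append(np.mean((model.predict(Xp) - y) ** 2) - base)
    return float(np.mean(deltas))

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 6))
y = X[:, 0] + X[:, 1] + 0.1 * rng.normal(size=300)  # only the first group matters

model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
# Test coarse groups first; an important group can then be drilled into.
for group in ([0, 1], [2, 3], [4, 5]):
    print(group, round(group_importance(model, X, y, group, 20, rng), 3))
```

Testing coarse groups first and refining only the important ones is what lets a hierarchical procedure control the false discovery rate while still finding the right resolution.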
Machine Learning and Integrative Analysis of Biomedical Big Data.
Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) are analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to perform integrative analysis of biomedical data acquired from diverse modalities effectively and efficiently. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: the curse of dimensionality, data heterogeneity, missing data, class imbalance, and scalability.
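The simplest integration strategy, often called early integration, concatenates scaled feature blocks from each modality and relies on a sparse model to cope with the curse of dimensionality. The following is a minimal sketch under that assumption; the modality names and sizes are synthetic placeholders, not from the review.

```python
# Illustrative "early" multi-omics integration: scale each modality's block,
# concatenate, and fit an L1-penalized model to handle p >> n. Synthetic data.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n = 120
genome = rng.normal(size=(n, 200))    # stand-in for genomic features
proteome = rng.normal(size=(n, 80))   # stand-in for proteomic features
y = (genome[:, 0] + proteome[:, 0] > 0).astype(int)

# Per-block scaling keeps one modality from dominating purely by units.
X = np.hstack([StandardScaler().fit_transform(genome),
               StandardScaler().fit_transform(proteome)])

# The L1 penalty zeroes out most coefficients, mitigating dimensionality.
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.5).fit(X, y)
print(int((clf.coef_ != 0).sum()), "features kept of", X.shape[1])
```

More elaborate schemes (intermediate and late integration) learn per-modality representations first, but the per-block scaling step above addresses data heterogeneity in the same spirit.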
Prediction of inherited genomic susceptibility to 20 common cancer types by a supervised machine-learning method.
Prevention and early intervention are the most effective ways of avoiding or minimizing psychological, physical, and financial suffering from cancer. However, such proactive action requires the ability to predict the individual's susceptibility to cancer with a measure of probability. Of the triad of cancer-causing factors (inherited genomic susceptibility, environmental factors, and lifestyle factors), the inherited genomic component may be derivable from the recent public availability of a large body of whole-genome variation data. However, genome-wide association studies have so far shown limited success in predicting the inherited susceptibility to common cancers. We present here a multiple classification approach for predicting individuals' inherited genomic susceptibility to acquire the most likely phenotype among a panel of 20 major common cancer types plus 1 "healthy" type by application of a supervised machine-learning method under competing conditions among the cohorts of the 21 types. This approach suggests that, depending on the phenotypes of 5,919 individuals of "white" ethnic population in this study, (i) the portion of the cohort of a cancer type who acquired the observed type due to mostly inherited genomic susceptibility factors ranges from about 33 to 88% (or its corollary: the portion due to mostly environmental and lifestyle factors ranges from 12 to 67%), and (ii) on an individual level, the method also predicts individuals' inherited genomic susceptibility to acquire the other types ranked with associated probabilities. These probabilities may provide practical information for individuals, health professionals, and health policymakers related to prevention and/or early intervention of cancer.
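The "competing cohorts" setup amounts to one multiclass model over all phenotype classes, whose per-class probabilities can be ranked for each individual. This sketch uses 5 toy classes as a stand-in for the 21 types and synthetic genotype-like features; it illustrates the ranking mechanism only, not the paper's actual model.

```python
# Sketch of multiclass susceptibility ranking: fit one classifier over all
# classes, then rank each individual's predicted class probabilities.
# Classes, features, and data are synthetic placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
n_classes, n_feats = 5, 40          # toy stand-in for 21 types x genomic variants
centers = rng.normal(0, 0.6, size=(n_classes, n_feats))
X = np.vstack([c + rng.normal(size=(60, n_feats)) for c in centers])
y = np.repeat(np.arange(n_classes), 60)

clf = LogisticRegression(max_iter=1000).fit(X, y)
proba = clf.predict_proba(X[:1])[0]
ranking = np.argsort(proba)[::-1]   # classes ranked by predicted susceptibility
print(ranking, np.round(proba[ranking], 2))
```

Ranking `predict_proba` output, rather than taking only the argmax, is what yields the per-individual susceptibility ordering across the remaining types described in point (ii).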
Selection of important variables by statistical learning in genome-wide association analysis
Genetic analysis of complex diseases demands novel analytical methods to interpret data collected on thousands of variables by genome-wide association studies. The complexity of such analysis is multiplied when one has to consider interaction effects, be they among the genetic variations (G × G) or with environmental risk factors (G × E). Several statistical learning methods seem quite promising in this context. Herein we consider applications of two such methods, random forests and Bayesian networks, to the simulated dataset for Genetic Analysis Workshop 16 Problem 3. Our evaluation study showed that an iterative search based on the random forest approach has the potential to select important variables, while Bayesian networks can capture some of the underlying causal relationships.
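One common form of iterative random-forest search is to fit, keep the top-ranked fraction of variables, and refit on the survivors until a small set remains. The halving schedule, stopping rule, and toy genotype data below are illustrative assumptions, not the workshop paper's exact protocol.

```python
# Hedged sketch of iterative variable selection with a random forest:
# repeatedly fit, keep the top half of features by importance, refit.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(4)
X = rng.integers(0, 3, size=(400, 100)).astype(float)  # toy SNP genotypes 0/1/2
y = ((X[:, 0] + X[:, 1] + X[:, 0] * X[:, 2]) > 3).astype(int)  # incl. a GxG term

kept = np.arange(X.shape[1])
while len(kept) > 10:
    rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X[:, kept], y)
    order = np.argsort(rf.feature_importances_)[::-1]
    kept = kept[order[: max(10, len(kept) // 2)]]      # halve the candidate set

print(sorted(kept))  # informative variables should rank above the noise columns
```

Because tree splits condition on earlier splits, the forest's importances can pick up the G × G interaction term even though it has no purely marginal effect, which is the motivation for this method in the interaction setting.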
Deep learning for health outcome prediction
Modern medical data contains rich information that allows us to make new types of inferences to predict health outcomes. However, the complexity of modern medical data has rendered many classical analysis approaches insufficient.
Machine learning with deep neural networks enables computational models to process raw data and learn useful representations with multiple levels of abstraction.
In this thesis, I present novel deep learning methods for health outcome prediction from brain MRI and genomic data.
I show that a deep neural network can learn a biomarker from structural brain MRI and that this biomarker provides a useful measure for investigating brain and systemic health, can augment neuroradiological research and potentially serve as a decision-support tool in clinical environments. I also develop two tensor methods for deep neural networks: the first, tensor dropout, for improving the robustness of deep neural networks, and the second, Kronecker machines, for combining multiple sources of data to improve prediction accuracy. Finally, I present a novel deep learning method for predicting polygenic risk scores from genome sequences by leveraging both local and global interactions between genetic variants.
These contributions demonstrate the benefits of using deep learning for health outcome prediction in both research and clinical settings.
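For background on the robustness technique the thesis generalizes: tensor dropout extends standard dropout, in which activations are randomly zeroed during training and rescaled so the expected magnitude is preserved. The sketch below shows plain (inverted) dropout in NumPy, not the thesis's tensor variant.

```python
# Standard (inverted) dropout on an activation matrix, shown as background to
# the tensor dropout mentioned above. Plain NumPy, no framework assumed.
import numpy as np

def dropout(a, p, rng, train=True):
    """Zero each activation with probability p; rescale survivors by 1/(1-p)
    so the expected activation magnitude is unchanged."""
    if not train or p == 0.0:
        return a
    mask = rng.random(a.shape) >= p
    return a * mask / (1.0 - p)

rng = np.random.default_rng(5)
a = np.ones((4, 6))
out = dropout(a, 0.5, rng)
print(out.mean())  # close to 1 on average, thanks to the 1/(1-p) rescaling
```

At inference (`train=False`) the input passes through unchanged, which is why the rescaling is done at training time rather than at test time.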
The context-dependence of mutations: a linkage of formalisms
Defining the extent of epistasis - the non-independence of the effects of
mutations - is essential for understanding the relationship of genotype,
phenotype, and fitness in biological systems. The applications cover many areas
of biological research, including biochemistry, genomics, protein and systems
engineering, medicine, and evolutionary biology. However, the quantitative
definitions of epistasis vary among fields, and its analysis beyond just
pairwise effects remains obscure in general. Here, we show that different
definitions of epistasis are versions of a single mathematical formalism - the
weighted Walsh-Hadamard transform. We argue that one of these definitions, background-averaged epistasis, is the most informative when the goal is to uncover the general epistatic structure of a biological system, a description that can differ substantially from the local epistatic structure of specific model systems. Key issues are the choice of effective ensembles for averaging and how to contend in practice with the vast combinatorial complexity of mutations. In this regard, we discuss possible approaches for optimally learning the epistatic structure of biological systems.
Comment: 6 pages, 3 figures, supplementary information
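The Walsh-Hadamard view can be made concrete for n biallelic loci: transform the length-2**n vector of phenotypes over all genotypes, and the resulting coefficients separate additive effects from interaction terms of each order. This minimal sketch uses the plain (unweighted) transform; the weighting and ensemble-averaging choices discussed above are omitted.

```python
# Minimal Walsh-Hadamard decomposition of a genotype-phenotype map for n loci.
# The unweighted transform is shown; weighted variants are not sketched here.
import numpy as np

def walsh_hadamard(y):
    """Normalized WHT of a length-2**n vector via the recursive butterfly."""
    y = np.asarray(y, dtype=float).copy()
    h = 1
    while h < len(y):
        for i in range(0, len(y), 2 * h):
            a, b = y[i:i + h].copy(), y[i + h:i + 2 * h].copy()
            y[i:i + h], y[i + h:i + 2 * h] = a + b, a - b
        h *= 2
    return y / len(y)

# Two loci, genotypes ordered 00, 01, 10, 11:
# phenotype = 1.0*mut1 + 0.5*mut2 + 0.25*(both mutated), so the pairwise
# epistatic interaction is 0.25.
phen = np.array([0.0, 0.5, 1.0, 1.75])
coeffs = walsh_hadamard(phen)
print(coeffs)  # the last coefficient is nonzero only if the loci interact
```

With the interaction removed (phenotype exactly additive), the order-2 coefficient vanishes, which is the sense in which the transform isolates epistasis of each order.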