
Machine Learning and Integrative Analysis of Biomedical Big Data.

Author: Choi Howard
Chung Neo Christopher
Mirza Bilal
Ping Peipei
Wang Jie
Wang Wei
Publication venue: eScholarship, University of California
Publication date: 01/01/2019

Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues.

Directory of Open Access Journals

eScholarship - University of California

Data-driven modelling of biological multi-scale processes

Author: Hasenauer Jan
Hross Sabrina
Jagiella Nick
Theis Fabian J.
Publication date: 01/01/2015

Biological processes involve a variety of spatial and temporal scales. A holistic understanding of many biological processes therefore requires multi-scale models which capture the relevant properties on all these scales. In this manuscript we review mathematical modelling approaches used to describe the individual spatial scales and how they are integrated into holistic models. We discuss the relation between spatial and temporal scales and its implications for multi-scale modelling. Based upon this overview of state-of-the-art modelling approaches, we formulate key challenges in mathematical and computational modelling of biological multi-scale and multi-physics processes. In particular, we consider the availability of analysis tools for multi-scale models and model-based multi-scale data integration. We provide a compact review of methods for model-based data integration and model-based hypothesis testing. Furthermore, novel approaches and recent trends are discussed, including computation time reduction using reduced order and surrogate models, which contribute to the solution of inference problems. We conclude the manuscript by providing a few ideas for the development of tailored multi-scale inference methods. Comment: This manuscript will appear in the Journal of Coupled Systems and Multiscale Dynamics (American Scientific Publishers)

arXiv.org e-Print Archive

PuSH

Generalized Score Matching for Non-Negative Data

Author: Drton Mathias
Shojaie Ali
Yu Shiqing
Publication date: 01/01/2019

A common challenge in estimating parameters of probability density functions is the intractability of the normalizing constant. While in such cases maximum likelihood estimation may be implemented using numerical integration, the approach becomes computationally intensive. The score matching method of Hyvärinen [2005] avoids direct calculation of the normalizing constant and yields closed-form estimates for exponential families of continuous distributions over $\mathbb{R}^m$. Hyvärinen [2007] extended the approach to distributions supported on the non-negative orthant, $\mathbb{R}_+^m$. In this paper, we give a generalized form of score matching for non-negative data that improves estimation efficiency. As an example, we consider a general class of pairwise interaction models. Addressing an overlooked inexistence problem, we generalize the regularized score matching method of Lin et al. [2016] and improve its theoretical guarantees for non-negative Gaussian graphical models. Comment: 70 pages, 76 figures
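To make the idea of a closed-form score matching estimate concrete, the following is a minimal sketch of the original score matching objective of Hyvärinen [2005] for a univariate Gaussian on the real line. It is not the generalized non-negative estimator described in the abstract above; the function names, the simulated data, and the choice of a Gaussian model are assumptions made purely for illustration.

```python
import numpy as np

def score_matching_objective(x, mu, lam):
    """Empirical score matching objective for a Gaussian with mean mu, precision lam.

    The log-density up to an additive constant is -lam * (x - mu)**2 / 2, so the
    score (derivative in x) is -lam * (x - mu) and its derivative is -lam.
    The normalizing constant never appears.
    """
    score = -lam * (x - mu)
    return np.mean(-lam + 0.5 * score**2)

def score_matching_estimate(x):
    """Closed-form minimizer of the objective above (illustrative helper)."""
    mu_hat = x.mean()
    lam_hat = 1.0 / np.mean((x - mu_hat) ** 2)
    return mu_hat, lam_hat

# Simulated data with hypothetical true parameters mu = 2.0, sigma = 1.5.
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=5000)

mu_hat, lam_hat = score_matching_estimate(x)
print("estimated mean:", mu_hat)
print("estimated precision:", lam_hat)
```

For this particular model the minimizer coincides with the maximum likelihood estimate, which makes the result easy to sanity-check; the benefit of score matching shows up in models whose normalizing constant is intractable.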

arXiv.org e-Print Archive

Copenhagen University Research Information System

A Study Of Computational Problems In Computational Biology And Social Networks: Cancer Informatics And Cascade Modelling

Author: Ma Christopher
Publication venue: eGrove
Publication date: 01/01/2018

Undoubtedly, everything in this world is related and nothing exists independently. Entities interact to form groups, resulting in many complex networks. Examples include functional regulation models of proteins in biology and communities of people within social networks. Since complex networks are ubiquitous in daily life, network learning has been gaining momentum in a variety of disciplines such as computer science, economics and biology. This calls for new techniques for exploring the structure and the interactions of networks, since they provide insight into how a network works and deepen our knowledge of the subject at hand. For example, uncovering protein modules helps us understand what causes lead to a certain disease and how proteins co-regulate each other. Therefore, my dissertation takes on problems in computational biology and social networks: cancer informatics and cascade modelling. In cancer informatics, identifying the specific genes that cause cancer (driver genes) is crucial. The more drivers identified, the more options there are to treat the cancer with a drug that acts on that gene. However, identifying driver genes is not easy. Cancer cells undergo rapid mutation and are compromised with regard to the body's normal DNA repair mechanisms. I employ Markov chains, Bayesian networks and graphical models to identify cancer drivers. I utilize heterogeneous sources of information to discover cancer drivers and unlock the mechanism behind cancer. Above all, I encode various pieces of biological information to form a multi-graph, trigger various Markov chains in it, and rank the genes afterwards. I also leverage a probabilistic mixed graphical model to learn the complex and uncertain relationships among various biomedical data. On the other hand, the diffusion of information over networks has drawn great interest in the research community.
For example, epidemiologists observe that a person becomes ill, but they can determine neither who infected the patient nor the infection rate of each individual. Therefore, it is critical to decipher the mechanism underlying the process, since it supports efforts to prevent virus infections. I propose a new approach to modelling cascade data in three different scenarios.
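The idea of running Markov chains on a multi-graph and ranking genes by the result can be sketched with a random walk with restart. This is a generic technique chosen here for illustration and is not claimed to be the dissertation's actual algorithm; the gene names, the two evidence layers, and the restart probability are all hypothetical.

```python
import numpy as np

def random_walk_with_restart(layers, restart=0.15, tol=1e-12, max_iter=1000):
    """Stationary scores of a random walk with uniform restart on a multi-graph.

    `layers` is a list of symmetric adjacency matrices, one per evidence type.
    Parallel edges are merged by summing their weights.
    """
    adjacency = np.sum(layers, axis=0).astype(float)
    col_sums = adjacency.sum(axis=0)
    if np.any(col_sums == 0):
        raise ValueError("every node needs at least one edge")
    transition = adjacency / col_sums  # column-stochastic transition matrix
    n = adjacency.shape[0]
    uniform = np.full(n, 1.0 / n)
    p = uniform.copy()
    for _ in range(max_iter):
        new_p = (1.0 - restart) * transition @ p + restart * uniform
        if np.abs(new_p - p).sum() < tol:
            p = new_p
            break
        p = new_p
    return p

# Hypothetical genes and two hypothetical evidence layers.
genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
layer_interaction = np.array([[0, 1, 1, 1],
                              [1, 0, 0, 0],
                              [1, 0, 0, 0],
                              [1, 0, 0, 0]])
layer_coexpression = np.array([[0, 1, 0, 0],
                               [1, 0, 1, 0],
                               [0, 1, 0, 0],
                               [0, 0, 0, 0]])

scores = random_walk_with_restart([layer_interaction, layer_coexpression])
ranking = [genes[i] for i in np.argsort(-scores)]
print(ranking)
```

Genes that are well connected across several evidence layers accumulate more probability mass and therefore rank higher, which is the intuition behind using such walks to prioritize candidate drivers.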

Kinetic modelling of in vitro data of PI3K, mTOR1, PTEN enzymes and on-target inhibitors Rapamycin, BEZ235, and LY294002

Author: Bown James L.
Goltsov Alexey
Harrison David J.
Langdon Simon P.
Tashkandi Ghassan
Publication date: 06/11/2016

The phosphatidylinositide 3-kinases (PI3K) and mammalian target of rapamycin-1 (mTOR1) are two key targets for anti-cancer therapy. Predicting the response of the PI3K/AKT/mTOR1 signalling pathway to targeted therapy is difficult because of network complexities. Systems biology models can help explore those complexities, but the value of such models depends on accurate parameterisation. Motivated by a need to increase accuracy in kinetic parameter estimation, and therefore the predictive power of the model, we present a framework to integrate kinetic data from enzyme assays into a unified enzyme kinetic model. We present exemplar kinetic models of PI3K and mTOR1, calibrated on in vitro enzyme data and founded on the Michaelis-Menten (MM) approximation. We describe the effects of an allosteric mTOR1 inhibitor (Rapamycin) and ATP-competitive inhibitors (BEZ235 and LY294002) that show dual inhibition of mTOR1 and PI3K. We also model the kinetics of phosphatase and tensin homolog (PTEN), which modulates sensitivity of the PI3K/AKT/mTOR1 pathway to these drugs. Model validation with independent data sets allows investigation of enzyme function and drug dose dependencies in a wide range of experimental conditions. Modelling of the mTOR1 kinetics showed that Rapamycin has an IC50 independent of ATP concentration and that it is a selective inhibitor of mTOR1 substrates S6K1 and 4EBP1: it retains 40% of mTOR1 activity relative to 4EBP1 phosphorylation and completely inhibits S6K1 activity. For the dual ATP-competitive inhibitors of mTOR1 and PI3K, LY294002 and BEZ235, we derived the dependence of the IC50 on ATP concentration that allows prediction of the IC50 at different ATP concentrations in enzyme and cellular assays.
Comparison of the drug effectiveness in enzyme and cellular assays showed that some features of these drugs arise from signalling modulation beyond the on-target action and MM approximation and require a systems-level consideration of the whole PI3K/PTEN/AKT/mTOR1 network in order to understand mechanisms of drug sensitivity and resistance in different cancer cell lines. We suggest that using these models in systems biology investigation of the PI3K/AKT/mTOR1 signalling in cancer cells can bridge the gap between direct drug target action and the therapeutic response to these drugs and their combinations.
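Why the IC50 of an ATP-competitive inhibitor depends on ATP concentration can be illustrated with the textbook Michaelis-Menten rate law for competitive inhibition and the standard Cheng-Prusoff relation. This is a minimal sketch under those textbook assumptions, not the calibrated model from the paper; the function names and all parameter values (`vmax`, `km`, `ki`, the ATP concentrations) are hypothetical.

```python
import math

def rate_competitive(atp, inhibitor, vmax, km, ki):
    """Michaelis-Menten rate with a competitive inhibitor.

    A competitive inhibitor scales the apparent Km by (1 + [I] / Ki).
    """
    return vmax * atp / (km * (1.0 + inhibitor / ki) + atp)

def ic50_competitive(atp, km, ki):
    """Cheng-Prusoff relation: IC50 grows linearly with substrate concentration."""
    return ki * (1.0 + atp / km)

# Hypothetical parameters in arbitrary but consistent concentration units.
VMAX, KM, KI = 1.0, 50.0, 0.01

for atp in (10.0, 100.0, 1000.0):
    ic50 = ic50_competitive(atp, KM, KI)
    uninhibited = rate_competitive(atp, 0.0, VMAX, KM, KI)
    inhibited = rate_competitive(atp, ic50, VMAX, KM, KI)
    # At the IC50 the rate should be half of the uninhibited rate.
    assert math.isclose(inhibited, 0.5 * uninhibited)
    print(f"ATP={atp}: IC50={ic50}")
```

Under this rate law, raising ATP raises the IC50, which is one reason potency measured in low-ATP enzyme assays can differ from potency in cells, where ATP levels are much higher.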