Search CORE

154 research outputs found

Automatic Annotation of Spatial Expression Patterns via Sparse Bayesian Factor Models

Author: A Heffel
AA Ogienko
B Adryan
B Krishnapuram
CJC Burges
D Tautz
Daniel L. Mace
DL Mace
E Frise
ES Lein
GE Hinton
H Peng
I Pournara
IG Costa
Iulian Pruteanu-Malinici
Joel S. Bader
JP Carson
JY Pan
K Jaebum
K Mikolajczyk
K Puniyani
M Ashburner
M Gribskov
M West
MN Arbeitman
P Tomancak
RL Gorsuch
S Ji
S Kumar
S Roy
SC Chen
SD Hooper
SJD Prince
T Joachims
T Sandmann
T Walter
Uwe Ohler
V Ljosa
V Stolc
Publication venue: Public Library of Science
Publication date: 01/01/2011

Advances in reporters for gene expression have made it possible to document and quantify expression patterns in 2D–4D. In contrast to microarrays, which provide data for many genes but averaged and/or at low resolution, images reveal the high spatial dynamics of gene expression. Because the available data are increasing rapidly, it is imperative to develop computational methods to compare, annotate, and model gene expression based on images. We have developed a sparse Bayesian factor analysis model in which the observed expression diversity among a large set of high-dimensional images is modeled by a small number of hidden common factors. We apply this approach to embryonic expression patterns from a Drosophila RNA in situ image database, and show that the automatically inferred factors provide a meaningful decomposition and represent common co-regulation or biological functions. The low-dimensional set of factor mixing weights is further used as features by a classifier to annotate expression patterns with functional categories. On human-curated annotations, our sparse approach achieves similar or better classification of expression patterns at different developmental stages than other automatic image annotation methods that use thousands of hard-to-interpret features. Our study therefore outlines a general framework for large microscopy data sets, in which both the generative model itself and its application to analysis tasks such as automated annotation can provide insight into biological questions.
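The idea of using factor mixing weights as low-dimensional features can be illustrated with a much simpler stand-in than the paper's sparse Bayesian inference: if the factor images are already known, the weights for a new image can be estimated by ordinary least squares. The sketch below assumes exactly two hypothetical factor images of six pixels each; the function name `factor_weights` and all numbers are illustrative assumptions, not taken from the paper.

```python
def dot(u, v):
    """Inner product of two flattened images."""
    return sum(a * b for a, b in zip(u, v))


def factor_weights(image, f1, f2):
    """Least-squares mixing weights (w1, w2) minimising ||image - w1*f1 - w2*f2||^2.

    Solves the 2 x 2 normal equations G w = r, where G is the Gram matrix of
    the factors. Plain least squares, NOT the paper's sparse Bayesian model.
    """
    g11, g12, g22 = dot(f1, f1), dot(f1, f2), dot(f2, f2)
    r1, r2 = dot(f1, image), dot(f2, image)
    det = g11 * g22 - g12 * g12  # non-zero when the two factors are linearly independent
    return ((g22 * r1 - g12 * r2) / det, (g11 * r2 - g12 * r1) / det)


# Hypothetical 6-pixel "anterior" and "posterior" factor images
anterior = [1.0, 1.0, 1.0, 0.0, 0.0, 0.0]
posterior = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]

# A hypothetical pattern: strong anterior expression, weak posterior expression
pattern = [2.0, 2.0, 2.0, 0.5, 0.5, 0.5]
print(factor_weights(pattern, anterior, posterior))  # -> (2.0, 0.5)
```

The resulting pair of weights, rather than the six raw pixel values, is what a downstream classifier would receive as its feature vector.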

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

MDC Repository

Sparse graphical models for cancer signalling

Author: Hill Steven M. (Mark)

Protein signalling networks play a key role in cellular function, and their dysregulation is central to many diseases, including cancer. Recent advances in biochemical technology have begun to allow high-throughput, data-driven studies of signalling. In this thesis, we investigate multivariate statistical methods, rooted in sparse graphical models, aimed at probing questions in cancer signalling. First, we propose a Bayesian variable selection method for identifying subsets of proteins that jointly influence an output of interest, such as drug response. Ancillary biological information is incorporated into inference using informative prior distributions. Prior information is selected and weighted in an automated manner using an empirical Bayes formulation. We present examples of informative pathway- and network-based priors, and illustrate the proposed method on both synthetic and drug response data. Second, we use dynamic Bayesian networks to perform structure learning of context-specific signalling network topology from proteomic time-course data. We exploit a connection between variable selection and network structure learning to efficiently carry out exact inference. Existing biology is incorporated using informative network priors, weighted automatically by an empirical Bayes approach. The overall approach is computationally efficient and essentially free of user-set parameters. We show results from an empirical investigation, comparing the approach to several existing methods, and from an application to breast cancer cell line data. Hypotheses are generated regarding novel signalling links, some of which are validated by independent experiments. Third, we describe a network-based clustering approach for the discovery of cancer subtypes that differ in terms of subtype-specific signalling network structure.
Model-based clustering is combined with penalised likelihood estimation of undirected graphical models to allow simultaneous learning of cluster assignments and cluster-specific network structure. Results are shown from an empirical investigation comparing several penalisation regimes, and an application to breast cancer proteomic data.

Warwick Research Archives Portal Repository

Graphical Models for Multivariate Time-Series

Author: Tomasi Federico
Publication venue: Università degli studi di Genova
Publication date: 14/03/2019

Gaussian graphical models have received much attention in recent years, due to their flexibility and expressive power. In particular, much interest has been devoted to graphical models for temporal data, or dynamical graphical models, which aim to understand the relations among variables evolving in time. While powerful in modelling complex systems, such models suffer from computational issues in terms of both convergence rates and memory requirements, and may fail to detect temporal patterns when information about the system is partial. This thesis comprises two main contributions in the context of dynamical graphical models, tackling these two aspects: the need for reliable and fast optimisation methods, and the need for greater modelling power to recover the model in practical applications. The first contribution is a forward-backward splitting (FBS) procedure for Gaussian graphical modelling of multivariate time-series, which relies on recent theoretical studies ensuring global convergence under mild assumptions. This FBS-based implementation converges quickly and achieves optimal results with respect to ground truth and to standard methods for dynamical network inference. The second main contribution addresses latent factors that influence the system while remaining hidden or unobservable. The thesis proposes a novel latent variable time-varying graphical lasso method, which accounts for both the temporal dynamics in the data and the latent factors influencing the system. This is fundamental for the practical use of graphical models, where information about the data is partial. Extensive validation of the method on both synthetic and real-world applications shows that considering latent factors is effective for dealing with incomplete information.
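Forward-backward splitting alternates a gradient ("forward") step on the smooth part of an objective with a proximal ("backward") step on the non-smooth sparsity penalty. The sketch below applies that scheme to the plain lasso, a much simpler problem than the thesis's time-varying graphical lasso, and assumes a fixed step size no larger than 1/L, where L is the largest eigenvalue of A^T A. The function names and the small test problem are hypothetical.

```python
def soft_threshold(v, t):
    """Backward step: proximal operator of t * ||.||_1, applied elementwise."""
    return [(1.0 if vi > 0 else -1.0) * max(abs(vi) - t, 0.0) for vi in v]


def matvec(A, x):
    """Compute A x for a matrix stored as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def rmatvec(A, r):
    """Compute A^T r."""
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(A[0]))]


def fbs_lasso(A, b, lam, step, n_iter=2000):
    """Minimise 0.5 * ||A x - b||^2 + lam * ||x||_1 by forward-backward splitting."""
    x = [0.0] * len(A[0])
    for _ in range(n_iter):
        residual = [ax - bi for ax, bi in zip(matvec(A, x), b)]
        grad = rmatvec(A, residual)                      # gradient of the smooth term
        z = [xi - step * gi for xi, gi in zip(x, grad)]  # forward (explicit gradient) step
        x = soft_threshold(z, step * lam)                # backward (proximal) step
    return x


# Hypothetical 3 x 2 problem; A^T A = [[2, 1], [1, 5]], largest eigenvalue ~5.30
A = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
b = [1.0, 2.0, 3.0]
x_hat = fbs_lasso(A, b, lam=0.1, step=0.18)
print([round(v, 6) for v in x_hat])  # -> [1.4, 1.1]
```

Swapping `soft_threshold` for the proximal operator of a different penalty changes the model while leaving the forward step untouched, which is what makes the splitting convenient for structured penalties.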

Archivio istituzionale della ricerca - Università di Genova

Inferring bifurcations between phenotypes

Author: Szep Grisha
Publication date: 01/09/2022

King's Research Portal

Knowledge-Informed Machine Learning for Cancer Diagnosis and Prognosis: A review

Author: Canoll Peter D
Hu Leland S.
Li Jing
Mao Lingchao
Swanson Kristin R
Tran Nhan L
Wang Hairong
Publication date: 12/01/2024

Cancer remains one of the most challenging diseases to treat in the medical field. Machine learning has enabled in-depth analysis of rich multi-omics profiles and medical imaging for cancer diagnosis and prognosis. Despite these advancements, machine learning models face challenges stemming from limited labeled sample sizes, the intricate interplay of high-dimensional data types, the inherent heterogeneity observed among patients and within tumors, and concerns about interpretability and consistency with existing biomedical knowledge. One approach to surmount these challenges is to integrate biomedical knowledge into data-driven models, which has shown potential to improve the accuracy, robustness, and interpretability of model results. Here, we review state-of-the-art machine learning studies that fuse biomedical knowledge and data, termed knowledge-informed machine learning, for cancer diagnosis and prognosis. Emphasizing the properties inherent in four primary data types (clinical, imaging, molecular, and treatment data), we highlight modeling considerations relevant to these contexts. We provide an overview of diverse forms of knowledge representation and current strategies for integrating knowledge into machine learning pipelines, with concrete examples. We conclude by discussing future directions to advance cancer research through knowledge-informed machine learning.
Comment: 41 pages, 4 figures, 2 tables

arXiv.org e-Print Archive

Sparse graphical models for cancer signalling

Author: Hill Steven M
Publication date: 01/01/2012

Protein signalling networks play a key role in cellular function, and their dysregulation is central to many diseases, including cancer. Recent advances in biochemical technology have begun to allow high-throughput, data-driven studies of signalling. In this thesis, we investigate multivariate statistical methods, rooted in sparse graphical models, aimed at probing questions in cancer signalling. First, we propose a Bayesian variable selection method for identifying subsets of proteins that jointly influence an output of interest, such as drug response. Ancillary biological information is incorporated into inference using informative prior distributions. Prior information is selected and weighted in an automated manner using an empirical Bayes formulation. We present examples of informative pathway- and network-based priors, and illustrate the proposed method on both synthetic and drug response data. Second, we use dynamic Bayesian networks to perform structure learning of context-specific signalling network topology from proteomic time-course data. We exploit a connection between variable selection and network structure learning to efficiently carry out exact inference. Existing biology is incorporated using informative network priors, weighted automatically by an empirical Bayes approach. The overall approach is computationally efficient and essentially free of user-set parameters. We show results from an empirical investigation, comparing the approach to several existing methods, and from an application to breast cancer cell line data. Hypotheses are generated regarding novel signalling links, some of which are validated by independent experiments. Third, we describe a network-based clustering approach for the discovery of cancer subtypes that differ in terms of subtype-specific signalling network structure.
Model-based clustering is combined with penalised likelihood estimation of undirected graphical models to allow simultaneous learning of cluster assignments and cluster-specific network structure. Results are shown from an empirical investigation comparing several penalisation regimes, and an application to breast cancer proteomic data.
EThOS - Electronic Theses Online Service; Engineering and Physical Sciences Research Council (EPSRC); GB; United Kingdom

OpenGrey Repository

Applying the Free-Energy Principle to Complex Adaptive Systems

Publication venue: MDPI AG
Publication date: 12/08/2022

The free energy principle is a mathematical theory of the behaviour of self-organising systems that originally gained prominence as a unified model of the brain. Since then, the theory has been applied to a plethora of biological phenomena, extending from single-celled and multicellular organisms through to niche construction and human culture, and even the emergence of life itself. The free energy principle holds that perception and action operate synergistically to minimise an organism’s exposure to surprising biological states, which are more likely to lead to decay. A key corollary of this hypothesis is active inference: the idea that all behaviour involves the selective sampling of sensory data so that we experience what we expect to (in order to avoid surprises). Simply put, we act upon the world to fulfil our expectations. It is now widely recognised that the implications of the free energy principle for our understanding of the human mind and behaviour are far-reaching and profound. To date, however, its capacity to extend beyond the brain, and to explain living and other complex adaptive systems more generally, has only begun to be explored. The aim of this collection is to showcase the breadth of the free energy principle as a unified theory of complex adaptive systems, whether conscious, social, living, or not.
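The link between minimising free energy and avoiding surprise can be made concrete for a discrete generative model. For a single observation o, the variational free energy F(q) = sum_s q(s) [ln q(s) - ln p(o, s)] is an upper bound on the surprise -ln p(o), and the bound is tight exactly when the belief q equals the posterior p(s | o). The sketch below checks this numerically for a hypothetical two-state model; the probabilities and function names are illustrative assumptions, not material from the collection.

```python
import math


def free_energy(q, prior, likelihood):
    """Variational free energy F = sum_s q(s) * [ln q(s) - ln p(o, s)],
    with p(o, s) = p(o | s) * p(s) for one fixed observation o."""
    return sum(
        qs * (math.log(qs) - math.log(lik * pri))
        for qs, pri, lik in zip(q, prior, likelihood)
        if qs > 0.0  # states with q(s) = 0 contribute nothing
    )


def surprise(prior, likelihood):
    """Surprise -ln p(o), where p(o) = sum_s p(o | s) * p(s)."""
    return -math.log(sum(lik * pri for pri, lik in zip(prior, likelihood)))


# Hypothetical two-state world: hidden state s in {0, 1}
prior = [0.7, 0.3]        # p(s)
likelihood = [0.2, 0.9]   # p(o | s) for the observation actually received

evidence = sum(lik * pri for pri, lik in zip(prior, likelihood))
posterior = [lik * pri / evidence for pri, lik in zip(prior, likelihood)]

print(surprise(prior, likelihood))                 # -ln p(o)
print(free_energy(posterior, prior, likelihood))   # equal to the surprise (exact posterior)
print(free_energy([0.5, 0.5], prior, likelihood))  # larger: any other belief loosens the bound
```

The gap between F(q) and the surprise is the KL divergence from q to the posterior, so lowering F by changing beliefs (perception) tightens the bound, while acting to change o is what lowers the surprise itself.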

Directory of Open Access Books (DOAB)

Regularized modelling of dependencies between gene expression and metabolomics data in the study of metabolic regulation

Author: Osmala Maria
Publication date: 01/01/2011

Fusing different high-throughput data sources is an effective way to reveal the functions of unknown genes, as well as regulatory relationships between biological components such as genes and metabolites. Dependencies between biological components functioning in different layers of biological regulation can be investigated using canonical correlation analysis (CCA). However, the properties of high-throughput bioinformatics data pose many challenges for data analysis: the sample size is often insufficient compared to the dimensionality of the data, and the data exhibit multicollinearity due to, for example, co-expressed and co-regulated genes. Therefore, a regularized version of classical CCA is commonly used. An alternative way of introducing regularization into statistical models is to perform Bayesian data analysis with suitable priors. In this thesis, the performance of a new variant of Bayesian CCA, group-wise sparse CCA (gsCCA), is compared with classical ridge-regression-regularized CCA (rrCCA) in revealing relevant information shared between two high-throughput data sets. gsCCA produces a regularizing effect partly similar to that of rrCCA but, in addition, introduces a new type of regularization of the data covariance matrices. Both CCA methods are applied to gene expression and metabolite concentration measurements, collected as time series under ozone exposure and in a control condition, from the oxidative-stress-tolerant Arabidopsis thaliana ecotype Col-0 and the oxidative-stress-sensitive mutant rcd1. The aim of this work is to reveal new regulatory mechanisms in oxidative stress signalling in plants.
For both methods, rrCCA and gsCCA, the thesis illustrates their potential to reveal both already known and possibly new regulatory mechanisms between genes and metabolites in Arabidopsis thaliana oxidative stress signalling.
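For ridge-regularised CCA (rrCCA), the first canonical correlation has a closed form when one of the two data sets contains a single variable y: rho^2 = c_yx (C_xx + lam_x I)^(-1) c_xy / (c_yy + lam_y), where C_xx is the covariance of the multivariate block X, c_xy its cross-covariance with y, and c_yy the variance of y. The sketch below evaluates this for a two-variable X block in pure Python. It does not implement gsCCA, and the data, regularisation values and function names are hypothetical; with both ridge terms set to zero it reduces to the multiple correlation of y on X.

```python
import math


def cov(u, v):
    """Sample covariance of two equal-length lists (denominator n - 1)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (n - 1)


def ridge_cca_first(x1, x2, y, lam_x=0.0, lam_y=0.0):
    """First canonical correlation between X = [x1, x2] and a single variable y,
    with ridge terms added to the diagonal of C_xx and to c_yy."""
    a = cov(x1, x1) + lam_x          # regularised C_xx = [[a, b], [b, d]]
    b = cov(x1, x2)
    d = cov(x2, x2) + lam_x
    c1, c2 = cov(x1, y), cov(x2, y)  # cross-covariance vector c_xy
    c_yy = cov(y, y) + lam_y
    det = a * d - b * b
    # c_xy^T (C_xx + lam_x I)^(-1) c_xy via the explicit 2 x 2 inverse
    quad = (d * c1 * c1 - 2.0 * b * c1 * c2 + a * c2 * c2) / det
    return math.sqrt(quad / c_yy)


# Hypothetical data: y is an exact linear combination of x1 and x2
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0]
y = [2.0 * a - b for a, b in zip(x1, x2)]

print(ridge_cca_first(x1, x2, y))                        # ~1.0: perfect linear dependence
print(ridge_cca_first(x1, x2, y, lam_x=1.0, lam_y=1.0))  # shrunk below 1 by the ridge terms
```

The ridge terms keep C_xx + lam_x I invertible even when the X variables are collinear or outnumber the samples, which is the multicollinearity problem the thesis describes; the price is a canonical correlation that is biased towards zero.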

Aaltodoc Publication Archive

Recipes for calibration and validation of agent-based models in cancer biomedicine

Author: Axenie Cristian
Bauer Roman
Cogno Nicolò
Vavourakis Vasileios
Publication date: 30/10/2023

Computational models and simulations are appealing not only because of their intrinsic characteristics across spatiotemporal scales, their scalability, and their predictive power, but also because the set of problems in cancer biomedicine that can be addressed computationally exceeds the set amenable to analytical solutions. Agent-based models and simulations are especially interesting candidates among computational modelling strategies in cancer research because they can replicate realistic local and global interaction dynamics at a convenient and relevant scale. Yet the absence of methods to validate the consistency of results across scales can hinder adoption by turning fine-tuned models into black boxes. This review compiles relevant literature to explore strategies for leveraging high-fidelity simulations of multi-scale, or multi-level, cancer models, with a focus on validation approached as simulation calibration. We argue that simulation calibration goes beyond parameter optimization by embedding informative priors to generate plausible parameter configurations across multiple dimensions.
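One widely used way to calibrate a stochastic simulator against data while respecting prior knowledge is approximate Bayesian computation (ABC) by rejection: sample parameters from the prior, run the simulator, and keep only the parameters whose simulated summary lands close to the observed one. The sketch below applies this to a deliberately tiny agent-based model in which each agent divides with probability `p_divide` per step. The model, the uniform prior on [0.05, 0.4], the tolerance, the function names and the synthetic "observation" are all hypothetical, and the review may favour other calibration schemes.

```python
import random


def simulate_colony(p_divide, n_steps, rng, n_start=10):
    """Hypothetical toy agent-based model: at each step, every agent
    independently divides with probability p_divide."""
    n = n_start
    for _ in range(n_steps):
        n += sum(1 for _ in range(n) if rng.random() < p_divide)
    return n


def abc_rejection(observed, n_steps, n_draws, tolerance, seed=0):
    """ABC rejection sampling: keep prior draws whose simulated colony size
    lies within `tolerance` of the observed size."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        p = rng.uniform(0.05, 0.4)  # hypothetical informative prior on p_divide
        if abs(simulate_colony(p, n_steps, rng) - observed) <= tolerance:
            accepted.append(p)
    return accepted


# Synthetic "observed" colony size, generated with a known p_divide = 0.2
observed = simulate_colony(0.2, 8, random.Random(42))
posterior = abc_rejection(observed, n_steps=8, n_draws=2000, tolerance=5)
if posterior:
    print(len(posterior), sum(posterior) / len(posterior))  # accepted draws, posterior mean
```

Shrinking `tolerance` sharpens the approximate posterior at the cost of accepting fewer draws, while a more informative prior narrows the range of parameters that has to be simulated in the first place.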

arXiv.org e-Print Archive