Differential analysis of biological networks
In cancer research, the comparison of gene expression or DNA methylation
networks inferred from healthy controls and patients can lead to the discovery
of biological pathways associated with the disease. As a cancer progresses, its
signalling and control networks are subject to some degree of localised
re-wiring. Being able to detect disrupted interaction patterns induced by the
presence or progression of the disease can lead to the discovery of novel
molecular diagnostic and prognostic signatures. Currently there is a lack of
scalable statistical procedures for two-network comparisons aimed at detecting
localised topological differences. We propose the dGHD algorithm, a methodology
for detecting differential interaction patterns in two-network comparisons. The
algorithm relies on a statistic, the Generalised Hamming Distance (GHD), for
assessing the degree of topological difference between networks and evaluating
its statistical significance. dGHD builds on a non-parametric permutation
testing framework but achieves computational efficiency through an asymptotic
normal approximation. We show that the GHD is able to detect more subtle
topological differences compared to a standard Hamming distance between
networks. This results in the dGHD algorithm achieving high performance in
simulation studies as measured by sensitivity and specificity. An application
to the problem of detecting differential DNA co-methylation subnetworks
associated with ovarian cancer demonstrates the potential benefits of the
proposed methodology for discovering network-derived biomarkers associated with
a trait of interest.
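The permutation-testing idea mentioned in this abstract can be illustrated with a minimal sketch. It is not the dGHD algorithm and does not compute the GHD statistic, whose definition is not given here; it uses the plain Hamming distance that the abstract names as the baseline. The function names, the null hypothesis (node labels of the two networks are unrelated) and the one-sided direction of the p-value are all assumptions made for illustration.

```python
import numpy as np


def hamming_distance(a, b):
    """Fraction of ordered off-diagonal node pairs whose edge status differs."""
    n = a.shape[0]
    off_diag = ~np.eye(n, dtype=bool)
    return float(np.mean(a[off_diag] != b[off_diag]))


def permutation_pvalue(a, b, n_perm=1000, seed=0):
    """One-sided permutation p-value for 'the networks are closer than chance'.

    Assumption: under the null, node labels of `b` carry no information
    about `a`, so relabelling the nodes of `b` gives the null distribution.
    """
    rng = np.random.default_rng(seed)
    observed = hamming_distance(a, b)
    n = a.shape[0]
    as_small = 0
    for _ in range(n_perm):
        p = rng.permutation(n)
        # Apply the same relabelling to rows and columns of b.
        b_perm = b[np.ix_(p, p)]
        if hamming_distance(a, b_perm) <= observed:
            as_small += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (as_small + 1) / (n_perm + 1)
```

The loop costs one distance evaluation per permutation, which is the expense the abstract says dGHD avoids through an asymptotic normal approximation.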
A new pipeline for structural characterization and classification of RNA-Seq microbiome data
Background
High-throughput sequencing enables the analysis of the composition of numerous biological systems, such as microbial communities. The identification of dependencies within these systems requires the analysis and assimilation of the underlying interaction patterns between all the variables that make up that system. However, this task poses a challenge when considering the compositional nature of the data coming from DNA-sequencing experiments because traditional interaction metrics (e.g., correlation) produce unreliable results when analyzing relative fractions instead of absolute abundances. The compositionality-associated challenges extend to the classification task, as it usually involves the characterization of the interactions between the principal descriptive variables of the datasets. The classification of new samples/patients into binary categories corresponding to dissimilar biological settings or phenotypes (e.g., control and cases) could help researchers in the development of treatments/drugs.
Results
Here, we develop and exemplify a new approach, applicable to compositional data, for the classification of new samples into two groups with different biological settings. We propose a new metric to characterize and quantify the overall correlation structure deviation between these groups and a technique for dimensionality reduction to facilitate graphical representation. We conduct simulation experiments with synthetic data to assess the proposed method’s classification accuracy. Moreover, we illustrate the performance of the proposed approach using Operational Taxonomic Unit (OTU) count tables obtained through 16S rRNA gene sequencing data from two microbiota experiments. We also compare our method’s performance with that of two state-of-the-art methods.
Conclusions
Simulation experiments show that our method achieves a classification accuracy equal to or greater than 98% when using synthetic data. Finally, our method outperforms the other classification methods with real datasets from gene sequencing experiments.
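The compositionality problem raised in the Background of this abstract can be made concrete with a small numeric sketch. The abundance values below are invented for illustration, and the log-ratio shown is a generic device from compositional data analysis, not the metric proposed in the abstract. Two taxa, A and B, keep the same absolute abundance in both samples while a third taxon, C, grows.

```python
import numpy as np


def to_relative(counts):
    """Convert absolute abundances (samples x taxa) to per-sample fractions."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum(axis=1, keepdims=True)


# Hypothetical absolute abundances; columns are taxa A, B, C.
absolute = np.array([
    [100.0, 50.0, 50.0],   # sample 1
    [100.0, 50.0, 250.0],  # sample 2: only taxon C has changed
])
relative = to_relative(absolute)

# The fraction of A differs between samples although its absolute
# abundance is identical in both.
fraction_a = relative[:, 0]

# The log-ratio of A to B is the same whether computed from absolute
# or from relative values, because the per-sample total cancels.
log_ratio_abs = np.log(absolute[:, 0] / absolute[:, 1])
log_ratio_rel = np.log(relative[:, 0] / relative[:, 1])
```

Here the fraction of A falls from 0.5 to 0.25 purely because C increased, which is the kind of induced dependence that makes correlations on relative fractions unreliable.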
Detection of statistically significant network changes in complex biological networks
Table S1. Description of data: GHD and MRA Results for all the 457 considered transcription factors on the TCGA and Rembrandt datasets. (XLSX 62.7 kb)
From condition-specific interactions towards the differential complexome of proteins
While capturing the transcriptomic state of a cell is a comparatively simple effort with modern sequencing techniques, mapping protein interactomes and complexomes in a sample-specific manner is currently not feasible on a large scale. To understand crucial biological processes, however, knowledge of the physical interplay between proteins can be more interesting than their expression alone. In this thesis, we present and demonstrate four software tools that unlock the cellular wiring in a condition-specific manner and promise a deeper understanding of what happens upon cell fate transitions. PPIXpress exploits the abundance of existing expression data to generate specific interactomes, which can even consider alternative splicing events when protein isoforms can be related to the presence of causative protein domain interactions of an underlying model. As an addition to this work, we developed the convenient differential analysis tool PPICompare to determine rewiring events and their causes within the inferred interaction networks between grouped samples. Furthermore, we present a new implementation of the combinatorial protein complex prediction algorithm DACO that features a significantly reduced runtime. This improvement facilitates an application of the method to a large number of samples, and the resulting sample-specific complexes can ultimately be assessed quantitatively with our novel differential protein complex analysis tool CompleXChange.
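The general idea of deriving a condition-specific interactome from expression data, and then comparing two such networks, can be sketched as follows. This is a simplified protein-level illustration only: it ignores isoforms and domain interactions, and it is not the implementation or interface of PPIXpress or PPICompare. The function names, the expression threshold and the toy data are all hypothetical.

```python
def condition_specific_network(reference_edges, expression, threshold=1.0):
    """Keep reference interactions whose two partners are both expressed.

    reference_edges: iterable of (protein, protein) pairs.
    expression: dict mapping protein -> expression value in one condition.
    threshold: assumed cut-off at or above which a protein counts as expressed.
    """
    expressed = {p for p, value in expression.items() if value >= threshold}
    return {frozenset(edge) for edge in reference_edges if set(edge) <= expressed}


def rewired_edges(network_a, network_b):
    """Interactions present in only one of two condition-specific networks."""
    return {"lost": network_a - network_b, "gained": network_b - network_a}


# Toy reference interactome and two hypothetical expression profiles.
reference = [("P1", "P2"), ("P2", "P3"), ("P3", "P4")]
condition_a = {"P1": 5.0, "P2": 3.2, "P3": 2.1, "P4": 0.0}
condition_b = {"P1": 0.2, "P2": 4.0, "P3": 1.5, "P4": 6.3}

net_a = condition_specific_network(reference, condition_a)
net_b = condition_specific_network(reference, condition_b)
changes = rewired_edges(net_a, net_b)
```

Storing each interaction as a frozenset makes the edges undirected, so set difference is enough to list the rewiring events between the two conditions.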
Developing a framework for semi-automated rule-based modelling for neuroscience research
Dynamic modelling has significantly improved our understanding of the complex
molecular mechanisms underpinning neurobiological processes. The detailed
mechanistic insights these models offer depend on the availability of
a diverse range of experimental observations. Despite the huge increase in
biomolecular data generation from novel high-throughput technologies and
extensive research in bioinformatics and dynamical modelling, efficient creation
of accurate dynamical models remains highly challenging. To study this
problem, three perspectives are considered: comparison of modelling methods,
prioritisation of results and analysis of primary data sets. Firstly, I compare two
models of the DARPP-32 signalling network: a classically defined model with
ordinary differential equations (ODE) and its equivalent, defined using a novel
rule-based (RB) paradigm. The RB model recapitulates the results of the ODE
model, but offers a more expressive and flexible syntax that can efficiently handle
the “combinatorial complexity” commonly found in signalling networks,
and allows ready access to fine-grain details of the emerging system. RB modelling
is particularly well suited to encoding protein-centred features such as
domain information and post-translational modification sites. Secondly, I propose
a new pipeline for prioritisation of molecular species that arise during
model simulation using a recently developed algorithm based on multivariate
mutual information (CorEx) coupled with global sensitivity analysis (GSA) using
the RKappa package. To efficiently evaluate the importance of parameters,
Hilbert-Schmidt Independence Criterion (HSIC)-based indices are aggregated
into a weighted network that allows compact analysis of the model across conditions.
Finally, I describe an approach for the development of disease-specific
dynamical models using genes known to be associated with Attention Deficit
Hyperactivity Disorder (ADHD) as an exemplar. Candidate disease genes are
mapped to a selection of datasets that are potentially relevant to the modelling
process (e.g. interactions between proteins and domains, protein-domain and
kinase-substrates mappings) and these are jointly analysed using network clustering
and pathway enrichment analyses to evaluate their coverage and utility
in developing rule-based models.
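For readers unfamiliar with the Hilbert-Schmidt Independence Criterion named in this abstract, a minimal sketch of its standard biased empirical estimator is given below. It does not reproduce the RKappa package or the thesis pipeline. The Gaussian kernel, the fixed bandwidths and the 1/n^2 normalisation are assumptions chosen for illustration; other conventions exist.

```python
import numpy as np


def gaussian_kernel(x, bandwidth):
    """Gaussian kernel matrix for a 1-D or 2-D array of n observations."""
    x = np.asarray(x, dtype=float)
    x = x.reshape(x.shape[0], -1)
    sq_dist = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dist / (2.0 * bandwidth ** 2))


def hsic(x, y, bandwidth_x=1.0, bandwidth_y=1.0):
    """Biased empirical HSIC: trace(K H L H) / n^2 with centring matrix H."""
    n = len(x)
    k = gaussian_kernel(x, bandwidth_x)
    l = gaussian_kernel(y, bandwidth_y)
    h = np.eye(n) - np.ones((n, n)) / n
    return float(np.trace(k @ h @ l @ h) / n ** 2)


# Hypothetical parameter samples and two model outputs.
param = np.linspace(-1.0, 1.0, 20)
dependent_output = param ** 2          # nonlinear dependence on the parameter
constant_output = np.zeros_like(param)  # no dependence at all

score_dependent = hsic(param, dependent_output)
score_constant = hsic(param, constant_output)
```

In this toy case the output is uncorrelated with the parameter in the linear sense, yet the HSIC score is clearly above the score for the constant output, which is why kernel-based indices are attractive for sensitivity analysis of nonlinear models.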