7 research outputs found

    Differential analysis of biological networks

    In cancer research, the comparison of gene expression or DNA methylation networks inferred from healthy controls and patients can lead to the discovery of biological pathways associated with the disease. As a cancer progresses, its signalling and control networks are subject to some degree of localised re-wiring. Being able to detect disrupted interaction patterns induced by the presence or progression of the disease can lead to the discovery of novel molecular diagnostic and prognostic signatures. Currently there is a lack of scalable statistical procedures for two-network comparisons aimed at detecting localised topological differences. We propose the dGHD algorithm, a methodology for detecting differential interaction patterns in two-network comparisons. The algorithm relies on a statistic, the Generalised Hamming Distance (GHD), for assessing the degree of topological difference between networks and evaluating its statistical significance. dGHD builds on a non-parametric permutation testing framework but achieves computational efficiency through an asymptotic normal approximation. We show that the GHD is able to detect more subtle topological differences than a standard Hamming distance between networks. As a result, the dGHD algorithm achieves high performance in simulation studies, as measured by sensitivity and specificity. An application to the problem of detecting differential DNA co-methylation subnetworks associated with ovarian cancer demonstrates the potential benefits of the proposed methodology for discovering network-derived biomarkers associated with a trait of interest.
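    The permutation-testing idea mentioned in this abstract can be illustrated with a minimal sketch. It is not the dGHD algorithm: the abstract does not give the GHD formula, so the sketch substitutes the plain Hamming distance between binary adjacency matrices, and it uses brute-force node-label permutations rather than the asymptotic normal approximation. The function names, the lower-tail p-value and the defaults are assumptions made for illustration only.

```python
import numpy as np


def hamming_distance(a, b):
    """Fraction of off-diagonal node pairs whose edge status differs.

    Plain Hamming distance, used here as a stand-in for the GHD.
    """
    n = a.shape[0]
    off_diag = ~np.eye(n, dtype=bool)
    return float(np.mean(a[off_diag] != b[off_diag]))


def permutation_pvalue(a, b, n_perm=1000, seed=0):
    """Compare the observed distance with distances after relabelling b's nodes.

    Assumption: the lower tail is used, so a small p-value means the two
    networks are more alike than randomly relabelled versions would be.
    """
    rng = np.random.default_rng(seed)
    observed = hamming_distance(a, b)
    n = a.shape[0]
    count = 0
    for _ in range(n_perm):
        p = rng.permutation(n)
        # Apply the same permutation to rows and columns of b.
        if hamming_distance(a, b[np.ix_(p, p)]) <= observed:
            count += 1
    # Add-one correction keeps the estimate strictly positive.
    return observed, (count + 1) / (n_perm + 1)


# Usage: a random symmetric network compared with an identical copy.
rng = np.random.default_rng(1)
upper = np.triu(rng.integers(0, 2, size=(12, 12)), k=1)
net_a = upper + upper.T
obs, p_value = permutation_pvalue(net_a, net_a.copy(), n_perm=200)
```

    Each permutation costs a full pass over the matrices, which is the kind of expense an asymptotic approximation to the null distribution is meant to avoid.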

    A new pipeline for structural characterization and classification of RNA-Seq microbiome data

    Background: High-throughput sequencing enables the analysis of the composition of numerous biological systems, such as microbial communities. The identification of dependencies within these systems requires the analysis and assimilation of the underlying interaction patterns between all the variables that make up that system. However, this task poses a challenge when considering the compositional nature of the data coming from DNA-sequencing experiments because traditional interaction metrics (e.g., correlation) produce unreliable results when analyzing relative fractions instead of absolute abundances. The compositionality-associated challenges extend to the classification task, as it usually involves the characterization of the interactions between the principal descriptive variables of the datasets. The classification of new samples/patients into binary categories corresponding to dissimilar biological settings or phenotypes (e.g., controls and cases) could help researchers in the development of treatments/drugs.

    Results: Here, we develop and exemplify a new approach, applicable to compositional data, for the classification of new samples into two groups with different biological settings. We propose a new metric to characterize and quantify the overall correlation structure deviation between these groups and a technique for dimensionality reduction to facilitate graphical representation. We conduct simulation experiments with synthetic data to assess the proposed method’s classification accuracy. Moreover, we illustrate the performance of the proposed approach using Operational Taxonomic Unit (OTU) count tables obtained through 16S rRNA gene sequencing data from two microbiota experiments. We also compare our method’s performance with that of two state-of-the-art methods.

    Conclusions: Simulation experiments show that our method achieves a classification accuracy equal to or greater than 98% when using synthetic data. Finally, our method outperforms the other classification methods with real datasets from gene sequencing experiments.

    Detection of statistically significant network changes in complex biological networks

    Table S1. Description of data: GHD and MRA results for all 457 considered transcription factors on the TCGA and Rembrandt datasets. (XLSX 62.7 kb)

    From condition-specific interactions towards the differential complexome of proteins

    While capturing the transcriptomic state of a cell is comparatively simple with modern sequencing techniques, mapping protein interactomes and complexomes in a sample-specific manner is currently not feasible on a large scale. To understand crucial biological processes, however, knowledge of the physical interplay between proteins can be more informative than their expression alone. In this thesis, we present and demonstrate four software tools that unlock the cellular wiring in a condition-specific manner and promise a deeper understanding of what happens upon cell fate transitions. PPIXpress exploits the abundance of existing expression data to generate specific interactomes, and can even account for alternative splicing events when protein isoforms can be related to the presence of causative protein domain interactions of an underlying model. As an addition to this work, we developed the convenient differential analysis tool PPICompare to determine rewiring events and their causes within the inferred interaction networks between grouped samples. Furthermore, we present a new implementation of the combinatorial protein complex prediction algorithm DACO that features a significantly reduced runtime. This improvement facilitates application of the method to a large number of samples, and the resulting sample-specific complexes can ultimately be assessed quantitatively with our novel differential protein complex analysis tool CompleXChange.

    Developing a framework for semi-automated rule-based modelling for neuroscience research

    Dynamic modelling has significantly improved our understanding of the complex molecular mechanisms underpinning neurobiological processes. The detailed mechanistic insights these models offer depend on the availability of a diverse range of experimental observations. Despite the huge increase in biomolecular data generation from novel high-throughput technologies and extensive research in bioinformatics and dynamical modelling, efficient creation of accurate dynamical models remains highly challenging. To study this problem, three perspectives are considered: comparison of modelling methods, prioritisation of results and analysis of primary data sets. Firstly, I compare two models of the DARPP-32 signalling network: a classically defined model with ordinary differential equations (ODE) and its equivalent, defined using a novel rule-based (RB) paradigm. The RB model recapitulates the results of the ODE model, but offers a more expressive and flexible syntax that can efficiently handle the “combinatorial complexity” commonly found in signalling networks, and allows ready access to fine-grained details of the emerging system. RB modelling is particularly well suited to encoding protein-centred features such as domain information and post-translational modification sites. Secondly, I propose a new pipeline for prioritisation of molecular species that arise during model simulation, using a recently developed algorithm based on multivariate mutual information (CorEx) coupled with global sensitivity analysis (GSA) using the RKappa package. To efficiently evaluate the importance of parameters, Hilbert-Schmidt Independence Criterion (HSIC)-based indices are aggregated into a weighted network that allows compact analysis of the model across conditions. Finally, I describe an approach for the development of disease-specific dynamical models using genes known to be associated with Attention Deficit Hyperactivity Disorder (ADHD) as an exemplar. Candidate disease genes are mapped to a selection of datasets that are potentially relevant to the modelling process (e.g. interactions between proteins and domains, protein-domain and kinase-substrate mappings), and these are jointly analysed using network clustering and pathway enrichment analyses to evaluate their coverage and utility in developing rule-based models.