
    Methods for protein complex prediction and their contributions towards understanding the organization, function and dynamics of complexes

    Complexes of physically interacting proteins constitute fundamental functional units that drive biological processes within cells. A faithful reconstruction of the entire set of complexes is therefore essential to understand the functional organization of cells. In this review, we discuss the key contributions of computational methods developed to date (approximately between 2003 and 2015) for identifying complexes from the network of interacting proteins (PPI network). We evaluate the performance of these methods in depth on PPI datasets from yeast, and highlight the challenges they face, in particular detecting sparse and small complexes or sub-complexes and discerning overlapping complexes. We describe methods that integrate diverse information, including expression profiles and 3D structures of proteins, with PPI networks to understand the dynamics of complex formation, for instance the time-based assembly of complex subunits and the formation of fuzzy complexes from intrinsically disordered proteins. Finally, we discuss methods for identifying dysfunctional complexes in human diseases, an application that is proving invaluable for understanding disease mechanisms and discovering novel therapeutic targets. We hope this review aptly commemorates a decade of research on computational prediction of complexes and constitutes a valuable reference for further advancements in this exciting area. Comment: 1 Tabl
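Evaluating predicted complexes against a reference catalogue, as in the yeast benchmarks mentioned above, usually requires a matching criterion. A widely used choice in this literature is the overlap score |A ∩ B|² / (|A| · |B|), with a match often declared at 0.2. The sketch below assumes that convention and derives precision (predictions that match some reference complex) and recall (reference complexes recovered). The threshold and the protein names are assumptions for illustration; the review's own evaluation may use other thresholds or additional metrics.

```python
def overlap_score(predicted, reference):
    """Overlap |A & B|^2 / (|A| * |B|) between two protein sets (0.0 if either is empty)."""
    a, b = set(predicted), set(reference)
    if not a or not b:
        return 0.0
    shared = len(a & b)
    return shared * shared / (len(a) * len(b))


def precision_recall(predicted_complexes, reference_complexes, threshold=0.2):
    """Precision/recall of complex predictions under an assumed overlap threshold."""
    def matches(x, others):
        return any(overlap_score(x, o) >= threshold for o in others)

    matched_pred = sum(1 for p in predicted_complexes if matches(p, reference_complexes))
    matched_ref = sum(1 for r in reference_complexes if matches(r, predicted_complexes))
    precision = matched_pred / len(predicted_complexes) if predicted_complexes else 0.0
    recall = matched_ref / len(reference_complexes) if reference_complexes else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical toy complexes; P1..P11 are placeholder protein names.
    predicted = [{"P1", "P2", "P3"}, {"P4", "P5"}, {"P8", "P9"}]
    reference = [{"P1", "P2", "P3", "P6"}, {"P4", "P5", "P7"}, {"P10", "P11"}]
    print(precision_recall(predicted, reference))
```

Squaring the intersection penalizes matches where a small prediction happens to fall inside a large complex, which is one reason this score is preferred over a plain intersection count.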

    Discovering Dynamic Protein Complexes from Static Interactomes: Three Challenges

    Ph.D. thesis (Doctor of Philosophy)

    Bayesian correlated clustering to integrate multiple datasets

    Motivation: The integration of multiple datasets remains a key challenge in systems biology and genomic medicine. Modern high-throughput technologies generate a broad array of different data types, providing distinct, but often complementary, information. We present a Bayesian method for the unsupervised integrative modelling of multiple datasets, which we refer to as MDI (Multiple Dataset Integration). MDI can integrate information from a wide range of different datasets and data types simultaneously (including the ability to model time series data explicitly using Gaussian processes). Each dataset is modelled using a Dirichlet-multinomial allocation (DMA) mixture model, with dependencies between these models captured via parameters that describe the agreement among the datasets. Results: Using a set of 6 artificially constructed time series datasets, we show that MDI is able to integrate a significant number of datasets simultaneously, and that it successfully captures the underlying structural similarity between the datasets. We also analyse a variety of real S. cerevisiae datasets. In the 2-dataset case, we show that MDI's performance is comparable to the present state of the art. We then move beyond the capabilities of current approaches and integrate gene expression, ChIP-chip and protein-protein interaction data to identify a set of protein complexes whose genes are co-regulated during the cell cycle. Comparisons to other unsupervised data integration techniques, as well as to non-integrative approaches, demonstrate that MDI is very competitive, while also providing information that would be difficult or impossible to extract using other methods.
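The "agreement" parameters can be illustrated with the allocation prior as we understand it from the MDI description: for one gene, the weight of assigning it to cluster c_k in each dataset k is the product of the per-dataset mixture weights, multiplied by (1 + φ_kl) for every pair of datasets k < l that put the gene in the same cluster. The sketch below only evaluates that unnormalized weight; it is not an MDI sampler, the exact form should be checked against the MDI paper, and the function and argument names are hypothetical.

```python
from itertools import combinations


def mdi_allocation_weight(labels, weights, phi):
    """Unnormalized prior weight of one gene's cluster labels across K datasets.

    Assumed form (our reading of MDI, not a verified implementation):
      prod_k weights[k][labels[k]] * prod_{k<l} (1 + phi[(k, l)] * [labels[k] == labels[l]])

    labels[k]     -- cluster label of the gene in dataset k
    weights[k][c] -- mixture weight of cluster c in dataset k
    phi[(k, l)]   -- non-negative agreement parameter between datasets k < l
    """
    value = 1.0
    for k, c in enumerate(labels):
        value *= weights[k][c]
    # Each pair of datasets that agrees on the label boosts the weight by (1 + phi).
    for k, l in combinations(range(len(labels)), 2):
        if labels[k] == labels[l]:
            value *= 1.0 + phi[(k, l)]
    return value


if __name__ == "__main__":
    weights = [[0.5, 0.5], [0.5, 0.5]]  # two datasets, two clusters each (toy values)
    phi = {(0, 1): 3.0}                 # strong assumed agreement between the datasets
    print(mdi_allocation_weight([0, 0], weights, phi))  # agreeing labels
    print(mdi_allocation_weight([0, 1], weights, phi))  # disagreeing labels
```

With φ = 0 the datasets are clustered independently; larger φ makes matching labels across datasets more probable, which is how structure shared between datasets is rewarded.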

    Predicting potential drugs and drug-drug interactions for drug repositioning

    The purpose of drug repositioning is to identify new therapeutic uses for existing drugs. It saves time and reduces cost in drug discovery, especially in preclinical stages. The central challenge in drug repositioning is to identify plausible drug candidates supported by strong evidence. Recently, taking advantage of diverse data types and computational strategies, many methods have been proposed to predict potential drugs. Signature-based methods use a signature to describe a specific disease condition and match it against drug-induced transcriptomic profiles. For a given disease signature, candidate drugs are ranked by their matching scores. In many studies, the top-ranked drugs are taken as potential drugs and verified in various ways. However, existing methods have several limitations: (1) For many diseases, especially cancers, tissue samples are often heterogeneous and involve multiple subtypes, which makes it difficult to identify a signature from such a group of profiles. (2) Many methods treat genes as independent elements, although genes may be associated with each other under a given condition. (3) Disease signatures cannot identify potential drugs for personalized treatment. To address these limitations, I propose three strategies in this dissertation. (1) I use clustering methods to identify sub-signatures from the heterogeneous dataset and then combine them using a weighting strategy. (2) I use human protein complex (HPC) information to capture dependencies among genes and identify an HPC signature that describes a specific type of cancer. (3) I use the HPC strategy to identify signatures for drugs and then predict a list of potential drugs for each patient. Beyond predicting potential drugs directly, further evidence is needed to understand drug repositioning more fully.
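As a rough illustration of signature matching, the sketch below scores each drug by how strongly its induced log fold changes reverse a disease signature (lowering disease-up genes and raising disease-down genes) and ranks drugs by that score. This mean-difference score is an assumption chosen for clarity, not the matching score used in the dissertation; the gene and drug names are hypothetical.

```python
def reversal_score(up_genes, down_genes, drug_logfc):
    """Assumed reversal score: mean change of disease-down genes minus disease-up genes.

    up_genes / down_genes -- genes up- / down-regulated in the disease signature
    drug_logfc            -- mapping gene -> log fold change induced by the drug
    Genes absent from the drug profile are skipped; returns 0.0 if a side is empty.
    """
    up = [drug_logfc[g] for g in up_genes if g in drug_logfc]
    down = [drug_logfc[g] for g in down_genes if g in drug_logfc]
    if not up or not down:
        return 0.0
    # A reversing drug lowers disease-up genes and raises disease-down genes,
    # so larger values indicate a stronger reversal of the signature.
    return sum(down) / len(down) - sum(up) / len(up)


def rank_drugs(up_genes, down_genes, drug_profiles):
    """Return drug names sorted from strongest to weakest reversal."""
    scores = {name: reversal_score(up_genes, down_genes, profile)
              for name, profile in drug_profiles.items()}
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    up, down = ["G1", "G2"], ["G3"]  # hypothetical disease signature
    profiles = {
        "drugA": {"G1": -1.0, "G2": -0.5, "G3": 1.0},  # reverses the signature
        "drugB": {"G1": 0.5, "G2": 1.0, "G3": -0.5},   # mimics the disease
    }
    print(rank_drugs(up, down, profiles))
```

The top of the returned list corresponds to the "top drugs on the list" that signature-based studies carry forward for verification.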
Interactions between biological and biomedical entities, such as drug-drug interactions (DDIs) and drug-target interactions (DTIs), help reveal the mechanisms behind repurposed drugs. Machine learning (ML), and deep learning (DL) in particular, are leading methods for predicting these interactions. Network strategies, such as constructing a network from interactions and studying its topological properties, are commonly combined with other methods to make predictions. However, interactions may have different functions, and merging them into a single network may introduce bias. To address this, I construct two networks for two types of DDIs and use a graph convolutional network (GCN) model to combine them. In this dissertation, the first chapter introduces the background, the objectives of the studies, and the structure of the dissertation. Chapter 2 then provides a comprehensive review of biological databases, methods and applications in drug repositioning studies, and evaluation metrics, and summarizes three application scenarios. The first method, proposed in Chapter 3, addresses identifying a cancer gene signature and predicting potential drugs. The k-means clustering method is used to identify highly reliable gene signatures, and the identified signature is matched against drug profiles to identify potential drugs for the given disease. The second method, proposed in Chapter 4, uses human protein complex (HPC) information to identify a protein complex signature instead of a gene signature. This strategy improves prediction accuracy in the cancer experiments. Chapter 5 introduces the signature-based method for personalized cancer medicine. The profiles of a given drug are used to identify a drug signature under the HPC strategy. Each patient's profile is matched against the drug signature, so each patient receives a different list of potential drugs.
Chapter 6 proposes a multi-kernel graph convolutional network to predict DDIs. This method constructs two DDI kernels and concatenates them in the GCN model. It outperforms three state-of-the-art methods in predicting DDIs. In summary, this dissertation proposes several computational algorithms for drug repositioning, and experimental results show that the proposed methods perform well.
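The multi-kernel idea can be sketched as one graph convolution per DDI network, with the resulting drug embeddings concatenated and a drug pair scored from its two embeddings. The sketch assumes the standard GCN propagation ReLU(D^-1/2 (A + I) D^-1/2 X W) and a sigmoid dot-product decoder; the weight matrices are fixed toy values (a real model would learn them by gradient descent), and the function names are hypothetical rather than taken from the dissertation.

```python
import math


def normalize_adjacency(adj):
    """Symmetric GCN normalization D^-1/2 (A + I) D^-1/2 of a square adjacency matrix."""
    n = len(adj)
    # Self-loops keep each drug's own features and guarantee non-zero degrees.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain-Python product of an (n x m) and an (m x p) matrix."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn_layer(adj, features, weight):
    """One graph convolution: ReLU(A_norm @ X @ W)."""
    h = matmul(matmul(normalize_adjacency(adj), features), weight)
    return [[max(0.0, v) for v in row] for row in h]


def multi_kernel_embedding(kernels, features, weights):
    """Run one GCN layer per DDI kernel and concatenate the per-drug embeddings."""
    outputs = [gcn_layer(k, features, w) for k, w in zip(kernels, weights)]
    return [[v for out in outputs for v in out[i]] for i in range(len(features))]


def ddi_probability(embeddings, i, j):
    """Assumed decoder: sigmoid of the dot product of two drug embeddings."""
    z = sum(a * b for a, b in zip(embeddings[i], embeddings[j]))
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    kernel_a = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]  # hypothetical DDI type 1
    kernel_b = [[0, 0, 0], [0, 0, 1], [0, 1, 0]]  # hypothetical DDI type 2
    features = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]  # one-hot drug features
    w_a = [[0.5, -0.2], [0.1, 0.3], [0.4, 0.2]]   # toy, untrained weights
    w_b = [[0.2, 0.1], [-0.3, 0.6], [0.5, 0.5]]
    emb = multi_kernel_embedding([kernel_a, kernel_b], features, [w_a, w_b])
    print(ddi_probability(emb, 0, 2))
```

Keeping the two DDI types in separate kernels, rather than merging them into one adjacency matrix, is what lets each interaction type contribute its own part of the embedding.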

    Growing functional modules from a seed protein via integration of protein interaction and gene expression data

    Background: Modern biology aims to unravel complex biological structures such as protein-protein interaction (PPI) networks. A key concept in the organization of PPI networks is the existence of dense subnetworks (functional modules) within them. Recent approaches applied clustering algorithms to these networks and evaluated the resulting subnetworks by the coverage of well-established protein complexes they contained. However, most of these algorithms operate on an unweighted graph structure and therefore fail to emphasize the interactions that would contribute to biologically more valid and coherent functional modules.
    Results: In the current study, we present a method that integrates protein interaction and microarray data to discover biologically valid functional modules. First, gene expression information is overlaid as weights onto the PPI network; the enriched PPI graph lets us exploit its topological aspects while simultaneously highlighting enhanced functional association between specific pairs of proteins. We then present an algorithm that uncovers the functional modules of the weighted graph by expanding a kernel protein set, which originates from a given 'seed' protein used as a starting point.
    Conclusion: The integrated data and our approach provide reliable functional modules. Using yeast data, we show that our method gives accurate results in terms of both structural coherence and functional consistency.
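The two steps described above, weighting PPI edges with expression data and growing a module from a seed, can be sketched as follows. The abstract does not give the weighting scheme or the expansion criterion, so this sketch assumes the absolute Pearson correlation of the two genes' expression profiles as the edge weight and a greedy rule that adds the neighbour with the largest total edge weight into the module until that total drops below a threshold. All names, profiles, and the threshold are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def weight_edges(edges, expression):
    """Assumed weighting: |Pearson r| of the expression profiles of the two partners."""
    return {frozenset((u, v)): abs(pearson(expression[u], expression[v])) for u, v in edges}


def grow_module(seed, weights, min_gain=0.5):
    """Greedily expand a module from `seed` (hypothetical criterion).

    Each step adds the neighbouring protein whose edges into the current module have
    the largest total weight; stop when that total falls below `min_gain`.
    """
    module = {seed}
    while True:
        # Proteins adjacent to the module but not yet in it.
        candidates = {p for edge in weights if edge & module for p in edge} - module
        best, best_gain = None, 0.0
        for c in sorted(candidates):  # sorted for deterministic tie-breaking
            gain = sum(w for edge, w in weights.items() if c in edge and edge - {c} <= module)
            if gain > best_gain:
                best, best_gain = c, gain
        if best is None or best_gain < min_gain:
            return module
        module.add(best)


if __name__ == "__main__":
    expression = {  # toy expression profiles over four conditions
        "S": [1, 2, 3, 4], "A": [2, 4, 6, 8], "B": [1, 2, 3, 5], "C": [1, -1, -1, 1],
    }
    edges = [("S", "A"), ("S", "B"), ("A", "B"), ("B", "C")]
    print(sorted(grow_module("S", weight_edges(edges, expression))))
```

In this toy graph, C is physically linked to the module but barely co-expressed with B, so the weighted expansion stops before adding it, which is the kind of discrimination an unweighted graph cannot make.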

    Statistical analysis and modelling of proteomic and genetic network data illuminate hidden roles of proteins and their connections

    While many stable protein complexes are known, the dynamic interactome is still underexplored. Experimental techniques such as single-tag affinity purification aim to close this gap and identify transient interactions, but they need better filtering tools to discriminate true interactors from noise. This thesis develops and contrasts two complementary approaches to the analysis of protein-protein interaction (PPI) networks derived from noisy experiments. Most of the data used in the analysis come from in vitro experiments aggregated from established databases (IntAct, BioGRID, BioPlex), complemented by experiments performed by our collaborators from the Ueffing group at the Institute of Ophthalmic Research, Tübingen University (Germany). Chapter 3 presents the statistical approach to the data analysis. It focuses on the case of a single dataset with target and control data, in order to determine the significant interactions for the target. The procedure follows the expected sequence of preprocessing, quality control, statistical testing, correction and discussion of results. The approach is tailored to the specific dataset, experimental design and related assumptions. This is particularly relevant for missing value imputation, where multiple approaches are discussed and a new method, building upon a previous one, is proposed and validated. Chapter 4 presents a different approach to filtering experimental PPI results: WeSA (weighted socio-affinity), a statistic that improves upon previous methods of scoring PPIs from affinity proteomics data. It uses network analysis techniques to analyse the full PPI network without the need for controls. WeSA is tested on PPI networks of varying accuracy, including the curated IntAct dataset, the unfiltered records in BioGRID, and the large BioPlex dataset. It is also compared against the previous method designed for the same purpose.
Beyond outperforming that method, WeSA has another major advantage: it can efficiently combine and compare observations across studies, and can therefore be used to aggregate and clean results from new experiments in the context of all previously available data. In the final part, uses of WeSA beyond wild-type PPI networks are analysed. The framework is proposed as a novel way to compare mechanistic differences between variants of the same protein (e.g. mutant vs wild type). I also explore the use of WeSA to study other biological and non-biological networks, such as genome-wide association studies (GWAS) and gene-phenotype associations, with encouraging results. In conclusion, this work presents and compares a variety of mathematical, statistical and computational approaches adapted, combined and/or developed specifically to obtain a better overview of protein-protein interaction networks. The performance of the novel methods is validated and, in particular, WeSA is extensively tested and analysed, including beyond the field of PPI networks.
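The control-free, network-wide idea behind socio-affinity-style scoring can be illustrated by comparing how often two proteins are co-purified with how often they would co-occur if they appeared independently across pull-downs. The sketch below is a deliberately simplified log-ratio (pointwise mutual information) score; it is not WeSA and not the original socio-affinity index, and the protein names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def copurification_scores(purifications):
    """Simplified log(observed / expected) co-purification score for protein pairs.

    purifications -- list of sets, each holding the proteins found in one pull-down.
    Expected counts assume proteins appear independently across pull-downs.
    Pairs never observed together get no score.
    """
    n = len(purifications)
    single = Counter()  # number of pull-downs containing each protein
    pair = Counter()    # number of pull-downs containing each pair together
    for pulled in purifications:
        single.update(pulled)
        pair.update(frozenset(p) for p in combinations(sorted(pulled), 2))
    scores = {}
    for p, n_ij in pair.items():
        i, j = sorted(p)
        expected = single[i] * single[j] / n
        scores[p] = math.log(n_ij / expected)
    return scores


if __name__ == "__main__":
    pulldowns = [{"A", "B"}, {"A", "B"}, {"A", "C"}, {"D", "E"}]  # toy data
    for pair_, score in sorted(copurification_scores(pulldowns).items(), key=lambda kv: -kv[1]):
        print(sorted(pair_), round(score, 3))
```

Because this naive ratio rewards pairs seen together only once (D and E score highest here), practical scores need some correction for low counts and for how informative each observation is.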