11 research outputs found

    Identifying protein complexes directly from high-throughput TAP data with Markov random fields

    Background: Predicting protein complexes from experimental data remains a challenge due to the limited resolution and stochastic errors of high-throughput methods. Current algorithms for reconstructing complexes typically rely on a two-step process: they first construct an interaction graph from the data, predominantly using heuristics, and then cluster its vertices to identify protein complexes.
    Results: We propose a model-based identification of protein complexes directly from the experimental observations. Our model of protein complexes, based on Markov random fields, explicitly incorporates false negative and false positive errors and is highly robust to noise. A model-based quality score for the resulting clusters allows us to identify reliable predictions in the complete data set. Comparisons with prior work on reference data sets show favorable results, particularly for larger unfiltered data sets. Additional information on the predictions, including the source code under the GNU General Public License, is available at http://algorithmics.molgen.mpg.de/Static/Supplements/ProteinComplexes.
    Conclusion: We can identify complexes in data obtained from high-throughput experiments without first eliminating proteins or weak interactions. The few parameters of our model, which does not rely on heuristics, can be estimated by maximum likelihood without a reference data set. This is particularly important for studying protein complexes in organisms that lack an established reference set of known protein complexes.
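    The central idea of the Results paragraph, a clustering cost that accounts for false positive and false negative observations, can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' model: the purifications are reduced to a set of observed protein pairs (the paper works directly from TAP data), a pair inside one cluster is assumed to be observed with probability 1 - beta (beta = false negative rate) and a pair across clusters with probability alpha (alpha = false positive rate), and the assignment is optimized with iterated conditional modes (ICM), a standard local search for MRFs that may differ from the optimizer the authors use. All function names are hypothetical.

```python
import math
from itertools import combinations

# Assumed pairwise error model (a simplification, not the paper's exact likelihood):
#   same cluster      -> pair observed with probability 1 - beta (false negative rate beta)
#   different cluster -> pair observed with probability alpha    (false positive rate alpha)
# Requires 0 < alpha < 1 and 0 < beta < 1 so that every log is finite.


def pair_cost(same_cluster, observed, alpha, beta):
    """Negative log-probability of one pair's observation outcome."""
    p = 1.0 - beta if same_cluster else alpha
    return -math.log(p) if observed else -math.log(1.0 - p)


def negative_log_likelihood(labels, observed_pairs, alpha, beta):
    """Total cost of a clustering: sum of pair costs over all protein pairs."""
    return sum(
        pair_cost(labels[i] == labels[j], frozenset((i, j)) in observed_pairs, alpha, beta)
        for i, j in combinations(range(len(labels)), 2)
    )


def icm(labels, observed_pairs, alpha, beta, num_clusters, max_sweeps=50):
    """Iterated conditional modes: move one protein at a time to its cheapest cluster.

    Only strict improvements are accepted, so the total cost never increases.
    ICM reaches a local optimum only; the result depends on the initial labels.
    """
    labels = list(labels)
    n = len(labels)

    def local_cost(i, k):
        # Cost of all pairs involving protein i if it were given label k.
        return sum(
            pair_cost(labels[j] == k, frozenset((i, j)) in observed_pairs, alpha, beta)
            for j in range(n) if j != i
        )

    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            current = local_cost(i, labels[i])
            best = min(range(num_clusters), key=lambda k: local_cost(i, k))
            if local_cost(i, best) < current - 1e-12:
                labels[i] = best
                changed = True
        if not changed:
            break
    return labels


# Toy data: two fully observed triples {0, 1, 2} and {3, 4, 5}, no cross pairs.
observed = {frozenset(p) for p in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]}
print(icm([0, 1, 0, 1, 0, 1], observed, alpha=0.01, beta=0.2, num_clusters=2))
# prints [0, 0, 0, 1, 1, 1]
```

    Because the cost depends only on which pairs share a label, spare labels cost nothing: running the same call with num_clusters=4 leaves labels 2 and 3 unused. ICM can also stall in a local optimum (starting from all proteins in one cluster, no single move lowers the cost here), which is why the initial assignment matters.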

    Algorithm to identify protein complexes from high-throughput data

    Recent advances in proteomic technologies such as two-hybrid screens and biochemical purification allow large-scale investigations of protein interactions. The goal of this thesis is to investigate model-based approaches for predicting protein complexes from tandem affinity purification experiments. We compare a simple model that allows complexes to overlap with a partitioning model, and we propose a visualization framework to delineate overlapping complexes in experimental data. The first of the two models is in some sense the simplest possible one: it is based on frequent itemset mining, which merely counts how often particular sets of proteins occur together in the experimental results. The tendency of two sets of proteins to form complexes is modeled as independent, even when the two sets share members. The second model assumes that protein complex formation can be reduced to pairwise interactions between proteins: an interaction between two proteins is more likely to be observed if both belong to the same cluster. Based on this model, we use Markov random field theory to calculate a maximum-likelihood assignment of proteins to clusters.
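    The counting step behind the first model can be illustrated with a brute-force sketch: each purification is treated as a set of proteins, and every subset of size 2 up to max_size is counted across purifications. The function name, the min_support threshold, and the size cap are illustrative assumptions; the thesis's exact frequent itemset model, including its variant with errors, is not reproduced here.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(purifications, min_support, max_size=3):
    """Count protein sets of size 2..max_size across purifications.

    Returns a dict mapping each sorted protein tuple to the number of
    purifications that contain it, keeping only counts >= min_support.
    """
    counts = Counter()
    for purification in purifications:
        proteins = sorted(set(purification))
        for size in range(2, max_size + 1):
            # Every subset of this size occurs once in this purification.
            counts.update(combinations(proteins, size))
    return {itemset: n for itemset, n in counts.items() if n >= min_support}


purifications = [{"A", "B", "C"}, {"A", "B", "D"}, {"A", "B", "C"}, {"E", "F"}]
print(frequent_itemsets(purifications, min_support=2))
# prints {('A', 'B'): 3, ('A', 'C'): 2, ('B', 'C'): 2, ('A', 'B', 'C'): 2}
```

    Enumerating all subsets grows exponentially with purification size; an Apriori-style search avoids this by only extending itemsets whose subsets are already frequent. Note that ('A', 'B') and ('A', 'B', 'C') are both reported even though they overlap, which matches the overlapping nature of this model.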

    Algorithmus zur Identifikation von Protein-Komplexen in high-throughput Daten (Algorithm for identifying protein complexes in high-throughput data)

    Contents: 0. Title, abstract, table of contents; 1. Introduction; 2. Previous work; 3. Exact frequent itemset model; 4. Frequent itemsets with errors; 5. Markov Random Fields; 6. Visualization of purifications; 7. Conclusion; 8. Bibliography; 9. Appendix
    Recent advances in proteomic techniques such as two-hybrid screens and biochemical purification allow large-scale investigations of protein interactions. This thesis investigates model-based approaches for computing protein complexes from tandem affinity purification experiments. We compare a simple model that allows complexes to overlap with a partitioning model. In addition, we present a visualization method that depicts overlapping complexes in experimentally determined data. We propose two models for computing protein complexes. The first, in some sense the simplest possible one, is based on frequent itemset mining and counts the occurrences of sets of proteins in the experimental results. It assumes that the tendency of proteins to form particular complexes is statistically independent across complexes, even when the same proteins take part in them; overlapping complexes are therefore allowed. The second model represents the other extreme and assumes that complexes partition the set of proteins, so that complexes cannot overlap and complex formation reduces to purely pairwise behavior of proteins. In this model, an interaction between a pair of proteins is more likely to be observed if both proteins occur together in a complex. Based on this model, we use Markov random fields to compute a maximum-likelihood estimate of the complexes.

    The general hidden Markov model library: Analyzing systems with unobservable states

    Hidden Markov models (HMMs) are a class of statistical models widely used across many disciplines, for problems as diverse as understanding speech and finding genes implicated in causing cancer. Adapting them to a particular problem involves designing the models and, if necessary, extending the formalism. The General Hidden Markov Model (GHMM) C library provides production-quality implementations of basic and advanced aspects of HMMs. The architecture is built around the software library, with wrappers for using it interactively from Python and R, and applications with graphical user interfaces for specific analysis and modeling tasks. We have found that the GHMM can drastically reduce the effort of tackling novel research questions. We focus on the Graphical Query Language (GQL) application for analyzing experiments that measure the expression (mRNA) levels of many genes simultaneously over time. Our approach combines HMMs in a statistical mixture model and uses partially supervised learning as the training paradigm. The result is a highly effective, robust tool for finding groups of genes that share the same expression pattern over time, even in the presence of high levels of noise.
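    The two HMM ingredients named above can be illustrated without the GHMM library. The sketch below, in plain Python, assumes discrete emissions and shows (1) the forward algorithm, which computes the likelihood of a sequence under one HMM, and (2) the E-step of a mixture of HMMs in which genes with a known group label are clamped to that group, one simple reading of partially supervised learning. It does not use or imitate the GHMM or GQL APIs; all names and the toy models are hypothetical.

```python
def forward_likelihood(pi, A, B, observations):
    """P(observations | HMM) via the forward algorithm (discrete emissions).

    pi[s]   : initial probability of state s
    A[r][s] : transition probability from state r to state s
    B[s][o] : probability of emitting symbol o in state s
    """
    n = len(pi)
    alpha = [pi[s] * B[s][observations[0]] for s in range(n)]
    for o in observations[1:]:
        # alpha[s] = P(o_1..o_t, state_t = s)
        alpha = [sum(alpha[r] * A[r][s] for r in range(n)) * B[s][o] for s in range(n)]
    return sum(alpha)


def responsibilities(likelihoods, weights, label=None):
    """Posterior over mixture components for one sequence (E-step).

    Partially supervised: if `label` is given, the sequence is clamped to that
    component; otherwise the posterior is proportional to weight * likelihood.
    """
    if label is not None:
        return [1.0 if k == label else 0.0 for k in range(len(weights))]
    joint = [w * lik for w, lik in zip(weights, likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]


# Two hypothetical one-state models over symbols {0: "down", 1: "up"}.
up_model = ([1.0], [[1.0]], [[0.2, 0.8]])
down_model = ([1.0], [[1.0]], [[0.8, 0.2]])
series = [1, 1, 0]
liks = [forward_likelihood(*m, series) for m in (up_model, down_model)]
print(liks, responsibilities(liks, [0.5, 0.5]))
```

    Here the likelihoods are about 0.128 and 0.032, so an unlabeled series is assigned to the "up" model with posterior about 0.8. For long time series the products underflow; practical implementations rescale alpha at each step or work in log space.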

    C.: Unfinished Symphonies - songs of 3 1/2 worlds

    In this workshop, I introduce some of the reasons for my fascination with alife (artificial life) and my desire to use it for music making. I also talk about some of the joys and sorrows of the collaboration process. Four alife worlds, Feeping Creatures, Spinners, Gakkimon Planet and Listening Sky, are described in the order they were made. I talk about what I like and dislike about each one and how I would like to extend them.

    The General Hidden Markov Model Library: Analyzing Systems with Unobservable States

    The General Hidden Markov Model (GHMM) C library provides production-quality implementations of basic and advanced aspects of HMMs. The architecture is built around a software library, with wrappers for using it interactively from Python and R, and applications with graphical user interfaces for specific analysis and modeling tasks. We have found that the GHMM can drastically reduce the effort of tackling novel research questions. Software available from http://ghmm.or

    Identifying protein complexes directly from high-throughput TAP data with Markov random fields-0

    Copyright information: Taken from "Identifying protein complexes directly from high-throughput TAP data with Markov random fields", http://www.biomedcentral.com/1471-2105/8/482, BMC Bioinformatics 2007;8:482. Published online 19 Dec 2007. PMCID: PMC2222659.
    The false positive rate is set to 0.005 and the false negative rate to 0.2 or 0.5. With a false negative rate of 0.2 (2(a), 2(b)), MRF can recover the true clustering, with the minimum negative log-likelihood attained at 11 clusters. Adding more clusters does not reduce the cost further; the additional clusters simply remain empty. With a false negative rate of 0.5, the accuracy is worse and more empty clusters are needed to reach convergence. In 2(c) and 2(d) the convergence rate fluctuates more.
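    The simulation setting in this caption can be approximated with a planted-partition generator. This sketch assumes pairwise observations: each within-cluster pair is missed with the false negative rate, and each between-cluster pair appears with the false positive rate. The paper simulates TAP purifications, so this is a simplification; the function name and the cluster sizes in the usage line are hypothetical, and only the two rates come from the text.

```python
import random
from itertools import combinations


def simulate_observations(cluster_sizes, false_positive_rate, false_negative_rate, seed=0):
    """Planted-partition simulation under an assumed pairwise error model.

    Returns (labels, observed_pairs): labels[i] is the true cluster of protein i,
    observed_pairs is a set of frozenset({i, j}) pairs that were "observed".
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    labels = [c for c, size in enumerate(cluster_sizes) for _ in range(size)]
    observed = set()
    for i, j in combinations(range(len(labels)), 2):
        if labels[i] == labels[j]:
            # A true co-complex pair is missed with probability false_negative_rate.
            seen = rng.random() >= false_negative_rate
        else:
            # A spurious cross-cluster pair appears with probability false_positive_rate.
            seen = rng.random() < false_positive_rate
        if seen:
            observed.add(frozenset((i, j)))
    return labels, observed


# Rates from the caption; eleven planted clusters of five proteins is a hypothetical choice.
labels, observed = simulate_observations([5] * 11, false_positive_rate=0.005,
                                         false_negative_rate=0.2, seed=1)
print(len(labels), len(observed))
```

    Raising false_negative_rate to 0.5 removes about half of the within-cluster evidence, which gives some intuition for why the caption reports worse accuracy at that setting.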