Identifying protein complexes directly from high-throughput TAP data with Markov random fields
<p>Abstract</p> <p>Background</p> <p>Predicting protein complexes from experimental data remains a challenge due to the limited resolution and stochastic errors of high-throughput methods. Current algorithms to reconstruct the complexes typically rely on a two-step process. First, they construct an interaction graph from the data, predominantly using heuristics; subsequently, they cluster its vertices to identify protein complexes.</p> <p>Results</p> <p>We propose a model-based identification of protein complexes directly from the experimental observations. Our model of protein complexes, based on Markov random fields, explicitly incorporates false negative and false positive errors and exhibits high robustness to noise. A model-based quality score for the resulting clusters allows us to identify reliable predictions in the complete data set. Comparisons with prior work on reference data sets show favorable results, particularly for larger unfiltered data sets. Additional information on predictions, including the source code under the GNU Public License, can be found at http://algorithmics.molgen.mpg.de/Static/Supplements/ProteinComplexes.</p> <p>Conclusion</p> <p>We can identify complexes in the data obtained from high-throughput experiments without prior elimination of proteins or weak interactions. The few parameters of our model, which does not rely on heuristics, can be estimated using maximum likelihood without a reference data set. This is particularly important for protein complex studies in organisms that do not have an established reference frame of known protein complexes.</p>
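The idea of scoring a clustering against noisy observations can be illustrated with a small sketch. This is not the paper's Markov random field model; it is a simplified stand-in that assumes binary pairwise observations, a partition of the proteins, and fixed error rates. The function name `neg_log_likelihood`, the data layout, and the toy proteins are all hypothetical.

```python
import math
from itertools import combinations


def neg_log_likelihood(labels, observed, fp=0.005, fn=0.2):
    """Negative log-likelihood of a partition given observed pairs.

    labels:   dict mapping protein -> cluster id (hypothetical layout)
    observed: set of frozenset({a, b}) pairs reported as interacting
    fp, fn:   assumed false positive and false negative rates
    """
    cost = 0.0
    for a, b in combinations(sorted(labels), 2):
        same_cluster = labels[a] == labels[b]
        seen = frozenset((a, b)) in observed
        if same_cluster:
            # A true interaction is missed with probability fn.
            p = (1.0 - fn) if seen else fn
        else:
            # A spurious interaction is reported with probability fp.
            p = fp if seen else (1.0 - fp)
        cost -= math.log(p)
    return cost


# Toy data: two complexes, {A, B, C} and {D, E}, observed without errors.
observed = {frozenset(p) for p in [("A", "B"), ("A", "C"), ("B", "C"), ("D", "E")]}
good = {"A": 0, "B": 0, "C": 0, "D": 1, "E": 1}
bad = {"A": 0, "B": 0, "C": 1, "D": 1, "E": 0}

print(neg_log_likelihood(good, observed))
print(neg_log_likelihood(bad, observed))
```

Under these assumptions the partition that matches the observations receives the lower cost. The cost depends only on which proteins share a cluster, so unused cluster ids leave it unchanged.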
Algorithm to identify protein complexes from high-throughput data
Recent advances in proteomic technologies such as two-hybrid and biochemical purification allow large-scale investigations of protein interactions. The goal of this thesis is to investigate model-based approaches to predict protein complexes from tandem affinity purification experiments. We compare a simple overlapping model to a partitioning model. In addition, we propose a visualization framework to delineate overlapping complexes from experimental data. We propose two models to predict protein complexes from experimental data. Our first model is in some sense the simplest possible one. It is based on frequent itemset mining, which merely counts the incidence of certain sets of proteins within the experimental results. The affinities of two sets of proteins to form clusters are modeled as independent, regardless of any overlapping members between these sets. Our second model assumes that formation of protein complexes can be reduced to pairwise interactions between proteins. An interaction between two proteins is more likely if both come from the same cluster. Based on this model, we use Markov Random Field theory to calculate a maximum-likelihood assignment of proteins to clusters.
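The counting step behind the first model can be sketched as follows. This is a brute-force illustration, not the thesis's algorithm: it assumes each purification is given as a set of protein names, and the function name `frequent_itemsets`, the `min_support` and `max_size` parameters, and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(purifications, min_support=2, max_size=3):
    """Count protein sets that co-occur in at least min_support purifications.

    Enumerates every subset of size 2..max_size per purification, which is
    only practical for small inputs.
    """
    counts = Counter()
    for proteins in purifications:
        items = sorted(set(proteins))
        for size in range(2, max_size + 1):
            for subset in combinations(items, size):
                counts[subset] += 1
    return {s: c for s, c in counts.items() if c >= min_support}


# Toy purifications (hypothetical data).
purifications = [
    {"A", "B", "C"},
    {"A", "B", "C", "D"},
    {"B", "C"},
    {"D", "E"},
]
result = frequent_itemsets(purifications)
for itemset, support in sorted(result.items()):
    print(itemset, support)
```

Each set is counted on its own, so overlapping sets such as `("A", "B")` and `("B", "C")` can both be reported, which mirrors the independence assumption described above.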
Algorithmus zur Identifikation von Protein-Komplexen in high-throughput Daten
0\. Title, abstract, table of contents
1\. Introduction
2\. Previous work
3\. Exact frequent itemset model
4\. Frequent itemsets with errors
5\. Markov Random Fields
6\. Visualization of purifications
7\. Conclusion
8\. Bibliography
9\. Appendix
Recent advances in proteomic techniques, such as two-hybrid and biochemical
purification, allow large-scale investigations of protein interactions. This
thesis investigates model-based approaches for computing protein complexes
from tandem affinity purification experiments. We compare a simple model that
allows overlaps between complexes with a partitioning model. In addition, we
present a visualization method that displays overlapping complexes in
experimentally obtained data. We propose two models for computing protein
complexes. The first, in a sense the simplest possible one, is based on
frequent itemset mining and counts the occurrences of sets of proteins in the
experimental results. We assume that the tendency of proteins to form
particular complexes is statistically independent across different complexes,
in particular even when the same proteins take part in those complexes.
Overlapping complexes are therefore allowed. The second model represents the
other extreme and assumes that complexes partition the set of proteins, so
that complexes cannot overlap and complex formation can be reduced to purely
pairwise behavior of proteins. In this model, observing an interaction between
a pair of proteins is more likely if both proteins occur together in one
complex. Based on this model, we use Markov random fields to compute a
maximum-likelihood estimate of the complexes.
The general hidden markov model library: Analyzing systems with unobservable states
Hidden Markov Models (HMMs) are a class of statistical models widely used across disciplines for problems ranging from understanding speech to finding genes implicated in causing cancer. Adaptation to different problems is done by designing the models and, if necessary, extending the formalism. The General Hidden Markov Model (GHMM) C-library provides production-quality implementations of basic and advanced aspects of HMMs. The architecture is built around the software library, adding wrappers for using the library interactively from the languages Python and R, and applications with graphical user interfaces for specific analysis and modeling tasks. We have found that the GHMM can drastically reduce the effort for tackling novel research questions. We focus on the Graphical Query Language (GQL) application for analyzing experiments that measure the expression (or mRNA) levels of many genes simultaneously over time. Our approach, which combines HMMs in a statistical mixture model and uses partially supervised learning as the training paradigm, results in a highly effective, robust analysis tool for finding groups of genes sharing the same pattern of expression over time, even in the presence of high levels of noise.
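For readers unfamiliar with HMMs, the basic likelihood computation can be sketched in a few lines of plain Python. This does not use or reproduce the GHMM API; the function name `forward_likelihood` and the toy parameters are hypothetical, and the sketch assumes a discrete HMM and a non-empty observation sequence.

```python
def forward_likelihood(obs, start, trans, emit):
    """Probability of an observation sequence under a discrete HMM.

    obs:   non-empty list of observed symbol indices
    start: start[i] is the probability of beginning in state i
    trans: trans[i][j] is the probability of moving from state i to j
    emit:  emit[i][k] is the probability that state i emits symbol k

    Uses the unscaled forward algorithm, so long sequences may underflow.
    """
    n_states = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    for symbol in obs[1:]:
        alpha = [
            emit[j][symbol] * sum(alpha[i] * trans[i][j] for i in range(n_states))
            for j in range(n_states)
        ]
    return sum(alpha)


# Toy two-state model with two output symbols (hypothetical numbers).
start = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.9, 0.1], [0.2, 0.8]]

print(forward_likelihood([0, 1, 0], start, trans, emit))
```

The hidden state is never observed directly; the recursion sums over all state paths that could have produced the sequence. A production library would typically add scaling or log-space arithmetic to handle long sequences.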
C.: Unfinished Symphonies - songs of 3 1/2 worlds
In this workshop, I introduce some of the reasons for my fascination with alife and my desire to use it for music making. I also talk about some of the joys and sorrows of the collaboration process. Four alife worlds, Feeping Creatures, Spinners, Gakkimon Planet, and Listening Sky, are described in the order they were made. I talk about what I like and dislike about each one and how I would like to extend them.
The General Hidden Markov Model Library: Analyzing Systems with Unobservable States
The General Hidden Markov Model (GHMM) C-library provides production-quality implementations of basic and advanced aspects of HMMs. The architecture is built around a software library, adding wrappers for using the library interactively from the languages Python and R, and applications with graphical user interfaces for specific analysis and modeling tasks. We have found that the GHMM can drastically reduce the effort for tackling novel research questions. Software available from http://ghmm.or
Identifying protein complexes directly from high-throughput TAP data with Markov random fields-0
<p><b>Copyright information:</b></p><p>Taken from "Identifying protein complexes directly from high-throughput TAP data with Markov random fields"</p><p>http://www.biomedcentral.com/1471-2105/8/482</p><p>BMC Bioinformatics 2007;8():482-482.</p><p>Published online 19 Dec 2007</p><p>PMCID:PMC2222659.</p><p>The false positive rate is set to 0.005 and the false negative rate is 0.2 or 0.5. With a false negative rate of 0.2 (2(a), 2(b)), MRF recovers the true clustering at the minimum negative log-likelihood, which is attained for 11 clusters. Note that additional clusters do not reduce the cost any further; they simply remain empty. For a false negative rate of 0.5, the accuracy is worse and more empty clusters are needed to reach convergence. In 2(c) and 2(d) the convergence rate fluctuates more.</p>