67 research outputs found
Bayesian machine learning methods for predicting protein-peptide interactions and detecting mosaic structures in DNA sequences alignments
Short well-defined domains known as peptide recognition modules (PRMs) regulate many important protein-protein interactions involved in the formation of macromolecular complexes
and biochemical pathways. High-throughput experiments like yeast two-hybrid and phage
display are expensive and intrinsically noisy, therefore it would be desirable to target informative interactions and pursue in silico approaches. We propose a probabilistic discriminative
approach for predicting PRM-mediated protein-protein interactions from sequence data. The
model suffered from over-fitting, so Laplacian regularisation was found to be important in
achieving a reasonable generalisation performance. A hybrid approach yielded the best performance, where the binding site motifs were initialised with the predictions of a generative
model. We also propose another discriminative model which can be applied to all sequences
present in the organism at a significantly lower computational cost. This is due to its additional
assumption that the underlying binding sites tend to be similar.It is difficult to distinguish between the binding site motifs of the PRM due to the small
number of instances of each binding site motif. However, closely related species are expected
to share similar binding sites, which would be expected to be highly conserved. We investigated
rate variation along DNA sequence alignments, modelling confounding effects such as recombination. Traditional approaches to phylogenetic inference assume that a single phylogenetic
tree can represent the relationships and divergences between the taxa. However, taxa sequences
exhibit varying levels of conservation, e.g. due to regulatory elements and active binding sites,
and certain bacteria and viruses undergo interspecific recombination. We propose a phylogenetic factorial hidden Markov model to infer recombination and rate variation. We examined
the performance of our model and inference scheme on various synthetic alignments, and compared it to state of the art breakpoint models. We investigated three DNA sequence alignments:
one of maize actin genes, one bacterial (Neisseria), and the other of HIV-1. Inference is carried
out in the Bayesian framework, using Reversible Jump Markov Chain Monte Carlo
Discovering Domain-Domain Interactions toward Genome-Wide Protein Interaction and Function Predictions
To fully understand the underlying mechanisms of living cells, it is essential to delineate the intricate interactions between the cell proteins at a genome scale. Insights into the protein functions will enrich our understanding in human diseases and contribute to future drug developments. My dissertation focuses on the development and optimization of machine learning algorithms to study protein-protein interactions and protein function annotations through discovery of domain-domain interactions. First of all, I developed a novel domain-based random decision forest framework (RDFF) that explored all possible domain module pairs in mediating protein interactions. RDFF achieved higher sensitivity (79.78%) and specificity (64.38%) in interaction predictions of S. cerevisiae proteins compared to the popular Maximum Likelihood Estimation (MLE) approach. RDFF can also infer interactions for both single-domain pairs and domain module pairs. Secondly, I proposed cross-species interacting domain patterns (CSIDOP) approach that not only increased fidelity of existing functional annotations, but also proposed novel annotations for unknown proteins. CSIDOP accurately determined functions for 95.42% of proteins in H. sapiens using 2,972 GO `molecular function' terms. In contrast, most existing methods can only achieve accuracies of 50% to 75% using much smaller number of categories. Additionally, we were able to assign novel annotations to 181 unknown H. sapiens proteins. Finally, I implemented a web-based system, called PINFUN, which enables users to make online protein-protein interaction and protein function predictions based on a large-scale collection of known and putative domain interactions
Protein function prediction via protein-protein interaction - a Support Vector Machine approach
Master'sMASTER OF SCIENC
Protein Structure
Since the dawn of recorded history, and probably even before, men and women have been grasping at the mechanisms by which they themselves exist. Only relatively recently, did this grasp yield anything of substance, and only within the last several decades did the proteins play a pivotal role in this existence. In this expose on the topic of protein structure some of the current issues in this scientific field are discussed. The aim is that a non-expert can gain some appreciation for the intricacies involved, and in the current state of affairs. The expert meanwhile, we hope, can gain a deeper understanding of the topic
- …