25 research outputs found
Identifying non-crystallographic symmetry in protein electron-density maps: a feature-based approach
Fragment Based Protein Active Site Analysis Using Markov Random Field Combinations of Stereochemical Feature-Based Classifications
Recent improvements in structural genomics efforts have greatly increased the
number of hypothetical proteins in the Protein Data Bank. Several computational
methodologies have been developed to determine the function of these proteins but
none of these methods have been able to account successfully for the diversity in
the sequence and structural conformations observed in proteins that have the same
function. An additional complication is the flexibility in both the protein active site
and the ligand.
In this dissertation, novel approaches to deal with both the ligand flexibility
and the diversity in stereochemistry have been proposed. The active site analysis
problem is formalized as a classification problem in which, for a given test protein,
the goal is to predict the class of ligand most likely to bind the active site based
on its stereochemical nature and thereby define its function. Traditional methods
that have adopted a similar methodology have struggled to account for the flexibility
observed in large ligands. Therefore, I propose a novel fragment-based approach to
dealing with larger ligands. The advantage of the fragment-based methodology is
that considering the protein-ligand interactions in a piecewise manner does not affect
the active site patterns, and it also provides a way to account for the problems
associated with flexible ligands.
I also propose two feature-based methodologies to account for the diversity observed
in sequences and structural conformations among proteins with the same function.
The feature-based methodologies provide detailed descriptions of the active site
stereochemistry and are capable of identifying stereochemical patterns within the
active site despite the diversity.
Finally, I propose a Markov Random Field approach to combine the individual
ligand fragment classifications (based on the stereochemical descriptors) into a single
multi-fragment ligand class. This probabilistic framework combines the information
provided by stereochemical features with the information regarding geometric constraints
between ligand fragments to make a final ligand class prediction.
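As a rough illustration of the idea in the preceding paragraph (and not the dissertation's actual model), the sketch below combines per-site fragment probabilities with a pairwise distance term and picks the jointly most probable labelling by brute force. The fragment names, probabilities, expected distances, the Gaussian form of the pairwise term, and the function names `distance_compatibility` and `map_assignment` are all assumptions made for this example.

```python
import math
from itertools import product

# Hypothetical expected distances (in angstroms) between fragment types.
EXPECTED_DISTANCE = {
    ("adenine", "adenine"): 12.0,
    ("adenine", "phosphate"): 9.0,
    ("adenine", "ribose"): 5.0,
    ("phosphate", "ribose"): 4.0,
}

def distance_compatibility(label_a, label_b, observed, sigma=1.0, eps=1e-9):
    """Gaussian-shaped pairwise potential; eps keeps the logarithm finite."""
    mu = EXPECTED_DISTANCE.get(tuple(sorted((label_a, label_b))))
    if mu is None:
        return eps  # no distance model for this pair of fragment types
    return math.exp(-0.5 * ((observed - mu) / sigma) ** 2) + eps

def map_assignment(unary, distances):
    """Exhaustively search for the labelling with the highest joint log-score.

    unary:     one dict per site, mapping fragment label -> classifier probability
    distances: dict mapping a site pair (i, j) -> observed distance between sites
    """
    best_labels, best_score = None, -math.inf
    for labels in product(*(sorted(u) for u in unary)):
        # Unary term: evidence from the feature-based classifier at each site.
        score = sum(math.log(unary[i][lab]) for i, lab in enumerate(labels))
        # Pairwise term: geometric constraints between neighbouring fragments.
        for (i, j), d in distances.items():
            score += math.log(distance_compatibility(labels[i], labels[j], d))
        if score > best_score:
            best_labels, best_score = labels, score
    return best_labels, best_score

# Two candidate fragment sites, 5 angstroms apart (made-up numbers).
unary = [
    {"adenine": 0.60, "phosphate": 0.40},
    {"adenine": 0.55, "ribose": 0.45},
]
distances = {(0, 1): 5.0}

independent = tuple(max(u, key=u.get) for u in unary)
labels, score = map_assignment(unary, distances)
print("independent:", independent)
print("joint:", labels)
```

With these made-up numbers, classifying each site on its own picks adenine twice, while the joint search prefers adenine plus ribose because that pair matches the observed distance. Exhaustive search is only practical for a handful of sites; a real Markov Random Field would normally use an approximate inference method instead.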
The feature-based fragment identification methodology had an accuracy of 84%
across a diverse set of ligand fragments and the MRF analysis was able to successfully
combine the various ligand fragments (identified by feature-based analysis) into one
final ligand based on statistical models of ligand fragment distances. This novel
approach to protein active site analysis was additionally tested on three proteins with very
low sequence and structural similarity to other proteins in the PDB (a challenge for
traditional methods) and in each of these cases, this approach successfully identified
the cognate ligand. This approach addresses the two main issues that affect the
accuracy of current automated methodologies in protein function assignment.
TEXTAL™: Artificial Intelligence Techniques for Automated Protein Structure Determination
X-ray crystallography is the most widely used method for determining the three-dimensional structures of proteins and other macromolecules. One of the most difficult steps in crystallography is interpreting the 3D image of the electron density cloud surrounding the protein. This is often done manually by crystallographers and is very time-consuming and error-prone. The difficulties stem from the fact that the domain knowledge required for interpreting electron density data is uncertain. Thus, crystallographers often have to resort to intuitions and heuristics for decision-making. The problem is compounded by the fact that, in most cases, the available data are noisy and blurred. TEXTAL is a system designed to automate this challenging process of inferring the atomic structure of proteins from electron density data.