Prioritizing orphan proteins for further study using phylogenomics and gene expression profiles in Streptomyces coelicolor
BACKGROUND: Streptomyces coelicolor, a model organism of antibiotic-producing bacteria, has one of the largest genomes in the bacterial kingdom, including 7825 predicted protein-coding genes. A large number of these genes, nearly 34%, are functionally orphan (hypothetical proteins of unknown function). However, in gene expression time-course data, many of these functionally orphan genes show interesting expression patterns. RESULTS: In this paper, we analyzed all functionally orphan genes of Streptomyces coelicolor and identified a list of "high priority" orphans by combining gene expression analysis with additional phylogenetic information (i.e. the level of evolutionary conservation of each protein). CONCLUSIONS: The prioritized orphan genes are promising candidates for experimental examination in the lab to further characterize their function.
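The abstract combines two signals, expression behaviour and evolutionary conservation, without giving the exact formula. The sketch below is purely illustrative: it assumes variance over a time course as the "interesting expression" signal and a conservation score in [0, 1] (for example, the fraction of reference genomes with a homolog), and mixes them with a hypothetical `weight` parameter. It is not the published prioritization scheme.

```python
from statistics import pvariance


def priority_scores(expression, conservation, weight=0.5):
    """Rank orphan genes by a weighted mix of expression variability and conservation.

    expression:   hypothetical dict gene -> list of expression values over time
    conservation: hypothetical dict gene -> conservation score in [0, 1]
    weight:       hypothetical mixing parameter (1.0 = expression only)
    """
    variances = {gene: pvariance(series) for gene, series in expression.items()}
    # Normalize variability to [0, 1]; fall back to 1.0 if every profile is flat.
    max_var = max(variances.values()) or 1.0
    scores = {}
    for gene, var in variances.items():
        dynamic = var / max_var
        scores[gene] = weight * dynamic + (1 - weight) * conservation.get(gene, 0.0)
    # Highest-priority genes first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


expression = {
    "geneA": [1.0, 1.0, 1.0, 1.0],  # flat profile
    "geneB": [0.0, 2.0, 4.0, 2.0],  # strongly changing profile
    "geneC": [1.0, 2.0, 1.0, 2.0],  # mildly changing profile
}
conservation = {"geneA": 0.9, "geneB": 0.8, "geneC": 0.1}
ranked = priority_scores(expression, conservation)
```

With equal weighting, a gene that is both dynamically expressed and widely conserved (here `geneB`) rises to the top, while a flat but conserved gene still outranks a poorly conserved one with only mild variation.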
Exploring the Evolution of Novel Enzyme Functions within Structurally Defined Protein Superfamilies
In order to understand the evolution of enzyme reactions and to gain an overview of biological catalysis, we have combined sequence and structural data to generate phylogenetic trees in an analysis of 276 structurally defined enzyme superfamilies, and used these to study how enzyme functions have evolved. We describe in detail the analysis of two superfamilies to illustrate different paradigms of enzyme evolution. Gathering together data from all the superfamilies supports and develops the observation that they have all evolved to act on a diverse set of substrates, whilst the evolution of new chemistry is much less common. Despite that, by bringing together so much data, we can provide a comprehensive overview of the most common and rare types of changes in function. Our analysis demonstrates, on a larger scale than previously studied, that modifications in overall chemistry still occur, with all possible changes at the primary level of the Enzyme Commission (E.C.) classification observed to a greater or lesser extent. The phylogenetic trees map out the evolutionary route taken within a superfamily, as well as all the possible changes within a superfamily. This has been used to generate a matrix of observed exchanges from one enzyme function to another, revealing the scale and nature of enzyme evolution and showing that some types of exchanges between and within E.C. classes are more prevalent than others. Surprisingly, a large proportion (71%) of all known enzyme functions are performed by this relatively small set of 276 superfamilies. This reinforces the hypothesis that relatively few ancient enzymatic domain superfamilies were progenitors for most of the chemistry required for life.
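The "matrix of observed exchanges" can be pictured as a contingency table of primary E.C. classes. A minimal sketch, assuming the input is a hypothetical list of (ancestor, descendant) E.C. number pairs read off adjacent nodes of a phylogenetic tree; the paper's actual tree reconciliation and counting rules are not reproduced here.

```python
from collections import Counter


def ec_class_exchange_matrix(edges, n_classes=6):
    """Count changes in the primary E.C. class between related enzymes.

    edges: hypothetical list of (ancestor_ec, descendant_ec) strings, e.g.
           ("1.1.1.1", "2.3.1.9"), one pair per tree edge.
    n_classes: E.C. classes 1-6 by default; pass 7 if translocases
               (E.C. 7, added to the classification in 2018) occur.
    Returns a dict of dicts: matrix[from_class][to_class] -> count.
    """
    counts = Counter()
    for src, dst in edges:
        # The primary class is the first field of the E.C. number.
        src_class = int(src.split(".")[0])
        dst_class = int(dst.split(".")[0])
        counts[(src_class, dst_class)] += 1
    classes = range(1, n_classes + 1)
    return {i: {j: counts[(i, j)] for j in classes} for i in classes}


edges = [("1.1.1.1", "1.2.1.3"), ("1.1.1.1", "2.3.1.9"), ("3.1.1.1", "3.1.1.2")]
matrix = ec_class_exchange_matrix(edges)
```

Diagonal cells count changes that stay within a primary class (new substrate, same overall chemistry); off-diagonal cells count changes in overall chemistry, the rarer case the abstract describes.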
Bhageerath: an energy-based, web-enabled computer software suite for limiting the search space of tertiary structures of small globular proteins
We describe here an energy-based computer software suite for narrowing down the search space of tertiary structures of small globular proteins. The protocol comprises eight different computational modules that form an automated pipeline. It combines physics-based potentials with biophysical filters to arrive at 10 plausible candidate structures starting from sequence and secondary structure information. The methodology has been validated here on 50 small globular proteins consisting of 2–3 helices and strands with known tertiary structures. For each of these proteins, a structure within 3–6 Å RMSD (root mean square deviation) of the native has been obtained among the 10 lowest-energy structures. The protocol has been web-enabled and is accessible at
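The 3–6 Å figure uses the standard RMSD between matched atoms, $\mathrm{RMSD} = \sqrt{\tfrac{1}{N}\sum_{i=1}^{N} \lVert \mathbf{a}_i - \mathbf{b}_i \rVert^2}$. A minimal sketch, assuming the two structures are already optimally superimposed and their atoms paired one-to-one (the superposition step itself, e.g. a Kabsch fit, is not shown):

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two paired lists of (x, y, z) points.

    Assumes the structures are already superimposed; no fitting is done here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = sum(
        (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
        for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b)
    )
    return math.sqrt(total / len(coords_a))


native = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
model = [(3.0, 4.0, 0.0), (3.0, 4.0, 0.0)]
deviation = rmsd(native, model)  # every atom is displaced by 5 units
```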
EC-BLAST: a tool to automatically search and compare enzyme reactions
We present EC-BLAST (http://www.ebi.ac.uk/thornton-srv/software/rbl/), an algorithm and Web tool for quantitative similarity searches between enzyme reactions at three levels: bond change, reaction center and reaction structure similarity. It uses bond changes and reaction patterns for all known biochemical reactions derived from atom-atom mapping across each reaction. EC-BLAST has the potential to improve enzyme classification, identify previously uncharacterized or new biochemical transformations, improve the assignment of enzyme function to sequences, and assist in enzyme engineering.
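To make "bond change similarity" concrete, one common way to compare two reactions is a Tanimoto-style score over their bond-change fingerprints. This is an assumption for illustration only: the string encoding of bond changes (e.g. `"C-O:cleaved"`) is hypothetical, and EC-BLAST's own fingerprints and scoring are not reproduced here.

```python
from collections import Counter


def bond_change_similarity(changes_a, changes_b):
    """Tanimoto-style similarity between two multisets of bond changes.

    Each change is a hypothetical label such as "C-O:cleaved" or "O-H:formed".
    Returns sum(min counts) / sum(max counts), a value in [0, 1];
    two reactions with no bond changes at all are treated as identical.
    """
    a, b = Counter(changes_a), Counter(changes_b)
    keys = set(a) | set(b)
    shared = sum(min(a[k], b[k]) for k in keys)
    union = sum(max(a[k], b[k]) for k in keys)
    return shared / union if union else 1.0


reaction_1 = ["C-O:cleaved", "O-H:formed", "O-H:formed"]
reaction_2 = ["C-O:cleaved", "O-H:formed"]
score = bond_change_similarity(reaction_1, reaction_2)
```

Using counts rather than plain sets lets a reaction that forms two O-H bonds score lower against one that forms only a single O-H bond.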
Predicting Positive p53 Cancer Rescue Regions Using Most Informative Positive (MIP) Active Learning
Many protein engineering problems involve finding mutations that produce proteins with a particular function. Computational active learning is an attractive approach to discover desired biological activities. Traditional active learning techniques have been optimized to iteratively improve classifier accuracy, not to quickly discover biologically significant results. We report here a novel active learning technique, Most Informative Positive (MIP), which is tailored to biological problems because it seeks novel and informative positive results. MIP active learning differs from traditional active learning methods in two ways: (1) it preferentially seeks Positive (functionally active) examples; and (2) it may be effectively extended to select gene regions suitable for high-throughput combinatorial mutagenesis. We applied MIP to discover mutations in the tumor suppressor protein p53 that reactivate mutated p53 found in human cancers. This is an important biomedical goal because p53 mutants have been implicated in half of all human cancers, and restoring active p53 in tumors leads to tumor regression. MIP found Positive (cancer rescue) p53 mutants in silico using 33% fewer experiments than traditional non-MIP active learning, with only a minor decrease in classifier accuracy. Applying MIP to in vivo experimentation yielded immediate Positive results. Ten different p53 mutations found in human cancers were paired in silico with all possible single amino acid rescue mutations, from which MIP was used to select a Positive Region predicted to be enriched for p53 cancer rescue mutants. In vivo assays showed that the predicted Positive Region: (1) had significantly more (p<0.01) new strong cancer rescue mutants than control regions (Negative, and non-MIP active learning); (2) had slightly more new strong cancer rescue mutants than an Expert region selected for purely biological considerations; and (3) rescued for the first time the previously unrescuable p53 cancer mutant P152L.
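The contrast between traditional active learning and a positive-seeking strategy can be seen in how the next candidates are chosen from classifier outputs. The sketch below is a deliberate simplification under stated assumptions: candidates are hypothetical mutant names mapped to a predicted probability of being Positive, and the two selectors show only the two extremes (uncertainty sampling versus picking likely Positives). MIP itself also weighs how informative a candidate is, which this sketch does not model.

```python
def select_uncertain(candidates, k):
    """Classic uncertainty sampling: the k candidates whose predicted
    probability of being Positive is closest to 0.5."""
    return sorted(candidates, key=lambda c: abs(candidates[c] - 0.5))[:k]


def select_likely_positive(candidates, k):
    """Positive-seeking selection: the k candidates with the highest
    predicted probability of being Positive."""
    return sorted(candidates, key=lambda c: candidates[c], reverse=True)[:k]


# Hypothetical classifier outputs: mutant -> P(Positive)
predictions = {"m1": 0.95, "m2": 0.52, "m3": 0.10, "m4": 0.70}
uncertain_batch = select_uncertain(predictions, 2)
positive_batch = select_likely_positive(predictions, 2)
```

Uncertainty sampling spends experiments near the decision boundary, which helps classifier accuracy; the positive-seeking selector spends them where functional hits are expected, which matches the goal of finding rescue mutants with fewer assays.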
Detection of Alpha-Rod Protein Repeats Using a Neural Network and Application to Huntingtin
A growing number of solved protein structures display an elongated structural domain, denoted here as alpha-rod, composed of stacked pairs of anti-parallel alpha-helices. Alpha-rods are flexible and expose a large surface, which makes them suitable for protein interaction. Although alpha-rods most likely originated by tandem duplication of a two-helix unit, their repeats are poorly detected by sequence similarity between repeats. Here, we show that alpha-rod repeats can be detected using a neural network. The network detects more repeats than are identified by domain databases using multiple profiles, with a low level of false positives (<10%). We identify alpha-rod repeats in approximately 0.4% of proteins in eukaryotic genomes. We then investigate the results for all human proteins, identifying alpha-rod repeats for the first time in six protein families, including proteins STAG1-3, SERAC1, and PSMD1-2 & 5. We also characterize a short version of these repeats in eight protein families of Archaeal, Bacterial, and Fungal species. Finally, we demonstrate the utility of these predictions in directing experimental work to demarcate three alpha-rods in huntingtin, a protein mutated in Huntington's disease. Using yeast two-hybrid analysis and an immunoprecipitation technique, we show that the huntingtin fragments containing alpha-rods associate with each other. This is the first definition of domains in huntingtin and the first validation of predicted interactions between fragments of huntingtin, which sets directions toward the functional characterization of this protein. An implementation of the repeat detection algorithm is available as a Web server with a simple graphical output: http://www.ogic.ca/projects/ard. The results can be further visualized using BiasViz, a graphic tool for representation of multiple sequence alignments.
Solution Structure and Phylogenetics of Prod1, a Member of the Three-Finger Protein Superfamily Implicated in Salamander Limb Regeneration
Prod1 is a cell-surface molecule of the three-finger protein (TFP) superfamily involved in the specification of proximodistal (PD) identity in the newt limb. The TFP superfamily is a highly diverse group of metazoan proteins that includes snake venom toxins, mammalian transmembrane receptors and miscellaneous signaling molecules. The available data suggest that Prod1, and thereby its role in encoding PD identity, is restricted to salamanders. The lack of comparable limb-regenerative capability in other adult vertebrates could be correlated with the absence of the Prod1 gene.
Novel Peptide-Mediated Interactions Derived from High-Resolution 3-Dimensional Structures
Many biological responses to intra- and extracellular stimuli are regulated through complex networks of transient protein interactions where a globular domain in one protein recognizes a linear peptide from another, creating a relatively small contact interface. These peptide stretches are often found in unstructured regions of proteins, and contain a consensus motif complementary to the interaction surface displayed by their binding partners. While most current methods for the de novo discovery of such motifs exploit their tendency to occur in disordered regions, our work here focuses on another observation: upon binding to their partner domain, motifs adopt a well-defined structure. Indeed, through the analysis of all peptide-mediated interactions of known high-resolution three-dimensional (3D) structure, we found that the structure of the peptide may be as characteristic as the consensus motif, and can help identify target peptides even when they do not match the established patterns. Our analyses of the structural features of known motifs reveal that they tend to have a particular stretched and elongated structure, unlike most other peptides of the same length. Accordingly, we have implemented a strategy based on a Support Vector Machine that uses these features, along with other structure-encoded information about binding interfaces, to search the set of protein interactions of known 3D structure and to identify unnoticed peptide-mediated interactions among them. We have also derived consensus patterns for these interactions, whenever enough information was available, and compared our results with established linear motif patterns and their binding domains. Finally, to cross-validate our identification strategy, we scanned interactome networks from four model organisms with our newly derived patterns to see if any of them occurred more often than expected. We found significant over-representations for 64 domain-motif interactions, 46 of which had not been described before, involving over 6,000 interactions in total for which we could suggest the molecular details determining the binding.
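"Occurred more often than expected" calls for an over-representation test. The abstract does not name the statistic, so the sketch below assumes a simple binomial model: among `n` interaction partners of a domain, `k` match a derived motif pattern, while a random protein matches with background probability `p`. The upper-tail probability P(X ≥ k) is then the p-value. All three inputs are hypothetical here.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): the chance of seeing at least k
    pattern-matching partners out of n if matches occurred at random."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical example: 50 partners of a domain, 12 carry the motif,
# and 5% of all proteins carry it by chance (about 2.5 expected matches).
p_value = binomial_upper_tail(12, 50, 0.05)
```

In a genome-wide scan over many domain-motif pairs, such p-values would also need a multiple-testing correction before calling an interaction type significant.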
An Atlas of the Thioredoxin Fold Class Reveals the Complexity of Function-Enabling Adaptations
The group of proteins that contain a thioredoxin (Trx) fold is huge and diverse. Assessment of the variation in catalytic machinery of Trx fold proteins is essential in providing a foundation for understanding their functional diversity and predicting the function of the many uncharacterized members of the class. The proteins of the Trx fold class retain common features, including variations on a dithiol CxxC active-site motif, that enable their function. We use protein similarity networks to guide an analysis of how structural and sequence motifs track with catalytic function and taxonomic categories for 4,082 representative sequences spanning the known superfamilies of the Trx fold. Domain structure in the fold class is varied and modular, with 2.8% of sequences containing more than one Trx fold domain. Most member proteins are bacterial. The fold class exhibits many modifications to the CxxC active-site motif: only 56.8% of proteins have both cysteines, and no functional grouping shows absolute conservation of the expected catalytic motif. Only a small fraction of Trx fold sequences have been functionally characterized. This work provides a global view of the complex distribution of domains and catalytic machinery throughout the fold class, showing that each superfamily contains remnants of the CxxC active site. The unifying context provided by this work can guide the comparison of members of different Trx fold superfamilies to gain insight about their structure-function relationships, illustrated here with the thioredoxins and peroxiredoxins.
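Checking whether a sequence retains both cysteines of the CxxC motif can be approximated with a regular-expression scan. This is a naive sketch under an explicit assumption: it reports every CxxC anywhere in the sequence, whereas the analysis above locates the active site structurally and by alignment, so a plain scan will over-count motifs outside the active site. The example sequences are hypothetical fragments.

```python
import re

# Zero-width lookahead so overlapping motifs that share a cysteine
# (as in "CxxCxxC") are all reported.
CXXC = re.compile(r"(?=(C..C))")


def find_cxxc(sequence):
    """Return 0-based start positions of every CxxC motif in a protein sequence."""
    return [m.start() for m in CXXC.finditer(sequence.upper())]


def fraction_with_cxxc(sequences):
    """Fraction of sequences containing at least one intact CxxC motif."""
    if not sequences:
        return 0.0
    return sum(1 for s in sequences if find_cxxc(s)) / len(sequences)


hits = find_cxxc("SWCGPCKM")           # WCGPCK-like thioredoxin active site
overlapping = find_cxxc("CAACAAC")     # two motifs sharing the middle cysteine
share = fraction_with_cxxc(["SWCGPCKM", "MSAAGK"])
```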