357 research outputs found
Structure- and context-based analysis of the GxGYxYP family reveals a new putative class of glycoside hydrolase.
Background: Gut microbiome metagenomics has revealed many protein families and domains found largely or exclusively in that environment. Proteins containing the GxGYxYP domain are over-represented in the gut microbiota, and are found in Polysaccharide Utilization Loci in the gut symbiont Bacteroides thetaiotaomicron, suggesting their involvement in polysaccharide metabolism, but little else is known of the function of this domain. Results: Genomic context and domain architecture analyses support a role for the GxGYxYP domain in carbohydrate metabolism. Sparse occurrences in eukaryotes are the result of lateral gene transfer. The structure of the GxGYxYP domain-containing protein encoded by the BT2193 locus reveals two structural domains, the first composed of three divergent repeats with no recognisable homology to previously solved structures, the second a more familiar seven-stranded β/α barrel. Structure-based analyses including conservation mapping localise a presumed functional site to a cleft between the two domains of BT2193. Matching to a catalytic site template from a GH9 cellulase and other analyses point to a putative catalytic triad composed of Glu272, Asp331 and Asp333. Conclusions: We suggest that GxGYxYP-containing proteins constitute a novel glycoside hydrolase family of as yet unknown specificity.
Structural genomics analysis of uncharacterized protein families overrepresented in human gut bacteria identifies a novel glycoside hydrolase.
Background: Bacteroides spp. form a significant part of our gut microbiome and are well known for optimized metabolism of diverse polysaccharides. Initial analysis of the archetypal Bacteroides thetaiotaomicron genome identified 172 glycosyl hydrolases and a large number of uncharacterized proteins associated with polysaccharide metabolism. Results: BT_1012 from Bacteroides thetaiotaomicron VPI-5482 is a protein of unknown function and a member of a large protein family consisting entirely of uncharacterized proteins. Initial sequence analysis predicted that this protein has two domains, one N-terminal and one C-terminal. A PSI-BLAST search found over 150 full-length and over 90 half-size homologs consisting only of the N-terminal domain. The experimentally determined three-dimensional structure of the BT_1012 protein confirms its two-domain architecture, and structural analysis of both domains suggests their specific functions. The N-terminal domain is a putative catalytic domain with significant similarity to known glycoside hydrolases. The C-terminal domain has a beta-sandwich fold typically found in C-terminal domains of other glycosyl hydrolases; such domains are typically involved in substrate binding. We describe the structure of the BT_1012 protein and discuss its sequence-structure relationship and possible functional implications. Conclusions: Structural and sequence analyses of the BT_1012 protein identify it as a glycosyl hydrolase, expanding an already impressive catalog of enzymes involved in polysaccharide metabolism in Bacteroides spp. Based on this we have renamed the Pfam families representing the two domains found in the BT_1012 protein, PF13204 and PF12904, as putative glycoside hydrolase and glycoside hydrolase-associated C-terminal domain respectively.
A New Simulated Annealing Algorithm for the Multiple Sequence Alignment Problem: The approach of Polymers in a Random Media
We proposed a probabilistic algorithm to solve the Multiple Sequence
Alignment problem. The algorithm is a Simulated Annealing (SA) that exploits
the representation of the Multiple Alignment between sequences as a
directed polymer in D dimensions. Within this representation we can easily
track the evolution in the configuration space of the alignment through local
moves of low computational cost. At variance with other probabilistic
algorithms proposed to solve this problem, our approach allows for the creation
and deletion of gaps without extra computational cost. The algorithm was tested
aligning proteins from the kinases family. When D=3 the results are consistent
with those obtained using a complete algorithm. For D>3, where the complete
algorithm fails, we show that our algorithm still converges to reasonable
alignments. Moreover, we study the space of solutions obtained and show that
depending on the number of sequences aligned the solutions are organized in
different ways, suggesting a possible source of errors for progressive
algorithms. Comment: 7 pages and 11 figures
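As a rough illustration of the idea in the abstract above, here is a minimal simulated-annealing sketch for a toy multiple alignment. It is not the authors' directed-polymer algorithm: the sum-of-pairs score, the scoring values, the geometric cooling schedule and the function names (`sp_score`, `propose`, `anneal`) are all assumptions made for this example, and the local move only shifts existing gaps rather than creating or deleting them.

```python
import math
import random

def sp_score(rows, match=1, mismatch=-1, gap=-2):
    """Sum-of-pairs score of an alignment given as equal-length strings."""
    total = 0
    for col in zip(*rows):
        for i in range(len(col)):
            for j in range(i + 1, len(col)):
                a, b = col[i], col[j]
                if a == "-" and b == "-":
                    continue  # gap-gap pairs are not scored
                if a == "-" or b == "-":
                    total += gap
                else:
                    total += match if a == b else mismatch
    return total

def propose(rows, rng):
    """Local move: swap one gap with a neighbouring position in a random row."""
    rows = list(rows)
    r = rng.randrange(len(rows))
    row = list(rows[r])
    gaps = [k for k, c in enumerate(row) if c == "-"]
    if not gaps:
        return rows
    k = rng.choice(gaps)
    n = k + rng.choice((-1, 1))
    if 0 <= n < len(row):
        row[k], row[n] = row[n], row[k]  # residue order is preserved
    rows[r] = "".join(row)
    return rows

def anneal(rows, steps=5000, t0=2.0, t1=0.05, seed=0):
    """Metropolis acceptance with a geometric cooling schedule (assumed)."""
    rng = random.Random(seed)
    cur, cur_s = list(rows), sp_score(rows)
    best, best_s = cur, cur_s
    for step in range(steps):
        t = t0 * (t1 / t0) ** (step / steps)
        cand = propose(cur, rng)
        cand_s = sp_score(cand)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if cand_s >= cur_s or rng.random() < math.exp((cand_s - cur_s) / t):
            cur, cur_s = cand, cand_s
            if cur_s > best_s:
                best, best_s = cur, cur_s
    return best, best_s
```

Calling `anneal(["ACGT-", "-ACGT", "ACGT-"])` returns the best alignment visited and its score; the score can never be lower than that of the starting alignment because the best state is tracked separately from the current one.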
Optimal contact map alignment of protein–protein interfaces
The long-standing problem of constructing protein structure alignments is of central importance in computational biology. The main goal is to provide an alignment of residue correspondences, in order to identify homologous residues across chains. A critical next step of this is the alignment of protein complexes and their interfaces. Here, we introduce the program CMAPi, a two-dimensional dynamic programming algorithm that, given a pair of protein complexes, optimally aligns the contact maps of their interfaces: it produces polynomial-time near-optimal alignments in the case of multiple complexes. We demonstrate the efficacy of our algorithm on complexes from PPI families listed in the SCOPPI database and from highly divergent cytokine families. In comparison to existing techniques, CMAPi generates more accurate alignments of interacting residues within families of interacting proteins, especially for sequences with low similarity. While previous methods that use an all-atom based representation of the interface have been successful, CMAPi's use of a contact map representation allows it to be more tolerant to conformational changes and thus to align more of the interaction surface. These improved interface alignments should enhance homology modeling and threading methods for predicting PPIs by providing a basis for generating template profiles for sequence–structure alignment
Simplified amino acid alphabets based on deviation of conditional probability from random background
The primitive data for deducing the Miyazawa-Jernigan contact energy or
BLOSUM score matrix consists of pair frequency counts. Each amino acid
corresponds to a conditional probability distribution. Based on the deviation
of such conditional probability from random background, a scheme for reducing
the amino acid alphabet is proposed. An evident discrepancy is observed
between the reduced alphabets obtained from the raw Miyazawa-Jernigan and
BLOSUM residue pair counts. Taking the homologous
sequence database SCOP40 as a test set, we detect homology with the obtained
coarse-grained substitution matrices. It is verified that the reduced alphabets
obtained well preserve information contained in the original 20-letter
alphabet. Comment: 9 pages, 3 figures
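The quantity this abstract builds on can be sketched in a few lines. The example below computes, from symmetric pair counts, each letter's conditional distribution minus the background, and then finds the two letters whose deviation profiles are closest, which is one plausible first merge when reducing an alphabet. The three-letter counts, the Euclidean distance and the function names are assumptions for illustration; the paper's actual clustering criterion is not given in the abstract.

```python
import math
from itertools import combinations

def conditional_deviation(counts):
    """Return P(b|a) - P(b) for symmetric pair counts stored as counts[a][b]."""
    letters = sorted(counts)
    row_tot = {a: sum(counts[a][b] for b in letters) for a in letters}
    grand = sum(row_tot.values())
    background = {b: row_tot[b] / grand for b in letters}
    return {
        a: {b: counts[a][b] / row_tot[a] - background[b] for b in letters}
        for a in letters
    }

def closest_pair(dev):
    """Pair of letters with the most similar deviation profiles (Euclidean)."""
    letters = sorted(dev)
    def dist(a, b):
        return math.sqrt(sum((dev[a][c] - dev[b][c]) ** 2 for c in letters))
    return min(combinations(letters, 2), key=lambda p: dist(*p))

# Hypothetical toy counts: I and L pair alike, K behaves differently.
toy_counts = {
    "I": {"I": 10, "L": 9, "K": 1},
    "L": {"I": 9, "L": 10, "K": 1},
    "K": {"I": 1, "L": 1, "K": 8},
}
toy_dev = conditional_deviation(toy_counts)
```

With these toy counts, `closest_pair(toy_dev)` selects `("I", "L")`, so those two letters would be merged first under this assumed criterion.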
Novel genes dramatically alter regulatory network topology in amphioxus
Domain rearrangements in the innate immune network of amphioxus suggest that domain shuffling has shaped the evolution of immune systems
Deriving amino acid contact potentials from their frequencies of occurence in proteins: a lattice model study
The possibility of deriving the contact potentials between amino acids from
their frequencies of occurrence in proteins is discussed in evolutionary terms.
This approach allows the use of traditional thermodynamics to describe such
frequencies and, consequently, to develop a strategy to include in the
calculations correlations due to the spatial proximity of the amino acids and
to their overall tendency of being conserved in proteins. Making use of a
lattice model to describe protein chains and defining a "true" potential, we
test these strategies by selecting a database of folding model sequences,
deriving the contact potentials from such sequences and comparing them with the
"true" potential. Taking into account correlations allows for a markedly better
prediction of the interaction potentials
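For orientation, the simplest frequency-based estimate that work of this kind starts from is the inverse-Boltzmann form, where the energy of a contact is minus kT times the log of observed over expected contact counts. The sketch below implements only that uncorrelated baseline, not the correlation-corrected strategy the abstract describes; the two-letter H/P counts and the function name `contact_energies` are hypothetical.

```python
import math

def contact_energies(obs, kT=1.0):
    """Inverse-Boltzmann estimate e_ab = -kT * ln(observed / expected).

    obs maps unordered pairs (a, b) with a <= b to positive contact counts.
    Expected counts assume the two ends of a contact are chosen independently.
    """
    total = sum(obs.values())
    ends = {}
    for (a, b), n in obs.items():
        ends[a] = ends.get(a, 0) + n
        ends[b] = ends.get(b, 0) + n
    freq = {a: ends[a] / (2 * total) for a in ends}
    energies = {}
    for (a, b), n in obs.items():
        multiplicity = 1 if a == b else 2  # (a, b) and (b, a) are the same contact
        expected = total * freq[a] * freq[b] * multiplicity
        energies[(a, b)] = -kT * math.log(n / expected)
    return energies

# Hypothetical contact counts for a two-letter (H/P) lattice alphabet.
toy_obs = {("H", "H"): 60, ("H", "P"): 20, ("P", "P"): 20}
toy_energies = contact_energies(toy_obs)
```

In this toy case the H-H and P-P contacts are over-represented relative to independence and come out attractive (negative), while the H-P contact is under-represented and comes out repulsive (positive).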
TOPSAN: a dynamic web database for structural genomics
The Open Protein Structure Annotation Network (TOPSAN) is a web-based collaboration platform for exploring and annotating structures determined by structural genomics efforts. Characterization of those structures presents a challenge since the majority of the proteins themselves have not yet been characterized. Responding to this challenge, the TOPSAN platform facilitates collaborative annotation and investigation via a user-friendly web-based interface pre-populated with automatically generated information. Semantic web technologies expand and enrich TOPSAN’s content through links to larger sets of related databases, and thus, enable data integration from disparate sources and data mining via conventional query languages. TOPSAN can be found at http://www.topsan.org
Maximum Cliques in Protein Structure Comparison
Computing the similarity between two protein structures is a crucial task in
molecular biology, and has been extensively investigated. Many protein
structure comparison methods can be modeled as maximum clique problems in
specific k-partite graphs, referred to here as alignment graphs. In this paper, we
propose a new protein structure comparison method based on internal distances
(DAST) which is posed as a maximum clique problem in an alignment graph. We
also design an algorithm (ACF) for solving such maximum clique problems. ACF is
first applied in the context of VAST, software widely used at the National
Center for Biotechnology Information, and then in the context of DAST. The
obtained results on real protein alignment instances show that our algorithm is
more than 37000 times faster than the original VAST clique solver, which is
based on the Bron & Kerbosch algorithm. We furthermore compare ACF with one of the
fastest clique finders, recently conceived by Ostergard. On a popular benchmark
(the Skolnick set) we observe that ACF is about 20 times faster on average than
Ostergard's algorithm
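The baseline mentioned in this abstract, the Bron & Kerbosch clique enumeration, can be sketched as follows. This is a textbook version with pivoting and a simple size bound, used here to find one maximum clique in a small undirected graph; it is not the ACF algorithm or the VAST solver, and the function name `max_clique` and the toy graph are assumptions for the example. In an alignment graph, each vertex would stand for a candidate pairing of elements from the two structures and each edge for a compatible pair of pairings.

```python
def max_clique(adj):
    """Return one maximum clique of a graph given as {vertex: set_of_neighbours}."""
    best = []

    def expand(r, p, x):
        nonlocal best
        if not p and not x:
            # r is a maximal clique; keep it if it is the largest seen so far.
            if len(r) > len(best):
                best = list(r)
            return
        if len(r) + len(p) <= len(best):
            return  # bound: this branch cannot beat the current best
        # Pivot on the vertex with the most neighbours among the candidates.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r + [v], p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand([], set(adj), set())
    return sorted(best)

def build_adj(edges):
    """Build a symmetric adjacency map from a list of undirected edges."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

# Toy graph: a triangle {0, 1, 2} joined by one edge to a 4-clique {3, 4, 5, 6}.
toy_edges = [(0, 1), (0, 2), (1, 2), (2, 3),
             (3, 4), (3, 5), (3, 6), (4, 5), (4, 6), (5, 6)]
toy_adj = build_adj(toy_edges)
```

On this toy graph `max_clique(toy_adj)` returns the four vertices of the 4-clique. The worst-case running time is exponential, which is why faster dedicated solvers matter for real alignment graphs.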