
Structural genomics analysis of uncharacterized protein families overrepresented in human gut bacteria identifies a novel glycoside hydrolase.

Author: Axelrod Herbert L
Chang Yuanyuan
Eberhardt Ruth Y
Godzik Adam
Li Zhanwen
Rigden Daniel J
Sheydina Anna
Zmasek Christian C
Publication venue: eScholarship, University of California
Publication date: 01/01/2014

Background: Bacteroides spp. form a significant part of our gut microbiome and are well known for optimized metabolism of diverse polysaccharides. Initial analysis of the archetypal Bacteroides thetaiotaomicron genome identified 172 glycosyl hydrolases and a large number of uncharacterized proteins associated with polysaccharide metabolism.

Results: BT_1012 from Bacteroides thetaiotaomicron VPI-5482 is a protein of unknown function and a member of a large protein family consisting entirely of uncharacterized proteins. Initial sequence analysis predicted that this protein has two domains, one at the N-terminus and one at the C-terminus. A PSI-BLAST search found over 150 full-length homologs and over 90 half-size homologs consisting only of the N-terminal domain. The experimentally determined three-dimensional structure of the BT_1012 protein confirms its two-domain architecture, and structural analysis of both domains suggests their specific functions. The N-terminal domain is a putative catalytic domain with significant similarity to known glycoside hydrolases. The C-terminal domain has a beta-sandwich fold typically found in C-terminal domains of other glycosyl hydrolases; these domains are typically involved in substrate binding. We describe the structure of the BT_1012 protein and discuss its sequence-structure relationships and their possible functional implications.

Conclusions: Structural and sequence analyses of the BT_1012 protein identify it as a glycosyl hydrolase, expanding an already impressive catalog of enzymes involved in polysaccharide metabolism in Bacteroides spp. Based on this, we have renamed the Pfam families representing the two domains found in the BT_1012 protein, PF13204 and PF12904, as putative glycoside hydrolase and glycoside hydrolase-associated C-terminal domain, respectively.

Springer - Publisher Connector

A New Simulated Annealing Algorithm for the Multiple Sequence Alignment Problem: The approach of Polymers in a Random Media

Author: A. Godzik
D. Gunsfield
J. Kim
M. Hernández-Guía
M. Ishikawa
M. S. Waterman
P. Pevzner
R. Durbin
R. Mulet
S. Geman
S. Rodríguez-Pérez
Publication venue: 'American Physical Society (APS)'
Publication date: 10/01/2005

We proposed a probabilistic algorithm to solve the Multiple Sequence Alignment problem. The algorithm is a Simulated Annealing (SA) that exploits the representation of the Multiple Alignment between D sequences as a directed polymer in D dimensions. Within this representation we can easily track the evolution in the configuration space of the alignment through local moves of low computational cost. At variance with other probabilistic algorithms proposed to solve this problem, our approach allows for the creation and deletion of gaps without extra computational cost. The algorithm was tested aligning proteins from the kinases family. When D=3 the results are consistent with those obtained using a complete algorithm. For D>3, where the complete algorithm fails, we show that our algorithm still converges to reasonable alignments. Moreover, we study the space of solutions obtained and show that depending on the number of sequences aligned the solutions are organized in different ways, suggesting a possible source of errors for progressive algorithms. Comment: 7 pages and 11 figures

The Signal for Signaling, Found

Author: Adam Godzik
C. Erec Stebbins
J Stavrinides
Marcin Grynberg
R Arnold
R Rosqvist
R Samudrala
Publication venue: Public Library of Science
Publication date: 01/04/2009

Directory of Open Access Journals

Optimal contact map alignment of protein–protein interfaces

Author: B. Berger
Bowie
Dunbrack
Edgar
Godzik
Higgins
J. Bienkowska
Lu
Lu
Pieper
Smith
V. Pulim
Winter
Xu
Publication venue: Oxford University Press
Publication date: 01/07/2008

The long-standing problem of constructing protein structure alignments is of central importance in computational biology. The main goal is to provide an alignment of residue correspondences, in order to identify homologous residues across chains. A critical next step of this is the alignment of protein complexes and their interfaces. Here, we introduce the program CMAPi, a two-dimensional dynamic programming algorithm that, given a pair of protein complexes, optimally aligns the contact maps of their interfaces: it produces polynomial-time near-optimal alignments in the case of multiple complexes. We demonstrate the efficacy of our algorithm on complexes from PPI families listed in the SCOPPI database and from highly divergent cytokine families. In comparison to existing techniques, CMAPi generates more accurate alignments of interacting residues within families of interacting proteins, especially for sequences with low similarity. While previous methods that use an all-atom based representation of the interface have been successful, CMAPi's use of a contact map representation allows it to be more tolerant to conformational changes and thus to align more of the interaction surface. These improved interface alignments should enhance homology modeling and threading methods for predicting PPIs by providing a basis for generating template profiles for sequence–structure alignment
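
As a rough illustration of the contact map representation mentioned in this abstract (not CMAPi itself, and not its alignment step), the sketch below builds a binary interface contact map between two chains. The function name, the use of one representative coordinate per residue, and the 8 Å distance cutoff are all assumptions made for the example.

```python
import math

def interface_contact_map(chain_a, chain_b, cutoff=8.0):
    """Binary contact map between two chains.

    chain_a, chain_b: lists of (x, y, z) coordinates, one per residue
    (for example C-alpha positions; this choice is an assumption).
    cutoff: distance threshold in angstroms (hypothetical default).
    Returns a len(chain_a) x len(chain_b) matrix of booleans where
    entry [i][j] is True if residue i of A and residue j of B are in contact.
    """
    return [
        [math.dist(a, b) <= cutoff for b in chain_b]
        for a in chain_a
    ]

# Toy coordinates, invented for illustration only.
chain_a = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
chain_b = [(0.0, 0.0, 5.0), (10.0, 0.0, 20.0)]
cmap = interface_contact_map(chain_a, chain_b)
```

Because the map records only which residue pairs are close, small conformational shifts that keep the same pairs within the cutoff leave it unchanged, which is the tolerance the abstract attributes to this representation.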

Boston University Institutional Repository (OpenBU)

DSpace@MIT

Simplified amino acid alphabets based on deviation of conditional probability from random background

Author: A. Godzik
A.G. Murzin
C.E. Schafmeister
D.S. Riddle
Di Liu
H.S. Chan
J. Wang
Ji Qi
K.W. Plaxco
L.R. Murphy
M. Munson
S. Henikoff
S. Miyazawa
S.E. Brenner
S.F. Altschul
S.F. Altschul
Wei-Mou Zheng
Xin Liu
Publication venue: 'American Physical Society (APS)'
Publication date: 01/01/2002

The primitive data for deducing the Miyazawa-Jernigan contact energy or BLOSUM score matrix consist of pair frequency counts. Each amino acid corresponds to a conditional probability distribution. Based on the deviation of this conditional probability from the random background, a scheme for reducing the amino acid alphabet is proposed. An evident discrepancy is observed between the reduced alphabets obtained from the raw Miyazawa-Jernigan and BLOSUM residue pair counts. Taking the homologous sequence database SCOP40 as a test set, we detect homology with the resulting coarse-grained substitution matrices. The reduced alphabets are verified to preserve well the information contained in the original 20-letter alphabet. Comment: 9 pages, 3 figures

CERN Document Server

Novel genes dramatically alter regulatory network topology in amphioxus

Author: Dishaw Larry J
Godzik Adam
Litman Gary W
Mueller M Gail
Ye Yuzhen
Zhang Qing
Zmasek Christian M
Publication venue: BioMed Central
Publication date: 01/01/2008

Domain rearrangements in the innate immune network of amphioxus suggest that domain shuffling has shaped the evolution of immune systems.

Springer - Publisher Connector

Deriving amino acid contact potentials from their frequencies of occurence in proteins: a lattice model study

Author: Abkkevich V I
Anfinsen C B
Betancourt M R
Bryngelson J D
Creighton T E
D Provasi
Derrida B
Finkelstein A V
G Tiana
Godzik A
Greiner W
Kolinski A
Li H
M Colombo
Miyazawa S
R A Broglia
Rost B
Shakhnovich E I
Shimada J
Sippl M J
Zhang L
Publication venue: 'IOP Publishing'
Publication date: 01/01/2004

The possibility of deriving the contact potentials between amino acids from their frequencies of occurrence in proteins is discussed in evolutionary terms. This approach allows the use of traditional thermodynamics to describe such frequencies and, consequently, to develop a strategy for including in the calculations the correlations due to the spatial proximity of the amino acids and to their overall tendency to be conserved in proteins. Making use of a lattice model to describe protein chains and defining a "true" potential, we test these strategies by selecting a database of folding model sequences, deriving the contact potentials from such sequences and comparing them with the "true" potential. Taking correlations into account allows for a markedly better prediction of the interaction potentials.
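
For orientation, the simplest frequency-based estimate of a contact potential is a Boltzmann inversion of observed versus expected pair frequencies. The sketch below shows that baseline only; it does not include the correlation corrections this abstract describes, and the function name, input format, and reference state (independent pairing by composition) are assumptions.

```python
import math
from collections import Counter

def contact_potentials(contact_pairs, kT=1.0):
    """Estimate contact energies as -kT * ln(observed / expected).

    contact_pairs: list of (type_a, type_b) tuples, one per observed contact.
    The expected frequency assumes residues pair independently according
    to how often each type appears in contacts (an assumed reference state).
    Returns a dict keyed by the sorted pair of residue types.
    """
    pair_counts = Counter(tuple(sorted(p)) for p in contact_pairs)
    single_counts = Counter()
    for a, b in contact_pairs:
        single_counts[a] += 1
        single_counts[b] += 1

    total_pairs = len(contact_pairs)
    total_singles = 2 * total_pairs
    potentials = {}
    for (a, b), n_obs in pair_counts.items():
        f_a = single_counts[a] / total_singles
        f_b = single_counts[b] / total_singles
        # An unordered pair of two different types can arise in two ways.
        expected = f_a * f_b * (1 if a == b else 2)
        observed = n_obs / total_pairs
        potentials[(a, b)] = -kT * math.log(observed / expected)
    return potentials

# Toy two-letter (H/P) contact list, invented for illustration.
pairs = [("H", "H")] * 6 + [("P", "P")] * 2 + [("H", "P")] * 2
energies = contact_potentials(pairs)
```

Pairs seen more often than composition alone predicts get a negative (favorable) energy, and pairs seen less often get a positive one. Pair types that never occur are simply absent from the result rather than assigned an infinite energy.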

AIR Universita degli studi di Milano

TOPSAN: a dynamic web database for structural genomics

Author: A. Godzik
Bernstein
C. Bakolitsa
C. M. Zmasek
D. Weekes
Hodis
J. Wooley
K. Ellrott
Norvell
S. Sri Krishna
Weekes
Publication venue: Oxford University Press
Publication date: 01/01/2011

The Open Protein Structure Annotation Network (TOPSAN) is a web-based collaboration platform for exploring and annotating structures determined by structural genomics efforts. Characterization of those structures presents a challenge since the majority of the proteins themselves have not yet been characterized. Responding to this challenge, the TOPSAN platform facilitates collaborative annotation and investigation via a user-friendly web-based interface pre-populated with automatically generated information. Semantic web technologies expand and enrich TOPSAN’s content through links to larger sets of related databases, and thus, enable data integration from disparate sources and data mining via conventional query languages. TOPSAN can be found at http://www.topsan.org

Maximum Cliques in Protein Structure Comparison

Author: A. Andreeva
A. Caprara
A. Godzik
D. Strickland
E. Tomita
I. Lerman
J. Konc
J. Martin
J.F. Gibrat
M. Fredman
M. Sierk
P.R.J. Östergård
R. Andonov
R. Karp
Publication venue
Publication date: 01/01/2009

Computing the similarity between two protein structures is a crucial task in molecular biology, and has been extensively investigated. Many protein structure comparison methods can be modeled as maximum clique problems in specific k-partite graphs, referred to here as alignment graphs. In this paper, we propose a new protein structure comparison method based on internal distances (DAST), which is posed as a maximum clique problem in an alignment graph. We also design an algorithm (ACF) for solving such maximum clique problems. ACF is first applied in the context of VAST, software widely used at the National Center for Biotechnology Information, and then in the context of DAST. The results obtained on real protein alignment instances show that our algorithm is more than 37000 times faster than the original VAST clique solver, which is based on the Bron & Kerbosch algorithm. We furthermore compare ACF with one of the fastest clique finders, recently conceived by Östergård. On a popular benchmark (the Skolnick set) we observe that ACF is about 20 times faster on average than Östergård's algorithm.
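
To make the maximum clique formulation concrete, here is a minimal sketch of the classic Bron-Kerbosch search with pivoting, the family of algorithm the abstract names as the basis of the original VAST solver. It is neither ACF nor Östergård's algorithm, and the function name, the adjacency-dictionary input, and the size-based pruning step are assumptions chosen for the example.

```python
def maximum_clique(adjacency):
    """Return one maximum clique of an undirected graph.

    adjacency: dict mapping each vertex to the set of its neighbours
    (no self-loops; every neighbour must also be a key).
    """
    best = []

    def expand(r, p, x):
        # r: current clique, p: candidates, x: already-explored vertices.
        nonlocal best
        if not p and not x:
            if len(r) > len(best):
                best = list(r)
            return
        if len(r) + len(p) <= len(best):
            return  # even taking every candidate cannot beat the best so far
        # Pivot on the vertex with the most neighbours among the candidates.
        pivot = max(p | x, key=lambda v: len(adjacency[v] & p))
        for v in list(p - adjacency[pivot]):
            expand(r + [v], p & adjacency[v], x & adjacency[v])
            p = p - {v}
            x = x | {v}

    expand([], set(adjacency), set())
    return best

# Toy graph, invented for illustration: a triangle 1-2-3 with a tail 3-4-5.
graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5}, 5: {4}}
clique = maximum_clique(graph)
```

In an alignment graph, each vertex would stand for a candidate pairing of one residue (or structural element) from each protein, and an edge would mean two pairings are mutually compatible, so a maximum clique corresponds to a largest consistent alignment.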

HAL-CentraleSupelec