82 research outputs found

CMASA: an accurate algorithm for detecting local protein structural similarity and its application to enzyme catalytic site annotation

Author: A Andreeva
A Stark
A Stark
BW Matthews
CJ Sigrist
CT Porter
E Krissinel
ED Scheeff
G Ausiello
GJ Kleywegt
Gong-Hua Li
H Ago
HM Berman
I Boltes
IN Shindyalov
JA Barker
JA Gerlt
JC Lagarias
Jing-Fei Huang
JS Fetrow
JW Torrance
K Kinoshita
L Holm
P Chen
PF Gherardini
RA Laskowski
RD Finn
RV Spriggs
S Schmitt
SF Altschul
SF Altschul
T Fawcett
T Madej
Publication venue: BioMed Central
Publication date: 01/01/2010

Abstract. Background: The rapid development of structural genomics has resulted in many proteins of unknown function being deposited in the Protein Data Bank (PDB), so predicting the function of these proteins has become a challenge for structural bioinformatics. Several sequence-based and structure-based methods have been developed to predict protein function, but their accuracy, sensitivity, and computational speed still need improvement. Here, an accurate algorithm, CMASA (Contact MAtrix based local Structural Alignment algorithm), has been developed to predict unknown protein functions based on local protein structural similarity. The algorithm was evaluated on a test set of 164 enzyme families and compared with other methods. Results: The evaluation shows that CMASA is highly accurate (0.96), sensitive (0.86), and fast enough to be used in large-scale functional annotation. Compared with both sequence-based and global structure-based methods, CMASA can find not only remote homologous proteins but also active site convergence. Compared with other methods based on local structure comparison, CMASA performs better than both FFF (a method using geometry to predict protein function) and SPASM (a local structure alignment method); it is more sensitive than PINTS and more accurate than JESS (both local structure alignment methods). CMASA was applied to annotate the enzyme catalytic sites of the non-redundant PDB, and at least 166 putative catalytic sites were suggested that cannot be found in the Catalytic Site Atlas (CSA). Conclusions: CMASA is an accurate algorithm for detecting local protein structural similarity, and it holds several advantages in predicting enzyme active sites. CMASA can be used in large-scale enzyme active site annotation. CMASA is available through a mail-based server (http://159.226.149.45/other1/CMASA/CMASA.htm).

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Regression applied to protein binding site prediction and comparison with classification

Author: A Gutteridge
A Sali
A Shulman-Peleg
Benoît Macq
C Chen
C Zhang
E Vittinghoff
G Bartlett
G Zhang
H Berman
H Chen
H Zhang
H Zhou
H Zhou
J Drews
J Fetrow
J Giard
J Mintseris
J Watson
Jean-Luc Gala
Joachim Giard
JR Bradford
JR Bradford
Jérôme Ambroise
M Shatsky
ML Connolly
N Li
N Petrova
N Tuncbag
O Keskin
O Keskin
O Keskin
R Tobias
RG Coleman
S Gunn
S Jones
S Jones
S Liang
S Qin
Y Murakami
Y Xie
Publication venue: BioMed Central
Publication date: 01/01/2009

Abstract. Background: Structural genomics centers provide hundreds of protein structures of unknown function. Therefore, developing methods that determine protein function automatically is imperative. The function of a protein can be determined by studying the network of its physical interactions. In this context, identifying a potential binding site between proteins is of primary interest. In the literature, methods for predicting the location of a potential binding site are generally based on classification tools. The aim of this paper is to show that regression tools are more efficient than classification tools for patch-based binding site predictors. For this purpose, we developed a patch-based binding site localization method usable with either regression or classification tools. Results: We compared the predictive performance of regression tools with that of machine learning classifiers. Using leave-one-out cross-validation, we showed that regression tools provide better predictions than classification tools. Among regression tools, Multilayer Perceptron ranked highest in the quality of predictions. We also compared the predictive performance of our patch-based method using Multilayer Perceptron with the performance of three other methods usable through a web server. Our method performed similarly to the other methods. Conclusion: Regression is more efficient than classification when applied to our binding site localization method. Where possible, using regression instead of classification for other existing binding site predictors will probably improve results. Furthermore, the method presented in this work is flexible because the size of the predicted binding site is adjustable. This adaptability is useful when either the false positive rate or the false negative rate has to be limited.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

DIAL UCLouvain

Mining protein loops using a structural alphabet and statistical exceptionality

Author: A Dembo
A Efimov
A Golovin
A Sacan
A Via
AC Camproux
AC Camproux
AC Camproux
Anne-Claude Camproux
AR Panchenko
AR Panchenko
B Oliva
BJ Polacco
BL Sibanda
BL Sibanda
BL Sibanda
BW Matthews
C Kiss
CG Hunter
CM Venkatachalam
D Leader
D Stuart
DF Burke
E Rocha
EG Hutchinson
EJ Milner-White
EJ Milner-White
F den Hollander
G Ausiello
G Ausiello
G Nuel
G Nuel
G Nuel
G Pugalenthi
GD Rose
Gregory Nuel
J Espadaler
J Martin
J Martin
J van Helden
J Wojcik
JF Leszczynski
JM Kwasigroch
JS Fetrow
JS Richardson
Juliette Martin
JW Sammon
JW Torrance
KC Chou
L Regad
LE Donate
Leslie Regad
LN Johnson
LR Rabiner
LS Bernstein
M Hollander
M Mönnigmann
M Saraste
MY Leung
N Colloc'h
N Fernandez-Fuentes
N Fernandez-Fuentes
O Sander
P Fuchs
PA Rice
PN Lewis
R Kolodny
S Karlin
S Kim
S Kullback
S Sourice
SA Benner
SA Benner
SD Rufino
V Pavone
W Kabsch
W Li
W Li
WL DeLano
Publication venue: BioMed Central
Publication date: 01/01/2010

Abstract. Background: Protein loops encompass 50% of protein residues in available three-dimensional structures. These regions are often involved in protein functions, e.g., binding sites and catalytic pockets. However, describing protein loops with conventional tools is a difficult task. Regular secondary structures, helices and strands, have been widely studied, whereas loops are difficult to analyze because they are highly variable in sequence and structure. Due to data sparsity, long loops have rarely been systematically studied. Results: We developed a simple and accurate method that allows the description and analysis of the structures of short and long loops using structural motifs, without restriction on loop length. This method is based on the structural alphabet HMM-SA. HMM-SA simplifies a three-dimensional protein structure into a one-dimensional string of states, where each state is a four-residue prototype fragment called a structural letter. The difficult task of structurally grouping huge data sets is thus easily accomplished by handling structural-letter strings as in conventional protein sequence analysis. We systematically extracted all seven-residue fragments in a bank of 93000 protein loops and grouped them according to their structural-letter sequence, named a structural word. This approach permits a systematic analysis of loops of all sizes, since we consider structural motifs of seven residues rather than complete loops. We focused the analysis on highly recurrent words of loops (observed more than 30 times). Our study reveals that 73% of loop lengths are covered by only 3310 highly recurrent structural words out of 28274 observed words. These structural words have low structural variability (mean RMSd of 0.85 Å). As expected, half of these motifs display a flanking-region preference, but interestingly, two thirds are shared by short (less than 12 residues) and long loops. Moreover, half of the recurrent motifs exhibit a significant level of amino-acid conservation, with at least four significant positions, and 87% of long loops contain at least one such word. We complement our analysis with the detection of statistically over-represented patterns of structural letters, as in conventional DNA sequence analysis. About 30% (930) of structural words are over-represented and cover about 40% of loop lengths. Interestingly, these words exhibit lower structural variability and higher sequential specificity, suggesting structural or functional constraints. Conclusions: We developed a method to systematically decompose and study protein loops using recurrent structural motifs. This method is based on the structural alphabet HMM-SA and not on structural alignment and geometrical parameters. We extracted meaningful structural motifs that are found in both short and long loops. To our knowledge, this is the first time that pattern mining has helped to increase the signal-to-noise ratio in protein loops. This finding helps to better describe protein loops and might reduce the complexity of long-loop analysis. Detailed results are available at http://www.mti.univ-paris-diderot.fr/publication/supplementary/2009/ACCLoop/.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

Protein Docking by the Interface Structure Similarity: How Much Structure Is Needed?

Author: A Gursoy
A Rossi
A Sali
A Stark
A Tovchigrechko
AD Wilkins
AK Arakaki
AS Aytuna
D Douguet
D Korkin
D Kozakov
D La
D Petrey
D Reichmann
EM Mitchell
F Glaser
F Pazos
G Nicola
H Hasegawa
HX Zhou
IA Vakser
Ilya A. Vakser
J Fernandez-Recio
J Hunjan
J Janin
J Konc
JS Fetrow
L Lu
M Gao
M Zacharias
MA Larkin
MF Lensink
N Tuncbag
O Keskin
O Keskin
Ozlem Keskin
P Chakrabarti
Petras J. Kundrotas
PJ Kundrotas
PJ Kundrotas
PJ Kundrotas
QC Zhang
R Sinha
RB Russell
Rohita Sinha
S Gunther
S Jones
SA Cammer
TA Binkowski
U Ogmen
Y Gao
Y Ofran
Y Zhang
Y Zhang
Publication venue: Public Library of Science
Publication date: 01/01/2011

The increasing availability of co-crystallized protein-protein complexes provides an opportunity to use template-based modeling for protein-protein docking. Structure alignment techniques are useful for detecting remote target-template similarities. The size of the structure involved in the alignment is important for success in modeling. This paper describes a systematic large-scale study to find the optimal definition/size of the interfaces for structure alignment-based docking applications. The results showed that structural areas corresponding to cutoff values <12 Å across the interface inadequately represent structural details of the interfaces. When the cutoff was increased beyond 12 Å, the success rate for the benchmark set of 99 protein complexes did not increase significantly for higher-accuracy models and decreased for lower-accuracy models. The 12 Å cutoff was optimal in our interface alignment-based docking and is a likely best choice for large-scale (e.g., on the scale of the entire genome) applications to protein interaction networks. The results provide guidelines for docking approaches, including high-throughput applications to modeled structures.

CiteSeerX

Crossref

Directory of Open Access Journals

KU ScholarWorks

PubMed Central

Recruitment of rare 3-grams at functional sites: Is this a mechanism for increasing enzyme specificity?

Abstract. Background: A wealth of unannotated and functionally unknown protein sequences has accumulated in recent years with rapid progress in sequence genomics, giving rise to ever-increasing demand for methods that efficiently assess functional sites. Sequence and structure conservation have traditionally been the major criteria adopted in various algorithms to identify functional sites. Here, we focus on the distributions of the 20^3 different types of 3-grams (triplets of sequentially contiguous amino acids) in the entire space of sequences accumulated to date in the UniProt database, and in particular on the rare 3-grams distinguished by their high entropy-based information content. Results: Comparison of the UniProt distributions with those observed near or at the active sites in a non-redundant dataset of 59 enzyme/ligand complexes shows that the active sites preferentially recruit 3-grams distinguished by their low frequency in UniProt. Three cases, Src kinase, hemoglobin, and tyrosyl-tRNA synthetase, are discussed in detail to illustrate the biological significance of the results. Conclusion: The results suggest that recruitment of rare 3-grams may be an efficient mechanism for increasing specificity at functional sites. Rareness/scarcity emerges as a feature that may assist in identifying key sites for protein function, providing information complementary to that derived from sequence alignments. In addition, it provides us (for the first time) with a means of identifying potentially functional sites from sequence information alone, when sequence conservation properties are not available.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Protein structure search and local structure characterization

Author: A Andreeva
AC Camproux
AG de Brevern
AG de Brevern
AG de Brevern
AR Ortiz
B Offmann
B Rost
C Benros
C Bystroff
CA Orengo
D Baker
E Appella
F Birzele
F Guyon
G Pollastri
HM Berman
IN Shindyalo
J Garnier
J Schuchhardt
J Vesanto
JA Hartigan
JM Yang
JS Fetrow
L Holm
M Carpentier
M Dudev
M Tyagi
M Tyagi
M Tyagi
NJ Mulder
O Sander
R Unger
S Henikoff
Shih-Yen Ku
T Madej
TL Bailey
TM Mitchell
TN Petersen
U Hobohm
VS Gowri
W Humphrey
WM Zheng
WR Pearson
Y Liu
Y Ye
Yuh-Jyh Hu
Publication venue: BioMed Central
Publication date: 01/01/2008

Abstract. Background: Structural similarities among proteins can provide valuable insight into their functional mechanisms and relationships. As the number of available three-dimensional (3D) protein structures increases, a greater variety of studies can be conducted with increasing efficiency, among which is the design of protein structural alphabets. Structural alphabets allow us to characterize local structures of proteins and describe the global folding structure of a protein using a one-dimensional (1D) sequence. Thus, 1D sequences can be used to identify structural similarities among proteins using standard sequence alignment tools such as BLAST or FASTA. Results: We used self-organizing maps in combination with a minimum spanning tree algorithm to determine the optimum size of a structural alphabet and applied the k-means algorithm to group protein fragments into clusters. The centroids of these clusters defined the structural alphabet. We also developed a flexible matrix training system to build a substitution matrix (TRISUM-169) for our alphabet. Based on FASTA and using TRISUM-169 as the substitution matrix, we developed the SA-FAST alignment tool. We compared the performance of SA-FAST with that of various search tools in database-scale search tasks and found that SA-FAST was highly competitive in all tests conducted. Further, we evaluated the performance of our structural alphabet in recognizing specific structural domains of EGF and EGF-like proteins. Our method recovered more EGF sub-domains using our structural alphabet than when using other structural alphabets. SA-FAST can be found at http://140.113.166.178/safast/. Conclusion: The goal of this project was two-fold. First, we wanted to introduce a modular design pipeline to those who have been working with structural alphabets. Second, we wanted to open the door to researchers who have done substantial work on biological sequences but have yet to enter the field of protein structure research. Our experiments showed that by transforming the structural representations from 3D to 1D, several 1D-based tools can be applied to structural analysis, including similarity searches and structural motif finding.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Active Nuclear Receptors Exhibit Highly Correlated AF-2 Domain Motions

Author: A Amadei
A Tamrazi
AI Shulman
AM Brzozowski
B Brooks
B Goodwin
BA Johnson
BC Kallenberger
BR Brooks
Brenda R. S. Temple
BS Everitt
C Carlberg
C Handschin
DA Case
Denise G. Teotico
E Hustert
F Tama
Feng Ding
GD Schuler
H Greschik
H Huang
H Wang
J Orans
J Staudinger
Jacquelyn S. Fetrow
JF Gibrat
JP Renaud
K Hinsen
K Suhre
L Celik
LA Arnold
LA Arnold
M Rueda
Matthew R. Redinbo
MD Krasowski
MJ Chalmers
MM Tirion
Monica L. Frazier
Nikolay V. Dokholyan
O Marques
RE Watkins
RE Watkins
RE Watkins
RW Harrison
S Ekins
S Sharma
SA Kliewer
SA Kliewer
SM Noble
U Essman
WL Jorgensen
Y Duan
Y Xue
Y Xue
YA Elhaji
Publication venue: Public Library of Science
Publication date: 01/01/2008

Nuclear receptor ligand binding domains (LBDs) convert ligand binding events into changes in gene expression by recruiting transcriptional coregulators to a conserved activation function-2 (AF-2) surface. While most nuclear receptor LBDs form homo- or heterodimers, the human nuclear receptor pregnane X receptor (PXR) forms a unique and essential homodimer and is proposed to assemble into a functional heterotetramer with the retinoid X receptor (RXR). How the homodimer interface, which is located 30 Å from the AF-2, would affect function at this critical surface has remained unclear. By using 20- to 30-ns molecular dynamics simulations on PXR in various oligomerization states, we observed a remarkably high degree of correlated motion in the PXR–RXR heterotetramer, most notably in the four helices that create the AF-2 domain. The function of such correlation may be to create “active-capable” receptor complexes that are ready to bind to transcriptional coactivators. Indeed, we found in additional simulations that active-capable receptor complexes involving other orphan or steroid nuclear receptors also exhibit highly correlated AF-2 domain motions. We further propose a mechanism for the transmission of long-range motions through the nuclear receptor LBD to the AF-2 surface. Taken together, our findings indicate that long-range motions within the LBD scaffold are critical to nuclear receptor function by promoting a mobile AF-2 state ready to bind coactivators.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Carolina Digital Repository

Filling Empty Seats: How Status and Organizational Hierarchies Affect Exploration Versus Exploitation in Team Design

Author: Ancona D. J.
Ancona D. J.
Arthur M. B.
Balio T.
Berger J.
Blau P. M.
Bordwell D.
Burgelman R. A.
Caves R. E.
Crafton D.
Fabrizio Perretti
Faulkner R. R.
Fetrow A. G.
Finler J. W.
Fleming L.
Floyd S. W.
Giacomo Negro
Goldman W.
Goode W. J.
Guzzo R. A.
Hackman J. R.
Hamel G.
Hampton B. B.
Hannan M. T.
Huy Q. N.
Ilgen D. R.
Jackson S. E.
Jackson S. E.
Jones C.
Jones G. R.
Katila R.
Kawin B. F.
Korn E. L.
Krackhardt D.
Kremer M.
Lenski G. E.
Levy E.
Lewis H. T.
Long J. S.
Louis M. R.
March J. G.
March J. G.
March J. G.
McGrath J. E.
McGrath J. E.
McKelvey R. D.
McPherson M.
Merton R. K.
Mezias J. M.
Miller D.
Miller D.
Moreland R. L.
Morrison E. W.
Nash J. R.
Nohria N.
Parsons T.
Parsons T.
Phillips D.
Podolny J. M.
Podolny J. M.
Reagans R.
Ridgeway C. L.
Rivkin J. W.
Rollag K.
Rosten L. C.
Shale R.
Siggelkow N.
Silver A.
Simmel G.
Sorensen J. B.
Stinchcombe A. L.
Thompson J. D.
Tushman M. L.
Weber M.
Webster M.
Wegener B.
Wilemon D. L.
Williams K. Y.
Winship C.
Publication venue: Academy of Management

Crossref

Uso de plantas com finalidade medicinal por pessoas vivendo com HIV/ AIDS em terapia antirretroviral

Crossref

A horizontal alignment tool for numerical trend discovery in sequence data: application to protein hydropathy.

Author: A Andreeva
A Krogh
A Roy
A Schlessinger
AB Robinson
AG Murzin
AG Murzin
AJ Tebben
B Vroling
C Chothia
C Sander
DA Liberles
DM Engelman
DN Reshef
DT Jones
DW Buchan
E Cascales
EI Lutter
G Lebon
I Yomtovian
IN Shindyalov
IN Shindyalov
J Gu
J Hollien
J Kyte
J Skolnick
J Soeding
J Soeding
Jacquelyn S. Fetrow
James O. Wrabl
JC Wootten
JD Clements
JM Chandonia
JP Bannantine
JR Hill
JS Lolkema
JS Lolkema
K Henzler-Wildman
K Khafizov
KD Pruitt
KR Vinothkumar
L Aravind
L Holm
L Holm
L Holm
L Kali
LN Kinch
M dos Reis
N Tokuriki
Omar Hadzipasic
PN Bryan
PS Spencer
RI Sadreyev
S Neumann
S Topiol
SF Altschul
SF Altschul
SS Krishna
T Liu
T Tuller
V Alva
Vincent J. Hilser
W Kabsch
WA Cramer
WC Wong
WR Pearson
Y Bai
Y Bai
Y Huang
Y Jia
Publication venue: Public Library of Science (PLoS)
Publication date: 10/10/2013

PMCID: PMC3794901. An algorithm is presented that returns the optimal pairwise gapped alignment of two sets of signed numerical sequence values. One distinguishing feature of this algorithm is a flexible comparison engine (based on both relative shape and absolute similarity measures) that does not rely on explicit gap penalties. Additionally, an empirical probability model is developed to estimate the significance of the returned alignment with respect to randomized data. The algorithm's utility for biological hypothesis formulation is demonstrated with test cases including database search and pairwise alignment of protein hydropathy. However, the algorithm and probability model could possibly be extended to accommodate other diverse types of protein or nucleic acid data, including positional thermodynamic stability and mRNA translation efficiency. The algorithm requires only numerical values as input and will readily compare data other than protein hydropathy. The tool is therefore expected to complement, rather than replace, existing sequence- and structure-based tools and may inform medical discovery, as exemplified by a proposed similarity between a chlamydial ORFan protein and a bacterial colicin pore-forming domain. The source code, documentation, and a basic web-server application are available. JH Libraries Open Access Fun

Crossref

Directory of Open Access Journals

PubMed Central

JScholarship

FigShare