9 research outputs found

Discriminative structural approaches for enzyme active-site prediction

Author: A Stark
AC Wallace
B Colson
CS Wright
DG Kendall
EC Webb
GJ Kleywegt
JA Barker
JS Fetrow
JW Torrance
KC Chou
L Holm
M Gribskov
N Nagano
Nozomi Nagano
PF Gherardini
RA Laskowski
T Hastie
T Kato
Tsuyoshi Kato
VA Ivanisenko
Y Loewenstein
Publication venue: BioMed Central
Publication date: 01/01/2011

Abstract. Background: Predicting enzyme active sites in proteins is important not only for protein science but also for practical applications such as drug design. Because enzyme reaction mechanisms depend on the local structures of active sites, various template-based methods that compare local structures in proteins have been developed. To compare such local sites, a simple measure, the root-mean-square deviation (RMSD), has been used so far. Results: This paper introduces new machine learning algorithms that refine the similarity/deviation measure used to compare local structures. The measure is applied in two settings, single-template analysis and multiple-template analysis. In single-template analysis, a single template is used as a query to search proteins for active sites; in multiple-template analysis, a protein structure is used as the query and possible active sites are discovered using a set of templates. Conclusions: The experiments show that the machine learning algorithms improve the similarity/deviation measurements in both analyses.
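As an illustration of the baseline measure mentioned in this abstract, the following minimal sketch computes a plain RMSD between a template site and a candidate site. It assumes the two sites are already superposed and that atoms are paired by index; the function name `rmsd` and the toy coordinates are hypothetical and do not come from the paper, whose learned similarity measure is not reproduced here.

```python
import math

def rmsd(template, candidate):
    """Root-mean-square deviation between two lists of (x, y, z) points.

    Assumption: both sites are already superposed and atoms are
    paired by index, so no rotation or translation is fitted here.
    """
    if len(template) != len(candidate) or not template:
        raise ValueError("sites must be non-empty and the same size")
    # Sum of squared coordinate differences over all paired atoms.
    squared = sum(
        (a - b) ** 2
        for p, q in zip(template, candidate)
        for a, b in zip(p, q)
    )
    return math.sqrt(squared / len(template))

# Toy data: the candidate is the template shifted by 1 along z.
template = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
candidate = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (0.0, 1.0, 1.0)]
print(rmsd(template, candidate))  # 1.0
```

In practice a template search would first fit an optimal superposition before measuring the deviation; that step is omitted to keep the sketch short.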

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Mining protein loops using a structural alphabet and statistical exceptionality

Author: A Dembo
A Efimov
A Golovin
A Sacan
A Via
AC Camproux
Anne-Claude Camproux
AR Panchenko
B Oliva
BJ Polacco
BL Sibanda
BW Matthews
C Kiss
CG Hunter
CM Venkatachalam
D Leader
D Stuart
DF Burke
E Rocha
EG Hutchinson
EJ Milner-White
F den Hollander
G Ausiello
G Nuel
G Pugalenthi
GD Rose
Gregory Nuel
J Espadaler
J Martin
J van Helden
J Wojcik
JF Leszczynski
JM Kwasigroch
JS Fetrow
JS Richardson
Juliette Martin
JW Sammon
JW Torrance
KC Chou
L Regad
LE Donate
Leslie Regad
LN Johnson
LR Rabiner
LS Bernstein
M Hollander
M Mönnigmann
M Saraste
MY Leung
N Colloc'h
N Fernandez-Fuentes
O Sander
P Fuchs
PA Rice
PN Lewis
R Kolodny
S Karlin
S Kim
S Kullback
S Sourice
SA Benner
SD Rufino
V Pavone
W Kabsch
W Li
WL DeLano
Publication venue: BioMed Central
Publication date: 01/01/2010

Abstract. Background: Protein loops encompass 50% of protein residues in available three-dimensional structures. These regions are often involved in protein function, e.g. in binding sites and catalytic pockets. However, describing protein loops with conventional tools is difficult. Regular secondary structures, helices and strands, have been widely studied, whereas loops are hard to analyze because they are highly variable in sequence and structure. Due to data sparsity, long loops have rarely been studied systematically. Results: We developed a simple and accurate method that allows the description and analysis of the structures of short and long loops using structural motifs, without restriction on loop length. This method is based on the structural alphabet HMM-SA. HMM-SA simplifies a three-dimensional protein structure into a one-dimensional string of states, where each state is a four-residue prototype fragment called a structural letter. The difficult task of structurally grouping huge data sets is thus easily accomplished by handling structural-letter strings as in conventional protein sequence analysis. We systematically extracted all seven-residue fragments in a bank of 93000 protein loops and grouped them according to their structural-letter sequence, called a structural word. This approach permits a systematic analysis of loops of all sizes, since we consider structural motifs of seven residues rather than complete loops. We focused the analysis on highly recurrent loop words (observed more than 30 times). Our study reveals that 73% of loop-lengths are covered by only 3310 highly recurrent structural words (out of 28274 observed words). These structural words have low structural variability (mean RMSd of 0.85 Å). As expected, half of these motifs display a flanking-region preference, but interestingly, two thirds are shared by short (less than 12 residues) and long loops. Moreover, half of the recurrent motifs exhibit a significant level of amino-acid conservation, with at least four significant positions, and 87% of long loops contain at least one such word. We complement our analysis with the detection of statistically over-represented patterns of structural letters, as in conventional DNA sequence analysis. About 30% (930) of structural words are over-represented and cover about 40% of loop lengths. Interestingly, these words exhibit lower structural variability and higher sequential specificity, suggesting structural or functional constraints. Conclusions: We developed a method to systematically decompose and study protein loops using recurrent structural motifs. This method is based on the structural alphabet HMM-SA and not on structural alignment and geometrical parameters. We extracted meaningful structural motifs that are found in both short and long loops. To our knowledge, this is the first time that pattern mining has helped to increase the signal-to-noise ratio in protein loops. This finding helps to better describe protein loops and might reduce the complexity of long-loop analysis. Detailed results are available at http://www.mti.univ-paris-diderot.fr/publication/supplementary/2009/ACCLoop/.
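The word-extraction step described in this abstract can be sketched roughly as follows. The sketch assumes that consecutive structural letters describe four-residue windows shifted by one residue, so that a seven-residue fragment corresponds to four letters; the function names, the toy letter strings, and the toy threshold of 2 (the abstract uses 30) are all hypothetical, and the HMM-SA encoding itself is not implemented.

```python
from collections import Counter

# Assumption: 4 overlapping four-residue letters span 4 + 3 = 7 residues.
WORD_LETTERS = 4

def structural_words(letters, k=WORD_LETTERS):
    """All overlapping k-letter words in one structural-letter string."""
    return [letters[i:i + k] for i in range(len(letters) - k + 1)]

def recurrent_words(loops, min_count):
    """Words observed more than min_count times across all loops."""
    counts = Counter(w for loop in loops for w in structural_words(loop))
    return {w: n for w, n in counts.items() if n > min_count}

def coverage(loop, words, k=WORD_LETTERS):
    """Fraction of letter positions covered by at least one given word."""
    covered = [False] * len(loop)
    for i in range(len(loop) - k + 1):
        if loop[i:i + k] in words:
            for j in range(i, i + k):
                covered[j] = True
    return sum(covered) / len(loop) if loop else 0.0

# Toy loops written with arbitrary letters, not real HMM-SA output.
loops = ["ABCDAB", "ABCDE", "XABCD"]
frequent = recurrent_words(loops, min_count=2)
print(frequent)                     # {'ABCD': 3}
print(coverage("XABCD", frequent))  # 0.8
```

Because words are counted per fragment rather than per loop, the same counting applies unchanged to short and long loops, which is the property the abstract emphasizes.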

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

Protein structure search and local structure characterization

Author: A Andreeva
AC Camproux
AG de Brevern
AR Ortiz
B Offmann
B Rost
C Benros
C Bystroff
CA Orengo
D Baker
E Appella
F Birzele
F Guyon
G Pollastri
HM Berman
IN Shindyalo
J Garnier
J Schuchhardt
J Vesanto
JA Hartigan
JM Yang
JS Fetrow
L Holm
M Carpentier
M Dudev
M Tyagi
NJ Mulder
O Sander
R Unger
S Henikoff
Shih-Yen Ku
T Madej
TL Bailey
TM Mitchell
TN Petersen
U Hobohm
VS Gowri
W Humphrey
WM Zheng
WR Pearson
Y Liu
Y Ye
Yuh-Jyh Hu
Publication venue: BioMed Central
Publication date: 01/01/2008

Abstract. Background: Structural similarities among proteins can provide valuable insight into their functional mechanisms and relationships. As the number of available three-dimensional (3D) protein structures increases, a greater variety of studies can be conducted with increasing efficiency, among them the design of protein structural alphabets. Structural alphabets allow us to characterize local structures of proteins and to describe the global fold of a protein using a one-dimensional (1D) sequence. Thus, 1D sequences can be used to identify structural similarities among proteins using standard sequence alignment tools such as BLAST or FASTA. Results: We used self-organizing maps in combination with a minimum spanning tree algorithm to determine the optimum size of a structural alphabet and applied the k-means algorithm to group protein fragments into clusters. The centroids of these clusters defined the structural alphabet. We also developed a flexible matrix training system to build a substitution matrix (TRISUM-169) for our alphabet. Based on FASTA and using TRISUM-169 as the substitution matrix, we developed the SA-FAST alignment tool. We compared the performance of SA-FAST with that of various search tools in database-scale search tasks and found that SA-FAST was highly competitive in all tests conducted. Further, we evaluated the performance of our structural alphabet in recognizing specific structural domains of EGF and EGF-like proteins. Our method recovered more EGF sub-domains with our structural alphabet than with other structural alphabets. SA-FAST can be found at http://140.113.166.178/safast/. Conclusion: The goal of this project was two-fold. First, we wanted to introduce a modular design pipeline to those who have been working with structural alphabets. Second, we wanted to open the door to researchers who have done substantial work on biological sequences but have yet to enter the field of protein structure research. Our experiments showed that by transforming structural representations from 3D to 1D, several 1D-based tools can be applied to structural analysis, including similarity searches and structural motif finding.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Recruitment of rare 3-grams at functional sites: Is this a mechanism for increasing enzyme specificity?

Abstract. Background: A wealth of unannotated and functionally unknown protein sequences has accumulated in recent years with rapid progress in sequence genomics, giving rise to ever-increasing demand for methods that efficiently assess functional sites. Sequence and structure conservation have traditionally been the major criteria adopted in various algorithms to identify functional sites. Here, we focus on the distributions of the 20^3 different types of 3-grams (triplets of sequentially contiguous amino acids) in the entire space of sequences accumulated to date in the UniProt database, and in particular on the rare 3-grams distinguished by their high entropy-based information content. Results: Comparison of the UniProt distributions with those observed at or near the active sites in a non-redundant dataset of 59 enzyme/ligand complexes shows that active sites preferentially recruit 3-grams distinguished by their low frequency in UniProt. Three cases, Src kinase, hemoglobin, and tyrosyl-tRNA synthetase, are discussed in detail to illustrate the biological significance of the results. Conclusion: The results suggest that recruitment of rare 3-grams may be an efficient mechanism for increasing specificity at functional sites. Rareness/scarcity emerges as a feature that may assist in identifying key sites for protein function, providing information complementary to that derived from sequence alignments. In addition, it provides (for the first time) a means of identifying potentially functional sites from sequence information alone, when sequence conservation properties are not available.
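A rough sketch of how 3-gram rarity could be scored is given below. It counts overlapping 3-grams in a background set of sequences and uses the self-information -log2(p) as a stand-in for the "entropy-based information content" named in the abstract; that choice of score, the function names, and the toy sequences are assumptions, not the paper's exact procedure.

```python
import math
from collections import Counter

def three_gram_counts(sequences):
    """Count overlapping 3-grams over a collection of sequences."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - 2):
            counts[seq[i:i + 3]] += 1
    return counts

def information_content(counts):
    """Self-information -log2(p) of each observed 3-gram (assumed score)."""
    total = sum(counts.values())
    return {gram: -math.log2(n / total) for gram, n in counts.items()}

# Toy background standing in for a sequence database.
background = ["ACACACAC", "ACAWAC"]
counts = three_gram_counts(background)
info = information_content(counts)

# Rank 3-grams from rarest (highest information) to most common.
for gram in sorted(info, key=info.get, reverse=True):
    print(gram, counts[gram], round(info[gram], 2))
```

With a real background, the 3-grams found at annotated active-site positions could then be compared against this ranking to see whether they fall among the rare ones.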

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Farm characteristics and management routines related to cow longevity: a survey among Swedish dairy farmers

Author: A Ruston
A Vries De
AB Kudahl
AC Boulton
Anki Roth
C Svensson
DB Rubin
E Strandberg
F Beaudeau
FR Allaire
H Bergeå
Ian Dohoo
IR Dohoo
J Ettema
J Fetrow
J Kaler
J Textor
JE Duval
JR Knapp
JS Brickell
K Alvåsen
KA Weigel
Karin Alvåsen
M Vries de
MDP Schneider
MM Kelleher
PJ Pinedo
PJ Rajala-Schultz
PM VanRaden
RAJ Nicholas
SC Archer
T Ahlman
T Chamberlain
T Pritchard
Ulf Emanuelson
Publication venue: Springer Science and Business Media LLC

Crossref

Recommended from our members

InterCarb: A Community Effort to Improve Interlaboratory Standardization of the Carbonate Clumped Isotope Thermometer Using Carbonate Standards.

Author: Affek HP
Anderson N
Bajnai D
Barkan E
Bergmann KD
Bernasconi SM
Beverly E
Blamart D
Bonifacie M
Burgener L
Calmels D
Chaduteau C
Clog M
Davidheiser-Kroll B
Davies A
Daëron M
Dux F
Eiler J
Elliott B
Fetrow AC
Fiebig J
Goldberg S
Hermoso M
Huntington KW
Hyland E
Ingalls M
Jaggi M
John CM
Jost AB
Katz S
Kelson J
Kluge T
Kocken IJ
Laskar A
Leutert TJ
Liang D
Lucarelli J
Mackey TJ
Mangenot X
Meckler AN
Meinicke N
Modestou SE
Murray S
Müller IA
Neary A
Packard N
Passey BH
Pelletier E
Petersen S
Piasecki A
Schauer A
Snell KE
Swart PK
Tripati A
Upadhyay D
Vennemann T
Winkelstern I
Yarian D
Yoshida N
Zhang N
Ziegler M
Publication venue: eScholarship, University of California
Publication date: 01/05/2021

Increased use and improved methodology of carbonate clumped isotope thermometry have greatly enhanced our ability to interrogate a suite of Earth-system processes. However, interlaboratory discrepancies in quantifying carbonate clumped isotope (Δ47) measurements persist, and their specific sources remain unclear. To address interlaboratory differences, we first provide consensus values from the clumped isotope community for four carbonate standards relative to heated and equilibrated gases with 1,819 individual analyses from 10 laboratories. Then we analyzed the four carbonate standards along with three additional standards, spanning a broad range of δ47 and Δ47 values, for a total of 5,329 analyses on 25 individual mass spectrometers from 22 different laboratories. Treating three of the materials as known standards and the other four as unknowns, we find that the use of carbonate reference materials is a robust method for standardization that yields interlaboratory discrepancies entirely consistent with intralaboratory analytical uncertainties. Carbonate reference materials, along with measurement and data processing practices described herein, provide the carbonate clumped isotope community with a robust approach to achieve interlaboratory agreement as we continue to use and improve this powerful geochemical tool. We propose that carbonate clumped isotope data normalized to the carbonate reference materials described in this publication should be reported as Δ47 (I-CDES) values for Intercarb-Carbon Dioxide Equilibrium Scale.

eScholarship - University of California