Mining protein loops using a structural alphabet and statistical exceptionality

Author: A Dembo, A Efimov, A Golovin, A Sacan, A Via, AC Camproux, Anne-Claude Camproux, AR Panchenko, B Oliva, BJ Polacco, BL Sibanda, BW Matthews, C Kiss, CG Hunter, CM Venkatachalam, D Leader, D Stuart, DF Burke, E Rocha, EG Hutchinson, EJ Milner-White, F den Hollander, G Ausiello, G Nuel, G Pugalenthi, GD Rose, Gregory Nuel, J Espadaler, J Martin, J van Helden, J Wojcik, JF Leszczynski, JM Kwasigroch, JS Fetrow, JS Richardson, Juliette Martin, JW Sammon, JW Torrance, KC Chou, L Regad, LE Donate, Leslie Regad, LN Johnson, LR Rabiner, LS Bernstein, M Hollander, M Mönnigmann, M Saraste, MY Leung, N Colloc'h, N Fernandez-Fuentes, O Sander, P Fuchs, PA Rice, PN Lewis, R Kolodny, S Karlin, S Kim, S Kullback, S Sourice, SA Benner, SD Rufino, V Pavone, W Kabsch, W Li, WL DeLano
Publication venue: BioMed Central
Publication date: 01/01/2010

BACKGROUND: Protein loops account for 50% of the residues in available three-dimensional protein structures. These regions are often involved in protein function, for example as binding sites or catalytic pockets. However, describing protein loops with conventional tools is difficult. Regular secondary structures (helices and strands) have been widely studied, whereas loops are hard to analyze because they are highly variable in sequence and structure. Because of data sparsity, long loops have rarely been studied systematically. RESULTS: We developed a simple and accurate method that describes and analyzes the structures of short and long loops using structural motifs, without restriction on loop length. The method is based on the structural alphabet HMM-SA, which simplifies a three-dimensional protein structure into a one-dimensional string of states, where each state is a four-residue prototype fragment called a structural letter. The difficult task of structurally grouping huge data sets is thus accomplished by handling structural-letter strings as in conventional protein sequence analysis. We systematically extracted all seven-residue fragments from a bank of 93000 protein loops and grouped them according to their structural-letter sequence, called a structural word. This approach permits a systematic analysis of loops of all sizes, since we consider structural motifs of seven residues rather than complete loops. We focused the analysis on highly recurrent words of loops (observed more than 30 times). Our study reveals that 73% of loop lengths are covered by only 3310 highly recurrent structural words (out of 28274 observed words). These structural words have low structural variability (mean RMSd of 0.85 Å). As expected, half of these motifs display a flanking-region preference, but interestingly, two thirds are shared by short (fewer than 12 residues) and long loops. Moreover, half of the recurrent motifs exhibit a significant level of amino-acid conservation, with at least four significant positions, and 87% of long loops contain at least one such word. We complemented our analysis with the detection of statistically over-represented patterns of structural letters, as in conventional DNA sequence analysis. About 30% (930) of structural words are over-represented and cover about 40% of loop lengths. Interestingly, these words exhibit lower structural variability and higher sequence specificity, suggesting structural or functional constraints. CONCLUSIONS: We developed a method to systematically decompose and study protein loops using recurrent structural motifs. This method is based on the structural alphabet HMM-SA rather than on structural alignment and geometrical parameters. We extracted meaningful structural motifs that are found in both short and long loops. To our knowledge, this is the first time that pattern mining has been used to increase the signal-to-noise ratio in protein loops. This finding helps to better describe protein loops and might reduce the complexity of long-loop analysis. Detailed results are available at http://www.mti.univ-paris-diderot.fr/publication/supplementary/2009/ACCLoop/.
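
The word-counting step described in this abstract can be illustrated with a minimal sketch. It assumes that loops are already encoded as strings of structural letters and that consecutive four-residue letters overlap by three residues, so that a seven-residue fragment corresponds to a word of four letters. The function names and the toy strings are hypothetical and are not taken from the HMM-SA software.

```python
from collections import Counter

# Assumption: four overlapping 4-residue letters span 7 residues.
WORD_LEN = 4

def count_structural_words(loops, word_len=WORD_LEN):
    """Count every overlapping window of `word_len` letters across all loops."""
    counts = Counter()
    for loop in loops:
        for i in range(len(loop) - word_len + 1):
            counts[loop[i:i + word_len]] += 1
    return counts

def recurrent_words(counts, min_count=30):
    """Keep the words observed more than `min_count` times."""
    return {word: n for word, n in counts.items() if n > min_count}

def coverage(loops, words, word_len=WORD_LEN):
    """Fraction of letter positions covered by at least one selected word."""
    covered = total = 0
    for loop in loops:
        mask = [False] * len(loop)
        for i in range(len(loop) - word_len + 1):
            if loop[i:i + word_len] in words:
                for j in range(i, i + word_len):
                    mask[j] = True
        covered += sum(mask)
        total += len(loop)
    return covered / total if total else 0.0

# Toy data (hypothetical letter strings, not real loops).
toy_loops = ["ABCDA", "ABCD", "XABCD"]
toy_counts = count_structural_words(toy_loops)
toy_recurrent = recurrent_words(toy_counts, min_count=2)
toy_coverage = coverage(toy_loops, toy_recurrent)
```

On real data the threshold would be the abstract's "more than 30 times"; the toy call lowers it to 2 only so that the three example strings produce a non-empty result.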

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

Exact distribution of a pattern in a set of random sequences generated by a Markov source: applications to biological data

Author: A Bairoch, A Denise, AC Camproux, Anne-Claude Camproux, AP Godbole, B Prum, C Gautier, DA Benson, DL Antzoulakos, E Rocha, G Churchill, G Nuel, G Reinert, GD Stormo, Gregory Nuel, J Becq, J Do, J Fu, J Kleffe, J Martin, J Van Helden, JAD Aston, JC Fu, JE Hopcroft, JM Claverie, Juliette Martin, JW Fickett, K Liolios, L Regad, Leslie Regad, M Crochemore, M Reignier, M Thomas-Chollier, MC Frith, ME Lladser, MX Geske, MY Leung, N Hulo, P Nicodème, P Nicolas, P Pevzner, P Ribeca, R Cowan, S Karlin, S Sourice, T Erhardsson, V Boeva, V Stefanov, VT Stefanov, YM Chang
Publication venue: BioMed Central
Publication date: 01/01/2010

BACKGROUND: In bioinformatics it is common to search for a pattern of interest in a potentially large set of rather short sequences (upstream gene regions, proteins, exons, etc.). Although many methodological approaches allow practitioners to compute the distribution of a pattern count in a random sequence generated by a Markov source, no specific developments have taken into account the counting of occurrences in a set of independent sequences. We aim to address this problem by deriving efficient approaches and algorithms to perform these computations for both low- and high-complexity patterns in the framework of homogeneous or heterogeneous Markov models. RESULTS: The latest advances in the field allowed us to use a technique of optimal Markov chain embedding based on deterministic finite automata to introduce three innovative algorithms. Algorithm 1 is the only one able to deal with heterogeneous models. It also avoids any convolution product of the pattern distributions in individual sequences. When working with homogeneous models, Algorithm 2 yields a dramatic reduction in complexity by taking advantage of previous computations to obtain moment generating functions efficiently. In the particular case of low- or moderate-complexity patterns, Algorithm 3 exploits power computation and binary decomposition to further reduce the time complexity to a logarithmic scale. All these algorithms, and their relative interest in comparison with existing ones, were then tested and discussed on a toy example and three biological data sets: structural patterns in protein loop structures, PROSITE signatures in a bacterial proteome, and transcription factors in upstream gene regions. On these data sets, we also compared our exact approaches to the tempting approximation that consists of concatenating the sequences in the data set into a single sequence. CONCLUSIONS: Our algorithms prove to be effective and able to handle real data sets with multiple sequences, as well as biological patterns of interest, even when the latter display a high complexity (PROSITE signatures, for example). In addition, these exact algorithms allow us to avoid the edge effect observed under the single-sequence approximation, which leads to erroneous results, especially when the marginal distribution of the model converges slowly toward the stationary distribution. We conclude with a discussion of our method and its potential improvements.
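
To make the setting concrete, here is a minimal baseline sketch of the problem this abstract addresses. It is not one of the paper's three algorithms. It assumes a single fixed word as the pattern, i.i.d. letters (an order-0 Markov source, the simplest homogeneous case), and overlapping occurrences. It embeds the count in a small deterministic finite automaton and then combines independent sequences by plain convolution, which is the step Algorithm 1 is said to avoid. All function names are hypothetical.

```python
def build_dfa(pattern, alphabet):
    """Automaton whose state is the length of the longest pattern prefix
    that is a suffix of the text read so far (overlaps allowed)."""
    m = len(pattern)
    dfa = [{} for _ in range(m + 1)]
    for state in range(m + 1):
        for letter in alphabet:
            text = pattern[:state] + letter
            k = min(m, len(text))
            while k > 0 and text[-k:] != pattern[:k]:
                k -= 1
            dfa[state][letter] = k
    return dfa

def count_distribution(pattern, probs, length):
    """Exact distribution of the occurrence count in one random sequence.
    Assumption: letters are i.i.d. with probabilities `probs`."""
    m = len(pattern)
    dfa = build_dfa(pattern, list(probs))
    max_count = max(length - m + 1, 0)
    # table[state][count] = probability of being in `state` with `count` hits
    table = [[0.0] * (max_count + 1) for _ in range(m + 1)]
    table[0][0] = 1.0
    for _ in range(length):
        new = [[0.0] * (max_count + 1) for _ in range(m + 1)]
        for state in range(m + 1):
            for count, mass in enumerate(table[state]):
                if mass == 0.0:
                    continue
                for letter, p in probs.items():
                    nxt = dfa[state][letter]
                    hit = 1 if nxt == m else 0
                    new[nxt][count + hit] += mass * p
        table = new
    return [sum(table[s][c] for s in range(m + 1)) for c in range(max_count + 1)]

def convolve(a, b):
    """Distribution of the sum of two independent counts."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def count_distribution_in_set(pattern, probs, lengths):
    """Total count over independent sequences of the given lengths."""
    total = [1.0]
    for n in lengths:
        total = convolve(total, count_distribution(pattern, probs, n))
    return total

# Toy example: word "AA" over a uniform two-letter alphabet.
uniform = {"A": 0.5, "B": 0.5}
single = count_distribution("AA", uniform, 3)
pair = count_distribution_in_set("AA", uniform, [3, 3])
```

Because each sequence restarts the automaton in its initial state, no occurrence can straddle two sequences. The concatenation approximation mentioned in the abstract loses exactly this property.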

HAL Evry

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

Incorporation of Local Structural Preference Potential Improves Fold Recognition

Author: A Poleksic, A Sali, AC Camproux, AE Torda, Aiping Wu, AR Panchenko, Bostjan Kobe, CG Hunter, CS Pettitt, DT Jones, DW Rice, E Lindahl, G Schenk, G Wang, H Zhou, HY Zhou, J Cheng, J Lundstrom, J Moult, J Peng, J Shi, J Soding, J Xu, JL Sussman, JS Yang, K Ginalski, K Kanou, K Karplus, KT Simons, L Jaroszewski, Liqing Tian, LJ McGuffin, MA Marti-Renom, ML Tress, N Fernandez-Fuentes, O Sander, O Zimmermann, P Lackner, P Rotkiewicz, PJ Silva, R Das, R Karchin, R Sadreyev, RM Bennett-Lovsey, RX Yan, S Liu, S Wu, SB Needleman, SF Altschul, SR Eddy, ST Wu, Taijiao Jiang, TF Smith, TP Li, W Boomsma, W Kabsch, W Zhang, Xiaoxi Dong, Y Hou, Y Zhang, Yang Cao, YL An, Yun Hu, Z Wang
Publication venue: Public Library of Science
Publication date: 01/01/2010

Fold recognition, or threading, is a popular protein structure modeling approach that uses templates of known structure to build models for proteins of unknown structure. The key to the success of fold recognition methods lies in the proper integration of sequence, physicochemical and structural information. Here we introduce another type of information, local structural preference potentials of 3-residue and 9-residue fragments, for fold recognition. By combining the two local structural preference potentials with the widely used sequence profile, secondary structure information and hydrophobic score, we have developed a new threading method called FR-t5 (fold recognition by use of 5 terms). In benchmark tests, we found that considering local structural preference potentials in FR-t5 not only greatly enhances alignment accuracy and recognition sensitivity, but also significantly improves the quality of the predicted models.
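
The combination of five terms can be pictured with a small sketch. The abstract does not say how FR-t5 combines its terms, so the linear weighted sum below is an assumption made only for illustration. The term names, weights and function names are likewise hypothetical.

```python
# Term names follow the abstract; the linear form and any weight values
# are illustrative assumptions, not FR-t5's actual parameters.
TERMS = ("profile", "secondary_structure", "hydrophobic",
         "local_3mer", "local_9mer")

def combined_score(term_scores, weights):
    """Weighted sum of the five term scores for one query-template pair."""
    return sum(weights[name] * term_scores[name] for name in TERMS)

def rank_templates(candidates, weights):
    """Order (template_id, term_scores) pairs from best to worst score."""
    scored = [(combined_score(scores, weights), template_id)
              for template_id, scores in candidates]
    return [template_id for _, template_id in sorted(scored, reverse=True)]
```

Setting the weights of `local_3mer` and `local_9mer` to zero reduces this sketch to the three widely used terms, which is one simple way to see what the two local potentials contribute in a benchmark.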

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Protein structure search and local structure characterization

Author: A Andreeva, AC Camproux, AG de Brevern, AR Ortiz, B Offmann, B Rost, C Benros, C Bystroff, CA Orengo, D Baker, E Appella, F Birzele, F Guyon, G Pollastri, HM Berman, IN Shindyalo, J Garnier, J Schuchhardt, J Vesanto, JA Hartigan, JM Yang, JS Fetrow, L Holm, M Carpentier, M Dudev, M Tyagi, NJ Mulder, O Sander, R Unger, S Henikoff, Shih-Yen Ku, T Madej, TL Bailey, TM Mitchell, TN Petersen, U Hobohm, VS Gowri, W Humphrey, WM Zheng, WR Pearson, Y Liu, Y Ye, Yuh-Jyh Hu
Publication venue: BioMed Central
Publication date: 01/01/2008

BACKGROUND: Structural similarities among proteins can provide valuable insight into their functional mechanisms and relationships. As the number of available three-dimensional (3D) protein structures increases, a greater variety of studies can be conducted with increasing efficiency, among which is the design of protein structural alphabets. Structural alphabets allow us to characterize local structures of proteins and describe the global folding structure of a protein using a one-dimensional (1D) sequence. Thus, 1D sequences can be used to identify structural similarities among proteins using standard sequence alignment tools such as BLAST or FASTA. RESULTS: We used self-organizing maps in combination with a minimum spanning tree algorithm to determine the optimum size of a structural alphabet and applied the k-means algorithm to group protein fragments into clusters. The centroids of these clusters defined the structural alphabet. We also developed a flexible matrix training system to build a substitution matrix (TRISUM-169) for our alphabet. Based on FASTA and using TRISUM-169 as the substitution matrix, we developed the SA-FAST alignment tool. We compared the performance of SA-FAST with that of various search tools in database-scale search tasks and found that SA-FAST was highly competitive in all tests conducted. Further, we evaluated the performance of our structural alphabet in recognizing specific structural domains of EGF and EGF-like proteins. Our method successfully recovered more EGF sub-domains using our structural alphabet than when using other structural alphabets. SA-FAST can be found at http://140.113.166.178/safast/. CONCLUSION: The goal of this project was two-fold. First, we wanted to introduce a modular design pipeline to those who have been working with structural alphabets. Second, we wanted to open the door to researchers who have done substantial work on biological sequences but have yet to enter the field of protein structure research. Our experiments showed that by transforming the structural representations from 3D to 1D, several 1D-based tools can be applied to structural analysis, including similarity searches and structural motif finding.
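
The clustering and encoding steps of this abstract can be sketched as follows. The sketch takes the alphabet size k as given, whereas the paper chooses it with self-organizing maps and a minimum spanning tree. It also assumes each fragment is already summarized as a numeric descriptor vector, with toy two-dimensional points standing in for real descriptors. The function names and the first-k initialization are hypothetical simplifications.

```python
import math

def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda k: math.dist(point, centroids[k]))

def kmeans(points, k, iterations=20):
    """Plain k-means; assumption: the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        for idx, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[idx] = [sum(c) / len(members) for c in zip(*members)]
    return centroids

def encode(fragments, centroids, letters="ABCDEFGHIJKLMNOPQRSTUVWXYZ"):
    """Turn a series of fragment descriptors into a 1D letter string."""
    return "".join(letters[nearest(f, centroids)] for f in fragments)

# Toy descriptors (hypothetical): two well-separated groups of fragments.
toy_fragments = [(0, 0), (10, 10), (0, 1), (10, 11)]
toy_alphabet = kmeans(toy_fragments, k=2)
toy_string = encode(toy_fragments, toy_alphabet)
```

Once proteins are strings such as `toy_string`, they can in principle be fed to a sequence search tool with a suitable substitution matrix, which is the role the abstract gives to FASTA and TRISUM-169.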

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Chicken genome analysis reveals novel genes encoding biotin-binding proteins related to avidin family

Author: A Pagano, A Sali, AC Camproux, AE Kel, D Sanchez, DR Flower, GJ Barton, H Nielsen, HB White III, HM Berman, HR Nordlund, HW Meslar, JV Lehtonen, L Bush, LW Hillier, M Wilchek, MJ Wallén, MK Ahlroth, ML Gope, MS Johnson, N Subramanian, NM Green, O Livnah, OH Laitinen, P Tuohimaa, PB Seshagiri, PC Weber, PE Boardman, RA Keinänen, S Freitag, S Kumar, SC Gill, SW Cowan, T Sano, VB Bajic, VP Hytönen, WE Stumph, WL DeLano
Publication venue: BioMed Central
Publication date: 01/01/2005

BACKGROUND: A chicken egg contains several biotin-binding proteins (BBPs), whose complete DNA and amino acid sequences are not known. In order to identify and characterise these genes and proteins, we studied chicken cDNAs and genes available in the NCBI database and the chicken genome database, using the reported N-terminal amino acid sequences of chicken egg-yolk BBPs as search strings. RESULTS: Two separate hits showing significant homology to these N-terminal sequences were discovered. One of these hits was located in the immediate chromosomal proximity of the avidin gene family. Both hits encode proteins having high sequence similarity with avidin, suggesting that chicken BBPs are paralogous to the avidin family. In particular, almost all residues corresponding to biotin binding in avidin are conserved in these putative BBP proteins. One of the DNA sequences found, however, seems to encode a carboxy-terminal extension not present in avidin. CONCLUSION: We describe here the predicted properties of the putative BBP genes and proteins. Our present observations link the BBP genes to the avidin gene family and shed more light on the genetic arrangement and variability of this family. In addition, comparative modelling revealed the potential structural elements important for the functional and structural properties of the putative BBP proteins.

Repository for Publications and Research Data

Jyväskylä University Digital Archive

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

TamPub Julkaisuarkisto - TamPub Institutional Repository

Trepo - Institutional Repository of Tampere University

Archivio istituzionale della ricerca - Università di Padova

Improving model construction of profile HMMs for remote homology detection through structural alignment

Author: A Andreeva, A Bateman, A Krogh, AC Camproux, Alberto MR Dávila, B Brejova, B Knudsen, B Qian, C Bystroff, C Do, C Notredame, D Feng, D Haft, F Altschul, F Goyon, Gerson Zaverucha, H Mamitsuka, I Letunic, J Espadaler, J Gough, J Park, J Shi, J Söding, J Thompson, JD Thompson, JR Beck, Juliana S Bernardes, K Bae, K Karplus, K Katoh, K Lin, K Mizuguchi, K Sjolander, L Holm, L Rabiner, M Gribskov, M Helen, M Madera, M Mendel, M Wistrand, O Sullivan, P Bourne, P Nuin, R Edgar, R Hughey, R Karchin, S Altschul, S Eddy, S Jones, T Attwood, T Mitchell, V Alexandrov, Vítor S Costa, W Majoros, W Taylor, WR Pearson, Y Hou
Publication venue: BioMed Central
Publication date: 01/01/2007

BACKGROUND: Remote homology detection is a challenging problem in bioinformatics. Arguably, profile Hidden Markov Models (pHMMs) are one of the most successful approaches to this important problem. pHMM packages have a relatively small computational cost and perform particularly well at recognizing remote homologies. This raises the question of whether structural alignments could affect the performance of pHMMs trained on proteins in the Twilight Zone, as structural alignments are often more accurate than sequence alignments at identifying motifs and functional residues. We therefore assess the impact of using structural alignments on pHMM performance. RESULTS: We used the SCOP database to perform our experiments. Structural alignments were obtained using the 3DCOFFEE and MAMMOTH-mult tools; sequence alignments were obtained using CLUSTALW, TCOFFEE, MAFFT and PROBCONS. We performed leave-one-family-out cross-validation over super-families. Performance was evaluated through ROC curves and paired two-tailed t-tests. CONCLUSION: We observed that pHMMs derived from structural alignments performed significantly better than pHMMs derived from sequence alignments in low-identity regions, mainly below 20%. We believe this is because structural alignment tools are better at focusing on the important patterns that are more often conserved through evolution, resulting in higher-quality pHMMs. On the other hand, the sensitivity of these tools is still quite low in these low-identity regions. Our results suggest a number of possible directions for improvement in this area.
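
The leave-one-family-out protocol mentioned in this abstract can be sketched as a split generator. The sketch assumes each protein is given as a (protein_id, superfamily, family) triple, that one family at a time is held out as the test set while the remaining families of the same super-family are used for training, and that super-families with a single family are skipped. The identifiers and the function name are hypothetical, and no pHMM training or scoring is shown.

```python
from collections import defaultdict

def leave_one_family_out(proteins):
    """Yield (superfamily, held_out_family, train_ids, test_ids) tuples.

    `proteins` is an iterable of (protein_id, superfamily, family) triples.
    """
    by_superfamily = defaultdict(lambda: defaultdict(list))
    for protein_id, superfamily, family in proteins:
        by_superfamily[superfamily][family].append(protein_id)
    for superfamily, families in by_superfamily.items():
        if len(families) < 2:
            continue  # assumption: nothing would be left to train on
        for held_out, test_ids in families.items():
            train_ids = [pid
                         for family, ids in families.items()
                         if family != held_out
                         for pid in ids]
            yield superfamily, held_out, train_ids, test_ids

# Toy input with hypothetical identifiers.
toy_proteins = [
    ("p1", "sfA", "f1"),
    ("p2", "sfA", "f1"),
    ("p3", "sfA", "f2"),
    ("p4", "sfB", "f3"),
]
toy_splits = list(leave_one_family_out(toy_proteins))
```

Each split would then train one model on the alignment of `train_ids` and score `test_ids` against it, so that every test protein is related to the training set only at the super-family level.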

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Detection of a Fourth Orbivirus Non-Structural Protein

The genus Orbivirus includes both insect-borne and tick-borne viruses. The orbivirus genome, composed of 10 segments of dsRNA, encodes 7 structural proteins (VP1–VP7) and 3 non-structural proteins (NS1–NS3). An open reading frame (ORF) that spans almost the entire length of genome segment-9 (Seg-9) encodes VP6 (the viral helicase). However, bioinformatic analysis recently identified an overlapping ORF (ORFX) in Seg-9. We show that ORFX encodes a new non-structural protein, identified here as NS4. Western blotting and confocal fluorescence microscopy, using antibodies raised against recombinant NS4 from Bluetongue virus (BTV, which is insect-borne) or Great Island virus (GIV, which is tick-borne), demonstrate that these proteins are synthesised in BTV- or GIV-infected mammalian cells, respectively. BTV NS4 is also expressed in Culicoides insect cells. NS4 forms aggregates throughout the cytoplasm as well as in the nucleus, consistent with the identification of nuclear localisation signals within the NS4 sequence. Bioinformatic analyses indicate that NS4 contains coiled-coils, is related to proteins that bind nucleic acids or are associated with membranes, and shows similarities to nucleolar protein UTP20 (a processome subunit). Recombinant NS4 of GIV protects dsRNA from degradation by endoribonucleases of the RNAse III family, indicating that it interacts with dsRNA. However, BTV NS4, which is only half the putative size of the GIV NS4, did not protect dsRNA from RNAse III cleavage. NS4 of both GIV and BTV protects DNA from degradation by DNAse. NS4 was found to associate with lipid droplets in cells infected with BTV or GIV, or transfected with a plasmid expressing NS4.

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Oxford University Research Archive