
Tableau-based protein substructure search using quadratic programming

Author: A Abyzov
A Caprara
A Guerler
A Harrison
AG Murzin
Alex Stivala
AM Lesk
Anthony Wirth
AP Kamat
AP Singh
AS Konagurthu
B Kolbeck
B Thiruv
BK Koo
D Fischer
D Frishman
D Gilbert
DA Pelta
E Anderson
E Krissinel
GM Torrance
HK Ho
HM Berman
I Majumdar
J Jung
J Shapiro
JA Casbon
JA Hanley
JF Gibrat
JJ Dongarra
L Holm
ML Sierk
O Carugo
Peter J Stuckey
PR Elliott
S Kirillova
S Shi
SB Needleman
SS Krishna
T Hamelryck
T Madej
T Sing
TA Davis
V Sam
W Kabsch
W Xie
Y Ye
Z Gáspári
Z Li
Publication venue: BioMed Central
Publication date: 01/05/2009

Abstract. Background: Searching for proteins that contain similar substructures is an important task in structural biology. The exact solution of most formulations of this problem, including a recently published method based on tableaux, is too slow for practical use in scanning a large database. Results: We developed an improved method for detecting substructural similarities in proteins using tableaux. Tableaux are compared efficiently by solving the quadratic program (QP) corresponding to the quadratic integer program (QIP) formulation of the extraction of maximally-similar tableaux. We compare the accuracy of the method in classifying protein folds with some existing techniques. Conclusion: We find that including constraints based on the separation of secondary structure elements increases the accuracy of protein structure search using maximally-similar subtableau extraction, to a level where it has comparable or superior accuracy to existing techniques. We demonstrate that our implementation is able to search a structural database in a matter of hours on a standard PC.

Springer - Publisher Connector

University of Melbourne Institutional Repository

Fast and accurate protein substructure searching with simulated annealing and GPUs

Abstract. Background: Searching a database of protein structures for matches to a query structure, or occurrences of a structural motif, is an important task in structural biology and bioinformatics. While there are many existing methods for structural similarity searching, faster and more accurate approaches are still required, and few current methods are capable of substructure (motif) searching. Results: We developed an improved heuristic for tableau-based protein structure and substructure searching using simulated annealing that is as fast as or faster than, and comparable in accuracy to, some widely used existing methods. Furthermore, we created a parallel implementation on a modern graphics processing unit (GPU). Conclusions: The GPU implementation achieves up to 34 times speedup over the CPU implementation of tableau-based structure search with simulated annealing, making it one of the fastest available methods. To the best of our knowledge, this is the first application of a GPU to the protein structural search problem.

CiteSeerX

Springer - Publisher Connector

University of Melbourne Institutional Repository

Recommended from our members

Network modularity and local environment similarity as descriptors of protein structure

Author: Grant William
Publication venue: University of Cambridge
Publication date: 01/12/2019

As the number of solved protein structures increases, the opportunities for meta-analysis of this dataset increase too. Here we explore two approaches for analysing protein structure, both starting from the three-dimensional co-ordinates of each atom within the structure, which are then abstracted into a more useful form. The first method transforms the protein into a network in which its amino acids are the nodes, and where the edges are generated using a simple proximity test. By applying the Infomap community detection algorithm, we can fragment the protein into highly intra-connected subregions; these subregions are compact and globular, and can be compared with known structural and functional subunits of the protein (also known as domains). By performing this fragmentation process systematically across a large set of proteins, and checking for structurally conserved fragments, we can search for novel candidate domains. This method for automatically decomposing a protein into compact substructures may also be useful in coarse-graining molecular dynamics, analysing the protein’s topology, in de novo protein design, or in fitting electron density maps derived from single particle electron microscopy. The second method calculates a descriptor for each atom of the protein based on its local environment, known as a Smooth Overlap of Atomic Positions (SOAP) descriptor. Using these descriptors we can perform overall comparisons of the subregions identified above. In addition, by comparing the descriptors of a set of proteins known to share common structural or functional features (such as binding of a particular ligand), we can automatically identify the most highly conserved atoms of the set. These atoms may line ligand binding pockets or correspond to allosteric sites, which could inform drug design.

Apollo (Cambridge)

A structure filter for the Eukaryotic Linear Motif Resource

Author: A Salsmann
A Stein
AG Murzin
Allegra Via
AW Fenton
B Brannetti
B Petersen
C Chica
Cathryn M Gould
Christine Gemünd
CJ Sigrist
CM Gould
D Durocher
E Faraggi
E Gasteiger
E Petsalaki
ED Lowe
EK Hui
F Diella
H Dinkel
H Naderi-Manesh
HM Berman
J Kadlec
K Machida
K Roovers
M Fuxreiter
M Sheng
Manuela Helmer-Citterich
MB Yaffe
MC Rodriguez
MJ Macias
NE Davey
O Hantschel
P Puntervoll
R Apweiler
R Linding
RJ Edwards
S Balla
S Miller
SE Brenner
SF Altschul
SS Shapiro
SW Cowan-Jacob
T Pawson
Team RDC
TJ Gibson
Toby J Gibson
V Neduva
W Kabsch
Publication venue: BioMed Central
Publication date: 01/01/2009

Abstract. Background: Many proteins are highly modular, being assembled from globular domains and segments of natively disordered polypeptides. Linear motifs, short sequence modules functioning independently of protein tertiary structure, are most abundant in natively disordered polypeptides but are also found in accessible parts of globular domains, such as exposed loops. The prediction of novel occurrences of known linear motifs attempts the difficult task of distinguishing functional matches from stochastically occurring non-functional matches. Although functionality can only be confirmed experimentally, confidence in a putative motif is increased if a motif exhibits attributes associated with functional instances such as occurrence in the correct taxonomic range, cellular compartment, conservation in homologues and accessibility to interacting partners. Several tools now use these attributes to classify putative motifs based on confidence of functionality. Results: Current methods assessing motif accessibility do not consider much of the information available, either predicting accessibility from primary sequence or regarding any motif occurring in a globular region as low confidence. We present a method considering accessibility and secondary structural context derived from experimentally solved protein structures to rectify this situation. Putatively functional motif occurrences are mapped onto a representative domain, given that a high quality reference SCOP domain structure is available for the protein itself or a close relative. Candidate motifs can then be scored for solvent-accessibility and secondary structure context. The scores are calibrated on a benchmark set of experimentally verified motif instances compared with a set of random matches. A combined score yields 3-fold enrichment for functional motifs assigned to high confidence classifications and 2.5-fold enrichment for random motifs assigned to low confidence classifications. The structure filter is implemented as a pipeline accessible both through a graphical interface via the ELM resource (http://elm.eu.org/) and through a Web Service protocol. Conclusion: New occurrences of known linear motifs require experimental validation as the bioinformatics tools currently have limited reliability. The ELM structure filter will aid users assessing candidate motifs presenting in globular structural regions. Most importantly, it will help users to decide whether to expend their valuable time and resources on experimental testing of interesting motif candidates.

Archivio della ricerca- Università di Roma La Sapienza

ART

canSAR: an integrated cancer public translational research and drug discovery resource

Author: Al-Lazikani Bissan
Bulusu Krishna C.
Halling-Brown Mark D.
Patel Mishal
Tym Joe E.
Publication venue: Oxford University Press
Publication date: 01/01/2011

canSAR is a fully integrated cancer research and drug discovery resource developed to utilize the growing publicly available biological annotation, chemical screening, RNA interference screening, expression, amplification and 3D structural data. Scientists can, in a single place, rapidly identify biological annotation of a target, its structural characterization, expression levels and protein interaction data, as well as suitable cell lines for experiments, potential tool compounds and similarity to known drug targets. canSAR has, from the outset, been completely use-case driven, which has dramatically influenced the design of the back-end and the functionality provided through the interfaces. The Web interface at http://cansar.icr.ac.uk provides flexible, multipoint entry into canSAR. This allows easy access to the multidisciplinary data within, including target and compound synopses, bioactivity views and expert tools for chemogenomic, expression and protein interaction network data.

CiteSeerX

Institute of Cancer Research Repository

HMMerThread: Detecting Remote, Functional Conserved Domains in Entire Genomes by Combining Relaxed Sequence-Database Searches with Fold Recognition

Author: A Bauer
A Gattiker
A Hildebrand
A Lupas
A Marchler-Bauer
AG Murzin
AJ McNairn
AW Tai
B Habermann
BD Rowland
Bianca Hermine Habermann
BR Sevetson
C Chothia
C Hertz-Fowler
C Mooney
C Ostermeier
CA Kim
CA Orengo
CE Lawrence
Charles Richard Bradshaw
CR Bradshaw
CT Eggers
D Gebauer
D Ivanov
D Kim
D Wilson
DT Jones
E Quevillon
EL Tudor
EM Ross
EM Zdobnov
EW Sayers
F Verni
G Apic
H Takatsu
I Letunic
J Amberger
J Gough
J Moult
J Schultz
J Skolnick
J Soding
JC Wootton
JD Thompson
JM Cherry
JM Peters
JW Wang
K Hofmann
K Karplus
K Katoh
K Mochizuki
K Nasmyth
K Suzuki-Utsunomiya
KD Pruitt
L Aravind
L Stein
LA Kelley
LJ McGuffin
LL Burns-Hamuro
M Ashburner
M Fukuda
M Oyen
M Remm
Matthias Stefan Mueller
MJ Sippl
MS Nielsen
MW Russo
NJ Mulder
O Lohi
Peter Csermely
R Gandhi
R Puertollano
RA Goldstein
RB Ray
RD Finn
Robert Henschel
S Hadano
S Kammerer
S Kueng
S Lee
S Li
S Tweedie
S Wu
SE Brenner
SF Altschul
SR Eddy
T Sutani
TK Chatterjee
TS Prasad
Vineeth Surendranath
VJ Lannoy
WG Tingley
Y Zhang
Y Zhu
Publication venue: Public Library of Science
Publication date: 01/01/2011

Conserved domains in proteins are one of the major sources of functional information for experimental design and genome-level annotation. Though search tools for conserved domain databases such as Hidden Markov Models (HMMs) are sensitive in detecting conserved domains in proteins when they share sufficient sequence similarity, they tend to miss more divergent family members, as they lack a reliable statistical framework for the detection of low sequence similarity. We have developed a greatly improved HMMerThread algorithm that can detect remotely conserved domains in highly divergent sequences. HMMerThread combines relaxed conserved domain searches with fold recognition to eliminate false positive, sequence-based identifications. With an accuracy of 90%, our software is able to automatically predict highly divergent members of conserved domain families with an associated 3-dimensional structure. We give additional confidence to our predictions by validation across species. We have run HMMerThread searches on eight proteomes including human and present a rich resource of remotely conserved domains, which adds significantly to the functional annotation of entire proteomes. We find ∼4500 cross-species validated, remotely conserved domain predictions in the human proteome alone. As an example, we find a DNA-binding domain in the C-terminal part of the A-kinase anchor protein 10 (AKAP10), a PKA adaptor that has been implicated in cardiac arrhythmias and premature cardiac death, which upon stress likely translocates from mitochondria to the nucleus/nucleolus. Based on our prediction, we propose that with this HLH-domain, AKAP10 is involved in the transcriptional control of stress response. Further remotely conserved domains we discuss are examples from areas such as sporulation, chromosome segregation and signalling during immune response. 
The HMMerThread algorithm is able to automatically detect the presence of remotely conserved domains in proteins based on weak sequence similarity. Our predictions open up new avenues for biological and medical studies. Genome-wide HMMerThread domains are available at http://vm1-hmmerthread.age.mpg.de

Public Library of Science (PLOS)