Search CORE

233 research outputs found

A fast indexing approach for protein structure comparison

Author: Arun S Konagurthu
James Bailey
Kotagiri Ramamohanarao
Lei Zhang
Publication venue: Springer Nature
Publication date: 01/01/2010

BACKGROUND: Protein structure comparison is a fundamental task in structural biology. While the number of known protein structures has grown rapidly over the last decade, searching a large database of protein structures is still relatively slow using existing methods. There is a need for new techniques which can rapidly compare protein structures, whilst maintaining high matching accuracy. RESULTS: We have developed IR Tableau, a fast protein comparison algorithm, which leverages the tableau representation to compare protein tertiary structures. IR Tableau compares tableaux using information retrieval style feature indexing techniques. Experimental analysis on the ASTRAL SCOP protein structural domain database demonstrates that IR Tableau achieves two orders of magnitude speedup over the search times of existing methods, while producing search results of comparable accuracy. CONCLUSION: We show that it is possible to obtain very significant speedups for the protein structure comparison problem, by employing an information retrieval style approach for indexing proteins. The comparison accuracy achieved is also strong, thus opening the way for large scale processing of very large protein structure databases.
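
As a rough illustration of what information retrieval style feature indexing can look like, the Python sketch below builds an inverted index over bags of discrete features and ranks database entries by cosine similarity. It is not the IR Tableau implementation: the feature strings (such as `"HH:P"`), the function names `build_index` and `search`, and the choice of cosine scoring are all assumptions made for the example.

```python
from collections import Counter, defaultdict
import math


def build_index(db):
    """Build an inverted index from a dict of structure id -> list of features.

    The feature encoding is hypothetical; any hashable token works.
    """
    index = defaultdict(dict)  # feature -> {structure id: count}
    norms = {}                 # structure id -> Euclidean norm of its count vector
    for sid, feats in db.items():
        counts = Counter(feats)
        norms[sid] = math.sqrt(sum(c * c for c in counts.values()))
        for feat, c in counts.items():
            index[feat][sid] = c
    return index, norms


def search(query_feats, index, norms, top_k=3):
    """Rank indexed structures by cosine similarity to the query features.

    Only structures sharing at least one feature with the query are visited,
    which is what makes an inverted index fast on large databases.
    """
    q = Counter(query_feats)
    q_norm = math.sqrt(sum(c * c for c in q.values()))
    dots = defaultdict(float)
    for feat, qc in q.items():
        for sid, c in index.get(feat, {}).items():
            dots[sid] += qc * c
    scores = {sid: d / (q_norm * norms[sid]) for sid, d in dots.items()}
    # Sort by descending score, breaking ties by id for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


# Toy database with made-up feature tokens (assumption, not real tableau codes).
toy_db = {
    "d1": ["HH:P", "HE:A", "EE:P"],
    "d2": ["HH:P", "HH:P", "HE:A"],
    "d3": ["EE:A", "EE:A"],
}
toy_index, toy_norms = build_index(toy_db)
toy_hits = search(["HH:P", "HE:A"], toy_index, toy_norms)
```

The point of the design is that query time depends on the number of postings touched rather than on the number of pairwise structure alignments, which is one plausible source of the kind of speedup the abstract reports.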

Springer - Publisher Connector

PubMed Central

University of Melbourne Institutional Repository

A fast indexing approach for protein structure comparison

Author: A Lesk
A Stivala
A Tversky
AG Murzin
AM Lesk
AP Kamat
Arun S Konagurthu
AS Konagurthu
AS Konagurthu
CA Orengo
E Krissinel
ES Shih
ES Shih
ESC Shih
FM Richards
HM Berman
I Michalopoulos
J Shapiro
James Bailey
JF Gibrat
Kotagiri Ramamohanarao
L Holm
Lei Zhang
M Carpentier
O Carugo
P Jaccard
S Kirillova
SE Brenner
SF Altschul
T Madej
W Lo
W Lo
W Lo
WL Delano
Publication venue: 'Springer Science and Business Media LLC'

Crossref

ProDis-ContSHC: learning protein dissimilarity measures and hierarchical context coherently for protein-protein comparison in protein database retrieval

Author: Gao Xin
Li Yongping
Wang Jingyan
Wang Quanquan
Publication venue: BioMed Central
Publication date: 01/01/2012

Abstract Background The need to retrieve or classify protein molecules using structure or sequence-based similarity measures underlies a wide range of biomedical applications. Traditional protein search methods rely on a pairwise dissimilarity/similarity measure for comparing a pair of proteins. Such pairwise measures neglect the distribution of the other proteins in the database and thus cannot satisfy the need for highly accurate retrieval systems. Recent work in the machine learning community has shown that exploiting the global structure of the database and learning contextual dissimilarity/similarity measures can improve retrieval performance significantly. However, most existing contextual dissimilarity/similarity learning algorithms work in an unsupervised manner, which does not use the known class labels of proteins in the database. Results In this paper, we propose a novel protein-protein dissimilarity learning algorithm, ProDis-ContSHC. ProDis-ContSHC regularizes an existing dissimilarity measure $d_{ij}$ by considering the contextual information of the proteins. The context of a protein is defined by its neighboring proteins. The basic idea is that, for a pair of proteins $(i, j)$, if their contexts $\mathcal{N}(i)$ and $\mathcal{N}(j)$ are similar to each other, the two proteins should also have a high similarity. We implement this idea by regularizing $d_{ij}$ by a factor learned from the contexts $\mathcal{N}(i)$ and $\mathcal{N}(j)$. Moreover, we divide the context into hierarchical sub-contexts and obtain a contextual dissimilarity vector for each protein pair. Using the class label information of the proteins, we select relevant protein pairs (pairs with the same class label) and irrelevant protein pairs (pairs with different labels), and train an SVM model to distinguish between their contextual dissimilarity vectors. The SVM model is further used to learn a supervised regularizing factor. Finally, with the new Supervised learned Dissimilarity measure, we update the Protein Hierarchical Context Coherently in an iterative algorithm--ProDis-ContSHC. We test the performance of ProDis-ContSHC on two benchmark sets, i.e., the ASTRAL 1.73 database and the FSSP/DALI database. Experimental results demonstrate that plugging our supervised contextual dissimilarity measures into the retrieval systems significantly outperforms the context-free dissimilarity/similarity measures and other unsupervised contextual dissimilarity measures that do not use the class label information. Conclusions Using the contextual proteins with their class labels in the database, we can dramatically improve the accuracy of the pairwise dissimilarity/similarity measures for protein retrieval tasks. In this work, for the first time, we propose the idea of supervised contextual dissimilarity learning, resulting in the ProDis-ContSHC algorithm. Among different contextual dissimilarity learning approaches that can be used to compare a pair of proteins, ProDis-ContSHC provides the highest accuracy. Finally, ProDis-ContSHC compares favorably with other methods reported in the recent literature.
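
The core idea of regularizing a pairwise dissimilarity by the similarity of the two items' neighbourhoods can be sketched in a few lines of Python. This is a deliberately simplified, unsupervised stand-in, not ProDis-ContSHC itself: there is no hierarchical sub-context and no SVM, and the Jaccard overlap, the `alpha` weight and the function names are assumptions chosen for illustration.

```python
def knn_context(d, i, k):
    """Return the indices of the k nearest neighbours of item i under matrix d."""
    others = [j for j in range(len(d)) if j != i]
    # Ties are broken by index so the result is deterministic.
    return set(sorted(others, key=lambda j: (d[i][j], j))[:k])


def contextual_dissimilarity(d, k=2, alpha=0.5):
    """Shrink d[i][j] when the contexts of i and j overlap.

    d     : square, symmetric dissimilarity matrix (list of lists)
    k     : context size (assumed hyperparameter)
    alpha : strength of the regularization in [0, 1] (assumed hyperparameter)
    """
    n = len(d)
    ctx = [knn_context(d, i, k) for i in range(n)]
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Each item is added to its own context so that mutual
            # nearest neighbours count as sharing context.
            a = ctx[i] | {i}
            b = ctx[j] | {j}
            overlap = len(a & b) / len(a | b)  # Jaccard index in [0, 1]
            out[i][j] = d[i][j] * (1.0 - alpha * overlap)
    return out


# Toy example: four items on a line forming two tight pairs.
points = [0, 1, 10, 11]
toy_d = [[abs(p - q) for q in points] for p in points]
toy_out = contextual_dissimilarity(toy_d, k=1)
```

In this toy setting, items inside a tight pair have identical contexts and are pulled closer together, while items from different pairs share no context and keep their original dissimilarity. The supervised method described in the abstract replaces the fixed overlap factor with one learned from class labels.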

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Efficient and automated large-scale detection of structural relationships in proteins with a flexible aligner

Author: Damien P. Devos
Felipe Rodriguez-Valenzuela
Fernando I. Gutiérrez
Francisco Melo
Ignacio L. Ibarra
Publication venue: Springer Nature
Publication date: 01/01/2016

Springer - Publisher Connector

Tableau-based protein substructure search using quadratic programming

Author: A Abyzov
A Caprara
A Caprara
A Guerler
A Harrison
AG Murzin
Alex Stivala
AM Lesk
Anthony Wirth
AP Kamat
AP Singh
AS Konagurthu
AS Konagurthu
B Kolbeck
B Thiruv
BK Koo
D Fischer
D Frishman
D Gilbert
DA Pelta
E Anderson
E Krissinel
GM Torrance
HK Ho
HM Berman
I Majumdar
J Jung
J Shapiro
JA Casbon
JA Hanley
JF Gibrat
JJ Dongarra
L Holm
ML Sierk
O Carugo
Peter J Stuckey
PR Elliott
S Kirillova
S Shi
SB Needleman
SS Krishna
T Hamelryck
T Madej
T Sing
TA Davis
TA Davis
TA Davis
TA Davis
V Sam
W Kabsch
W Xie
Y Ye
Y Ye
Y Ye
Z Gáspári
Z Li
Publication venue: BioMed Central
Publication date: 01/05/2009

Abstract Background Searching for proteins that contain similar substructures is an important task in structural biology. The exact solution of most formulations of this problem, including a recently published method based on tableaux, is too slow for practical use in scanning a large database. Results We developed an improved method for detecting substructural similarities in proteins using tableaux. Tableaux are compared efficiently by solving the quadratic program (QP) corresponding to the quadratic integer program (QIP) formulation of the extraction of maximally-similar tableaux. We compare the accuracy of the method in classifying protein folds with some existing techniques. Conclusion We find that including constraints based on the separation of secondary structure elements increases the accuracy of protein structure search using maximally-similar subtableau extraction, to a level where it has comparable or superior accuracy to existing techniques. We demonstrate that our implementation is able to search a structural database in a matter of hours on a standard PC.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

University of Melbourne Institutional Repository

Fast and accurate protein substructure searching with simulated annealing and GPUs

Abstract Background Searching a database of protein structures for matches to a query structure, or occurrences of a structural motif, is an important task in structural biology and bioinformatics. While there are many existing methods for structural similarity searching, faster and more accurate approaches are still required, and few current methods are capable of substructure (motif) searching. Results We developed an improved heuristic for tableau-based protein structure and substructure searching using simulated annealing that is as fast as or faster than, and comparable in accuracy to, some widely used existing methods. Furthermore, we created a parallel implementation on a modern graphics processing unit (GPU). Conclusions The GPU implementation achieves up to 34 times speedup over the CPU implementation of tableau-based structure search with simulated annealing, making it one of the fastest available methods. To the best of our knowledge, this is the first application of a GPU to the protein structural search problem.
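
To make the simulated annealing idea concrete, here is a minimal Python sketch that searches for an injective mapping from query secondary structure elements to target elements. It is not the published heuristic or its GPU version. It assumes each structure is summarised as a square matrix of pairwise angles in degrees, which is a simplification of a tableau, and the 30 degree tolerance, the cooling schedule and the function names are illustrative choices.

```python
import math
import random


def match_score(a, b, mapping, tol=30.0):
    """Count query element pairs whose angle agrees with the mapped target pair.

    Angle wrap-around at +/-180 degrees is ignored for simplicity.
    """
    n = len(a)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            if abs(a[i][j] - b[mapping[i]][mapping[j]]) <= tol:
                score += 1
    return score


def anneal(a, b, steps=2000, t0=2.0, cooling=0.995, seed=0):
    """Heuristically maximise match_score over injective mappings query -> target."""
    rng = random.Random(seed)
    n, m = len(a), len(b)
    if n > m:
        raise ValueError("query must not have more elements than target")
    mapping = rng.sample(range(m), n)  # random injective starting point
    cur = match_score(a, b, mapping)
    best, best_map = cur, mapping[:]
    t = t0
    for _ in range(steps):
        cand = mapping[:]
        i = rng.randrange(n)
        new = rng.randrange(m)
        if new in cand:
            # Swapping two assignments keeps the mapping injective.
            k = cand.index(new)
            cand[i], cand[k] = cand[k], cand[i]
        else:
            cand[i] = new
        s = match_score(a, b, cand)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if s >= cur or rng.random() < math.exp((s - cur) / t):
            mapping, cur = cand, s
            if cur > best:
                best, best_map = cur, mapping[:]
        t = max(t * cooling, 1e-3)  # geometric cooling with a floor
    return best_map, best
```

Because every candidate mapping can be scored independently and many query-target pairs can be annealed at once, a search of this shape is naturally parallel, which is consistent with the GPU speedup the abstract describes.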

CiteSeerX

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

University of Melbourne Institutional Repository

Reduced representation of protein structure: implications on efficiency and scope of detection of structural similarity

Author: Lee Hwee Kuan
Mihalek Ivana
Zhang Zong Hong
Publication venue: BioMed Central
Publication date: 01/01/2010

Abstract Background Computational comparison of two protein structures is the starting point of many methods that build on existing knowledge, such as structure modeling (including modeling of protein complexes and conformational changes), molecular replacement, or annotation by structural similarity. In a commonly used strategy, significant effort is invested in matching two sets of atoms. In a complementary approach, a global descriptor is assigned to the overall structure, thus losing track of the substructures within. Results Using a small set of geometric features, we define a reduced representation of protein structure, together with an optimizing function for matching two representations, to provide a pre-filtering stage in a database search. We show that, in a straightforward implementation, the representation performs well in terms of resolution in the space of protein structures, and its ability to make new predictions. Conclusions Perhaps unexpectedly, a substantial discriminating power already exists at the level of main features of protein structure, such as directions of secondary structural elements, possibly constrained by their sequential order. This can be used toward efficient comparison of protein (sub)structures, allowing for various degrees of conformational flexibility within the compared pair, which in turn can be used for modeling by homology of protein structure and dynamics.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

ScholarBank@NUS

University of Queensland eSpace

deconSTRUCT: general purpose protein database search on the substructure level

Author: Bharatham Kavitha
Mihalek Ivana
Sherman Westley A.
Zhang Zong Hong
Publication venue: Oxford University Press
Publication date: 01/07/2010

The deconSTRUCT webserver offers an interface to a protein database search engine for general-purpose detection of similar protein (sub)structures. Initially, it deconstructs the query structure into its secondary structure elements (SSEs) and reassembles the match to the target by requiring a (tunable) degree of similarity in the direction and sequential order of SSEs. Hierarchical organization and judicious use of the information about protein structure enable deconSTRUCT to achieve the sensitivity and specificity of the established search engines at speeds orders of magnitude higher, without irretrievably tying up the substructure information in the form of a hash. In a post-processing step, a match on the level of the backbone atoms is constructed. The results presented to the user consist of the list of the matched SSEs, the transformation matrix for rigid superposition of the structures, and several visualization options, both downloadable and implemented as a web-browser plug-in. The server is available at http://epsf.bmad.bii.a-star.edu.sg/struct_server.html

PubMed Central

University of Queensland eSpace

Minimum message length inference of secondary structure from protein coordinate data

Author: A. M. Lesk
A. S. Konagurthu
Colloc'h
Cuff
Dupuis
Fodje
Frishman
Kabsch
King
L. Allison
Lesk
Levitt
Majumdar
Martin
Pauling
Richards
Richardson
Sklenar
Srinivasan
Taylor
Zhang
Publication venue: Oxford University Press
Publication date: 01/01/2012

Motivation: Secondary structure underpins the folding pattern and architecture of most proteins. Accurate assignment of the secondary structure elements is therefore an important problem. Although many approximate solutions of the secondary structure assignment problem exist, the statement of the problem has resisted a consistent and mathematically rigorous definition. A variety of comparative studies have highlighted major disagreements in the way the available methods define and assign secondary structure to coordinate data

Crossref

PubMed Central

Monash University Research Portal

Exploration des structures secondaires de l’ARN (Exploration of RNA secondary structures)

Author: Glouzon Jean-Pierre Séhi
Publication venue: 'Universite de Sherbrooke'
Publication date: 01/01/2017

In the digital age, extracting value from data by giving it meaning is a key challenge for supporting strategic decision-making in various fields, notably digital marketing and health, or, in our context, for a better understanding of the biology of nucleic acid structures. One of the major challenges of structural biology concerns the study of the structures of ribonucleic acids (RNA) and the effects of these structures and their alterations on RNA function. Contributing to this important challenge is the objective of this thesis, which focuses mainly on the development of methods and tools for the efficient exploration of RNA secondary structures. Indeed, exploring RNA secondary structures helps to shed light on their function and to better define their specific involvement in cellular processes. In this context we developed the super-n-motifs model, which contributes to a better representation of the structural complexity of RNAs and offers an efficient means of evaluating the similarity of RNA structures while taking this complexity into account. The super-n-motifs model facilitates the study of RNAs whose role is unknown. It makes it possible to formulate hypotheses about the function or functions of RNAs when they share an unequivocal structural similarity. We also developed the structurexplor platform to facilitate the exploration of secondary structures, that is, to make it possible, in a few clicks, to characterise populations of RNA structures by, for example, highlighting groups of RNAs that share similar structures. The implementation of the super-n-motifs model and the structurexplor platform has contributed to a better understanding of the structural phylogeny of viroids, which are RNA pathogens that attack plants, a phylogeny that until now was based only on their sequences.

Savoirs UdeS