359 research outputs found

Linear-time protein 3-D structure searching with insertions and deletions

Author: ACR Martin
AI Jewett
B Zhu
C Gergely
CH Chionh
D Bu
D Goldman
DG Corneil
DW Eggert
E Krissinel
F Zu-Kang
G Navarro
GH Golub
H Hasegawa
HA Kramers
HM Berman
I Eidhammer
IN Shindyalov
Jesper Jansson
JT Schwartz
KS Arun
Kunihiko Sadakane
L Holm
M Comin
M Shatsky
P Koehl
PG de Gennes
PJ Flory
RH Boyd
RH Lathrop
T Shibuya
T Shibuya
Tetsuo Shibuya
W Kabsch
W Kabsch
WR Taylor
Z Aung
Publication venue: BioMed Central
Publication date: 01/01/2009

Abstract. Background: Two biomolecular 3-D structures are said to be similar if the RMSD (root mean square deviation) between the two molecules' sequences of 3-D coordinates is less than or equal to some given constant bound. Tools for searching for similar structures in biomolecular 3-D structure databases are becoming increasingly important in the structural biology of the post-genomic era. Results: We consider an important, fundamental problem of reporting all substructures in a 3-D structure database of chain molecules (such as proteins) that are similar to a given query 3-D structure, with consideration of indels (i.e., insertions and deletions). This problem has been believed to be very difficult, but its exact computational complexity has not been known. In this paper, we first prove that the problem in unbounded dimensions is NP-hard. We then propose a new algorithm that dramatically improves the average-case time complexity of the problem in 3-D when the number of indels k is bounded by a constant. Our algorithm solves the above problem for a query of size m and a database of size N in average-case O(N) time, whereas the time complexity of the previously best algorithm was O(Nm^(k+1)). Conclusions: Our results show that although the problem of searching for similar structures in a database based on the RMSD measure with indels is NP-hard in the case of unbounded dimensions, it can be solved in 3-D by a simple average-case linear-time algorithm when the number of indels is bounded by a constant.
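The similarity criterion in this abstract can be illustrated with a minimal sketch. It is not the paper's linear-time algorithm: it is a naive O(Nm) sliding-window scan, it assumes the coordinates are already superimposed (no optimal rotation or translation is computed), and it ignores indels. The function names `rmsd` and `similar_windows` are hypothetical.

```python
import math


def rmsd(a, b):
    """Root mean square deviation between two equal-length lists of (x, y, z) points.

    Assumption: the two point lists are already superimposed; no optimal
    rotation or translation is computed here.
    """
    if len(a) != len(b) or not a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = sum((p - q) ** 2 for pa, pb in zip(a, b) for p, q in zip(pa, pb))
    return math.sqrt(total / len(a))


def similar_windows(database, query, bound):
    """Return start indices of contiguous database windows with RMSD <= bound.

    Naive scan over every window of length len(query); no indels are allowed.
    """
    m = len(query)
    return [
        i
        for i in range(len(database) - m + 1)
        if rmsd(database[i:i + m], query) <= bound
    ]


if __name__ == "__main__":
    query = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    database = [(5.0, 5.0, 5.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (9.0, 9.0, 9.0)]
    print(similar_windows(database, query, bound=0.1))
```

A realistic search would first superimpose each window onto the query before measuring the deviation; that step is left out to keep the sketch dependency-free.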

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Graph theoretic methods for the analysis of structural relationships in biological macromolecules

Author: Altschul
Artymiuk
Artymiuk
Artymiuk
Artymiuk
Artymiuk
Barnard
Baxevanis
Benning
Berman
Bernstein
Brint
Brint
Bron
Bruno
Bryant
Crandell
Dean
Diestel
Doubet
Fan
Feizi
Figueras
Flores
Gardiner
Gati
Good
Gray
Groves
Gruer
Gund
Hagadone
Harrison
Holden
Hutchinson
Jasanoff
Johnson
Kanna
Klausner
Kleywegt
Koch
Kraulis
Lengauer
Lesk
Martin
Martin
McGregor
Messmer
Mitchell
Ollis
Pickering
Ray
Raymond
Read
Salton
Samudrala
Sayle
Simon
Srere
Sussenguth
Tesmer
Tinoco
Trinajstic
Tsukada
Ullmann
van Rijsbergen
Willett
Willett
Willett
Willett
Williams
Wilson
Zhang
Publication venue: Wiley
Publication date: 01/01/2005

Subgraph isomorphism and maximum common subgraph isomorphism algorithms from graph theory provide an effective and efficient way of identifying structural relationships between biological macromolecules. They thus provide a natural complement to the pattern matching algorithms that are used in bioinformatics to identify sequence relationships. Examples are provided of the use of graph theory to analyze proteins for which three-dimensional crystallographic or NMR structures are available, focusing on the use of the Bron-Kerbosch clique detection algorithm to identify common folding motifs and of the Ullmann subgraph isomorphism algorithm to identify patterns of amino acid residues. Our methods are also applicable to other types of biological macromolecule, such as carbohydrate and nucleic acid structures.
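For readers unfamiliar with the clique detection step mentioned above, the following is a minimal sketch of the basic Bron-Kerbosch recursion (without pivoting) on a small toy graph. The graph and the function name `bron_kerbosch` are illustrative assumptions; the abstract does not specify how the authors encode protein structures as graphs.

```python
def bron_kerbosch(graph):
    """Yield every maximal clique of an undirected graph.

    `graph` maps each vertex to the set of its neighbours.
    Basic Bron-Kerbosch recursion, no pivoting.
    """
    def expand(r, p, x):
        # r: current clique, p: candidates that extend r, x: already-explored vertices
        if not p and not x:
            yield frozenset(r)
            return
        for v in list(p):
            yield from expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}
            x = x | {v}

    yield from expand(set(), set(graph), set())


if __name__ == "__main__":
    # Toy graph: a triangle a-b-c plus a pendant edge c-d.
    toy = {
        "a": {"b", "c"},
        "b": {"a", "c"},
        "c": {"a", "b", "d"},
        "d": {"c"},
    }
    for clique in bron_kerbosch(toy):
        print(sorted(clique))
```

In structure comparison, cliques are typically sought in a correspondence graph built from two molecules, so that each maximal clique is a mutually consistent set of matched elements; building that graph is outside the scope of this sketch.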

CiteSeerX

Crossref

White Rose Research Online

Sussex Research Online

TeachOpenCADD: a teaching platform for computer-aided drug design using open source packages and data

Author: Driller Maximilian
Morger Andrea
Sydow Dominique
Volkamer Andrea
Publication date: 01/01/2019

Owing to the increase in freely available software and data for cheminformatics and structural bioinformatics, research in computer-aided drug design (CADD) is increasingly built on modular, reproducible, and easy-to-share pipelines. While documentation for such tools is available, there are only a few freely accessible examples that teach the underlying concepts of CADD, especially for users new to the field. Here, we present TeachOpenCADD, a teaching platform developed by students for students, using open source compound and protein data as well as basic and CADD-related Python packages. We provide interactive Jupyter notebooks for central CADD topics, integrating theoretical background and practical code. TeachOpenCADD is freely available on GitHub: https://github.com/volkamerlab/TeachOpenCAD

Institutional Repository of the Freie Universität Berlin

Directory of Open Access Journals

Efficient protein alignment algorithm for protein search

Author: Bin Fu
Bmc Bioinformatics
Zaixin Lu
Zhiyu Zhao
Publication venue: BioMed Central
Publication date: 01/01/2010

CiteSeerX

Crossref

PubMed Central

A Comprehensive System for Identifying Internal Repeat Substructures of Proteins

Author: 許輝煌
Kao Hua-ying
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)

Repetitive substructures within a protein play an important role in understanding protein folding and stability, biological function, and genome evolution. About 25% of all proteins in eukaryote species contain repeat structures, and most of them do not yet have resolved structural information. Therefore, this study aimed to design a comprehensive system for identifying internal repeats from either protein sequence or structural information. In this study, we have curated a set of internal repeat units as a benchmark dataset for performing both sequence and structural alignment with respect to the query sequence or structure. In addition to the traditional BLAST algorithms on amino acid sequences and the optimal structural superposition approaches on structures, a novel method employing predicted secondary structure element information for internal repeat identification was proposed. Sequences were first transformed into Length Encoded Secondary Structure (LESS) profiles, followed by autocorrelation analyses. From the preliminary experimental results, the developed Internal Repeat Identification System (IRIS) can successfully identify internal repeats from known protein structures, and the web system is freely available at http://iris.cs.ntou.edu.tw/.
Conference type: International
Conference dates: 2010-02-15 to 2010-02-18
Call for papers: Y
Conference location: Krakow, Polan
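The abstract of this entry gives no details of the LESS profile or of the autocorrelation analysis, so the following is only one plausible reading, sketched under loud assumptions: the secondary structure is a string over H (helix), E (strand) and C (coil), "length encoding" is taken to mean run-length encoding, and autocorrelation is approximated by the fraction of positions that match at a given lag. All function names are hypothetical and none of this is taken from IRIS itself.

```python
from itertools import groupby


def run_length_encode(ss):
    """Collapse a secondary-structure string into (state, run_length) pairs."""
    return [(state, len(list(run))) for state, run in groupby(ss)]


def match_autocorrelation(ss, lag):
    """Fraction of positions i where ss[i] equals ss[i + lag]."""
    pairs = len(ss) - lag
    if lag <= 0 or pairs <= 0:
        raise ValueError("lag must be positive and shorter than the string")
    return sum(ss[i] == ss[i + lag] for i in range(pairs)) / pairs


def best_repeat_lag(ss):
    """Return the lag (up to half the length) with the highest match fraction."""
    return max(range(1, len(ss) // 2 + 1), key=lambda lag: match_autocorrelation(ss, lag))


if __name__ == "__main__":
    ss = "HHHCCEEE" * 3  # hypothetical predicted secondary structure with three repeats
    print(run_length_encode(ss)[:3])
    print(best_repeat_lag(ss))
```

A peak in the match fraction at some lag suggests a repeat unit of that length; a real system would need to tolerate length variation between repeat units, which this exact-lag comparison does not.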

Crossref

Tamkang University Institutional Repository

New Algorithms for Protein Structure Comparison and Protein Structure Prediction

Author: Lu Zaixin
Publication venue: ScholarWorks @ UTRGV
Publication date: 01/07/2010

Proteins show a great variety of 3D conformations, which can be used to infer their evolutionary relationships and to classify them into more general groups; algorithms for protein structure alignment, protein similarity search, and protein structure prediction are therefore very helpful to protein biologists. We developed new algorithms for the problems in this field. The algorithms are tested with structures from the Protein Data Bank (PDB) and from SCOP, the Structural Classification of Proteins database. The experimental results show that our tools are more efficient than some well-known systems for finding similar protein structures and predicting protein structures.

Scholarworks@UTRGV Univ. of Texas RioGrande Valley

Efficient search and comparison algorithms for 3D protein binding site retrieval and structure alignment from large-scale databases

Author: Pang Bin, 1971-
Publication venue: University of Missouri Libraries

Finding similar 3D structures is crucial for discovering potential structural, evolutionary, and functional relationships among proteins. As the number of known protein structures has dramatically increased, traditional methods can no longer provide the life science community with the adequate informatics capability needed to conduct large-scale and complex analyses. A suite of high-throughput and accurate protein structure search and comparison methods is essential. To meet the needs of the community, we developed several bioinformatics methods for protein binding site comparison and global structure alignment. First, we developed an efficient protein binding site search that is based on extracting geometric features both locally and globally. The main idea of this work was to capture spatial relationships among landmarks of binding site surfaces and build a vocabulary of visual words to represent the characteristics of the surfaces. A vector model was then used to speed up the search for similar surfaces that share similar visual words with the query interface. Second, we developed an approach for accurate protein binding site comparison. Our algorithm provides an accurate binding site alignment by applying a two-level heuristic process that progressively refines alignment results from the coarse surface point level to the accurate residue atom level. This setting allowed us to explore different combinations of pairs of corresponding residues, thus improving the alignment quality of the binding site surfaces. Finally, we introduced a parallel algorithm for global protein structure alignment. Specifically, to speed up the time-consuming alignment of protein 3D structures, we designed a parallel protein structure alignment framework to exploit the parallelism of Graphics Processing Units (GPUs). As a general-purpose GPU platform, the framework is capable of parallelizing traditional structure alignment algorithms.
Our findings can be applied in various research areas, such as prediction of protein inte

University of Missouri: MOspace

A data science approach to pattern discovery in complex structures with applications in bioinformatics

Author: Hua Lei
Publication venue: Digital Commons @ NJIT
Publication date: 31/05/2016

Pattern discovery aims to find interesting, non-trivial, implicit, previously unknown and potentially useful patterns in data. This dissertation presents a data science approach for discovering patterns or motifs from complex structures, particularly complex RNA structures. RNA secondary and tertiary structure motifs are very important in biological molecules, which play multiple vital roles in cells. A lot of work has been done on RNA motif annotation. However, pattern discovery in RNA structure is less studied. In the first part of this dissertation, an ab initio algorithm, named DiscoverR, is introduced for pattern discovery in RNA secondary structures. This algorithm works by representing RNA secondary structures as ordered labeled trees and performing tree pattern discovery using a quadratic-time dynamic programming algorithm. The algorithm is able to identify and extract the largest common substructures from two RNA molecules of different sizes, without prior knowledge of the locations and topologies of these substructures. One application of DiscoverR is to locate RNA structural elements in genomes. Experimental results show that this tool complements the currently used approaches for mining conserved structural RNAs in the human genome. DiscoverR can also be extended to find repeated regions in an RNA secondary structure. Specifically, this extended method is used to detect structural repeats in the 3'-untranslated region of a protein kinase gene
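The ordered labeled tree representation mentioned in this abstract can be sketched as follows. This is not DiscoverR's actual encoding, which the abstract does not describe. The sketch assumes the secondary structure is given in dot-bracket notation and uses hypothetical labels: "P" for a base pair (internal node) and "U" for an unpaired base (leaf).

```python
def dot_bracket_to_tree(structure):
    """Parse dot-bracket notation into a nested (label, children) ordered tree.

    "(" opens a base-pair node, ")" closes it, "." adds an unpaired leaf.
    Children keep their left-to-right order, so the tree is ordered.
    """
    root = ("root", [])
    stack = [root]
    for ch in structure:
        if ch == "(":
            node = ("P", [])  # base pair: internal node
            stack[-1][1].append(node)
            stack.append(node)
        elif ch == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            stack.pop()
        elif ch == ".":
            stack[-1][1].append(("U", []))  # unpaired base: leaf
        else:
            raise ValueError(f"unexpected character {ch!r}")
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    return root


if __name__ == "__main__":
    # A two-pair stem closing a three-base hairpin loop.
    print(dot_bracket_to_tree("((...))"))
```

Once two structures are in this form, common substructures can be sought by comparing subtrees, which is where a dynamic programming algorithm over pairs of nodes would come in.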

Digital Commons @ New Jersey Institute of Technology (NJIT)