
Identification of new homologs of PD-(D/E)XK nucleases by support vector machines trained on data derived from profile–profile alignments

Author: Ando
Ando
Ban
Dias
Dlakic
Feder
Finn
Gonzalez
Goudenege
Hadden
Halligan
Hildebrand
Holm
Janulaitis
Jones
Jurenaite-Urbanaviciene
Kinch
Knizewski
Kosinski
Kovall
Kyte
Li
Lin
Lu
Lukacs
Margelevičius
Margelevičius
Menon
Middleton
Mindaugas Laganeckas
Mindaugas Margelevičius
Motackova
Murzin
Nishino
Nishino
Orlowski
Pingoud
Reed
Rimseliene
Rimseliene
Roberts
Skirgaila
Söding
Tamulaitiene
Tamulaitis
Townson
Tsutakawa
Vapnik
Venclovas
Venclovas
Waterhouse
Xiang
Yang
Yuan
Zhou
Česlovas Venclovas
Publication venue: Oxford University Press

PD-(D/E)XK nucleases, initially represented only by Type II restriction enzymes, now comprise a large and extremely diverse superfamily of proteins. They participate in many different nucleic acid transactions, including DNA degradation, recombination, repair and RNA processing. Different PD-(D/E)XK families, although sharing a structurally conserved core, typically display little or no detectable sequence similarity except for the active site motifs. This makes the identification of new superfamily members using standard homology search techniques challenging. To tackle this problem, we developed a method for the detection of PD-(D/E)XK families based on the binary classification of profile–profile alignments using support vector machines (SVMs). Using a number of both superfamily-specific and general features, SVMs were trained to identify true positive alignments of PD-(D/E)XK representatives. With this method we identified several PFAM families of uncharacterized proteins as putative new members of the PD-(D/E)XK superfamily. In addition, we assigned several unclassified restriction enzymes to the PD-(D/E)XK type. The results show that the new method is able to make confident assignments even for alignments that have statistically insignificant scores. We also implemented the method as a freely accessible web server at http://www.ibt.lt/bioinformatics/software/pdexk/
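To make the classification step in this abstract concrete, here is a minimal sketch of a linear SVM that labels profile–profile alignments as true or false positives. Everything specific in it is an assumption for illustration: the two features (a normalized alignment score and the fraction of active-site motif positions aligned), the toy training data, and the plain batch subgradient trainer. The abstract does not say which features, kernel, or SVM implementation the authors used.

```python
def train_linear_svm(samples, labels, lam=0.01, lr=0.05, epochs=5000):
    """Minimize the L2-regularized mean hinge loss by batch subgradient descent."""
    n, dim = len(samples), len(samples[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of (lam / 2) * ||w||^2
        grad_b = 0.0
        for x, y in zip(samples, labels):
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            if margin < 1.0:  # hinge loss is active for this sample
                for j in range(dim):
                    grad_w[j] -= y * x[j] / n
                grad_b -= y / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def classify(w, b, x):
    """Return +1 (alignment accepted as a true positive) or -1 (rejected)."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0.0 else -1


# Hypothetical features per alignment:
# (normalized alignment score, fraction of active-site motif positions aligned)
TRAIN_X = [(0.9, 0.8), (0.7, 1.0), (0.8, 0.6), (0.6, 0.9),   # true positives
           (0.2, 0.1), (0.3, 0.0), (0.1, 0.3), (0.4, 0.2)]   # false positives
TRAIN_Y = [1, 1, 1, 1, -1, -1, -1, -1]

w, b = train_linear_svm(TRAIN_X, TRAIN_Y)
```

A trained model is then applied to each new alignment, for example `classify(w, b, (0.85, 0.9))`. A superfamily-specific feature such as motif coverage is what could let a classifier accept an alignment whose score alone is statistically insignificant.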

Detection of distant evolutionary relationships between protein families using theory of sequence profile-profile comparison

Author: A Dembo
A Kryshtafovych
A Zemla
A Šali
AA Schaffer
AG Murzin
AY Mitrophanov
G Yona
H Cheng
J Söding
JC Wootton
JM Chandonia
L Holm
L Rychlewski
Mindaugas Margelevičius
MO Dayhoff
N Siew
PZ Kozbial
R Arratia
R Bundschuh
R Kolodny
R Sadreyev
RC Edgar
RI Sadreyev
RL Tatusov
S Henikoff
S Henikoff
S Karlin
S Karlin
S Pietrokovski
SF Altschul
SF Altschul
TF Smith
TT Lee
Y Qi
Y Wang
Y Zhang
Česlovas Venclovas
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Detection of common evolutionary origin (homology) is a primary means of inferring protein structure and function. At present, comparison of protein families represented as sequence profiles is arguably the most effective homology detection strategy. However, finding the best way to represent evolutionary information of a protein sequence family in the profile, to compare profiles and to estimate the biological significance of such comparisons, remains an active area of research. Results: Here, we present a new homology detection method based on sequence profile-profile comparison. The method has a number of new features including position-dependent gap penalties and a global score system. Position-dependent gap penalties provide a more biologically relevant way to represent and align protein families as sequence profiles. The global score system enables an analytical solution of the statistical parameters needed to estimate the statistical significance of profile-profile similarities. The new method, together with other state-of-the-art profile-based methods (HHsearch, COMPASS and PSI-BLAST), is benchmarked in all-against-all comparison of a challenging set of SCOP domains that share at most 20% sequence identity. For benchmarking, we use a model-based evaluation framework that does not require a reference ("gold standard"). Evaluation results show that at the level of protein domains our method compares favorably to all other tested methods. We also provide examples of the new method outperforming structure-based similarity detection and alignment. The implementation of the new method both as a standalone software package and as a web server is available at http://www.ibt.lt/bioinformatics/coma. Conclusion: Due to a number of developments, the new profile-profile comparison method shows an improved ability to match distantly related protein domains. Therefore, the method should be useful for annotation and homology modeling of uncharacterized proteins.
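The idea of position-dependent gap penalties can be sketched with a small local-alignment recurrence in which the cost of skipping a profile column depends on which column is skipped. This is a simplified illustration, not the published method: the linear (non-affine) gap model, the precomputed column-to-column score matrix, and all names below are assumptions.

```python
def local_align_score(score, gap_q, gap_t):
    """Best local alignment score with linear, position-dependent gap costs.

    score[i][j] -- similarity of query column i and target column j
    gap_q[i]    -- cost of leaving query column i unaligned
    gap_t[j]    -- cost of leaving target column j unaligned
    """
    n, m = len(score), len(score[0])
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            H[i][j] = max(
                0.0,                                  # start a new local alignment
                H[i - 1][j - 1] + score[i - 1][j - 1],  # align column i with column j
                H[i - 1][j] - gap_q[i - 1],           # skip query column i
                H[i][j - 1] - gap_t[j - 1],           # skip target column j
            )
            best = max(best, H[i][j])
    return best


# Toy example: query column 1 is a loop-like position that is cheap to skip.
SCORES = [[2.0, -1.0],
          [-1.0, -1.0],
          [-1.0, 3.0]]
cheap_loop = local_align_score(SCORES, gap_q=[5.0, 0.5, 5.0], gap_t=[5.0, 5.0])
uniform = local_align_score(SCORES, gap_q=[5.0, 5.0, 5.0], gap_t=[5.0, 5.0])
```

With the cheap gap at query column 1, the two well-matched columns are joined into one alignment (`cheap_loop` is 4.5). With uniform penalties the bridge is too costly and only the single best column pair survives (`uniform` is 3.0).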

Estimating statistical significance of local protein profile-profile alignments

Author: Margelevičius Mindaugas
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2019

Background: Alignment of sequence families described by profiles provides a sensitive means for establishing homology between proteins and is important in protein evolutionary, structural, and functional studies. In the context of a steadily growing amount of sequence data, estimating the statistical significance of alignments, including profile-profile alignments, plays a key role in alignment-based homology search algorithms. Still, it is an open question whether a single type of distribution governs profile-profile alignment scores and, if so, which one, especially when profile-profile substitution scores involve such terms as secondary structure predictions. Results: This study presents a methodology for estimating the statistical significance of alignments of this type. The methodology rests on a new algorithm developed for generating random profiles such that their alignment scores are distributed similarly to those obtained for real unrelated profiles. We show that improvements in statistical accuracy, sensitivity, and high-quality alignment rate result from statistically characterizing alignments by establishing the dependence of statistical parameters on various measures associated with both individual and pairwise profile characteristics. Implemented in the COMER software, the proposed methodology yielded an increase of up to 34.2% in the number of true positives and up to 61.8% in the number of high-quality alignments with respect to the previous version of the COMER method. Conclusions: The more accurate estimation of statistical significance is implemented in the COMER method, which is now more sensitive and provides an increased rate of high-quality profile-profile alignments. The results of the present study also suggest directions for future research.
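For orientation, the classical baseline for local alignment significance is the Karlin-Altschul extreme-value model, in which the expected number of chance alignments scoring at least S is E = K·m·n·exp(−λS). The sketch below implements only that textbook formula. It is not the methodology of this study, which asks whether such a single distribution fits profile-profile scores at all, and the parameter values are hypothetical placeholders.

```python
import math


def evalue(score, m, n, lam, K):
    """Karlin-Altschul E-value: expected number of chance hits with score >= `score`.

    m, n -- lengths of the query and the target (or database)
    lam  -- scale parameter (lambda) of the extreme-value distribution
    K    -- search-space correction constant
    """
    return K * m * n * math.exp(-lam * score)


def pvalue(e):
    """Probability of at least one chance hit, P = 1 - exp(-E)."""
    return -math.expm1(-e)  # numerically stable for very small E


# Hypothetical parameters, for illustration only.
LAM, K = 0.3, 0.1
e_weak = evalue(20.0, m=150, n=200, lam=LAM, K=K)
e_strong = evalue(60.0, m=150, n=200, lam=LAM, K=K)
```

In this model the parameters λ and K are fixed per scoring system. The abstract instead describes making the statistical parameters depend on characteristics of the individual profiles and of the profile pair.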

COMER2: GPU-accelerated sensitive and specific homology searches

Author: Margelevičius Mindaugas
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2020

When searching for homology in vast amounts of sequence data, speed is of particular importance. We present a completely rewritten version of the sensitive homology search method COMER based on alignment of protein sequence profiles, which is capable of searching big databases even on a lightweight laptop. By harnessing the power of CUDA-enabled GPUs, it is up to 20 times faster than HHsearch, a state-of-the-art method using vectorized instructions on modern CPUs. AVAILABILITY AND IMPLEMENTATION: COMER2 is cross-platform open-source software available at https://sourceforge.net/projects/comer2 and https://github.com/minmarg/comer2. It can be easily installed from source code or using stand-alone installers. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

The COMER web server for protein analysis by homology

Author: Dapkūnas Justas
Margelevičius Mindaugas
Publication date: 01/01/2022

SUMMARY: Sequence homology is a basic concept in protein evolution, structure, and function studies. However, few tools and services for homology searches are sensitive, accurate, and fast at the same time. We present a new web server for protein analysis based on COMER2, a sequence alignment and homology search method that exhibits these characteristics. COMER2 has been upgraded since its last publication to improve its alignment quality and ease of use. We demonstrate how the user can benefit from using it by providing examples of extensive annotation of proteins of unknown function. Among the distinctive features of the web server is the user's ability to submit multiple queries with one click of a button. This and other features allow homology searches to be run transparently (in a command-line, programmatic, or graphical environment) across multiple databases with multiple queries. They also promote extensive simultaneous protein analysis at the sequence, structure, and function levels. AVAILABILITY AND IMPLEMENTATION: The COMER web server is available at https://bioinformatics.lt/comer. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Re-searcher: a system for recurrent detection of homologous protein sequences

Author: Margelevičius Mindaugas
Repšys Valdemaras
Venclovas Česlovas
Publication venue: BMC
Publication date: 01/01/2008

Background: Sequence searches are routinely employed to detect and annotate related proteins. However, the rapid growth of databases necessitates frequent repetition of sequence searches and subsequent analysis of the results obtained. Although there are several automatic systems available for executing periodic sequence searches and reporting results, they all suffer from a lack of sensitivity, a restrictive choice of databases, or limited flexibility in setting up search strategies. Here, a new sequence search and reporting software package designed to address these shortcomings is described. Results: Re-searcher is an open-source, highly configurable system for recurrent detection and reporting of new homologs for the sequence of interest in specified protein sequence databases. Searches are performed using PSI-BLAST at desired time intervals either within NCBI or local databases. In addition to searches against individual databases, the system can perform "PDB-BLAST"-like combined searches, in which the PSI-BLAST profile generated during a search against the first database is used to search the second database. The system supports multiple users, enabling each to separately keep track of multiple queries and query-specific results. Conclusions: Re-searcher features a large number of options enabling automatic periodic detection of both close and distant homologs. At the same time it has a simple and intuitive interface, making the analysis of results even for a large number of queries a straightforward task.
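The core bookkeeping behind recurrent detection, reporting only hits that have not been seen in earlier runs, can be sketched as follows. This is a hypothetical illustration rather than Re-searcher's actual code: the function name, the `(hit_id, evalue)` tuple layout, the E-value threshold, and the example identifiers are all assumptions, and the search itself (PSI-BLAST in the original system) is outside the sketch.

```python
def new_homologs(previous_ids, current_hits, max_evalue=1e-3):
    """Return hits from the latest search that were not reported before.

    previous_ids -- set of hit identifiers already reported for this query
    current_hits -- iterable of (hit_id, evalue) pairs from the latest search
    max_evalue   -- significance cutoff; weaker hits are ignored
    """
    fresh = [(hit_id, ev) for hit_id, ev in current_hits
             if ev <= max_evalue and hit_id not in previous_ids]
    fresh.sort(key=lambda hit: hit[1])  # most significant first
    return fresh


# Hypothetical results of two consecutive runs for the same query.
seen = {"P12345", "Q67890"}
latest = [("P12345", 1e-40), ("A0A001", 3e-8), ("B0B002", 0.5), ("C0C003", 2e-12)]

report = new_homologs(seen, latest)
seen.update(hit_id for hit_id, _ in report)  # remember what was reported
```

Here `report` contains only `C0C003` and `A0A001`, in that order: `P12345` was reported earlier and `B0B002` does not pass the cutoff. Keeping one `seen` set per user and query would match the abstract's description of query-specific results tracked separately by each user.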

COMA server for protein distant homology search

Author: Altschul
Alva
Fekete
Holm
Madera
Margelevičius
Mindaugas Laganeckas
Mindaugas Margelevičius
Sadreyev
Söding
Wang
Xiang
Česlovas Venclovas
Publication venue: Oxford University Press (OUP)