
    PSI-BLAST-ISS: an intermediate sequence search tool for estimation of the position-specific alignment reliability

    BACKGROUND: Protein sequence alignments have become indispensable for virtually any evolutionary, structural or functional study involving proteins. Modern sequence search and comparison methods, combined with rapidly increasing sequence data, can often reliably match even distantly related proteins that share little sequence similarity. However, even highly significant matches may contain incorrectly aligned regions. Therefore, when exact residue correspondence is used to transfer biological information from one aligned sequence to another, it is critical to know which alignment regions are reliable and which may contain alignment errors. RESULTS: PSI-BLAST-ISS is a standalone Unix-based tool designed to delineate reliable regions of sequence alignments and to suggest potential variants in unreliable regions. Region-specific reliability is assessed by producing multiple sequence alignments in different sequence contexts and then analyzing the consistency of the resulting alignment variants. The PSI-BLAST-ISS output enables the user to analyze alignment reliability between the query and multiple homologous sequences simultaneously. In addition, PSI-BLAST-ISS can be used to detect distantly related homologous proteins. The software is freely available at: . CONCLUSION: PSI-BLAST-ISS is an effective reliability assessment tool that can be useful in applications such as comparative modelling or the analysis of individual sequence regions. It compares favorably with existing similar software in both performance and functionality.
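
    A minimal sketch of the consistency idea described above, in Python. It assumes each alignment variant (for example, one obtained in a different sequence context) is represented as a dictionary from query positions to target positions; the names `position_reliability` and `reliable_regions`, the 0.8 threshold and the minimum region length are hypothetical choices for illustration, not the scoring used by PSI-BLAST-ISS itself.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: query position (0-based) -> target position,
# with None meaning the query residue is aligned to a gap.
Variant = Dict[int, Optional[int]]


def position_reliability(variants: List[Variant], query_len: int) -> List[float]:
    """Per query position, the fraction of variants that agree on the most
    common target residue (a simple consistency score, assumed for illustration)."""
    scores = []
    for q in range(query_len):
        # Positions missing from a variant are treated as unaligned (None).
        assignments = Counter(v.get(q) for v in variants)
        aligned = [count for target, count in assignments.items() if target is not None]
        top = max(aligned) if aligned else 0
        scores.append(top / len(variants))
    return scores


def reliable_regions(scores: List[float], threshold: float = 0.8,
                     min_len: int = 3) -> List[Tuple[int, int]]:
    """Half-open (start, end) runs of positions scoring at or above threshold."""
    regions, start = [], None
    for i, s in enumerate(scores + [float("-inf")]):  # sentinel closes a trailing run
        if s >= threshold and start is None:
            start = i
        elif s < threshold and start is not None:
            if i - start >= min_len:
                regions.append((start, i))
            start = None
    return regions


variants = [
    {0: 0, 1: 1, 2: 2, 3: 3, 4: 5},
    {0: 0, 1: 1, 2: 2, 3: 3, 4: 4},
    {0: 0, 1: 1, 2: 2, 3: 4, 4: 5},
]
scores = position_reliability(variants, query_len=5)
print(reliable_regions(scores))  # [(0, 3)]
```

    Positions 3 and 4 are aligned differently in one of the three variants, so they fall below the threshold and are left out of the reliable region.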

    Identification of new homologs of PD-(D/E)XK nucleases by support vector machines trained on data derived from profile–profile alignments

    PD-(D/E)XK nucleases, initially represented only by Type II restriction enzymes, now comprise a large and extremely diverse superfamily of proteins. They participate in many different nucleic acid transactions, including DNA degradation, recombination, repair and RNA processing. Different PD-(D/E)XK families, although sharing a structurally conserved core, typically display little or no detectable sequence similarity except for the active site motifs. This makes the identification of new superfamily members using standard homology search techniques challenging. To tackle this problem, we developed a method for the detection of PD-(D/E)XK families based on the binary classification of profile–profile alignments using support vector machines (SVMs). Using a number of both superfamily-specific and general features, SVMs were trained to identify true positive alignments of PD-(D/E)XK representatives. With this method, we identified several Pfam families of uncharacterized proteins as putative new members of the PD-(D/E)XK superfamily. In addition, we assigned several unclassified restriction enzymes to the PD-(D/E)XK type. The results show that the new method can make confident assignments even for alignments with statistically insignificant scores. We also implemented the method as a freely accessible web server at http://www.ibt.lt/bioinformatics/software/pdexk/
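
    To illustrate the kind of binary classifier involved, the sketch below trains a linear SVM with the Pegasos stochastic sub-gradient algorithm, a generic technique swapped in here rather than the authors' training setup. The two features per alignment and their values are made up for illustration (the actual method uses a number of superfamily-specific and general features), and `train_pegasos` and `classify` are hypothetical names. As in the basic Pegasos formulation there is no bias term, which assumes roughly centered features.

```python
import random
from typing import List, Sequence


def train_pegasos(X: Sequence[Sequence[float]], y: Sequence[int],
                  lam: float = 0.1, epochs: int = 100, seed: int = 0) -> List[float]:
    """Linear SVM without bias, trained by Pegasos: stochastic sub-gradient
    descent on lam/2 * ||w||^2 + mean hinge loss. Labels must be +1 or -1."""
    rng = random.Random(seed)
    data = list(zip(X, y))
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]  # regularization shrink
            if margin < 1.0:  # hinge loss is active for this example
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def classify(w: Sequence[float], x: Sequence[float]) -> int:
    """+1 = predicted true-positive alignment, -1 = predicted false positive."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1


# Toy, made-up features: [scaled alignment score, motif-conservation score]
X = [[2.0, 1.5], [1.8, 2.2], [2.5, 1.9], [-1.0, -0.5], [-1.5, -1.2], [-0.8, -1.7]]
y = [1, 1, 1, -1, -1, -1]
w = train_pegasos(X, y)
print([classify(w, x) for x in X])  # [1, 1, 1, -1, -1, -1]
```

    Appending a constant feature of 1.0 to every vector is a common way to let such a model learn an offset when features are not centered.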

    Detection of distant evolutionary relationships between protein families using theory of sequence profile-profile comparison

    Background: Detection of common evolutionary origin (homology) is a primary means of inferring protein structure and function. At present, comparison of protein families represented as sequence profiles is arguably the most effective homology detection strategy. However, finding the best way to represent the evolutionary information of a protein sequence family in a profile, to compare profiles, and to estimate the biological significance of such comparisons remains an active area of research. Results: Here, we present a new homology detection method based on sequence profile-profile comparison. The method has a number of new features, including position-dependent gap penalties and a global score system. Position-dependent gap penalties provide a more biologically relevant way to represent and align protein families as sequence profiles. The global score system enables an analytical solution for the statistical parameters needed to estimate the statistical significance of profile-profile similarities. The new method, together with other state-of-the-art profile-based methods (HHsearch, COMPASS and PSI-BLAST), is benchmarked in an all-against-all comparison of a challenging set of SCOP domains that share at most 20% sequence identity. For benchmarking, we use a reference-free (no "gold standard" required) model-based evaluation framework. Evaluation results show that at the level of protein domains our method compares favorably to all other tested methods. We also provide examples of the new method outperforming structure-based similarity detection and alignment. The implementation of the new method, both as a standalone software package and as a web server, is available at http://www.ibt.lt/bioinformatics/coma. Conclusion: Due to a number of developments, the new profile-profile comparison method shows an improved ability to match distantly related protein domains. Therefore, the method should be useful for annotation and homology modeling of uncharacterized proteins.
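
    The effect of position-dependent gap penalties can be sketched with a Smith-Waterman local alignment in which each profile position carries its own gap cost. This is a simplified illustration, not COMA's algorithm: it assumes a precomputed matrix of column-to-column scores and uses linear (rather than affine) per-position gap costs; `local_align_score` and the toy numbers are hypothetical.

```python
from typing import List, Sequence


def local_align_score(score: Sequence[Sequence[float]],
                      gap_a: Sequence[float],
                      gap_b: Sequence[float]) -> float:
    """Smith-Waterman local alignment score with position-specific linear gap costs.

    score[i][j]: substitution score for aligning position i of A with position j of B
    gap_a[i]:    cost of aligning position i of A against a gap
    gap_b[j]:    cost of aligning position j of B against a gap
    """
    n, m = len(gap_a), len(gap_b)
    # H[i][j] = best score of a local alignment ending at A[i-1] and B[j-1].
    H: List[List[float]] = [[0.0] * (m + 1) for _ in range(n + 1)]
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            H[i][j] = max(
                0.0,                                    # start a new local alignment
                H[i - 1][j - 1] + score[i - 1][j - 1],  # align A[i-1] with B[j-1]
                H[i - 1][j] - gap_a[i - 1],             # A[i-1] against a gap
                H[i][j - 1] - gap_b[j - 1],             # B[j-1] against a gap
            )
            best = max(best, H[i][j])
    return best


s = [[3, -2], [-2, -2], [-2, 3]]
print(local_align_score(s, [4, 1, 4], [4, 4]))  # 5.0: A0-B0, A1 gapped, A2-B1
print(local_align_score(s, [4, 5, 4], [4, 4]))  # 3.0: gap at A1 now too costly
```

    Only the gap cost at one position changes between the two calls, which is the kind of position-specific behavior that a single global gap penalty cannot express.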

    Estimating statistical significance of local protein profile-profile alignments

    Background: Alignment of sequence families described by profiles provides a sensitive means of establishing homology between proteins and is important in protein evolutionary, structural, and functional studies. With the amount of sequence data growing steadily, estimating the statistical significance of alignments, including profile-profile alignments, plays a key role in alignment-based homology search algorithms. Still, it remains an open question whether a single type of distribution governs profile-profile alignment scores, and if so, which one, especially when profile-profile substitution scores include terms such as secondary structure predictions. Results: This study presents a methodology for estimating the statistical significance of such alignments. The methodology rests on a new algorithm for generating random profiles whose alignment scores are distributed similarly to those obtained for real unrelated profiles. We show that statistically characterizing alignments, by establishing how the statistical parameters depend on various measures of both individual and pairwise profile characteristics, improves statistical accuracy, sensitivity, and the rate of high-quality alignments. Implemented in the COMER software, the proposed methodology yielded an increase of up to 34.2% in the number of true positives and up to 61.8% in the number of high-quality alignments with respect to the previous version of the COMER method. Conclusions: The more accurate estimation of statistical significance is implemented in the COMER method, which is now more sensitive and provides an increased rate of high-quality profile-profile alignments. The results of the present study also suggest directions for future research.
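
    Assuming local alignment scores follow a Gumbel-type extreme value distribution (the form used for BLAST local alignment scores), significance is conventionally reported as an E-value, E = K * m * n * exp(-lambda * S), for a score S and a search space of size m * n. The abstract indicates that COMER makes the statistical parameters depend on individual and pairwise profile characteristics; how it does so is not shown here, so lambda and K are simply passed in as arguments. The function names and example numbers are hypothetical.

```python
import math


def evalue(score: float, lam: float, K: float, m: int, n: int) -> float:
    """Expected number of chance local alignments scoring >= score
    (Karlin-Altschul form E = K * m * n * exp(-lam * score))."""
    return K * m * n * math.exp(-lam * score)


def pvalue(e: float) -> float:
    """Probability of at least one chance hit, assuming Poisson-distributed hits."""
    return -math.expm1(-e)  # 1 - exp(-e), computed accurately for small e


def bit_score(score: float, lam: float, K: float) -> float:
    """Normalized score S' = (lam * S - ln K) / ln 2, so that E = m * n * 2**(-S')."""
    return (lam * score - math.log(K)) / math.log(2)


e = evalue(10, lam=math.log(2), K=0.5, m=100, n=200)
print(round(e, 6))                                   # 9.765625
print(round(bit_score(10, math.log(2), 0.5), 6))     # 11.0
```

    Because lambda and K enter only through these formulas, any method that estimates them more accurately for a given pair of profiles directly changes the reported E-values.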

    COMER2: GPU-accelerated sensitive and specific homology searches

    When searching for homology in the vast amount of sequence data, speed is of particular importance. We present COMER2, a completely rewritten version of the sensitive homology search method COMER, which is based on the alignment of protein sequence profiles. COMER2 can search big databases even on a lightweight laptop. By harnessing the power of CUDA-enabled GPUs, it is up to 20 times faster than HHsearch, a state-of-the-art method that uses vectorized instructions on modern CPUs. AVAILABILITY AND IMPLEMENTATION: COMER2 is cross-platform open-source software available at https://sourceforge.net/projects/comer2 and https://github.com/minmarg/comer2. It can be easily installed from source code or using stand-alone installers. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

    The COMER web server for protein analysis by homology

    SUMMARY: Sequence homology is a basic concept in studies of protein evolution, structure, and function. However, few tools and services for homology searches are sensitive, accurate, and fast at the same time. We present a new web server for protein analysis based on COMER2, a sequence alignment and homology search method that exhibits these characteristics. COMER2 has been upgraded since its last publication to improve its alignment quality and ease of use. We demonstrate how users can benefit from it by providing examples of extensive annotation of proteins of unknown function. Among the distinctive features of the web server is the ability to submit multiple queries with a single click. This and other features allow homology searches to be run transparently across multiple databases with multiple queries, in a command-line, programmatic, or graphical environment. They also promote extensive simultaneous protein analysis at the sequence, structure, and function levels. AVAILABILITY AND IMPLEMENTATION: The COMER web server is available at https://bioinformatics.lt/comer. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

    Re-searcher: a system for recurrent detection of homologous protein sequences

    Background: Sequence searches are routinely employed to detect and annotate related proteins. However, the rapid growth of databases necessitates frequent repetition of sequence searches and subsequent analysis of the results. Although several automatic systems are available for executing periodic sequence searches and reporting results, they all suffer from a lack of sensitivity, a restrictive choice of databases, or limited flexibility in setting up search strategies. Here, we describe a new sequence search and reporting software package designed to address these shortcomings. Results: Re-searcher is an open-source, highly configurable system for the recurrent detection and reporting of new homologs of a sequence of interest in specified protein sequence databases. Searches are performed using PSI-BLAST at desired time intervals, against either NCBI or local databases. In addition to searches against individual databases, the system can perform "PDB-BLAST"-like combined searches, in which the PSI-BLAST profile generated during the search against the first database is used to search the second database. The system supports multiple users, enabling each to keep track of multiple queries and query-specific results separately. Conclusions: Re-searcher features a large number of options that enable automatic periodic detection of both close and distant homologs. At the same time, it has a simple and intuitive interface, making the analysis of results straightforward even for a large number of queries.
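
    A minimal sketch of the reporting step of a recurrent search: only hits not seen in earlier runs of the same query are reported, with separate bookkeeping per user and per query. The in-memory `State` mapping, the `report_new_hits` function and the hit identifiers are hypothetical; running PSI-BLAST itself and persisting the state between runs are left out.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical state: hit IDs already reported, keyed by (user, query).
State = Dict[Tuple[str, str], Set[str]]


def report_new_hits(state: State, user: str, query: str,
                    hits: Iterable[str]) -> List[str]:
    """Return hits not reported in earlier runs for this user and query,
    in the order the search returned them, and remember them as seen."""
    seen = state.setdefault((user, query), set())
    new_hits = []
    for hit in hits:
        if hit not in seen:
            seen.add(hit)
            new_hits.append(hit)
    return new_hits


state: State = {}
print(report_new_hits(state, "alice", "q1", ["P1", "P2"]))        # ['P1', 'P2']
print(report_new_hits(state, "alice", "q1", ["P2", "P3", "P1"]))  # ['P3']
print(report_new_hits(state, "bob", "q1", ["P1"]))                # ['P1']
```

    Keying the state by user and query keeps each user's results independent, so the same hit is reported once to every user who tracks that query.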