10 research outputs found

    Pair HMM based gap statistics for re-evaluation of indels in alignments with affine gap penalties: Extended Version

    Although computationally aligning sequences is a crucial step in the vast majority of comparative genomics studies, our understanding of alignment biases still needs to be improved. To infer true structural or homologous regions, computational alignments need further evaluation. It has been shown that the accuracy of aligned positions can drop substantially, in particular around gaps. Here we focus on re-evaluation of score-based alignments with affine gap penalty costs. We exploit their relationships with pair hidden Markov models and develop efficient algorithms to identify gaps that are significant in terms of length and multiplicity. We evaluate our statistics with respect to the well-established structural alignments from SABmark and find that indel reliability substantially increases with their significance, in particular in worst-case twilight zone alignments. This indicates that our statistics can reliably complement other methods, which mostly focus on the reliability of match positions. Comment: 17 pages, 7 figures

    PSI-BLAST-ISS: an intermediate sequence search tool for estimation of the position-specific alignment reliability

    BACKGROUND: Protein sequence alignments have become indispensable for virtually any evolutionary, structural or functional study involving proteins. Modern sequence search and comparison methods, combined with rapidly increasing sequence data, can often reliably match even distantly related proteins that share little sequence similarity. However, even highly significant matches may contain incorrectly aligned regions. Therefore, when exact residue correspondence is used to transfer biological information from one aligned sequence to another, it is critical to know which alignment regions are reliable and which may contain alignment errors. RESULTS: PSI-BLAST-ISS is a standalone Unix-based tool designed to delineate reliable regions of sequence alignments as well as to suggest potential variants in unreliable regions. The region-specific reliability is assessed by producing multiple sequence alignments in different sequence contexts, followed by an analysis of the consistency of the alignment variants. The PSI-BLAST-ISS output enables the user to simultaneously analyze alignment reliability between the query and multiple homologous sequences. In addition, PSI-BLAST-ISS can be used to detect distantly related homologous proteins. The software is freely available at: . CONCLUSION: PSI-BLAST-ISS is an effective reliability assessment tool that can be useful in applications such as comparative modelling or analysis of individual sequence regions. It compares favorably with existing similar software in both performance and functional features.
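
    The consistency idea in the abstract above can be illustrated with a minimal sketch. It is not the PSI-BLAST-ISS implementation: the dictionary representation of an alignment variant, the function names `position_reliability` and `reliable_positions`, and the 0.8 threshold are all assumptions made for illustration. Each variant maps a query position to the target position it is aligned with, and a query position is scored by the fraction of variants that agree on its most common partner.

```python
from collections import Counter


def position_reliability(variants):
    """Score each query position by agreement across alignment variants.

    variants: list of dicts mapping query position -> target position.
    A position missing from a variant is treated as unaligned (None).
    Returns {query position: (most common partner, fraction agreeing)}.
    """
    n = len(variants)
    positions = sorted({p for v in variants for p in v})
    scores = {}
    for p in positions:
        # Count how often each target partner is proposed for position p.
        counts = Counter(v.get(p) for v in variants)
        partner, votes = counts.most_common(1)[0]
        scores[p] = (partner, votes / n)
    return scores


def reliable_positions(scores, threshold=0.8):
    """Query positions whose agreement fraction reaches the threshold
    (the threshold value is an arbitrary illustrative choice)."""
    return [p for p, (_, frac) in scores.items() if frac >= threshold]


# Three hypothetical alignment variants of the same query/target pair.
variants = [
    {0: 0, 1: 1, 2: 2},
    {0: 0, 1: 1, 2: 3},
    {0: 0, 1: 2, 2: 3},
]
scores = position_reliability(variants)
print(scores)
print(reliable_positions(scores))
```

    In this toy input only query position 0 is aligned identically in all three variants, so it is the only position that passes the threshold; positions 1 and 2 each have one dissenting variant.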

    Measuring Global Credibility with Application to Local Sequence Alignment

    Computational biology is replete with high-dimensional (high-D) discrete prediction and inference problems, including sequence alignment, RNA structure prediction, phylogenetic inference, motif finding, prediction of pathways, and model selection problems in statistical genetics. Even though prediction and inference in these settings are uncertain, little attention has been focused on the development of global measures of uncertainty. Regardless of the procedure employed to produce a prediction, when a procedure delivers a single answer, that answer is a point estimate selected from the solution ensemble, the set of all possible solutions. For high-D discrete space, these ensembles are immense, and thus there is considerable uncertainty. We recommend the use of Bayesian credibility limits to describe this uncertainty, where a 100(1−α)%, 0≤α≤1, credibility limit is the minimum Hamming distance radius of a hyper-sphere containing 100(1−α)% of the posterior distribution. Because sequence alignment is arguably the most extensively used procedure in computational biology, we employ it here to make these general concepts more concrete. The maximum similarity estimator (i.e., the alignment that maximizes the likelihood) and the centroid estimator (i.e., the alignment that minimizes the mean Hamming distance from the posterior weighted ensemble of alignments) are used to demonstrate the application of Bayesian credibility limits to alignment estimators. Application of Bayesian credibility limits to the alignment of 20 human/rodent orthologous sequence pairs and 125 orthologous sequence pairs from six Shewanella species shows that credibility limits of the alignments of promoter sequences of these species vary widely, and that centroid alignments dependably have tighter credibility limits than traditional maximum similarity alignments.
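
    The credibility limit defined in the abstract above can be sketched on a toy ensemble. This is an illustration under stated assumptions, not the authors' code: each solution is encoded as a binary tuple (for alignments, one coordinate per candidate residue pair, set to 1 when the pair is aligned), the ensemble is small enough to enumerate with explicit posterior weights, and the centroid is searched only among ensemble members, whereas the true centroid may lie anywhere in the solution space. The names `hamming`, `credibility_limit`, and `centroid` are hypothetical.

```python
def hamming(a, b):
    """Number of coordinates at which two equal-length solutions differ."""
    return sum(x != y for x, y in zip(a, b))


def credibility_limit(estimate, ensemble, weights, alpha=0.05):
    """Smallest Hamming radius around `estimate` whose ball holds at least
    a fraction (1 - alpha) of the posterior mass."""
    total = sum(weights)
    by_distance = sorted(
        (hamming(estimate, s), w) for s, w in zip(ensemble, weights)
    )
    mass = 0.0
    for distance, weight in by_distance:
        mass += weight / total
        # Small tolerance guards against floating-point round-off.
        if mass >= 1 - alpha - 1e-12:
            return distance
    return by_distance[-1][0]


def centroid(ensemble, weights):
    """Ensemble member minimizing the posterior-weighted mean Hamming
    distance to all members (a restricted stand-in for the centroid)."""
    return min(
        ensemble,
        key=lambda c: sum(w * hamming(c, s) for s, w in zip(ensemble, weights)),
    )


# Toy ensemble of four solutions with unnormalized posterior weights.
ensemble = [(1, 1, 0, 0), (1, 0, 0, 0), (1, 1, 1, 0), (0, 0, 0, 1)]
weights = [4, 3, 2, 1]

best = max(zip(ensemble, weights), key=lambda sw: sw[1])[0]
print(credibility_limit(best, ensemble, weights, alpha=0.10))
print(credibility_limit(best, ensemble, weights, alpha=0.05))
print(centroid(ensemble, weights))
```

    In this toy case the highest-weight solution and the restricted centroid coincide, so their limits are equal. Real alignment ensembles cannot be enumerated this way, so an estimate of the limit would typically be computed from alignments sampled from the posterior rather than from an explicit list.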

    Bioinformatics technologies for DNA sequence analysis

    Because of its sheer volume, the information contained in DNA sequences requires intelligent techniques for modeling the data and advanced computational methods for processing it. The aim is to reduce the time needed to run calculations and inferences and to improve the reliability of the analyses built on the results, which can serve as a basis for scientific research. The GIA research group of the Systems and Computing Engineering program at the Universidad Tecnológica de Pereira is working to identify computing technologies that enable significant advances in scientific developments in the field of biology. This article explores which computational techniques are relevant to the development of bioinformatics applications.

    Soft Computing, Artificial Intelligence, Fuzzy Logic & Genetic Algorithm in Bioinformatics

    Soft computing is creating several possibilities in bioinformatics, especially by generating low-cost, low-precision (approximate), good solutions. Bioinformatics is an interdisciplinary research area at the interface between the biological and computational sciences. Bioinformatics deals with algorithms, databases and information systems, web technologies, artificial intelligence and soft computing, information and computation theory, structural biology, software engineering, data mining, image processing, modeling and simulation, discrete mathematics, control and system theory, circuit theory, and statistics. Despite a large number of techniques specifically dedicated to bioinformatics problems, as well as many successful applications, we are at the beginning of a process to massively integrate the aspects and experiences in the different core subjects such as biology, medicine, computer science, engineering, and mathematics. Recently, the use of soft computing tools for solving bioinformatics problems has been gaining the attention of researchers because of their ability to handle imprecision and uncertainty in large and complex search spaces. The paper focuses on the soft computing paradigm in bioinformatics, with particular emphasis on integrative research.

    Bioinformatics technologies for DNA sequence analysis

    Solving the biological and computational problem of processing large volumes of data containing DNA sequence information requires advanced computational methods, with the aim of reducing the time this process currently takes and of supporting future research in the biological sciences. The article seeks to identify the specific bioinformatics tools that enable progress in scientific studies of DNA sequence analysis.

    A Novel Approach to Local Reliability of Sequence Alignments

    Motivation: The pairwise alignment of biological sequences obtained from an algorithm will in general contain both correct and incorrect parts. Hence, to allow for a valid interpretation of the alignment, the local trustworthiness of the alignment has to be quantified. Results
