3,606 research outputs found

    Fast Statistical Alignment

    Get PDF
    We describe a new program for the alignment of multiple biological sequences that is both statistically motivated and fast enough for problem sizes that arise in practice. Our Fast Statistical Alignment program is based on pair hidden Markov models which approximate an insertion/deletion process on a tree and uses a sequence annealing algorithm to combine the posterior probabilities estimated from these models into a multiple alignment. FSA uses its explicit statistical model to produce multiple alignments which are accompanied by estimates of the alignment accuracy and uncertainty for every column and character of the alignment—previously available only with alignment programs which use computationally-expensive Markov Chain Monte Carlo approaches—yet can align thousands of long sequences. Moreover, FSA utilizes an unsupervised query-specific learning procedure for parameter estimation which leads to improved accuracy on benchmark reference alignments in comparison to existing programs. The centroid alignment approach taken by FSA, in combination with its learning procedure, drastically reduces the amount of false-positive alignment on biological data in comparison to that given by other methods. The FSA program and a companion visualization tool for exploring uncertainty in alignments can be used via a web interface at http://orangutan.math.berkeley.edu/fsa/, and the source code is available at http://fsa.sourceforge.net/

    QuASeR -- Quantum Accelerated De Novo DNA Sequence Reconstruction

    Full text link
    In this article, we present QuASeR, a reference-free DNA sequence reconstruction implementation via de novo assembly on both gate-based and quantum annealing platforms. Each one of the four steps of the implementation (TSP, QUBO, Hamiltonians and QAOA) is explained with simple proof-of-concept examples to target both the genomics research community and quantum application developers in a self-contained manner. The details of the implementation are discussed for the various layers of the quantum full-stack accelerator design. We also highlight the limitations of current classical simulation and available quantum hardware systems. The implementation is open-source and can be found on https://github.com/prince-ph0en1x/QuASeR.Comment: 24 page

    AccuSyn: Using Simulated Annealing to Declutter Genome Visualizations

    Get PDF
    We apply Simulated Annealing, a well-known metaheuristic for obtaining near-optimal solutions to optimization problems, to discover conserved synteny relations (similar features) in genomes. The analysis of synteny gives biologists insights into the evolutionary history of species and the functional relationships between genes. However, as even simple organisms have huge numbers of genomic features, syntenic plots initially present an enormous clutter of connections, making the structure difficult to understand. We address this problem by using Simulated Annealing to minimize link crossings. Our interactive web-based synteny browser, AccuSyn, visualizes syntenic relations with circular plots of chromosomes and draws links between similar blocks of genes. It also brings together a huge amount of genomic data by integrating an adjacent view and additional tracks, to visualize the details of the blocks and accompanying genomic data, respectively. Our work shows multiple ways to manually declutter a synteny plot and then thoroughly explains how we integrated Simulated Annealing, along with human interventions as a human-in-the-loop approach, to achieve an accurate representation of conserved synteny relations for any genome. The goal of AccuSyn was to make a fairly complete tool combining ideas from four major areas: genetics, information visualization, heuristic search, and human-in-the-loop. Our results contribute to a better understanding of synteny plots and show the potential that decluttering algorithms have for syntenic analysis, adding more clues for evolutionary development. At this writing, AccuSyn is already actively used in the research being done at the University of Saskatchewan and has already produced a visualization of the recently-sequenced Wheat genome

    A survey of DNA motif finding algorithms

    Get PDF
    Background: Unraveling the mechanisms that regulate gene expression is a major challenge in biology. An important task in this challenge is to identify regulatory elements, especially the binding sites in deoxyribonucleic acid (DNA) for transcription factors. These binding sites are short DNA segments that are called motifs. Recent advances in genome sequence availability and in high-throughput gene expression analysis technologies have allowed for the development of computational methods for motif finding. As a result, a large number of motif finding algorithms have been implemented and applied to various motif models over the past decade. This survey reviews the latest developments in DNA motif finding algorithms.Results: Earlier algorithms use promoter sequences of coregulated genes from single genome and search for statistically overrepresented motifs. Recent algorithms are designed to use phylogenetic footprinting or orthologous sequences and also an integrated approach where promoter sequences of coregulated genes and phylogenetic footprinting are used. All the algorithms studied have been reported to correctly detect the motifs that have been previously detected by laboratory experimental approaches, and some algorithms were able to find novel motifs. However, most of these motif finding algorithms have been shown to work successfully in yeast and other lower organisms, but perform significantly worse in higher organisms.Conclusion: Despite considerable efforts to date, DNA motif finding remains a complex challenge for biologists and computer scientists. Researchers have taken many different approaches in developing motif discovery tools and the progress made in this area of research is very encouraging. Performance comparison of different motif finding tools and identification of the best tools have proven to be a difficult task because tools are designed based on algorithms and motif models that are diverse and complex and our incomplete understanding of the biology of regulatory mechanism does not always provide adequate evaluation of underlying algorithms over motif models.Peer reviewedComputer Scienc

    A Combined Motif Discovery Method

    Get PDF
    A central problem in the bioinformatics is to find the binding sites for regulatory motifs. This is a challenging problem that leads us to a platform to apply a variety of data mining methods. In the efforts described here, a combined motif discovery method that uses mutual information and Gibbs sampling was developed. A new scoring schema was introduced with mutual information and joint information content involved. Simulated tempering was embedded into classic Gibbs sampling to avoid local optima. This method was applied to the 18 pieces DNA sequences containing CRP binding sites validated by Stormo and the results were compared with Bioprospector. Based on the results, the new scoring schema can get over the defect that the basic model PWM only contains single positioin information. Simulated tempering proved to be an adaptive adjustment of the search strategy and showed a much increased resistance to local optima
    corecore