
    Computing alignment plots efficiently

    Dot plots are a standard method for local comparison of biological sequences. In a dot plot, a substring-to-substring distance is computed for all pairs of fixed-size windows in the input strings. Commonly, the Hamming distance is used since it can be computed in linear time. However, the Hamming distance is a rather crude measure of string similarity, and using an alignment-based edit distance can greatly improve the sensitivity of the dot plot method. In this paper, we show how to compute alignment plots of the latter type efficiently. Given two strings of length m and n and a window size w, this problem consists in computing the edit distance between all pairs of substrings of length w, one from each input string. The problem can be solved by repeated application of the standard dynamic programming algorithm in time O(mnw^2). This paper gives an improved data-parallel algorithm, running in time O(mnw/γ/p) using vector operations that work on γ values in parallel and p processors. We show experimental results from an implementation of this algorithm, which uses Intel's MMX/SSE instructions for vector parallelism and MPI for coarse-grained parallelism. Comment: Presented at ParCo 200
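
    The baseline described in this abstract, repeated dynamic programming over every pair of windows, can be sketched as follows. This is only an illustration of the O(mnw^2) approach, not the paper's data-parallel algorithm. It assumes unit-cost (Levenshtein) edit distance, which the abstract does not specify, and the function names are hypothetical.

```python
def hamming_distance(a: str, b: str) -> int:
    """Count mismatching positions of two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def edit_distance(a: str, b: str) -> int:
    """Unit-cost edit distance via the standard DP, O(len(a) * len(b)) time."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def alignment_plot(s: str, t: str, w: int) -> list[list[int]]:
    """Edit distance between every length-w window of s and of t.

    Naive baseline: (m - w + 1) * (n - w + 1) windows, O(w^2) each.
    """
    rows = len(s) - w + 1
    cols = len(t) - w + 1
    return [[edit_distance(s[i:i + w], t[j:j + w]) for j in range(cols)]
            for i in range(rows)]


if __name__ == "__main__":
    # A one-character shift defeats Hamming distance but not edit distance.
    print(hamming_distance("ACGTA", "CGTAC"), edit_distance("ACGTA", "CGTAC"))
    for row in alignment_plot("ACGT", "ACGT", 2):
        print(row)
```

    The shifted pair in the usage example shows why an alignment-based distance is more sensitive: every position mismatches, yet two edits turn one window into the other.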

    A methodology for determining amino-acid substitution matrices from set covers

    We introduce a new methodology for the determination of amino-acid substitution matrices for use in the alignment of proteins. The new methodology is based on a pre-existing set cover on the set of residues and on the undirected graph that describes residue exchangeability given the set cover. For fixed functional forms indicating how to obtain edge weights from the set cover and, after that, substitution-matrix elements from weighted distances on the graph, the resulting substitution matrix can be checked for performance against some known set of reference alignments and for given gap costs. Finding the appropriate functional forms and gap costs can then be formulated as an optimization problem that seeks to maximize the performance of the substitution matrix on the reference alignment set. We give computational results on the BAliBASE suite using a genetic algorithm for optimization. Our results indicate that it is possible to obtain substitution matrices whose performance is either comparable to or surpasses that of several others, depending on the particular scenario under consideration.

    Experimental demonstration of a directionally-unbiased linear-optical multiport

    All existing optical quantum walk approaches are based on the use of beamsplitters and multiple paths to explore the multitude of unitary transformations of quantum amplitudes in a Hilbert space. The beamsplitter is naturally a directionally biased device: the photon cannot travel in the reverse direction. This causes rapid increases in the optical hardware resources required for complex quantum walk applications, since the number of options for the walking particle grows with each step. Here we present the experimental demonstration of a directionally-unbiased linear-optical multiport, which allows reversibility of photon direction. An amplitude-controllable probability distribution matrix for a unitary three-edge vertex is reconstructed with only linear-optical devices. Such directionally-unbiased multiports allow direct execution of quantum walks over a multitude of complex graphs and in tensor networks. This approach would enable simulation of complex Hamiltonians of physical systems and quantum walk applications in a more efficient and compact setup, substantially reducing the required hardware resources.

    Jabba: hybrid error correction for long sequencing reads

    Background: Third generation sequencing platforms produce longer reads with higher error rates than second generation technologies. While the improved read length can provide useful information for downstream analysis, underlying algorithms are challenged by the high error rate. Error correction methods in which accurate short reads are used to correct noisy long reads appear to be an attractive way to generate high-quality long reads. Methods that align short reads to long reads do not optimally use the information contained in the second generation data, and suffer from long runtimes. Recently, a new hybrid error correcting method has been proposed, where the second generation data is first assembled into a de Bruijn graph, on which the long reads are then aligned. Results: In this context we present Jabba, a hybrid method to correct long third generation reads by mapping them on a corrected de Bruijn graph that was constructed from second generation data. Unique to our method is the use of a pseudo alignment approach with a seed-and-extend methodology, using maximal exact matches (MEMs) as seeds. In addition to benchmark results, certain theoretical results concerning the possibilities and limitations of the use of MEMs in the context of third generation reads are presented. Conclusion: Jabba produces highly reliable corrected reads: almost all corrected reads align to the reference, and these alignments have a very high identity. Many of the aligned reads are error-free. Additionally, Jabba corrects reads using a very low amount of CPU time. From this we conclude that pseudo alignment with MEMs is a fast and reliable method to map long highly erroneous sequences on a de Bruijn graph.
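
    To make the notion of a maximal exact match concrete, the sketch below finds MEMs between a read and a single linear sequence using a simple quadratic-time scan. It is not Jabba's implementation: Jabba seeds against a de Bruijn graph and the abstract does not describe its index, so the linear reference, the function name, and the min_len threshold are all assumptions made for illustration.

```python
def maximal_exact_matches(read: str, ref: str, min_len: int):
    """Return (read_start, ref_start, length) for every exact match of at
    least min_len characters that cannot be extended to the left or right.

    Simple O(len(read) * len(ref)) scan; real tools use suffix-based indexes.
    """
    n, m = len(read), len(ref)
    mems = []
    # prev[j + 1] holds the length of the longest common suffix of
    # read[:i] and ref[:j + 1], so every run counted is left-maximal.
    prev = [0] * (m + 1)
    for i in range(n):
        curr = [0] * (m + 1)
        for j in range(m):
            if read[i] != ref[j]:
                continue
            curr[j + 1] = prev[j] + 1
            # Right-maximal: either string ends here or the next characters differ.
            stops = i + 1 == n or j + 1 == m or read[i + 1] != ref[j + 1]
            if stops and curr[j + 1] >= min_len:
                length = curr[j + 1]
                mems.append((i - length + 1, j - length + 1, length))
        prev = curr
    return mems


if __name__ == "__main__":
    # Each reported MEM could serve as a seed for a seed-and-extend step.
    print(maximal_exact_matches("ACGTTT", "GGACGTAA", 3))
```

    In a seed-and-extend scheme, each reported triple would anchor the read at a position in the target, and the regions between anchors would then be aligned separately.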