    Computing alignment plots efficiently

    Dot plots are a standard method for local comparison of biological sequences. In a dot plot, a substring-to-substring distance is computed for all pairs of fixed-size windows in the input strings. Commonly, the Hamming distance is used since it can be computed in linear time. However, the Hamming distance is a rather crude measure of string similarity, and using an alignment-based edit distance can greatly improve the sensitivity of the dot plot method. In this paper, we show how to compute alignment plots of the latter type efficiently. Given two strings of length m and n and a window size w, this problem consists in computing the edit distance between all pairs of substrings of length w, one from each input string. The problem can be solved by repeated application of the standard dynamic programming algorithm in time O(mnw^2). This paper gives an improved data-parallel algorithm, running in time O(mnw/γ/p) using vector operations that work on γ values in parallel and p processors. We show experimental results from an implementation of this algorithm, which uses Intel's MMX/SSE instructions for vector parallelism and MPI for coarse-grained parallelism. Comment: Presented at ParCo 200
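
For orientation, the O(mnw^2) baseline mentioned in this abstract (repeated application of the standard dynamic programming algorithm) can be sketched as follows. This is only an illustrative sketch of the naive approach, not the paper's data-parallel algorithm; the function names `edit_distance` and `alignment_plot` are hypothetical, and unit costs for insertion, deletion, and substitution are assumed.

```python
def edit_distance(a: str, b: str) -> int:
    """Unit-cost edit distance via the standard dynamic programming table,
    kept to two rows. Runs in O(len(a) * len(b)) time."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # match or substitute
            ))
        prev = curr
    return prev[-1]


def alignment_plot(x: str, y: str, w: int) -> list[list[int]]:
    """Naive alignment plot: entry [i][j] is the edit distance between the
    length-w windows x[i:i+w] and y[j:j+w]. One O(w^2) computation per
    window pair gives O(mnw^2) overall."""
    return [
        [edit_distance(x[i:i + w], y[j:j + w]) for j in range(len(y) - w + 1)]
        for i in range(len(x) - w + 1)
    ]


if __name__ == "__main__":
    for row in alignment_plot("abcd", "abxd", 2):
        print(row)
```

Each window pair is recomputed from scratch here, which is exactly the redundancy a faster algorithm would need to remove.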

    A hybrid algorithm for the longest common transposition-invariant subsequence problem

    The longest common transposition-invariant subsequence (LCTS) problem is a variation of the classic LCS problem oriented toward music information retrieval. There are basically only two known efficient approaches to calculating the length of the LCTS, one based on sparse dynamic programming and the other on bit-parallelism. In this work, we propose a hybrid algorithm that picks the better of the two algorithms for individual subproblems. Experiments on music (MIDI), with 32-bit and 64-bit implementations, show that the proposed algorithm outperforms the faster of the two component algorithms by a factor of 1.4–2.0, depending on sequence lengths. Similar, if not better, improvements can be observed for random data with Gaussian distribution. For uniformly random data, too, the hybrid algorithm is the winner if the alphabet is neither too small (at least 32 symbols) nor too large (up to 128 symbols). Part of the success of our scheme is attributed to a quite robust component selection heuristic.
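
To make the problem statement concrete, here is a naive reference computation of the LCTS length. It assumes the usual definition (the maximum, over all integer transpositions t, of the LCS length of one sequence shifted by t and the other) and integer-valued sequences such as MIDI pitches. It is neither the sparse dynamic programming approach, nor the bit-parallel one, nor the hybrid; the names `lcs_length` and `lcts_length` are hypothetical.

```python
def lcs_length(a: list[int], b: list[int]) -> int:
    """Classic O(len(a) * len(b)) dynamic programming for the LCS length."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcts_length(a: list[int], b: list[int]) -> int:
    """Brute-force LCTS length: try every transposition t that makes at
    least one pair of symbols match, and keep the best LCS length."""
    if not a or not b:
        return 0
    shifts = {y - x for x in a for y in b}  # only these t can yield a match
    return max(lcs_length([x + t for x in a], b) for t in shifts)


if __name__ == "__main__":
    # The same melodic contour, transposed up by five semitones.
    print(lcts_length([60, 62, 64], [65, 67, 69]))
```

One full LCS computation per candidate transposition is what makes this baseline slow on larger alphabets.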

    Plagiarism detection in source programs using structural similarities

    The paper presents a plagiarism detection framework whose goal is to determine whether two programs are similar to each other and, if so, to what extent. Plagiarism detection has been considered earlier for written material, such as student essays, for which text-based algorithms have been published. We argue that for program code comparison, structure-based techniques may be much more suitable. The main idea is to transform the source code into mathematical objects, apply appropriate reduction and comparison methods to these, and interpret the results. We have designed a generic program structure comparison framework and implemented it for the Prolog and SML programming languages. We have been using the implementation at BUTE for years to successfully detect plagiarism in homework assignments.

    Algorithms in the Ultra-Wide Word Model

    The effective use of parallel computing resources to speed up algorithms in current multi-core parallel architectures remains a difficult challenge, with ease of programming playing a key role in the eventual success of various parallel architectures. In this paper we consider an alternative view of parallelism in the form of an ultra-wide word processor. We introduce the Ultra-Wide Word architecture and model, an extension of the word-RAM model that allows for constant time operations on thousands of bits in parallel. Word parallelism as exploited by the word-RAM model does not suffer from the more difficult aspects of parallel programming, namely synchronization and concurrency. For the standard word-RAM algorithms, the speedups obtained are moderate, as they are limited by the word size. We argue that a large class of word-RAM algorithms can be implemented in the Ultra-Wide Word model, obtaining speedups comparable to multi-threaded computations while keeping the simplicity of programming of the sequential RAM model. We show that this is the case by describing implementations of Ultra-Wide Word algorithms for dynamic programming and string searching. In addition, we show that the Ultra-Wide Word model can be used to implement a nonstandard memory architecture, which enables the sidestepping of lower bounds of important data structure problems such as priority queues and dynamic prefix sums. While similar ideas about operating on large words have been mentioned before in the context of multimedia processors [Thorup 2003], it is only recently that an architecture like the one we propose has become feasible and that details can be worked out. Comment: 28 pages, 5 figures; minor change

    A Fast and Practical Bit-Vector Algorithm for the Longest Common Subsequence Problem

    This paper presents a new practical bit-vector algorithm for solving the well-known Longest Common Subsequence (LCS) problem. Given two strings of length m and n, n ≥ m, we present an algorithm which determines the length p of an LCS in O(nm/w) time and O(m/w) space, where w is the number of bits in a machine word. This algorithm can be thought of as a column-wise "parallelization" of the classical dynamic programming approach. Our algorithm is very efficient in practice, where computing the length of an LCS of two strings can be done in linear time and constant (additional/working) space by assuming that m ≤ w.
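
The column-wise idea can be illustrated with one well-known bit-vector recurrence for the LCS length. The sketch below is an assumption-laden illustration, not necessarily the exact formulation used in the paper: Python's arbitrary-precision integers stand in for machine words (so the m ≤ w restriction is not visible here), and the function name `lcs_length_bitvector` is hypothetical.

```python
def lcs_length_bitvector(x: str, y: str) -> int:
    """LCS length with one bit-vector update per character of y.

    Bit i of v is 0 exactly where the current DP column increases between
    rows i and i + 1, so the LCS length is the number of 0 bits at the end.
    """
    m = len(x)
    if m == 0:
        return 0
    full = (1 << m) - 1  # m low bits set; emulates an m-bit word

    # match[c] has bit i set iff x[i] == c.
    match: dict[str, int] = {}
    for i, ch in enumerate(x):
        match[ch] = match.get(ch, 0) | (1 << i)

    v = full  # all ones: the initial column is all zeros, no increases yet
    for ch in y:
        mask = match.get(ch, 0)
        u = v & mask
        # The addition propagates carries through runs of 1 bits, which
        # updates the whole column at once; the final mask drops overflow.
        v = ((v + u) | (v & ~mask)) & full
    return m - bin(v).count("1")


if __name__ == "__main__":
    print(lcs_length_bitvector("AGGTAB", "GXTXAYB"))
```

With fixed-width words, x would be the shorter string so that the vector fits in as few words as possible, matching the O(m/w) space bound stated in the abstract.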

    Fine-grained Complexity Meets IP = PSPACE

    In this paper we study the fine-grained complexity of finding exact and approximate solutions to problems in P. Our main contribution is showing reductions from exact to approximate solution for a host of such problems. As one (notable) example, we show that the Closest-LCS-Pair problem (given two sets of strings A and B, compute exactly the maximum LCS(a, b) with (a, b) ∈ A × B) is equivalent to its approximation version (under near-linear time reductions, and with a constant approximation factor). More generally, we identify a class of problems, which we call BP-Pair-Class, comprising both exact and approximate solutions, and show that they are all equivalent under near-linear time reductions. Exploring this class and its properties, we also show the following. First, under the NC-SETH assumption (a significantly more relaxed assumption than SETH), solving any of the problems in this class requires essentially quadratic time. Second, modest improvements on the running time of known algorithms (shaving log factors) would imply that NEXP is not in non-uniform NC^1. Finally, we leverage our techniques to show new barriers for deterministic approximation algorithms for LCS. At the heart of these new results is a deep connection between interactive proof systems for bounded-space computations and the fine-grained complexity of exact and approximate solutions to problems in P. In particular, our results build on the proof techniques from the classical IP = PSPACE result.

    Bit-parallel and SIMD alignment algorithms for biological sequence analysis

    High-throughput next-generation sequencing techniques have hugely decreased the cost and increased the speed of sequencing, resulting in an explosion of sequencing data. This motivates the development of high-efficiency sequence alignment algorithms. In this thesis, I present multiple bit-parallel and Single Instruction Multiple Data (SIMD) algorithms that greatly accelerate the processing of biological sequences. The first chapter describes the BitPAl bit-parallel algorithms for global alignment with general integer scoring, which assigns integer weights for match, mismatch, and insertion/deletion. The bit-parallel approach represents individual cells in an alignment scoring matrix as bits in computer words and emulates the calculation of scores by a series of logic operations. Bit-parallelism has previously been applied to other pattern matching problems, producing fast algorithms. In timed tests, I show that BitPAl runs 7 to 25 times faster than a standard iterative algorithm. The second part involves two approaches to alignment with substitution scoring, which assigns a potentially different substitution weight to every pair of alphabet characters, better representing the relative rates of different mutations. The first approach extends the existing BitPAl method. The second approach is a new SIMD algorithm that uses partial sums of adjacent score differences. I present a simple partial sum method as well as one that uses parallel scan for additional acceleration. Results demonstrate that these algorithms are significantly faster than existing SIMD dynamic programming algorithms. Finally, I describe two extensions to the partial sums algorithm. The first adds support for affine gap penalty scoring. Affine gap scoring models the biological observation that gaps are more likely to be contiguous than to be scattered throughout a region; it does so by introducing a gap opening penalty and a gap extension penalty. The second extension is an algorithm that uses the partial sums method to calculate the tandem alignment of a pattern against a text sequence using a single pattern copy. Next-generation sequencing data provides a wealth of information to researchers. Extracting that information in a timely manner increases the utility and practicality of sequence analysis algorithms. This thesis presents a family of algorithms which provide alignment scores in less time than previous algorithms.
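
As a point of reference for the affine gap scoring mentioned in this abstract, the sketch below computes a global alignment score with the textbook three-state dynamic programming scheme. It is not the thesis's bit-parallel or partial sums method. The function name `affine_global_score`, the default weights, and the convention that a gap of length k costs `gap_open + k * gap_extend` are all assumptions made for illustration.

```python
NEG_INF = float("-inf")


def affine_global_score(a: str, b: str, match: int = 1, mismatch: int = -1,
                        gap_open: int = 2, gap_extend: int = 1):
    """Global alignment score where a gap of length k costs
    gap_open + k * gap_extend (assumed convention).

    M[i][j]: best score with a[i-1] aligned to b[j-1]
    X[i][j]: best score with a[i-1] aligned to a gap
    Y[i][j]: best score with b[j-1] aligned to a gap
    """
    n, m = len(a), len(b)
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):  # leading gap spanning a[:i]
        X[i][0] = -(gap_open + i * gap_extend)
    for j in range(1, m + 1):  # leading gap spanning b[:j]
        Y[0][j] = -(gap_open + j * gap_extend)

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1]) + s
            # Extending an open gap costs gap_extend only; opening a new one
            # from another state also pays gap_open.
            X[i][j] = max(M[i - 1][j] - gap_open, Y[i - 1][j] - gap_open,
                          X[i - 1][j]) - gap_extend
            Y[i][j] = max(M[i][j - 1] - gap_open, X[i][j - 1] - gap_open,
                          Y[i][j - 1]) - gap_extend
    return max(M[n][m], X[n][m], Y[n][m])


if __name__ == "__main__":
    # One contiguous gap of length 2 is preferred over two gaps of length 1.
    print(affine_global_score("AAAA", "AA"))
```

With these assumed weights, aligning "AAAA" to "AA" pays the opening penalty once for a single two-column gap, whereas two separate one-column gaps would pay it twice.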

    Communication avoiding parallel algorithms for amorphous problems

    Parallelizing large problems on parallel systems has always been a challenge for programmers. This difficulty is caused by the complexity of the existing systems as well as of the target problems. It is becoming a greater issue as data sizes grow constantly and, as a result, larger parallel systems are required. Graph algorithms, machine learning problems, and bioinformatics methods are among the many ever-growing problems. This group of problems is amorphous, meaning that memory accesses are unpredictable and the application usually has poor locality. Synchronization in these problems is therefore especially costly, since all-to-all communication is required, and delivering an efficient parallel algorithm becomes more challenging. Another difficulty is that the amount of parallelism in these problems is limited, which naturally makes them hard to parallelize. This is due to complicated data dependences among the data elements in the algorithm. Writing parallel algorithms for these problems is also especially difficult, since an amorphous problem can be expressed in several dramatically different ways. This is because the complex data dependences are statically unknown, and therefore many unique parallel approaches exist for a single problem. Consequently, programming each approach requires starting from scratch, which is time-consuming. This thesis introduces several ways to avoid costly communication in amorphous problems by trading it for additional computation. That is, we increase the total amount of work done by the processors in order to avoid synchronization in an algorithm. This is especially effective in large clusters, which offer massive computing power but very costly communication. These approaches clearly involve a trade-off between computation and communication, and in this thesis we study these trade-offs as well. Also, we propose a new language for expressing the proposed algorithms; it aims to overcome the programming difficulty of these problems by providing tunable parameters for performance.

