
    Faster Approximate String Matching for Short Patterns

    We study the classical approximate string matching problem, that is, given strings $P$ and $Q$ and an error threshold $k$, find all ending positions of substrings of $Q$ whose edit distance to $P$ is at most $k$. Let $P$ and $Q$ have lengths $m$ and $n$, respectively. On a standard unit-cost word RAM with word size $w \geq \log n$ we present an algorithm using time $O\left(nk \cdot \min\left(\frac{\log^2 m}{\log n}, \frac{\log^2 m \log w}{w}\right) + n\right)$. When $P$ is short, namely, $m = 2^{o(\sqrt{\log n})}$ or $m = 2^{o(\sqrt{w/\log w})}$, this improves the previously best known time bounds for the problem. The result is achieved using a novel implementation of the Landau-Vishkin algorithm based on tabulation and word-level parallelism. Comment: To appear in Theory of Computing Systems
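To make the problem statement concrete, here is a minimal sketch of the textbook O(nm) dynamic-programming baseline (often attributed to Sellers). It is not the Landau-Vishkin-based algorithm of the abstract and uses no tabulation or word-level parallelism; the function name `approx_match_end_positions` is a hypothetical choice for illustration.

```python
def approx_match_end_positions(pattern, text, k):
    """Return 1-based end positions j in text such that some substring
    ending at j has edit distance at most k to pattern."""
    m = len(pattern)
    # col[i] = min edit distance between pattern[:i] and any substring
    # of text ending at the current text position (initially position 0).
    col = list(range(m + 1))
    ends = []
    for j, ch in enumerate(text, start=1):
        prev_diag = col[0]   # value from the previous column, row i-1
        col[0] = 0           # a match may start at any text position
        for i in range(1, m + 1):
            cur = col[i]     # value from the previous column, row i
            cost = 0 if pattern[i - 1] == ch else 1
            col[i] = min(prev_diag + cost,  # match or substitution
                         cur + 1,           # extra character in text
                         col[i - 1] + 1)    # pattern character unmatched
            prev_diag = cur
        if col[m] <= k:
            ends.append(j)
    return ends


print(approx_match_end_positions("abc", "xabcx", 1))  # [3, 4, 5]
```

With k = 0 the same call reports only position 4, the end of the exact occurrence.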

    Optimized Merge Sort on Modern Commodity Multi-core CPUs

    Sorting is one of the most widely used basic algorithms. As high-performance computing devices become increasingly common, more and more commodity machines are capable of parallel computation. A new implementation of sorting is proposed to harness the newer SIMD operations and multi-core computing provided by modern CPUs. The algorithm is a hybrid of an optimized bitonic sorting network and a multi-way merge. New SIMD instructions provided by modern CPUs are used in the bitonic network implementation, which arranges data differently so that the number of SIMD operations is reduced. Balanced binary trees are used in the multi-way merge, which also differs from earlier implementations. Effort is also devoted to minimizing data movement in memory, since merge sort is memory-bound. The performance evaluation shows that the proposed algorithm is twice as fast as the sort function in the C++ standard library when only a single thread is used. It also outperforms the radix sort implemented in the Boost library.
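The two-phase structure described above (sort small blocks, then merge the resulting runs) can be sketched as follows. This is only an illustration under stated assumptions: Python's built-in `sorted` stands in for the SIMD bitonic network, the balanced binary tree is modelled as a recursive tree of two-way merges, and the names `merge_two`, `merge_runs`, `hybrid_sort`, and the `block` parameter are hypothetical. The paper's actual data layout and merge tree may differ.

```python
def merge_two(a, b):
    """Merge two sorted lists into one sorted list (stable)."""
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        if b[j] < a[i]:
            out.append(b[j])
            j += 1
        else:
            out.append(a[i])
            i += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return out


def merge_runs(runs):
    """Merge many sorted runs through a balanced binary tree of two-way merges."""
    if not runs:
        return []
    if len(runs) == 1:
        return list(runs[0])
    mid = len(runs) // 2
    return merge_two(merge_runs(runs[:mid]), merge_runs(runs[mid:]))


def hybrid_sort(data, block=4):
    """Sort fixed-size blocks first, then merge all runs."""
    # Assumption: sorted() replaces the in-register bitonic network here.
    runs = [sorted(data[i:i + block]) for i in range(0, len(data), block)]
    return merge_runs(runs)


print(hybrid_sort([5, 3, 9, 1, 7, 2, 8, 6, 4, 0]))
```

Splitting the runs in half at each level keeps the merge tree balanced, so every element passes through about log2(number of runs) merges.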

    Optimizing sorting algorithms for the Cell Broadband Engine

    The quest for higher performance in computationally intensive tasks is, and will always be, an ongoing effort. General purpose processors (GPP) have not been sufficient for many of these tasks, which has led to research focused on computing with specialty processors and graphics processing units (GPU). While GPUs provide sufficient speedups for some tasks, other specialty processors may be better suited, more economical, or more efficient for different types of tasks. Sorting is an important task in many applications and can be computationally intensive when dealing with large data sets. One such specialty processor that has proven to be a viable solution for sorting is the Cell Broadband Engine (CBE). The CBE is used as the main platform for this thesis since there are already applications for it that require sorting software. The Cell processor is a general purpose processor that combines one master PowerPC core with eight other vector processors connected via a high bandwidth interconnect bus. The user must explicitly manage the communication, scheduling, and load-balancing between the vector processors and the PowerPC processor to achieve the highest efficiency. By optimizing the sorting algorithms for the vector processors, large speedups can be achieved because multiple operations occur simultaneously. Optimized sorting software is often sought when sorting is not the main purpose of the application. This keeps overheads low so that the performance gains can be realized from the actual code that is to be optimized on specialty processors. Sorted datasets often enable algorithms to run faster and more predictably. The motivation behind this thesis is that there is currently no standard library of sorting algorithms that have been optimized for the CBE. The lack of standard libraries makes writing code for the CBE difficult. Results from previous work have also not provided specific measurements of sorting performance. 
This thesis will explore the development and analysis of a variety of optimized parallel sorting algorithms written for the Cell processor. This thesis will focus on sorting individual elements within vectors as well as sorting entire vectors within arrays. The sorting algorithms, written in C++, that will be optimized and analyzed include, but are not limited to, bitonic sort, heap sort, merge sort, and quick sort. A communication management framework will also be created as a main focus of this thesis in order to better understand the architecture of the processor.
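Of the algorithms listed, bitonic sort is the one whose fixed compare-exchange pattern maps most directly onto vector processors. The following is a minimal scalar sketch of the standard iterative bitonic network, written in Python rather than the thesis's C++ and with no Cell-specific code; the function name `bitonic_sort` and the power-of-two length restriction are assumptions of this illustration.

```python
def bitonic_sort(values):
    """Sort a list whose length is a power of two with a bitonic network."""
    a = list(values)
    n = len(a)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    size = 2  # length of the bitonic sequences being merged
    while size <= n:
        stride = size // 2  # distance between compared elements
        while stride >= 1:
            # All compare-exchanges for one stride are independent of each
            # other, which is what makes the network easy to vectorize.
            for i in range(n):
                partner = i ^ stride
                if partner > i:
                    ascending = (i & size) == 0
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            stride //= 2
        size *= 2
    return a


print(bitonic_sort([7, 3, 6, 1, 8, 2, 5, 4]))
```

The sequence of comparisons depends only on the input length, never on the data, so the same schedule can be replayed on every vector lane.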