Sorting by Block Moves

Author: Huang Jici
Publication venue: UNF Digital Commons
Publication date: 01/01/2015

The research in this thesis focuses on the problem of Block Sorting, which has applications in Computational Biology and in Optical Character Recognition (OCR). A block in a permutation is a maximal sequence of consecutive elements that are also consecutive in the identity permutation. BLOCK SORTING is the process of transforming an arbitrary permutation into the identity permutation through a sequence of block moves. Given an arbitrary permutation π and an integer m, the Block Sorting Problem, that is, the problem of deciding whether the transformation can be accomplished in at most m block moves, has been shown to be NP-hard. Block sorting was known to be 3-approximable for over a decade; after extensive research, there are now several 2-approximation algorithms for it. This work introduces new structures on a permutation, called runs and ordered pairs, and uses them to develop two new approximation algorithms. Both new algorithms are 2-approximation algorithms, matching the best currently known approximation ratio. This work also includes an analysis of both new algorithms showing that they are 2-approximation algorithms.
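
The definitions of a block and a block move in the abstract can be illustrated with a minimal Python sketch. The function names `blocks` and `move_block` are hypothetical and do not come from the thesis. The sketch only shows how a permutation splits into blocks and how one block can be relocated; it does not implement either of the approximation algorithms.

```python
def blocks(perm):
    """Split perm into maximal blocks: runs of adjacent elements in which
    each element is exactly one more than its predecessor."""
    result = []
    for x in perm:
        if result and result[-1][-1] + 1 == x:
            result[-1].append(x)   # x extends the current block
        else:
            result.append([x])     # x starts a new block
    return result


def move_block(perm, index, new_index):
    """Remove the block at position `index` in the block list and
    reinsert it at position `new_index` (hypothetical helper)."""
    bs = blocks(perm)
    block = bs.pop(index)
    bs.insert(new_index, block)
    return [x for b in bs for x in b]


perm = [4, 5, 1, 2, 3]
print(blocks(perm))            # [[4, 5], [1, 2, 3]]
print(move_block(perm, 0, 1))  # [1, 2, 3, 4, 5]
```

Here the permutation 4 5 1 2 3 consists of two blocks, and a single block move yields the identity permutation, so the answer to the decision problem for this permutation with m = 1 is yes.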

Importance of Explicit Vectorization for CPU and GPU Software Performance

Author: Neil G. Dickson, Kamran Karimi, Firas Hamze
Publication venue: 'Elsevier BV'
Publication date: 31/03/2010

Much of the current focus in high-performance computing is on multi-threading, multi-computing, and graphics processing unit (GPU) computing. However, vectorization and non-parallel optimization techniques, which can often be applied in addition, are discussed less frequently. In this paper, we present an analysis of several optimizations applied to both central processing unit (CPU) and GPU implementations of a particular computationally intensive Metropolis Monte Carlo algorithm. Explicit vectorization on the CPU and its GPU equivalent, explicit memory coalescing, are found to be critical to achieving good performance of this algorithm in both environments. The fully optimized CPU version achieves a 9x to 12x speedup over the original CPU version, in addition to the speedup from multi-threading. This is 2x faster than the fully optimized GPU version.

Comment: 17 pages, 17 figures

arXiv.org e-Print Archive