Engineering Parallel String Sorting
We discuss how string sorting algorithms can be parallelized on modern
multi-core shared memory machines. As a synthesis of the best sequential string
sorting algorithms and successful parallel sorting algorithms for atomic
objects, we first propose string sample sort. The algorithm makes effective use
of the memory hierarchy, uses additional word-level parallelism, and largely
avoids branch mispredictions. Then we focus on NUMA architectures, and develop
parallel multiway LCP-merge and -mergesort to reduce the number of random
memory accesses to remote nodes. Additionally, we parallelize variants of
multikey quicksort and radix sort that are also useful in certain situations.
Comprehensive experiments on five current multi-core platforms are then
reported and discussed. The experiments show that our implementations scale
very well on real-world inputs and modern machines.

Comment: 46 pages, extension of "Parallel String Sample Sort" arXiv:1305.115
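As a point of reference for the sequential building blocks named above, the following is a minimal sketch of multikey quicksort (three-way string quicksort) in Python. The paper's contribution lies in parallelizing and engineering such algorithms; this sketch attempts none of that, and the function name is illustrative.

```python
def multikey_quicksort(strings, depth=0):
    """Three-way string quicksort (multikey quicksort): sorts byte
    strings by distinguishing one character position per recursion.
    A sequential sketch only; the paper parallelizes such algorithms."""
    if len(strings) <= 1:
        return strings
    # Character at the current depth; -1 marks end-of-string.
    def ch(s):
        return s[depth] if depth < len(s) else -1
    pivot = ch(strings[len(strings) // 2])
    less    = [s for s in strings if ch(s) < pivot]
    equal   = [s for s in strings if ch(s) == pivot]
    greater = [s for s in strings if ch(s) > pivot]
    # Strings that agree on this character are recursed on the next
    # position, unless the pivot is end-of-string (already in place).
    if pivot != -1:
        equal = multikey_quicksort(equal, depth + 1)
    return (multikey_quicksort(less, depth) + equal
            + multikey_quicksort(greater, depth))

print(multikey_quicksort([b"banana", b"band", b"ban", b"apple"]))
# [b'apple', b'ban', b'banana', b'band']
```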
The analysis and synthesis of a parallel sorting engine
This thesis is concerned with the development of a unique
parallel sort-merge system suitable for implementation in VLSI.
Two new sorting subsystems, a high performance VLSI sorter and a
four-way merger, were also realized during the development
process. In addition, the analysis of several existing parallel sorting
architectures and algorithms was carried out.
Algorithmic time complexity, VLSI processor performance, and
chip area requirements for the existing sorting systems were
evaluated. The rebound sorting algorithm was determined to be the
most efficient among those considered, and it was implemented in
hardware as a systolic array with external expansion capability.
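The rebound sorter itself is a byte-serial hardware design; as a software illustration of the systolic compare-exchange principle it shares, here is odd-even transposition sort, a textbook systolic-array sorting scheme (explicitly not the thesis's exact design):

```python
def odd_even_transposition_sort(keys):
    """Odd-even transposition sort: a classic systolic-array scheme.
    Every compare-exchange within a phase is independent, so all of
    them can run in parallel hardware cells; n phases suffice for
    n keys. Illustrative of systolic sorters generally, not of the
    rebound sorter's specific byte-serial ladder."""
    a = list(keys)
    n = len(a)
    for phase in range(n):
        # Even phases compare pairs (0,1),(2,3),...; odd phases (1,2),(3,4),...
        for i in range(phase % 2, n - 1, 2):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a

print(odd_even_transposition_sort([7, 3, 9, 1, 4, 8]))  # [1, 3, 4, 7, 8, 9]
```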
The second phase of the research involved analyzing several
parallel merge algorithms and their buffer management schemes.
The dominant considerations in this phase were minimizing VLSI
chip area, design complexity, and logic delay. It was determined that the proposed merger
architecture could be implemented in several ways. Selecting the
appropriate microarchitecture for the merger, given the constraints
of chip area and performance, was the major problem. The tradeoffs
associated with this process are outlined.
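To make the merger's function concrete, here is a sketch of what a four-way merger computes, with a software heap standing in for the hardware compare network and buffer management that the thesis is actually concerned with:

```python
import heapq

def four_way_merge(r0, r1, r2, r3):
    """Merge four sorted runs into one sorted run: the function the
    thesis's four-way merger realizes in hardware. A heap holds one
    candidate key per run, playing the role of the compare network."""
    runs = [iter(r) for r in (r0, r1, r2, r3)]
    heap = []
    for idx, it in enumerate(runs):
        first = next(it, None)
        if first is not None:
            heap.append((first, idx))
    heapq.heapify(heap)
    out = []
    while heap:
        key, idx = heapq.heappop(heap)
        out.append(key)
        nxt = next(runs[idx], None)  # refill from the run just consumed
        if nxt is not None:
            heapq.heappush(heap, (nxt, idx))
    return out

print(four_way_merge([1, 5], [2, 6], [0, 7], [3, 4]))
# [0, 1, 2, 3, 4, 5, 6, 7]
```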
Finally, a pipelined sort-merge system was implemented in
VLSI by combining a rebound sorter and a four-way merger on a
single chip. The final chip size was 416 mils by 432 mils. Two-micron
CMOS technology was utilized in this chip realization. An
overall throughput rate of 10M bytes/sec was achieved. The
prototype system developed is capable of sorting thirty-two 2-byte
keys during each merge phase. If extended, this system is capable of
economically sorting files of 100M bytes or more in size. In order to
sort larger files, this design should be incorporated in a disk-based
sort-merge system. A simplified disk I/O access model for such a
system was studied. In this study the sort-merge system was
assumed to be part of a disk controller subsystem.
The Geometry of Tree-Based Sorting
We study the connections between sorting and the binary search tree (BST) model, with an aim towards showing that the fields are connected more deeply than is currently appreciated. While any BST can be used to sort by inserting the keys one-by-one, this is a very limited relationship and importantly says nothing about parallel sorting. We show what we believe to be the first formal relationship between the BST model and sorting. Namely, we show that a large class of sorting algorithms, which includes mergesort, quicksort, insertion sort, and almost every instance-optimal sorting algorithm, are equivalent in cost to offline BST algorithms.

Our main theoretical tool is the geometric interpretation of the BST model introduced by Demaine et al. [Demaine et al., 2009], which finds an equivalence between searches on a BST and point sets in the plane satisfying a certain property.

To give an example of the utility of our approach, we introduce the log-interleave bound, a measure of the information-theoretic complexity of a permutation π, which is within a lg lg n multiplicative factor of a known lower bound in the BST model; we also devise a parallel sorting algorithm with polylogarithmic span that sorts a permutation π using comparisons proportional to its log-interleave bound. Our aforementioned result on sorting and offline BST algorithms can be used to show existence of an offline BST algorithm whose cost is within a constant factor of the log-interleave bound of any permutation π.
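The "limited relationship" the abstract alludes to is ordinary sort-by-BST-insertion, sketched below for contrast with the paper's deeper equivalence:

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def bst_sort(keys):
    """Sort by inserting keys one-by-one into an (unbalanced) BST and
    reading them back in order: the limited BST-sorting relationship
    the abstract contrasts with its offline-BST equivalence."""
    root = None
    for k in keys:
        root = _insert(root, k)
    out = []
    _inorder(root, out)
    return out

def _insert(node, k):
    if node is None:
        return Node(k)
    if k < node.key:
        node.left = _insert(node.left, k)
    else:
        node.right = _insert(node.right, k)
    return node

def _inorder(node, out):
    if node:
        _inorder(node.left, out)
        out.append(node.key)
        _inorder(node.right, out)

print(bst_sort([3, 1, 4, 1, 5, 9, 2, 6]))  # [1, 1, 2, 3, 4, 5, 6, 9]
```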
Protein microenvironments for topology analysis
Amino acid residues are often the focus of research on protein structures. However, in a folded protein, each residue finds itself in an environment that is defined
by the properties of its surrounding residues. The term microenvironment is used
herein to refer to these local ensembles. These ensembles have not only chemical properties but also topological properties, which quantify concepts such as density, boundaries between domains, and junction complexity. These quantifications are
used to project a protein’s backbone structure into a series of scores.
The hypothesis was that these sequences of scores can be used to discover protein
domains and motifs and that they can be used to align and compare groups of
3D protein structures.
This research sought to implement a system that could efficiently compute microenvironments such that they can be applied routinely to large datasets. The
computation of the microenvironments was the most challenging aspect in terms
of performance, and the optimisations required are described.
Methods of scoring microenvironments were developed to enable the extraction
of domain and motif data without 3D alignment. The problem of allosteric site
detection was addressed with a classifier that achieved high detection rates.
Overall, this work describes the development of a system that scales well with
increasing dataset sizes. It builds on existing techniques to automatically detect the boundaries of domains, and demonstrates the ability to process
large datasets by application to allosteric site detection, a problem that has not
previously been adequately solved.
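As an illustration of the microenvironment idea (and only that; the thesis's topological scores are far richer), a toy per-residue density score can be computed with a spatial index. The 8 Å radius below is an arbitrary illustrative choice, not the thesis's parameter:

```python
import numpy as np
from scipy.spatial import cKDTree

def density_scores(ca_coords, radius=8.0):
    """Score each residue by the size of its microenvironment: the
    number of other residues whose C-alpha atoms lie within `radius`
    angstroms. A toy stand-in for richer topological scores such as
    density, domain boundaries, and junction complexity."""
    tree = cKDTree(ca_coords)
    neighbors = tree.query_ball_point(ca_coords, r=radius)
    # Subtract 1 so a residue does not count itself.
    return np.array([len(n) - 1 for n in neighbors])

# Toy "backbone": 100 random C-alpha positions in a 40 A cube.
rng = np.random.default_rng(0)
coords = rng.uniform(0.0, 40.0, size=(100, 3))
print(density_scores(coords)[:10])
```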
Slider—maximum use of probability information for alignment of short sequence reads and SNP detection
Motivation: A plethora of alignment tools have been created that are designed to best fit different types of alignment conditions. While some of these are made for aligning Illumina Sequence Analyzer reads, none of them fully utilizes its probability (prb) output. In this article, we introduce a new alignment approach (Slider) that reduces the alignment problem space by utilizing each read base's probabilities given in the prb files.
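To illustrate what using the prb output means, here is a toy log-likelihood scan of a read's per-base probability matrix along a reference. Slider's actual contribution, reducing the alignment problem space, is not modeled here:

```python
import math

BASES = "ACGT"

def best_alignment(prb, reference):
    """Slide a per-base probability matrix (one row per cycle, one
    column per base, as in Illumina prb files) along a reference and
    pick the offset with the highest log-likelihood. A minimal sketch
    of probability-aware alignment, not Slider's algorithm."""
    read_len = len(prb)
    best = (float("-inf"), -1)
    for offset in range(len(reference) - read_len + 1):
        score = 0.0
        for i, row in enumerate(prb):
            p = row[BASES.index(reference[offset + i])]
            score += math.log(max(p, 1e-12))  # guard against zero probabilities
        best = max(best, (score, offset))
    return best  # (log-likelihood, offset)

# Toy example: a 3-cycle read whose most likely call is "ACG".
prb = [[0.97, 0.01, 0.01, 0.01],
       [0.01, 0.97, 0.01, 0.01],
       [0.01, 0.01, 0.97, 0.01]]
print(best_alignment(prb, "TTACGTT"))  # best offset is 2
```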
Smooth heaps and a dual view of self-adjusting data structures
We present a new connection between self-adjusting binary search trees (BSTs)
and heaps, two fundamental, extensively studied, and practically relevant
families of data structures. Roughly speaking, we map an arbitrary heap
algorithm within a natural model to a corresponding BST algorithm with the
same cost on a dual sequence of operations (i.e. the same sequence with the
roles of time and key-space switched). This is the first general transformation
between the two families of data structures.
There is a rich theory of dynamic optimality for BSTs (i.e. the theory of
competitiveness between BST algorithms). The lack of an analogous theory for
heaps has been noted in the literature. Through our connection, we transfer all
instance-specific lower bounds known for BSTs to a general model of heaps,
initiating a theory of dynamic optimality for heaps.
On the algorithmic side, we obtain a new, simple and efficient heap
algorithm, which we call the smooth heap. We show the smooth heap to be the
heap-counterpart of Greedy, the BST algorithm with the strongest proven and
conjectured properties from the literature, widely believed to be
instance-optimal. Assuming the optimality of Greedy, the smooth heap is also
optimal within our model of heap algorithms. As corollaries of results known
for Greedy, we obtain instance-specific upper bounds for the smooth heap, with
applications in adaptive sorting.
Intriguingly, the smooth heap, although derived from a non-practical BST
algorithm, is simple and easy to implement (e.g. it stores no auxiliary data
besides the keys and tree pointers). It can be seen as a variation on the
popular pairing heap data structure, extending it with a "power-of-two-choices"
type of heuristic.

Comment: Presented at STOC 2018, light revision, additional figure
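For readers unfamiliar with the pairing heap that the smooth heap varies, here is a minimal version; the two-pass pairing in delete_min is roughly the step the smooth heap's "power-of-two-choices" heuristic changes. This sketch is the classic structure, not the smooth heap itself:

```python
class PairingHeap:
    """Minimal pairing heap. Stores only keys and child/sibling
    pointers, matching the abstract's point that no auxiliary data
    is needed beyond keys and tree pointers."""
    class Node:
        __slots__ = ("key", "child", "sibling")
        def __init__(self, key):
            self.key, self.child, self.sibling = key, None, None

    def __init__(self):
        self.root = None

    def _meld(self, a, b):
        """Link two heap roots; the smaller key becomes the parent."""
        if a is None: return b
        if b is None: return a
        if b.key < a.key:
            a, b = b, a
        b.sibling, a.child = a.child, b  # b becomes a's first child
        return a

    def insert(self, key):
        self.root = self._meld(self.root, self.Node(key))

    def delete_min(self):
        """Assumes a non-empty heap. Two-pass pairing of the root's
        children: pair left to right, then meld right to left."""
        node, self.root = self.root, None
        children, c = [], node.child
        while c:
            nxt, c.sibling = c.sibling, None
            children.append(c)
            c = nxt
        pairs = [self._meld(children[i],
                            children[i + 1] if i + 1 < len(children) else None)
                 for i in range(0, len(children), 2)]
        for p in reversed(pairs):
            self.root = self._meld(self.root, p)
        return node.key

h = PairingHeap()
for x in [5, 2, 8, 1]:
    h.insert(x)
print([h.delete_min() for _ in range(4)])  # [1, 2, 5, 8]
```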
GPU-aided edge computing for processing the k nearest-neighbor query on SSD-resident data
Edge computing aims at improving performance by storing and processing data closer to their source. The k Nearest-Neighbor (k-NN) query is a common spatial query in several applications. For example, this query can be used for distance classification of a group of points against a big reference dataset to derive the dominating feature class. Typically, GPU devices have much larger numbers of processing cores than CPUs and faster device memory than the main memory accessed by CPUs, thus providing higher computing power. However, since device and/or main memory may not be able to host an entire reference dataset, the use of secondary storage is inevitable. Solid State Disks (SSDs) could be used for storing such a dataset.

In this paper, we propose an architecture for a distributed edge-computing environment where large-scale processing of the k-NN query can be accomplished by executing an efficient algorithm for the query on its (GPU- and SSD-enabled) edge nodes. We also propose a new algorithm for this purpose: a GPU-based partitioning algorithm for processing the k-NN query on big reference data stored on SSDs. We implement this algorithm on a GPU-enabled edge-computing device, hosting reference data on an SSD.

Using synthetic datasets, we present an extensive experimental performance comparison of the new algorithm against two existing algorithms (working on memory-resident data) proposed by other researchers and two (working on SSD-resident data) recently proposed by us. The new algorithm excels in all the conducted experiments and outperforms its competitors.
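A CPU-side sketch of the partitioning idea follows, with NumPy standing in for the GPU and an in-memory array standing in for the SSD-resident reference data; the paper's algorithm and data layout are considerably more involved, and this version returns only distances, not neighbor indices:

```python
import numpy as np

def knn_partitioned(queries, reference, k, chunk=100_000):
    """Brute-force k-NN over a reference set too large to hold at
    once: stream it in chunks and keep the k smallest distances per
    query. Each chunk's distance matrix is the work a GPU kernel
    would do against one SSD-resident partition."""
    n_q = len(queries)
    best_d = np.full((n_q, k), np.inf)
    for start in range(0, len(reference), chunk):
        part = reference[start:start + chunk]
        # (n_q, len(part)) matrix of squared Euclidean distances.
        d = ((queries[:, None, :] - part[None, :, :]) ** 2).sum(-1)
        # Merge this chunk's candidates with the running best k.
        merged = np.concatenate([best_d, d], axis=1)
        best_d = np.sort(merged, axis=1)[:, :k]
    return np.sqrt(best_d)

rng = np.random.default_rng(1)
q = rng.random((4, 2))
ref = rng.random((1000, 2))
print(knn_partitioned(q, ref, k=3, chunk=250))
```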