Search CORE

687 research outputs found

Parallel String Sample Sort

Author: J. Kärkkäinen
J. Wassenberg
K. Mehlhorn
P. Sanders
P.M. McIlroy
R. Sinha
R. Sinha
R. Sinha
T. Hagerup
W. Ng
Publication venue
Publication date: 01/01/2013
Field of study

arXiv.org e-Print Archive

CiteSeerX

Crossref

KITopen

Engineering Parallel String Sorting

Author: Bingmann Timo
Eberle Andreas
Sanders Peter
Publication venue
Publication date: 09/03/2014
Field of study

We discuss how string sorting algorithms can be parallelized on modern multi-core shared memory machines. As a synthesis of the best sequential string sorting algorithms and successful parallel sorting algorithms for atomic objects, we first propose string sample sort. The algorithm makes effective use of the memory hierarchy, uses additional word level parallelism, and largely avoids branch mispredictions. Then we focus on NUMA architectures, and develop parallel multiway LCP-merge and -mergesort to reduce the number of random memory accesses to remote nodes. Additionally, we parallelize variants of multikey quicksort and radix sort that are also useful in certain situations. Comprehensive experiments on five current multi-core platforms are then reported and discussed. The experiments show that our implementations scale very well on real-world inputs and modern machines.Comment: 46 pages, extension of "Parallel String Sample Sort" arXiv:1305.115

arXiv.org e-Print Archive

CiteSeerX

Crossref

KITopen

The Case for Learned Index Structures

Author: Abadi M.
Armbrust M.
Böhm M.
Chang F.
Goodfellow I.
Grossi R.
Lehman T. J.
Litwin W.
Magdon-Ismail M.
Miller D. J.
Moerkotte G.
Sutskever I.
You S.
Publication venue
Publication date: 30/04/2018
Field of study

Indexes are models: a B-Tree-Index can be seen as a model to map a key to the position of a record within a sorted array, a Hash-Index as a model to map a key to a position of a record within an unsorted array, and a BitMap-Index as a model to indicate if a data record exists or not. In this exploratory research paper, we start from this premise and posit that all existing index structures can be replaced with other types of models, including deep-learning models, which we term learned indexes. The key idea is that a model can learn the sort order or structure of lookup keys and use this signal to effectively predict the position or existence of records. We theoretically analyze under which conditions learned indexes outperform traditional index structures and describe the main challenges in designing learned index structures. Our initial results show, that by using neural nets we are able to outperform cache-optimized B-Trees by up to 70% in speed while saving an order-of-magnitude in memory over several real-world data sets. More importantly though, we believe that the idea of replacing core components of a data management system through learned models has far reaching implications for future systems designs and that this work just provides a glimpse of what might be possible

arXiv.org e-Print Archive

Crossref

PatTrieSort - External String Sorting based on Patricia Tries

Author: Christopher Blochwitz
Dennis Heinrich
Stefan Werner
Sven Groppe
Thilo Pionteck
Publication venue: RonPub
Publication date: 01/01/2015
Field of study

External merge sort belongs to the most efficient and widely used algorithms to sort big data: As much data as fits inside is sorted in main memory and afterwards swapped to external storage as so called initial run. After sorting all the data in this way block-wise, the initial runs are merged in a merging phase in order to retrieve the final sorted run containing the completely sorted original data. Patricia tries are one of the most space-efficient ways to store strings especially those with common prefixes. Hence, we propose to use patricia tries for initial run generation in an external merge sort variant, such that initial runs can become large compared to traditional external merge sort using the same main memory size. Furthermore, we store the initial runs as patricia tries instead of lists of sorted strings. As we will show in this paper, patricia tries can be efficiently merged having a superior performance in comparison to merging runs of sorted strings. We complete our discussion with a complexity analysis as well as a comprehensive performance evaluation, where our new approach outperforms traditional external merge sort by a factor of 4 for sorting over 4 billion strings of real world data

RonPub -- Research Online Publishing

Constructing Large-Scale Semantic Web Indices for the Six RDF Collation Orders

Author: Christopher Blochwitz
Dennis Heinrich
Sven Groppe
Thilo Pionteck
Publication venue: RonPub
Publication date: 01/01/2016
Field of study

The Semantic Web community collects masses of valuable and publicly available RDF data in order to drive the success story of the Semantic Web. Efficient processing of these datasets requires their indexing. Semantic Web indices make use of the simple data model of RDF: The basic concept of RDF is the triple, which hence has only 6 different collation orders. On the one hand having 6 collation orders indexed fast merge joins (consuming the sorted input of the indices) can be applied as much as possible during query processing. On the other hand constructing the indices for 6 different collation orders is very time-consuming for large-scale datasets. Hence the focus of this paper is the efficient Semantic Web index construction for large-scale datasets on today's multi-core computers. We complete our discussion with a comprehensive performance evaluation, where our approach efficiently constructs the indices of over 1 billion triples of real world data

RonPub -- Research Online Publishing

Hyperion: Building the largest in-memory search tree

Author: Andre Brinkmann (7168145)
Lars Nagel (4482649)
Lingfang Zeng (7168142)
Markus Masker (7168133)
Tim Suss (7168136)
Publication venue
Publication date: 01/01/2019
Field of study

Indexes are essential in data management systems to increase the speed of data retrievals. Widespread data structures to provide fast and memory-efficient indexes are prefix tries. Implementations like Judy, ART, or HOT optimize their internal alignments for cache and vector unit efficiency. While these measures usually improve the performance substantially, they can have a negative impact on memory efficiency. In this paper we present Hyperion, a trie-based main-memory key-value store achieving extreme space efficiency. In contrast to other data structures, Hyperion does not depend on CPU vector units, but scans the data structure linearly. Combined with a custom memory allocator, Hyperion accomplishes a remarkable data density while achieving a competitive point query and an exceptional range query performance. Hyperion can significantly reduce the index memory footprint, while being at least two times better concerning the performance to memory ratio compared to the best implemented alternative strategies for randomized string data sets

Loughborough University Institutional Repository

In search of the fastest sorting algorithm

Author: Attard Cassar Emmanuel
Connections
Publication venue: University of Malta. Junior College
Publication date: 01/01/2018
Field of study

This paper explores in a chronological way the concepts, structures, and algorithms that programmers and computer scientists have tried out in their attempt to produce and improve the process of sorting. The measure of ‘fastness’ in this paper is mainly given in terms of the Big O notation.peer-reviewe

OAR@UM