1,381 research outputs found
A Memory Bandwidth-Efficient Hybrid Radix Sort on GPUs
Sorting is at the core of many database operations, such as index creation,
sort-merge joins, and user-requested output sorting. As GPUs are emerging as a
promising platform to accelerate various operations, sorting on GPUs becomes a
viable endeavour. Over the past few years, several improvements have been
proposed for sorting on GPUs, leading to the first radix sort implementations
that achieve a sorting rate of over one billion 32-bit keys per second. Yet,
state-of-the-art approaches are heavily memory bandwidth-bound, as they require
substantially more memory transfers than their CPU-based counterparts.
Our work proposes a novel approach that almost halves the amount of memory
transfers and, therefore, considerably lifts the memory bandwidth limitation.
Being able to sort two gigabytes of eight-byte records in as little as 50
milliseconds, our approach achieves a 2.32-fold improvement over the
state-of-the-art GPU-based radix sort for uniform distributions, sustaining a
minimum speed-up of no less than a factor of 1.66 for skewed distributions.
To address inputs that either do not reside on the GPU or exceed the
available device memory, we build on our efficient GPU sorting approach with a
pipelined heterogeneous sorting algorithm that mitigates the overhead
associated with PCIe data transfers. Comparing the end-to-end sorting
performance to the state-of-the-art CPU-based radix sort running 16 threads,
our heterogeneous approach achieves a 2.06-fold and a 1.53-fold improvement for
sorting 64 GB key-value pairs with a skewed and a uniform distribution,
respectively.Comment: 16 pages, accepted at SIGMOD 201
Massiv-Parallele Algorithmen zum Laden von Daten auf Moderner Hardware
While systems face an ever-growing amount of data that needs to be ingested, queried and analysed, processors are seeing only moderate improvements in sequential processing performance. This thesis addresses the fundamental shift towards increasingly parallel processors and contributes multiple massively parallel algorithms to accelerate different stages of the ingestion pipeline, such as data parsing and sorting.Systeme sehen sich mit einer stetig anwachsenden Menge an Daten konfrontiert, die geladen und analysiert, sowie Anfragen darauf bearbeitet werden müssen. Gleichzeitig nimmt die sequentielle Verarbeitungsgeschwindigkeit von Prozessoren nur noch moderat zu. Diese Arbeit adressiert den Wandel hin zu zunehmend parallelen Prozessoren und leistet mit mehreren massiv-parallelen Algorithmen einen Beitrag um unterschiedliche Phasen der Datenverarbeitung wie zum Beispiel Parsing und Sortierung zu beschleunigen
Frequent itemset mining in big data with effective single scan algorithms
© 2013 IEEE. This paper considers frequent itemsets mining in transactional databases. It introduces a new accurate single scan approach for frequent itemset mining (SSFIM), a heuristic as an alternative approach (EA-SSFIM), as well as a parallel implementation on Hadoop clusters (MR-SSFIM). EA-SSFIM and MR-SSFIM target sparse and big databases, respectively. The proposed approach (in all its variants) requires only one scan to extract the candidate itemsets, and it has the advantage to generate a fixed number of candidate itemsets independently from the value of the minimum support. This accelerates the scan process compared with existing approaches while dealing with sparse and big databases. Numerical results show that SSFIM outperforms the state-of-the-art FIM approaches while dealing with medium and large databases. Moreover, EA-SSFIM provides similar performance as SSFIM while considerably reducing the runtime for large databases. The results also reveal the superiority of MR-SSFIM compared with the existing HPC-based solutions for FIM using sparse and big databases
SwiftSpatial: Spatial Joins on Modern Hardware
Spatial joins are among the most time-consuming queries in spatial data
management systems. In this paper, we propose SwiftSpatial, a specialized
accelerator architecture tailored for spatial joins. SwiftSpatial contains
multiple high-performance join units with innovative hybrid parallelism,
several efficient memory management units, and an integrated on-chip join
scheduler. We prototype SwiftSpatial on an FPGA and incorporate the R-tree
synchronous traversal algorithm as the control flow. Benchmarked against
various CPU and GPU-based spatial data processing systems, SwiftSpatial
demonstrates a latency reduction of up to 5.36x relative to the best-performing
baseline, while requiring 6.16x less power. The remarkable performance and
energy efficiency of SwiftSpatial lay a solid foundation for its future
integration into spatial data management systems, both in data centers and at
the edge
SS-ITS: secure scalable intelligent transportation systems
submittedVersio
- …