2,591 research outputs found
Recommended from our members
A Generalization of Band Joins and the Merge-Purge Problem
The problem of merging multiple databases of information about common entities is frequently encountered in large commercial and government organizations. The problem we study is often called the Merge/Purge problem and is difficult to solve both in scale and accuracy. Large repositories of data always have numerous duplicate information entries about the same entities that are difficult to cull together without an intelligent "equational theory" that identifies equivalent items by a complex, domain dependent matching process. We have developed a system for accomplishing this task for lists of names of potential customers in a direct marketing-type application. Our results for statistically generated data are shown to be accurate and effective when processing the data multiple times using different keys for sorting. The system provides a rule programming module that is easy to program and quite good at finding duplicates especially in an environment with massive amounts of data
A Memory Bandwidth-Efficient Hybrid Radix Sort on GPUs
Sorting is at the core of many database operations, such as index creation,
sort-merge joins, and user-requested output sorting. As GPUs are emerging as a
promising platform to accelerate various operations, sorting on GPUs becomes a
viable endeavour. Over the past few years, several improvements have been
proposed for sorting on GPUs, leading to the first radix sort implementations
that achieve a sorting rate of over one billion 32-bit keys per second. Yet,
state-of-the-art approaches are heavily memory bandwidth-bound, as they require
substantially more memory transfers than their CPU-based counterparts.
Our work proposes a novel approach that almost halves the amount of memory
transfers and, therefore, considerably lifts the memory bandwidth limitation.
Being able to sort two gigabytes of eight-byte records in as little as 50
milliseconds, our approach achieves a 2.32-fold improvement over the
state-of-the-art GPU-based radix sort for uniform distributions, sustaining a
minimum speed-up of no less than a factor of 1.66 for skewed distributions.
To address inputs that either do not reside on the GPU or exceed the
available device memory, we build on our efficient GPU sorting approach with a
pipelined heterogeneous sorting algorithm that mitigates the overhead
associated with PCIe data transfers. Comparing the end-to-end sorting
performance to the state-of-the-art CPU-based radix sort running 16 threads,
our heterogeneous approach achieves a 2.06-fold and a 1.53-fold improvement for
sorting 64 GB key-value pairs with a skewed and a uniform distribution,
respectively.Comment: 16 pages, accepted at SIGMOD 201
Efficient permutation-based range-join algorithms on N-dimensionalmeshes using data-shifting
©2001 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.In this paper, we present two efficient parallel algorithms for computing a non-equijoin, range-join, of two relations an N-dimensional mesh-connected computers. The proposed algorithms uses the data-shifting approach to effectively permute every sorted subset of relation S to each processor in turn recursively in dimensions from low to high, where it is joined with the local subset of relation RShao Dong Chen, Hong Shen, Rodeny Topo
Forecasting the cost of processing multi-join queries via hashing for main-memory databases (Extended version)
Database management systems (DBMSs) carefully optimize complex multi-join
queries to avoid expensive disk I/O. As servers today feature tens or hundreds
of gigabytes of RAM, a significant fraction of many analytic databases becomes
memory-resident. Even after careful tuning for an in-memory environment, a
linear disk I/O model such as the one implemented in PostgreSQL may make query
response time predictions that are up to 2X slower than the optimal multi-join
query plan over memory-resident data. This paper introduces a memory I/O cost
model to identify good evaluation strategies for complex query plans with
multiple hash-based equi-joins over memory-resident data. The proposed cost
model is carefully validated for accuracy using three different systems,
including an Amazon EC2 instance, to control for hardware-specific differences.
Prior work in parallel query evaluation has advocated right-deep and bushy
trees for multi-join queries due to their greater parallelization and
pipelining potential. A surprising finding is that the conventional wisdom from
shared-nothing disk-based systems does not directly apply to the modern
shared-everything memory hierarchy. As corroborated by our model, the
performance gap between the optimal left-deep and right-deep query plan can
grow to about 10X as the number of joins in the query increases.Comment: 15 pages, 8 figures, extended version of the paper to appear in
SoCC'1
The HELLS-Join: A Heterogeneous Stream join for ExtremeLy Large windows
Upcoming processors are combining different computing units in a tightly-coupled approach using a unified shared memory hierarchy. This tightly-coupled combination leads to novel properties with regard to cooperation and interaction. This paper demonstrates the advantages of those processors for a stream-join operator as an important data-intensive example. In detail, we propose our HELLS-Join approach employing all heterogeneous devices by outsourcing parts of the algorithm on the appropriate device. Our HELLS-Join performs better than CPU stream joins, allowing wider time windows, higher stream frequencies, and more streams to be joined as before
Faster Radix Sort via Virtual Memory and Write-Combining
Sorting algorithms are the deciding factor for the performance of common
operations such as removal of duplicates or database sort-merge joins. This
work focuses on 32-bit integer keys, optionally paired with a 32-bit value. We
present a fast radix sorting algorithm that builds upon a
microarchitecture-aware variant of counting sort. Taking advantage of virtual
memory and making use of write-combining yields a per-pass throughput
corresponding to at least 88 % of the system's peak memory bandwidth. Our
implementation outperforms Intel's recently published radix sort by a factor of
1.5. It also compares favorably to the reported performance of an algorithm for
Fermi GPUs when data-transfer overhead is included. These results indicate that
scalar, bandwidth-sensitive sorting algorithms remain competitive on current
architectures. Various other memory-intensive applications can benefit from the
techniques described herein
- …