
    Locality-Adaptive Parallel Hash Joins Using Hardware Transactional Memory

    Previous work [1] has claimed that the best-performing implementation of in-memory hash joins is based on (radix-)partitioning of the build-side input. Indeed, despite the overhead of partitioning, the benefits of increased cache locality and synchronization-free parallelism in the build phase outweigh the costs when the input data is randomly ordered. However, many datasets already exhibit significant spatial locality (i.e., non-randomness) due to the way data items enter the database: through periodic ETL or trickle-loading in the form of transactions. In such cases, the first benefit of partitioning, increased locality, is largely irrelevant. In this paper, we demonstrate how hardware transactional memory (HTM) can render the other benefit, freedom from synchronization, irrelevant as well. Specifically, using careful analysis and engineering, we develop an adaptive hash join implementation that outperforms parallel radix-partitioned hash joins as well as sort-merge joins on data with high spatial locality. In addition, we show how, through lightweight (less than 1% overhead) runtime monitoring of the transaction abort rate, our implementation can detect inputs with low spatial locality and dynamically fall back to radix-partitioning of the build-side input. The result is a hash join implementation that is more than 3 times faster than the state of the art on high-locality data and never more than 1% slower.
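
    To make the idea concrete, the following C++ sketch shows a build-phase insert protected by a hardware transaction (Intel RTM intrinsics) together with the kind of lightweight abort-rate bookkeeping a fallback decision could rely on. It is a minimal illustration under stated assumptions, not the authors' code; the table layout, retry count, and helper names are invented for this sketch.

        // Minimal sketch (not the paper's implementation): build-side inserts into a
        // shared chaining hash table inside hardware transactions (Intel RTM), while
        // counting aborts so the caller can fall back to radix-partitioning when the
        // abort rate gets too high. Compile with -mrtm on an RTM-capable CPU.
        #include <immintrin.h>   // _xbegin / _xend / _XBEGIN_STARTED
        #include <atomic>
        #include <vector>
        #include <cstdint>

        struct Entry { uint64_t key; uint64_t payload; Entry* next; };

        struct HashTable {
            std::vector<Entry*> buckets;           // size must be a power of two
            explicit HashTable(size_t n) : buckets(n, nullptr) {}
            size_t slot(uint64_t key) const { return key & (buckets.size() - 1); }
        };

        std::atomic<uint64_t> g_aborts{0}, g_attempts{0};

        // Returns false if the transactional path kept aborting; the caller then
        // takes a conventional latch or switches to the partitioned build.
        bool htm_insert(HashTable& ht, Entry* e, int max_retries = 3) {
            size_t s = ht.slot(e->key);
            for (int r = 0; r < max_retries; ++r) {
                g_attempts.fetch_add(1, std::memory_order_relaxed);
                unsigned status = _xbegin();
                if (status == _XBEGIN_STARTED) {
                    e->next = ht.buckets[s];       // link into bucket chain
                    ht.buckets[s] = e;
                    _xend();                       // commit atomically
                    return true;
                }
                g_aborts.fetch_add(1, std::memory_order_relaxed);
            }
            return false;
        }

        // Sampled periodically: a high abort rate suggests low spatial locality
        // (many conflicting cache lines) and triggers the radix fallback.
        double abort_rate() {
            uint64_t a = g_aborts.load(), t = g_attempts.load();
            return t ? double(a) / double(t) : 0.0;
        }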

    Non-invasive progressive optimization for in-memory databases

    Progressive optimization introduces robustness for database workloads against wrong estimates, skewed data, correlated attributes, or outdated statistics. Previous work focuses on cardinality estimates and relies on expensive counting methods as well as complex learning algorithms. In this paper, we utilize performance counters to drive progressive optimization during query execution. The main advantages are that performance counters introduce virtually no costs on modern CPUs and that their usage enables non-invasive monitoring. We present fine-grained cost models to detect differences between estimates and actual costs, which enables us to kick-start reoptimization. Based on our cost models, we implement an optimization approach that efficiently estimates the individual selectivities of a multi-selection query. Furthermore, we are able to learn properties like sortedness, skew, or correlation at run-time. In our evaluation we show that the overhead of our approach is negligible, while the performance improvements are substantial. Using progressive optimization, we improve run-time by up to a factor of three compared to average run-times and up to a factor of 4.5 compared to worst-case run-times. As a result, we avoid costly operator execution orders and thus make query execution highly robust.
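
    As a rough illustration of how cheaply such monitoring can be obtained, the sketch below wraps an operator with a Linux hardware performance counter (perf_event_open) and compares the measured cost against the optimizer's estimate. The counter choice, function names, and deviation threshold are illustrative assumptions, not the paper's cost models.

        // Linux-specific sketch: measure CPU cycles around (part of) an operator and
        // decide whether the deviation from the estimate warrants re-optimization.
        #include <linux/perf_event.h>
        #include <sys/syscall.h>
        #include <sys/ioctl.h>
        #include <unistd.h>
        #include <cstdint>

        static int perf_open(uint32_t type, uint64_t config) {
            perf_event_attr attr{};                 // zero-initialized
            attr.size = sizeof(attr);
            attr.type = type;
            attr.config = config;
            attr.disabled = 1;
            attr.exclude_kernel = 1;
            return (int)syscall(SYS_perf_event_open, &attr, 0 /*this thread*/, -1, -1, 0);
        }

        // Returns the number of cycles observed while running the operator fragment.
        template <typename Operator>
        uint64_t measure(Operator&& op) {
            int fd = perf_open(PERF_TYPE_HARDWARE, PERF_COUNT_HW_CPU_CYCLES);
            if (fd < 0) { op(); return 0; }         // counter unavailable: just run
            ioctl(fd, PERF_EVENT_IOC_RESET, 0);
            ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
            op();                                   // execute the operator fragment
            ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
            uint64_t cycles = 0;
            read(fd, &cycles, sizeof(cycles));
            close(fd);
            return cycles;
        }

        // Progressive-optimization hook: if actual cost deviates strongly from the
        // estimate, re-optimize the not-yet-executed part of the plan.
        bool should_reoptimize(uint64_t actual_cycles, uint64_t estimated_cycles) {
            return actual_cycles > 2 * estimated_cycles;   // illustrative threshold
        }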

    Voodoo - a vector algebra for portable database performance on modern hardware

    In-memory databases require careful tuning and many engineering tricks to achieve good performance. Such database performance engineering is hard: a plethora of data- and hardware-dependent optimization techniques forms a design space that is difficult to navigate for a skilled engineer, and even more so for a query compiler. To facilitate performance-oriented design exploration and query plan compilation, we present Voodoo, a declarative intermediate algebra that abstracts the detailed architectural properties of the hardware, such as multi- or many-core architectures, caches, and SIMD registers, without losing the ability to generate highly tuned code. Because it consists of a collection of declarative, vector-oriented operations, Voodoo is easier to reason about and tune than low-level C and related hardware-focused extensions (intrinsics, OpenCL, CUDA, etc.). This enables our Voodoo compiler to produce (OpenCL) code that rivals and even outperforms the fastest state-of-the-art in-memory databases for both GPUs and CPUs. In addition, Voodoo makes it possible to express techniques as diverse as cache-conscious processing, predication, and vectorization (again on both GPUs and CPUs) with just a few lines of code. Central to our approach is a novel idea we term control vectors, which allows a code-generating frontend to expose parallelism to the Voodoo compiler in an abstract manner, enabling portable performance across hardware platforms. We used Voodoo to build an alternative backend for MonetDB, a popular open-source in-memory database. Our backend allows MonetDB to perform at the same level as highly tuned in-memory databases, including HyPer and Ocelot. We also demonstrate Voodoo's usefulness when investigating hardware-conscious tuning techniques, assessing their performance on different queries, devices, and data.
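
    The following sketch is purely illustrative of the programming model described above: whole-vector, declarative operations that a backend is free to lower to scalar loops, SIMD, or multi-threaded partitions. The operator names are hypothetical and are not Voodoo's actual algebra (which, among other things, adds the control-vector mechanism for exposing parallelism).

        // Illustrative only: a query expressed as a composition of whole-vector
        // operations; how each operation executes is a backend decision.
        #include <vector>
        #include <numeric>
        #include <functional>
        #include <cstdint>

        using Vec = std::vector<int64_t>;

        Vec map(const Vec& in, std::function<int64_t(int64_t)> f) {
            Vec out(in.size());
            for (size_t i = 0; i < in.size(); ++i) out[i] = f(in[i]);  // lowered freely
            return out;
        }

        Vec filter(const Vec& in, std::function<bool(int64_t)> p) {
            Vec out;
            for (int64_t v : in) if (p(v)) out.push_back(v);
            return out;
        }

        int64_t fold_sum(const Vec& in) {
            return std::accumulate(in.begin(), in.end(), int64_t{0});
        }

        // SELECT SUM(x * 2) WHERE x > 10, written as vector algebra:
        int64_t query(const Vec& x) {
            return fold_sum(map(filter(x, [](int64_t v) { return v > 10; }),
                                [](int64_t v) { return v * 2; }));
        }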

    Accelerating Foreign-Key Joins using Asymmetric Memory Channels

    Indexed Foreign-Key Joins expose a very asymmetric access pattern: the Foreign-Key Index is scanned sequentially, while the Primary-Key table is the target of many quasi-random lookups, which are the dominant cost factor. To reduce the cost of these random lookups, the fact table can be (re-)partitioned at runtime to increase access locality on the dimension table and thus confine the random memory accesses to the CPU's cache. However, this is very hard to optimize, and the performance impact on recent architectures is limited because the partitioning costs consume most of the achievable join improvement. GPGPUs, on the other hand, have an architecture that is well suited for this operation: a relatively slow connection to the large system memory and a very fast connection to the smaller internal device memory. We show how to accelerate Foreign-Key Joins by executing the random table lookups in the GPU's VRAM while sequentially streaming the Foreign-Key Index through the PCI-E bus. We also experimentally study the memory access costs on GPU and CPU to provide estimates of the benefit of this technique.
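
    The CPU-side sketch below illustrates only the access pattern being exploited: a sequential, chunk-wise stream over the foreign-key column, with a quasi-random lookup per key on the primary-key side. In the paper's setting the lookup table sits in GPU VRAM and the chunks travel over the PCI-E bus; here everything lives in host memory, and the dense-key assumption, chunk size, and names are ours.

        // CPU-side illustration of the asymmetric access pattern of an indexed
        // foreign-key join: sequential stream of the FK column, random dimension
        // lookups. The dimension table is assumed dense on pk in [0, dim.size()).
        #include <vector>
        #include <algorithm>
        #include <cstdint>

        struct DimRow { uint64_t pk; uint64_t attr; };

        void fk_join(const std::vector<uint64_t>& fk_column,
                     const std::vector<DimRow>& dim,
                     std::vector<uint64_t>& out,
                     size_t chunk = 1 << 20) {
            out.reserve(fk_column.size());
            for (size_t base = 0; base < fk_column.size(); base += chunk) {
                size_t end = std::min(base + chunk, fk_column.size());
                // sequential stream over one chunk of the foreign-key column ...
                for (size_t i = base; i < end; ++i) {
                    // ... one quasi-random lookup per key on the dimension side
                    out.push_back(dim[fk_column[i]].attr);
                }
            }
        }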

    Hadamard States and Adiabatic Vacua

    Reversing a slight detrimental effect of the mailer related to TeXability. Comment: 10 pages, LaTeX (RevTeX-preprint style).

    X-Device Query Processing by Bitwise Distribution

    The diversity of hardware components within a single system calls for strategies for efficient cross-device data processing. For example, existing approaches to CPU/GPU co-processing distribute individual relational operators to the “most appropriate” device. While pleasantly simple, this strategy has a number of problems: it may leave the “inappropriate” devices idle while overloading the “appropriate” device and putting high pressure on the PCI bus. To address these issues we distribute data among the devices by partially decomposing relations at the granularity of individual bits. Each of the resulting bit-partitions is stored and processed on one of the available devices. Using this strategy, we implemented a processor for spatial range queries that makes efficient use of all available devices. The performance gains achieved indicate that bitwise distribution makes a good cross-device processing strategy.
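
    A minimal sketch of the bitwise decomposition idea, assuming a 32-bit column split into a 16-bit high-order and a 16-bit low-order partition (the split and all names are assumptions, not the paper's system). The point is that a range predicate can often be decided from the high bits alone, so each bit-partition can be kept and filtered on its own device, and only boundary values need both halves.

        // Decompose a 32-bit column into two bit-partitions and answer a range
        // query mostly from the high-order bits.
        #include <vector>
        #include <cstdint>

        struct BitPartitions {
            std::vector<uint16_t> hi;   // bits 31..16, e.g. kept on device A
            std::vector<uint16_t> lo;   // bits 15..0,  e.g. kept on device B
        };

        BitPartitions decompose(const std::vector<uint32_t>& col) {
            BitPartitions p;
            p.hi.reserve(col.size());
            p.lo.reserve(col.size());
            for (uint32_t v : col) {
                p.hi.push_back(uint16_t(v >> 16));
                p.lo.push_back(uint16_t(v & 0xFFFF));
            }
            return p;
        }

        // Range query lo_bound <= v <= hi_bound over the decomposed column.
        std::vector<size_t> range_query(const BitPartitions& p,
                                        uint32_t lo_bound, uint32_t hi_bound) {
            std::vector<size_t> hits;
            uint16_t lo_hi = uint16_t(lo_bound >> 16), hi_hi = uint16_t(hi_bound >> 16);
            for (size_t i = 0; i < p.hi.size(); ++i) {
                uint16_t h = p.hi[i];
                if (h < lo_hi || h > hi_hi) continue;                  // pruned by high bits
                if (h > lo_hi && h < hi_hi) { hits.push_back(i); continue; }  // fully inside
                uint32_t v = (uint32_t(h) << 16) | p.lo[i];            // boundary: need low bits
                if (v >= lo_bound && v <= hi_bound) hits.push_back(i);
            }
            return hits;
        }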

    A scientific note on the natural merger of two honeybee colonies (Apis mellifera capensis)

    Natural mergers of honeybee colonies are commonplace in tropical Africa (Hepburn and Radloff, 1998), but their consequences for organizational structure are unknown. Here we determine the spatial distribution and division of labor of workers (Apis mellifera capensis Esch.) following a merger of two colonies. Two unrelated colonies (each ~3000 bees) were placed in three-frame observation hives. When workers emerged from the sealed brood of each colony, they were individually labeled and reintroduced into their respective mother hives. They are referred to as cohorts A and B, each comprising 300 workers of the same age. The behaviors and positions of all labeled workers and queens were recorded twice daily for 24 days (Kolmes, 1989; Pirk et al., 2000). On day 14 colony B was dequeened; it left its nest and merged with colony A on day 15.

    CO2 and CH4 exchanges between moist moss tundra and atmosphere on Kapp Linne, Svalbard

    We measured CO2 and CH4 fluxes using chambers and eddy covariance (only CO2) from a moist moss tundra in Svalbard. The average net ecosystem exchange (NEE) during the summer (9 June to 31 August) was negative (sink), with -0.139 ± 0.032 µmol m⁻² s⁻¹, corresponding to -11.8 g C m⁻² for the whole summer. The cumulated NEE over the whole growing season (day 160 to 284) was -2.5 g C m⁻². The CH4 flux during the summer period showed a large spatial and temporal variability. The mean value of all 214 samples was 0.000511 ± 0.000315 µmol m⁻² s⁻¹, which corresponds to a growing-season estimate of 0.04 to 0.16 g CH4 m⁻². Thus, we find that this moss tundra ecosystem is closely in balance with the atmosphere during the growing season with regard to exchanges of CO2 and CH4. The sink of CO2 and the source of CH4 are small in comparison with other tundra ecosystems in the high Arctic. Air temperature, soil moisture and the greenness index contributed significantly to explaining the variation in ecosystem respiration (R_eco), while active layer depth, soil moisture and the greenness index were the variables that best explained CH4 emissions. An estimate of the temperature sensitivity of R_eco and gross primary productivity (GPP) showed that the sensitivity is slightly higher for GPP than for R_eco in the interval 0 to 4.5 °C; thereafter, the difference is small up to about 6 °C and then begins to rise rapidly for R_eco. The consequence of this, for a small increase in air temperature of 1 °C (all other variables assumed unchanged), was that respiration increased more than photosynthesis, turning the small sink into a small source (4.5 g C m⁻²) during the growing season. Thus, we cannot rule out that the reason why the moss tundra is close to balance today is an effect of the warming that has already taken place in Svalbard.
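
    As a rough plausibility check of the reported summer sum (assuming an 84-day summer from 9 June to 31 August and 12 g of carbon per mole of CO2; these conversion assumptions are ours), the mean flux converts to the seasonal total as

        \[
          0.139 \times 10^{-6}\,\mathrm{mol\,CO_2\,m^{-2}\,s^{-1}}
          \times 12\,\mathrm{g\,C\,mol^{-1}}
          \times 84 \times 86\,400\,\mathrm{s}
          \approx 12\,\mathrm{g\,C\,m^{-2}},
        \]

    which is consistent with the reported -11.8 g C m⁻²; the small difference comes from cumulating the measured time series rather than a constant mean.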

    Scalable Generation of Synthetic GPS Traces with Real-life Data Characteristics

    Database benchmarking is most valuable if real-life data and workloads are available. However, real-life data (and workloads) are often not publicly available due to IPR constraints or privacy concerns. And even if available, they are often limited regarding scalability and variability of data characteristics. On the oth…

    Instant-on scientific data warehouses: Lazy ETL for data-intensive research

    In the dawning era of data-intensive research, scientific discovery deploys data analysis techniques similar to those that drive business intelligence. As in classical Extract, Transform and Load (ETL) processes, data is loaded entirely from external data sources (repositories) into a scientific data warehouse before it can be analyzed. This process is both time- and resource-intensive and may not be entirely necessary if only a subset of the data is of interest to a particular user. To overcome this problem, we propose a novel technique to lower the cost of data loading: Lazy ETL. Data is extracted and loaded transparently and on-the-fly, only for the required data items. Extensive experiments demonstrate a significant reduction of the time from source data availability to query answer compared to state-of-the-art solutions. In addition to reducing the cost of bootstrapping a scientific data warehouse, our approach also reduces the cost of loading new incoming data.
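
    A minimal sketch of the lazy-loading idea (the catalog structure and names are assumptions, not the paper's implementation): the warehouse tracks which tables have already been materialized, and a query extracts and loads exactly the tables it references before executing.

        // Lazy ETL sketch: extract/transform/load a table only when a query
        // actually touches it; nothing else is imported up front.
        #include <string>
        #include <unordered_map>
        #include <vector>
        #include <functional>

        struct LazyCatalog {
            // table name -> extractor that pulls it from the external repository
            std::unordered_map<std::string, std::function<void()>> extractors;
            std::unordered_map<std::string, bool> loaded;

            void ensure_loaded(const std::string& table) {
                if (loaded[table]) return;        // already materialized
                extractors.at(table)();           // extract + transform + load on demand
                loaded[table] = true;
            }
        };

        // Query entry point: load only the tables this query references.
        template <typename ExecFn>
        void run_query(LazyCatalog& cat,
                       const std::vector<std::string>& referenced_tables,
                       ExecFn&& execute) {
            for (const auto& t : referenced_tables) cat.ensure_loaded(t);
            execute();                            // run the query on the loaded subset
        }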