Search CORE

1,259 research outputs found

Data Mining the SDSS SkyServer Database

Author: Gray Jim
Kunszt Peter Z.
Slutz Don
Stoughton Christopher
Szalay Alex S.
Thakar Ani R.
vandenBerg Jan
Publication venue
Publication date: 12/02/2002
Field of study

An earlier paper (Szalay et. al. "Designing and Mining MultiTerabyte Astronomy Archives: The Sloan Digital Sky Survey," ACM SIGMOD 2000) described the Sloan Digital Sky Survey's (SDSS) data management needs by defining twenty database queries and twelve data visualization tasks that a good data management system should support. We built a database and interfaces to support both the query load and also a website for ad-hoc access. This paper reports on the database design, describes the data loading pipeline, and reports on the query implementation and performance. The queries typically translated to a single SQL statement. Most queries run in less than 20 seconds, allowing scientists to interactively explore the database. This paper is an in-depth tour of those queries. Readers should first have studied the companion overview paper Szalay et. al. "The SDSS SkyServer, Public Access to the Sloan Digital Sky Server Data" ACM SIGMOND 2002.Comment: 40 pages, Original source is at http://research.microsoft.com/~gray/Papers/MSR_TR_O2_01_20_queries.do

arXiv.org e-Print Archive

CERN Document Server

Forward Scan based Plane Sweep Algorithm for Parallel Interval Joins

Author: Bouros P
Mamoulis N
Publication venue: 'United States Sports Academy'
Publication date: 01/01/2017
Field of study

The interval join is a basic operation that finds application in temporal, spatial, and uncertain databases. Although a number of centralized and distributed algorithms have been proposed for the efficient evaluation of interval joins, classic plane sweep approaches have not been considered at their full potential. A recent piece of related work proposes an optimized approach based on plane sweep (PS) for modern hardware, showing that it greatly outperforms previous work. However, this approach depends on the development of a complex data structure and its parallelization has not been adequately studied. In this paper, we explore the applicability of a largely ignored forward scan (FS) based plane sweep algorithm, which is extremely simple to implement. We propose two optimizations of FS that greatly reduce its cost, making it competitive to the state-of-the-art single-threaded PS algorithm while achieving a lower memory footprint. In addition, we show the drawbacks of a previously proposed hash-based partitioning approach for parallel join processing and suggest a domain-based partitioning approach that does not produce duplicate results. Within our approach we propose a novel breakdown of the partition join jobs into a small number of independent mini-join jobs with varying cost and manage to avoid redundant comparisons. Finally, we show how these mini-joins can be scheduled in multiple CPU cores and propose an adaptive domain partitioning, aiming at load balancing. We include an experimental study that demonstrates the efficiency of our optimized FS and the scalability of our parallelization framework.published_or_final_versio

HKU Scholars Hub

Optimisation of partitioned temporal joins

Author: Zurek Thomas
Publication venue: The University of Edinburgh
Publication date: 01/01/1997
Field of study

Edinburgh Research Archive

Efficient permutation-based range-join algorithms on N-dimensionalmeshes using data-shifting

Author: Chen S.
Shen H.
Topor R.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2001
Field of study

©2001 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.In this paper, we present two efficient parallel algorithms for computing a non-equijoin, range-join, of two relations an N-dimensional mesh-connected computers. The proposed algorithms uses the data-shifting approach to effectively permute every sorted subset of relation S to each processor in turn recursively in dimensions from low to high, where it is joined with the local subset of relation RShao Dong Chen, Hong Shen, Rodeny Topo

Adelaide Research & Scholarship

Recommended from our members

Compiling Communication-Minimizing Query Plans

Author: Love Eric J
Publication venue: eScholarship, University of California
Publication date: 01/01/2019
Field of study

Because of the low arithmetic intensity of relational database operators, the performance of in-memory column stores ought to be bound by main-memory bandwidth, and in practice, highly-optimized operator implementations already achieve close to their peak theoretical performance. By itself, this would imply that hardware acceleration for analytics would be of limited utility, but I show that the emergence of full-query compilation presents new opportunities to reduce memory traffic and trade computation for communication, meaning that database-oriented processors may yet be worth designing.Moreover, the communication costs of queries on a given processor and memory hierarchy are determined by factors below the level of abstraction expressed in traditional query plans, such as how operators are (or are not) fused together, how execution is parallelized and cache-blocked, and how intermediate results are arranged in memory. I present a Scala- embedded programming language called Ressort that exposes these machine-level aspects of query compilation, and which emits parallel C++/OpenMP code as its target to express a greater range of algorithmic variants for each query than would be easy to study by hand

eScholarship - University of California