Search CORE

208 research outputs found

Recommended from our members

Complex Query Operators on Modern Parallel Architectures

Author: Zois Vasileios
Publication venue: eScholarship, University of California
Publication date: 01/01/2019
Field of study

Identifying interesting objects from a large data collection is a fundamental problem for multi-criteria decision making applications.In Relational Database Management Systems (RDBMS), the most popular complex query operators used to solve this type of problem are the Top-K selection operator and the Skyline operator.Top-K selection is tasked with retrieving the k-highest ranking tuples from a given relation, as determined by a user-defined aggregation function.Skyline selection retrieves those tuples with attributes offering (pareto) optimal trade-offs in a given relation.Efficient Top-K query processing entails minimizing tuple evaluations by utilizing elaborate processing schemes combined with sophisticated data structures that enable early termination.Skyline query evaluation involves supporting processing strategies which are geared towards early termination and incomparable tuple pruning.The rapid increase in memory capacity and decreasing costs have been the main drivers behind the development of main-memory database systems.Although the act of migrating query processing in-memory has created many opportunities to improve the associated query latency, attaining such improvements has been very challenging due to the growing gap between processor and main memory speeds.Addressing this limitation has been made easier by the rapid proliferation of multi-core and many-core architectures.However, their utilization in real systems has been hindered by the lack of suitable parallel algorithms that focus on algorithmic efficiency.In this thesis, we study in depth the Top-K and Skyline selection operators, in the context of emerging parallel architectures.Our ultimate goal is to provide practical guidelines for developing work-efficient algorithms suitable for parallel main memory processing.We concentrate on multi-core (CPU), many-core (GPU), and processing-in-memory architectures (PIM), developing solutions optimized for high throughout and low latency.The first part of this thesis focuses on Top-K selection, presenting the specific details of early termination algorithms that we developed specifically for parallel architectures and various types of accelerators (i.e. GPU, PIM).The second part of this thesis, concentrates on Skyline selection and the development of a massively parallel load balanced algorithm for PIM architectures.Our work consolidates performance results across different parallel architectures using synthetic and real data on variable query parameters and distributions for both of the aforementioned problems.The experimental results demonstrate several orders of magnitude better throughput and query latency, thus validating the effectiveness of our proposed solutions for the Top-K and Skyline selection operators

eScholarship - University of California

Lanczos eigensolution method for high-performance computers

Author: Bostic Susan W.
Publication venue
Publication date
Field of study

The theory, computational analysis, and applications are presented of a Lanczos algorithm on high performance computers. The computationally intensive steps of the algorithm are identified as: the matrix factorization, the forward/backward equation solution, and the matrix vector multiples. These computational steps are optimized to exploit the vector and parallel capabilities of high performance computers. The savings in computational time from applying optimization techniques such as: variable band and sparse data storage and access, loop unrolling, use of local memory, and compiler directives are presented. Two large scale structural analysis applications are described: the buckling of a composite blade stiffened panel with a cutout, and the vibration analysis of a high speed civil transport. The sequential computational time for the panel problem executed on a CONVEX computer of 181.6 seconds was decreased to 14.1 seconds with the optimized vector algorithm. The best computational time of 23 seconds for the transport problem with 17,000 degs of freedom was on the the Cray-YMP using an average of 3.63 processors

NASA Technical Reports Server

Computational methods and software systems for dynamics and control of large space structures

Author: Farhat C.
Felippa C. A.
Park K. C.
Pramono E.
Publication venue
Publication date
Field of study

Two key areas of crucial importance to the computer-based simulation of large space structures are discussed. The first area involves multibody dynamics (MBD) of flexible space structures, with applications directed to deployment, construction, and maneuvering. The second area deals with advanced software systems, with emphasis on parallel processing. The latest research thrust in the second area involves massively parallel computers

NASA Technical Reports Server

Viscous Shock Capturing in a Time-Explicit Discontinuous Galerkin Method

Author: A. Klöckner
Arnold
Arora
Barter
Bogacki
Burbeau
Burman
Cockburn
Cockburn
Cockburn
Cockburn
Cockburn
Dolejsi
Dormand
Dubiner
Efraimsson
Ern
Feistauer
Flaherty
Gottlieb
Gresho
Guermond
Hartmann
Houston
J. S. Hesthaven
Jaffre
John
Kirby
Klöckner
Kreiss
Krivodonova
Lapidus
Lax
Lorcher
Mavriplis
Proft
Rieper
Shu
Shu
Sod
T. Warburton
Tadmor
Tu
von Neumann
Warburton
Warburton
Warburton
Woodward
Xu
Zhou
Publication venue: 'EDP Sciences'
Publication date: 01/01/2011
Field of study

We present a novel, cell-local shock detector for use with discontinuous Galerkin (DG) methods. The output of this detector is a reliably scaled, element-wise smoothness estimate which is suited as a control input to a shock capture mechanism. Using an artificial viscosity in the latter role, we obtain a DG scheme for the numerical solution of nonlinear systems of conservation laws. Building on work by Persson and Peraire, we thoroughly justify the detector's design and analyze its performance on a number of benchmark problems. We further explain the scaling and smoothing steps necessary to turn the output of the detector into a local, artificial viscosity. We close by providing an extensive array of numerical tests of the detector in use.Comment: 26 pages, 21 figure

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

Crossref

EDP Sciences OAI-PMH repository (1.2.0)

Format Abstraction for Sparse Tensor Algebra Compilers

Author: Amarasinghe Saman
Chou Stephen
Kjolstad Fredrik
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 11/11/2018
Field of study

This paper shows how to build a sparse tensor algebra compiler that is agnostic to tensor formats (data layouts). We develop an interface that describes formats in terms of their capabilities and properties, and show how to build a modular code generator where new formats can be added as plugins. We then describe six implementations of the interface that compose to form the dense, CSR/CSF, COO, DIA, ELL, and HASH tensor formats and countless variants thereof. With these implementations at hand, our code generator can generate code to compute any tensor algebra expression on any combination of the aforementioned formats. To demonstrate our technique, we have implemented it in the taco tensor algebra compiler. Our modular code generator design makes it simple to add support for new tensor formats, and the performance of the generated code is competitive with hand-optimized implementations. Furthermore, by extending taco to support a wider range of formats specialized for different application and data characteristics, we can improve end-user application performance. For example, if input data is provided in the COO format, our technique allows computing a single matrix-vector multiplication directly with the data in COO, which is up to 3.6

\times

faster than by first converting the data to CSR.Comment: Presented at OOPSLA 201

arXiv.org e-Print Archive

DSpace@MIT