Search CORE

3,509 research outputs found

Incremental elasticity for array databases

Author: Ang K. H.
de Witt S.
Ganesan P.
P.
Stonebraker M.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/06/2014

Relational databases benefit significantly from elasticity, whereby they execute on a changing set of hardware resources provisioned to match their storage and processing requirements. Such flexibility is especially attractive for scientific databases, because their users often follow a no-overwrite storage model in which data is deleted only when the available space is exhausted. As a result, the database grows steadily and its hardware must expand proportionally. In addition, scientific databases frequently store their data as multidimensional arrays optimized for spatial querying. This raises several novel challenges for clustered, skew-aware data placement in an elastic shared-nothing database. In this work, we design and implement elasticity for an array database. We address these challenges on two fronts: determining when to expand a database cluster and how to partition the data within it. For both steps we propose incremental approaches that affect a minimal set of data and nodes while maintaining high performance. We introduce an algorithm for gradually augmenting an array database's hardware using a closed-loop control system. After the cluster adds nodes, we optimize data placement for n-dimensional arrays. Many of our elastic partitioners reorganize an array incrementally, redistributing data only to new nodes. By combining these two tools, the scientific database efficiently and seamlessly manages its monotonically increasing hardware resources.

Intel Corporation (Science and Technology Center for Big Data)
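The abstract describes two incremental steps: deciding when to grow the cluster and redistributing data only to newly added nodes. The sketch below illustrates both under simplifying assumptions and is not the paper's algorithm: `nodes_to_add` is a hypothetical single feedback step that compares measured storage use with a target utilization (the paper's closed-loop controller is more elaborate), and `incremental_rebalance` is a hypothetical partitioner that balances chunk counts by moving chunks only from existing nodes to new ones.

```python
import math


def nodes_to_add(used_gb: float, current_nodes: int,
                 capacity_gb: float, target: float = 0.8) -> int:
    """One feedback step: compare measured storage use against capacity.

    Returns how many nodes to provision so utilization stays at or below
    `target`. The cluster never shrinks, matching a no-overwrite workload.
    """
    needed = max(1, math.ceil(used_gb / (capacity_gb * target)))
    return max(0, needed - current_nodes)


def incremental_rebalance(placement: dict[str, list[str]],
                          new_nodes: list[str]) -> dict[str, list[str]]:
    """Balance chunk counts by moving chunks only from old nodes to new nodes.

    `placement` maps node name -> list of chunk ids. Chunks never move between
    two pre-existing nodes, so only data destined for new nodes is touched.
    """
    result = {node: list(chunks) for node, chunks in placement.items()}
    for node in new_nodes:
        result[node] = []
    target = sum(len(c) for c in result.values()) // len(result)
    for new in new_nodes:
        while len(result[new]) < target:
            # Take a chunk from the currently most loaded pre-existing node.
            donor = max(placement, key=lambda n: len(result[n]))
            if len(result[donor]) <= target:
                break
            result[new].append(result[donor].pop())
    return result
```

For example, with two nodes holding six chunks each and one new node, every node ends up with four chunks, and the two original nodes only give chunks away. This sketch balances by chunk count; a skew-aware partitioner would weight chunks by their size or access frequency instead.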

CiteSeerX

DSpace@MIT

Crossref

Relational Cloud: The Case for a Database Service

Author: Curino Carlo
Jones Evan
Madden Samuel
Wu Eugene
Zhang Yang
Publication date: 14/03/2010

In this paper, we make the case for "databases as a service" (DaaS), with two target scenarios in mind: (i) consolidation of data management functionality for large organizations and (ii) outsourcing data management to a cloud-based service provider for small/medium organizations. We analyze the many challenges to be faced, and discuss the design of a database service we are building, called Relational Cloud. The system has been designed from scratch and combines many recent advances and novel solutions. The prototype we present exploits multiple dedicated storage engines, provides high-availability via transparent replication, supports automatic workload partitioning and live data migration, and provides serializable distributed transactions. While the system is still under active development, we are able to present promising initial results that showcase the key features of our system. The tests are based on TPC benchmarks and real-world data from epinions.com, and show our partitioning, scalability and balancing capabilities.
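Automatic workload partitioning generally aims to keep tuples that are accessed together on the same node, so that fewer transactions become distributed transactions. The sketch below is a simple greedy heuristic of our own, not Relational Cloud's actual partitioner (which may use a more sophisticated graph-partitioning approach): it links two tuples each time a transaction touches both, places the most frequently co-accessed pairs together first, and respects a per-partition capacity. The names `greedy_partition` and `distributed_fraction` and the input format (a list of sets of tuple ids) are hypothetical.

```python
from collections import Counter
from itertools import combinations


def greedy_partition(transactions: list[set[str]], num_partitions: int,
                     capacity: int) -> dict[str, int]:
    """Greedily co-locate frequently co-accessed tuples (hypothetical heuristic)."""
    tuples = sorted({t for txn in transactions for t in txn})
    if len(tuples) > num_partitions * capacity:
        raise ValueError("not enough partition capacity for all tuples")

    # Edge weight = number of transactions that access both tuples.
    edges: Counter = Counter()
    for txn in transactions:
        for a, b in combinations(sorted(txn), 2):
            edges[(a, b)] += 1

    assignment: dict[str, int] = {}
    sizes = [0] * num_partitions

    def least_loaded(room: int):
        # Emptiest partition (lowest index on ties) with space for `room` tuples.
        fits = [p for p in range(num_partitions) if sizes[p] + room <= capacity]
        return min(fits, key=lambda p: sizes[p]) if fits else None

    def place(t: str, p: int) -> None:
        assignment[t] = p
        sizes[p] += 1

    # Heaviest edges first; ties broken by tuple ids so results are deterministic.
    for (a, b), _ in sorted(edges.items(), key=lambda e: (-e[1], e[0])):
        if a in assignment and b in assignment:
            continue
        if a not in assignment and b not in assignment:
            p = least_loaded(2)
            if p is not None:  # otherwise the fallback loop places them
                place(a, p)
                place(b, p)
        else:
            placed, other = (a, b) if a in assignment else (b, a)
            p = assignment[placed]
            if sizes[p] >= capacity:
                p = least_loaded(1)
            place(other, p)

    # Place any remaining tuples (e.g. ones seen only in single-tuple transactions).
    for t in tuples:
        if t not in assignment:
            place(t, least_loaded(1))
    return assignment


def distributed_fraction(transactions: list[set[str]],
                         assignment: dict[str, int]) -> float:
    """Fraction of transactions that touch more than one partition."""
    multi = sum(1 for txn in transactions if len({assignment[t] for t in txn}) > 1)
    return multi / len(transactions)
```

With the transactions `{a, b}`, `{a, b}`, `{c, d}`, `{c, d}`, `{b, c}` and two partitions of capacity 2, the heuristic co-locates `a` with `b` and `c` with `d`, so only one of the five transactions is distributed. Greedy placement can still be far from optimal on harder workloads, which is why graph-partitioning tools are commonly used for this problem.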

DSpace@MIT

SAP HANA distributed in-memory database system: Transaction, session, and metadata management

Author: Bensberg Christian
Färber Franz
Kwon Yong Sik
Lee Arthur H.
Lee Chulwon
Lee Joo Yeon
Lee Juchang
Lehner Wolfgang
Muehle Michael
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 12/01/2023

One of the core principles of the SAP HANA database system is comprehensive support for distributed query processing. Supporting scale-out scenarios was one of the system's major design principles from the very beginning. In this paper, we first give an overview of the overall functionality with respect to data allocation, metadata caching, and query routing. We then examine specific topics in more detail and explain features and methods that are uncommon in traditional disk-based database systems. In summary, the paper provides a comprehensive overview of distributed query processing in the SAP HANA database, which achieves the scalability needed to handle large databases and heterogeneous workloads.
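The abstract names metadata caching and query routing without describing them. A minimal sketch, assuming a client that caches which node owns each table, sends statements straight to that node, and falls back to a coordinator only on a cache miss or a stale entry. `Coordinator`, `RoutingClient`, and the node map are hypothetical and are not SAP HANA interfaces.

```python
class Coordinator:
    """Hypothetical node holding the authoritative table -> node map."""

    def __init__(self, locations: dict[str, str]):
        self.locations = dict(locations)
        self.lookups = 0  # counts round trips to the coordinator

    def locate(self, table: str) -> str:
        self.lookups += 1
        return self.locations[table]


class RoutingClient:
    """Client that caches table locations and sends statements straight to them."""

    def __init__(self, coordinator: Coordinator):
        self.coordinator = coordinator
        self.cache: dict[str, str] = {}

    def route(self, table: str) -> str:
        if table not in self.cache:
            self.cache[table] = self.coordinator.locate(table)
        return self.cache[table]

    def execute(self, table: str, nodes: dict[str, set[str]]) -> str:
        """Return the node that ran the statement; `nodes` maps node -> tables held."""
        node = self.route(table)
        if table not in nodes[node]:
            # Stale cache entry (the table moved): drop it and ask again.
            del self.cache[table]
            node = self.route(table)
        return node
```

Repeated statements on the same table reach the coordinator only once; after a table migrates, the first statement detects the stale entry, refreshes it with one extra lookup, and later statements are routed directly again.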

Qucosa

HSSS - Hochschulschriftenserver der SLUB

Technische Universität Dresden: Qucosa

Doctor of Philosophy

Author: Kumar Sidharth
Publication venue: University of Utah
Publication date: 01/01/2016

Dissertation. The increase in computational power of supercomputers is enabling complex scientific phenomena to be simulated at ever-increasing resolution and fidelity. Because these simulations routinely produce large volumes of data, performing efficient I/O at this scale has become very difficult. Large-scale parallel writes are challenging due to the complex interdependencies between I/O middleware and hardware. Analytics-appropriate reads are traditionally hindered by bottlenecks in I/O access. Moreover, the two components of I/O, data generation from simulations (writes) and data exploration for analysis and visualization (reads), have substantially different data access requirements. Parallel writes, performed on supercomputers, often use aggregation strategies to permit large contiguous accesses. Analysis and visualization tasks, usually performed on computationally modest resources, require fast access to localized subsets or multiresolution representations of the data. This dissertation tackles the problem of parallel I/O while bridging the gap between large-scale writes and analytics-appropriate reads. The focus of this work is an end-to-end adaptive-resolution data movement framework that provides efficient I/O while supporting the full spectrum of modern HPC hardware. This is achieved by developing technology for highly scalable and tunable parallel I/O, applicable both to traditional parallel data formats and to multiresolution data formats, which are directly suited to analysis and visualization. To demonstrate the efficacy of the approach, a novel library (PIDX) is developed that is highly tunable and capable of adaptive-resolution parallel I/O to a multiresolution data format. Adaptive-resolution storage and I/O, which allow subsets of a simulation to be accessed at varying spatial resolutions, can yield significant improvements in both storage performance and I/O time.
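A multiresolution format lets a reader fetch a coarse version of the data by reading only a prefix of the file. As an assumption for illustration, we use hierarchical Z (HZ) ordering, which is commonly associated with the IDX format that PIDX targets; the library's actual on-disk layout may differ in detail. The sketch computes a Z-order (Morton) index for a 2D sample and then reorders Z-order indices by resolution level, so coarser samples come first. `morton_2d` and `hz_index` are hypothetical helper names.

```python
def morton_2d(x: int, y: int, bits: int) -> int:
    """Interleave the bits of x and y (Z-order) for a 2**bits x 2**bits grid."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def hz_index(z: int, max_bits: int) -> int:
    """Hierarchical Z (HZ) position of Z-order index z among 2**max_bits samples.

    Level 0 holds z == 0; level h >= 1 holds the z values with exactly
    max_bits - h trailing zero bits, stored after all coarser levels.
    """
    if z == 0:
        return 0
    trailing = (z & -z).bit_length() - 1  # number of trailing zero bits
    level = max_bits - trailing
    return (1 << (level - 1)) + (z >> (trailing + 1))
```

For a 4 x 4 grid (`bits=2`, so `max_bits=4`), the first four HZ positions hold the samples at (0, 0), (2, 0), (0, 2), and (2, 2): reading that prefix yields the grid subsampled by 2 in each dimension, and each further level doubles the number of samples.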
The library provides a set of parameters that control the storage format and the nature of data aggregation across the network; further, a machine-learning-based model is constructed that tunes these parameters for maximum throughput. The work is demonstrated empirically, with parallel I/O scaling up to 768K cores within a framework flexible enough to handle adaptive-resolution I/O.
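One common aggregation strategy is two-phase I/O: many processes send their pieces to a smaller set of aggregators, and each aggregator writes one large contiguous file region. The sketch below assumes a one-dimensional shared file and treats the number of aggregators as one example of a tunable aggregation parameter; `aggregation_plan` is a hypothetical helper, and PIDX's actual aggregation scheme is not described in the abstract and may differ.

```python
def aggregation_plan(rank_extents: list[tuple[int, int]], total_size: int,
                     num_aggregators: int) -> dict[int, list[tuple[int, int, int]]]:
    """Split each rank's (offset, length) range across aggregators.

    Aggregator a owns the contiguous file region [a * region, (a + 1) * region).
    Returns aggregator -> list of (rank, offset, length) pieces it receives.
    Assumes every extent lies within [0, total_size).
    """
    region = -(-total_size // num_aggregators)  # ceiling division
    plan: dict[int, list[tuple[int, int, int]]] = {a: [] for a in range(num_aggregators)}
    for rank, (offset, length) in enumerate(rank_extents):
        pos, end = offset, offset + length
        while pos < end:
            agg = pos // region
            piece_end = min((agg + 1) * region, end)  # stop at region boundary
            plan[agg].append((rank, pos, piece_end - pos))
            pos = piece_end
    return plan
```

With four ranks writing 25 bytes each into a 100-byte file, two aggregators each receive two whole rank blocks, while three aggregators (34-byte regions) force ranks 1 and 2 to split their data across two aggregators. Fewer aggregators mean larger contiguous writes but more network traffic per aggregator, which is the kind of trade-off a tuning model would have to balance.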

The University of Utah: J. Willard Marriott Digital Library

Performance of a characteristic-based, 3-D, time-domain Maxwell equations solver on a massively parallel computer

Author: Calahan D. A.
Hill K. C.
Shang J. S.
Publication venue: 'American Institute of Aeronautics and Astronautics (AIAA)'
Publication date: 01/01/1993

Peer Reviewed
http://deepblue.lib.umich.edu/bitstream/2027.42/77035/1/AIAA-1993-3179-911.pd

Deep Blue Documents at the University of Michigan