Main Memory Adaptive Indexing for Multi-core Systems
Adaptive indexing is a concept that treats index creation in databases as a
by-product of query processing, as opposed to traditional full index creation,
where the indexing effort is performed up front before any queries are answered.
Adaptive indexing has received considerable attention, and several
algorithms have been proposed over the past few years, including a recent
experimental study comparing a large number of existing methods. Until now,
however, most adaptive indexing algorithms have been designed as
single-threaded algorithms; yet with multi-core systems well established,
designing parallel algorithms for adaptive indexing is a natural next step.
So far only one parallel algorithm for adaptive indexing has appeared in the
literature: the parallel version of standard cracking. In this paper we
describe three alternative parallel algorithms for adaptive indexing, including
a second variant of a parallel standard cracking algorithm. Additionally, we
describe a hybrid parallel sorting algorithm, and a NUMA-aware method based on
sorting. We then thoroughly compare all these algorithms experimentally, along
with a variant of a recently published parallel version of radix sort. Parallel
sorting algorithms serve as a realistic baseline for multi-threaded adaptive
indexing techniques. In total we experimentally compare seven parallel
algorithms. Additionally, we extensively profile all considered algorithms. The
initial set of experiments considered in this paper indicates that our parallel
algorithms significantly improve over previously known ones. Our results
suggest that, although adaptive indexing algorithms are a good design choice in
single-threaded environments, the rules change considerably in the parallel
case. That is, in future highly-parallel environments, sorting algorithms could
be serious alternatives to adaptive indexing.
Comment: 26 pages, 7 figures
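To make the cracking idea concrete, here is a minimal single-threaded sketch in Python. It is illustrative only: the `crack` function, its list-based three-way partition, and the sample data are invented for the example and are not taken from the paper, where partitioning is done in place over a contiguous column.

```python
def crack(column, low, high):
    """One step of standard database cracking for a range query [low, high).

    Values below `low` move to the front, values in [low, high) to the
    middle, and the rest to the back. Returns the two piece boundaries so
    a cracker index can record them for later queries.
    """
    # Three-way partition, O(n) per query; a real implementation
    # would swap elements in place rather than build new lists.
    lo = [v for v in column if v < low]
    mid = [v for v in column if low <= v < high]
    hi = [v for v in column if v >= high]
    column[:] = lo + mid + hi
    return len(lo), len(lo) + len(mid)


# Each query refines the physical order a little more, so repeated
# queries incrementally converge toward a fully sorted column.
col = [13, 4, 55, 9, 27, 1, 42, 18]
start, end = crack(col, 10, 30)
answer = col[start:end]   # the query's qualifying values
```

Each cracking step costs O(n), but it leaves the column a little more ordered, so subsequent queries touch ever-smaller pieces; this is the indexing-as-a-by-product behavior the abstract describes.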
Quantifying memory capacity as a quantum thermodynamic resource
The information-carrying capacity of a memory is known to be a thermodynamic
resource facilitating the conversion of heat to work. Szilard's engine
explicates this connection through a toy example involving an energy-degenerate
two-state memory. We devise a formalism to quantify the thermodynamic value of
memory in general quantum systems with nontrivial energy landscapes. Calling
this the thermal information capacity, we show that it converges to the
non-equilibrium Helmholtz free energy in the thermodynamic limit. We compute
the capacity exactly for a general two-state (qubit) memory away from the
thermodynamic limit, and find it to be distinct from known free energies. We
outline an explicit memory--bath coupling that can approximate the optimal
qubit thermal information capacity arbitrarily well.
Comment: 6 main + 7 appendix pages; 5 main + 2 appendix figures
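For reference, the non-equilibrium Helmholtz free energy that the thermal information capacity is said to converge to is standardly defined as follows; the symbols here are the conventional ones and are assumed, not taken from the paper's notation:

```latex
% Non-equilibrium Helmholtz free energy of a quantum state \rho with
% Hamiltonian H at bath temperature T (standard definition):
F(\rho) = \operatorname{tr}(\rho H) - k_B T \, S(\rho),
\qquad
S(\rho) = -\operatorname{tr}(\rho \ln \rho),
% where S(\rho) is the von Neumann entropy and k_B is Boltzmann's constant.
```

For thermal (Gibbs) states this reduces to the ordinary equilibrium free energy; away from equilibrium it upper-bounds the work extractable from the state, which is the sense in which memory is a thermodynamic resource.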
External Memory Pipelining Made Easy With TPIE
When handling large datasets that exceed the capacity of the main memory,
movement of data between main memory and external memory (disk), rather than
actual (CPU) computation time, is often the bottleneck in the computation.
Since data is moved between disk and main memory in large contiguous blocks,
this has led to the development of a large number of I/O-efficient algorithms
that minimize the number of such block movements.
TPIE is one of two major libraries that have been developed to support
I/O-efficient algorithm implementations. TPIE provides an interface where
stream processing and sorting of lists can be implemented in a simple and modular way
without having to worry about memory management or block movement. However, if
care is not taken, such streaming-based implementations can lead to practically
inefficient algorithms since lists of data items are typically written to (and
read from) disk between components.
In this paper we present a major extension of the TPIE library that includes
a pipelining framework that allows for practically efficient streaming-based
implementations while minimizing I/O-overhead between streaming components. The
framework pipelines streaming components to avoid I/Os between components, that
is, it processes several components simultaneously while passing output from
one component directly to the input of the next component in main memory. TPIE
automatically determines which components to pipeline and performs the required
main memory management, and the extension also includes support for
parallelization of internal memory computation and progress tracking across an
entire application. The extended library has already been used to evaluate
I/O-efficient algorithms in the research literature and is heavily used in
I/O-efficient commercial terrain processing applications by the Danish startup
SCALGO.
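The pipelining idea, passing each output record directly to the next component in main memory rather than writing an intermediate stream to disk between components, can be sketched with generators. This is an illustrative Python analogy of the dataflow, not TPIE's actual C++ API; the stage names are invented for the example.

```python
def source(values):
    # Producer stage; in an external-memory setting this would read a
    # stream from disk in large contiguous blocks.
    for v in values:
        yield v

def keep_even(items):
    # Filter stage: records flow through one at a time and are never
    # materialized as an intermediate list between components.
    for v in items:
        if v % 2 == 0:
            yield v

def scale(items, factor):
    # Transform stage, likewise streaming.
    for v in items:
        yield v * factor

# Composing the stages pipelines them: each record is handed directly
# from one component to the next in main memory, so no I/O is spent on
# intermediate streams.
pipeline = scale(keep_even(source(range(10))), 10)
result = list(pipeline)
```

A naive streaming implementation would instead write `keep_even`'s full output to disk and read it back in `scale`; avoiding exactly that intermediate I/O is the point of the pipelining framework described above.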
Stochastic Database Cracking: Towards Robust Adaptive Indexing in Main-Memory Column-Stores
Modern business applications and scientific databases call for inherently
dynamic data storage environments. Such environments are characterized by two
challenging features: (a) they have little idle system time to devote to
physical design; and (b) there is little, if any, a priori workload knowledge,
while the query and data workload keeps changing dynamically. In such
environments, traditional approaches to index building and maintenance cannot
apply. Database cracking has been proposed as a solution that allows on-the-fly
physical data reorganization, as a collateral effect of query processing.
Cracking aims to continuously and automatically adapt indexes to the workload
at hand, without human intervention. Indexes are built incrementally,
adaptively, and on demand. Nevertheless, as we show, existing adaptive indexing
methods fail to deliver workload-robustness; they perform much better with
random workloads than with others. This frailty derives from the inelasticity
with which these approaches interpret each query as a hint on how data should
be stored. Current cracking schemes blindly reorganize the data within each
query's range, even if that results in successive expensive operations with
minimal indexing benefit. In this paper, we introduce stochastic cracking, a
significantly more resilient approach to adaptive indexing. Stochastic cracking
also uses each query as a hint on how to reorganize data, but not blindly so;
it gains resilience and avoids performance bottlenecks by deliberately applying
certain arbitrary choices in its decision-making. Thereby, we bring adaptive
indexing forward to a mature formulation that confers the workload-robustness
previous approaches lacked. Our extensive experimental study verifies that
stochastic cracking maintains the desired properties of original database
cracking while at the same time it performs well with diverse realistic
workloads.
Comment: VLDB201
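The core idea of injecting deliberate randomness into cracking can be sketched as follows. This is an illustrative toy, not the paper's actual algorithms or their named variants: the function, the extra-random-pivot policy, and the sample data are invented for the example.

```python
import random

def stochastic_crack(column, low, high, rng=random.Random(0)):
    """One sketch of a stochastic-cracking step.

    Partition around the query bounds as in standard cracking, but also
    split around a randomly chosen pivot, so that large unindexed pieces
    keep shrinking even under skewed (e.g. sequential) workloads.
    """
    pivot = rng.choice(column)          # deliberate arbitrary cut point
    cuts = sorted({low, high, pivot})
    # Multi-way partition by the cut points; O(n) per query.
    buckets = [[] for _ in range(len(cuts) + 1)]
    for v in column:
        i = sum(v >= c for c in cuts)   # index of v's piece
        buckets[i].append(v)
    column[:] = [v for b in buckets for v in b]
    # The query answer: all values in [low, high).
    return [v for v in column if low <= v < high]


col = [7, 3, 15, 1, 9, 12, 5, 18]
ans = stochastic_crack(col, 4, 10)
```

A pure standard cracker would cut only at 4 and 10, so a workload that always queries one end of the domain leaves one huge untouched piece; the extra random cut is what restores robustness across workloads.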
