Implementing Performance Competitive Logical Recovery
New hardware platforms, e.g. cloud, multi-core, etc., have led to a
reconsideration of database system architecture. Our Deuteronomy project
separates transactional functionality from data management functionality,
enabling a flexible response to exploiting new platforms. This separation
requires, however, that recovery is described logically. In this paper, we
extend current recovery methods to work in this logical setting. While this is
straightforward in principle, performance is an issue. We show how ARIES-style
recovery optimizations can work for logical recovery where page information is
not captured on the log. In side-by-side performance experiments using a common
log, we compare logical recovery with a state-of-the-art ARIES-style recovery
implementation and show that logical redo performance can be competitive.
Comment: VLDB201
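The key difficulty the abstract names is making redo idempotent when no page information (and thus no pageLSN test) is on the log. A minimal sketch of one way this can work, assuming an illustrative per-key scheme: each logical key remembers the LSN of the last operation applied to it. The `LogRecord` shape and `redo` routine below are hypothetical, not Deuteronomy's actual interfaces.

```python
# Hedged sketch: idempotent *logical* redo without pages. Instead of
# comparing a record's LSN against a pageLSN (ARIES), each key stores the
# LSN of the last operation applied to it. All names are illustrative.
from dataclasses import dataclass

@dataclass
class LogRecord:
    lsn: int     # log sequence number
    key: str     # logical key, not a page id
    value: str   # after-image of the logical operation

def redo(log, store, applied_lsn):
    """Replay log records; skip any the store has already applied."""
    for rec in log:
        if applied_lsn.get(rec.key, -1) >= rec.lsn:
            continue                      # already applied: redo is idempotent
        store[rec.key] = rec.value        # reapply the logical operation
        applied_lsn[rec.key] = rec.lsn

store, applied = {}, {}
log = [LogRecord(1, "a", "x"), LogRecord(2, "a", "y"), LogRecord(3, "b", "z")]
redo(log, store, applied)
redo(log, store, applied)   # a second pass (crash during recovery) is a no-op
```

Running redo twice leaves the store unchanged, which is exactly the property the pageLSN comparison provides in the physical setting.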
Optimizing large databases: a study on index structures
The following research is about comparing index structures for large databases, both
analytically and experimentally. The study is divided into two main parts. The first part is
centered on hash-based indexing and B-trees, both of which are set in the context of
the widely known external memory model. The second part presents the cache-oblivious
model, describing its implications on the design of algorithms for any arbitrary pair of
memory levels...
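The external memory model the study builds on counts block transfers rather than CPU steps, which is what separates B-trees from comparison-based alternatives. A short sketch of that comparison, using hypothetical numbers (a fanout of 1024 and 10^9 keys) and a deliberately rough "one block per probe" estimate for binary search:

```python
# Hedged sketch of the external-memory cost comparison: a B-tree with
# fanout B needs about log_B(N) block transfers per search, versus roughly
# log_2(N) for binary search that touches a fresh block per probe.
import math

def btree_io(n, fanout):
    """Approximate block transfers to search a B-tree of n keys (its height)."""
    return math.ceil(math.log(n, fanout))

def binary_search_io(n):
    """Approximate block transfers for binary search: one block per probe."""
    return math.ceil(math.log2(n))

n = 10**9
print(btree_io(n, 1024), binary_search_io(n))   # prints: 3 30
```

The order-of-magnitude gap (3 vs. 30 transfers here) is what makes fanout, not comparisons, the quantity to optimize on disk.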
Fast Feedforward Networks
We break the linear link between the layer size and its inference cost by
introducing the fast feedforward (FFF) architecture, a log-time alternative to
feedforward networks. We demonstrate that FFFs are up to 220x faster than
feedforward networks, up to 6x faster than mixture-of-experts networks, and
exhibit better training properties than mixtures of experts thanks to noiseless
conditional execution. Pushing FFFs to the limit, we show that they can use as
little as 1% of layer neurons for inference in vision transformers while
preserving 94.2% of predictive performance.
Comment: 12 pages, 6 figures, 4 table
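The log-time claim comes from conditional execution over a tree of neurons: each input is routed down a balanced binary tree, so only O(log n) routing neurons and a single leaf block are evaluated. A toy sketch of that routing idea, with made-up weights and shapes (the paper's actual FFF learns node and leaf parameters jointly):

```python
# Hedged sketch of the log-time idea behind fast feedforward networks: a
# balanced binary tree of "node" neurons routes each input to one leaf
# block, so inference evaluates DEPTH routing neurons plus 1 of 2**DEPTH
# leaves. Weights here are illustrative toy values, not trained ones.
import random

random.seed(0)
DEPTH, D_IN = 3, 4                       # 2**DEPTH = 8 leaf blocks
node_w = [[random.gauss(0, 1) for _ in range(D_IN)]
          for _ in range(2**DEPTH - 1)]  # one routing neuron per internal node
leaf_w = [[random.gauss(0, 1) for _ in range(D_IN)]
          for _ in range(2**DEPTH)]      # one (tiny) leaf block per leaf

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def fff_forward(x):
    """Descend the tree; only DEPTH nodes and one leaf do any work."""
    node = 0
    for _ in range(DEPTH):
        go_right = dot(node_w[node], x) > 0.0    # hard conditional routing
        node = 2 * node + 1 + int(go_right)      # heap-style child index
    leaf = node - (2**DEPTH - 1)                 # convert to leaf index
    return dot(leaf_w[leaf], x)                  # 1 of 8 leaves executes

out = fff_forward([0.5, -1.0, 0.25, 2.0])
```

Because routing is a hard decision rather than a noisy gate, training can use noiseless conditional execution, which is the property the abstract credits for FFFs' advantage over mixtures of experts.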
Survey of Vector Database Management Systems
There are now over 20 commercial vector database management systems (VDBMSs),
all produced within the past five years. But embedding-based retrieval has been
studied for over ten years, and similarity search a staggering half century and
more. Driving this shift from algorithms to systems are new data-intensive
applications, notably large language models, that demand vast stores of
unstructured data coupled with reliable, secure, fast, and scalable query
processing capability. A variety of new data management techniques now exist
for addressing these needs; however, there is no comprehensive survey to
thoroughly review these techniques and systems. We start by identifying five
main obstacles to vector data management, namely vagueness of semantic
similarity, large size of vectors, high cost of similarity comparison, lack of
natural partitioning that can be used for indexing, and difficulty of
efficiently answering hybrid queries that require both attributes and vectors.
Overcoming these obstacles has led to new approaches to query processing,
storage and indexing, and query optimization and execution. For query
processing, a variety of similarity scores and query types are now well
understood; for storage and indexing, techniques include vector compression,
namely quantization, and partitioning based on randomization, learning
partitioning, and navigable partitioning; for query optimization and execution,
we describe new operators for hybrid queries, as well as techniques for plan
enumeration, plan selection, and hardware accelerated execution. These
techniques lead to a variety of VDBMSs across a spectrum of design and runtime
characteristics, including native systems specialized for vectors and extended
systems that incorporate vector capabilities into existing systems. We then
discuss benchmarks, and finally we outline research challenges and point the
direction for future work.
Comment: 25 page
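Of the storage techniques the survey lists, quantization is the easiest to make concrete. A minimal sketch of scalar quantization, assuming a simple per-vector min/max scaling scheme (one common choice, not any particular VDBMS's format): each float coordinate is encoded as an 8-bit code, shrinking a float32 vector 4x while keeping reconstructed distances approximately correct.

```python
# Hedged sketch of scalar quantization: map each coordinate to an integer
# code in [0, 255] using a per-vector affine scale, so reconstruction
# error per coordinate is bounded by half the quantization step.

def quantize(vec):
    """Encode each coordinate as an 8-bit integer code."""
    lo, hi = min(vec), max(vec)
    scale = (hi - lo) / 255 or 1.0          # guard against constant vectors
    codes = [round((x - lo) / scale) for x in vec]
    return codes, lo, scale

def dequantize(codes, lo, scale):
    """Reconstruct an approximation of the original vector."""
    return [lo + c * scale for c in codes]

v = [0.12, -0.40, 0.98, 0.05]
codes, lo, scale = quantize(v)
approx = dequantize(codes, lo, scale)
err = max(abs(a - b) for a, b in zip(v, approx))   # bounded by scale / 2
```

Product quantization and the partitioning schemes the survey covers refine this same trade-off: fewer bits per vector against bounded distortion of similarity scores.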
External-Memory Graph Algorithms
We present a collection of new techniques for designing and analyzing efficient external-memory algorithms for graph problems and illustrate how these techniques can be applied to a wide variety of specific problems. Our results include:
Proximate neighbors. We present a simple
method for deriving external-memory lower bounds
via reductions from a problem we call the “proximate neighbors” problem. We use this technique to derive non-trivial lower bounds for such problems as list ranking, expression-tree evaluation, and connected components.
PRAM simulation. We give methods for efficiently
simulating PRAM computations in external memory, even for some cases in which the PRAM algorithm is not work-optimal. We apply this to derive a number of optimal (and simple) external-memory graph algorithms.
Time-forward processing. We present a general
technique for evaluating circuits (or “circuit-like”
computations) in external memory. We also use this in a deterministic list ranking algorithm.
Deterministic 3-coloring of a cycle. We give
several optimal methods for 3-coloring a cycle,
which can be used as a subroutine for finding large
independent sets for list ranking. Our ideas go
beyond a straightforward PRAM simulation, and
may be of independent interest.
External depth-first search. We discuss a method
for performing depth-first search and solving related
problems efficiently in external memory. Our
technique can be used in conjunction with ideas
due to Ullman and Yannakakis in order to solve
graph problems involving closed semi-ring computations even when their assumption that vertices fit in main memory does not hold.
Our techniques apply to a number of problems, including list ranking, which we discuss in detail, finding Euler tours, expression-tree evaluation, centroid decomposition of a tree, least-common ancestors, minimum spanning tree verification, connected and biconnected components, minimum spanning forest, ear decomposition, topological sorting, reachability, graph drawing, and visibility representation.
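Time-forward processing, named above, can be sketched in a few lines. The idea: evaluate a DAG ("circuit") in topological order, forwarding each computed value to its successors through a priority queue keyed by the successor's topological number. In the external-memory setting that queue is an I/O-efficient priority queue; here the in-memory `heapq` stands in, and the toy circuit and `combine` function are illustrative assumptions.

```python
# Hedged sketch of time-forward processing: each node's value is "mailed"
# forward through a priority queue keyed on the recipient's topological
# rank, so a node finds all its inputs at the front of the queue when its
# turn comes. heapq stands in for an I/O-efficient priority queue.
import heapq

def time_forward(order, succs, combine, inputs):
    """order: nodes in topological order; succs[v]: v's out-neighbors."""
    rank = {v: i for i, v in enumerate(order)}
    pq, value = [], {}
    for v in order:
        incoming = []
        while pq and pq[0][0] == rank[v]:          # collect forwarded values
            incoming.append(heapq.heappop(pq)[2])
        value[v] = combine(v, incoming, inputs)
        for w in succs.get(v, []):                 # forward the result onward
            heapq.heappush(pq, (rank[w], rank[v], value[v]))
    return value

# Toy circuit: sources carry inputs; internal nodes sum their in-edges.
succs = {"a": ["c"], "b": ["c"], "c": ["d"]}
vals = time_forward(["a", "b", "c", "d"], succs,
                    lambda v, inc, inp: inp.get(v, sum(inc)),
                    {"a": 1, "b": 2})
```

The same forwarding pattern is what the deterministic list ranking algorithm above uses as a subroutine.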
Consistent, Durable, and Safe Memory Management for Byte-addressable Non Volatile Main Memory
This paper presents three building blocks for enabling the efficient and safe design of persistent data stores for emerging non-volatile memory technologies. Taking the fullest advantage of the low latency and high bandwidths of emerging memories such as phase change memory (PCM), spin torque, and memristor necessitates a serious look at placing these persistent storage technologies on the main memory bus. Doing so, however, introduces critical challenges of not sacrificing the data reliability and consistency that users demand from storage. This paper introduces techniques for (1) robust wear-aware memory allocation, (2) prevention of erroneous writes, and (3) consistency-preserving updates that are cache-efficient. We show through our evaluation that these techniques are efficiently implementable and effective by demonstrating a B+-tree implementation modified to make full use of our toolkit.
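The first building block, wear-aware allocation, addresses the limited write endurance of memories like PCM. A minimal sketch of the underlying idea, assuming a deliberately simplified allocator (the paper's actual allocator tracks far more state): serve each allocation from the least-worn region so writes spread evenly.

```python
# Hedged sketch of wear-aware allocation: a min-heap over (write count,
# region) always hands out the least-worn region, leveling wear across
# memory that, like PCM, degrades per cell with writes. Illustrative only.
import heapq

class WearAwareAllocator:
    def __init__(self, n_regions):
        # Every region starts with zero recorded writes.
        self.heap = [(0, r) for r in range(n_regions)]

    def alloc(self):
        wear, region = heapq.heappop(self.heap)     # least-worn region
        heapq.heappush(self.heap, (wear + 1, region))  # charge one write
        return region

a = WearAwareAllocator(4)
picks = [a.alloc() for _ in range(8)]   # wear stays even: each region twice
```

The other two building blocks are orthogonal: write protection guards against stray stores corrupting persistent state, and the cache-efficient update protocol orders flushes so a crash never exposes a half-applied update.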
Algorithm Engineering for fundamental Sorting and Graph Problems
Fundamental algorithms form the basic knowledge expected of every computer science undergraduate and every professional programmer. They are a set of basic techniques one can find in any (good) coursebook on algorithms and data structures. In this thesis we try to close the gap between theoretically worst-case-optimal classical algorithms and the real-world circumstances one faces under the constraints imposed by data size, limited main memory, or available parallelism.