PAN AIR: A computer program for predicting subsonic or supersonic linear potential flows about arbitrary configurations using a higher order panel method. Volume 4: Maintenance document (version 3.0)
The Maintenance Document Version 3.0 is a guide to the PAN AIR software system, a system which computes the subsonic or supersonic linear potential flow about a body of nearly arbitrary shape, using a higher order panel method. The document describes the overall system and each program module of the system. Sufficient detail is given for program maintenance, updating, and modification. It is assumed that the reader is familiar with programming and CRAY computer systems. The PAN AIR system was written in FORTRAN 4 language except for a few CAL language subroutines which exist in the PAN AIR library. Structured programming techniques were used to provide code documentation and maintainability. The operating systems accommodated are COS 1.11, COS 1.12, COS 1.13, and COS 1.14 on the CRAY 1S, 1M, and X-MP computing systems. The system comprises a data base management system, a program library, an execution control module, and nine separate FORTRAN technical modules. Each module calculates part of the posed PAN AIR problem. The data base manager is used to communicate between modules and within modules. The technical modules must be run in a prescribed fashion for each PAN AIR problem. In order to ease the problem of supplying the many JCL cards required to execute the modules, a set of CRAY procedures (PAPROCS) was created to automatically supply most of the JCL cards. Most of this document has not changed for Version 3.0. It now, however, strictly applies only to PAN AIR version 3.0. The major changes are: (1) additional sections covering the new FDP module (which calculates streamlines and offbody points); (2) a complete rewrite of the section on the MAG module; and (3) strict applicability to CRAY computing systems.
Updatable Learned Indexes Meet Disk-Resident DBMS -- From Evaluations to Design Choices
Although many updatable learned indexes have been proposed in recent years,
whether they can outperform traditional approaches on disk remains unknown. In
this study, we revisit and implement four state-of-the-art updatable learned
indexes on disk, and compare them against the B+-tree under a wide range of
settings. Through our evaluation, we make some key observations: 1) Overall,
the B+-tree performs well across a range of workload types and datasets. 2) A
learned index could outperform B+-tree or other learned indexes on disk for a
specific workload. For example, PGM achieves the best performance in write-only
workloads while LIPP significantly outperforms others in lookup-only workloads.
We further conduct a detailed performance analysis to reveal the strengths and
weaknesses of these learned indexes on disk. Moreover, we summarize the
observed common shortcomings in five categories and propose four design
principles to guide future design of on-disk, updatable learned indexes: (1)
reducing the index's tree height, (2) better data structures to lower operation
overheads, (3) improving the efficiency of scan operations, and (4) more
efficient storage layout. Comment: 22 page
PAN AIR: A computer program for predicting subsonic or supersonic linear potential flows about arbitrary configurations using a higher order panel method. Volume 4: Maintenance document (version 1.1)
The Maintenance Document is a guide to the PAN AIR software system, a system which computes the subsonic or supersonic linear potential flow about a body of nearly arbitrary shape, using a higher order panel method. The document describes the overall system and each program module of the system. Sufficient detail is given for program maintenance, updating and modification. It is assumed that the reader is familiar with programming and CDC (Control Data Corporation) computer systems. The PAN AIR system was written in FORTRAN 4 language except for a few COMPASS language subroutines which exist in the PAN AIR library. Structured programming techniques were used to provide code documentation and maintainability. The operating systems accommodated are NOS 1.2, NOS/BE and SCOPE 2.1.3 on the CDC 6600, 7600 and Cyber 175 computing systems. The system comprises a data management system, a program library, an execution control module and nine separate FORTRAN technical modules. Each module calculates part of the posed PAN AIR problem. The data base manager is used to communicate between modules and within modules. The technical modules must be run in a prescribed fashion for each PAN AIR problem. In order to ease the problem of supplying the many JCL cards required to execute the modules, a separate module called MEC (Module Execution Control) was created to automatically supply most of the JCL cards. In addition to the MEC generated JCL, there is an additional set of user supplied JCL cards to initiate the JCL sequence stored on the system.
Bridging the gap between algorithmic and learned index structures
Index structures such as B-trees and bloom filters are the well-established petrol engines of database systems. However, these structures do not fully exploit patterns in data distribution. To address this, researchers have suggested using machine learning models as electric engines that can entirely replace index structures. Such a paradigm shift in data system design, however, opens many unsolved design challenges. More research is needed to understand the theoretical guarantees and design efficient support for insertion and deletion.
In this thesis, we adopt a different position: index algorithms are good enough, and instead of going back to the drawing board to fit data systems with learned models, we should develop lightweight hybrid engines that build on the benefits of both algorithmic and learned index structures. The indexes that we suggest provide the theoretical performance guarantees and updatability of algorithmic indexes while using position prediction models to leverage the data distributions and thereby improve the performance of the index structure. We investigate the potential for minimal modifications to algorithmic indexes such that they can leverage data distribution similar to how learned indexes work. In this regard, we propose and explore the use of helping models that boost classical index performance using techniques from machine learning. Our suggested approach inherits performance guarantees from its algorithmic baseline index, but at the same time it considers the data distribution to improve performance considerably. We study single-dimensional range indexes, spatial indexes, and stream indexing, and show that the suggested approach results in range indexes that outperform the algorithmic indexes and have comparable performance to the read-only, fully learned indexes and hence can be reliably used as a default index structure in a database engine.
Besides, we consider the updatability of the indexes and suggest solutions for updating the index, notably when the data distribution drastically changes over time (e.g., for indexing data streams). In particular, we propose a specific learning-augmented index for indexing a sliding window with timestamps in a data stream.
Additionally, we highlight the limitations of learned indexes for low-latency lookup on real-world data distributions. To tackle this issue, we suggest adding an algorithmic enhancement layer to a learned model to correct the prediction error with a small memory latency. This approach enables efficient modelling of the data distribution and resolves the local biases of a learned model at the cost of roughly one memory lookup.
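The general idea of pairing a position prediction model with an algorithmic correction step can be illustrated with a minimal Python sketch. It is not the thesis's actual design: the class name `LinearLearnedIndex`, the single linear model, and the bounded binary search used for correction are all assumptions chosen for illustration, and the keys are assumed to be a non-empty list of numbers.

```python
import bisect


class LinearLearnedIndex:
    """Toy learned index over sorted numeric keys (hypothetical, illustrative only)."""

    def __init__(self, keys):
        self.keys = sorted(keys)
        n = len(self.keys)
        self.lo = self.keys[0]
        hi = self.keys[-1]
        # Linear model: predicted position = slope * (key - smallest key).
        self.slope = (n - 1) / (hi - self.lo) if hi != self.lo else 0.0
        # The worst prediction error over the indexed keys bounds the search window.
        self.max_err = max(
            abs(self._predict(k) - i) for i, k in enumerate(self.keys)
        )

    def _predict(self, key):
        pos = int(self.slope * (key - self.lo))
        # Clamp the prediction to a valid array position.
        return min(max(pos, 0), len(self.keys) - 1)

    def lookup(self, key):
        """Return the position of key, or None if it is not indexed."""
        guess = self._predict(key)
        start = max(guess - self.max_err, 0)
        end = min(guess + self.max_err + 1, len(self.keys))
        # Algorithmic correction: binary search only inside the error window.
        i = bisect.bisect_left(self.keys, key, start, end)
        if i < len(self.keys) and self.keys[i] == key:
            return i
        return None


index = LinearLearnedIndex([2, 3, 5, 8, 13, 21, 34, 55, 89])
position = index.lookup(13)
```

The model supplies a cheap guess and the recorded maximum error keeps the corrective search local, so a lookup never scans the whole key array even when the linear fit is poor.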
LMSFC: A Novel Multidimensional Index based on Learned Monotonic Space Filling Curves
The recently proposed learned indexes have attracted much attention as they
can adapt to the actual data and query distributions to attain better search
efficiency. Based on this technique, several existing works build up indexes
for multi-dimensional data and achieve improved query performance. A common
paradigm of these works is to (i) map multi-dimensional data points to a
one-dimensional space using a fixed space-filling curve (SFC) or its variant
and (ii) then apply the learned indexing techniques. We notice that the first
step typically uses a fixed SFC method, such as row-major order and z-order. It
definitely limits the potential of learned multi-dimensional indexes to adapt
to variable data distributions via different query workloads. In this paper, we
propose a novel idea of learning a space-filling curve that is carefully
designed and actively optimized for efficient query processing. We also
identify innovative offline and online optimization opportunities common to
SFC-based learned indexes and offer optimal and/or heuristic solutions.
Experimental results demonstrate that our proposed method, LMSFC, outperforms
state-of-the-art non-learned or learned methods across three commonly used
real-world datasets and diverse experimental settings. Comment: Extended Version. Accepted by VLDB 202
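For readers unfamiliar with the fixed mapping step that the abstract contrasts with a learned curve, the following Python sketch shows a plain z-order (Morton) encoding of 2-D integer points. It illustrates only the fixed baseline, not LMSFC itself; the function name `z_order`, the 16-bit default, and the sample points are assumptions made for this example.

```python
def z_order(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y into one z-order key.

    Bits of x land in even positions, bits of y in odd positions.
    """
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


# Map 2-D points to a one-dimensional order; any one-dimensional
# index (learned or not) can then be built over the sorted keys.
points = [(3, 1), (0, 0), (1, 2), (2, 2)]
ordered = sorted(points, key=lambda p: z_order(*p))
```

Because the interleaving pattern is fixed in advance, the resulting order is the same regardless of the data or the query workload, which is the limitation the abstract points to.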
Storage Management with Multi-Version Partitioned B-Trees
Database Management Systems and K/V-Stores operate on updatable datasets --
massively exceeding the size of available main memory. Tree-based K/V storage
management structures became particularly popular in storage engines. B+ Trees
allow constant search performance; however, write-heavy workloads yield
inefficient write patterns to secondary storage devices and poor performance
characteristics. LSM-Trees overcome this issue by horizontally partitioning
fractions of data -- small enough to fully reside in main memory -- but require
frequent maintenance to sustain search performance.
Firstly, we propose Multi-Version Partitioned BTrees (MV-PBT) as the sole storage
and index management structure in key-sorted storage engines like K/V-Stores.
Secondly, we compare MV-PBT against LSM-Trees. The logical horizontal
partitioning in MV-PBT allows leveraging recent advances in modern B-Tree
techniques in a small transparent and memory resident portion of the structure.
Structural properties sustain steady read performance, yielding efficient write
patterns and reducing write amplification.
We integrated MV-PBT in the WiredTiger KV storage engine. MV-PBT offers an up
to 2x increased steady throughput in comparison to LSM-Trees and several orders
of magnitude in comparison to B+ Trees in a YCSB workload. Comment: Extended Version, ADBIS 202
AirIndex: Versatile Index Tuning Through Data and Storage
The end-to-end lookup latency of a hierarchical index -- such as a B-tree or
a learned index -- is determined by its structure such as the number of layers,
the kinds of branching functions appearing in each layer, the amount of data we
must fetch from layers, etc. Our primary observation is that by optimizing
those structural parameters (or designs) specifically to a target system's I/O
characteristics (e.g., latency, bandwidth), we can offer a faster lookup
compared to the ones that are not optimized. Can we develop a systematic method
for finding those optimal design parameters? Ideally, the method must have the
potential to generate almost any existing index or a novel combination of them
for the fastest possible lookup.
In this work, we present a new data- and I/O-aware index builder (called
AirIndex) that can find high-speed hierarchical index designs in a principled
way. Specifically, AirIndex minimizes an objective function expressing the
end-to-end latency in terms of various designs -- the number of layers, types
of layers, and more -- for given data and a storage profile, using a
graph-based optimization method purpose-built to address the computational
challenges arising from the inter-dependencies among index layers and the
exponentially many candidate parameters in a large search space. Our empirical
studies confirm that AirIndex can find optimal index designs, build optimal
indexes within the times comparable to existing methods, and deliver up to 4.1x
faster lookup than a lightweight B-tree library (LMDB), 3.3x--46.3x faster than
state-of-the-art learned indexes (RMI/CDFShop, PGM-Index, ALEX/APEX, PLEX), and
2.0x faster than Data Calculator's suggestion on various dataset and storage
settings. Comment: 13 pages, 3 appendices, 19 figures, to appear at SIGMOD 202
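Why the best index structure depends on the storage profile can be seen with a small Python sketch. It is not AirIndex's objective function or optimizer: the affine cost model (a fixed latency per read plus bytes divided by bandwidth), the function name `lookup_latency`, and all numbers below are assumptions made purely for illustration.

```python
def lookup_latency(layer_bytes, latency_s, bandwidth_bytes_per_s):
    """Estimated end-to-end lookup time for a hierarchical index.

    Assumes one read per layer, each costing a fixed latency plus a
    transfer time proportional to the bytes fetched from that layer.
    """
    return sum(latency_s + b / bandwidth_bytes_per_s for b in layer_bytes)


# Two hypothetical designs: a tall index with small reads per layer,
# and a flat index with a single large read.
tall = [4096, 4096, 4096]
flat = [65536]

# Two hypothetical storage profiles (latency in seconds, bandwidth in bytes/s).
high_latency_fast_transfer = (100e-6, 1e9)
low_latency_slow_transfer = (1e-6, 1e8)

results = {
    name: (lookup_latency(tall, *profile), lookup_latency(flat, *profile))
    for name, profile in [
        ("high_latency_fast_transfer", high_latency_fast_transfer),
        ("low_latency_slow_transfer", low_latency_slow_transfer),
    ]
}
```

Under these assumed numbers the flat design wins when each read is expensive to start but cheap to stream, and the tall design wins in the opposite case, which is the kind of trade-off a storage-aware index builder has to search over.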
SALI: A Scalable Adaptive Learned Index Framework based on Probability Models
The growth in data storage capacity and the increasing demands for high
performance have created several challenges for concurrent indexing structures.
One promising solution is learned indexes, which use a learning-based approach
to fit the distribution of stored data and predictively locate target keys,
significantly improving lookup performance. Despite their advantages,
prevailing learned indexes exhibit constraints and encounter issues of
scalability on multi-core data storage.
This paper introduces SALI, the Scalable Adaptive Learned Index framework,
which incorporates two strategies aimed at achieving high scalability,
improving efficiency, and enhancing the robustness of the learned index.
Firstly, a set of node-evolving strategies is defined to enable the learned
index to adapt to various workload skews and enhance its concurrency
performance in such scenarios. Secondly, a lightweight strategy is proposed to
maintain statistical information within the learned index, with the goal of
further improving the scalability of the index. Furthermore, to validate their
effectiveness, SALI applied the two strategies mentioned above to the learned
index structure that utilizes fine-grained write locks, known as LIPP. The
experimental results have demonstrated that SALI significantly enhances the
insertion throughput with 64 threads by an average of 2.04x compared to the
second-best learned index. Furthermore, SALI accomplishes a lookup throughput
similar to that of LIPP+. Comment: Accepted by Conference SIGMOD 24, June 09-15, 2024, Santiago, Chil