LMSFC: A Novel Multidimensional Index based on Learned Monotonic Space Filling Curves
The recently proposed learned indexes have attracted much attention as they
can adapt to the actual data and query distributions to attain better search
efficiency. Based on this technique, several existing works build up indexes
for multi-dimensional data and achieve improved query performance. A common
paradigm of these works is to (i) map multi-dimensional data points to a
one-dimensional space using a fixed space-filling curve (SFC) or its variant
and (ii) then apply the learned indexing techniques. We observe that the first
step typically uses a fixed SFC method, such as row-major order or z-order,
which limits the potential of learned multi-dimensional indexes to adapt to
varying data distributions and query workloads. In this paper, we
propose a novel idea of learning a space-filling curve that is carefully
designed and actively optimized for efficient query processing. We also
identify innovative offline and online optimization opportunities common to
SFC-based learned indexes and offer optimal and/or heuristic solutions.
Experimental results demonstrate that our proposed method, LMSFC, outperforms
state-of-the-art non-learned or learned methods across three commonly used
real-world datasets and diverse experimental settings.
Comment: Extended Version. Accepted by VLDB 202
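The fixed-SFC step that this paper argues against can be illustrated with the classic z-order (Morton) mapping, which interleaves the bits of each coordinate to produce a one-dimensional key. A minimal sketch (function name and 16-bit default are illustrative, not from the paper):

```python
def z_order(x: int, y: int, bits: int = 16) -> int:
    """Map a 2-D point to its z-order (Morton) key by interleaving
    the bits of x (even positions) and y (odd positions)."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)      # bit i of x -> position 2i
        key |= ((y >> i) & 1) << (2 * i + 1)  # bit i of y -> position 2i+1
    return key
```

Because the interleaving pattern is fixed in advance, the resulting key order cannot react to the data or query distribution; learning the curve, as LMSFC proposes, removes exactly this restriction.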
Forecasting the cost of processing multi-join queries via hashing for main-memory databases (Extended version)
Database management systems (DBMSs) carefully optimize complex multi-join
queries to avoid expensive disk I/O. As servers today feature tens or hundreds
of gigabytes of RAM, a significant fraction of many analytic databases becomes
memory-resident. Even after careful tuning for an in-memory environment, a
linear disk I/O model such as the one implemented in PostgreSQL may select
multi-join query plans that are up to 2X slower than the optimal plan over
memory-resident data. This paper introduces a memory I/O cost
model to identify good evaluation strategies for complex query plans with
multiple hash-based equi-joins over memory-resident data. The proposed cost
model is carefully validated for accuracy using three different systems,
including an Amazon EC2 instance, to control for hardware-specific differences.
Prior work in parallel query evaluation has advocated right-deep and bushy
trees for multi-join queries due to their greater parallelization and
pipelining potential. A surprising finding is that the conventional wisdom from
shared-nothing disk-based systems does not directly apply to the modern
shared-everything memory hierarchy. As corroborated by our model, the
performance gap between the optimal left-deep and right-deep query plan can
grow to about 10X as the number of joins in the query increases.
Comment: 15 pages, 8 figures, extended version of the paper to appear in SoCC'1
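The hash-based equi-joins that the cost model targets follow the standard build/probe pattern: hash the smaller input into an in-memory table, then stream the other input past it. A minimal sketch of one such operator (names and row representation are illustrative, not the paper's implementation):

```python
def hash_join(build_rows, probe_rows, build_key, probe_key):
    """In-memory hash equi-join: build a hash table on build_rows,
    then probe it with probe_rows, emitting merged rows on key match."""
    table = {}
    for row in build_rows:                       # build phase
        table.setdefault(row[build_key], []).append(row)
    output = []
    for row in probe_rows:                       # probe phase
        for match in table.get(row[probe_key], []):
            output.append({**match, **row})      # merge matching rows
    return output
```

In a left-deep plan, the output of one such join becomes the probe input of the next, so tuples pipeline through a chain of hash tables; the paper's finding is that the memory-traffic behavior of this chain, not disk I/O, is what separates left-deep from right-deep plans on modern hardware.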
FITing-Tree: A Data-aware Index Structure
Index structures are one of the most important tools that DBAs leverage to
improve the performance of analytics and transactional workloads. However,
building several indexes over large datasets can often become prohibitive and
consume valuable system resources. In fact, a recent study showed that indexes
created as part of the TPC-C benchmark can account for 55% of the total memory
available in a modern DBMS. This overhead consumes valuable and expensive main
memory, and limits the amount of space available to store new data or process
existing data.
In this paper, we present FITing-Tree, a novel form of a learned index which
uses piece-wise linear functions with a bounded error specified at construction
time. This error knob provides a tunable parameter that allows a DBA to FIT an
index to a dataset and workload by being able to balance lookup performance and
space consumption. To navigate this tradeoff, we provide a cost model that
helps determine an appropriate error parameter given either (1) a lookup
latency requirement (e.g., 500ns) or (2) a storage budget (e.g., 100MB). Using
a variety of real-world datasets, we show that our index is able to provide
performance that is comparable to full index structures while reducing the
storage footprint by orders of magnitude.
Comment: 18 page
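The bounded-error, piece-wise linear idea can be sketched with a simple greedy segmentation over sorted keys: each segment stores only its first key, first position, and a slope, and guarantees that the line predicts every covered position within the error knob. This is an illustrative approximation, not FITing-Tree's actual segmentation algorithm:

```python
def fit_segments(keys, max_error):
    """Greedy bounded-error segmentation over sorted, distinct keys.
    Each segment (first_key, first_pos, slope) predicts the position of
    key k as first_pos + slope * (k - first_key), within +/- max_error."""
    segments = []
    i, n = 0, len(keys)
    while i < n:
        j = i + 1
        # grow the segment while the line through its endpoints
        # keeps every covered point within the error bound
        while j < n:
            slope = (j - i) / (keys[j] - keys[i])
            if all(abs(i + slope * (keys[m] - keys[i]) - m) <= max_error
                   for m in range(i, j + 1)):
                j += 1
            else:
                break
        last = j - 1
        slope = (last - i) / (keys[last] - keys[i]) if last > i else 0.0
        segments.append((keys[i], i, slope))
        i = last + 1
    return segments
```

A lookup then finds the segment covering a key, computes the predicted position, and binary-searches only the +/- max_error window around it, which is how the error knob trades lookup latency against the number of segments stored.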
Review Paper Title: Research & Evaluation of Keyword Search Techniques over Relational Data
Current keyword-based search techniques over relational data must consider large volumes of data to return results efficiently while the user is searching. Limited memory is a key issue, which motivates new techniques and algorithms. We improve the search process by optimizing the query form together with memory usage, with the help of a genetic algorithm. The process executes dynamically, modeling a real-time scenario in which the whole pipeline runs on the query the user supplies. The proposed system is Research and Evaluation of Keyword Search Techniques over Relational Data. Results indicate that many existing search techniques do not provide acceptable performance for realistic retrieval tasks. Our keyword search incorporates ranking so that execution time is reduced; file length and execution time can be inspected, and the ranking can be viewed as a chart.
The Case for Learned Index Structures
Indexes are models: a B-Tree-Index can be seen as a model to map a key to the
position of a record within a sorted array, a Hash-Index as a model to map a
key to a position of a record within an unsorted array, and a BitMap-Index as a
model to indicate if a data record exists or not. In this exploratory research
paper, we start from this premise and posit that all existing index structures
can be replaced with other types of models, including deep-learning models,
which we term learned indexes. The key idea is that a model can learn the sort
order or structure of lookup keys and use this signal to effectively predict
the position or existence of records. We theoretically analyze under which
conditions learned indexes outperform traditional index structures and describe
the main challenges in designing learned index structures. Our initial results
show that by using neural nets we are able to outperform cache-optimized
B-Trees by up to 70% in speed while saving an order-of-magnitude in memory over
several real-world data sets. More importantly though, we believe that the idea
of replacing core components of a data management system through learned models
has far reaching implications for future systems designs and that this work
just provides a glimpse of what might be possible.
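The premise that an index is a model predicting a key's position can be sketched with a single linear model plus an error-bounded search window. This toy stand-in (stdlib least squares, one model instead of the paper's hierarchy of models) shows the core mechanic: predict, then correct within the recorded maximum error:

```python
import bisect

class LearnedIndex:
    """Toy learned index: one linear model predicts a key's position in a
    sorted array; the maximum training error bounds a final local search."""

    def __init__(self, keys):
        self.keys = keys
        n = len(keys)
        mean_x = sum(keys) / n
        mean_y = (n - 1) / 2
        var = sum((k - mean_x) ** 2 for k in keys)
        self.slope = (sum((k - mean_x) * (i - mean_y)
                          for i, k in enumerate(keys)) / var) if var else 0.0
        self.intercept = mean_y - self.slope * mean_x
        # record the worst-case prediction error over the training keys
        self.err = max(abs(self._predict(k) - i) for i, k in enumerate(keys))

    def _predict(self, key):
        return self.slope * key + self.intercept

    def lookup(self, key):
        pos = round(self._predict(key))
        lo = max(0, pos - int(self.err) - 1)
        hi = min(len(self.keys), pos + int(self.err) + 2)
        i = lo + bisect.bisect_left(self.keys[lo:hi], key)
        if i < len(self.keys) and self.keys[i] == key:
            return i
        return None
```

When the key distribution is close to linear, the error window is tiny and the lookup touches only a few array slots, which is the intuition behind the reported speedups over B-Trees.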
NETEMBED: A Network Resource Mapping Service for Distributed Applications
Emerging configurable infrastructures such as large-scale overlays and grids, distributed testbeds, and sensor networks comprise diverse sets of available computing resources (e.g., CPU and OS capabilities and memory constraints) and network conditions (e.g., link delay, bandwidth, loss rate, and jitter) whose characteristics are both complex and time-varying. At the same time, distributed applications to be deployed on these infrastructures exhibit increasingly complex constraints and requirements on resources they wish to utilize. Examples include selecting nodes and links to schedule an overlay multicast file transfer across the Grid, or embedding a network experiment with specific resource constraints in a distributed testbed such as PlanetLab. Thus, a common problem facing the efficient deployment of distributed applications on these infrastructures is that of "mapping" application-level requirements onto the network in such a manner that the requirements of the application are realized, assuming that the underlying characteristics of the network are known. We refer to this problem as the network embedding problem. In this paper, we propose a new approach to tackle this combinatorially-hard problem. Thanks to a number of heuristics, our approach greatly improves performance and scalability over previously existing techniques. It does so by pruning large portions of the search space without overlooking any valid embedding. We present a construction that allows a compact representation of candidate embeddings, which is maintained by carefully controlling the order in which candidate mappings are inserted and invalid mappings are removed. We present an implementation of our proposed technique, which we call NETEMBED – a service that identifies feasible mappings of a virtual network configuration (the query network) to an existing real infrastructure or testbed (the hosting network).
We present results of extensive performance evaluation experiments of NETEMBED using several combinations of real and synthetic network topologies. Our results show that our NETEMBED service is quite effective in identifying one (or all) possible embeddings for quite sizable queries and hosting networks – much larger than what any of the existing techniques or services are able to handle.
National Science Foundation (CNS Cybertrust 0524477, NSF CNS NeTS 0520166, NSF CNS ITR 0205294, EIA RI 0202067)
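The embedding search with pruning can be sketched as a backtracking assignment of query nodes to host nodes under CPU and bandwidth constraints. This toy sketch (the data shapes and names are assumptions) illustrates up-front candidate pruning and constraint checking, not NETEMBED's actual compact-representation construction:

```python
def embed(query_nodes, query_edges, host_nodes, host_links):
    """Backtracking network embedding: assign each query node to a distinct
    host node with enough CPU so that every query edge maps onto a host link
    with enough bandwidth.
    query_nodes/host_nodes: {name: cpu}; query_edges: [(a, b, bw_needed)];
    host_links: {(x, y): bw_available} (undirected, stored once)."""

    def bw(x, y):
        return host_links.get((x, y), host_links.get((y, x), 0))

    order = list(query_nodes)
    # pruning step: each query node keeps only hosts with sufficient CPU,
    # shrinking the search space before backtracking begins
    candidates = {q: [h for h, cpu in host_nodes.items()
                      if cpu >= query_nodes[q]]
                  for q in order}

    def consistent(q, h, assign):
        # every query edge touching q whose other endpoint is already
        # mapped must be backed by enough host-link bandwidth
        for a, b, need in query_edges:
            other = b if a == q else a if b == q else None
            if other in assign and bw(h, assign[other]) < need:
                return False
        return True

    def search(i, assign):
        if i == len(order):
            return dict(assign)
        q = order[i]
        for h in candidates[q]:
            if h not in assign.values() and consistent(q, h, assign):
                assign[q] = h
                found = search(i + 1, assign)
                if found:
                    return found
                del assign[q]  # backtrack
        return None

    return search(0, {})
```

Pruning candidates before the search discards invalid mappings early without excluding any valid embedding, which mirrors the paper's key performance idea at a much smaller scale.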