Tsunami: A Learned Multi-dimensional Index for Correlated Data and Skewed Workloads
Filtering data based on predicates is one of the most fundamental operations
for any modern data warehouse. Techniques to accelerate the execution of filter
expressions include clustered indexes, specialized sort orders (e.g., Z-order),
multi-dimensional indexes, and, for high-selectivity queries, secondary
indexes. However, these schemes are hard to tune and their performance is
inconsistent. Recent work on learned multi-dimensional indexes has introduced
the idea of automatically optimizing an index for a particular dataset and
workload. However, the performance of that work suffers in the presence of
correlated data and skewed query workloads, both of which are common in real
applications. In this paper, we introduce Tsunami, which addresses these
limitations to achieve up to 6X faster query performance and up to 8X smaller
index size than existing learned multi-dimensional indexes, in addition to up
to 11X faster query performance and 170X smaller index size than
optimally-tuned traditional indexes.
Learning Multi-dimensional Indexes
Scanning and filtering over multi-dimensional tables are key operations in
modern analytical database engines. To optimize the performance of these
operations, databases often create clustered indexes over a single dimension or
multi-dimensional indexes such as R-trees, or use complex sort orders (e.g.,
Z-ordering). However, these schemes are often hard to tune and their
performance is inconsistent across different datasets and queries. In this
paper, we introduce Flood, a multi-dimensional in-memory index that
automatically adapts itself to a particular dataset and workload by jointly
optimizing the index structure and data storage. Flood achieves up to three
orders of magnitude faster performance for range scans with predicates than
state-of-the-art multi-dimensional indexes or sort orders on real-world
datasets and workloads. Our work serves as a building block towards an
end-to-end learned database system.
Towards Large-Scale, Heterogeneous Anomaly Detection Systems in Industrial Networks: A Survey of Current Trends
Industrial Networks (INs) are widespread environments where heterogeneous devices collaborate to control and monitor physical
processes. Some of the controlled processes belong to Critical Infrastructures (CIs), and, as such, IN protection is an active research
field. Among different types of security solutions, IN Anomaly Detection Systems (ADSs) have received wide attention from the
scientific community. While INs have grown in size and in complexity, requiring the development of novel Big Data solutions for
data processing, IN ADSs have not evolved at the same pace. In parallel, the development of Big Data frameworks such as Hadoop or
Spark has led the way for applying Big Data Analytics to the field of cyber-security, mainly focusing on the Information Technology
(IT) domain. However, due to the particularities of INs, it is not feasible to directly apply IT security mechanisms in INs, as IN
ADSs face unique characteristics. In this work we introduce three main contributions. First, we survey the area of Big Data ADSs
that could be applicable to INs and compare the surveyed works. Second, we develop a novel taxonomy to classify existing IN-based
ADSs. And, finally, we present a discussion of open problems in the field of Big Data ADSs for INs that can lead to further
development.