2 research outputs found
Ingestion, Indexing and Retrieval of High-Velocity Multidimensional Sensor Data on a Single Node
Multidimensional data are becoming more prevalent, partly due to the rise of
the Internet of Things (IoT), and with that the need to ingest and analyze data
streams at rates higher than before. Some industrial IoT applications require
ingesting millions of records per second, while processing queries on recently
ingested and historical data. Unfortunately, existing database systems suited
to multidimensional data exhibit low per-node ingestion performance, and even
if they can scale horizontally in distributed settings, they require a large
number of nodes to meet such ingest demands. For this reason, in this paper we
evaluate a single-node multidimensional data store for high-velocity sensor
data. Its design centers around a two-level indexing structure, wherein the
global index is an in-memory R*-tree and the local indices are serialized
kd-trees. This study is confined to records with numerical indexing fields and
range queries, and covers ingest throughput, query response time, and storage
footprint. We show that the adopted design streamlines data ingestion and
offers ingress rates two orders of magnitude higher than those of Percona
Server, SQLite, and Druid. Our prototype also reports query response times
comparable to or better than those of Percona Server and Druid, and compares
favorably in terms of storage footprint. In addition, we evaluate a kd-tree
partitioning-based scheme for grouping incoming streamed data records. Compared
to a random scheme, this scheme produces less overlap between groups of
streamed records, but contrary to what we expected, such reduced overlap does
not translate into better query performance. By contrast, the local indices
prove much more beneficial to query performance. We believe the experience
reported in this paper is valuable to practitioners and researchers alike
interested in building database systems for high-velocity multidimensional
data.
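
The two-level design described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `kd_partition` and `TwoLevelStore` are hypothetical, a plain list of bounding boxes stands in for the in-memory R*-tree global index, and a linear scan inside each group stands in for the serialized kd-tree local indices. Only the kd-tree-style median partitioning of an incoming batch and the pruning of groups by bounding box during a range query are shown.

```python
from typing import List, Tuple

Point = Tuple[float, ...]
Box = Tuple[Point, Point]  # (lower corner, upper corner), bounds inclusive


def kd_partition(points: List[Point], max_size: int, depth: int = 0) -> List[List[Point]]:
    """Split points into groups of at most max_size using median cuts,
    cycling through the dimensions as a kd-tree would."""
    if len(points) <= max_size:
        return [points]
    axis = depth % len(points[0])
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return (kd_partition(ordered[:mid], max_size, depth + 1)
            + kd_partition(ordered[mid:], max_size, depth + 1))


def bounding_box(points: List[Point]) -> Box:
    """Smallest axis-aligned box enclosing a non-empty list of points."""
    dims = range(len(points[0]))
    return (tuple(min(p[d] for p in points) for d in dims),
            tuple(max(p[d] for p in points) for d in dims))


def overlaps(a: Box, b: Box) -> bool:
    """True if the two boxes intersect in every dimension."""
    return all(a[0][d] <= b[1][d] and b[0][d] <= a[1][d] for d in range(len(a[0])))


def contains(box: Box, p: Point) -> bool:
    """True if point p lies inside the box (bounds inclusive)."""
    return all(box[0][d] <= p[d] <= box[1][d] for d in range(len(p)))


class TwoLevelStore:
    """Hypothetical store: a global list of group bounding boxes (assumed
    stand-in for an R*-tree) over groups of records (assumed stand-in for
    serialized, kd-tree-indexed blocks)."""

    def __init__(self, max_group_size: int):
        if max_group_size < 1:
            raise ValueError("max_group_size must be at least 1")
        self.max_group_size = max_group_size
        self.groups: List[Tuple[Box, List[Point]]] = []

    def ingest(self, batch: List[Point]) -> None:
        """Partition an incoming batch and register each group's bounding box."""
        if not batch:
            return
        for group in kd_partition(batch, self.max_group_size):
            self.groups.append((bounding_box(group), group))

    def range_query(self, query: Box) -> List[Point]:
        """Skip groups whose box misses the query, then filter the rest."""
        hits: List[Point] = []
        for box, group in self.groups:
            if overlaps(box, query):
                hits.extend(p for p in group if contains(query, p))
        return hits


store = TwoLevelStore(max_group_size=2)
store.ingest([(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (10.0, 10.0)])
result = store.range_query(((0.5, 0.5), (3.0, 3.0)))
```

In this sketch the grouping step determines how much the group bounding boxes overlap, which is the property the abstract compares against a random grouping scheme.
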
Hardware-Conscious Stream Processing: A Survey
Data stream processing systems (DSPSs) enable users to express and run stream
applications to continuously process data streams. To achieve real-time data
analytics, recent research has focused on optimizing system latency and
throughput. Witnessing the recent great achievements in the computer
architecture community, researchers and practitioners have investigated the
potential of adopting hardware-conscious stream processing by better utilizing
modern hardware capacity in DSPSs. In this paper, we conduct a systematic
survey of recent work in the field, particularly along the following three
directions: 1) computation optimization, 2) stream I/O optimization, and 3)
query deployment. Finally, we advise on potential future research directions.