2 research outputs found
A High-Performance Algorithm for Identifying Frequent Items in Data Streams
Estimating frequencies of items over data streams is a common building block
in streaming data measurement and analysis. Misra and Gries introduced their
seminal algorithm for the problem in 1982, and the problem has since been
revisited many times due to its practicality and applicability. We describe a
highly optimized version of Misra and Gries' algorithm that is suitable for
deployment in industrial settings. Our code is made public via an open source
library called DataSketches that is already used by several companies and
production systems.
Our algorithm improves on two theoretical and practical aspects of prior
work. First, it handles weighted updates in amortized constant time, a common
requirement in practice. Second, it uses a simple and fast method for merging
summaries that asymptotically improves on prior work even for unweighted
streams. We describe experiments confirming that our algorithms are more
efficient than prior proposals.
Comment: Typo correction
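To make the two operations named in the abstract concrete, here is a minimal sketch of a textbook Misra-Gries summary with weighted updates and merging. It is not the optimized algorithm the paper describes and it is not the DataSketches API. The class name `MisraGries`, the parameter `k`, and the `offset` field are hypothetical names chosen for illustration, and the shrink step sorts the counters, so it does not achieve the amortized constant-time bound claimed in the abstract.

```python
class MisraGries:
    """Textbook Misra-Gries summary holding at most k counters (illustrative only)."""

    def __init__(self, k):
        self.k = k
        self.counters = {}   # item -> estimated weight (never an overestimate)
        self.offset = 0      # total amount subtracted so far; bounds the undercount

    def update(self, item, weight=1):
        """Add a (possibly weighted) occurrence of item."""
        self.counters[item] = self.counters.get(item, 0) + weight
        if len(self.counters) > self.k:
            self._shrink()

    def _shrink(self):
        """Subtract the (k+1)-th largest counter from all counters, keep positives."""
        values = sorted(self.counters.values(), reverse=True)
        cut = values[self.k]
        self.offset += cut
        self.counters = {x: c - cut for x, c in self.counters.items() if c > cut}

    def merge(self, other):
        """Fold another summary into this one: add counters, then shrink to k."""
        for item, c in other.counters.items():
            self.counters[item] = self.counters.get(item, 0) + c
        self.offset += other.offset
        if len(self.counters) > self.k:
            self._shrink()

    def estimate(self, item):
        """Lower-bound estimate; the true weight is at most estimate + offset."""
        return self.counters.get(item, 0)


s = MisraGries(k=2)
s.update("a", 5)
s.update("b", 3)
s.update("c", 1)      # third distinct item triggers a shrink

t = MisraGries(k=2)
t.update("a", 2)
t.update("d", 10)
s.merge(t)            # combined summary still holds at most k counters
```

In this sketch every estimate lies between the true weight minus `offset` and the true weight, which is the usual Misra-Gries guarantee. A production implementation would replace the sort in `_shrink` with a cheaper selection step.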
Learning Graph Structures with Transformer for Multivariate Time Series Anomaly Detection in IoT
Many real-world IoT systems, which include a variety of internet-connected
sensory devices, produce substantial amounts of multivariate time series data.
Meanwhile, vital IoT infrastructures like smart power grids and water
distribution networks are frequently targeted by cyber-attacks, making anomaly
detection an important research topic. Any efficient and effective anomaly
detection system must, however, model the relatedness among sensors, whose
intricate topological and nonlinear connections are initially unknown.
Furthermore, detecting anomalies in multivariate time series is
difficult due to their temporal dependency and stochasticity. This paper
presents GTA, a new framework for multivariate time series anomaly detection
that combines automatic graph structure learning, graph convolution, and
Transformer-based modeling of temporal dependency. At the heart of graph
structure learning is the connection learning policy, which uses Gumbel-softmax
sampling to learn bi-directed links among sensors directly. To describe the
anomaly information flow between network nodes, we introduce a new graph
convolution called Influence Propagation convolution. In addition, to tackle
the quadratic complexity barrier, we propose a multi-branch attention mechanism
to replace the original multi-head self-attention method. Extensive experiments
on four
publicly available anomaly detection benchmarks further demonstrate the
superiority of our approach over state-of-the-art alternatives.
Comment: 12 pages, 5 figures, Accepted by IEEE Internet of Things Journal 202
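The Gumbel-softmax sampling mentioned in the abstract can be sketched generically as follows. This illustrates only the standard sampling trick, not GTA's connection learning policy. The function name `gumbel_softmax`, the temperature parameter `tau`, and the two-way "no link / link" logits are assumptions made for the example.

```python
import math
import random


def gumbel_softmax(logits, tau=1.0, rng=random):
    """Draw a relaxed one-hot sample over categories using Gumbel noise.

    logits: unnormalized log-probabilities, one per category
    tau:    temperature; smaller values push the sample closer to one-hot
    """
    noisy = []
    for logit in logits:
        u = max(rng.random(), 1e-12)          # avoid log(0)
        g = -math.log(-math.log(u))           # Gumbel(0, 1) noise
        noisy.append((logit + g) / tau)
    m = max(noisy)                            # subtract max for numerical stability
    exps = [math.exp(v - m) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical use: sample a soft decision for one ordered sensor pair (i -> j),
# where the two logits stand for "no link" and "link".
random.seed(0)
probs = gumbel_softmax([0.0, 1.0], tau=0.5)
```

In an autodiff framework the same computation stays differentiable with respect to the logits, which is what makes it usable for learning discrete structure such as graph edges.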