7,680 research outputs found

    Sliding HyperLogLog: Estimating cardinality in a data stream

    In this paper, a new algorithm for estimating the number of active flows in a data stream is proposed. The algorithm adapts the HyperLogLog algorithm of Flajolet et al. to data stream processing by adding a sliding window mechanism. Its advantage is that it can estimate, at any time, the number of flows seen over any duration bounded by the length of the sliding window. The estimate is very accurate, with a standard error of about 1.04/\sqrt{m} (the same as in the HyperLogLog algorithm). Because the new algorithm answers more flexible queries, it needs additional memory compared to the HyperLogLog algorithm. It is proved that this additional memory is at most 5m ln(n/m) bytes, where m is the number of HyperLogLog registers and n is the real number of flows in the sliding window. For instance, with an additional memory of only 35kB, a standard error of about 3% can be achieved for a data stream of several million flows. Theoretical results are validated on both real and synthetic traffic.
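The two figures quoted in this abstract can be checked with a few lines of arithmetic. The sketch below simply evaluates the stated formulas; the choices m = 1024 registers and n = 10^6 flows are assumptions made for illustration and are not taken from the paper.

```python
import math

def hll_standard_error(m: int) -> float:
    """Standard error quoted in the abstract: about 1.04 / sqrt(m)."""
    return 1.04 / math.sqrt(m)

def extra_memory_bytes(m: int, n: int) -> float:
    """Upper bound on the additional memory quoted in the abstract: 5 m ln(n/m) bytes."""
    return 5 * m * math.log(n / m)

# Assumed illustrative values (not from the paper).
m = 1024        # number of registers
n = 10**6       # distinct flows in the sliding window

print(f"standard error: {hll_standard_error(m):.2%}")
print(f"extra memory bound: {extra_memory_bytes(m, n) / 1000:.1f} kB")
```

Under these assumed values the standard error is about 3.25% and the bound is roughly 35 kB, which is consistent with the orders of magnitude given in the abstract.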

    Parameter estimation for stochastic hybrid model applied to urban traffic flow estimation

    This study proposes a novel data-based approach for estimating the parameters of a stochastic hybrid model describing the traffic flow in an urban traffic network with signalized intersections. The model represents the evolution of the traffic flow rate, which measures the number of vehicles passing a given location per time unit. This traffic flow rate is described using a mode-dependent first-order autoregressive (AR) stochastic process. The parameters of the AR process take different values depending on the mode of traffic operation (free flowing, congested or faulty), making this a hybrid stochastic process. Mode switching occurs according to a first-order Markov chain. This study proposes an expectation-maximization (EM) technique for estimating the transition matrix of this Markovian mode process and the parameters of the AR models for each mode. The technique is applied to actual traffic flow data from the city of Jakarta, Indonesia. The model thus obtained is validated using smoothed inference algorithms and an online particle filter. The authors also develop an EM parameter estimation that, in combination with a time-window shift technique, can be useful and practical for periodically updating the parameters of the hybrid model, leading to an adaptive traffic flow state estimator.
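To make the model structure concrete, the following is a minimal simulation sketch of a mode-dependent AR(1) process whose mode follows a first-order Markov chain. It illustrates only the generative model described in the abstract, not the EM estimation itself. The function name, the (intercept, coefficient, noise) parameterisation, and all numeric values are hypothetical.

```python
import random

def simulate_switching_ar1(transition, ar_params, steps, x0=0.0, mode0=0, seed=0):
    """Simulate x_t = c_k + phi_k * x_{t-1} + noise, where k is the current mode.

    transition[i][j] is the probability of moving from mode i to mode j.
    ar_params[k] is a (c_k, phi_k, sigma_k) tuple for mode k.
    """
    rng = random.Random(seed)
    modes, flows = [mode0], [x0]
    for _ in range(steps):
        # Draw the next mode from the row of the current mode (Markov chain).
        mode = rng.choices(range(len(transition)), weights=transition[modes[-1]])[0]
        c, phi, sigma = ar_params[mode]
        # Mode-dependent first-order autoregressive update.
        flows.append(c + phi * flows[-1] + rng.gauss(0.0, sigma))
        modes.append(mode)
    return modes, flows

# Hypothetical values for the three modes: free flowing, congested, faulty.
transition = [[0.90, 0.08, 0.02],
              [0.10, 0.88, 0.02],
              [0.30, 0.30, 0.40]]
ar_params = [(12.0, 0.7, 2.0),   # free flowing
             (3.0, 0.8, 1.0),    # congested
             (0.0, 0.0, 0.1)]    # faulty sensor: readings near zero

modes, flows = simulate_switching_ar1(transition, ar_params, steps=200)
```

An EM procedure of the kind the abstract describes would treat `modes` as hidden and recover `transition` and `ar_params` from `flows` alone.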

    Designing Probabilistic Flow Counting over Sliding Windows

    Probabilistic approaches allow designing very efficient data structures and algorithms aimed at computing the number of flows within a given observation window. The practical applications are many, ranging from security to network monitoring and control. We focus our investigation on approaches tailored for sliding windows, which enable continuous-time measurements independently from the observation window. In particular, we show how to extend standard approaches, such as Probabilistic Counting with Stochastic Averaging (PCSA), to count over an observation window. The main idea is to modify the data structure to store a compact representation of the timestamp in the registers and to modify the related algorithms accordingly. We propose a timestamp-augmented version of PCSA, denoted as TS-PCSA, and compare it with state-of-the-art solutions based on Hyper-LogLog (HLL) counters that evaluate the cardinality over a sliding window, but without storing the timestamps. We show that TS-PCSA, with a limited memory footprint, achieves a different tradeoff between memory and accuracy than HLL-based solutions.
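The idea of replacing PCSA bits with timestamps can be sketched as follows. This is only one plausible reading of the abstract, not the authors' TS-PCSA design: it stores a full timestamp per bit position rather than a compact representation, and the class name, hash choice, and parameters are all hypothetical. It also assumes timestamps arrive in non-decreasing order.

```python
import hashlib

PHI = 0.77351  # bias-correction constant of the classical PCSA estimator

class TimestampedPCSA:
    """PCSA where each bit is replaced by the time it was last set (illustrative)."""

    def __init__(self, m=64, bits=32):
        self.m = m            # number of bitmaps (stochastic averaging)
        self.bits = bits      # positions per bitmap
        self.last_seen = [[None] * bits for _ in range(m)]

    def _hash(self, flow_id):
        digest = hashlib.sha1(str(flow_id).encode()).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, flow_id, now):
        h = self._hash(flow_id)
        bucket, rest = h % self.m, h // self.m
        # Position = number of trailing zero bits of the remaining hash, capped.
        pos = self.bits - 1 if rest == 0 else min((rest & -rest).bit_length() - 1,
                                                  self.bits - 1)
        self.last_seen[bucket][pos] = now

    def estimate(self, now, window):
        cutoff = now - window
        total = 0
        for row in self.last_seen:
            # R = index of the lowest position not set within the window.
            r = 0
            while r < self.bits and row[r] is not None and row[r] > cutoff:
                r += 1
            total += r
        return self.m / PHI * 2 ** (total / self.m)

sketch = TimestampedPCSA(m=64)
for i in range(5000):
    sketch.add(f"flow-{i}", now=i * 0.001)
print(sketch.estimate(now=5.0, window=10.0))  # all flows inside the window
print(sketch.estimate(now=5.0, window=1.0))   # only the most recent flows
```

Because a position counts as set only if its timestamp falls inside the queried window, a single structure can answer queries for any window length up to the retained history. Like plain PCSA, this estimator is biased upward for very small cardinalities.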