800 research outputs found
Dense-choice Counter Machines revisited
This paper clarifies the picture about Dense-choice Counter Machines, which
have been less studied than (discrete) Counter Machines. We revisit the
definition of "Dense Counter Machines" so that it now extends (discrete)
Counter Machines, and we provide new undecidability and decidability results.
Using the first-order additive mixed theory of reals and integers, we give a
logical characterization of the sets of configurations reachable by
reversal-bounded Dense-choice Counter Machines
Extending Eventually Consistent Cloud Databases for Enforcing Numeric Invariants
Geo-replicated databases often operate under the principle of eventual
consistency to offer high-availability with low latency on a simple key/value
store abstraction. Recently, some have adopted commutative data types to
provide seamless reconciliation for special purpose data types, such as
counters. Despite this, the inability to enforce numeric invariants across all
replicas still remains a key shortcoming of relying on the limited guarantees
of eventual consistency storage. We present a new replicated data type, called
bounded counter, which adds support for numeric invariants to eventually
consistent geo-replicated databases. We describe how this can be implemented on
top of existing cloud stores without modifying them, using Riak as an example.
Our approach adapts ideas from escrow transactions to devise a solution that is
decentralized, fault-tolerant and fast. Our evaluation shows much lower latency
and better scalability than the traditional approach of using strong
consistency to enforce numeric invariants, thus alleviating the tension between
consistency and availability
Data Sketches for Disaggregated Subset Sum and Frequent Item Estimation
We introduce and study a new data sketch for processing massive datasets. It
addresses two common problems: 1) computing a sum given arbitrary filter
conditions and 2) identifying the frequent items or heavy hitters in a data
set. For the former, the sketch provides unbiased estimates with state of the
art accuracy. It handles the challenging scenario when the data is
disaggregated so that computing the per unit metric of interest requires an
expensive aggregation. For example, the metric of interest may be total clicks
per user while the raw data is a click stream with multiple rows per user. Thus
the sketch is suitable for use in a wide range of applications including
computing historical click through rates for ad prediction, reporting user
metrics from event streams, and measuring network traffic for IP flows.
We prove and empirically show the sketch has good properties for both the
disaggregated subset sum estimation and frequent item problems. On i.i.d. data,
it not only picks out the frequent items but gives strongly consistent
estimates for the proportion of each frequent item. The resulting sketch
asymptotically draws a probability proportional to size sample that is optimal
for estimating sums over the data. For non i.i.d. data, we show that it
typically does much better than random sampling for the frequent item problem
and never does worse. For subset sum estimation, we show that even for
pathological sequences, the variance is close to that of an optimal sampling
design. Empirically, despite the disadvantage of operating on disaggregated
data, our method matches or bests priority sampling, a state of the art method
for pre-aggregated data and performs orders of magnitude better on skewed data
compared to uniform sampling. We propose extensions to the sketch that allow it
to be used in combining multiple data sets, in distributed systems, and for
time decayed aggregation
Optimal Elephant Flow Detection
Monitoring the traffic volumes of elephant flows, including the total byte
count per flow, is a fundamental capability for online network measurements. We
present an asymptotically optimal algorithm for solving this problem in terms
of both space and time complexity. This improves on previous approaches, which
can only count the number of packets in constant time. We evaluate our work on
real packet traces, demonstrating an up to X2.5 speedup compared to the best
alternative.Comment: Accepted to IEEE INFOCOM 201
Identifying Correlated Heavy-Hitters in a Two-Dimensional Data Stream
We consider online mining of correlated heavy-hitters from a data stream.
Given a stream of two-dimensional data, a correlated aggregate query first
extracts a substream by applying a predicate along a primary dimension, and
then computes an aggregate along a secondary dimension. Prior work on
identifying heavy-hitters in streams has almost exclusively focused on
identifying heavy-hitters on a single dimensional stream, and these yield
little insight into the properties of heavy-hitters along other dimensions. In
typical applications however, an analyst is interested not only in identifying
heavy-hitters, but also in understanding further properties such as: what other
items appear frequently along with a heavy-hitter, or what is the frequency
distribution of items that appear along with the heavy-hitters. We consider
queries of the following form: In a stream S of (x, y) tuples, on the substream
H of all x values that are heavy-hitters, maintain those y values that occur
frequently with the x values in H. We call this problem as Correlated
Heavy-Hitters (CHH). We formulate an approximate formulation of CHH
identification, and present an algorithm for tracking CHHs on a data stream.
The algorithm is easy to implement and uses workspace which is orders of
magnitude smaller than the stream itself. We present provable guarantees on the
maximum error, as well as detailed experimental results that demonstrate the
space-accuracy trade-off
Reachability in Parameterized Systems: All Flavors of Threshold Automata
Threshold automata, and the counter systems they define, were introduced as a framework for parameterized model checking of fault-tolerant distributed algorithms. This application domain suggested natural constraints on the automata structure, and a specific form of acceleration, called single-rule acceleration: consecutive occurrences of the same automaton rule are executed as a single transition in the counter system. These accelerated systems have bounded diameter, and can be verified in a complete manner with bounded model checking.
We go beyond the original domain, and investigate extensions of threshold automata: non-linear guards, increments and decrements of shared variables, increments of shared variables within loops, etc., and show that the bounded diameter property holds for several extensions. Finally, we put single-rule acceleration in the scope of flat counter automata: although increments in loops may break the bounded diameter property, the corresponding counter automaton is flattable, and reachability can be verified using more permissive forms of acceleration
On the diagnostic emulation technique and its use in the AIRLAB
An aid is presented for understanding and judging the relevance of the diagnostic emulation technique to studies of highly reliable, digital computing systems for aircraft. A short review is presented of the need for and the use of the technique as well as an explanation of its principles of operation and implementation. Details that would be needed for operational control or modification of existing versions of the technique are not described
- …