Consistent and efficient output-streams management in optimistic simulation platforms
Optimistic synchronization is considered an effective means for supporting Parallel Discrete Event Simulations. It relies on a speculative approach, where concurrent processes execute simulation events regardless of their safety, and consistency is ensured via proper rollback mechanisms, upon the a-posteriori detection of causal inconsistencies along the events' execution path. Interactions with the outside world (e.g. generation of output streams) are a well-known problem for rollback-based systems, since the outside world may have no notion of rollback. In this context, approaches for allowing the simulation modeler to generate consistent output rely either on ad-hoc APIs (which must be provided by the underlying simulation kernel) or on temporary suspension of processing activities in order to wait for the final outcome (commit/rollback) associated with a speculatively-produced output. In this paper we present design indications and a reference implementation for an output-stream management subsystem which allows the simulation-model writer to rely on standard output-generation libraries (e.g. stdio) within code blocks associated with event processing. Further, the subsystem ensures that the produced output is consistent, namely associated with events that are eventually committed, and system-wide ordered along the simulation time axis. Together, these features provide the illusion of a classical (and simple to deal with) sequential programming model, sparing the developer from any awareness that the simulation program runs concurrently and speculatively. We also show, via an experimental study, how the design/development optimizations we present lead to limited overhead, so that the simulation run incurs near-zero or substantially reduced output-management cost. At the same time, the delay for materializing the output stream (making it available for any type of audit activity) is shown to be fairly limited and constant, especially for good mixtures of I/O-bound vs CPU-bound behaviors at the application level. Further, the whole output-stream management subsystem has been designed to provide scalability for I/O management on clusters.
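To make the buffering idea concrete, the sketch below mimics a commit-sensitive output path: output produced while processing a speculative event is held per event timestamp, discarded on rollback, and flushed in timestamp order once the commit horizon (GVT) passes it. This is an illustrative approximation under assumed names (SpeculativeOutputBuffer, commit_up_to, etc.), not the paper's actual subsystem or API:

```python
import sys
from collections import defaultdict

class SpeculativeOutputBuffer:
    """Holds output produced by speculative events and releases it only when
    those events are committed. Hypothetical sketch; the paper's subsystem
    transparently intercepts stdio calls inside event handlers."""

    def __init__(self, sink=sys.stdout):
        self._sink = sink
        self._pending = defaultdict(list)   # event timestamp -> buffered text

    def write(self, event_ts, text):
        # Called from event-processing code instead of printing directly.
        self._pending[event_ts].append(text)

    def rollback(self, event_ts):
        # A causal inconsistency was detected: drop the speculative output.
        self._pending.pop(event_ts, None)

    def commit_up_to(self, gvt):
        # Events below the committed horizon (GVT) are final: emit their
        # output in simulation-time order, preserving the illusion of a
        # sequential run.
        for ts in sorted(t for t in self._pending if t <= gvt):
            for line in self._pending.pop(ts):
                self._sink.write(line)
```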
Temporal Stream Algebra
Data stream management systems (DSMS) so far focus on event queries and hardly consider combined queries over both data from event streams and data from a database. However, applications like emergency management require such combined data stream and database queries. Further requirements are the simultaneous use of multiple timestamps along different time lines and with different semantics, expressive temporal relations between multiple timestamps, and flexible negation, grouping and aggregation which can be controlled, i.e. started and stopped, by events and are not limited to fixed-size time windows. Current DSMS hardly address these requirements. This article proposes Temporal Stream Algebra (TSA) to meet the aforementioned requirements. Temporal streams are a common abstraction of data streams and database relations; the operators of TSA are generalizations of the usual operators of Relational Algebra. An in-depth analysis of temporal relations guarantees that valid TSA expressions are non-blocking, i.e. can be evaluated incrementally. In this respect TSA differs significantly from previous algebraic approaches, which use specialized operators to prevent blocking expressions at a "syntactic" level.
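To give a flavor of the temporal-stream abstraction (a loose sketch, not the algebra's formal definition), a tuple can carry several named timestamps, and operators can be written as generators that consume and produce such tuples one at a time, hence non-blocking; all names below are illustrative:

```python
from typing import Any, Callable, Dict, Iterator

# A temporal tuple: ordinary attributes plus one or more named timestamps,
# e.g. {"attrs": {...}, "ts": {"occurrence": 12.5, "detection": 14.0}}.
TemporalTuple = Dict[str, Any]

def selection(pred: Callable[[TemporalTuple], bool],
              stream: Iterator[TemporalTuple]) -> Iterator[TemporalTuple]:
    """Generalized selection: evaluated tuple by tuple, hence non-blocking."""
    for t in stream:
        if pred(t):
            yield t

def projection(attrs, stream: Iterator[TemporalTuple]) -> Iterator[TemporalTuple]:
    """Generalized projection: keeps the timestamps, restricts the attributes."""
    for t in stream:
        yield {"attrs": {a: t["attrs"][a] for a in attrs}, "ts": t["ts"]}

# Example: events whose detection lags their occurrence by more than one unit.
events = iter([
    {"attrs": {"id": 1}, "ts": {"occurrence": 12.5, "detection": 14.0}},
    {"attrs": {"id": 2}, "ts": {"occurrence": 13.0, "detection": 13.2}},
])
late = selection(lambda t: t["ts"]["detection"] - t["ts"]["occurrence"] > 1.0, events)
ids = projection(["id"], late)
# list(ids) yields only the tuple with id == 1.
```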
Quality-Driven Disorder Handling for M-way Sliding Window Stream Joins
Sliding window join is one of the most important operators for stream
applications. To produce high quality join results, a stream processing system
must deal with the ubiquitous disorder within input streams which is caused by
network delay, asynchronous source clocks, etc. Disorder handling involves an
inevitable tradeoff between the latency and the quality of produced join
results. To meet different requirements of stream applications, it is desirable
to provide a user-configurable result-latency vs. result-quality tradeoff.
Existing disorder handling approaches either do not provide such
configurability, or support only user-specified latency constraints.
In this work, we advocate the idea of quality-driven disorder handling, and
propose a buffer-based disorder handling approach for sliding window joins,
which minimizes the sizes of input-sorting buffers, and thus the result latency, while
respecting user-specified result-quality requirements. The core of our approach
is an analytical model which directly captures the relationship between sizes
of input buffers and the produced result quality. Our approach is generic. It
supports m-way sliding window joins with arbitrary join conditions. Experiments
on real-world and synthetic datasets show that, compared to the state of the
art, our approach can reduce the result latency incurred by disorder handling
by up to 95% while providing the same level of result quality.
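The latency/quality trade-off can be illustrated with a simplified, K-slack-style sorting buffer whose size is adapted from observed lateness so that an estimated fraction of tuples is emitted in timestamp order; this is a crude stand-in for the paper's analytical model, with all names assumed:

```python
import bisect

class QualityDrivenBuffer:
    """Sorting buffer for one input of a sliding-window join. A tuple is held
    until `k` later arrivals have been buffered behind it; `k` is adapted so
    that the estimated fraction of in-order emissions meets a user-specified
    quality target, keeping the buffer (and hence latency) as small as the
    target allows."""

    def __init__(self, quality_target=0.99, initial_k=16):
        self.quality_target = quality_target
        self.k = initial_k
        self._buf = []        # entries (ts, seq, payload), kept sorted
        self._seq = 0
        self._lateness = []   # observed out-of-order distances

    def insert(self, ts, payload):
        entry = (ts, self._seq, payload)
        self._seq += 1
        pos = bisect.bisect(self._buf, entry[:2])
        self._lateness.append(len(self._buf) - pos)   # how late this tuple was
        self._buf.insert(pos, entry)
        emitted = []
        while len(self._buf) > self.k:                 # release oldest tuples
            emitted.append(self._buf.pop(0)[2])
        return emitted

    def adapt(self):
        # Smallest k that would have kept the target fraction of tuples in
        # order over the recent past.
        if not self._lateness:
            return
        observed = sorted(self._lateness)
        idx = int(self.quality_target * (len(observed) - 1))
        self.k = max(1, observed[idx])
        self._lateness.clear()
```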
A Novel Framework for Online Amnesic Trajectory Compression in Resource-constrained Environments
State-of-the-art trajectory compression methods usually involve high
space-time complexity or yield unsatisfactory compression rates, leading to
rapid exhaustion of memory, computation, storage and energy resources. Their
applicability is limited when operating in a resource-constrained environment,
especially when the data volume (even when compressed) far exceeds
the storage limit. Hence we propose a novel online framework for error-bounded
trajectory compression and ageing called the Amnesic Bounded Quadrant System
(ABQS), whose core is the Bounded Quadrant System (BQS) algorithm family that
includes a normal version (BQS), Fast version (FBQS), and a Progressive version
(PBQS). ABQS intelligently manages a given storage and compresses the
trajectories with different error tolerances subject to their ages. In the
experiments, we conduct comprehensive evaluations for the BQS algorithm family
and the ABQS framework. Using empirical GPS traces from flying foxes and cars,
and synthetic data from simulation, we demonstrate the effectiveness of the
standalone BQS algorithms in significantly reducing the time and space
complexity of trajectory compression, while greatly improving the compression
rates over state-of-the-art algorithms (by up to 45%). We also show that the
operational time of the target resource-constrained hardware platform can be
prolonged by up to 41%. We then verify that, given data volumes far greater
than the storage space, ABQS is able to achieve 15 to 400 times
smaller errors than the baselines. We also show that the algorithm is robust to
extreme trajectory shapes.
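For intuition about the error-bounded setting, the sketch below uses a simple opening-window style simplification: a point is kept only when extending the current segment would let some skipped point deviate more than epsilon from the kept polyline. BQS provides this kind of guarantee with constant per-point cost instead of rescanning skipped points, and ABQS additionally recompresses older segments with larger tolerances (ageing), which this sketch omits:

```python
import math

def point_to_segment(p, a, b):
    """Distance from point p to segment a-b (distance to a when a == b)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def compress(points, epsilon):
    """Online, error-bounded simplification: when the segment from the last
    kept point to the newest point would leave some skipped point more than
    epsilon away, the most recent safe point is kept and the window restarts.
    Every dropped point stays within epsilon of the kept polyline."""
    if len(points) <= 2:
        return list(points)
    kept = [points[0]]
    skipped = []
    for p in points[1:]:
        if any(point_to_segment(q, kept[-1], p) > epsilon for q in skipped):
            kept.append(skipped[-1])
            skipped = []
        skipped.append(p)
    kept.append(points[-1])
    return kept

# compress([(0, 0), (1, 0.1), (2, 0), (3, 5), (4, 0)], epsilon=0.5)
# -> [(0, 0), (2, 0), (3, 5), (4, 0)]
```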
Muppet: MapReduce-Style Processing of Fast Data
MapReduce has emerged as a popular method to process big data. In the past
few years, however, not just big data, but fast data has also exploded in
volume and availability. Examples of such data include sensor data streams, the
Twitter Firehose, and Facebook updates. Numerous applications must process fast
data. Can we provide a MapReduce-style framework so that developers can quickly
write such applications and execute them over a cluster of machines, to achieve
low latency and high scalability? In this paper we report on our investigation
of this question, as carried out at Kosmix and WalmartLabs. We describe
MapUpdate, a framework like MapReduce, but specifically developed for fast
data. We describe Muppet, our implementation of MapUpdate. Throughout the
description we highlight the key challenges, argue why MapReduce is not well
suited to address them, and briefly describe our current solutions. Finally, we
describe our experience and lessons learned with Muppet, which has been used
extensively at Kosmix and WalmartLabs to power a broad range of applications in
social media and e-commerce.
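The programming model can be approximated in a few lines: a map function turns an incoming event into zero or more keyed events, and an update function folds each keyed event into long-lived per-key state (the "slates" of the MapUpdate model). The single-process toy below only illustrates the model, not Muppet's distributed runtime:

```python
from collections import defaultdict

class MapUpdateJob:
    """Toy single-process sketch of map/update over fast data: map_fn emits
    (key, value) events; update_fn folds each event into a mutable per-key
    state object. Muppet spreads keys across a cluster; this does not."""

    def __init__(self, map_fn, update_fn):
        self.map_fn = map_fn
        self.update_fn = update_fn
        self.slates = defaultdict(dict)   # key -> per-key state ("slate")

    def feed(self, event):
        for key, value in self.map_fn(event):
            self.update_fn(self.slates[key], value)

# Example: running count of hashtag mentions in a stream of posts.
job = MapUpdateJob(
    map_fn=lambda post: [(tag, 1) for tag in post.get("hashtags", [])],
    update_fn=lambda slate, n: slate.update(count=slate.get("count", 0) + n),
)
job.feed({"hashtags": ["sale", "deals"]})
job.feed({"hashtags": ["sale"]})
# job.slates["sale"] == {"count": 2}
```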
Shared Arrangements: practical inter-query sharing for streaming dataflows
Current systems for data-parallel, incremental processing and view
maintenance over high-rate streams isolate the execution of independent
queries. This creates unwanted redundancy and overhead in the presence of
concurrent incrementally maintained queries: each query must independently
maintain the same indexed state over the same input streams, and new queries
must build this state from scratch before they can begin to emit their first
results. This paper introduces shared arrangements: indexed views of maintained
state that allow concurrent queries to reuse the same in-memory state without
compromising data-parallel performance and scaling. We implement shared
arrangements in a modern stream processor and show order-of-magnitude
improvements in query response time and resource consumption for interactive
queries against high-throughput streams, while also significantly improving
performance in other domains including business analytics, graph processing,
and program analysis.
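The core idea can be sketched as an in-memory index that is maintained once and read through handles by several queries, rather than rebuilt per query. The real system additionally multi-versions updates by logical timestamp so that concurrent readers at different times stay consistent, which this toy (with assumed names) omits:

```python
from collections import defaultdict

class Arrangement:
    """A maintained, indexed view of a keyed collection that several query
    dataflows read through handles, instead of each building its own copy."""

    def __init__(self):
        self._index = defaultdict(list)   # key -> list of values

    def update(self, key, value):
        self._index[key].append(value)

    def reader(self):
        # Hand out a read handle; all handles share the same in-memory index.
        return ArrangementReader(self._index)

class ArrangementReader:
    def __init__(self, index):
        self._index = index

    def lookup(self, key):
        return self._index.get(key, [])

# One arrangement of, say, a 'follows' relation...
follows = Arrangement()
follows.update("alice", "bob")

# ...shared by two independent queries without re-indexing the input stream.
q1, q2 = follows.reader(), follows.reader()
assert q1.lookup("alice") == q2.lookup("alice") == ["bob"]
```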
SCABBARD: single-node fault-tolerant stream processing
Single-node multi-core stream processing engines (SPEs) can process hundreds of millions of tuples per second. Yet making them fault-tolerant with exactly-once semantics while retaining this performance is an open challenge: due to the limited I/O bandwidth of a single node, it becomes infeasible to persist all stream data and operator state during execution. Instead, single-node SPEs rely on upstream distributed systems, such as Apache Kafka, to recover stream data after failure, necessitating complex cluster-based deployments. This lack of built-in fault-tolerance features has hindered the adoption of single-node SPEs. We describe Scabbard, the first single-node SPE that supports exactly-once fault-tolerance semantics despite limited local I/O bandwidth. Scabbard achieves this by integrating persistence operations with the query workload. Within the operator graph, Scabbard determines when to persist streams based on the selectivity of operators: by persisting streams after operators that discard data, it can substantially reduce the required I/O bandwidth. As part of the operator graph, Scabbard supports parallel persistence operations and uses markers to decide when to discard persisted data. The persisted data volume is further reduced using workload-specific compression: Scabbard monitors stream statistics and dynamically generates computationally efficient compression operators. Our experiments show that Scabbard can execute stream queries that process over 200 million tuples per second while recovering from failures with sub-second latencies.
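The selectivity-driven placement of persistence points can be illustrated with a toy planner over a linear operator chain: persist the stream where the cumulative selectivity has discarded the most data, so the persisted volume (and hence required I/O bandwidth) is smallest. Scabbard's actual planning also accounts for operator state, markers, and compression; the function and numbers below are purely illustrative:

```python
def persistence_point(operators, input_rate):
    """Pick where in a linear operator chain to persist the stream so that the
    persisted tuple rate is minimal. `operators` is a list of
    (name, selectivity) pairs; position 0 means persisting the raw input."""
    best_pos, best_rate = 0, input_rate
    rate = input_rate
    for i, (_, selectivity) in enumerate(operators, start=1):
        rate *= selectivity                 # tuples surviving this operator
        if rate < best_rate:
            best_pos, best_rate = i, rate
    return best_pos, best_rate

# Example: a filter keeping 10% of tuples, then a window aggregation.
ops = [("filter", 0.1), ("aggregate", 0.05), ("enrich", 1.0)]
pos, rate = persistence_point(ops, input_rate=200_000_000)
# pos == 2: persisting after the aggregation cuts the persisted rate by ~99.5%.
```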