Event Stream Processing with Multiple Threads
Current runtime verification tools seldom make use of multi-threading to
speed up the evaluation of a property on a large event trace. In this paper, we
present an extension to the BeepBeep 3 event stream engine that allows the use
of multiple threads during the evaluation of a query. Various parallelization
strategies are presented and described on simple examples. The implementation
of these strategies is then evaluated empirically on a sample of problems.
Compared to the previous, single-threaded version of the BeepBeep engine, the
allocation of just a few threads to specific portions of a query provides
a dramatic improvement in running time.
State Management for Efficient Event Pattern Detection
Event stream processing systems continuously evaluate queries over event streams to detect user-specified patterns with low latency. However, the challenge is that query processing is stateful and it maintains partial matches that grow exponentially in the size of processed events.
State management is complicated by the dynamicity of streams and the need to integrate remote data. First, heterogeneous event sources yield dynamic streams with unpredictable input rates, data distributions, and query selectivities. During peak times, exhaustive processing is unreasonable, and systems shall resort to best-effort processing. Second, queries may require remote data to select a specific event for a pattern. Such dependencies are problematic: Fetching the remote data interrupts the stream processing. Yet, without event selection based on remote data, the growth of partial matches is amplified.
In this dissertation, I present strategies for optimised state management in event pattern detection. First, I enable best-effort processing with load shedding that discards both input events and partial matches. I carefully select the shedding elements to satisfy a latency bound while striving for a minimal loss in result quality. Second, to efficiently integrate remote data, I decouple the fetching of remote data from its use in query evaluation by a caching mechanism. To this end, I hide the transmission latency by prefetching remote data based on anticipated use and by lazy evaluation that postpones the event selection based on remote data to avoid interruptions. A cost model is used to determine when to fetch which remote data items and how long to keep them in the cache.
I evaluated the above techniques with queries over synthetic and real-world data. I show that the load shedding technique significantly improves the recall of pattern detection over baseline approaches, while the technique for remote data integration significantly reduces the pattern detection latency.
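The idea of shedding partial matches can be illustrated with a minimal sketch. It is not the dissertation's method: the `utility` score, the `PartialMatch` class, and the fixed `budget` are hypothetical stand-ins for whatever the cost model would actually compute. The sketch only shows the general shape of keeping the most promising partial matches and discarding the rest once a bound is exceeded.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class PartialMatch:
    # Hypothetical score: how likely this partial match is to complete.
    utility: float
    # Events collected so far; excluded from ordering comparisons.
    events: tuple = field(compare=False)


def shed(partial_matches, budget):
    """Keep at most `budget` partial matches, dropping the lowest-utility ones."""
    if len(partial_matches) <= budget:
        return list(partial_matches)
    # nlargest returns the survivors sorted by descending utility.
    return heapq.nlargest(budget, partial_matches)


matches = [
    PartialMatch(0.9, ("a1", "b3")),
    PartialMatch(0.2, ("a2",)),
    PartialMatch(0.6, ("a4", "b5")),
]
kept = shed(matches, budget=2)
```

Here `kept` holds the two partial matches with utilities 0.9 and 0.6; the one with utility 0.2 is discarded.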
Power efficiency through tuple ranking in wireless sensor network monitoring
In this paper, we present an innovative framework for efficiently monitoring Wireless Sensor Networks (WSNs). Our framework, coined KSpot, utilizes a novel top-k query processing algorithm we developed, in conjunction with the concept of in-network views, in order to minimize the cost of query execution. For ease of exposition, consider a set of sensors acquiring data from their environment at a given time instance. The generated information can conceptually be thought of as a horizontally fragmented base relation R. Furthermore, the results of a user-defined query Q, registered at some sink point, can conceptually be thought of as a view V. Maintaining consistency between V and R is very expensive in terms of communication and energy. Thus, KSpot focuses on a subset V′ (⊆ V) that unveils only the k highest-ranked answers at the sink, for some user-defined parameter k. To illustrate the efficiency of our framework, we have implemented a real system in nesC, which combines the traditional advantages of declarative acquisition frameworks, like TinyDB, with the ideas presented in this work. Extensive real-world testing and experimentation with traces from the University of California-Berkeley, the University of Washington and Intel Research Berkeley show that KSpot provides energy savings of up to 66% compared to TinyDB, minimizes both the size and number of packets transmitted over the network (by up to 77%), and prolongs the longevity of a WSN deployment to new scales.
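To make the view V′ concrete, the following sketch shows why forwarding only the k best tuples per node is enough when individual readings are ranked by value. It is an illustration under stated assumptions, not KSpot's algorithm: the function names, the `(sensor_id, value)` tuple layout, and the flat two-level topology are hypothetical.

```python
import heapq


def local_top_k(readings, k):
    """Return the k highest-valued (sensor_id, value) tuples of one fragment."""
    return heapq.nlargest(k, readings, key=lambda r: r[1])


def sink_top_k(per_node_readings, k):
    """Merge local top-k lists at the sink instead of shipping every tuple.

    Any tuple in the global top-k is also in the top-k of its own
    fragment, so pruning at each node loses no answers.
    """
    candidates = []
    for readings in per_node_readings:
        candidates.extend(local_top_k(readings, k))
    return local_top_k(candidates, k)


# Two horizontal fragments of the base relation R (hypothetical values).
nodes = [
    [("s1", 21.5), ("s2", 30.1)],
    [("s3", 28.4), ("s4", 19.0), ("s5", 33.2)],
]
view = sink_top_k(nodes, k=2)
```

With k = 2, only four of the five tuples travel to the sink, and `view` contains the readings of `s5` and `s2`.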
A Survey of Green Networking Research
Reduction of unnecessary energy consumption is becoming a major concern in
wired networking, because of the potential economic benefits and of its
expected environmental impact. These issues, usually referred to as "green
networking", relate to embedding energy-awareness in the design, in the devices
and in the protocols of networks. In this work, we first formulate a more
precise definition of the "green" attribute. We furthermore identify a few
paradigms that are the key enablers of energy-aware networking research. We
then overview the current state of the art and provide a taxonomy of the
relevant work, with a special focus on wired networking. At a high level, we
identify four branches of green networking research that stem from different
observations on the root causes of energy waste, namely (i) Adaptive Link Rate,
(ii) Interface proxying, (iii) Energy-aware infrastructures and (iv)
Energy-aware applications. In this work, we do not only explore specific
proposals pertaining to each of the above branches, but also offer a
perspective for research.
Comment: Index Terms: Green Networking; Wired Networks; Adaptive Link Rate;
Interface Proxying; Energy-aware Infrastructures; Energy-aware Applications.
18 pages, 6 figures, 2 tables
A Taxonomy of Data Grids for Distributed Data Sharing, Management and Processing
Data Grids have been adopted as the platform for scientific communities that
need to share, access, transport, process and manage large data collections
distributed worldwide. They combine high-end computing technologies with
high-performance networking and wide-area storage management techniques. In
this paper, we discuss the key concepts behind Data Grids and compare them with
other data sharing and distribution paradigms such as content delivery
networks, peer-to-peer networks and distributed databases. We then provide
comprehensive taxonomies that cover various aspects of architecture, data
transportation, data replication and resource allocation and scheduling.
Finally, we map the proposed taxonomy to various Data Grid systems not only to
validate the taxonomy but also to identify areas for future exploration.
Through this taxonomy, we aim to categorise existing systems to better
understand their goals and their methodology. This would help evaluate their
applicability for solving similar problems. This taxonomy also provides a "gap
analysis" of this area through which researchers can potentially identify new
issues for investigation. Finally, we hope that the proposed taxonomy and
mapping also help to provide an easy way for new practitioners to understand
this complex area of research.
Comment: 46 pages, 16 figures, Technical Report
APRIL: Approximating Polygons as Raster Interval Lists
The spatial intersection join is an important spatial query operation, due to
its popularity and high complexity. The spatial join pipeline takes as input
two collections of spatial objects (e.g., polygons). In the filter step, pairs
of object MBRs that intersect are identified and passed to the refinement step
for verification of the join predicate on the exact object geometries. The
bottleneck of spatial join evaluation is in the refinement step. We introduce
APRIL, a powerful intermediate step in the pipeline, which is based on raster
interval approximations of object geometries. Our technique applies a sequence
of interval joins on 'intervalized' object approximations to determine whether
the objects intersect or not. Compared to previous work, APRIL approximations
are simpler, occupy much less space, and achieve similar pruning effectiveness
at a much higher speed. Besides intersection joins between polygons, APRIL can
directly be applied and has high effectiveness for polygonal range queries,
within joins, and polygon-linestring joins. By applying a lightweight
compression technique, APRIL approximations may occupy even less space than
object MBRs. Furthermore, APRIL can be customized to apply on partitioned data
and on polygons of varying sizes, rasterized at different granularities. Our
last contribution is a novel algorithm that computes the APRIL approximation of
a polygon without having to rasterize it in full, which is orders of magnitude
faster than the computation of other raster approximations. Experiments on real
data demonstrate the effectiveness and efficiency of APRIL; compared to the
state-of-the-art intermediate filter, APRIL occupies 2x-8x less space, is
3.5x-8.5x more time-efficient, and reduces the end-to-end join cost up to 3
times.
Comment: 12 pages
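The building block behind an interval join on 'intervalized' approximations can be sketched as a merge over two sorted interval lists. The sketch assumes that raster cells are numbered along some linear order, that each object is approximated by sorted, disjoint, half-open `(start, end)` intervals of cell numbers, and that one overlap test stands in for the sequence of joins mentioned in the abstract; these details and the function name are assumptions, not APRIL's actual data layout.

```python
def intervals_overlap(a, b):
    """Return True if any interval in `a` shares a cell with one in `b`.

    Both lists hold sorted, disjoint, half-open (start, end) intervals,
    so a single merge-style pass over both lists is sufficient.
    """
    i = j = 0
    while i < len(a) and j < len(b):
        a_start, a_end = a[i]
        b_start, b_end = b[j]
        if a_start < b_end and b_start < a_end:
            return True  # the two intervals share at least one cell
        # Advance whichever interval ends first; it cannot match later ones.
        if a_end <= b_end:
            i += 1
        else:
            j += 1
    return False


poly_a = [(0, 4), (10, 12)]
poly_b = [(4, 6), (11, 15)]
poly_c = [(4, 10)]
```

Here `intervals_overlap(poly_a, poly_b)` is true because cell 11 is covered by both lists, while `intervals_overlap(poly_a, poly_c)` is false, so that pair could be pruned before the refinement step.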