Engineering Crowdsourced Stream Processing Systems
A crowdsourced stream processing system (CSP) is a system that incorporates
crowdsourced tasks in the processing of a data stream. This can be seen as
enabling crowdsourcing work to be applied on a sample of large-scale data at
high speed, or equivalently, enabling stream processing to employ human
intelligence. It also leads to a substantial expansion of the capabilities of
data processing systems. Engineering a CSP system requires the combination of
human and machine computation elements. From a general systems theory
perspective, this means taking into account inherited as well as emerging
properties from both these elements. In this paper, we position CSP systems
within a broader taxonomy, outline a series of design principles and evaluation
metrics, present an extensible framework for their design, and describe several
design patterns. We showcase the capabilities of CSP systems by performing a
case study that applies our proposed framework to the design and analysis of a
real system (AIDR) that classifies social media messages during time-critical
crisis events. Results show that compared to a pure stream processing system,
AIDR can achieve a higher data classification accuracy, while compared to a
pure crowdsourcing solution, the system makes better use of human workers by
requiring much less manual work effort.
SQPR: Stream Query Planning with Reuse
When users submit new queries to a distributed stream processing system (DSPS), a query planner must allocate physical resources, such as CPU cores, memory and network bandwidth, from a set of hosts to queries. Allocation decisions must provide the correct mix of resources required by queries, while achieving an efficient overall allocation to scale in the number of admitted queries. By exploiting overlap between queries and reusing partial results, a query planner can conserve resources but has to carry out more complex planning decisions. In this paper, we describe SQPR, a query planner that targets DSPSs in data centre environments with heterogeneous resources. SQPR models query admission, allocation and reuse as a single constrained optimisation problem and solves an approximate version to achieve scalability. It prevents individual resources from becoming bottlenecks by re-planning past allocation decisions and supports different allocation objectives. As our experimental evaluation in comparison with a state-of-the-art planner shows, SQPR makes efficient resource allocation decisions, even with a high utilisation of resources, with acceptable overheads.
Quality-Driven Disorder Handling for M-way Sliding Window Stream Joins
Sliding window join is one of the most important operators for stream
applications. To produce high quality join results, a stream processing system
must deal with the ubiquitous disorder within input streams which is caused by
network delay, asynchronous source clocks, etc. Disorder handling involves an
inevitable tradeoff between the latency and the quality of produced join
results. To meet different requirements of stream applications, it is desirable
to provide a user-configurable result-latency vs. result-quality tradeoff.
Existing disorder handling approaches either do not provide such
configurability, or support only user-specified latency constraints.
In this work, we advocate the idea of quality-driven disorder handling, and
propose a buffer-based disorder handling approach for sliding window joins,
which minimizes sizes of input-sorting buffers, thus the result latency, while
respecting user-specified result-quality requirements. The core of our approach
is an analytical model which directly captures the relationship between sizes
of input buffers and the produced result quality. Our approach is generic. It
supports m-way sliding window joins with arbitrary join conditions. Experiments
on real-world and synthetic datasets show that, compared to the state of the
art, our approach can reduce the result latency incurred by disorder handling
by up to 95% while providing the same level of result quality. Comment: 12 pages, 11 figures, IEEE ICDE 201
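To make the buffer-size versus latency tradeoff in the abstract above concrete, here is a minimal sketch of a generic fixed-slack sorting buffer (often called K-slack). It is not the paper's quality-driven model: the class name `SlackBuffer`, the `slack` parameter, and the release rule are assumptions chosen for illustration. A larger slack tolerates more disorder but holds tuples longer.

```python
import heapq


class SlackBuffer:
    """Illustrative fixed-slack sorting buffer (hypothetical, not from the paper)."""

    def __init__(self, slack):
        self.slack = slack            # how far behind the newest timestamp a tuple may lag
        self._heap = []               # min-heap ordered by timestamp
        self._max_ts = float("-inf")  # largest timestamp seen so far
        self._seq = 0                 # tie-breaker so items themselves are never compared

    def insert(self, ts, item):
        """Buffer one tuple and return the tuples that are now safe to emit, in order."""
        self._max_ts = max(self._max_ts, ts)
        heapq.heappush(self._heap, (ts, self._seq, item))
        self._seq += 1
        released = []
        # Emit every buffered tuple that is at least `slack` older than the newest one.
        while self._heap and self._heap[0][0] <= self._max_ts - self.slack:
            t, _, it = heapq.heappop(self._heap)
            released.append((t, it))
        return released
```

With `slack=2`, a tuple with timestamp 2 that arrives after a tuple with timestamp 3 is still emitted before it, at the cost of holding both until a tuple with timestamp 5 or later arrives.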
A new wind energy conversion system
It is presupposed that vertical axis wind energy machines will be superior to horizontal axis machines on a power output/cost basis, and the design of a new wind energy machine is presented. The design employs conical cones with sharp lips and smooth surfaces to promote maximum drag and minimize skin friction. The cones are mounted on a vertical axis in such a way as to assist torque development. Storing wind energy as compressed air is thought to be optimal, for three reasons: (1) the efficiency of compression is fairly high compared to the conversion of mechanical energy to electrical energy in storage batteries; (2) the release of stored energy through an air motor has high efficiency; and (3) design, construction, and maintenance of an all-mechanical system is usually simpler than for a mechanical to electrical conversion system.
Review of delta wing space shuttle vehicle dynamics
The unsteady aerodynamics of the proposed delta planform, high cross range shuttle orbiters are investigated. It is found that these vehicles are subject to five unsteady-flow phenomena that could compromise the flight dynamics. The phenomena are as follows: (1) leeside shock-induced separation, (2) sudden leading-edge stall, (3) vortex burst, (4) bow shock-flap shock interaction, and (5) forebody vorticity. Trajectory shaping is seen as the most powerful means of avoiding detrimental effects of the stall phenomena; however, stall must be fixed or controlled when traversing the stall region. Other phenomena may be controlled by carefully programmed control deflections and some configuration modifications. Ways to alter the occurrence of the various flow conditions are explored.
Flexible operation of a mixed fluid cascade LNG plant for electrical power management
The paper discusses operation and control of a process for the liquefaction of natural gas in which the refrigeration compressors are driven by electric motors. The aim is to enable the plant to accommodate contingencies in the availability of electrical power and to continue running when there is a shortage of electrical power, avoiding the significant economic impact of a shutdown. The article provides a detailed first principles analysis of the relationships between the electrical power consumption of the process, the production rate of the liquefied natural gas, its exit temperature, and its purity. By doing this, it is possible to ascertain settings for operating the process at various levels of power consumption. The results show that the process can operate with reductions of electrical power of 30 percent or more. Hence, power shortages could be managed by operating the process flexibly to make best use of the available remaining power, rather than by shutting down. The paper also discusses how such a system could be implemented industrially and identifies aspects that require further study.
A Deviant Load Shedding System for Data Stream Mining
Load shedding is imperative for data stream processing systems in numerous functions, as data streams are susceptible to sudden spikes in volume. The proposed system attempts to resolve four major problems associated with data streams, which include load shedding and anti-shedding time, the number of transactions pruned, and the selecting predicate, using an efficient mining system. The frequent pattern discovery used in the model exploits the synergy between scheduling and load shedding. This paper also proposes various load shedding strategies which lighten the workload of the system while ensuring an acceptable level of mining accuracy, using parameters such as transaction, priority and attributes of data mining. The bulk of the workload in a mining algorithm lies in the innumerable item sets that are counted and enumerated. The approach is based on the frequent pattern matching principle of stream mining, which involves reducing the workload by maintaining smaller item sets.
UpStream: storage-centric load management for streaming applications with update semantics
This paper addresses the problem of minimizing the staleness of query results for streaming applications with update semantics under overload conditions. Staleness is a measure of how out-of-date the results are compared with the latest data arriving on the input. Real-time streaming applications are subject to overload due to unpredictably increasing data rates, while in many of them, we observe that data streams and queries in fact exhibit "update semantics" (i.e., the latest input data are all that really matters when producing a query result). Under such semantics, overload will cause staleness to build up. The key to avoiding this is to exploit the update semantics of applications as early as possible in the processing pipeline. In this paper, we propose UpStream, a storage-centric framework for load management over streaming applications with update semantics. We first describe how we model streams and queries that possess the update semantics, providing definitions for correctness and staleness for the query results. Then, we show how staleness can be minimized based on intelligent update key scheduling techniques applied at the queue level, while preserving the correctness of the results, even for complex queries that involve sliding windows. UpStream is based on the simple idea of applying the updates in place, yet with great returns in terms of lowering staleness and memory consumption, as we also experimentally verify on the Borealis system.
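The idea of applying updates in place at the queue level can be illustrated with a short sketch. This is not UpStream's implementation or its key scheduling policy: the class name `UpdateQueue` and the plain first-in-first-out ordering of keys are assumptions made for illustration. The point it shows is that, under update semantics, a newer value for a key can overwrite the pending one, so the queue never holds more than one entry per key.

```python
from collections import OrderedDict


class UpdateQueue:
    """Illustrative queue holding at most one pending value per update key (hypothetical)."""

    def __init__(self):
        # Maps update key -> latest pending value, in order of first arrival.
        self._pending = OrderedDict()

    def push(self, key, value):
        # Overwriting an existing key replaces its value in place;
        # the key keeps its original position in the queue.
        self._pending[key] = value

    def pop(self):
        # Remove and return the oldest pending (key, value) pair.
        return self._pending.popitem(last=False)

    def __len__(self):
        return len(self._pending)
```

Under overload, memory use in this sketch is bounded by the number of distinct keys rather than by the arrival rate, and a consumer always sees the latest value for the key it dequeues.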