
    Engineering Crowdsourced Stream Processing Systems

    A crowdsourced stream processing system (CSP) is a system that incorporates crowdsourced tasks in the processing of a data stream. This can be seen as enabling crowdsourcing work to be applied on a sample of large-scale data at high speed, or equivalently, enabling stream processing to employ human intelligence. It also leads to a substantial expansion of the capabilities of data processing systems. Engineering a CSP system requires the combination of human and machine computation elements. From a general systems theory perspective, this means taking into account inherited as well as emerging properties from both these elements. In this paper, we position CSP systems within a broader taxonomy, outline a series of design principles and evaluation metrics, present an extensible framework for their design, and describe several design patterns. We showcase the capabilities of CSP systems by performing a case study that applies our proposed framework to the design and analysis of a real system (AIDR) that classifies social media messages during time-critical crisis events. Results show that compared to a pure stream processing system, AIDR can achieve a higher data classification accuracy, while compared to a pure crowdsourcing solution, the system makes better use of human workers by requiring much less manual work effort.

    SQPR: Stream Query Planning with Reuse

    When users submit new queries to a distributed stream processing system (DSPS), a query planner must allocate physical resources, such as CPU cores, memory and network bandwidth, from a set of hosts to queries. Allocation decisions must provide the correct mix of resources required by queries, while achieving an efficient overall allocation to scale in the number of admitted queries. By exploiting overlap between queries and reusing partial results, a query planner can conserve resources but has to carry out more complex planning decisions. In this paper, we describe SQPR, a query planner that targets DSPSs in data centre environments with heterogeneous resources. SQPR models query admission, allocation and reuse as a single constrained optimisation problem and solves an approximate version to achieve scalability. It prevents individual resources from becoming bottlenecks by re-planning past allocation decisions and supports different allocation objectives. As our experimental evaluation in comparison with a state-of-the-art planner shows, SQPR makes efficient resource allocation decisions, even with a high utilisation of resources, with acceptable overheads.

    Quality-Driven Disorder Handling for M-way Sliding Window Stream Joins

    Sliding window join is one of the most important operators for stream applications. To produce high quality join results, a stream processing system must deal with the ubiquitous disorder within input streams which is caused by network delay, asynchronous source clocks, etc. Disorder handling involves an inevitable tradeoff between the latency and the quality of produced join results. To meet different requirements of stream applications, it is desirable to provide a user-configurable result-latency vs. result-quality tradeoff. Existing disorder handling approaches either do not provide such configurability, or support only user-specified latency constraints. In this work, we advocate the idea of quality-driven disorder handling, and propose a buffer-based disorder handling approach for sliding window joins, which minimizes the sizes of input-sorting buffers, and thus the result latency, while respecting user-specified result-quality requirements. The core of our approach is an analytical model which directly captures the relationship between sizes of input buffers and the produced result quality. Our approach is generic. It supports m-way sliding window joins with arbitrary join conditions. Experiments on real-world and synthetic datasets show that, compared to the state of the art, our approach can reduce the result latency incurred by disorder handling by up to 95% while providing the same level of result quality. Comment: 12 pages, 11 figures, IEEE ICDE 201
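The latency vs. quality tradeoff of an input-sorting buffer can be illustrated with a minimal sketch. It uses a plain K-slack buffer, a common disorder-handling technique, and is not the analytical model from the abstract. The names `KSlackBuffer`, `run`, and the slack parameter `k` are hypothetical and chosen only for illustration.

```python
import heapq


class KSlackBuffer:
    """Sorting buffer that holds each tuple until it is at least k time
    units older than the newest timestamp seen so far (assumed policy)."""

    def __init__(self, k):
        self.k = k                    # slack: larger k = more latency, fewer late tuples
        self.heap = []                # min-heap of (timestamp, sequence, value)
        self.max_ts = float("-inf")   # newest timestamp observed
        self.late = 0                 # tuples dropped because they arrived too late
        self._seq = 0                 # tie-breaker so values are never compared

    def push(self, ts, value):
        """Insert one tuple; return the tuples released in timestamp order."""
        if ts < self.max_ts - self.k:
            # Older than the current release point: emitting it now would
            # break the output order, so it is counted as lost quality.
            self.late += 1
            return []
        heapq.heappush(self.heap, (ts, self._seq, value))
        self._seq += 1
        self.max_ts = max(self.max_ts, ts)
        released = []
        while self.heap and self.heap[0][0] <= self.max_ts - self.k:
            t, _, v = heapq.heappop(self.heap)
            released.append((t, v))
        return released

    def flush(self):
        """Release everything still buffered, in timestamp order."""
        released = []
        while self.heap:
            t, _, v = heapq.heappop(self.heap)
            released.append((t, v))
        return released


def run(stream, k):
    """Feed (timestamp, value) pairs through a buffer with slack k."""
    buf = KSlackBuffer(k)
    out = []
    for ts, value in stream:
        out.extend(buf.push(ts, value))
    out.extend(buf.flush())
    return out, buf.late


if __name__ == "__main__":
    stream = [(1, "a"), (3, "b"), (2, "c"), (6, "d"), (4, "e"), (1, "f")]
    for k in (0, 2):
        ordered, late = run(stream, k)
        print(k, [t for t, _ in ordered], late)
```

With no slack the buffer releases tuples immediately and drops every out-of-order arrival; a larger slack delays output but keeps more of the input, which is the tradeoff a quality-driven approach would tune.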

    A new wind energy conversion system

    It is presupposed that vertical axis wind energy machines will be superior to horizontal axis machines on a power output/cost basis and the design of a new wind energy machine is presented. The design employs conical cones with sharp lips and smooth surfaces to promote maximum drag and minimize skin friction. The cones are mounted on a vertical axis in such a way as to assist torque development. Storing wind energy as compressed air is thought to be optimal, for the following reasons: (1) the efficiency of compression is fairly high compared to the conversion of mechanical energy to electrical energy in storage batteries; (2) the release of stored energy through an air motor has high efficiency; and (3) design, construction, and maintenance of an all-mechanical system is usually simpler than for a mechanical to electrical conversion system.

    Review of delta wing space shuttle vehicle dynamics

    The unsteady aerodynamics of the proposed delta planform, high cross range shuttle orbiters are investigated. It is found that these vehicles are subject to five unsteady-flow phenomena that could compromise the flight dynamics. The phenomena are as follows: (1) leeside shock-induced separation, (2) sudden leading-edge stall, (3) vortex burst, (4) bow shock-flap shock interaction, and (5) forebody vorticity. Trajectory shaping is seen as the most powerful means of avoiding detrimental effects of the stall phenomena; however, stall must be fixed or controlled when traversing the stall region. Other phenomena may be controlled by carefully programmed control deflections and some configuration modifications. Ways to alter the occurrence of the various flow conditions are explored.

    Flexible operation of a mixed fluid cascade LNG plant for electrical power management

    The paper discusses operation and control of a process for the liquefaction of natural gas in which the refrigeration compressors are driven by electric motors. The aim is to enable the plant to accommodate contingencies in the availability of electrical power and to continue running when there is a shortage of electrical power, avoiding the significant economic impact of a shutdown. The article provides a detailed first principles analysis of the relationships between the electrical power consumption of the process, the production rate of the liquefied natural gas, its exit temperature, and its purity. By doing this, it is possible to ascertain settings for operating the process at various levels of power consumption. The results show that the process can operate with reductions of electrical power of 30 percent or more. Hence, power shortages could be managed by operating the process flexibly to make best use of the available remaining power, rather than by shutting down. The paper also discusses how such a system could be implemented industrially and identifies aspects that require further study.

    A Deviant Load Shedding System for Data Stream Mining

    Load shedding is imperative for data stream processing systems, as data streams are susceptible to sudden spikes in volume. The proposed system attempts to resolve four major problems associated with data streams, namely load shedding time, anti-shedding time, the number of transactions pruned, and predicate selection, using an efficient mining system. The frequent patterns discovered in the data stream are used in a model that exploits the synergy between scheduling and load shedding. This paper also proposes various load shedding strategies which lighten the workload of the system while ensuring an acceptable level of mining accuracy, using parameters such as transaction, priority and attributes of data mining. Most of the workload in a mining algorithm lies in the innumerable item sets that are counted and enumerated. The approach is based on the frequent pattern matching principle of stream mining, which involves reducing the workload by maintaining smaller item sets.

    UpStream: storage-centric load management for streaming applications with update semantics

    This paper addresses the problem of minimizing the staleness of query results for streaming applications with update semantics under overload conditions. Staleness is a measure of how out-of-date the results are compared with the latest data arriving on the input. Real-time streaming applications are subject to overload due to unpredictably increasing data rates, while in many of them, we observe that data streams and queries in fact exhibit "update semantics" (i.e., the latest input data are all that really matters when producing a query result). Under such semantics, overload will cause staleness to build up. The key to avoiding this is to exploit the update semantics of applications as early as possible in the processing pipeline. In this paper, we propose UpStream, a storage-centric framework for load management over streaming applications with update semantics. We first describe how we model streams and queries that possess the update semantics, providing definitions for correctness and staleness for the query results. Then, we show how staleness can be minimized based on intelligent update key scheduling techniques applied at the queue level, while preserving the correctness of the results, even for complex queries that involve sliding windows. UpStream is based on the simple idea of applying the updates in place, yet with great returns in terms of lowering staleness and memory consumption, as we also experimentally verify on the Borealis system.
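The idea of applying updates in place at the queue level can be sketched as follows. This is a minimal illustration under stated assumptions, not UpStream's implementation: the class `UpdateQueue` and its methods are hypothetical, keys are served in first-arrival order rather than by the scheduling techniques the abstract mentions, and sliding-window queries are not handled.

```python
class UpdateQueue:
    """Queue with update semantics: at most one pending entry per update key.

    A new value for a key that is already pending overwrites the old value
    in place instead of being appended, so a slow consumer never processes
    superseded data and memory is bounded by the number of distinct keys.
    """

    def __init__(self):
        # dict preserves first-insertion order, and reassigning an existing
        # key keeps its position; this gives FIFO key order (an assumption).
        self._pending = {}
        self.overwritten = 0  # count of stale values discarded in place

    def enqueue(self, key, value):
        if key in self._pending:
            self.overwritten += 1
        self._pending[key] = value

    def dequeue(self):
        """Remove and return the (key, latest value) pair at the head."""
        if not self._pending:
            raise IndexError("dequeue from empty UpdateQueue")
        key = next(iter(self._pending))
        return key, self._pending.pop(key)

    def __len__(self):
        return len(self._pending)


if __name__ == "__main__":
    q = UpdateQueue()
    for key, price in [("AAPL", 1), ("MSFT", 5), ("AAPL", 2), ("AAPL", 3)]:
        q.enqueue(key, price)
    while len(q):
        print(q.dequeue())
    print("overwritten:", q.overwritten)
```

Under overload, a plain FIFO queue would make the consumer work through every superseded value before reaching the latest one; overwriting in place lets it see only the newest value per key.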