    Engineering Crowdsourced Stream Processing Systems

    A crowdsourced stream processing system (CSP) is a system that incorporates crowdsourced tasks in the processing of a data stream. This can be seen as enabling crowdsourcing work to be applied to a sample of large-scale data at high speed, or equivalently, enabling stream processing to employ human intelligence. It also leads to a substantial expansion of the capabilities of data processing systems. Engineering a CSP system requires combining human and machine computation elements. From a general systems theory perspective, this means taking into account the inherited as well as the emergent properties of both these elements. In this paper, we position CSP systems within a broader taxonomy, outline a series of design principles and evaluation metrics, present an extensible framework for their design, and describe several design patterns. We showcase the capabilities of CSP systems with a case study that applies our proposed framework to the design and analysis of a real system (AIDR) that classifies social media messages during time-critical crisis events. Results show that, compared to a pure stream processing system, AIDR achieves higher data classification accuracy, while compared to a pure crowdsourcing solution, it makes better use of human workers by requiring much less manual effort.
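
One common way to combine machine and human computation on a stream is to let an automatic classifier handle the items it is confident about and send the rest to crowd workers. The sketch below illustrates that idea only; the abstract does not say AIDR works this way, and `route_stream`, `toy_classifier`, and the 0.8 threshold are hypothetical names and values chosen for illustration.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical signature: a classifier returns (label, confidence in [0, 1]).
Classifier = Callable[[str], Tuple[str, float]]


def route_stream(
    items: Iterable[str],
    machine_classify: Classifier,
    threshold: float = 0.8,  # assumed cut-off, not taken from the paper
) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Split a stream into machine-labelled items and items queued for the crowd."""
    auto_labelled: List[Tuple[str, str]] = []
    crowd_queue: List[str] = []
    for item in items:
        label, confidence = machine_classify(item)
        if confidence >= threshold:
            auto_labelled.append((item, label))
        else:
            # Low-confidence items become crowdsourced tasks.
            crowd_queue.append(item)
    return auto_labelled, crowd_queue


def toy_classifier(text: str) -> Tuple[str, float]:
    """Stand-in for a trained model: a keyword rule with made-up confidences."""
    if "flood" in text.lower():
        return ("relevant", 0.95)
    return ("unknown", 0.4)
```

Under these assumptions, only the messages the toy classifier is unsure about would reach human workers, which is the kind of reduction in manual effort the abstract reports.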

    Time To Live: Temporal Management of Large-Scale RFID Applications

    In coming years, there will be billions of RFID tags in the world, tagging almost everything for tracking and identification purposes. This phenomenon will pose a new challenge not only for network capacity but also for the scalability of event processing in RFID applications. Since most RFID applications are time sensitive, we propose a notion of Time To Live (TTL), representing the period of time that an RFID event can legally live in an RFID data management system, to manage various temporal event patterns. TTL is critical in the "Internet of Things" for handling a tremendous amount of partial event-tracking results. TTL can also be used to provide prompt responses to time-critical events so that RFID data streams are handled in a timely manner. We divide TTL into four categories according to the general event-handling patterns. Moreover, to extract event sequences from an unordered event stream correctly and to handle TTL-constrained event sequences effectively, we design a new data structure, namely the Double Level Sequence Instance List (DLSIList), to record intermediate stages of event sequences. On this basis, an RFID data management system, namely the Temporal Management System over RFID data streams (TMS-RFID), has been developed. This system can be constructed as a stand-alone middleware component to manage temporal event patterns. We demonstrate the effectiveness of TMS-RFID in extracting complex temporal event patterns through a detailed performance study using a range of high-speed data streams and various queries. The results show that TMS-RFID has a very high throughput, namely 170,000 to 870,000 events per second for different highly complex continuous queries. Moreover, the experiments show that the main structure used to record the intermediate stages in TMS-RFID does not grow exponentially with the number of events. These results illustrate that TMS-RFID not only has a high processing speed but also scales well.
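
The basic TTL idea, an event that may live in the system only for a bounded period, can be sketched as a buffer that evicts events once their age exceeds the TTL. This is a minimal illustration and not the DLSIList structure or any of the four TTL categories from the paper; `RfidEvent` and `TtlEventBuffer` are hypothetical names. Expiry is based on each event's own timestamp, so events may arrive out of order.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RfidEvent:
    tag_id: str
    reader_id: str
    timestamp: float  # seconds, taken from the event itself


class TtlEventBuffer:
    """Holds events and drops those older than ttl_seconds (illustrative only)."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl_seconds = ttl_seconds
        self._events: List[RfidEvent] = []

    def add(self, event: RfidEvent) -> None:
        self._events.append(event)

    def expire(self, now: float) -> List[RfidEvent]:
        """Remove and return every event whose age exceeds the TTL."""
        expired = [e for e in self._events if now - e.timestamp > self.ttl_seconds]
        self._events = [e for e in self._events if now - e.timestamp <= self.ttl_seconds]
        return expired

    def live(self, now: float) -> List[RfidEvent]:
        """Return the events still within their TTL at time `now`."""
        self.expire(now)
        return list(self._events)
```

A linear scan per call is enough for a sketch; a system aiming at the throughput reported above would need an indexed structure instead.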

    TESLA: a formally defined event specification language

    The need for timely processing of large amounts of information, flowing from the periphery to the center of a system, is common to different application domains, and it has justified the development of several languages to describe how such information has to be processed. In this paper, we analyze such languages, showing how most approaches lack the expressiveness required for the applications we target, or do not provide the precise semantics required to clearly state how the system should behave. Moving from these premises, we present TESLA, a complex event specification language. Each TESLA rule considers incoming data items as notifications of events and defines how certain patterns of events cause the occurrence of others, said to be "complex". TESLA has a simple syntax and a formal semantics, given in terms of a first-order metric temporal logic. It provides high expressiveness and flexibility in a rigorous framework by offering content and temporal filters, negations, timers, aggregates, and fully customizable policies for event selection and consumption. The paper ends by showing how TESLA rules can be interpreted by a processing system, introducing an efficient event detection algorithm based on automata.
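
To make the notion of a pattern of events causing a "complex" event concrete, the sketch below checks one hard-coded pattern in plain Python: a temperature reading above a threshold (a content filter) preceded by a smoke event within a time window (a temporal filter). It is not TESLA syntax and not the automata-based algorithm from the paper; the event kinds, the 300-second window, the 45-degree threshold, and the "most recent smoke event" selection policy are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Event:
    kind: str          # e.g. "smoke" or "temp" (hypothetical event types)
    time: float        # seconds
    value: float = 0.0


def detect_fire(
    events: List[Event],
    window: float = 300.0,   # assumed temporal filter
    min_temp: float = 45.0,  # assumed content filter
) -> List[Tuple[Event, Event]]:
    """Return (smoke, temp) pairs where a hot reading follows smoke within `window`."""
    matches: List[Tuple[Event, Event]] = []
    last_smoke: Optional[Event] = None
    for ev in sorted(events, key=lambda e: e.time):
        if ev.kind == "smoke":
            # Selection policy: keep only the most recent smoke event.
            last_smoke = ev
        elif ev.kind == "temp" and ev.value > min_temp:
            if last_smoke is not None and ev.time - last_smoke.time <= window:
                matches.append((last_smoke, ev))
    return matches
```

In this sketch a smoke event is never consumed, so it can take part in several matches; other selection and consumption policies would change that behavior.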

    MQTT+: Enhanced syntax and broker functionalities for data filtering, processing and aggregation

    In the last few years, the Message Queueing Telemetry Transport (MQTT) publish/subscribe protocol has emerged as the de facto standard communication protocol for IoT, M2M and wireless sensor network applications. Such popularity is mainly due to the extreme simplicity of the protocol on the client side, which suits low-cost and resource-constrained edge devices. Other useful features include a very low protocol overhead, ideal for limited-bandwidth scenarios, and support for different Quality of Service (QoS) levels. However, when an edge device needs to perform processing operations over the data published by multiple clients, the use of MQTT may result in high network bandwidth usage and high energy consumption for the end devices, which is unacceptable in resource-constrained scenarios. To overcome these issues, we propose MQTT+, which provides an enhanced protocol syntax and enriches the pub/sub broker with data filtering, processing and aggregation functionalities. MQTT+ is implemented starting from an open source MQTT broker and evaluated in different application scenarios.
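
The bandwidth argument can be illustrated with a toy in-memory broker: instead of forwarding every matching publication to the subscriber, the broker computes one aggregate over the latest value of each matching topic. This is a sketch under stated assumptions, not the MQTT+ syntax or implementation; `ToyBroker`, `aggregate`, and the operator names are hypothetical. Only the `+` and `#` topic wildcards are standard MQTT, and the matcher ignores the special handling of topics that start with `$`.

```python
from statistics import mean
from typing import Callable, Dict, List


def topic_matches(topic_filter: str, topic: str) -> bool:
    """Match a topic against an MQTT filter with '+' and '#' wildcards."""
    f_parts = topic_filter.split("/")
    t_parts = topic.split("/")
    for i, f in enumerate(f_parts):
        if f == "#":
            return True  # multi-level wildcard matches the remaining levels
        if i >= len(t_parts):
            return False
        if f != "+" and f != t_parts[i]:
            return False
    return len(f_parts) == len(t_parts)


class ToyBroker:
    """Keeps the latest numeric value per topic and aggregates on request."""

    # Hypothetical operator names, not taken from MQTT+.
    OPERATORS: Dict[str, Callable[[List[float]], float]] = {
        "avg": mean,
        "min": min,
        "max": max,
        "sum": sum,
    }

    def __init__(self) -> None:
        self._latest: Dict[str, float] = {}

    def publish(self, topic: str, value: float) -> None:
        self._latest[topic] = value

    def aggregate(self, topic_filter: str, op: str) -> float:
        """Return one value computed broker-side over all matching topics."""
        values = [v for t, v in self._latest.items() if topic_matches(topic_filter, t)]
        if not values:
            raise ValueError(f"no topic matches {topic_filter!r}")
        return self.OPERATORS[op](values)
```

With two temperature sensors publishing under `home/kitchen/temp` and `home/bedroom/temp`, a subscriber asking for the average over `home/+/temp` would receive a single number rather than two separate messages.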