    Scalable distributed event detection for Twitter

    Social media streams, such as Twitter, have shown themselves to be useful sources of real-time information about what is happening in the world. Automatic detection and tracking of events identified in these streams have a variety of real-world applications, e.g. identifying and automatically reporting road accidents for emergency services. However, to be useful, events need to be identified within the stream with very low latency. This is challenging due to the high volume of posts within these social streams. In this paper, we propose a novel event detection approach that can both effectively detect events within social streams like Twitter and scale to thousands of posts every second. Through experimentation on a large Twitter dataset, we show that our approach can process the equivalent of the full Twitter Firehose stream, while maintaining event detection accuracy and outperforming an alternative distributed event detection system.

    Knowledge-infused and Consistent Complex Event Processing over Real-time and Persistent Streams

    Emerging applications in Internet of Things (IoT) and Cyber-Physical Systems (CPS) present novel challenges to Big Data platforms for performing online analytics. Ubiquitous sensors from IoT deployments are able to generate data streams at high velocity that include information from a variety of domains and accumulate to large volumes on disk. Complex Event Processing (CEP) is recognized as an important real-time computing paradigm for analyzing continuous data streams. However, existing work on CEP is largely limited to relational query processing, exposing two distinctive gaps for query specification and execution: (1) infusing the relational query model with higher level knowledge semantics, and (2) seamless query evaluation across temporal spaces that span past, present and future events. Addressing these gaps allows accessible analytics over data streams having properties from different disciplines, and helps span the velocity (real-time) and volume (persistent) dimensions. In this article, we introduce a Knowledge-infused CEP (X-CEP) framework that provides domain-aware knowledge query constructs along with temporal operators that allow end-to-end queries to span across real-time and persistent streams. We translate this query model to efficient query execution over online and offline data streams, proposing several optimizations to mitigate the overheads introduced by evaluating semantic predicates and in accessing high-volume historic data streams. The proposed X-CEP query model and execution approaches are implemented in our prototype semantic CEP engine, SCEPter. We validate our query model using domain-aware CEP queries from a real-world Smart Power Grid application, and experimentally analyze the benefits of our optimizations for executing these queries, using event streams from a campus-microgrid IoT deployment. Comment: 34 pages, 16 figures, accepted in Future Generation Computer Systems, October 27, 201

    Receive Combining vs. Multi-Stream Multiplexing in Downlink Systems with Multi-Antenna Users

    In downlink multi-antenna systems with many users, the multiplexing gain is strictly limited by the number of transmit antennas N and the use of these antennas. Assuming that the total number of receive antennas at the multi-antenna users is much larger than N, the maximal multiplexing gain can be achieved with many different transmission/reception strategies. For example, the excess number of receive antennas can be utilized to schedule users with effective channels that are near-orthogonal, for multi-stream multiplexing to users with well-conditioned channels, and/or to enable interference-aware receive combining. In this paper, we try to answer the question of whether the N data streams should be divided among few users (many streams per user) or many users (few streams per user, enabling receive combining). Analytic results are derived to show how user selection, spatial correlation, heterogeneous user conditions, and imperfect channel acquisition (quantization or estimation errors) affect the performance when sending the maximal number of streams or one stream per scheduled user, which are the two extremes in data stream allocation. While contradictory observations on this topic have been reported in prior works, we show that selecting many users and allocating one stream per user (i.e., exploiting receive combining) is the best candidate under realistic conditions. This is explained by the provably stronger resilience towards spatial correlation and the larger benefit from multi-user diversity. This fundamental result has positive implications for the design of downlink systems as it reduces the hardware requirements at the user devices and simplifies the throughput optimization. Comment: Published in IEEE Transactions on Signal Processing, 16 pages, 11 figures. The results can be reproduced using the following Matlab code: https://github.com/emilbjornson/one-or-multiple-stream