Scalable distributed event detection for Twitter
Social media streams, such as Twitter, have shown themselves to be useful sources of real-time information about what is happening in the world. Automatic detection and tracking of events identified in these streams have a variety of real-world applications, e.g. identifying and automatically reporting road accidents for emergency services. However, to be useful, events need to be identified within the stream with very low latency. This is challenging due to the high volume of posts within these social streams. In this paper, we propose a novel event detection approach that can both effectively detect events within social streams like Twitter and scale to thousands of posts every second. Through experimentation on a large Twitter dataset, we show that our approach can process the equivalent of the full Twitter Firehose stream, while maintaining event detection accuracy and outperforming an alternative distributed event detection system.
Knowledge-infused and Consistent Complex Event Processing over Real-time and Persistent Streams
Emerging applications in Internet of Things (IoT) and Cyber-Physical Systems
(CPS) present novel challenges to Big Data platforms for performing online
analytics. Ubiquitous sensors from IoT deployments are able to generate data
streams at high velocity that include information from a variety of domains
and accumulate to large volumes on disk. Complex Event Processing (CEP) is
recognized as an important real-time computing paradigm for analyzing
continuous data streams. However, existing work on CEP is largely limited to
relational query processing, exposing two distinctive gaps for query
specification and execution: (1) infusing the relational query model with
higher level knowledge semantics, and (2) seamless query evaluation across
temporal spaces that span past, present and future events. Closing these gaps allows
accessible analytics over data streams having properties from different
disciplines, and helps span the velocity (real-time) and volume (persistent)
dimensions. In this article, we introduce a Knowledge-infused CEP (X-CEP)
framework that provides domain-aware knowledge query constructs along with
temporal operators that allow end-to-end queries to span across real-time and
persistent streams. We translate this query model to efficient query execution
over online and offline data streams, proposing several optimizations to
mitigate the overheads introduced by evaluating semantic predicates and in
accessing high-volume historic data streams. The proposed X-CEP query model and
execution approaches are implemented in our prototype semantic CEP engine,
SCEPter. We validate our query model using domain-aware CEP queries from a
real-world Smart Power Grid application, and experimentally analyze the
benefits of our optimizations for executing these queries, using event streams
from a campus-microgrid IoT deployment.
Comment: 34 pages, 16 figures, accepted in Future Generation Computer Systems,
October 27, 201
Receive Combining vs. Multi-Stream Multiplexing in Downlink Systems with Multi-Antenna Users
In downlink multi-antenna systems with many users, the multiplexing gain is
strictly limited by the number of transmit antennas and the use of these
antennas. Assuming that the total number of receive antennas at the
multi-antenna users is much larger than the number of transmit antennas, the maximal multiplexing gain can
be achieved with many different transmission/reception strategies. For example,
the excess number of receive antennas can be utilized to schedule users with
effective channels that are near-orthogonal, for multi-stream multiplexing to
users with well-conditioned channels, and/or to enable interference-aware
receive combining. In this paper, we try to answer the question of whether the data
streams should be divided among few users (many streams per user) or many users
(few streams per user, enabling receive combining). Analytic results are
derived to show how user selection, spatial correlation, heterogeneous user
conditions, and imperfect channel acquisition (quantization or estimation
errors) affect the performance when sending the maximal number of streams or
one stream per scheduled user---the two extremes in data stream allocation.
While contradictory observations on this topic have been reported in prior
works, we show that selecting many users and allocating one stream per user
(i.e., exploiting receive combining) is the best candidate under realistic
conditions. This is explained by the provably stronger resilience towards
spatial correlation and the larger benefit from multi-user diversity. This
fundamental result has positive implications for the design of downlink systems
as it reduces the hardware requirements at the user devices and simplifies the
throughput optimization.
Comment: Published in IEEE Transactions on Signal Processing, 16 pages, 11
figures. The results can be reproduced using the following Matlab code:
https://github.com/emilbjornson/one-or-multiple-stream