Processing count queries over event streams at multiple time granularities
Management and analysis of streaming data has become crucial with its applications in web, sensor data, network traffic data, and the stock market. Data streams consist mostly of numeric data, but what is more interesting is the events derived from the numerical data that need to be monitored. The events obtained from streaming data form event streams. Event streams have similar properties to data streams, i.e., they are seen only once, in a fixed order, as a continuous stream. Events appearing in the event stream have time stamps associated with them at a certain time granularity, such as second, minute, or hour. One frequently asked type of query over event streams is the count query, i.e., the frequency of an event occurrence over time. Count queries can be answered over event streams easily; however, users may also ask queries over different time granularities. For example, a broker may ask how many times a stock increased in the same time frame, where the time frames specified could be hour, day, or both. This is crucial especially in the case of event streams, where only a window of an event stream is available at a certain time instead of the whole stream. In this paper, we propose a technique for predicting the frequencies of event occurrences in event streams at multiple time granularities. The proposed approximation method efficiently estimates the count of events with high accuracy in an event stream at any time granularity by examining the distance distributions of event occurrences. The proposed method has been implemented and tested on different real data sets, and the results obtained are presented to show its effectiveness.
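To make the notion of a count query at several time granularities concrete, the following is a minimal sketch of exact counting over a small window of events. It is not the approximation method the abstract proposes; the function name `count_events`, the `GRANULARITY_FORMATS` table, and the sample timestamps are all assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime

# Hypothetical mapping from granularity name to a timestamp truncation format.
GRANULARITY_FORMATS = {
    "minute": "%Y-%m-%d %H:%M",
    "hour": "%Y-%m-%d %H",
    "day": "%Y-%m-%d",
}


def count_events(timestamps, granularity):
    """Count event occurrences per time bucket at the given granularity."""
    fmt = GRANULARITY_FORMATS[granularity]
    # Truncating each timestamp to the bucket label groups events together.
    return Counter(ts.strftime(fmt) for ts in timestamps)


# Illustrative window of an event stream: times at which a stock increased.
events = [
    datetime(2024, 1, 1, 9, 5),
    datetime(2024, 1, 1, 9, 40),
    datetime(2024, 1, 1, 10, 15),
    datetime(2024, 1, 2, 9, 30),
]

hourly = count_events(events, "hour")
daily = count_events(events, "day")
```

Exact counting like this needs every event in the window to be retained, which is the cost an approximation over event-distance distributions would aim to avoid.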
Streaming, Distributed Variational Inference for Bayesian Nonparametrics
This paper presents a methodology for creating streaming, distributed
inference algorithms for Bayesian nonparametric (BNP) models. In the proposed
framework, processing nodes receive a sequence of data minibatches, compute a
variational posterior for each, and make asynchronous streaming updates to a
central model. In contrast to previous algorithms, the proposed framework is
truly streaming, distributed, asynchronous, learning-rate-free, and
truncation-free. The key challenge in developing the framework, arising from
the fact that BNP models do not impose an inherent ordering on their
components, is finding the correspondence between minibatch and central BNP
posterior components before performing each update. To address this, the paper
develops a combinatorial optimization problem over component correspondences,
and provides an efficient solution technique. The paper concludes with an
application of the methodology to the DP mixture model, with experimental
results demonstrating its practical scalability and performance.
Comment: This paper was presented at NIPS 2015. Please use the following
BibTeX citation: @inproceedings{Campbell15_NIPS, Author = {Trevor Campbell
and Julian Straub and John W. {Fisher III} and Jonathan P. How}, Title =
{Streaming, Distributed Variational Inference for Bayesian Nonparametrics},
Booktitle = {Advances in Neural Information Processing Systems (NIPS)}, Year
= {2015}}
A taxonomy framework for unsupervised outlier detection techniques for multi-type data sets
The term "outlier" can generally be defined as an observation that is significantly different from
the other values in a data set. The outliers may be instances of error or indicate events. The
task of outlier detection aims at identifying such outliers in order to improve the analysis of
data and further discover interesting and useful knowledge about unusual events within numerous
application domains. In this paper, we report on contemporary unsupervised outlier detection
techniques for multiple types of data sets and provide a comprehensive taxonomy framework and
two decision trees to select the most suitable technique based on the data set. Furthermore, we
highlight the advantages, disadvantages, and performance issues of each class of outlier detection
techniques under this taxonomy framework.
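As a concrete instance of "an observation that is significantly different from the other values", here is a minimal sketch of one common unsupervised rule, the modified z-score based on the median absolute deviation (MAD). It is only one example of the kind of technique such a taxonomy would cover, not a method taken from the paper; the function name `mad_outliers` and the sample data are assumptions, and the constants 0.6745 and 3.5 are the conventional choices for this rule.

```python
from statistics import median


def mad_outliers(values, threshold=3.5):
    """Return the values whose modified z-score exceeds the threshold."""
    med = median(values)
    # Median absolute deviation: a spread estimate robust to the outliers themselves.
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # At least half the values equal the median; the score is undefined.
        return []
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]


# Illustrative univariate data set with one clearly unusual reading.
readings = [10, 11, 9, 10, 12, 10, 95]
flagged = mad_outliers(readings)
```

This rule only applies to single numeric attributes; other data types would call for different classes of technique.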
Finding Street Gang Members on Twitter
Most street gang members use Twitter to intimidate others, to present
outrageous images and statements to the world, and to share recent illegal
activities. Their tweets may thus be useful to law enforcement agencies to
discover clues about recent crimes or to anticipate ones that may occur.
Finding these posts, however, requires a method to discover gang member Twitter
profiles. This is a challenging task since gang members represent a very small
population of the 320 million Twitter users. This paper studies the problem of
automatically finding gang members on Twitter. It outlines a process to curate
one of the largest sets of verifiable gang member profiles that have ever been
studied. A review of these profiles establishes differences in the language,
images, YouTube links, and emojis gang members use compared to the rest of the
Twitter population. Features from this review are used to train a series of
supervised classifiers. Our classifier achieves a promising F1 score with a low
false positive rate.
Comment: 8 pages, 9 figures, 2 tables. Published as a full paper at the 2016
IEEE/ACM International Conference on Advances in Social Networks Analysis and
Mining (ASONAM 2016)
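For readers unfamiliar with the two metrics the abstract reports, the sketch below computes an F1 score and a false positive rate from confusion-matrix counts. The function name `f1_and_fpr` and the counts are made up for illustration and are not results from the paper.

```python
def f1_and_fpr(tp, fp, fn, tn):
    """Compute (F1 score, false positive rate) from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    # False positive rate: share of true negatives wrongly flagged as positive.
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return f1, fpr


# Hypothetical counts: positives are a small minority, as with a rare class.
f1, fpr = f1_and_fpr(tp=8, fp=2, fn=2, tn=88)
```

When the positive class is a very small share of the population, accuracy alone is uninformative, which is one reason to report these two metrics together.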