89,023 research outputs found
The ABACOC Algorithm: a Novel Approach for Nonparametric Classification of Data Streams
Stream mining poses unique challenges to machine learning: predictive models
are required to be scalable, incrementally trainable, must remain bounded in
size (even when the data stream is arbitrarily long), and be nonparametric in
order to achieve high accuracy even in complex and dynamic environments.
Moreover, the learning system must be parameterless ---traditional tuning
methods are problematic in streaming settings--- and avoid requiring prior
knowledge of the number of distinct class labels occurring in the stream. In
this paper, we introduce a new algorithmic approach for nonparametric learning
in data streams. Our approach addresses all above mentioned challenges by
learning a model that covers the input space using simple local classifiers.
The distribution of these classifiers dynamically adapts to the local (unknown)
complexity of the classification problem, thus achieving a good balance between
model complexity and predictive accuracy. We design four variants of our
approach of increasing adaptivity. By means of an extensive empirical
evaluation against standard nonparametric baselines, we show state-of-the-art
results in terms of accuracy versus model size. For the variant that imposes a
strict bound on the model size, we show better performance against all other
methods measured at the same model size value. Our empirical analysis is
complemented by a theoretical performance guarantee which does not rely on any
stochastic assumption on the source generating the stream
Clustering Memes in Social Media
The increasing pervasiveness of social media creates new opportunities to
study human social behavior, while challenging our capability to analyze their
massive data streams. One of the emerging tasks is to distinguish between
different kinds of activities, for example engineered misinformation campaigns
versus spontaneous communication. Such detection problems require a formal
definition of meme, or unit of information that can spread from person to
person through the social network. Once a meme is identified, supervised
learning methods can be applied to classify different types of communication.
The appropriate granularity of a meme, however, is hardly captured from
existing entities such as tags and keywords. Here we present a framework for
the novel task of detecting memes by clustering messages from large streams of
social data. We evaluate various similarity measures that leverage content,
metadata, network features, and their combinations. We also explore the idea of
pre-clustering on the basis of existing entities. A systematic evaluation is
carried out using a manually curated dataset as ground truth. Our analysis
shows that pre-clustering and a combination of heterogeneous features yield the
best trade-off between number of clusters and their quality, demonstrating that
a simple combination based on pairwise maximization of similarity is as
effective as a non-trivial optimization of parameters. Our approach is fully
automatic, unsupervised, and scalable for real-time detection of memes in
streaming data.Comment: Proceedings of the 2013 IEEE/ACM International Conference on Advances
in Social Networks Analysis and Mining (ASONAM'13), 201
- …