25 research outputs found
The ABACOC Algorithm: a Novel Approach for Nonparametric Classification of Data Streams
Stream mining poses unique challenges to machine learning: predictive models
are required to be scalable, incrementally trainable, must remain bounded in
size (even when the data stream is arbitrarily long), and be nonparametric in
order to achieve high accuracy even in complex and dynamic environments.
Moreover, the learning system must be parameterless ---traditional tuning
methods are problematic in streaming settings--- and avoid requiring prior
knowledge of the number of distinct class labels occurring in the stream. In
this paper, we introduce a new algorithmic approach for nonparametric learning
in data streams. Our approach addresses all above mentioned challenges by
learning a model that covers the input space using simple local classifiers.
The distribution of these classifiers dynamically adapts to the local (unknown)
complexity of the classification problem, thus achieving a good balance between
model complexity and predictive accuracy. We design four variants of our
approach of increasing adaptivity. By means of an extensive empirical
evaluation against standard nonparametric baselines, we show state-of-the-art
results in terms of accuracy versus model size. For the variant that imposes a
strict bound on the model size, we show better performance against all other
methods measured at the same model size value. Our empirical analysis is
complemented by a theoretical performance guarantee which does not rely on any
stochastic assumption on the source generating the stream
Adapting to the Shifting Intent of Search Queries
Search engines today present results that are often oblivious to abrupt
shifts in intent. For example, the query `independence day' usually refers to a
US holiday, but the intent of this query abruptly changed during the release of
a major film by that name. While no studies exactly quantify the magnitude of
intent-shifting traffic, studies suggest that news events, seasonal topics, pop
culture, etc account for 50% of all search queries. This paper shows that the
signals a search engine receives can be used to both determine that a shift in
intent has happened, as well as find a result that is now more relevant. We
present a meta-algorithm that marries a classifier with a bandit algorithm to
achieve regret that depends logarithmically on the number of query impressions,
under certain assumptions. We provide strong evidence that this regret is close
to the best achievable. Finally, via a series of experiments, we demonstrate
that our algorithm outperforms prior approaches, particularly as the amount of
intent-shifting traffic increases.Comment: This is the full version of the paper in NIPS'0
RELEAF: An Algorithm for Learning and Exploiting Relevance
Recommender systems, medical diagnosis, network security, etc., require
on-going learning and decision-making in real time. These -- and many others --
represent perfect examples of the opportunities and difficulties presented by
Big Data: the available information often arrives from a variety of sources and
has diverse features so that learning from all the sources may be valuable but
integrating what is learned is subject to the curse of dimensionality. This
paper develops and analyzes algorithms that allow efficient learning and
decision-making while avoiding the curse of dimensionality. We formalize the
information available to the learner/decision-maker at a particular time as a
context vector which the learner should consider when taking actions. In
general the context vector is very high dimensional, but in many settings, the
most relevant information is embedded into only a few relevant dimensions. If
these relevant dimensions were known in advance, the problem would be simple --
but they are not. Moreover, the relevant dimensions may be different for
different actions. Our algorithm learns the relevant dimensions for each
action, and makes decisions based in what it has learned. Formally, we build on
the structure of a contextual multi-armed bandit by adding and exploiting a
relevance relation. We prove a general regret bound for our algorithm whose
time order depends only on the maximum number of relevant dimensions among all
the actions, which in the special case where the relevance relation is
single-valued (a function), reduces to ; in the
absence of a relevance relation, the best known contextual bandit algorithms
achieve regret , where is the full dimension of
the context vector.Comment: to appear in IEEE Journal of Selected Topics in Signal Processing,
201
Dataretrieving for varied in different Composition Databases using Content aggregation
Keeping in mind with a variety of content choices, consumers are exhibiting diverse preferences for content; their preferences often depend on the context in which they consume content as well as various exogenous events. To satisfy the consumers� demand for such diverse content, multimedia content aggregators (CAs) haveemerged which gather content from numerous multimedia sources. A key challenge for such systems is to accurately predict whattype of content each of its consumers prefers in a certain context,and adapt these predictions to the evolving consumers preferences, contexts, and content characteristics This paper addressesgenerate text based file data sets, such as word, text files, image file data sets, and video file data sets, It also extract data from multiple databases, evaluate user preference based query, reduce time complexity by clustering data, and increase fetching speed by using query classification