7 research outputs found
Interactive System-wise Anomaly Detection
Anomaly detection, where data instances are discovered containing feature
patterns different from the majority, plays a fundamental role in various
applications. However, it is challenging for existing methods to handle the
scenarios where the instances are systems whose characteristics are not readily
observed as data. Appropriate interactions are needed to probe the
systems and identify those with abnormal responses. Detecting system-wise
anomalies is challenging for several reasons: how to
formally define the system-wise anomaly detection problem; how to find an
effective activation signal for interacting with systems to progressively
collect the data and learn the detector; and how to guarantee stable training in
such a non-stationary scenario with real-time interactions. To address these
challenges, we propose InterSAD (Interactive System-wise Anomaly Detection).
Specifically, we first adopt a Markov decision process to model the interactive
systems, and define anomalous systems as anomalous transition and anomalous
reward systems. Then, we develop an end-to-end approach which includes an
encoder-decoder module that learns system embeddings, and a policy network to
generate effective activations for separating embeddings of normal and anomalous
systems. Finally, we design a training method to stabilize the learning
process, which includes a replay buffer to store historical interaction data
and allow them to be re-sampled. Experiments on two benchmark environments,
including identifying the anomalous robotic systems and detecting user data
poisoning in recommendation models, demonstrate the superiority of InterSAD
compared with state-of-the-art baseline methods.
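The abstract mentions a replay buffer that stores historical interaction data so it can be re-sampled during training. The following is a minimal sketch of that general idea only, not the InterSAD implementation: the class name `ReplayBuffer`, the fixed capacity, and uniform re-sampling are all assumptions made for illustration.

```python
import random
from collections import deque


class ReplayBuffer:
    """Hypothetical fixed-capacity buffer of past interactions (illustrative only)."""

    def __init__(self, capacity, seed=0):
        # A deque with maxlen silently drops the oldest item once full.
        self._items = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def add(self, interaction):
        """Store one interaction record (any Python object)."""
        self._items.append(interaction)

    def sample(self, batch_size):
        """Draw a uniform random batch without replacement from the stored items."""
        size = min(batch_size, len(self._items))
        return self._rng.sample(list(self._items), size)

    def __len__(self):
        return len(self._items)
```

Re-sampling from stored interactions, rather than training only on the most recent ones, is a common way to reduce the correlation between consecutive training batches in non-stationary settings.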
MaxGap Bandit: Adaptive Algorithms for Approximate Ranking
This paper studies the problem of adaptively sampling from K distributions
(arms) in order to identify the largest gap between any two adjacent means. We
call this the MaxGap-bandit problem. This problem arises naturally in
approximate ranking, noisy sorting, outlier detection, and top-arm
identification in bandits. The key novelty of the MaxGap-bandit problem is that
it aims to adaptively determine the natural partitioning of the distributions
into a subset with larger means and a subset with smaller means, where the
split is determined by the largest gap rather than a pre-specified rank or
threshold. Estimating an arm's gap requires sampling its neighboring arms in
addition to itself, and this dependence results in a novel hardness parameter
that characterizes the sample complexity of the problem. We propose elimination
and UCB-style algorithms and show that they are minimax optimal. Our
experiments show that the UCB-style algorithms require 6-8x fewer samples than
non-adaptive sampling to achieve the same error.
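To make the objective concrete, the sketch below computes the largest gap between adjacent sorted means and the partition it induces, using the non-adaptive (uniform) sampling baseline that the abstract compares against. It is not the paper's elimination or UCB-style algorithm. The function names, the Gaussian noise model, and the parameter `samples_per_arm` are assumptions made for illustration.

```python
import random


def max_gap_split(means):
    """Return (gap, lower_arms, upper_arms) for the largest gap between adjacent sorted means."""
    order = sorted(range(len(means)), key=lambda i: means[i])
    # Compare each arm with its neighbor in sorted order and keep the widest gap.
    best_gap, best_pos = max(
        (means[order[k + 1]] - means[order[k]], k) for k in range(len(order) - 1)
    )
    return best_gap, order[: best_pos + 1], order[best_pos + 1 :]


def uniform_sampling_estimate(true_means, samples_per_arm, noise_sd=1.0, seed=0):
    """Non-adaptive baseline: sample every arm equally often and return empirical means."""
    rng = random.Random(seed)
    return [
        sum(rng.gauss(mu, noise_sd) for _ in range(samples_per_arm)) / samples_per_arm
        for mu in true_means
    ]
```

For example, `max_gap_split(uniform_sampling_estimate([0.0, 0.1, 2.0, 2.1], 2000))` separates arms 0 and 1 from arms 2 and 3 when the estimates are accurate enough. An adaptive method would aim to reach the same partition with fewer samples by concentrating on the arms near the largest gap.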
Sequential Multi-hypothesis Testing in Multi-armed Bandit Problems: An Approach for Asymptotic Optimality
We consider a multi-hypothesis testing problem involving a K-armed bandit.
Each arm's signal follows a distribution from a vector exponential family. The
actual parameters of the arms are unknown to the decision maker. The decision
maker incurs a delay cost until a decision is made and a switching cost
whenever he switches from one arm to another. His goal is to minimise the
overall cost until a decision is reached on the true hypothesis. Of interest
are policies that satisfy a given constraint on the probability of false
detection. This is a sequential decision making problem where the decision
maker gets only a limited view of the true state of nature at each stage, but
can control his view by choosing the arm to observe at each stage. An
information-theoretic lower bound on the total cost (expected time for a
reliable decision plus total switching cost) is first identified, and a
variation on a sequential policy based on the generalised likelihood ratio
statistic is then studied. Due to the vector exponential family assumption, the
signal processing at each stage is simple; the associated conjugate prior
distribution on the unknown model parameters enables easy updates of the
posterior distribution. The proposed policy, with a suitable threshold for
stopping, is shown to satisfy the given constraint on the probability of false
detection. Under a continuous selection assumption, the policy is also shown to
be asymptotically optimal in terms of the total cost among all policies that
satisfy the constraint on the probability of false detection.
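The abstract notes that a conjugate prior makes posterior updates easy under the exponential family assumption. As a minimal illustration of that point, the sketch below uses the Bernoulli likelihood with a Beta prior, one well-known conjugate pair. The choice of this pair and the function names are assumptions for illustration; the paper treats general vector exponential families and a generalised likelihood ratio policy, neither of which is implemented here.

```python
def beta_bernoulli_update(alpha, beta, observations):
    """Update a Beta(alpha, beta) prior with 0/1 observations from one arm."""
    for x in observations:
        if x not in (0, 1):
            raise ValueError("observations must be 0 or 1")
        alpha += x       # each success increments alpha
        beta += 1 - x    # each failure increments beta
    return alpha, beta


def posterior_mean(alpha, beta):
    """Mean of the Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)
```

Starting from a uniform Beta(1, 1) prior, observing `[1, 0, 1, 1]` gives Beta(4, 2), whose mean is 2/3. The update only needs running counts, which is the sense in which the per-stage processing stays simple.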
Online Sign Identification: Minimization of the Number of Errors in Thresholding Bandits
In the fixed budget thresholding bandit problem, an algorithm sequentially allocates a budgeted number of samples to different distributions. It then predicts whether the mean of each distribution is larger or smaller than a given threshold. We introduce a large family of algorithms (containing most existing relevant ones), inspired by the Frank-Wolfe algorithm, and provide a thorough yet generic analysis of their performance. This allowed us to construct new explicit algorithms, for a broad class of problems, whose losses are within a small constant factor of the non-adaptive oracle ones. Quite interestingly, we observed that adaptive methods empirically greatly outperform non-adaptive oracles, an uncommon behavior in standard online learning settings, such as regret minimization. We explain this surprising phenomenon on an insightful toy problem.
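The fixed budget setting can be sketched with the simplest non-adaptive allocation: spread the budget evenly over the arms, then predict the sign of each empirical mean relative to the threshold. This is a baseline for illustration, not one of the Frank-Wolfe-inspired algorithms from the paper and not the non-adaptive oracle it refers to. The function names and the Gaussian noise model are assumptions, and the budget is assumed to be at least the number of arms.

```python
import random


def uniform_threshold_predict(true_means, threshold, budget, noise_sd=1.0, seed=0):
    """Round-robin allocation of `budget` samples, then predict +1 (above) or -1 (below)."""
    rng = random.Random(seed)
    k = len(true_means)
    sums = [0.0] * k
    counts = [0] * k
    for t in range(budget):
        arm = t % k  # non-adaptive: the arm choice ignores past observations
        sums[arm] += rng.gauss(true_means[arm], noise_sd)
        counts[arm] += 1
    return [1 if sums[i] / counts[i] > threshold else -1 for i in range(k)]


def count_errors(predicted, true_means, threshold):
    """Number of arms whose predicted side of the threshold is wrong."""
    return sum(
        p != (1 if mu > threshold else -1) for p, mu in zip(predicted, true_means)
    )
```

The loss of interest in the abstract is the number of such errors. An adaptive algorithm would instead choose each arm based on the samples seen so far, typically spending more of the budget on arms whose means lie close to the threshold.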