Submodularity in Batch Active Learning and Survey Problems on Gaussian Random Fields
Many real-world datasets can be represented in the form of a graph whose edge
weights designate similarities between instances. A discrete Gaussian random
field (GRF) model is a finite-dimensional Gaussian process (GP) whose prior
covariance is the inverse of a graph Laplacian. Minimizing the trace of the
predictive covariance Sigma (V-optimality) on GRFs has proven successful in
batch active learning classification problems with budget constraints. However,
its worst-case bound has been missing. We show that the V-optimality on GRFs as
a function of the batch query set is submodular and hence its greedy selection
algorithm guarantees a (1 - 1/e) approximation ratio. Moreover, GRF models
satisfy the absence-of-suppressor (AofS) condition. For active survey
problems, we propose a similar survey criterion that minimizes 1'Sigma 1, the
sum of all entries of the predictive covariance. In practice, the V-optimality
criterion outperforms GPs with mutual-information-gain criteria and allows
nonuniform costs across nodes.
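The greedy batch selection with its (1 - 1/e) guarantee can be sketched as follows; the chain-graph Laplacian, the ridge term, and the noise-free rank-one conditioning update are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def greedy_v_optimality(Sigma, budget):
    # Greedily pick query nodes minimizing the trace of the posterior
    # covariance (V-optimality). Submodularity of this objective gives the
    # greedy batch a (1 - 1/e) approximation guarantee.
    S = Sigma.copy()
    chosen = []
    for _ in range(budget):
        d = np.diag(S).copy()
        d[d <= 1e-12] = np.inf          # exhausted nodes give no reduction
        red = np.sum(S**2, axis=0) / d  # trace reduction per candidate node
        red[chosen] = -np.inf
        j = int(np.argmax(red))
        chosen.append(j)
        # Noise-free rank-one conditioning on the queried node.
        S = S - np.outer(S[:, j], S[:, j]) / S[j, j]
    return chosen, S

# Toy GRF prior: chain-graph Laplacian plus a small ridge so the inverse exists.
n = 6
L = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
L[0, 0] = L[-1, -1] = 1.0
Sigma = np.linalg.inv(L + 0.1 * np.eye(n))
batch, post = greedy_v_optimality(Sigma, budget=2)
```

Each greedy step picks the node whose observation most reduces the predictive-variance sum, then conditions the covariance on it.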
Modeling Local Video Statistics for Anomaly Detection
This paper promotes a probabilistic approach to building models of local video statistics for use in background-subtraction schemes. Shifting to a probabilistic framework makes additional analytical tools available for creating and evaluating these models. The paper further suggests nonparametric statistical methods for measuring the quality of efficient local spatio-temporal models of video background distributions. Starting from the familiar relative entropy distance between probability distributions, we construct a new distance measure that quantitatively assesses the quality of a probabilistic background model.
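The relative entropy (Kullback-Leibler) distance that the construction starts from can be computed directly for discrete distributions; the smoothing constant `eps` below is an illustrative assumption to keep the logarithms finite:

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    # Relative entropy D(p || q) between two discrete distributions.
    # Nonnegative, zero iff p == q, and asymmetric in its arguments,
    # which is why papers often build symmetrized measures on top of it.
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))
```

The asymmetry of D(p || q) is one reason a derived distance measure is needed when comparing a background model against observed statistics.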
Anomaly Detection and Removal Using Non-Stationary Gaussian Processes
This paper proposes a novel Gaussian process approach to fault removal in
time-series data. Fault removal does not delete the faulty signal data but,
instead, massages the fault from the data. We assume that only one fault occurs
at any one time and model the signal by two separate non-parametric Gaussian
process models for both the physical phenomenon and the fault. In order to
facilitate fault removal we introduce the Markov Region Link kernel for
handling non-stationary Gaussian processes. This kernel is piece-wise
stationary but guarantees that functions generated by it and their derivatives
(when required) are everywhere continuous. We apply this kernel to the removal
of drift and bias errors in faulty sensor data and also to the recovery of EOG
artifact-corrupted EEG signals.
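One way to sketch a piecewise-stationary kernel that stays continuous across a change point is to route cross-region covariance through the boundary; this illustrates the general idea only and is not the paper's Markov Region Link kernel:

```python
import numpy as np

def rbf(x, y, ell):
    # Stationary squared-exponential kernel with lengthscale ell.
    return np.exp(-0.5 * (x - y) ** 2 / ell ** 2)

def region_link_kernel(x, y, c=0.0, ell_left=1.0, ell_right=0.2):
    # Piecewise-stationary covariance: a different lengthscale on each side
    # of the change point c, with cross-region covariance linked through c
    # so that sampled functions remain continuous there. Illustrative sketch.
    if x <= c and y <= c:
        return rbf(x, y, ell_left)
    if x >= c and y >= c:
        return rbf(x, y, ell_right)
    lo, hi = (x, y) if x < y else (y, x)
    # Markov-style link: correlate the two regions only through f(c).
    return rbf(lo, c, ell_left) * rbf(c, hi, ell_right)
```

Near the change point the cross-region covariance approaches 1, which is what keeps the generated functions continuous despite the regime switch.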
Propagation Kernels
We introduce propagation kernels, a general graph-kernel framework for
efficiently measuring the similarity of structured data. Propagation kernels
are based on monitoring how information spreads through a set of given graphs.
They leverage early-stage distributions from propagation schemes such as random
walks to capture structural information encoded in node labels, attributes, and
edge information. This has two benefits. First, off-the-shelf propagation
schemes can be used to naturally construct kernels for many graph types,
including labeled, partially labeled, unlabeled, directed, and attributed
graphs. Second, by leveraging existing efficient and informative propagation
schemes, propagation kernels can be considerably faster than state-of-the-art
approaches without sacrificing predictive performance. We will also show that
if the graphs at hand have a regular structure, for instance when modeling
image or video data, one can exploit this regularity to scale the kernel
computation to large databases of graphs with thousands of nodes. We support
our contributions by exhaustive experiments on a number of real-world graphs
from a variety of application domains.
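The monitoring of propagation distributions can be sketched as follows; the row-normalized propagation scheme and the simple floor-based binning stand in for the paper's choices (e.g. locality-sensitive hashing) and are assumptions:

```python
import numpy as np
from collections import Counter

def propagation_kernel(A1, labels1, A2, labels2, t_max=2, n_classes=2, bins=4):
    # Sketch of a propagation kernel: propagate one-hot label distributions
    # with the row-normalized adjacency, discretize each node's distribution,
    # and sum matching bin counts across iterations.
    def run(A, labels):
        P = np.eye(n_classes)[labels].astype(float)   # one-hot start
        T = A / np.maximum(A.sum(axis=1, keepdims=True), 1)
        dists = [P]
        for _ in range(t_max):
            P = T @ P
            dists.append(P)
        return dists

    k = 0.0
    for P1, P2 in zip(run(A1, labels1), run(A2, labels2)):
        # Crude binning of node distributions (a stand-in for LSH binning).
        b1 = Counter(map(tuple, np.floor(P1 * bins).astype(int)))
        b2 = Counter(map(tuple, np.floor(P2 * bins).astype(int)))
        k += sum(b1[key] * b2.get(key, 0) for key in b1)
    return k

# Two toy labeled path graphs on three nodes.
A1 = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
A2 = A1.copy()
k12 = propagation_kernel(A1, [0, 1, 0], A2, [1, 0, 1])
```

Because the comparison happens on binned distributions rather than on graph isomorphism checks, each iteration costs only a propagation step plus a histogram intersection.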
Active search in intensionally specified structured spaces
We consider an active search problem in intensionally specified structured spaces. The ultimate goal in this setting is to discover structures from structurally different partitions of a fixed but unknown target class. An example of such a process is computer-aided de novo drug design. Over the past 20 years, several Monte Carlo search heuristics have been developed for this process. Motivated by these hand-crafted search heuristics, we devise a Metropolis--Hastings sampling scheme in which the acceptance probability is given by a probabilistic surrogate of the target property, modeled with a maximum-entropy conditional model. The surrogate model is updated at each iteration upon the evaluation of a selected structure. The proposed approach is consistent, and the empirical evidence indicates that it achieves a large structural variety of discovered targets.
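A minimal sketch of a Metropolis--Hastings loop with a surrogate-driven acceptance probability, using an online logistic model as a stand-in for the paper's maximum-entropy conditional model; the bit-vector structures, single-bit-flip proposal, and toy oracle are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def surrogate_prob(w, x):
    # Probabilistic surrogate of the target property (logistic stand-in).
    return 1.0 / (1.0 + np.exp(-w @ x))

def mh_active_search(oracle, dim=8, n_iter=50):
    w = np.zeros(dim)                       # surrogate parameters
    x = rng.integers(0, 2, dim).astype(float)
    hits = []
    for _ in range(n_iter):
        prop = x.copy()
        j = rng.integers(dim)
        prop[j] = 1 - prop[j]               # local move in the structured space
        # Accept with probability given by the surrogate ratio.
        a = min(1.0, surrogate_prob(w, prop) / max(surrogate_prob(w, x), 1e-12))
        if rng.random() < a:
            x = prop
        y = oracle(x)                       # evaluate the selected structure
        if y:
            hits.append(x.copy())
        # Update the surrogate with the newly evaluated structure.
        w += 0.5 * (y - surrogate_prob(w, x)) * x
    return hits

# Toy target class: first half of the bit vector has more ones than the second.
hits = mh_active_search(lambda x: float(x[:4].sum() > x[4:].sum()))
```

As evaluations accumulate, the surrogate biases the chain toward the target class while the stochastic acceptance keeps exploring structurally different candidates.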