Uncertainty quantification in graph-based classification of high dimensional data
Classification of high dimensional data finds wide-ranging applications. In
many of these applications equipping the resulting classification with a
measure of uncertainty may be as important as the classification itself. In
this paper we introduce, develop algorithms for, and investigate the properties
of, a variety of Bayesian models for the task of binary classification; via the
posterior distribution on the classification labels, these methods
automatically give measures of uncertainty. The methods are all based around
the graph formulation of semi-supervised learning.
We provide a unified framework which brings together a variety of methods
which have been introduced in different communities within the mathematical
sciences. We study probit classification in the graph-based setting, generalize
the level-set method for Bayesian inverse problems to the classification
setting, and generalize the Ginzburg-Landau optimization-based classifier to a
Bayesian setting; we also show that the probit and level set approaches are
natural relaxations of the harmonic function approach introduced in [Zhu et al.,
2003].
We introduce efficient numerical methods, suited to large data-sets, for both
MCMC-based sampling and gradient-based MAP estimation. Through numerical
experiments we study classification accuracy and uncertainty quantification for
our models; these experiments showcase a suite of datasets commonly used to
evaluate graph-based semi-supervised learning algorithms.
Comment: 33 pages, 14 figures
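The harmonic function approach that the abstract names as the common starting point can be sketched as follows. This is a minimal illustration of the classical construction (unnormalized graph Laplacian, labels fixed on the labeled nodes, a linear solve for the rest), not the Bayesian models of the paper; the function name `harmonic_labels` and the four-node path graph are assumptions made for the example.

```python
import numpy as np

def harmonic_labels(W, labeled_idx, labels):
    """Harmonic extension of known labels over a weighted graph.

    W           : symmetric (n, n) array of non-negative edge weights
    labeled_idx : indices of nodes with known labels
    labels      : labels (+1 / -1) for those nodes
    Returns a real-valued score for every node; its sign is the predicted class.
    """
    W = np.asarray(W, dtype=float)
    n = W.shape[0]
    L = np.diag(W.sum(axis=1)) - W              # unnormalized graph Laplacian
    labeled_idx = np.asarray(labeled_idx)
    labels = np.asarray(labels, dtype=float)
    unlabeled_idx = np.setdiff1d(np.arange(n), labeled_idx)

    L_uu = L[np.ix_(unlabeled_idx, unlabeled_idx)]
    L_ul = L[np.ix_(unlabeled_idx, labeled_idx)]

    f = np.zeros(n)
    f[labeled_idx] = labels
    # Harmonic condition on the unlabeled nodes: L_uu f_u + L_ul f_l = 0
    f[unlabeled_idx] = np.linalg.solve(L_uu, -L_ul @ labels)
    return f

# Illustrative path graph 0 - 1 - 2 - 3 with unit weights (hypothetical data).
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
f = harmonic_labels(W, labeled_idx=[0, 3], labels=[1, -1])
predicted = np.sign(f)
```

On a path graph the harmonic solution interpolates linearly between the two labeled end points, so the magnitude of each score already gives a crude indication of how confident the classification is; the Bayesian formulations described above replace this heuristic with a posterior distribution.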
Efficient Graph-Based Active Learning with Probit Likelihood via Gaussian Approximations
We present a novel adaptation of active learning to graph-based
semi-supervised learning (SSL) under non-Gaussian Bayesian models. We present
an approximation of non-Gaussian distributions to adapt previously
Gaussian-based acquisition functions to these more general cases. We develop an
efficient rank-one update for applying "look-ahead" based methods as well as
model retraining. We also introduce a novel "model change" acquisition function
based on these approximations that further expands the available collection of
active learning acquisition functions for such methods.
Comment: Accepted in ICML Workshop on Real World Experiment Design and Active Learning 202
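To illustrate why a rank-one update makes "look-ahead" evaluation cheap, the sketch below shows the standard Gaussian case only: conditioning a Gaussian posterior on one additional noisy observation at node `k` changes the covariance by a rank-one term (Sherman-Morrison). The function name, the noise variance, and the small Laplacian-based prior are assumptions for the example; the paper's own update for non-Gaussian models is not reproduced here.

```python
import numpy as np

def rank_one_posterior_update(C, k, noise_var):
    """Covariance after observing node k with Gaussian noise of variance noise_var.

    C is the current (symmetric) posterior covariance. The update costs O(n^2)
    instead of the O(n^3) needed to re-invert the precision matrix.
    """
    col = C[:, k]
    return C - np.outer(col, col) / (noise_var + C[k, k])

# Hypothetical prior: precision = graph Laplacian of a 4-node path + small ridge.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
precision = np.diag(W.sum(axis=1)) - W + 0.1 * np.eye(4)
C = np.linalg.inv(precision)

k, noise_var = 2, 0.5
C_fast = rank_one_posterior_update(C, k, noise_var)

# Reference: add the observation to the precision matrix and invert from scratch.
e_k = np.zeros(4)
e_k[k] = 1.0
C_slow = np.linalg.inv(precision + np.outer(e_k, e_k) / noise_var)
```

Because each candidate node only needs one such update, an acquisition function can score many hypothetical queries without retraining the model from scratch for each of them.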
Semi-Supervised First-Person Activity Recognition in Body-Worn Video
Body-worn cameras are now commonly used for logging daily life, sports, and
law enforcement activities, creating a large volume of archived footage. This
paper studies the problem of classifying frames of footage according to the
activity of the camera-wearer with an emphasis on application to real-world
police body-worn video. Real-world datasets pose a different set of challenges
from existing egocentric vision datasets: the amount of footage of different
activities is unbalanced, the data contains personally identifiable
information, and in practice it is difficult to provide substantial training
footage for a supervised approach. We address these challenges by extracting
features based exclusively on motion information and then segmenting the video
footage using a semi-supervised classification algorithm. On publicly available
datasets, our method achieves results comparable to, if not better than,
supervised and/or deep learning methods using a fraction of the training data.
It also shows promising results on real-world police body-worn video.
Estimating network dimension when the spectrum struggles
What is the dimension of a network? Here, we view it as the smallest dimension of Euclidean space into which nodes can be embedded so that pairwise distances accurately reflect the connectivity structure. We show that a recently proposed and extremely efficient algorithm for data clouds, based on computing first- and second-nearest neighbour distances, can be used as the basis of an approach for estimating the dimension of a network with weighted edges. We also show how the algorithm can be extended to unweighted networks when combined with spectral embedding. We illustrate the advantages of this technique over the widely used approach of characterizing dimension by visually searching for a suitable gap in the spectrum of the Laplacian.
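The data-cloud ingredient mentioned above, an estimate built from first- and second-nearest neighbour distances, can be sketched as follows. This assumes the maximum-likelihood form of the two-nearest-neighbour estimator, in which the ratio of second to first neighbour distance follows a Pareto law whose exponent is the dimension; the function name `two_nn_dimension` and the synthetic data are illustrative, and the extension to weighted or unweighted networks described in the abstract is not reproduced.

```python
import numpy as np

def two_nn_dimension(X):
    """Estimate intrinsic dimension from ratios of 2nd to 1st neighbour distances."""
    X = np.asarray(X, dtype=float)
    sq = np.sum(X ** 2, axis=1)
    # Squared pairwise distances via the Gram matrix; clip tiny negatives.
    D2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.maximum(D2, 0.0, out=D2)
    np.fill_diagonal(D2, np.inf)                 # exclude self-distances
    two_smallest = np.partition(D2, 1, axis=1)[:, :2]
    r1 = np.sqrt(two_smallest[:, 0])             # nearest-neighbour distance
    r2 = np.sqrt(two_smallest[:, 1])             # second-nearest distance
    keep = r1 > 0                                # drop exact duplicates
    mu = r2[keep] / r1[keep]
    # Maximum-likelihood estimate of the Pareto exponent, i.e. the dimension.
    return keep.sum() / np.sum(np.log(mu))

# Synthetic check: a 2-D sheet embedded in 5-D space (hypothetical data).
rng = np.random.default_rng(0)
plane = rng.random((1000, 2))
X = np.hstack([plane, np.zeros((1000, 3))])
d_hat = two_nn_dimension(X)
```

The estimate depends only on local distance ratios, which is why it needs no visual inspection of a spectrum; on the synthetic sheet it should land near 2 even though the ambient space has five coordinates.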
Stochastic Block Models are a Discrete Surface Tension
Networks, which represent agents and interactions between them, arise in
myriad applications throughout the sciences, engineering, and even the
humanities. To understand large-scale structure in a network, a common task is
to cluster a network's nodes into sets called "communities", such that there
are dense connections within communities but sparse connections between them. A
popular and statistically principled method to perform such clustering is to
use a family of generative models known as stochastic block models (SBMs). In
this paper, we show that maximum likelihood estimation in an SBM is a network
analog of a well-known continuum surface-tension problem that arises from an
application in metallurgy. To illustrate the utility of this relationship, we
implement network analogs of three surface-tension algorithms, with which we
successfully recover planted community structure in synthetic networks and
which yield fascinating insights on empirical networks that we construct from
hyperspectral videos.
Comment: to appear in Journal of Nonlinear Science
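As a concrete view of the objective that maximum likelihood estimation in an SBM optimizes, the sketch below evaluates the log-likelihood of a node partition under a Bernoulli SBM for an undirected, unweighted graph, with each block-pair edge probability set to its maximum-likelihood value. The function name, the clipping constant `eps`, and the toy graph are assumptions for the example; the surface-tension algorithms of the paper are not reproduced.

```python
import numpy as np

def sbm_log_likelihood(A, groups, eps=1e-12):
    """Profile log-likelihood of a partition under a Bernoulli SBM.

    A      : symmetric 0/1 adjacency matrix without self-loops
    groups : community label for each node
    """
    A = np.asarray(A)
    groups = np.asarray(groups)
    iu, ju = np.triu_indices(A.shape[0], k=1)    # each node pair counted once
    edges = A[iu, ju].astype(float)
    gi, gj = groups[iu], groups[ju]
    labels = np.unique(groups)
    total = 0.0
    for pos, r in enumerate(labels):
        for s in labels[pos:]:
            mask = ((gi == r) & (gj == s)) | ((gi == s) & (gj == r))
            n_pairs = mask.sum()
            if n_pairs == 0:
                continue
            m = edges[mask].sum()                # edges between blocks r and s
            p = np.clip(m / n_pairs, eps, 1 - eps)  # MLE edge probability, clipped
            total += m * np.log(p) + (n_pairs - m) * np.log(1 - p)
    return total

# Two triangles joined by a single edge (hypothetical toy network).
A = np.zeros((6, 6), dtype=int)
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1

ll_planted = sbm_log_likelihood(A, [0, 0, 0, 1, 1, 1])
ll_mixed = sbm_log_likelihood(A, [0, 1, 0, 1, 0, 1])
```

The partition that matches the two triangles scores higher than the interleaved one, which is the behaviour a community-detection method based on this likelihood exploits.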