178 research outputs found
Guarantees for Efficient and Adaptive Online Learning
In this thesis, we study the problem of adaptive online learning in several different settings. We first study the problem of predicting graph labelings online which are assumed to change over time. We develop the machinery of cluster specialists which probabilistically exploit any cluster structure in the graph. We give a mistake-bounded algorithm that surprisingly requires only O(log n) time per trial for an n-vertex graph, an exponential improvement over existing methods.
We then consider the model of non-stationary prediction with expert advice with long-term memory guarantees in the sense of Bousquet and Warmuth, in which we learn a small pool of experts. We consider relative entropy projection-based algorithms, giving a linear-time algorithm that improves on the best known regret bound. We show that such projection updates may be advantageous over previous "weight-sharing" approaches when weight updates come with implicit costs such as in portfolio optimization. We give an algorithm to compute the relative entropy projection onto the simplex with non-uniform (lower) box constraints in linear time, which may be of independent interest.
We finally extend the model of long-term memory by introducing a new model of adaptive long-term memory. Here the small pool is assumed to change over time: the trial sequence is partitioned into epochs, and a small pool is associated with each epoch. We give an efficient linear-time regret-bounded algorithm for this setting and present results in the setting of contextual bandits.
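The relative entropy projection onto the simplex with non-uniform lower box constraints, mentioned above, has a simple structure: each coordinate is either clamped to its lower bound or scaled by a common factor. As an illustration only, here is a minimal numpy sketch that finds the scaling factor by bisection; the thesis gives a linear-time method, so this O(n) per-iteration bisection and all names here are our own illustrative assumptions.

```python
import numpy as np

def kl_project(v, lower, iters=100):
    """Relative-entropy (KL) projection of a distribution v (assumed
    to sum to 1) onto {w : sum(w) = 1, w_i >= lower_i}.

    KKT conditions give w_i = max(lower_i, c * v_i) for a scalar c
    chosen so the result sums to 1; we find c by bisection. This is
    a sketch, not the linear-time algorithm from the thesis.
    """
    v = np.asarray(v, dtype=float)
    lower = np.asarray(lower, dtype=float)
    assert lower.sum() <= 1.0 + 1e-12  # constraints must be feasible

    lo, hi = 0.0, 1.0 / v.min()  # the sum is monotone increasing in c
    for _ in range(iters):
        c = 0.5 * (lo + hi)
        if np.maximum(lower, c * v).sum() < 1.0:
            lo = c
        else:
            hi = c
    return np.maximum(lower, 0.5 * (lo + hi) * v)
```

With no active constraints the projection returns v unchanged; when a lower bound binds, that coordinate is clamped and the remaining mass is rescaled.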
Mistake Bounds for Binary Matrix Completion
We study the problem of completing a binary matrix in an online learning setting. On each trial we predict a matrix entry and then receive the true entry. We propose a Matrix Exponentiated Gradient algorithm [1] to solve this problem. We provide a mistake bound for the algorithm, which scales with the margin complexity [2, 3] of the underlying matrix. The bound suggests an interpretation in which each row of the matrix is a prediction task over a finite set of objects, the columns. Using this interpretation, we show that the algorithm makes a number of mistakes comparable, up to a logarithmic factor, to the number made by the Kernel Perceptron with an optimal kernel in hindsight. We discuss applications of the algorithm to predicting as well as the best biclustering and to the problem of predicting the labeling of a graph without knowing the graph in advance.
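For intuition, a Matrix Exponentiated Gradient step maintains a positive semi-definite parameter matrix with unit trace and updates it through the matrix exponential of the shifted matrix logarithm. The following numpy sketch shows one such step for symmetric matrices; the step size and the choice of gradient are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def meg_update(W, grad, eta=0.1):
    """One Matrix Exponentiated Gradient step for a symmetric
    density matrix W (PSD, trace 1) and a symmetric gradient:

        W_new = exp(log W - eta * grad) / trace(exp(log W - eta * grad))

    computed via eigendecompositions, since both matrix log and
    matrix exp of a symmetric matrix act on its eigenvalues.
    """
    lam, U = np.linalg.eigh(W)
    lam = np.clip(lam, 1e-12, None)          # guard the matrix log
    log_W = (U * np.log(lam)) @ U.T
    mu, V = np.linalg.eigh(log_W - eta * grad)
    W_new = (V * np.exp(mu)) @ V.T
    return W_new / np.trace(W_new)
```

The trace normalization keeps the parameter on the spectraplex (PSD, unit trace), which is what drives mistake bounds in terms of spectral complexity measures such as margin complexity.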
Sketch-based Randomized Algorithms for Dynamic Graph Regression
A well-known problem in data science and machine learning is linear regression, which has recently been extended to dynamic graphs. Existing exact algorithms for updating the solution of the dynamic graph regression problem require at least linear time in the size of the graph, which may be intractable in practice. In the current paper, we use the subsampled randomized Hadamard transform and CountSketch to propose the first randomized algorithms for this problem. Suppose that we are given a matrix embedding of the graph, and let the number of samples required for a guaranteed approximation error be a sublinear function of the size of the graph. Our first algorithm reduces the time complexity of pre-processing; then, after an edge insertion or an edge deletion, it updates the approximate solution in time that scales with the number of samples rather than the size of the graph. Our second algorithm reduces the time complexity of pre-processing to depend on the number of nonzero elements of the embedding; then, after an edge insertion, an edge deletion, a node insertion, or a node deletion, it updates the approximate solution accordingly. Finally, we show that under some assumptions each algorithm can outperform the other, and we characterize the conditions under which each is preferable.
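CountSketch compresses a tall matrix by hashing rows into buckets with random signs, after which a much smaller least-squares problem can be solved on the sketch. The following minimal numpy sketch shows the idea in isolation; the dimensions, the direct lstsq solve, and all variable names are our own illustrative choices, not the paper's algorithms.

```python
import numpy as np

def countsketch(A, s, rng):
    """Apply a CountSketch S with s rows to A: each row of A is
    hashed to one of s buckets and accumulated with a random sign."""
    n = A.shape[0]
    bucket = rng.integers(0, s, size=n)
    sign = rng.choice([-1.0, 1.0], size=n)
    SA = np.zeros((s, A.shape[1]))
    np.add.at(SA, bucket, sign[:, None] * A)
    return SA, bucket, sign

rng = np.random.default_rng(0)
n, d, s = 500, 4, 60
A = rng.standard_normal((n, d))
x_true = rng.standard_normal(d)
b = A @ x_true                      # noiseless linear system

# Sketch A and b with the SAME hash functions, then solve the
# much smaller s x d least-squares problem instead of n x d.
SA, bucket, sign = countsketch(A, s, rng)
Sb = np.zeros(s)
np.add.at(Sb, bucket, sign * b)
x_hat = np.linalg.lstsq(SA, Sb, rcond=None)[0]
```

In this noiseless setup Sb = SA @ x_true, so whenever the sketched matrix has full column rank the sketched solve recovers x_true exactly; with noise, one instead gets the kind of guaranteed approximation error the abstract refers to.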
Improved Regret Bounds for Tracking Experts with Memory
We address the problem of sequential prediction with expert advice in a non-stationary environment, with long-term memory guarantees in the sense of Bousquet and Warmuth [4]. We give a linear-time algorithm that improves on the best known regret bounds [26]. This algorithm incorporates a relative entropy projection step. The projection is advantageous over previous weight-sharing approaches when weight updates come with implicit costs, as in, for example, portfolio optimization. We give an algorithm to compute this projection step in linear time, which may be of independent interest.
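For contrast, the weight-sharing family of methods the projection step is compared against can be sketched as a fixed-share update: a multiplicative loss update followed by mixing a small amount of uniform mass back in. A toy numpy sketch, where the learning rate, sharing rate, and losses are illustrative assumptions:

```python
import numpy as np

def fixed_share_step(w, losses, eta=0.5, alpha=0.05):
    """One fixed-share update for tracking experts:
    (1) multiplicative (exponential-weights) loss update,
    (2) share a fraction alpha of the mass uniformly, which keeps
        every weight bounded away from zero so that previously poor
        experts can recover quickly after a switch."""
    w = w * np.exp(-eta * np.asarray(losses))
    w = w / w.sum()
    return (1 - alpha) * w + alpha / len(w)
```

The uniform-mixing step is the "weight update with implicit cost" in settings like portfolio optimization, since redistributing weight there corresponds to actual trades; a projection-based update can avoid some of that redistribution.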
DeepCut: Joint Subset Partition and Labeling for Multi Person Pose Estimation
This paper considers the task of articulated human pose estimation of
multiple people in real world images. We propose an approach that jointly
solves the tasks of detection and pose estimation: it infers the number of
persons in a scene, identifies occluded body parts, and disambiguates body
parts between people in close proximity to each other. This joint formulation
contrasts with previous strategies that address the problem by first
detecting people and subsequently estimating their body pose. We propose a
partitioning and labeling formulation of a set of body-part hypotheses
generated with CNN-based part detectors. Our formulation, an instance of an
integer linear program, implicitly performs non-maximum suppression on the set
of part candidates and groups them to form configurations of body parts
respecting geometric and appearance constraints. Experiments on four different
datasets demonstrate state-of-the-art results for both single person and multi
person pose estimation. Models and code are available at
http://pose.mpi-inf.mpg.de. Accepted at the IEEE Conference on Computer Vision
and Pattern Recognition (CVPR 2016).
Regression and Singular Value Decomposition in Dynamic Graphs
Most real-world graphs are dynamic, i.e., they change over time. However, while problems such as regression and Singular Value Decomposition (SVD) have been studied for static graphs, they have not yet been investigated for dynamic graphs. In this paper, we introduce, motivate, and study regression and SVD over dynamic graphs. First, we present the notion of update-efficient matrix embedding, which defines conditions sufficient for a matrix embedding to be used for the dynamic graph regression problem. We prove that, given an update-efficient matrix embedding (e.g., the adjacency matrix), after an update operation in the graph the optimal solution of the graph regression problem for the revised graph can be computed efficiently. We also study dynamic graph regression under least absolute deviation. Then, we characterize a class of matrix embeddings that can be used to efficiently update the SVD of a dynamic graph. For the adjacency matrix and the Laplacian matrix, we study those graph update operations for which the SVD (and low-rank approximation) can be updated efficiently.
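To illustrate why update-efficient embeddings make dynamic regression cheap, one standard trick is to maintain the inverse Gram matrix and refresh the least-squares solution after a rank-one change via the Sherman-Morrison identity. This is a textbook device shown here for illustration, not the paper's specific method:

```python
import numpy as np

def add_row(M, ATb, a, y):
    """Given M = (A^T A)^{-1} and ATb = A^T b, incorporate one new
    row a with target y in O(d^2) time using Sherman-Morrison:

        (A^T A + a a^T)^{-1} = M - (M a)(a^T M) / (1 + a^T M a)

    Returns the updated M, updated A^T b, and the new least-squares
    solution, without touching the n x d design matrix itself."""
    Ma = M @ a
    M = M - np.outer(Ma, Ma) / (1.0 + a @ Ma)
    ATb = ATb + y * a
    return M, ATb, M @ ATb
```

The same pattern (a low-rank change to a cached factorization or inverse) underlies efficient SVD updates for the graph update operations discussed above.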
Online Matrix Completion with Side Information
We give an online algorithm and prove novel mistake and regret bounds for online binary matrix completion with side information. The mistake bounds we prove combine a margin term with a quasi-dimension term. The margin term is analogous to the usual margin term in SVM (Perceptron) bounds. More specifically, if we assume that there is some factorization of the underlying matrix in which the rows of one factor are interpreted as "classifiers" and the rows of the other as "instances", then the margin term is the maximum (normalized) margin over all factorizations consistent with the observed matrix. The quasi-dimension term measures the quality of the side information. In the presence of vacuous side information the quasi-dimension term is large; however, if the side information is predictive of the underlying factorization of the matrix, then in an ideal case it scales with the number of distinct row factors and the number of distinct column factors. We additionally provide a generalization of our algorithm to the inductive setting. In this setting, we provide an example where the side information is not directly specified in advance, and bound the quasi-dimension for this example as well.
On palimpsests in neural memory: an information theory viewpoint
The finite capacity of neural memory and the
reconsolidation phenomenon suggest it is important to be able
to update stored information as in a palimpsest, where new
information overwrites old information. Moreover, changing
information in memory is metabolically costly. In this paper, we
suggest that information-theoretic approaches may inform the
fundamental limits in constructing such a memory system. In
particular, we define malleable coding, which considers not only
representation length but also ease of representation update,
thereby encouraging some form of recycling to convert an old
codeword into a new one. Malleability cost is the difficulty of
synchronizing compressed versions, and malleable codes are of
particular interest when representing information and modifying
the representation are both expensive. We examine the tradeoff
between compression efficiency and malleability cost, under a
malleability metric defined with respect to a string edit distance.
This introduces a metric topology to the compressed domain. We
characterize the exact set of achievable rates and malleability as
the solution of a subgraph isomorphism problem. This is all done
within the optimization approach to biology framework. Accepted manuscript.
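Since the malleability metric above is defined with respect to a string edit distance, the underlying distance can be sketched with the standard Levenshtein dynamic program. This generic implementation is for illustration; the paper's metric may use different edit costs.

```python
def edit_distance(s, t):
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions turning s into t.
    Uses a rolling row so memory is O(len(t))."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,                 # delete from s
                           cur[j - 1] + 1,              # insert into s
                           prev[j - 1] + (cs != ct)))   # substitute
        prev = cur
    return prev[-1]
```

A malleability cost measured this way charges a codeword update by how many symbol edits are needed to turn the old compressed representation into the new one.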