
    Guarantees for Efficient and Adaptive Online Learning

    In this thesis, we study the problem of adaptive online learning in several different settings. We first study the problem of predicting graph labelings online, where the labeling is assumed to change over time. We develop the machinery of cluster specialists, which probabilistically exploit any cluster structure in the graph. We give a mistake-bounded algorithm that, surprisingly, requires only O(log n) time per trial for an n-vertex graph, an exponential improvement over existing methods. We then consider the model of non-stationary prediction with expert advice with long-term memory guarantees in the sense of Bousquet and Warmuth, in which we learn a small pool of experts. We consider relative entropy projection-based algorithms, giving a linear-time algorithm that improves on the best known regret bound. We show that such projection updates may be advantageous over previous "weight-sharing" approaches when weight updates come with implicit costs, as in portfolio optimization. We give an algorithm to compute the relative entropy projection onto the simplex with non-uniform (lower) box constraints in linear time, which may be of independent interest. We finally extend the model of long-term memory by introducing a new model of adaptive long-term memory. Here the small pool is assumed to change over time, with the trial sequence being partitioned into epochs and a small pool associated with each epoch. We give an efficient linear-time regret-bounded algorithm for this setting and present results in the setting of contextual bandits.
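
    The claimed relative entropy projection onto the simplex with non-uniform lower box constraints admits a short water-filling sketch. The following is a minimal illustration under our own naming, not the thesis's algorithm: by the KKT conditions the projection has the form x_i = max(l_i, c * w_i), and this version finds the multiplier c by sorting, so it runs in O(n log n) rather than the linear time achieved in the thesis.

```python
import numpy as np

def kl_project_lower_box(w, l):
    """KL projection of a distribution w onto {x : x >= l, sum(x) = 1}.

    Solves argmin_x sum_i x_i log(x_i / w_i) subject to the constraints.
    By the KKT conditions the solution is x_i = max(l_i, c * w_i), with c
    chosen so the entries sum to one.  Sort-based sketch, O(n log n);
    the thesis gives a linear-time method.
    """
    w, l = np.asarray(w, float), np.asarray(l, float)
    assert l.sum() <= 1.0, "lower bounds must be feasible"
    order = np.argsort(l / w)              # thresholds at which coordinates unclamp
    ts, ws, ls = (l / w)[order], w[order], l[order]
    prefix_w = np.cumsum(ws)               # mass of the k "free" coordinates
    suffix_l = np.concatenate([np.cumsum(ls[::-1])[::-1], [0.0]])
    for k in range(1, len(w) + 1):         # k = number of free coordinates
        c = (1.0 - suffix_l[k]) / prefix_w[k - 1]
        if ts[k - 1] <= c and (k == len(w) or c <= ts[k]):
            return np.maximum(l, c * w)
    raise ValueError("no valid multiplier found; check that w > 0")
```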

    Mistake Bounds for Binary Matrix Completion

    We study the problem of completing a binary matrix in an online learning setting. On each trial we predict a matrix entry and then receive the true entry. We propose a Matrix Exponentiated Gradient algorithm [1] to solve this problem. We provide a mistake bound for the algorithm, which scales with the margin complexity [2, 3] of the underlying matrix. The bound suggests an interpretation in which each row of the matrix is a prediction task over a finite set of objects, the columns. Using this, we show that the algorithm makes a number of mistakes comparable, up to a logarithmic factor, to the number made by the Kernel Perceptron with an optimal kernel in hindsight. We discuss applications of the algorithm to predicting as well as the best biclustering and to the problem of predicting the labeling of a graph without knowing the graph in advance.
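
    The core update here is a Matrix Exponentiated Gradient step over a density-matrix parameter. The sketch below shows only that generic multiplicative update (in the style of Tsuda, Raetsch and Warmuth [1]), with our own function names; the paper's specific matrix embedding and entry-prediction rule are omitted.

```python
import numpy as np

def _sym_apply(fn, W):
    """Apply a scalar function to a symmetric matrix via its eigensystem."""
    vals, vecs = np.linalg.eigh(W)
    return (vecs * fn(vals)) @ vecs.T

def meg_step(W, G, eta):
    """One Matrix Exponentiated Gradient update: W <- exp(log W - eta G) / Z,
    keeping W symmetric positive definite with unit trace (a density matrix).
    G is the symmetric gradient of the loss at the current W."""
    A = _sym_apply(np.log, W) - eta * G
    W_new = _sym_apply(np.exp, (A + A.T) / 2.0)   # symmetrize against drift
    return W_new / np.trace(W_new)
```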

    Sketch-based Randomized Algorithms for Dynamic Graph Regression

    A well-known problem in data science and machine learning is {\em linear regression}, which has recently been extended to dynamic graphs. Existing exact algorithms for updating the solution of the dynamic graph regression problem require at least linear time (in terms of $n$, the size of the graph). However, this time complexity might be intractable in practice. In the current paper, we utilize the {\em subsampled randomized Hadamard transform} and \textsf{CountSketch} to propose the first randomized algorithms. Suppose that we are given an $n \times m$ matrix embedding $M$ of the graph, where $m \ll n$. Let $r$ be the number of samples required for a guaranteed approximation error, which is a sublinear function of $n$. Our first algorithm reduces the time complexity of pre-processing to $O(n(m + 1) + 2n(m + 1) \log_2(r + 1) + rm^2)$. Then, after an edge insertion or an edge deletion, it updates the approximate solution in $O(rm)$ time. Our second algorithm reduces the time complexity of pre-processing to $O(\mathrm{nnz}(M) + m^3 \epsilon^{-2} \log^7(m/\epsilon))$, where $\mathrm{nnz}(M)$ is the number of nonzero elements of $M$. Then, after an edge insertion, an edge deletion, a node insertion, or a node deletion, it updates the approximate solution in $O(qm)$ time, with $q = O\left(\frac{m^2}{\epsilon^2} \log^6(m/\epsilon)\right)$. Finally, we show that under some assumptions, if $\ln n < \epsilon^{-1}$ our first algorithm outperforms our second algorithm, and if $\ln n \geq \epsilon^{-1}$ our second algorithm outperforms our first.
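
    A minimal sketch of the \textsf{CountSketch} ingredient, under our own naming and without the paper's dynamic-update machinery: hash the rows of $M$ into $r$ signed buckets, touching each nonzero once, then solve the small sketched least-squares problem.

```python
import numpy as np

def countsketch_lstsq(M, y, r, seed=None):
    """Approximate argmin_x ||M x - y|| via a CountSketch of the rows:
    each of the n rows is hashed to one of r buckets with a random sign,
    giving an r x m sketched problem.  Building the sketch costs
    O(nnz(M)); the small solve costs O(r m^2)."""
    rng = np.random.default_rng(seed)
    n, m = M.shape
    bucket = rng.integers(0, r, size=n)        # hash h : [n] -> [r]
    sign = rng.choice([-1.0, 1.0], size=n)     # sign s : [n] -> {-1, +1}
    SM, Sy = np.zeros((r, m)), np.zeros(r)
    np.add.at(SM, bucket, sign[:, None] * M)   # SM = S M
    np.add.at(Sy, bucket, sign * y)            # Sy = S y
    x, *_ = np.linalg.lstsq(SM, Sy, rcond=None)
    return x
```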

    Improved Regret Bounds for Tracking Experts with Memory

    We address the problem of sequential prediction with expert advice in a non-stationary environment with long-term memory guarantees in the sense of Bousquet and Warmuth [4]. We give a linear-time algorithm that improves on the best known regret bounds [26]. This algorithm incorporates a relative entropy projection step. The projection is advantageous over previous weight-sharing approaches when weight updates come with implicit costs, as in, for example, portfolio optimization. We give an algorithm to compute this projection step in linear time, which may be of independent interest.
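
    For contrast with the projection step, here is the classic weight-sharing baseline that such approaches build on: a fixed-share-style update in the style of Herbster and Warmuth. This is a sketch of the prior approach, not the paper's algorithm.

```python
import numpy as np

def fixed_share_step(w, losses, eta, alpha):
    """Classic weight-sharing update: an exponential-weights step on the
    current losses, followed by redistributing a fraction alpha of the
    weight uniformly across the experts, which is what allows previously
    good experts to be recovered later."""
    v = w * np.exp(-eta * losses)
    v /= v.sum()
    return (1.0 - alpha) * v + alpha / len(w)
```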

    DeepCut: Joint Subset Partition and Labeling for Multi Person Pose Estimation

    This paper considers the task of articulated human pose estimation of multiple people in real-world images. We propose an approach that jointly solves the tasks of detection and pose estimation: it infers the number of persons in a scene, identifies occluded body parts, and disambiguates body parts between people in close proximity to each other. This joint formulation is in contrast to previous strategies, which address the problem by first detecting people and subsequently estimating their body pose. We propose a partitioning and labeling formulation over a set of body-part hypotheses generated with CNN-based part detectors. Our formulation, an instance of an integer linear program, implicitly performs non-maximum suppression on the set of part candidates and groups them to form configurations of body parts respecting geometric and appearance constraints. Experiments on four different datasets demonstrate state-of-the-art results for both single-person and multi-person pose estimation. Models and code are available at http://pose.mpi-inf.mpg.de. Accepted at the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2016).
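
    As a toy illustration of the joint partition-and-label objective (not the paper's ILP, CNN detectors, or cost terms; all names and score conventions below are ours), one can brute-force the search over a handful of part hypotheses: each hypothesis gets a body-part label and a cluster (person), and same-cluster pairs contribute a pairwise affinity.

```python
from itertools import product

def partitions(items):
    """All set partitions of a small list (Bell-number many: toy sizes only)."""
    if not items:
        yield []
        return
    head, *rest = items
    for part in partitions(rest):
        for i in range(len(part)):                     # join an existing cluster
            yield part[:i] + [part[i] + [head]] + part[i + 1:]
        yield part + [[head]]                          # or start a new cluster

def best_partition_and_labels(unary, pair, labels):
    """Maximize unary label scores plus pairwise affinities of hypotheses
    placed in the same cluster.  unary[h] maps label -> score; pair[a][b]
    is the affinity of hypotheses a and b."""
    n = len(unary)
    best_score, best = float("-inf"), None
    for part in partitions(list(range(n))):
        for lab in product(labels, repeat=n):
            s = sum(unary[h][lab[h]] for h in range(n))
            s += sum(pair[a][b] for blk in part
                     for a in blk for b in blk if a < b)
            if s > best_score:
                best_score, best = s, (part, lab)
    return best_score, best
```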

    Regression and Singular Value Decomposition in Dynamic Graphs

    Most real-world graphs are {\em dynamic}, i.e., they change over time. However, while problems such as regression and Singular Value Decomposition (SVD) have been studied for {\em static} graphs, they have not yet been investigated for {\em dynamic} graphs. In this paper, we introduce, motivate, and study regression and SVD over dynamic graphs. First, we present the notion of an {\em update-efficient matrix embedding}, which defines conditions sufficient for a matrix embedding to be used for the dynamic graph regression problem (under the $l_2$ norm). We prove that, given an $n \times m$ update-efficient matrix embedding (e.g., the adjacency matrix), after an update operation in the graph the optimal solution of the graph regression problem for the revised graph can be computed in $O(nm)$ time. We also study dynamic graph regression under least absolute deviation. Then, we characterize a class of matrix embeddings that can be used to efficiently update the SVD of a dynamic graph. For the adjacency matrix and the Laplacian matrix, we study those graph update operations for which the SVD (and low-rank approximation) can be updated efficiently.
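
    The flavor of such updates can be conveyed with a generic rank-one least-squares refresh: if a graph update changes a single row of the embedding $M$, the normal-equations solution can be updated via Sherman-Morrison. This is an illustration under our own assumptions (a well-conditioned $M^\intercal M$, one changed row), not the paper's algorithm.

```python
import numpy as np

def refresh_solution(Ainv, Mty, old_row, new_row, y_i):
    """Refresh x* = (M^T M)^{-1} M^T y after row i of M changes from
    old_row to new_row (y_i is the i-th target).  Two Sherman-Morrison
    rank-one updates keep Ainv = (M^T M)^{-1} current in O(m^2) time.
    In practice Ainv should be re-factored occasionally for stability."""
    for u, s in ((old_row, -1.0), (new_row, 1.0)):    # remove old, add new
        Au = Ainv @ u
        Ainv = Ainv - s * np.outer(Au, Au) / (1.0 + s * (u @ Au))
    Mty = Mty + (new_row - old_row) * y_i
    return Ainv @ Mty, Ainv, Mty
```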

    Online Matrix Completion with Side Information

    We give an online algorithm and prove novel mistake and regret bounds for online binary matrix completion with side information. The mistake bounds we prove are of the form $\tilde{O}(D/\gamma^2)$. The term $1/\gamma^2$ is analogous to the usual margin term in SVM (perceptron) bounds. More specifically, if we assume that there is some factorization of the underlying $m \times n$ matrix into $P Q^\intercal$, where the rows of $P$ are interpreted as "classifiers" in $\mathcal{R}^d$ and the rows of $Q$ as "instances" in $\mathcal{R}^d$, then $\gamma$ is the maximum (normalized) margin over all factorizations $P Q^\intercal$ consistent with the observed matrix. The quasi-dimension term $D$ measures the quality of the side information. In the presence of vacuous side information, $D = m + n$. However, if the side information is predictive of the underlying factorization of the matrix, then in an ideal case $D \in O(k + \ell)$, where $k$ is the number of distinct row factors and $\ell$ is the number of distinct column factors. We additionally provide a generalization of our algorithm to the inductive setting. In this setting, we provide an example where the side information is not directly specified in advance. For this example, the quasi-dimension $D$ is now bounded by $O(k^2 + \ell^2)$.
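
    To make the margin term concrete, here is a small sketch (our own naming; the paper's exact normalization may differ) that computes the normalized margin of a given factorization $P Q^\intercal$ on the observed entries.

```python
import numpy as np

def factorization_margin(P, Q, X, observed):
    """Normalized margin of a factorization consistent with the +/-1
    entries X[i, j] on the observed index set: the minimum over observed
    (i, j) of X[i, j] * <p_i/||p_i||, q_j/||q_j||>, reading rows of P as
    'classifiers' and rows of Q as 'instances'."""
    Pn = P / np.linalg.norm(P, axis=1, keepdims=True)
    Qn = Q / np.linalg.norm(Q, axis=1, keepdims=True)
    return min(X[i, j] * float(Pn[i] @ Qn[j]) for i, j in observed)
```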

    On palimpsests in neural memory: an information theory viewpoint

    The finite capacity of neural memory and the reconsolidation phenomenon suggest it is important to be able to update stored information as in a palimpsest, where new information overwrites old information. Moreover, changing information in memory is metabolically costly. In this paper, we suggest that information-theoretic approaches may inform the fundamental limits of constructing such a memory system. In particular, we define malleable coding, which considers not only representation length but also ease of representation update, thereby encouraging some form of recycling to convert an old codeword into a new one. Malleability cost is the difficulty of synchronizing compressed versions, and malleable codes are of particular interest when representing information and modifying the representation are both expensive. We examine the tradeoff between compression efficiency and malleability cost, under a malleability metric defined with respect to a string edit distance. This introduces a metric topology on the compressed domain. We characterize the exact set of achievable rates and malleabilities as the solution of a subgraph isomorphism problem. This is all done within the optimization approach to biology framework.
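
    Since the malleability metric is defined with respect to a string edit distance, a minimal sketch of the cost of converting one codeword into another is just the Levenshtein distance between the two compressed representations (our illustration, not the paper's coding scheme):

```python
def edit_distance(a, b):
    """Levenshtein distance between two codewords, computed row by row in
    O(len(a) * len(b)) time; here it stands in for the malleability cost
    of turning the old compressed representation a into the new one b."""
    d = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        prev, d[0] = d[0], i
        for j in range(1, len(b) + 1):
            prev, d[j] = d[j], min(d[j] + 1,          # delete from a
                                   d[j - 1] + 1,      # insert into a
                                   prev + (a[i - 1] != b[j - 1]))  # substitute
    return d[-1]
```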