Online Matrix Completion with Side Information
We give an online algorithm and prove novel mistake and regret bounds for
online binary matrix completion with side information. The mistake bounds we
prove are of the form $\tilde{O}(\mathcal{D}/\gamma^2)$. The term $1/\gamma^2$ is
analogous to the usual margin term in SVM (perceptron) bounds. More
specifically, if we assume that there is some factorization of the underlying
$m \times n$ matrix into $PQ^\top$, where the rows of $P$ are interpreted
as "classifiers" in $\mathbb{R}^d$ and the rows of $Q$ as "instances" in
$\mathbb{R}^d$, then $\gamma$ is the maximum (normalized) margin over all
factorizations $PQ^\top$ consistent with the observed matrix. The
quasi-dimension term $\mathcal{D}$ measures the quality of the side information. In the
presence of vacuous side information, $\mathcal{D} = m + n$. However, if the side
information is predictive of the underlying factorization of the matrix, then
in an ideal case $\mathcal{D} \in O(k + \ell)$, where $k$ is the number of distinct row
factors and $\ell$ is the number of distinct column factors. We additionally
provide a generalization of our algorithm to the inductive setting. In this
setting, we provide an example where the side information is not directly
specified in advance. For this example, the quasi-dimension $\mathcal{D}$ is now bounded
by $O(k^2 + \ell^2)$.
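The online protocol above is easy to state in code. Below is a minimal sketch of the prediction loop with a generic bilinear perceptron as the updater; this is an illustrative baseline, not the paper's algorithm, which additionally exploits the side information to control the quasi-dimension $\mathcal{D}$. The function name, the rank parameter `d`, and the random initialization are assumptions of this sketch.

```python
# Illustrative sketch only: a bilinear perceptron for online binary
# matrix completion. NOT the paper's algorithm -- it ignores side
# information, which is what drives the quasi-dimension term D.
import numpy as np

def online_matrix_completion(entries, m, n, d=10, seed=0):
    """entries: iterable of (i, j, y) with y in {-1, +1}."""
    rng = np.random.default_rng(seed)
    U = rng.normal(scale=1.0 / np.sqrt(d), size=(m, d))  # row embeddings ("classifiers")
    V = rng.normal(scale=1.0 / np.sqrt(d), size=(n, d))  # column embeddings ("instances")
    mistakes = 0
    for i, j, y in entries:
        y_hat = np.sign(U[i] @ V[j]) or 1.0  # predict the sign of entry (i, j)
        if y_hat != y:
            mistakes += 1
            # Perceptron-style update of both factors on a mistake.
            U[i], V[j] = U[i] + y * V[j], V[j] + y * U[i]
    return mistakes
```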
Surrogate Functions for Maximizing Precision at the Top
The problem of maximizing precision at the top of a ranked list, often dubbed
Precision@k (prec@k), finds relevance in myriad learning applications such as
ranking, multi-label classification, and learning with severe label imbalance.
However, despite its popularity, there exist significant gaps in our
understanding of this problem and its associated performance measure.
The most notable of these is the lack of a convex upper bounding surrogate
for prec@k. We also lack scalable perceptron and stochastic gradient descent
algorithms for optimizing this performance measure. In this paper we make key
contributions in these directions. At the heart of our results is a family of
truly upper bounding surrogates for prec@k. These surrogates are motivated in a
principled manner and enjoy attractive properties such as consistency with
respect to prec@k under various natural margin/noise conditions.
These surrogates are then used to design a class of novel perceptron
algorithms for optimizing prec@k with provable mistake bounds. We also devise
scalable stochastic gradient descent style methods for this problem with
provable convergence bounds. Our proofs rely on novel uniform convergence
bounds which require an in-depth analysis of the structural properties of
prec@k and its surrogates. We conclude with experimental results comparing our
algorithms with state-of-the-art cutting plane and stochastic gradient
algorithms for maximizing prec@k.
Comment: To appear in the proceedings of the 32nd International Conference
on Machine Learning (ICML 2015).
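To make the quantities concrete, here is a small sketch of the prec@k measure itself together with a naive hinge-style penalty on the top-k scored items. The naive penalty is not one of the paper's surrogates (selecting the top k by sorting makes it non-convex in the scores, and it is not guaranteed to upper bound the prec@k loss); it only illustrates what a surrogate must control. All names are illustrative.

```python
# Illustrative sketch: the prec@k measure and a naive top-k hinge
# penalty. The paper constructs convex, truly upper bounding
# surrogates; this naive penalty is neither, and is shown only to
# make the quantities concrete.
import numpy as np

def prec_at_k(scores, labels, k):
    """Fraction of positives (labels in {0, 1}) among the k highest-scoring items."""
    top_k = np.argsort(scores)[::-1][:k]
    return labels[top_k].mean()

def naive_topk_hinge(scores, labels, k):
    """Average hinge loss over the k highest-scoring items, penalizing
    negatives that climb to the top of the ranking."""
    top_k = np.argsort(scores)[::-1][:k]
    y = 2.0 * labels[top_k] - 1.0  # map {0, 1} labels to {-1, +1}
    return np.maximum(0.0, 1.0 - y * scores[top_k]).mean()
```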
Selective Sampling with Drift
Recently there has been much work on selective sampling, an online active
learning setting, in which algorithms work in rounds. On each round an
algorithm receives an input and makes a prediction. Then, it can decide whether
to query a label, and if so to update its model, otherwise the input is
discarded. Most of this work is focused on the stationary case, where it is
assumed that there is a fixed target model, and the performance of the
algorithm is compared to that of this fixed model. However, in many real-world
applications, such as spam prediction, the best target function may drift
gradually over time or shift abruptly from time to time. We develop a novel selective sampling
algorithm for the drifting setting, analyze it under no assumptions on the
mechanism generating the sequence of instances, and derive new mistake bounds
that depend on the amount of drift in the problem. Simulations on synthetic and
real-world datasets demonstrate the superiority of our algorithm over
existing selective sampling methods in the drifting setting.
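The round structure is simple to sketch. Below, an online regularized least-squares learner is combined with the standard margin-based query rule from the stationary selective sampling literature (query with probability $b/(b + |\hat{p}|)$, where $\hat{p}$ is the predicted margin); the paper's algorithm adds a mechanism for discounting old information to track a drifting target, which this sketch omits. Parameter names are assumptions.

```python
# Illustrative sketch: selective sampling with the standard
# margin-based query rule on top of an online regularized
# least-squares learner. A stationary baseline, NOT the paper's
# drift-adapted algorithm.
import numpy as np

def selective_sampling(stream, d, b=1.0, reg=1.0, seed=0):
    """stream: iterable of (x, y) with x in R^d and y in {-1, +1}."""
    rng = np.random.default_rng(seed)
    A = reg * np.eye(d)  # regularized second-moment matrix of queried inputs
    w = np.zeros(d)      # current weight vector
    mistakes = queries = 0
    for x, y in stream:
        margin = w @ x  # signed confidence of the prediction
        if np.sign(margin) != y:
            mistakes += 1
        # Query with probability b / (b + |margin|): uncertain
        # (small-margin) predictions are queried more often.
        if rng.random() < b / (b + abs(margin)):
            queries += 1
            A += np.outer(x, x)
            w = w + np.linalg.solve(A, (y - margin) * x)  # recursive least-squares update
    return mistakes, queries
```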