    Online Similarity Prediction of Networked Data from Known and Unknown Graphs

    We consider online similarity prediction problems over networked data. We begin by relating this task to the more standard class prediction problem, showing that, given an arbitrary algorithm for class prediction, we can construct an algorithm for similarity prediction with "nearly" the same mistake bound, and vice versa. After noticing that this general construction is computationally infeasible, we restrict our study to feasible similarity prediction algorithms on networked data. We initially assume that the network structure is known to the learner. Here we observe that Matrix Winnow [W07] has a near-optimal mistake guarantee, at the price of cubic prediction time per round. This motivates our effort for an efficient implementation of a Perceptron algorithm with a weaker mistake guarantee but only poly-logarithmic prediction time. Our focus then turns to the challenging case of networks whose structure is initially unknown to the learner. In this novel setting, where the network structure is only incrementally revealed, we obtain a mistake-bounded algorithm with quadratic prediction time per round.
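
    As a toy illustration of the online similarity-prediction protocol described above, here is a minimal sketch; the constant guess stands in for a real learner such as the paper's Perceptron variant, and every name in it is our own:

        import itertools
        import random

        # Hypothetical toy setup: graph nodes carrying hidden binary classes.
        random.seed(0)
        n = 8
        hidden_class = {v: random.randint(0, 1) for v in range(n)}

        # Online protocol: on each round the learner sees a pair (i, j),
        # predicts whether they share a class, then receives the true label.
        mistakes = 0
        for i, j in itertools.combinations(range(n), 2):
            prediction = True  # naive constant guess, standing in for a real learner
            truth = hidden_class[i] == hidden_class[j]
            if prediction != truth:
                mistakes += 1

        print(f"mistakes over {n * (n - 1) // 2} rounds: {mistakes}")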

    Mistake Bounds for Binary Matrix Completion

    We study the problem of completing a binary matrix in an online learning setting. On each trial we predict a matrix entry and then receive the true entry. We propose a Matrix Exponentiated Gradient algorithm [1] to solve this problem. We provide a mistake bound for the algorithm, which scales with the margin complexity [2, 3] of the underlying matrix. The bound suggests an interpretation where each row of the matrix is a prediction task over a finite set of objects, the columns. Using this, we show that the algorithm makes a number of mistakes comparable, up to a logarithmic factor, to the number of mistakes made by the Kernel Perceptron with an optimal kernel in hindsight. We discuss applications of the algorithm to predicting as well as the best biclustering, and to the problem of predicting the labeling of a graph without knowing the graph in advance.
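
    For intuition, the following is a minimal sketch of a Matrix Exponentiated Gradient step of the kind referenced in [1], assuming the standard unit-trace formulation; the learning rate and the toy gradient are illustrative, not the paper's exact algorithm:

        import numpy as np
        from scipy.linalg import expm, logm

        def meg_update(W, G, eta):
            """One Matrix Exponentiated Gradient step:
            W_new is proportional to exp(log W - eta * G), renormalised to unit trace.
            W: symmetric positive definite with trace 1; G: symmetric gradient."""
            M = expm(logm(W) - eta * G)
            return M / np.trace(M)

        # Toy usage: start from the maximally uncertain unit-trace matrix.
        d = 4
        W = np.eye(d) / d
        G = np.random.default_rng(0).standard_normal((d, d))
        G = (G + G.T) / 2  # symmetrise the toy gradient
        W = meg_update(W, G, eta=0.1)
        print(np.trace(W))  # ~= 1.0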

    On similarity prediction and pairwise clustering

    We consider the problem of clustering a finite set of items from pairwise similarity information. Unlike what is done in the literature on this subject, we do so in a passive learning setting, and with no specific constraints on the cluster shapes other than their size. We investigate the problem in two settings: (i) an online setting, where we provide a tight characterization of the prediction complexity in the mistake bound model, and (ii) a standard stochastic batch setting, where we give tight upper and lower bounds on the achievable generalization error. Prediction performance is measured both in terms of the ability to recover the similarity function encoding the hidden clustering and in terms of how well we classify each item within the set. The proposed algorithms are time efficient.
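
    For concreteness, the similarity function encoding a hidden clustering can be written out as below; the notation (C, sigma, M) is our own shorthand, not quoted from the paper:

        % Hidden clustering C : V -> {1, ..., K}; induced similarity label on a pair (i, j):
        \[
          \sigma(i, j) =
          \begin{cases}
            +1 & \text{if } C(i) = C(j), \\
            -1 & \text{otherwise,}
          \end{cases}
        \]
        % and in the mistake bound model the learner's cost over T rounds is
        \[
          M = \sum_{t=1}^{T} \mathbf{1}\{\hat{\sigma}_t \neq \sigma(i_t, j_t)\}.
        \]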

    Online Matrix Completion with Side Information

    We give an online algorithm and prove novel mistake and regret bounds for online binary matrix completion with side information. The mistake bounds we prove are of the form $\tilde{O}(D/\gamma^2)$. The term $1/\gamma^2$ is analogous to the usual margin term in SVM (perceptron) bounds. More specifically, if we assume that there is some factorization of the underlying $m \times n$ matrix into $P Q^\intercal$, where the rows of $P$ are interpreted as "classifiers" in $\mathcal{R}^d$ and the rows of $Q$ as "instances" in $\mathcal{R}^d$, then $\gamma$ is the maximum (normalized) margin over all factorizations $P Q^\intercal$ consistent with the observed matrix. The quasi-dimension term $D$ measures the quality of side information. In the presence of vacuous side information, $D = m + n$. However, if the side information is predictive of the underlying factorization of the matrix, then in an ideal case $D \in O(k + \ell)$, where $k$ is the number of distinct row factors and $\ell$ is the number of distinct column factors. We additionally provide a generalization of our algorithm to the inductive setting. In this setting, we provide an example where the side information is not directly specified in advance. For this example, the quasi-dimension $D$ is now bounded by $O(k^2 + \ell^2)$.
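
    The margin term can be made concrete with a worked definition along the following lines, a plausible reconstruction of a normalized-margin notion rather than a quotation from the paper:

        % For a factorization P Q^\intercal consistent with the observed entries
        % U_{ij} in {-1, +1}, with rows p_i of P and q_j of Q:
        \[
          \gamma(P, Q) = \min_{(i, j)\ \mathrm{observed}}
            \frac{U_{ij} \langle p_i, q_j \rangle}{\lVert p_i \rVert \, \lVert q_j \rVert},
          \qquad
          \gamma = \max_{P Q^\intercal\ \mathrm{consistent}} \gamma(P, Q).
        \]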

    Online Reciprocal Recommendation with Theoretical Performance Guarantees

    A reciprocal recommendation problem is one where the goal of learning is not just to predict a user's preference towards a passive item (e.g., a book), but to recommend to a targeted user on one side another user from the other side, such that a mutual interest between the two exists. The problem is thus sharply different from the more traditional items-to-users recommendation, since a good match requires meeting the preferences of both users. We initiate a rigorous theoretical investigation of the reciprocal recommendation task in a specific framework of sequential learning. We point out general limitations, formulate reasonable assumptions enabling effective learning and, under these assumptions, design and analyze a computationally efficient algorithm that uncovers mutual likes at a pace comparable to that achieved by a clairvoyant algorithm knowing all user preferences in advance. Finally, we validate our algorithm against synthetic and real-world datasets, showing improved empirical performance over simple baselines.
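
    Here is a toy sketch of what uncovering mutual likes means in this setting; the exhaustive scan below is a naive placeholder for the paper's sequential algorithm, and all names are our own:

        import random

        # Hypothetical toy instance: two sides of users with hidden preferences.
        random.seed(1)
        left, right = range(5), range(5)
        likes_lr = {(a, b): random.random() < 0.4 for a in left for b in right}
        likes_rl = {(b, a): random.random() < 0.4 for a in left for b in right}

        # A match succeeds only when the interest is mutual.
        mutual = [(a, b) for a in left for b in right
                  if likes_lr[(a, b)] and likes_rl[(b, a)]]
        print(f"mutual likes uncovered: {mutual}")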

    Efficient algorithms for online learning over graphs

    In this thesis we consider the problem of online learning with labelled graphs, in particular designing algorithms that can perform this task quickly and with low memory requirements. We consider the tasks of classification (in which we are asked to predict the labels of vertices) and similarity prediction (in which we are asked to predict whether two given vertices have the same label). The first half of the thesis considers non-probabilistic online learning, where there is no probability distribution on the labelling and we bound the number of mistakes of an algorithm by a function of the labelling's complexity (i.e. its "naturalness"), often the cut-size. The second half of the thesis considers probabilistic machine learning, in which we have a known probability distribution on the labelling. Before considering probabilistic online learning we first analyse the junction tree algorithm, on which we base our online algorithms, and design a new version of it, superior to the otherwise current state of the art. Explicitly, the novel contributions of this thesis are as follows:
    • A new algorithm for online prediction of the labelling of a graph, which has better performance than previous algorithms on certain graph and labelling families.
    • Two algorithms for online similarity prediction on a graph (a novel problem solved in this thesis). One performs very well whilst the other performs less well but runs exponentially faster.
    • A new state-of-the-art junction tree algorithm, better than before in terms of time and space complexity, as well as an application of it to the problem of online learning in an Ising model (see the sketch below).
    • An algorithm that, in linear time, finds the optimal junction tree for online inference in tree-structured Ising models, the resulting online junction tree algorithm being far superior to the previous state of the art.
    All claims in this thesis are supported by mathematical proofs.
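
    As a flavour of the inference such junction tree algorithms perform, here is a minimal sketch of exact marginal computation in a chain-structured Ising model via a forward message pass; the chain topology and parameters are illustrative, and the thesis itself treats general tree-structured models:

        import numpy as np

        # Chain Ising model with n spins s_i in {-1, +1}, coupling J, field h:
        #   P(s) is proportional to exp(J * sum_i s_i * s_{i+1} + h * sum_i s_i).
        # On a chain (a trivial junction tree) exact inference is one forward pass.
        n, J, h = 6, 0.5, 0.1
        spins = np.array([-1.0, 1.0])

        # Forward message m[s] = unnormalised weight of all prefixes ending in spin s.
        m = np.exp(h * spins)  # node potential of the first spin
        for _ in range(n - 1):
            edge = np.exp(J * np.outer(spins, spins))  # pairwise potential
            m = (m[:, None] * edge).sum(axis=0) * np.exp(h * spins)

        Z = m.sum()  # partition function
        print(f"P(s_n = +1) = {m[1] / Z:.4f}")  # exact marginal of the last spin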