
Active Sampling of Pairs and Points for Large-scale Linear Bipartite Ranking

Author: Lin Hsuan-Tien
Shen Wei-Yuan
Publication date: 24/08/2017

Bipartite ranking is a fundamental ranking problem that learns to order relevant instances ahead of irrelevant ones. The pair-wise approach for bipartite ranking constructs a quadratic number of pairs to solve the problem, which is infeasible for large-scale data sets. The point-wise approach, albeit more efficient, often results in inferior performance. That is, it is difficult to conduct bipartite ranking accurately and efficiently at the same time. In this paper, we develop a novel active sampling scheme within the pair-wise approach to conduct bipartite ranking efficiently. The scheme is inspired by active learning and can reach a competitive ranking performance while focusing only on a small subset of the many pairs during training. Moreover, we propose a general Combined Ranking and Classification (CRC) framework to accurately conduct bipartite ranking. The framework unifies point-wise and pair-wise approaches and is simply based on the idea of treating each instance point as a pseudo-pair. Experiments on 14 real-world large-scale data sets demonstrate that the proposed algorithm of Active Sampling within CRC, when coupled with a linear Support Vector Machine, usually outperforms state-of-the-art point-wise and pair-wise ranking approaches in terms of both accuracy and efficiency. Comment: a shorter version was presented in ACML 201
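To make the scaling argument in the abstract concrete, the following minimal Python sketch counts the positive/negative pairs that a full pair-wise approach would build and contrasts that with drawing a small fixed number of pairs. It is only an illustration: the function names `count_pairs` and `sample_pairs` are hypothetical, and the uniform random sampling used here is a stand-in, not the active sampling scheme or the CRC framework proposed in the paper.

```python
import random


def count_pairs(labels):
    """Number of (relevant, irrelevant) pairs a full pair-wise approach builds."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    return n_pos * n_neg


def sample_pairs(labels, k, seed=0):
    """Draw k (relevant index, irrelevant index) pairs uniformly at random.

    Assumption: uniform sampling is used purely for illustration; the paper's
    scheme chooses pairs actively rather than uniformly.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y != 1]
    if not pos or not neg:
        raise ValueError("need at least one relevant and one irrelevant instance")
    return [(rng.choice(pos), rng.choice(neg)) for _ in range(k)]


if __name__ == "__main__":
    labels = [1] * 1000 + [0] * 9000
    print(count_pairs(labels))           # grows as n_pos * n_neg
    print(len(sample_pairs(labels, 50)))  # fixed budget of sampled pairs
```

With 1,000 relevant and 9,000 irrelevant instances the full pair set already holds nine million pairs, which is why training on a small sampled subset is attractive.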

arXiv.org e-Print Archive

CiteSeerX

Active classification with comparison queries

Author: Kane Daniel M.
Lovett Shachar
Moran Shay
Zhang Jiapeng
Publication date: 01/06/2017

We study an extension of active learning in which the learning algorithm may ask the annotator to compare the distances of two examples from the boundary of their label-class. For example, in a recommendation system application (say for restaurants), the annotator may be asked whether she liked or disliked a specific restaurant (a label query), or which one of two restaurants she liked more (a comparison query). We focus on the class of half spaces, and show that under natural assumptions, such as large margin or bounded bit-description of the input examples, it is possible to reveal all the labels of a sample of size $n$ using approximately $O(\log n)$ queries. This implies an exponential improvement over classical active learning, where only label queries are allowed. We complement these results by showing that if any of these assumptions is removed then, in the worst case, $\Omega(n)$ queries are required. Our results follow from a new general framework of active learning with additional queries. We identify a combinatorial dimension, called the inference dimension, that captures the query complexity when each additional query is determined by $O(1)$ examples (such as comparison queries, each of which is determined by the two compared examples). Our results for half spaces follow by bounding the inference dimension in the cases discussed above. Comment: 23 pages (not including references), 1 figure. The new version contains a minor fix in the proof of Lemma 4.
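As a rough intuition for how a logarithmic number of queries can label an entire sample, the sketch below handles the simplest possible case: a one-dimensional threshold classifier over points that are already sorted, using label queries only. This is the textbook binary-search argument, not the comparison-query algorithm or the inference-dimension framework from the paper; the function name `label_all_by_binary_search` and the `oracle` callback are hypothetical names chosen for illustration.

```python
def label_all_by_binary_search(points, oracle):
    """Infer every label of a sorted 1D sample with O(log n) label queries.

    Assumptions (illustrative only): `points` is sorted ascending and
    `oracle(x)` returns +1 when x is at or above an unknown threshold and
    -1 otherwise.
    """
    queries = 0
    lo, hi = 0, len(points)
    # Find the first index whose label is +1.
    while lo < hi:
        mid = (lo + hi) // 2
        queries += 1
        if oracle(points[mid]) > 0:
            hi = mid
        else:
            lo = mid + 1
    # Everything before the boundary is -1, everything from it onward is +1.
    labels = [-1] * lo + [1] * (len(points) - lo)
    return labels, queries


if __name__ == "__main__":
    sample = list(range(1024))
    labels, used = label_all_by_binary_search(sample, lambda x: 1 if x >= 300 else -1)
    print(used, "label queries for", len(sample), "points")
```

Once the points can be ordered relative to the decision boundary, one boundary search determines all remaining labels, so the query count grows with the logarithm of the sample size rather than with the sample size itself.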

arXiv.org e-Print Archive

Crossref

Network Model Selection for Task-Focused Attributed Network Inference

Author: Berger-Wolf Tanya Y.
Brugere Ivan
Kanich Chris
Publication date: 16/09/2017

Networks are models representing relationships between entities. Often these relationships are explicitly given, or we must learn a representation which generalizes and predicts observed behavior in underlying individual data (e.g. attributes or labels). Whether given or inferred, choosing the best representation affects subsequent tasks and questions on the network. This work focuses on model selection to evaluate network representations from data, with an emphasis on fundamental predictive tasks on networks. We present a modular methodology using general, interpretable network models, task neighborhood functions found across domains, and several criteria for robust model selection. We demonstrate our methodology on three online user activity datasets and show that network model selection for the appropriate network task vs. an alternate task increases performance by an order of magnitude in our experiments.

arXiv.org e-Print Archive

Crossref

Learning preferences for large scale multi-label problems

Author: CW Hsu
G Ou
G Tsoumakas
J Allan
J Read
ML Zhang
S Vembu
Y Freund
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2018

Although the majority of machine learning approaches aim to solve binary classification problems, several real-world applications require specialized algorithms able to handle many different classes, as in the case of single-label multi-class and multi-label classification problems. The Label Ranking framework is a generalization of the above-mentioned settings, which aims to map instances from the input space to a total order over the set of possible labels. However, these algorithms are generally more complex than binary ones, and their application to large-scale datasets could be intractable. The main contribution of this work is the proposal of a novel general online preference-based label ranking framework. The proposed framework is able to solve binary, multi-class, multi-label and ranking problems. A comparison with other baselines has been performed, showing effectiveness and efficiency in a real-world large-scale multi-label task.

Crossref

Archivio della ricerca - Fondazione Bruno Kessler

ZORA

Archivio istituzionale della ricerca - Università di Padova