5,282 research outputs found
Optimal Data Collection For Informative Rankings Expose Well-Connected Graphs
Given a graph where vertices represent alternatives and arcs represent
pairwise comparison data, the statistical ranking problem is to find a
potential function, defined on the vertices, such that the gradient of the
potential function agrees with the pairwise comparisons. Our goal in this paper
is to develop a method for collecting data for which the least squares
estimator for the ranking problem has maximal Fisher information. Our approach,
based on experimental design, is to view data collection as a bi-level
optimization problem where the inner problem is the ranking problem and the
outer problem is to identify data which maximizes the informativeness of the
ranking. Under certain assumptions, the data collection problem decouples,
reducing to a problem of finding multigraphs with large algebraic connectivity.
This reduction of the data collection problem to graph-theoretic questions is
one of the primary contributions of this work. As an application, we study the
Yahoo! Movie user rating dataset and demonstrate that the addition of a small
number of well-chosen pairwise comparisons can significantly increase the
Fisher informativeness of the ranking. As another application, we study the
2011-12 NCAA football schedule and propose schedules with the same number of
games which are significantly more informative. Using spectral clustering
methods to identify highly-connected communities within the division, we argue
that the NCAA could improve its notoriously poor rankings by simply scheduling
more out-of-conference games.Comment: 31 pages, 10 figures, 3 table
Analysis of Crowdsourced Sampling Strategies for HodgeRank with Sparse Random Graphs
Crowdsourcing platforms are now extensively used for conducting subjective
pairwise comparison studies. In this setting, a pairwise comparison dataset is
typically gathered via random sampling, either \emph{with} or \emph{without}
replacement. In this paper, we use tools from random graph theory to analyze
these two random sampling methods for the HodgeRank estimator. Using the
Fiedler value of the graph as a measurement for estimator stability
(informativeness), we provide a new estimate of the Fiedler value for these two
random graph models. In the asymptotic limit as the number of vertices tends to
infinity, we prove the validity of the estimate. Based on our findings, for a
small number of items to be compared, we recommend a two-stage sampling
strategy where a greedy sampling method is used initially and random sampling
\emph{without} replacement is used in the second stage. When a large number of
items is to be compared, we recommend random sampling with replacement as this
is computationally inexpensive and trivially parallelizable. Experiments on
synthetic and real-world datasets support our analysis
Gossip Dual Averaging for Decentralized Optimization of Pairwise Functions
In decentralized networks (of sensors, connected objects, etc.), there is an
important need for efficient algorithms to optimize a global cost function, for
instance to learn a global model from the local data collected by each
computing unit. In this paper, we address the problem of decentralized
minimization of pairwise functions of the data points, where these points are
distributed over the nodes of a graph defining the communication topology of
the network. This general problem finds applications in ranking, distance
metric learning and graph inference, among others. We propose new gossip
algorithms based on dual averaging which aims at solving such problems both in
synchronous and asynchronous settings. The proposed framework is flexible
enough to deal with constrained and regularized variants of the optimization
problem. Our theoretical analysis reveals that the proposed algorithms preserve
the convergence rate of centralized dual averaging up to an additive bias term.
We present numerical simulations on Area Under the ROC Curve (AUC) maximization
and metric learning problems which illustrate the practical interest of our
approach
- …