Analysis of Crowdsourced Sampling Strategies for HodgeRank with Sparse Random Graphs
Crowdsourcing platforms are now extensively used for conducting subjective
pairwise comparison studies. In this setting, a pairwise comparison dataset is
typically gathered via random sampling, either \emph{with} or \emph{without}
replacement. In this paper, we use tools from random graph theory to analyze
these two random sampling methods for the HodgeRank estimator. Using the
Fiedler value of the graph as a measure of estimator stability
(informativeness), we provide a new estimate of the Fiedler value for these two
random graph models. In the asymptotic limit as the number of vertices tends to
infinity, we prove the validity of the estimate. Based on our findings, for a
small number of items to be compared, we recommend a two-stage sampling
strategy where a greedy sampling method is used initially and random sampling
\emph{without} replacement is used in the second stage. When a large number of
items is to be compared, we recommend random sampling \emph{with} replacement as this
is computationally inexpensive and trivially parallelizable. Experiments on
synthetic and real-world datasets support our analysis.
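To make the stability criterion concrete, here is a minimal sketch (not from the paper) of how the two sampling schemes can be compared empirically: sample pairs with or without replacement, build the comparison graph, and read off the Fiedler value as the second-smallest eigenvalue of the graph Laplacian. The NumPy implementation and the problem sizes are illustrative assumptions.

```python
import numpy as np
from itertools import combinations

def fiedler_value(adj):
    """Second-smallest eigenvalue of the (unnormalized) graph Laplacian."""
    lap = np.diag(adj.sum(axis=1)) - adj
    return np.linalg.eigvalsh(lap)[1]

def sample_graph(n_items, n_samples, replace, rng):
    """Comparison graph from random pair sampling, with or without replacement."""
    pairs = list(combinations(range(n_items), 2))
    idx = rng.choice(len(pairs), size=n_samples, replace=replace)
    adj = np.zeros((n_items, n_items))
    for k in idx:
        i, j = pairs[k]
        adj[i, j] = adj[j, i] = 1  # repeated draws collapse onto one edge
    return adj

# Illustrative sizes: 20 items, 60 sampled pairs, 100 Monte Carlo repetitions
rng = np.random.default_rng(0)
n, m = 20, 60
for replace in (True, False):
    vals = [fiedler_value(sample_graph(n, m, replace, rng)) for _ in range(100)]
    print(f"replacement={replace}: mean Fiedler value = {np.mean(vals):.3f}")
```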
HodgeRank with Information Maximization for Crowdsourced Pairwise Ranking Aggregation
Recently, crowdsourcing has emerged as an effective paradigm for
human-powered large-scale problem solving in various domains. However, a task
requester usually has a limited budget, so it is desirable to have a policy
that allocates the budget wisely to achieve better quality. In this
paper, we study the principle of information maximization for active sampling
strategies in the framework of HodgeRank, an approach based on Hodge
Decomposition of pairwise ranking data with multiple workers. The principle
exhibits two scenarios of active sampling: Fisher information maximization that
leads to unsupervised sampling based on a sequential maximization of graph
algebraic connectivity without considering labels; and Bayesian information
maximization that selects samples with the largest information gain from prior
to posterior, which gives a supervised sampling involving the labels collected.
Experiments show that the proposed methods boost sampling efficiency compared
to traditional sampling schemes and are thus valuable for practical
crowdsourcing experiments.
Comment: Accepted by AAAI201
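For intuition, the unsupervised (Fisher) scenario can be caricatured as a greedy loop that repeatedly adds whichever candidate pair most increases the algebraic connectivity of the comparison graph. The sketch below only illustrates that idea; the random tie-breaking (needed because the Fiedler value is zero while the graph is disconnected) and the brute-force candidate scan are assumptions, not the authors' implementation.

```python
import numpy as np
from itertools import combinations

def algebraic_connectivity(adj):
    """Fiedler value: second-smallest eigenvalue of the graph Laplacian."""
    lap = np.diag(adj.sum(axis=1)) - adj
    return np.linalg.eigvalsh(lap)[1]

def greedy_sample(n_items, budget, seed=0):
    """Sequentially pick the pair whose addition maximizes connectivity."""
    rng = np.random.default_rng(seed)
    cands = list(combinations(range(n_items), 2))
    adj = np.zeros((n_items, n_items))
    chosen = []
    for _ in range(budget):
        vals = np.empty(len(cands))
        for k, (i, j) in enumerate(cands):
            adj[i, j] += 1; adj[j, i] += 1   # tentatively add the edge
            vals[k] = algebraic_connectivity(adj)
            adj[i, j] -= 1; adj[j, i] -= 1   # undo
        # Random tie-break: every candidate scores zero until the graph connects
        best = rng.choice(np.flatnonzero(vals >= vals.max() - 1e-12))
        i, j = cands[best]
        adj[i, j] += 1; adj[j, i] += 1
        chosen.append((i, j))
    return chosen

print(greedy_sample(6, 8))  # first 8 pairs to query for 6 items
```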
Sparse Recovery via Differential Inclusions
In this paper, we recover sparse signals from their noisy linear measurements
by solving nonlinear differential inclusions, which is based on the notion of
inverse scale space (ISS) developed in applied mathematics. Our goal here is to
bring this idea to address a challenging problem in statistics, \emph{i.e.}
finding the oracle estimator which is unbiased and sign-consistent using
dynamics. We call our dynamics \emph{Bregman ISS} and \emph{Linearized Bregman
ISS}. A well-known shortcoming of LASSO and other convex regularization
approaches lies in the bias of their estimators. However, we show that under proper
conditions, there exists a bias-free and sign-consistent point on the solution
paths of such dynamics, which corresponds to a signal that is an unbiased
estimate of the true signal and whose entries have the same signs as the true
signal, \emph{i.e.} the oracle estimator. Therefore, their solution
paths are regularization paths better than the LASSO regularization path, since
the points on the latter path are biased when sign-consistency is reached. We
also show how to efficiently compute their solution paths in both continuous
and discretized settings: the full solution paths can be exactly computed piece
by piece, and a discretization leads to \emph{Linearized Bregman iteration},
which is a simple iterative thresholding rule and easy to parallelize.
Theoretical guarantees such as sign-consistency and minimax optimal $\ell_2$-error
bounds are established in both continuous and discrete settings for specific
points on the paths. Early-stopping rules for identifying these points are
given. The key treatment relies on the development of differential inequalities
for differential inclusions and their discretizations, which extends the
previous results and leads to exponentially fast recovery of sparse signals
before wrong ones are selected.
Comment: In Applied and Computational Harmonic Analysis, 201
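To illustrate the discretization, here is a minimal sketch of the classical Linearized Bregman iteration for the model problem $\min \|x\|_1$ subject to $Ax = y$: a residual-driven update of an auxiliary variable followed by soft-thresholding. The parameter choices (kappa, step size) and the toy demo are illustrative assumptions, not tuned settings from the paper.

```python
import numpy as np

def soft_threshold(z, lam):
    """Componentwise shrinkage: sign(z) * max(|z| - lam, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def linearized_bregman(A, y, kappa=5.0, n_iter=2000):
    """Linearized Bregman iteration: a simple iterative thresholding rule.

    Produces a discrete solution path x_0, x_1, ...; an early-stopping rule
    would pick one point on this path (cf. the abstract above).
    """
    n = A.shape[1]
    step = 1.0 / (kappa * np.linalg.norm(A, 2) ** 2)  # conservative step size
    z = np.zeros(n)
    x = np.zeros(n)
    path = []
    for _ in range(n_iter):
        z = z + step * A.T @ (y - A @ x)    # residual step on auxiliary variable
        x = kappa * soft_threshold(z, 1.0)  # thresholding step
        path.append(x.copy())
    return np.array(path)

# Toy demo: recover a 3-sparse signal from noisy Gaussian measurements
rng = np.random.default_rng(1)
A = rng.standard_normal((50, 100)) / np.sqrt(50)
x_true = np.zeros(100)
x_true[[3, 30, 77]] = [2.0, -1.5, 1.0]
y = A @ x_true + 0.01 * rng.standard_normal(50)
path = linearized_bregman(A, y)
print("support of the last iterate:", np.flatnonzero(path[-1]))
```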
Stochastic Non-convex Ordinal Embedding with Stabilized Barzilai-Borwein Step Size
Learning representations from relative similarity comparisons, often called
ordinal embedding, has gained increasing attention in recent years. Most
existing methods are batch methods designed mainly around convex optimization,
e.g., the projected gradient descent method. However, they are generally
time-consuming because a singular value decomposition (SVD) is commonly
required at each update, especially when the data size is very large. To
overcome this challenge, we propose a stochastic algorithm called SVRG-SBB,
which has the following features: (a) it is SVD-free by dropping convexity,
with good scalability thanks to the use of a stochastic algorithm, namely
stochastic variance reduced gradient (SVRG), and (b) it uses an adaptive step
size given by a new stabilized Barzilai-Borwein (SBB) method, since the
original version designed for convex problems might fail for the considered
stochastic \textit{non-convex}
optimization problem. Moreover, we show that the proposed algorithm converges
to a stationary point at a rate of $\mathcal{O}(1/T)$ in our setting, where
$T$ is the number of total iterations. Numerous simulations and
real-world data experiments are conducted to show the effectiveness of the
proposed algorithm compared with state-of-the-art methods; in particular, it
achieves much lower computational cost with good prediction performance.
Comment: 11 pages, 3 figures, 2 tables, accepted by AAAI201
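To convey the flavor of the SBB step size, the sketch below plugs a Barzilai-Borwein step, stabilized by an extra term in the denominator, into a plain SVRG outer loop, following the common convention of dividing the BB step by the inner-loop length. The exact stabilization, the epsilon value, and the toy least-squares objective are assumptions for illustration, not the paper's precise rule.

```python
import numpy as np

def sbb_step(s, g_diff, m, eps=1e-3):
    """BB-type step with a stabilization term in the denominator.

    Plain BB uses ||s||^2 / (m * |s.T g_diff|); the extra eps*||s||^2 term
    keeps the step bounded when s.T g_diff is near zero, which can happen
    for non-convex problems.
    """
    ss = s @ s
    return ss / (m * (abs(s @ g_diff) + eps * ss) + 1e-12)

def svrg_sbb(grad_i, n_samples, x0, n_epochs=20, m=100, seed=0):
    """SVRG outer loop whose step size is reset by the SBB rule each epoch."""
    rng = np.random.default_rng(seed)
    x = x0.copy()
    x_prev = g_prev = None
    eta = 0.01  # initial step before two snapshots are available
    for _ in range(n_epochs):
        full_grad = np.mean([grad_i(x, i) for i in range(n_samples)], axis=0)
        if x_prev is not None:
            eta = sbb_step(x - x_prev, full_grad - g_prev, m)
        x_prev, g_prev = x.copy(), full_grad.copy()
        w = x.copy()
        for _ in range(m):
            i = rng.integers(n_samples)
            v = grad_i(w, i) - grad_i(x, i) + full_grad  # variance-reduced grad
            w = w - eta * v
        x = w
    return x

# Toy usage: per-sample least-squares losses (illustrative only)
rng = np.random.default_rng(2)
A = rng.standard_normal((200, 10))
b = A @ rng.standard_normal(10)
grad_i = lambda x, i: (A[i] @ x - b[i]) * A[i]
x_hat = svrg_sbb(grad_i, 200, np.zeros(10))
print("residual norm:", np.linalg.norm(A @ x_hat - b))
```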