Learning Output Kernels for Multi-Task Problems
Simultaneously solving multiple related learning tasks is beneficial under a
variety of circumstances, but the prior knowledge necessary to correctly model
task relationships is rarely available in practice. In this paper, we develop a
novel kernel-based multi-task learning technique that automatically reveals
structural inter-task relationships. Building on the framework of output
kernel learning (OKL), we introduce a method that jointly learns multiple
functions and a low-rank multi-task kernel by solving a non-convex
regularization problem. Optimization is carried out via a block coordinate
descent strategy, where each subproblem is solved using suitable conjugate
gradient (CG) type iterative methods for linear operator equations. The
effectiveness of the proposed approach is demonstrated on pharmacological and
collaborative filtering data.
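
To make the alternating scheme concrete, here is a minimal sketch of an OKL-style block coordinate descent in NumPy (illustrative, not the paper's exact algorithm): the C-step solves the linear operator equation K C L + lam C = Y that characterizes the coefficient subproblem, here via eigendecompositions in place of the matrix-free CG iterations the abstract describes, and the L-step takes a gradient step on a low-rank factor B with L = B B^T. The function name, step size, and regularization values are assumptions for illustration.

```python
import numpy as np

def okl_bcd(K, Y, lam=1.0, rank=2, n_iters=20, step=1e-3):
    """K: (n, n) input Gram matrix; Y: (n, T) outputs for T tasks."""
    n, T = Y.shape
    rng = np.random.default_rng(0)
    B = 0.1 * rng.standard_normal((T, rank))   # low-rank factor, L = B B^T
    C = np.zeros((n, T))
    for _ in range(n_iters):
        L = B @ B.T
        # C-step: solve K C L + lam * C = Y, the stationarity condition of the
        # coefficient subproblem. The eigendecomposition solve below stands in
        # for the matrix-free CG iterations used at scale.
        dk, Uk = np.linalg.eigh(K)
        dl, Ul = np.linalg.eigh(L)
        Yt = Uk.T @ Y @ Ul
        C = Uk @ (Yt / (np.outer(dk, dl) + lam)) @ Ul.T
        # L-step: gradient step on 0.5 * ||K C L - Y||_F^2 w.r.t. the factor B,
        # which keeps L positive semidefinite and low rank by construction.
        R = K @ C @ L - Y
        B -= step * (C.T @ K @ R + R.T @ K @ C) @ B
    return C, B @ B.T
```

Parameterizing L through a factor B is one simple way to enforce the low-rank structure; CG is preferred at scale because it never materializes the full n*T by n*T linear system.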
Gradient descent for sparse rank-one matrix completion for crowd-sourced aggregation of sparsely interacting workers
We consider worker skill estimation for the single-coin Dawid-Skene crowdsourcing model. In practice, skill estimation is challenging because worker assignments are sparse and irregular due to the arbitrary and uncontrolled availability of workers. We formulate skill estimation as a rank-one correlation-matrix completion problem, where the observed components correspond to observed label correlations between workers. We show that the correlation matrix can be successfully recovered and the skills are identifiable if and only if the sampling matrix (the pattern of observed components) is irreducible and aperiodic. We then propose an efficient gradient descent scheme and show that the skill estimates converge to the desired global optimum for such sampling matrices. Our proof is original, and the result is surprising in light of the fact that even the weighted rank-one matrix factorization problem is NP-hard in general. Next, we derive sample complexity bounds for the noisy case in terms of spectral properties of the signless Laplacian of the sampling matrix. Our proposed scheme achieves state-of-the-art performance on a number of real-world datasets.
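
A minimal sketch of the completion step (the function name, step size, and random sampling pattern are illustrative assumptions, not taken from the paper): gradient descent on the squared error between s_i s_j and the observed correlations, restricted to the observed off-diagonal entries, with the diagonal treated as unobserved as in the crowdsourcing setting.

```python
import numpy as np

def rank_one_complete(C, mask, n_iters=2000, lr=0.05):
    """C: (m, m) observed correlations; mask: boolean (m, m), True where observed."""
    m = C.shape[0]
    rng = np.random.default_rng(0)
    s = rng.uniform(0.1, 0.9, size=m)          # skill estimates
    for _ in range(n_iters):
        R = (np.outer(s, s) - C) * mask        # residual on observed entries only
        s -= lr * (R + R.T) @ s                # gradient of 0.5*sum_Omega (s_i s_j - C_ij)^2
    return s

# Toy usage: true skills and a sparse symmetric sampling pattern. A random
# pattern this dense is almost surely irreducible and aperiodic, so the
# estimates should match the true skills up to a global sign.
rng = np.random.default_rng(1)
s_true = rng.uniform(0.2, 0.9, size=8)
mask = rng.random((8, 8)) < 0.5
mask = np.triu(mask, 1)
mask = mask | mask.T                           # symmetric, zero diagonal
s_hat = rank_one_complete(np.outer(s_true, s_true), mask)
print(np.max(np.abs(np.abs(s_hat) - s_true)))  # small recovery error
```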
Large-scale Multi-label Learning with Missing Labels
The multi-label classification problem has generated significant interest in
recent years. However, existing approaches do not adequately address two key
challenges: (a) the ability to tackle problems with a large number (say
millions) of labels, and (b) the ability to handle data with missing labels. In
this paper, we directly address both these problems by studying the multi-label
problem in a generic empirical risk minimization (ERM) framework. Our
framework, despite being simple, is surprisingly able to encompass several
recent label-compression based methods which can be derived as special cases of
our method. To optimize the ERM problem, we develop techniques that exploit the
structure of specific loss functions - such as the squared loss function - to
offer efficient algorithms. We further show that our learning framework admits
formal excess risk bounds even in the presence of missing labels. Our risk
bounds are tight and demonstrate better generalization performance for low-rank
promoting trace-norm regularization when compared to (rank insensitive)
Frobenius norm regularization. Finally, we present extensive empirical results
on a variety of benchmark datasets and show that our methods perform
significantly better than existing label compression based methods and can
scale up to very large datasets such as the Wikipedia dataset.
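
For intuition, a hedged sketch of the low-rank ERM idea with squared loss and missing labels (an illustrative alternating scheme, not the paper's exact solver): the score matrix is factored as X W H^T with rank k far below the number of labels, only observed label entries contribute to the loss, and a rank constraint plus a ridge penalty stands in for the trace-norm regularizer discussed in the abstract. Names, step sizes, and penalty values are assumptions.

```python
import numpy as np

def lowrank_multilabel(X, Y, mask, k=10, lam=0.1, n_iters=10):
    """X: (n, d) features; Y: (n, L) labels; mask: (n, L) True where Y observed."""
    n, d = X.shape
    L = Y.shape[1]
    rng = np.random.default_rng(0)
    W = 0.01 * rng.standard_normal((d, k))
    H = 0.01 * rng.standard_normal((L, k))
    for _ in range(n_iters):
        # H-step: per-label ridge regression over the rows where that label
        # is observed; the squared loss makes this an exact closed-form solve.
        P = X @ W                                   # (n, k) shared representation
        for l in range(L):
            idx = mask[:, l]
            A = P[idx].T @ P[idx] + lam * np.eye(k)
            H[l] = np.linalg.solve(A, P[idx].T @ Y[idx, l])
        # W-step: gradient step on the masked squared loss (the paper exploits
        # the squared-loss structure for exact solves; a step keeps this short).
        R = (X @ W @ H.T - Y) * mask
        W -= 1e-3 * (X.T @ R @ H + lam * W)
    return W, H
```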
Learning with the Weighted Trace-norm under Arbitrary Sampling Distributions
We provide rigorous guarantees on learning with the weighted trace-norm under
arbitrary sampling distributions. We show that the standard weighted trace-norm
might fail when the sampling distribution is not a product distribution (i.e.
when row and column indexes are not selected independently), present a
corrected variant for which we establish strong learning guarantees, and
demonstrate that it works better in practice. We provide guarantees when
weighting by either the true or empirical sampling distribution, and suggest
that even if the true distribution is known (or is uniform), weighting by the
empirical distribution may be beneficial.
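
As a notational reference for the norm the abstract discusses (the definition of the weighted trace norm is standard; the exact mixing weight in the paper's corrected variant is an assumption here):

```latex
% Weighted trace norm of an n x m matrix X, with row marginals p and
% column marginals q (diag(p), diag(q) are diagonal weighting matrices):
\| X \|_{\mathrm{tr}(p,q)}
  = \left\| \operatorname{diag}(p)^{1/2} \, X \, \operatorname{diag}(q)^{1/2} \right\|_{*}
% A "smoothed" correction mixes each marginal with the uniform distribution
% before weighting; the mixing weight 1/2 below is an illustrative choice:
\tilde{p}_i = \tfrac{1}{2} p_i + \tfrac{1}{2} \cdot \tfrac{1}{n},
\qquad
\tilde{q}_j = \tfrac{1}{2} q_j + \tfrac{1}{2} \cdot \tfrac{1}{m}
```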