Lipschitz Adaptivity with Multiple Learning Rates in Online Learning
We aim to design adaptive online learning algorithms that take advantage of
any special structure that might be present in the learning task at hand, with
as little manual tuning by the user as possible. A fundamental obstacle that
comes up in the design of such adaptive algorithms is to calibrate a so-called
step-size or learning rate hyperparameter depending on variance, gradient
norms, etc. A recent technique promises to overcome this difficulty by
maintaining multiple learning rates in parallel. This technique has been
applied in the MetaGrad algorithm for online convex optimization and the Squint
algorithm for prediction with expert advice. However, in both cases the user
still has to provide in advance a Lipschitz hyperparameter that bounds the norm
of the gradients. Although this hyperparameter is typically not available in
advance, tuning it correctly is crucial: if it is set too small, the methods
may fail completely; but if it is taken too large, performance deteriorates
significantly. In the present work we remove this Lipschitz hyperparameter by
designing new versions of MetaGrad and Squint that adapt to its optimal value
automatically. We achieve this by dynamically updating the set of active
learning rates. For MetaGrad, we further improve the computational efficiency
of handling constraints on the domain of prediction, and we remove the need to
specify the number of rounds in advance. Comment: 22 pages. To appear in COLT 201
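The idea of maintaining multiple learning rates in parallel can be illustrated with a small sketch. This is not MetaGrad or Squint: it is a generic toy construction, assuming a one-dimensional domain, one projected gradient-descent learner per learning rate, and an exponentially weighted average over the learners. The names `multi_rate_ogd`, `meta_eta`, and `radius` are hypothetical and do not come from the paper.

```python
import math

def multi_rate_ogd(grad_fn, loss_fn, rounds, etas, meta_eta=1.0, radius=1.0):
    """Toy sketch: one projected OGD learner per learning rate,
    mixed by exponential weights on each learner's cumulative loss."""
    points = [0.0 for _ in etas]   # current iterate of each learner
    log_w = [0.0 for _ in etas]    # log-weights of the learners
    total_loss = 0.0
    x = 0.0
    for t in range(rounds):
        # Normalise in log-space for numerical stability.
        m = max(log_w)
        w = [math.exp(lw - m) for lw in log_w]
        z = sum(w)
        # Play the weighted average of the learners' iterates.
        x = sum(wi * p for wi, p in zip(w, points)) / z
        total_loss += loss_fn(x, t)
        for i, eta in enumerate(etas):
            # Penalise each learner by its own loss this round.
            log_w[i] -= meta_eta * loss_fn(points[i], t)
            # Gradient step with this learner's rate, then project onto [-radius, radius].
            step = points[i] - eta * grad_fn(points[i], t)
            points[i] = max(-radius, min(radius, step))
    return x, total_loss
```

On a fixed quadratic loss, the weight concentrates on whichever learning rate in the grid performs best, so the user does not have to pick one in advance. The Lipschitz adaptivity described in the abstract (dynamically updating the set of active learning rates) is not modelled here.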
Crowdsourced PAC Learning under Classification Noise
In this paper, we analyze PAC learnability from labels produced by
crowdsourcing. In our setting, unlabeled examples are drawn from a distribution
and labels are crowdsourced from workers who operate under classification
noise, each with their own noise parameter. We develop an end-to-end
crowdsourced PAC learning algorithm that takes unlabeled data points as input
and outputs a trained classifier. Our three-step algorithm incorporates
majority voting, pure-exploration bandits, and noisy-PAC learning. We prove
several guarantees on the number of tasks labeled by workers for PAC learning
in this setting and show that our algorithm improves upon the baseline by
reducing the total number of tasks given to workers. We demonstrate the
robustness of our algorithm by exploring its application to additional
realistic crowdsourcing settings. Comment: 14 page
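The majority-voting step mentioned above can be sketched as follows. This covers only that one ingredient, not the paper's three-step algorithm, and it assumes for simplicity that all workers share a single noise rate `eta` below 0.5 and err independently, whereas the abstract allows each worker their own noise parameter. The function names are hypothetical.

```python
from math import comb

def majority_vote(labels):
    """Return the majority label among +1/-1 votes (an odd number of votes is assumed)."""
    return 1 if sum(labels) > 0 else -1

def majority_error(k, eta):
    """Probability that the majority of k independent workers is wrong,
    when each worker flips the true label with probability eta."""
    if k % 2 == 0:
        raise ValueError("k must be odd so that ties cannot occur")
    # The majority is wrong when more than half of the workers flip the label.
    return sum(
        comb(k, j) * eta**j * (1 - eta) ** (k - j)
        for j in range(k // 2 + 1, k + 1)
    )
```

Under these assumptions, with `eta = 0.3` a single worker errs with probability 0.3, while a majority of three workers errs with probability 0.216, which illustrates why aggregating several noisy labels can reduce the effective noise rate.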
Semi-Supervised Speech Emotion Recognition with Ladder Networks
Speech emotion recognition (SER) systems find applications in various fields
such as healthcare, education, and security and defense. A major drawback of
these systems is their lack of generalization across different conditions. This
problem can be solved by training models on large amounts of labeled data from
the target domain, which is expensive and time-consuming. Another approach is
to increase the generalization of the models. An effective way to achieve this
goal is by regularizing the models through multitask learning (MTL), where
auxiliary tasks are learned along with the primary task. These methods often
require labeled data for the auxiliary tasks (gender, speaker identity, age, or
other emotional descriptors), which is expensive to collect for emotion
recognition. This study proposes the use of ladder networks for emotion
recognition, which utilize an unsupervised auxiliary task. The primary task is
a regression problem to predict emotional attributes. The auxiliary task is the
reconstruction of intermediate feature representations using a denoising
autoencoder. This auxiliary task does not require labels so it is possible to
train the framework in a semi-supervised fashion with abundant unlabeled data
from the target domain. This study shows that the proposed approach creates a
powerful framework for SER, achieving better performance than fully
supervised single-task learning (STL) and MTL baselines. The approach is
implemented with several acoustic features, showing that ladder networks
generalize significantly better in cross-corpus settings. Compared to the STL
baselines, the proposed approach achieves relative gains in concordance
correlation coefficient (CCC) between 3.0% and 3.5% for within-corpus
evaluations, and between 16.1% and 74.1% for cross-corpus evaluations,
highlighting the power of the architecture.
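For reference, the concordance correlation coefficient used as the metric above can be computed with the standard definition, CCC = 2·cov(x, y) / (var(x) + var(y) + (mean(x) − mean(y))²). The sketch below assumes population (divide-by-n) statistics; the function name `ccc` is hypothetical, and the abstract does not say which estimator the authors used.

```python
def ccc(x, y):
    """Concordance correlation coefficient between two equal-length sequences,
    using population (divide-by-n) variance and covariance."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    # Unlike Pearson correlation, the denominator also penalises a shift in means.
    return 2 * cov / (var_x + var_y + (mean_x - mean_y) ** 2)
```

Identical predictions and targets give a CCC of 1, and a constant offset between them lowers the score even when the two sequences are perfectly correlated.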