1,020 research outputs found
Contrastive representation learning: a framework and review
Contrastive Learning has recently received interest due to its success in self-supervised representation learning in the computer vision domain. However, the origins of Contrastive Learning date as far back as the 1990s and its development has spanned across many fields and domains including Metric Learning and natural language processing. In this paper, we provide a comprehensive literature review and we propose a general Contrastive Representation Learning framework that simplifies and unifies many different contrastive learning methods. We also provide a taxonomy for each of the components of contrastive learning in order to summarise it and distinguish it from other forms of machine learning. We then discuss the inductive biases which are present in any contrastive learning system and we analyse our framework under different views from various sub-fields of Machine Learning. Examples of how contrastive learning has been applied in computer vision, natural language processing, audio processing, and others, as well as in Reinforcement Learning are also presented. Finally, we discuss the challenges and some of the most promising future research directions ahead
Beyond Unimodal: Generalising Neural Processes for Multimodal Uncertainty Estimation
Uncertainty estimation is an important research area to make deep neural
networks (DNNs) more trustworthy. While extensive research on uncertainty
estimation has been conducted with unimodal data, uncertainty estimation for
multimodal data remains a challenge. Neural processes (NPs) have been
demonstrated to be an effective uncertainty estimation method for unimodal data
by providing the reliability of Gaussian processes with efficient and powerful
DNNs. While NPs hold significant potential for multimodal uncertainty
estimation, the adaptation of NPs for multimodal data has not been carefully
studied. To bridge this gap, we propose Multimodal Neural Processes (MNPs) by
generalising NPs for multimodal uncertainty estimation. Based on the framework
of NPs, MNPs consist of several novel and principled mechanisms tailored to the
characteristics of multimodal data. In extensive empirical evaluation, our
method achieves state-of-the-art multimodal uncertainty estimation performance,
showing its appealing robustness against noisy samples and reliability in
out-of-distribution detection with faster computation time compared to the
current state-of-the-art multimodal uncertainty estimation method.Comment: Accepted to NeurIPS 202
Audio self-supervised learning: a survey
Inspired by the humans' cognitive ability to generalise knowledge and skills,
Self-Supervised Learning (SSL) targets at discovering general representations
from large-scale data without requiring human annotations, which is an
expensive and time consuming task. Its success in the fields of computer vision
and natural language processing have prompted its recent adoption into the
field of audio and speech processing. Comprehensive reviews summarising the
knowledge in audio SSL are currently missing. To fill this gap, in the present
work, we provide an overview of the SSL methods used for audio and speech
processing applications. Herein, we also summarise the empirical works that
exploit the audio modality in multi-modal SSL frameworks, and the existing
suitable benchmarks to evaluate the power of SSL in the computer audition
domain. Finally, we discuss some open problems and point out the future
directions on the development of audio SSL
Herding as a Learning System with Edge-of-Chaos Dynamics
Herding defines a deterministic dynamical system at the edge of chaos. It
generates a sequence of model states and parameters by alternating parameter
perturbations with state maximizations, where the sequence of states can be
interpreted as "samples" from an associated MRF model. Herding differs from
maximum likelihood estimation in that the sequence of parameters does not
converge to a fixed point and differs from an MCMC posterior sampling approach
in that the sequence of states is generated deterministically. Herding may be
interpreted as a"perturb and map" method where the parameter perturbations are
generated using a deterministic nonlinear dynamical system rather than randomly
from a Gumbel distribution. This chapter studies the distinct statistical
characteristics of the herding algorithm and shows that the fast convergence
rate of the controlled moments may be attributed to edge of chaos dynamics. The
herding algorithm can also be generalized to models with latent variables and
to a discriminative learning setting. The perceptron cycling theorem ensures
that the fast moment matching property is preserved in the more general
framework
MINIMALIST: Mutual INformatIon Maximization for Amortized Likelihood Inference from Sampled Trajectories
Simulation-based inference enables learning the parameters of a model even
when its likelihood cannot be computed in practice. One class of methods uses
data simulated with different parameters to infer an amortized estimator for
the likelihood-to-evidence ratio, or equivalently the posterior function. We
show that this approach can be formulated in terms of mutual information
maximization between model parameters and simulated data. We use this
equivalence to reinterpret existing approaches for amortized inference and
propose two new methods that rely on lower bounds of the mutual information. We
apply our framework to the inference of parameters of stochastic processes and
chaotic dynamical systems from sampled trajectories, using artificial neural
networks for posterior prediction. Our approach provides a unified framework
that leverages the power of mutual information estimators for inference
Data-driven robotic manipulation of cloth-like deformable objects : the present, challenges and future prospects
Manipulating cloth-like deformable objects (CDOs) is a long-standing problem in the robotics community. CDOs are flexible (non-rigid) objects that do not show a detectable level of compression strength while two points on the article are pushed towards each other and include objects such as ropes (1D), fabrics (2D) and bags (3D). In general, CDOs’ many degrees of freedom (DoF) introduce severe self-occlusion and complex state–action dynamics as significant obstacles to perception and manipulation systems. These challenges exacerbate existing issues of modern robotic control methods such as imitation learning (IL) and reinforcement learning (RL). This review focuses on the application details of data-driven control methods on four major task families in this domain: cloth shaping, knot tying/untying, dressing and bag manipulation. Furthermore, we identify specific inductive biases in these four domains that present challenges for more general IL and RL algorithms.Publisher PDFPeer reviewe
- …