
A Commentary on the Unsupervised Learning of Disentangled Representations

Author: Bachem Olivier
Bauer Stefan
Gelly Sylvain
Locatello Francesco
Lucic Mario
Rätsch Gunnar
Schölkopf Bernhard
Publication date: 03/04/2020

The goal of the unsupervised learning of disentangled representations is to separate the independent explanatory factors of variation in the data without access to supervision. In this paper, we summarize the results of Locatello et al., 2019, and focus on their implications for practitioners. We discuss the theoretical result showing that the unsupervised learning of disentangled representations is fundamentally impossible without inductive biases, along with the practical challenges it entails. Finally, we comment on our experimental findings, highlighting the limitations of state-of-the-art approaches and directions for future research.

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications
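As a toy illustration of the non-identifiability behind the impossibility result (not the construction from the paper, which covers general factorized priors), the Python/NumPy sketch below rotates a two-dimensional isotropic Gaussian latent code by 45 degrees. The rotated code has the same distribution as the original, so data generated through either one looks identical, yet every rotated coordinate mixes both original factors. The helper `rotate_latents` and all parameter values are hypothetical choices made for this illustration.

```python
import numpy as np


def rotate_latents(z: np.ndarray, angle: float) -> np.ndarray:
    """Rotate each row of z (shape [n, 2]) by `angle` radians (hypothetical helper)."""
    c, s = np.cos(angle), np.sin(angle)
    rotation = np.array([[c, -s], [s, c]])
    return z @ rotation.T


rng = np.random.default_rng(0)

# "True" factors of variation: two independent standard normal variables.
z = rng.standard_normal((100_000, 2))

# Entangled alternative: each coordinate is a mixture of both true factors.
z_rot = rotate_latents(z, np.pi / 4)

# Both codes have (approximately) identity covariance, i.e. the same
# isotropic Gaussian distribution, so they cannot be told apart from data...
cov_true = np.cov(z, rowvar=False)
cov_rot = np.cov(z_rot, rowvar=False)

# ...yet the first rotated coordinate is strongly correlated with BOTH true factors.
corr_with_factor_0 = np.corrcoef(z_rot[:, 0], z[:, 0])[0, 1]
corr_with_factor_1 = np.corrcoef(z_rot[:, 0], z[:, 1])[0, 1]

print(np.round(cov_true, 2))
print(np.round(cov_rot, 2))
print(round(corr_with_factor_0, 2), round(corr_with_factor_1, 2))
```

Without an inductive bias that prefers one of these equally likely codes, a learner has no signal for choosing the axis-aligned (disentangled) one.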

Understanding Self-Supervised Learning of Speech Representation via Invariance and Redundancy Reduction

Author: Brima Yusuf
Heidemann Gunther
Krumnack Ulf
Pika Simone
Publication date: 24/01/2024

Self-supervised learning (SSL) has emerged as a promising paradigm for learning flexible speech representations from unlabeled data. By designing pretext tasks that exploit statistical regularities, SSL models can capture useful representations that are transferable to downstream tasks. This study provides an empirical analysis of Barlow Twins (BT), an SSL technique inspired by theories of redundancy reduction in human perception. On downstream tasks, BT representations accelerated learning and transferred across domains. However, limitations exist in disentangling key explanatory factors, with redundancy reduction and invariance alone insufficient for factorization of learned latents into modular, compact, and informative codes. Our ablation study isolated gains from invariance constraints, but the gains were context-dependent. Overall, this work substantiates the potential of Barlow Twins for sample-efficient speech encoding. However, challenges remain in achieving fully hierarchical representations. The analysis methodology and insights pave a path for extensions incorporating further inductive priors and perceptual principles to enhance the BT self-supervision framework.

Comment: 13 pages, 5 figures, in submission to MDPI Informatio

arXiv.org e-Print Archive
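The "invariance" and "redundancy reduction" named in the abstract above correspond to the two terms of the Barlow Twins objective as commonly formulated: the cross-correlation matrix between embeddings of two augmented views of the same batch is pushed toward the identity, with diagonal entries driven to 1 (invariance) and off-diagonal entries driven to 0 (redundancy reduction). The NumPy sketch below is a minimal illustration of that objective, not the authors' speech pipeline. The function name `barlow_twins_loss`, the weight `lambd = 5e-3`, and the plain per-dimension standardization (used here in place of a batch-normalization layer) are assumptions.

```python
import numpy as np


def barlow_twins_loss(z_a: np.ndarray, z_b: np.ndarray,
                      lambd: float = 5e-3, eps: float = 1e-9) -> float:
    """Barlow Twins-style objective for two [batch, dim] embedding batches.

    Hypothetical helper; `lambd` weights the redundancy-reduction term.
    """
    n = z_a.shape[0]
    # Standardize each embedding dimension across the batch.
    z_a = (z_a - z_a.mean(axis=0)) / (z_a.std(axis=0) + eps)
    z_b = (z_b - z_b.mean(axis=0)) / (z_b.std(axis=0) + eps)
    # Cross-correlation matrix between the two views, shape [dim, dim].
    c = z_a.T @ z_b / n
    on_diag = np.diagonal(c)
    # Invariance term: both views should agree on every dimension (C_ii -> 1).
    invariance = np.sum((1.0 - on_diag) ** 2)
    # Redundancy-reduction term: different dimensions should be decorrelated (C_ij -> 0).
    off_diag = c - np.diag(on_diag)
    redundancy = np.sum(off_diag ** 2)
    return float(invariance + lambd * redundancy)


# Usage: identical views give a near-zero loss, a noisy view a larger one,
# and an unrelated batch the largest (its diagonal correlations are near 0).
rng = np.random.default_rng(0)
x = rng.standard_normal((512, 8))
print(barlow_twins_loss(x, x))
print(barlow_twins_loss(x, x + 0.5 * rng.standard_normal((512, 8))))
print(barlow_twins_loss(x, rng.standard_normal((512, 8))))
```

Because both terms act only on pairwise correlations, the objective can yield decorrelated latents without forcing each dimension to align with a single explanatory factor. This is consistent with the abstract's finding that invariance and redundancy reduction alone do not suffice for disentanglement.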