To Compress or Not to Compress -- Self-Supervised Learning and Information Theory: A Review
Deep neural networks have demonstrated remarkable performance in supervised
learning tasks but require large amounts of labeled data. Self-supervised
learning offers an alternative paradigm, enabling the model to learn from data
without explicit labels. Information theory has been instrumental in
understanding and optimizing deep neural networks. Specifically, the
information bottleneck principle has been applied to optimize the trade-off
between compression and relevant information preservation in supervised
settings. However, the optimal information objective in self-supervised
learning remains unclear. In this paper, we review various approaches to
self-supervised learning from an information-theoretic standpoint and present a
unified framework that formalizes the self-supervised
information-theoretic learning problem. We integrate existing research into a
coherent framework, examine recent self-supervised methods, and identify
research opportunities and challenges. Moreover, we discuss empirical
measurement of information-theoretic quantities and their estimators. This
paper offers a comprehensive review of the intersection between information
theory, self-supervised learning, and deep neural networks.
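For orientation, the compression-versus-relevance trade-off that the abstract attributes to the information bottleneck principle is commonly written as the Lagrangian below. This is the standard textbook form, not a formula quoted from the review; the symbols $X$ (input), $Y$ (target), $Z$ (learned representation), and $\beta$ (trade-off weight) are notational assumptions.

```latex
\min_{p(z \mid x)} \; I(X; Z) \;-\; \beta \, I(Z; Y), \qquad \beta > 0
```

Minimizing $I(X;Z)$ compresses the input, while the $\beta\, I(Z;Y)$ term rewards keeping information about the target. In the self-supervised setting there is no label $Y$, which is why the abstract describes the optimal objective as unclear.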
Deep Variational Multivariate Information Bottleneck -- A Framework for Variational Losses
Variational dimensionality reduction methods are known for their high
accuracy, generative abilities, and robustness. We introduce a framework to
unify many existing variational methods and design new ones. The framework is
based on an interpretation of the multivariate information bottleneck, in which
an encoder graph, specifying what information to compress, is traded off
against a decoder graph, specifying a generative model. Using this framework,
we rederive existing dimensionality reduction methods including the deep
variational information bottleneck and variational auto-encoders. The framework
naturally introduces a trade-off parameter extending the deep variational CCA
(DVCCA) family of algorithms to beta-DVCCA. We derive a new method, the deep
variational symmetric informational bottleneck (DVSIB), which simultaneously
compresses two variables to preserve information between their compressed
representations. We implement these algorithms and evaluate their ability to
produce shared low-dimensional latent spaces on the Noisy MNIST dataset. We show
that algorithms that are better matched to the structure of the data (in our
case, beta-DVCCA and DVSIB) produce better latent spaces as measured by
classification accuracy, dimensionality of the latent variables, and sample
efficiency. We believe that this framework can be used to unify other
multi-view representation learning algorithms and to derive and implement novel
problem-specific loss functions.
Learning Intrinsic Dimension via Information Bottleneck for Explainable Aspect-based Sentiment Analysis
Gradient-based explanation methods are increasingly used to interpret neural
models in natural language processing (NLP) due to their high fidelity. Such
methods determine word-level importance using dimension-level gradient values
through a norm function, often presuming equal significance for all gradient
dimensions. However, in the context of Aspect-based Sentiment Analysis (ABSA),
our preliminary research suggests that only specific dimensions are pertinent.
To address this, we propose the Information Bottleneck-based Gradient
(IBG) explanation framework for ABSA. This framework leverages an
information bottleneck to refine word embeddings into a concise intrinsic
dimension, maintaining essential features and omitting unrelated information.
Comprehensive tests show that our IBG approach considerably improves
both the models' performance and interpretability by identifying
sentiment-aware features.
Comment: Accepted by COLING 202
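The norm-based scoring described in this abstract can be illustrated with a minimal sketch. It assumes per-word gradients are already available as a `(num_words, embed_dim)` array; the function name `word_importance`, the toy gradient values, and the hand-written `dim_mask` are all hypothetical. In particular, the fixed mask only stands in for the idea that some dimensions matter more than others; it is not the IBG method, which learns the relevant dimensions through an information bottleneck.

```python
import numpy as np

def word_importance(grads, dim_mask=None):
    """Score each word by the L2 norm of its embedding gradient.

    grads:    (num_words, embed_dim) array of gradients w.r.t. word embeddings.
    dim_mask: optional (embed_dim,) array of 0/1 weights; a hypothetical
              stand-in for "only specific dimensions are pertinent".
    """
    if dim_mask is not None:
        # Zero out the dimensions assumed to be irrelevant before taking the norm.
        grads = grads * dim_mask
    # The plain L2 norm treats every remaining dimension as equally significant.
    return np.linalg.norm(grads, axis=1)

# Toy gradients for two words with three embedding dimensions (made-up values).
grads = np.array([[3.0, 4.0, 0.0],
                  [0.0, 0.0, 5.0]])

all_dims = word_importance(grads)                            # every dimension counts
masked = word_importance(grads, np.array([1.0, 1.0, 0.0]))   # last dimension ignored
```

With all dimensions weighted equally, both toy words receive the same score; once the third dimension is masked out, the second word's score drops to zero, which shows how the choice of dimensions can change the word-level ranking.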