How deep is deep enough? -- Quantifying class separability in the hidden layers of deep neural networks
Deep neural networks typically outperform more traditional machine learning
models in their ability to classify complex data, and yet it is not clear how the
individual hidden layers of a deep network contribute to the overall
classification performance. We thus introduce a Generalized Discrimination
Value (GDV) that measures, in a non-invasive manner, how well different data
classes separate in each given network layer. The GDV can be used for the
automatic tuning of hyper-parameters, such as the width profile and the total
depth of a network. Moreover, the layer-dependent GDV(L) provides new insights
into the data transformations that self-organize during training: In the case
of multi-layer perceptrons trained with error backpropagation, we find that
classification of highly complex data sets requires a temporal {\em reduction}
of class separability, marked by a characteristic 'energy barrier' in the
initial part of the GDV(L) curve. Even more surprisingly, for a given data set,
the GDV(L) runs through a fixed 'master curve', independently of the
total number of network layers. Furthermore, applying the GDV to Deep Belief
Networks reveals that unsupervised training with the Contrastive Divergence
method can also systematically increase class separability over tens of
layers, even though the system does not 'know' the desired class labels. These
results indicate that the GDV may become a useful tool to open the black box of
deep learning.
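The idea of a layer-wise separability score can be sketched as follows — a minimal NumPy illustration in the spirit of the GDV (mean within-class distance minus mean between-class distance on z-scored features), not the paper's exact formula, whose scaling constants may differ:

```python
import numpy as np

def separability(X, y):
    """Class-separability score in the spirit of the GDV: mean within-class
    distance minus mean between-class distance on z-scored features.
    More negative = better separated. Illustrative sketch only; the paper's
    exact normalization constants may differ."""
    X = np.asarray(X, dtype=float)
    X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)  # z-score each dimension
    y = np.asarray(y)
    classes = np.unique(y)

    def mean_dist(A, B):
        # mean Euclidean distance over all row pairs of A and B
        return np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1).mean()

    intra = np.mean([mean_dist(X[y == c], X[y == c]) for c in classes])
    inter = np.mean([mean_dist(X[y == a], X[y == b])
                     for i, a in enumerate(classes) for b in classes[i + 1:]])
    return (intra - inter) / np.sqrt(X.shape[1])
```

Applied to the activations of each hidden layer in turn, such a score would trace out the layer-dependent curve the abstract calls GDV(L).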
Contrastive Learning with Prompt-derived Virtual Semantic Prototypes for Unsupervised Sentence Embedding
Contrastive learning has become a new paradigm for unsupervised sentence
embeddings. Previous studies focus on instance-wise contrastive learning,
attempting to construct positive pairs with textual data augmentation. In this
paper, we propose a novel Contrastive learning method with Prompt-derived
Virtual semantic Prototypes (ConPVP). Specifically, with the help of prompts,
we construct virtual semantic prototypes for each instance, and derive negative
prototypes by using the negative form of the prompts. Using a prototypical
contrastive loss, we enforce the anchor sentence embedding to be close to its
corresponding semantic prototypes, and far apart from the negative prototypes
as well as the prototypes of other sentences. Extensive experimental results on
semantic textual similarity, transfer, and clustering tasks demonstrate the
effectiveness of our proposed model compared to strong baselines. Code is
available at https://github.com/lemon0830/promptCSE.
Comment: Findings of EMNLP 202
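The prototypical contrastive loss described above can be sketched as an InfoNCE-style objective — an illustrative reimplementation, not the authors' code; the temperature value and the exact set of negatives are assumptions:

```python
import numpy as np

def prototypical_contrastive_loss(anchors, prototypes, neg_prototypes, tau=0.05):
    """Pull each anchor toward its own semantic prototype and push it away
    from its negative prototype and from other anchors' prototypes.
    Illustrative sketch of a ConPVP-style objective; tau is an assumed value."""
    def normalize(v):
        return v / np.linalg.norm(v, axis=1, keepdims=True)

    a = normalize(anchors)
    p = normalize(prototypes)
    n = normalize(neg_prototypes)
    pos = np.sum(a * p, axis=1) / tau         # similarity to own prototype
    sim_all = a @ p.T / tau                   # similarities to every prototype
    sim_neg = np.sum(a * n, axis=1) / tau     # similarity to negative prototype
    logits = np.concatenate([sim_all, sim_neg[:, None]], axis=1)
    # InfoNCE: cross-entropy with the own-prototype entry as the target
    log_z = np.log(np.exp(logits).sum(axis=1))
    return float(np.mean(log_z - pos))
```

When each anchor matches its own prototype the loss is near zero; mismatched anchor/prototype pairs drive it up, which is the behavior the abstract describes.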
Learning Representation for Clustering via Prototype Scattering and Positive Sampling
Existing deep clustering methods rely on either contrastive or
non-contrastive representation learning for downstream clustering tasks.
Contrastive methods learn uniform representations for clustering thanks to
negative pairs; these negative pairs, however, may inevitably lead to the
class collision issue and consequently compromise the clustering performance.
Non-contrastive methods, on the other hand, avoid the class collision issue,
but the resulting non-uniform representations may cause the collapse of
clustering. To enjoy the strengths of both worlds, this paper
presents a novel end-to-end deep clustering method with prototype scattering
and positive sampling, termed ProPos. Specifically, we first maximize the
distance between prototypical representations, named prototype scattering loss,
which improves the uniformity of representations. Second, we align one
augmented view of an instance with the sampled neighbors of another view --
assumed to be truly positive pairs in the embedding space -- to improve
within-cluster compactness, termed positive sampling alignment. The strengths
of ProPos are the avoidance of the class collision issue, uniform
representations, well-separated clusters, and within-cluster compactness. By
optimizing ProPos
in an end-to-end expectation-maximization framework, we demonstrate through
extensive experiments that ProPos achieves competitive performance on
moderate-scale clustering benchmark datasets and establishes new
state-of-the-art performance on large-scale datasets. Source code is available
at \url{https://github.com/Hzzone/ProPos}.
Comment: Accepted by TPAMI 202
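The two losses can be sketched as follows — an illustrative reimplementation, not the authors' code; the temperature and the mean-squared-error form of the alignment term are assumptions:

```python
import numpy as np

def prototype_scattering_loss(prototypes, tau=0.5):
    """Push cluster prototypes apart on the unit sphere (uniformity).
    Sketch of a prototype-scattering objective in the spirit of ProPos;
    the exact loss and temperature in the paper may differ."""
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    sim = p @ p.T / tau
    np.fill_diagonal(sim, -np.inf)  # a prototype is not its own negative
    # minimize the log-sum-exp of similarities to *other* prototypes
    return float(np.mean(np.log(np.exp(sim).sum(axis=1))))

def positive_sampling_alignment(z_view, z_neighbor):
    """Align one augmented view with a sampled neighbor of the other view
    (within-cluster compactness); a simple mean-squared-error sketch."""
    return float(np.mean(np.sum((z_view - z_neighbor) ** 2, axis=1)))
```

Scattered (e.g. orthogonal) prototypes score lower than collapsed ones, so minimizing the first term spreads clusters apart while the second term tightens each cluster.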
A Theoretical Analysis of Contrastive Unsupervised Representation Learning
Recent empirical works have successfully used unlabeled data to learn feature
representations that are broadly useful in downstream classification tasks.
Several of these methods are reminiscent of the well-known word2vec embedding
algorithm: leveraging availability of pairs of semantically "similar" data
points and "negative samples," the learner forces the inner product of
representations of similar pairs with each other to be higher on average than
with negative samples. The current paper uses the term contrastive learning for
such algorithms and presents a theoretical framework for analyzing them by
introducing latent classes and hypothesizing that semantically similar points
are sampled from the same latent class. This framework allows us to show
provable guarantees on the performance of the learned representations on the
average classification task that is comprised of a subset of the same set of
latent classes. Our generalization bound also shows that learned
representations can reduce (labeled) sample complexity on downstream tasks. We
conduct controlled experiments in both the text and image domains to support
the theory.
Comment: 19 pages, 5 figures
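The objective described above — forcing the inner product of similar pairs to exceed that of negative samples — can be written, for a single negative per anchor, as a logistic loss; a minimal sketch of this objective class, not the paper's full framework:

```python
import numpy as np

def contrastive_loss(f_x, f_pos, f_neg):
    """Logistic contrastive loss with one negative sample per anchor:
    penalizes <f(x), f(x-)> exceeding <f(x), f(x+)>. Minimal sketch of
    the family of objectives the analysis covers."""
    margin = np.sum(f_x * f_neg, axis=1) - np.sum(f_x * f_pos, axis=1)
    return float(np.mean(np.log1p(np.exp(margin))))
```

Under the paper's latent-class view, the positive f_pos is drawn from the anchor's latent class and the negative from an arbitrary one, which is what links minimizing this loss to downstream classification accuracy.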
Automatically Discovering Novel Visual Categories with Self-supervised Prototype Learning
This paper tackles the problem of novel category discovery (NCD), which aims
to discriminate unknown categories in large-scale image collections. The NCD
task is challenging due to its closeness to real-world scenarios, where only
some of the classes and images have been encountered. Unlike other works on
NCD, we leverage prototypes to emphasize the importance of category
discrimination and to alleviate the issue of missing annotations for novel
classes.
Concretely, we propose a novel adaptive prototype learning method consisting of
two main stages: prototypical representation learning and prototypical
self-training. In the first stage, we obtain a robust feature extractor, which
can serve all images from both base and novel categories. The feature
extractor's ability to discriminate instances and categories is boosted by
self-supervised learning and adaptive prototypes. In the second stage, we
utilize the prototypes again to rectify offline pseudo labels and train a final
parametric classifier for category clustering. We conduct extensive experiments
on four benchmark datasets and demonstrate the effectiveness and robustness of
the proposed method with state-of-the-art performance.
Comment: In Submission
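The second-stage use of prototypes to rectify pseudo labels can be sketched as a nearest-prototype assignment under cosine similarity — an illustrative sketch; the function name and the cosine choice are ours, not necessarily the authors':

```python
import numpy as np

def rectify_pseudo_labels(features, prototypes):
    """Assign each image the label of its nearest prototype under cosine
    similarity. Illustrative sketch of prototype-based pseudo-label
    rectification; not the authors' implementation."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    return np.argmax(f @ p.T, axis=1)
```

The rectified labels would then supervise the final parametric classifier for category clustering, as the abstract describes.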