Which Features are Learnt by Contrastive Learning? On the Role of Simplicity Bias in Class Collapse and Feature Suppression
Contrastive learning (CL) has emerged as a powerful technique for
representation learning, with or without label supervision. However, supervised
CL is prone to collapsing representations of subclasses within a class by not
capturing all their features, and unsupervised CL may suppress harder
class-relevant features by focusing on learning easy class-irrelevant features;
both significantly compromise representation quality. Yet, there is no
theoretical understanding of \textit{class collapse} or \textit{feature
suppression} at \textit{test} time. We provide the first unified theoretically
rigorous framework to determine \textit{which} features are learnt by CL. Our
analysis indicates that, perhaps surprisingly, the bias of (stochastic)
gradient descent towards finding simpler solutions is a key factor in
collapsing subclass representations and suppressing harder class-relevant
features.
Moreover, we present increasing embedding dimensionality and improving the
quality of data augmentations as two theoretically motivated solutions to
\textit{feature suppression}. We also provide the first theoretical explanation for
why employing supervised and unsupervised CL together yields higher-quality
representations, even when using commonly used stochastic gradient methods.
Comment: to appear at ICML 202
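To make the two objectives concrete, below is a minimal PyTorch sketch of a standard unsupervised (InfoNCE, SimCLR-style) contrastive loss and a supervised (SupCon-style) contrastive loss, combined with an arbitrary weight. This is a generic textbook formulation under assumed shapes and temperature, not the paper's exact setup.

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.5):
    """Unsupervised (SimCLR-style) loss: each sample's positive is its own other view."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature            # pairwise cosine similarities
    targets = torch.arange(z1.size(0))            # positives sit on the diagonal
    return F.cross_entropy(logits, targets)

def sup_con(z, labels, temperature=0.5):
    """Supervised (SupCon-style) loss: all same-class samples are positives."""
    z = F.normalize(z, dim=1)
    sim = z @ z.t() / temperature
    eye = torch.eye(z.size(0), dtype=torch.bool)
    pos = labels.unsqueeze(0).eq(labels.unsqueeze(1)) & ~eye
    # log-softmax over all other samples, averaged over each sample's positives
    log_prob = sim - torch.logsumexp(sim.masked_fill(eye, float('-inf')),
                                     dim=1, keepdim=True)
    return -(log_prob * pos).sum(1).div(pos.sum(1).clamp(min=1)).mean()

# Joint objective over two augmented views of a batch;
# the 0.5 weighting is a hypothetical choice, not from the paper.
z1, z2 = torch.randn(8, 16), torch.randn(8, 16)
labels = torch.randint(0, 4, (8,))
loss = info_nce(z1, z2) + 0.5 * sup_con(z1, labels)
```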
Detection of Review Abuse via Semi-Supervised Binary Multi-Target Tensor Decomposition
Product reviews and ratings on e-commerce websites provide customers with
detailed insights about various aspects of the product such as quality,
usefulness, etc. Since they influence customers' buying decisions, product
reviews have become a fertile ground for abuse by sellers (colluding with
reviewers) to promote their own products or to tarnish the reputation of
competitors' products. In this paper, our focus is on detecting such abusive
entities (both sellers and reviewers) by applying tensor decomposition on the
product reviews data. While tensor decomposition is mostly unsupervised, we
formulate our problem as a semi-supervised binary multi-target tensor
decomposition, to take advantage of currently known abusive entities. We
empirically show that our multi-target semi-supervised model achieves higher
precision and recall in detecting abusive entities as compared to unsupervised
techniques. Finally, we show that our proposed stochastic partial natural
gradient inference for our model empirically achieves faster convergence than
stochastic gradient and Online-EM with sufficient statistics.
Comment: Accepted to the 25th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2019. Contains supplementary material. arXiv admin note: text overlap with arXiv:1804.0383
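For intuition, the sketch below implements a toy CP-style decomposition of a binary (seller x reviewer x product) tensor with a logistic link, plus a semi-supervised logistic term on sellers whose abuse labels are already known. Plain full-batch gradient descent in NumPy stands in for the paper's stochastic partial natural gradient inference; all shapes, the factor rank K, and the weight lam are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)

# Toy binary review tensor: sellers x reviewers x products (hypothetical sizes)
S, R, P, K = 20, 30, 10, 4
X = (rng.random((S, R, P)) < 0.1).astype(float)

# CP factors, plus a linear head mapping seller factors to an abuse score
A, B, C = (rng.normal(scale=0.1, size=(n, K)) for n in (S, R, P))
w = rng.normal(scale=0.1, size=K)

# Semi-supervision: abuse labels known for the first few sellers only
known = np.arange(5)
y = rng.integers(0, 2, size=known.size).astype(float)

lr, lam = 0.05, 1.0
for step in range(200):
    # Reconstruction term: Bernoulli likelihood of the binary tensor entries
    logits = np.einsum('ik,jk,lk->ijl', A, B, C)
    resid = sigmoid(logits) - X                   # dNLL/dlogits
    gA = np.einsum('ijl,jk,lk->ik', resid, B, C)
    gB = np.einsum('ijl,ik,lk->jk', resid, A, C)
    gC = np.einsum('ijl,ik,jk->lk', resid, A, B)
    # Supervised term on sellers with known abusive/benign labels
    p = sigmoid(A[known] @ w)
    gA[known] += lam * (p - y)[:, None] * w
    gw = lam * A[known].T @ (p - y)
    A -= lr * gA; B -= lr * gB; C -= lr * gC; w -= lr * gw

scores = sigmoid(A @ w)   # predicted abuse probability for every seller
```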
Practical recommendations for gradient-based training of deep architectures
Learning algorithms related to artificial neural networks, and in particular
to Deep Learning, may seem to involve many bells and whistles, called
hyper-parameters. This chapter is meant as a practical guide with
hyper-parameters. This chapter is meant as a practical guide with
recommendations for some of the most commonly used hyper-parameters, in
particular in the context of learning algorithms based on back-propagated
gradient and gradient-based optimization. It also discusses how to deal with
the fact that more interesting results can be obtained when many
hyper-parameters are allowed to be adjusted. Overall, it describes elements of
the practice
used to successfully and efficiently train and debug large-scale and often deep
multi-layer neural networks. It closes with open questions about the training
difficulties observed with deeper architectures.
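As a concrete companion to these recommendations, here is a minimal PyTorch training loop that wires together the kinds of hyper-parameters the chapter discusses: learning rate, momentum, mini-batch size, a learning-rate decay schedule, and early-stopping patience. The specific values and the synthetic data are placeholders, not recommendations drawn from the chapter.

```python
import torch
from torch import nn

# Hypothetical hyper-parameter values; the chapter's point is that these
# must be tuned, with the learning rate usually mattering most.
lr, momentum, batch_size, max_epochs, patience = 0.01, 0.9, 128, 100, 10

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=momentum)
sched = torch.optim.lr_scheduler.StepLR(opt, step_size=30, gamma=0.1)

# Synthetic stand-in data: 32 features, 10 classes
X, y = torch.randn(1024, 32), torch.randint(0, 10, (1024,))
Xval, yval = torch.randn(256, 32), torch.randint(0, 10, (256,))

best, bad = float('inf'), 0
for epoch in range(max_epochs):
    perm = torch.randperm(X.size(0))              # reshuffle every epoch
    for i in range(0, X.size(0), batch_size):
        idx = perm[i:i + batch_size]
        loss = nn.functional.cross_entropy(model(X[idx]), y[idx])
        opt.zero_grad(); loss.backward(); opt.step()
    sched.step()                                  # decay the learning rate
    with torch.no_grad():
        val = nn.functional.cross_entropy(model(Xval), yval).item()
    best, bad = (val, 0) if val < best else (best, bad + 1)
    if bad >= patience:                           # early stopping
        break
```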