An Overview of Deep Semi-Supervised Learning
Deep neural networks have demonstrated remarkable performance on a wide range
of supervised learning tasks (e.g., image
classification) when trained on extensive collections of labeled data (e.g.,
ImageNet). However, creating such large datasets requires a considerable amount
of resources, time, and effort. Such resources may not be available in many
practical cases, limiting the adoption and the application of many deep
learning methods. In the search for more data-efficient deep learning methods
that overcome the need for large annotated datasets, there is rising research
interest in semi-supervised learning and its application to deep neural
networks, either by developing novel methods or by adapting existing
semi-supervised frameworks to the deep learning setting. In this paper, we
provide a comprehensive overview of
deep semi-supervised learning, starting with an introduction to the field,
followed by a summarization of the dominant semi-supervised approaches in deep
learning.
Comment: Preprint.
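A canonical member of the dominant semi-supervised approaches such a survey covers is pseudo-labeling. The sketch below is a generic illustration with an assumed nearest-centroid classifier and confidence threshold, not a method from this paper:

```python
import numpy as np

def nearest_centroid_fit(X, y):
    """Fit one centroid per class from labeled data."""
    classes = np.unique(y)
    return classes, np.stack([X[y == c].mean(axis=0) for c in classes])

def pseudo_label(X_lab, y_lab, X_unlab, threshold=0.8):
    """One round of pseudo-labeling.

    Unlabeled points receive the label of the closest centroid; only
    confident assignments (softmax over negative distances above
    `threshold`) are added to the labeled set for the next round.
    """
    classes, centroids = nearest_centroid_fit(X_lab, y_lab)
    d = np.linalg.norm(X_unlab[:, None, :] - centroids[None, :, :], axis=2)
    p = np.exp(-d) / np.exp(-d).sum(axis=1, keepdims=True)  # soft scores
    conf, pred = p.max(axis=1), classes[p.argmax(axis=1)]
    keep = conf >= threshold
    X_aug = np.vstack([X_lab, X_unlab[keep]])
    y_aug = np.concatenate([y_lab, pred[keep]])
    return X_aug, y_aug

# Two well-separated clusters: two labeled points, forty unlabeled ones
rng = np.random.default_rng(0)
X_lab = np.array([[0.0, 0.0], [5.0, 5.0]])
y_lab = np.array([0, 1])
X_unlab = np.vstack([rng.normal(0, 0.3, (20, 2)),
                     rng.normal(5, 0.3, (20, 2))])
X_aug, y_aug = pseudo_label(X_lab, y_lab, X_unlab)
```

In practice this loop alternates with retraining the classifier on the augmented set; the confidence threshold controls how aggressively noisy pseudo-labels enter training.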
Spatial Contrastive Learning for Few-Shot Classification
Existing few-shot classification methods rely to some degree on the
cross-entropy (CE) loss to learn transferable representations that facilitate
the test time adaptation to unseen classes with limited data. However, the CE
loss has several shortcomings, e.g., inducing representations with excessive
discrimination towards seen classes, which reduces their transferability to
unseen classes and results in sub-optimal generalization. In this work, we
explore contrastive learning as an additional auxiliary training objective,
acting as a data-dependent regularizer to promote more general and transferable
features. Instead of using the standard contrastive objective, which suppresses
local discriminative features, we propose a novel attention-based spatial
contrastive objective to learn locally discriminative and class-agnostic
features. With extensive experiments, we show that the proposed method
outperforms state-of-the-art approaches, confirming the importance of learning
good and transferable embeddings for few-shot learning.
Comment: Preprint.
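The general recipe of pairing the CE loss with a contrastive auxiliary can be sketched with a plain supervised contrastive objective on global embeddings; the paper's attention-based spatial variant is more involved, and the temperature and trade-off weight below are illustrative assumptions:

```python
import numpy as np

def cross_entropy(logits, y):
    """Standard cross-entropy over a batch of logits."""
    z = logits - logits.max(axis=1, keepdims=True)
    logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -logp[np.arange(len(y)), y].mean()

def supervised_contrastive(emb, y, tau=0.1):
    """Supervised contrastive term: same-class embeddings are pulled
    together, different-class embeddings pushed apart."""
    e = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sim = e @ e.T / tau
    np.fill_diagonal(sim, -np.inf)                 # drop self-pairs
    logp = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    pos = (y[:, None] == y[None, :]) & ~np.eye(len(y), dtype=bool)
    return -np.where(pos, logp, 0.0).sum() / pos.sum()

def total_loss(logits, emb, y, lam=0.5):
    """CE plus the contrastive auxiliary acting as a data-dependent
    regularizer; lam is an illustrative trade-off weight."""
    return cross_entropy(logits, y) + lam * supervised_contrastive(emb, y)
```

The auxiliary term is lowest when embeddings cluster by class, which is exactly the property the abstract argues CE alone does not deliver for transfer to unseen classes.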
Uncertain Trees: Dealing with Uncertain Inputs in Regression Trees
Tree-based ensemble methods, such as Random Forests and Gradient Boosted Trees,
have been successfully used for regression in many applications and research
studies. Furthermore, these methods have been extended in order to deal with
uncertainty in the output variable, using for example a quantile loss in Random
Forests (Meinshausen, 2006). To the best of our knowledge, no extension has
been provided yet for dealing with uncertainties in the input variables, even
though such uncertainties are common in practical situations. We propose here
such an extension by showing how standard regression trees optimizing a
quadratic loss can be adapted and learned while taking into account the
uncertainties in the inputs. By doing so, one no longer assumes that an
observation lies in a single region of the regression tree, but rather that
it belongs to each region with a certain probability. Experiments conducted on
several data sets illustrate the good behavior of the proposed extension.
Comment: 9 pages.
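The core idea, that an uncertain observation belongs to each region with a certain probability, can be sketched for a depth-one tree with Gaussian input noise. The stump and the noise model below are illustrative, not the paper's full training procedure:

```python
import math

def soft_stump_predict(x_mean, x_std, threshold, left_value, right_value):
    """Prediction of a one-split regression tree for an uncertain
    input x ~ N(x_mean, x_std^2): each side of the split is weighted
    by the probability that the noisy input falls into it."""
    # P(x <= threshold) under the Gaussian input-noise model
    p_left = 0.5 * (1.0 + math.erf((threshold - x_mean) /
                                   (x_std * math.sqrt(2.0))))
    return p_left * left_value + (1.0 - p_left) * right_value

# An input exactly at the split: both leaves are weighted equally
print(soft_stump_predict(0.0, 1.0, 0.0, 10.0, 20.0))  # → 15.0
```

For a deeper tree the same principle applies recursively: the probability of reaching a leaf is the product of the soft split probabilities along its path.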
Stochastic Adversarial Gradient Embedding for Active Domain Adaptation
Unsupervised Domain Adaptation (UDA) aims to bridge the gap between a source
domain, where labelled data are available, and a target domain only represented
with unlabelled data. While domain-invariant representations have dramatically
improved the adaptability of models, guaranteeing their transferability
remains a challenging problem. This paper addresses it by using
active learning to annotate a small budget of target data. Although this setup,
called Active Domain Adaptation (ADA), deviates from UDA's standard setup, a
wide range of practical applications face this situation. To this
end, we introduce \textit{Stochastic Adversarial Gradient Embedding}
(SAGE), a framework that makes a triple contribution to ADA. First, we select
for annotation target samples that are likely to improve the representations'
transferability by measuring the variation, before and after annotation, of the
transferability loss gradient. Second, we increase sampling diversity by
promoting different gradient directions. Third, we introduce a novel training
procedure for actively incorporating target samples when learning invariant
representations. SAGE is based on solid theoretical ground and validated on
various UDA benchmarks against several baselines. Our empirical investigation
demonstrates that SAGE combines the best of uncertainty \textit{vs.} diversity
sampling and substantially improves the transferability of representations.
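SAGE's exact criterion is not reproduced here, but the two ingredients it balances, gradient-based informativeness and diversity of gradient directions, can be caricatured with BADGE-style last-layer gradient embeddings and k-means++-style sampling. Both constructions below are illustrative assumptions, not SAGE itself:

```python
import numpy as np

def gradient_embeddings(features, probs):
    """Per-sample last-layer gradient embeddings: outer product of
    (p - onehot(argmax p)) with the feature vector, flattened.
    Large norms signal uncertain samples; directions capture how each
    sample would move the classifier if annotated."""
    n, k = probs.shape
    onehot = np.eye(k)[probs.argmax(axis=1)]
    return ((probs - onehot)[:, :, None] * features[:, None, :]).reshape(n, -1)

def select_diverse(grads, budget, seed=0):
    """k-means++-style selection: favor samples whose gradient
    embedding is far from those already chosen, promoting different
    gradient directions within the annotation budget."""
    rng = np.random.default_rng(seed)
    chosen = [int(np.linalg.norm(grads, axis=1).argmax())]
    while len(chosen) < budget:
        d2 = np.min(
            ((grads[:, None, :] - grads[chosen][None, :, :]) ** 2).sum(-1),
            axis=1)
        d2[chosen] = 0.0                      # never re-pick a sample
        chosen.append(int(rng.choice(len(grads), p=d2 / d2.sum())))
    return chosen
```

The sampling probability proportional to squared distance is what trades pure uncertainty (norm) against diversity (direction), the tension the abstract refers to.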
Simple Sorting Criteria Help Find the Causal Order in Additive Noise Models
Additive Noise Models (ANM) encode a popular functional assumption that
enables learning causal structure from observational data. Due to a lack of
real-world data meeting the assumptions, synthetic ANM data are often used to
evaluate causal discovery algorithms. Reisach et al. (2021) show that, for
common simulation parameters, a variable ordering by increasing variance is
closely aligned with a causal order and introduce var-sortability to quantify
the alignment. Here, we show that not only variance, but also the fraction of a
variable's variance explained by all others, as captured by the coefficient of
determination R², tends to increase along the causal order. Simple baseline
algorithms can use R²-sortability to match the performance of established
methods. Since R²-sortability is invariant under data rescaling, these
algorithms perform equally well on standardized or rescaled data, addressing a
key limitation of algorithms exploiting var-sortability. We characterize and
empirically assess R²-sortability for different simulation parameters. We
show that all simulation parameters can affect R²-sortability and must be
chosen deliberately to control the difficulty of the causal discovery task and
the real-world plausibility of the simulated data. We provide an implementation
of the sortability measures and sortability-based algorithms in our library
CausalDisco (https://github.com/CausalDisco/CausalDisco).
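The R²-based score is simple enough to sketch directly; this is a minimal version under the assumption of linear least-squares regression, not the CausalDisco implementation:

```python
import numpy as np

def r2_scores(X):
    """For each variable, the coefficient of determination R^2 of a
    least-squares regression on all other variables."""
    n, d = X.shape
    Xc = X - X.mean(axis=0)
    scores = []
    for j in range(d):
        others = np.delete(Xc, j, axis=1)
        coef, *_ = np.linalg.lstsq(others, Xc[:, j], rcond=None)
        resid = Xc[:, j] - others @ coef
        scores.append(1.0 - resid.var() / Xc[:, j].var())
    return np.array(scores)

def r2_sort_order(X):
    """Baseline causal-order estimate: sort variables by increasing
    R^2, exploiting the tendency reported for simulated ANM data."""
    return np.argsort(r2_scores(X))

# Simulated linear ANM chain x0 -> x1 -> x2
rng = np.random.default_rng(0)
x0 = rng.normal(size=2000)
x1 = 2.0 * x0 + rng.normal(size=2000)
x2 = 2.0 * x1 + rng.normal(size=2000)
X = np.column_stack([x0, x1, x2])
```

On this chain the source variable x0 gets the lowest R² and is correctly placed first; as the abstract stresses, how well the full order is recovered depends on the simulation parameters.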
Open-Set Likelihood Maximization for Few-Shot Learning
We tackle the Few-Shot Open-Set Recognition (FSOSR) problem, i.e. classifying
instances among a set of classes for which we only have a few labeled samples,
while simultaneously detecting instances that do not belong to any known class.
We explore the popular transductive setting, which leverages the unlabelled
query instances at inference. Motivated by the observation that existing
transductive methods perform poorly in open-set scenarios, we propose a
generalization of the maximum likelihood principle, in which latent scores
down-weighting the influence of potential outliers are introduced alongside the
usual parametric model. Our formulation embeds supervision constraints from the
support set and additional penalties discouraging overconfident predictions on
the query set. We proceed with a block-coordinate descent, with the latent
scores and parametric model co-optimized alternately, thereby benefiting from
each other. We call our resulting formulation \textit{Open-Set Likelihood
Optimization} (OSLO). OSLO is interpretable and fully modular; it can be
applied on top of any pre-trained model seamlessly. Through extensive
experiments, we show that our method surpasses existing inductive and
transductive methods on both aspects of open-set recognition, namely inlier
classification and outlier detection.
Comment: arXiv admin note: substantial text overlap with arXiv:2206.0923
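The block-coordinate alternation can be caricatured with class prototypes: latent scores down-weight likely outliers, and the model is re-fit using those scores. The prototype model, the score rule, and the iteration count below are illustrative assumptions, not OSLO's actual objective:

```python
import numpy as np

def open_set_alternation(X_sup, y_sup, X_query, n_iter=10):
    """Block-coordinate alternation: latent scores in [0, 1]
    down-weight likely outliers; prototypes (the parametric model)
    are re-fit on support points plus score-weighted query points."""
    classes = np.unique(y_sup)
    protos = np.stack([X_sup[y_sup == c].mean(axis=0) for c in classes])
    scores = np.ones(len(X_query))
    for _ in range(n_iter):
        # distance of each query point to its closest prototype
        d = np.linalg.norm(X_query[:, None] - protos[None], axis=2)
        d_min = d.min(axis=1)
        # latent-score step: points far from every class look like outliers
        scores = np.exp(-d_min) / (np.exp(-d_min) + np.exp(-np.median(d_min)))
        # model step: weighted prototype refit (support supervision kept)
        assign = d.argmin(axis=1)
        for k, c in enumerate(classes):
            mask = assign == k
            w = scores[mask]
            num = (X_sup[y_sup == c].sum(axis=0)
                   + (w[:, None] * X_query[mask]).sum(axis=0))
            den = (y_sup == c).sum() + w.sum()
            protos[k] = num / den
    return protos, scores
```

The point of the alternation is the same as in the abstract: outliers barely influence the model because their latent scores shrink, while the refined model in turn sharpens the scores.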
Transductive Learning for Textual Few-Shot Classification in API-based Embedding Models
Proprietary and closed APIs are becoming increasingly common to process
natural language, and are impacting the practical applications of natural
language processing, including few-shot classification. Few-shot classification
involves training a model to perform a new classification task with a handful
of labeled data. This paper presents three contributions. First, we introduce a
scenario where the embedding of a pre-trained model is served through a gated
API with compute-cost and data-privacy constraints. Second, we propose using
transductive inference, a learning paradigm that has been overlooked by the NLP
community. Transductive inference, unlike traditional inductive learning,
leverages the statistics of unlabeled data. We also introduce a new
parameter-free transductive regularizer based on the Fisher-Rao loss, which can
be used on top of the gated API embeddings. This method fully utilizes
unlabeled data, does not share any label with the third-party API provider and
could serve as a baseline for future research. Third, we propose an improved
experimental setting and compile a benchmark of eight datasets involving
multiclass classification in four different languages, with up to 151 classes.
We evaluate our methods using eight backbone models, along with an episodic
evaluation over 1,000 episodes, which demonstrates the superiority of
transductive inference over the standard inductive setting.
Comment: EMNLP 202
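The Fisher-Rao regularizer is specific to the paper, but the transductive principle, i.e. letting the statistics of the unlabeled query set reshape a classifier built on fixed API-served embeddings, can be sketched with soft prototype refinement. Everything below (the distance-based classifier, the temperature, the iteration count) is an illustrative assumption:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def transductive_prototypes(E_sup, y_sup, E_query, n_iter=5, tau=10.0):
    """Transductive few-shot inference on fixed embeddings: class
    prototypes start from the labeled support set, then are refined
    with soft assignments of the *unlabeled* query set -- precisely
    the statistic an inductive classifier ignores."""
    classes = np.unique(y_sup)
    protos = np.stack([E_sup[y_sup == c].mean(axis=0) for c in classes])
    for _ in range(n_iter):
        logits = -tau * np.linalg.norm(E_query[:, None] - protos[None], axis=2)
        q = softmax(logits)                       # soft query assignments
        for k, c in enumerate(classes):
            num = E_sup[y_sup == c].sum(axis=0) + q[:, k] @ E_query
            protos[k] = num / ((y_sup == c).sum() + q[:, k].sum())
    logits = -tau * np.linalg.norm(E_query[:, None] - protos[None], axis=2)
    return classes[logits.argmax(axis=1)]
```

Note that, as in the paper's setting, no label ever needs to be sent to the embedding provider: only raw texts cross the API boundary, and all learning happens client-side on the returned vectors.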
EM estimation of a structural equation model
Structural equation models enable the modeling of relations between observed
and latent variables. The two leading estimation paradigms are component-based
partial least squares and covariance-structure analysis. In this work, we
first describe the two main estimation methods, PLS and LISREL, and then
propose an estimation approach that maximizes, via the EM algorithm, the full
likelihood of a model with latent factors and one structural equation. We
study its performance on simulated data and show, through an application to
real environmental data, how to build a model in practice and assess its
quality. Finally, we apply the approach in the context of an oncology clinical
trial to analyze longitudinal health-related quality-of-life data. By
efficiently reducing the dimension of the data, the EM approach simplifies the
longitudinal analysis of quality of life and avoids multiple testing, thereby
helping to assess the clinical benefit of a treatment.
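The EM machinery for the latent-factor part can be sketched for a single-factor model x = lam·z + eps, with z ~ N(0,1) and independent diagonal noise. This is a building block of, not a substitute for, the full structural model with a structural equation:

```python
import numpy as np

def factor_em(X, n_iter=200, seed=0):
    """EM for a one-factor model x = lam * z + eps, z ~ N(0, 1),
    eps ~ N(0, diag(psi)); returns loadings and noise variances."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    lam = rng.normal(size=d)                 # random loading init
    psi = np.ones(d)
    for _ in range(n_iter):
        # E-step: posterior moments of the latent factor given x
        prec = 1.0 + (lam ** 2 / psi).sum()
        V = 1.0 / prec                       # posterior variance of z
        m = V * (X @ (lam / psi))            # E[z | x], one per row
        Ezz = V + m ** 2                     # E[z^2 | x]
        # M-step: closed-form update of loadings and residual variances
        lam = (X.T @ m) / Ezz.sum()
        psi = (X ** 2).mean(axis=0) - lam * (X.T @ m) / n
    return lam, psi
```

As usual for factor models, the loadings are identified only up to a global sign flip, which is why any comparison to ground truth must align signs first.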