Faithful and Consistent Graph Neural Network Explanations with Rationale Alignment
Uncovering rationales behind predictions of graph neural networks (GNNs) has
received increasing attention over recent years. Instance-level GNN explanation
aims to discover critical input elements, like nodes or edges, that the target
GNN relies upon for making predictions. These identified sub-structures can
provide interpretations of GNN's behavior. Though various algorithms are
proposed, most of them formalize this task as searching for the minimal subgraph
that preserves the original predictions. However, an inductive bias is
deep-rooted in this framework: several subgraphs can result in the same or
similar outputs as the original graph. Consequently, these methods risk
providing spurious explanations and failing to provide consistent explanations.
Applying them to explain weakly performing GNNs would further amplify these
issues. To address this problem, we theoretically examine the predictions of
GNNs from the causality perspective. Two typical reasons for spurious
explanations are identified: the confounding effect of latent variables, such as
distribution shift, and causal factors distinct from the original input.
Observing that both confounding effects and diverse causal rationales are
encoded in internal representations, we propose a new explanation
framework with an auxiliary alignment loss, which is theoretically proven to
intrinsically optimize a more faithful explanation objective. For
this alignment loss, several perspectives are explored: anchor-based
alignment, distributional alignment based on Gaussian mixture models,
mutual-information-based alignment, etc. A comprehensive study is conducted
both on the effectiveness of this new framework in terms of explanation
faithfulness/consistency and on the advantages of these variants.
Comment: TIST2023. arXiv admin note: substantial text overlap with
arXiv:2205.1373
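The abstract above argues that several subgraphs can reproduce the original prediction, so a prediction-preserving objective alone cannot tell them apart. The following is a minimal sketch of that point, not the paper's formulation: it assumes a cross-entropy prediction term plus a squared-distance alignment term between graph embeddings, and all names (`prediction_loss`, `embedding_alignment_loss`, `lam`) are hypothetical.

```python
import numpy as np

def prediction_loss(p_orig, p_sub, eps=1e-12):
    """Cross-entropy between the original prediction and the subgraph prediction."""
    return float(-np.sum(p_orig * np.log(p_sub + eps)))

def embedding_alignment_loss(h_orig, h_sub):
    """Squared Euclidean distance between two graph embeddings (assumed form)."""
    diff = h_orig - h_sub
    return float(diff @ diff)

def total_loss(p_orig, p_sub, h_orig, h_sub, lam=1.0):
    """Prediction-preserving term plus a weighted alignment term."""
    return prediction_loss(p_orig, p_sub) + lam * embedding_alignment_loss(h_orig, h_sub)

# Two candidate subgraphs that yield the same output as the original graph
# but have different internal representations (toy numbers, not real data).
p_orig = np.array([0.9, 0.1])
h_orig = np.array([1.0, 0.0])
h_cand_a = np.array([1.0, 0.0])   # embedding matches the original
h_cand_b = np.array([0.0, 1.0])   # embedding differs from the original

loss_a = total_loss(p_orig, p_orig, h_orig, h_cand_a)
loss_b = total_loss(p_orig, p_orig, h_orig, h_cand_b)
```

Both candidates have an identical prediction term, so only the alignment term separates them; under these assumptions the candidate whose embedding stays close to the original graph's embedding is preferred.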
Concept-wise Fine-tuning Matters in Preventing Negative Transfer
A multitude of prevalent pre-trained models mark a major milestone in the
development of artificial intelligence, while fine-tuning has been a common
practice that enables pre-trained models to figure prominently in a wide array
of target datasets. Our empirical results reveal that off-the-shelf fine-tuning
techniques are far from adequate to mitigate negative transfer caused by two
types of underperforming features in a pre-trained model: rare
features and spuriously correlated features. Rooted in structural causal models
of predictions after fine-tuning, we propose a Concept-wise fine-tuning
(Concept-Tuning) approach that refines feature representations at the level of
patches with each patch encoding a concept. Concept-Tuning minimizes the
negative impacts of rare features and spuriously correlated features by (1)
maximizing the mutual information between examples in the same category with
regard to a slice of rare features (a patch) and (2) applying front-door
adjustment via attention neural networks in channels and feature slices
(patches). The proposed Concept-Tuning consistently and significantly (by up to
4.76%) improves on prior state-of-the-art fine-tuning methods across eleven datasets,
diverse pre-training strategies (supervised and self-supervised ones), various
network architectures, and sample sizes in a target dataset.
Open-world Semantic Segmentation via Contrasting and Clustering Vision-Language Embedding
To bridge the gap between supervised semantic segmentation and real-world
applications that require one model to recognize arbitrary new concepts,
recent zero-shot segmentation has attracted a lot of attention by exploring the
relationships between unseen and seen object categories, yet it still requires large
amounts of densely-annotated data with diverse base classes. In this paper, we
propose a new open-world semantic segmentation pipeline that makes the first
attempt to learn to segment semantic objects of various open-world categories
without any dense annotation effort, by purely exploiting the
image-caption data that naturally exist on the Internet. Our method,
Vision-language-driven Semantic Segmentation (ViL-Seg), employs an image and a
text encoder to generate visual and text embeddings for the image-caption data,
with two core components that endow it with segmentation ability. First, the image
encoder is jointly trained with vision-based contrasting and cross-modal
contrasting, which encourage the visual embeddings to preserve both
fine-grained semantics and high-level category information that are crucial for
the segmentation task. Second, an online clustering head is devised over
the image encoder, which dynamically segments the visual embeddings
into distinct semantic groups such that they can be classified by comparing
them with various text embeddings to complete our segmentation pipeline. Experiments
show that without using any data with dense annotations, our method can
directly segment objects of arbitrary categories, outperforming zero-shot
segmentation methods that require data labeling on three benchmark datasets.
Comment: Accepted to ECCV 202
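The cross-modal contrasting mentioned in the abstract above can be illustrated with a generic symmetric InfoNCE-style loss over paired image and caption embeddings. This is a sketch under assumptions, not the ViL-Seg implementation: the function names, the temperature value of 0.07, and the use of cosine similarity are all illustrative choices that the abstract does not specify.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def cross_modal_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss: row i of image_emb is paired with row i of text_emb."""
    img = l2_normalize(image_emb)
    txt = l2_normalize(text_emb)
    logits = img @ txt.T / temperature          # (batch, batch) similarity matrix
    idx = np.arange(logits.shape[0])            # matching pairs lie on the diagonal
    loss_i2t = -log_softmax(logits, axis=1)[idx, idx].mean()  # image -> text
    loss_t2i = -log_softmax(logits, axis=0)[idx, idx].mean()  # text -> image
    return float(0.5 * (loss_i2t + loss_t2i))

# Toy check: correctly paired embeddings should score a lower loss
# than the same embeddings with the captions shifted by one position.
image_emb = np.eye(4)
matched_loss = cross_modal_contrastive_loss(image_emb, np.eye(4))
shuffled_loss = cross_modal_contrastive_loss(image_emb, np.roll(np.eye(4), 1, axis=0))
```

Minimizing this loss pulls each image embedding toward its own caption and away from the other captions in the batch, which is the general mechanism by which caption supervision can inject category information into visual embeddings.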
A Comprehensive Survey on Deep-Learning-based Vehicle Re-Identification: Models, Data Sets and Challenges
Vehicle re-identification (ReID) endeavors to associate vehicle images
collected from a distributed network of cameras spanning diverse traffic
environments. This task assumes paramount importance within the spectrum of
vehicle-centric technologies, playing a pivotal role in deploying Intelligent
Transportation Systems (ITS) and advancing smart city initiatives. Rapid
advancements in deep learning have significantly propelled the evolution of
vehicle ReID technologies in recent years. Consequently, undertaking a
comprehensive survey of methodologies centered on deep learning for vehicle
re-identification has become imperative and inescapable. This paper extensively
explores deep learning techniques applied to vehicle ReID. It outlines the
categorization of these methods, encompassing supervised and unsupervised
approaches, delves into existing research within these categories, introduces
datasets and evaluation criteria, and delineates forthcoming challenges and
potential research directions. This comprehensive assessment examines the
landscape of deep learning in vehicle ReID and establishes a foundation and
starting point for future works. It aims to serve as a complete reference by
highlighting challenges and emerging trends, fostering advancements and
applications in vehicle ReID utilizing deep learning models.
Representation learning with structured invariance
Invariance is crucial for neural networks, enabling them to generalize effectively across variations of the input data by focusing on key attributes while filtering out irrelevant details. In this thesis, we study representation learning in neural networks through the lens of structured invariance. We start by studying the properties and limitations of the invariance that neural networks can learn from the data. Next, we develop a method to extract the structure of invariance learned by a neural network, providing a more nuanced analysis of the quality of learned invariance. In the next chapter, we focus on contrastive learning, demonstrating how more structured supervision results in a better quality of learned representations. The last two chapters focus on practical aspects of representation learning with structured invariance in computer vision.
Representation Learning: A Review and New Perspectives
The success of machine learning algorithms generally depends on data
representation, and we hypothesize that this is because different
representations can entangle and hide more or less the different explanatory
factors of variation behind the data. Although specific domain knowledge can be
used to help design representations, learning with generic priors can also be
used, and the quest for AI is motivating the design of more powerful
representation-learning algorithms implementing such priors. This paper reviews
recent work in the area of unsupervised feature learning and deep learning,
covering advances in probabilistic models, auto-encoders, manifold learning,
and deep networks. This motivates longer-term unanswered questions about the
appropriate objectives for learning good representations, for computing
representations (i.e., inference), and the geometrical connections between
representation learning, density estimation and manifold learning.
Generative models : a critical review
In this thesis we introduce and motivate generative modeling as a central task
for machine learning and provide a critical view of the algorithms which have been
proposed for solving this task. We overview how generative modeling can be de ned
mathematically as trying to make an estimating distribution the same as an unknown
ground truth distribution. This can then be quanti ed in terms of the value of
a statistical divergence between the two distributions. We outline the maximum
likelihood approach and how it can be interpreted as minimizing KL-divergence. We
explore a number of approaches in the maximum likelihood family, while discussing
their limitations. Finally, we explore the alternative adversarial approach which
involves studying the di erences between an estimating distribution and a real data
distribution. We discuss how this approach can give rise to new divergences and
methods that are necessary to make adversarial learning successful. We also discuss
new evaluation metrics which are required by the adversarial approach.
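
The link between maximum likelihood and KL-divergence stated above can be written out in a few lines. The notation is an assumption introduced here for illustration: $p_{\mathrm{data}}$ is the unknown ground truth distribution, $p_\theta$ the estimating distribution with parameters $\theta$, and $x_1, \dots, x_N$ are samples from $p_{\mathrm{data}}$.

```latex
\begin{align*}
D_{\mathrm{KL}}(p_{\mathrm{data}} \,\|\, p_\theta)
  &= \mathbb{E}_{x \sim p_{\mathrm{data}}}\!\left[\log \frac{p_{\mathrm{data}}(x)}{p_\theta(x)}\right] \\
  &= \underbrace{\mathbb{E}_{x \sim p_{\mathrm{data}}}[\log p_{\mathrm{data}}(x)]}_{\text{independent of } \theta}
   - \mathbb{E}_{x \sim p_{\mathrm{data}}}[\log p_\theta(x)], \\
\arg\min_\theta D_{\mathrm{KL}}(p_{\mathrm{data}} \,\|\, p_\theta)
  &= \arg\max_\theta \mathbb{E}_{x \sim p_{\mathrm{data}}}[\log p_\theta(x)]
   \approx \arg\max_\theta \frac{1}{N}\sum_{i=1}^{N} \log p_\theta(x_i).
\end{align*}
```

Because the first term does not depend on $\theta$, minimizing the divergence is equivalent to maximizing the expected log-likelihood, which the sample average on the right approximates.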
Chapter 2 shows that learning generative models of the hidden layers of a
deep network can identify when the network is being run on data differing from
the data seen during training. This allows us to study differences between
free-running and teacher-forcing modes in recurrent networks. It also leads to improved
robustness to adversarial attacks.
Chapter 3 explored an iterative procedure for generation and inference in deep
networks, which is inspired by the blocked Gibbs MCMC procedure for sampling
from energy-based models. This achieves improved inpainting, generation, and
inference by removing the requirement that the prior over the latent variables have
a known distribution.
Chapter 4 studied whether generative models could be improved by exploiting
the knowledge learned by discriminative classification models. We studied this by
augmenting autoencoders with additional losses defined in the hidden states of a
fixed classifier. In practice we showed that this led to generative models with better
focus on salient aspects of the data, and also discussed limitations in this approach.
- …