Zero Shot Learning with the Isoperimetric Loss
We introduce the isoperimetric loss as a regularization criterion for
learning the map from a visual representation to a semantic embedding, to be
used to transfer knowledge to unknown classes in a zero-shot learning setting.
We use a pre-trained deep neural network model as a visual representation of
image data, a Word2Vec embedding of class labels, and linear maps between the
visual and semantic embedding spaces. However, the spaces themselves are not
linear, and we postulate the sample embedding to be populated by noisy samples
near otherwise smooth manifolds. We exploit the graph structure defined by the
sample points to regularize the estimates of the manifolds by inferring the
graph connectivity using a generalization of the isoperimetric inequalities
from Riemannian geometry to graphs. Surprisingly, this regularization alone,
paired with the simplest baseline model, outperforms the state-of-the-art among
fully automated methods in zero-shot learning benchmarks such as AwA and CUB.
This improvement is achieved solely by learning the structure of the underlying
spaces by imposing regularity.
Comment: Accepted to AAAI-2
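The baseline pipeline described in this abstract (a linear map from visual features to a semantic embedding, then a decision in the semantic space) can be sketched as follows. This is only an illustration under stated assumptions: the matrix `W`, the two-dimensional class embeddings, and the cosine nearest-neighbour rule are hypothetical stand-ins, and the isoperimetric regularizer itself is not implemented here.

```python
import math

def apply_linear_map(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def zero_shot_predict(W, visual_feature, class_embeddings):
    """Map a visual feature into the semantic space and return the
    name of the class whose embedding is most similar to it."""
    z = apply_linear_map(W, visual_feature)
    return max(class_embeddings, key=lambda name: cosine(z, class_embeddings[name]))

# Hypothetical 2x3 map, assumed to have been fitted on seen classes.
W = [[1.0, 0.0, 0.5],
     [0.0, 1.0, 0.5]]

# Hypothetical semantic embeddings of classes never seen during training.
unseen_classes = {"zebra": [1.0, 0.1], "whale": [0.1, 1.0]}

prediction = zero_shot_predict(W, [0.9, 0.1, 0.2], unseen_classes)
```

Because the class embeddings come from label text rather than images, a class can be predicted without any training image of it, which is what makes the setting zero-shot.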
Evaluating the Robustness of Interpretability Methods through Explanation Invariance and Equivariance
Interpretability methods are valuable only if their explanations faithfully
describe the explained model. In this work, we consider neural networks whose
predictions are invariant under a specific symmetry group. This includes
popular architectures, ranging from convolutional to graph neural networks. Any
explanation that faithfully explains this type of model needs to be in
agreement with this invariance property. We formalize this intuition through
the notion of explanation invariance and equivariance by leveraging the
formalism from geometric deep learning. Through this rigorous formalism, we
derive (1) two metrics to measure the robustness of any interpretability method
with respect to the model symmetry group; (2) theoretical robustness guarantees
for some popular interpretability methods and (3) a systematic approach to
increase the invariance of any interpretability method with respect to a
symmetry group. By empirically measuring our metrics for explanations of models
associated with various modalities and symmetry groups, we derive a set of 5
guidelines to allow users and developers of interpretability methods to produce
robust explanations.
Comment: 26 pages, 7 figures
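To make the idea of explanation invariance and equivariance concrete, here is a minimal sketch under loud assumptions: the toy model, the cyclic-shift group, the gradient-times-input attribution, and the cosine-based scores are all illustrative choices, not the metrics defined in the paper. The sketch compares the explanation of a transformed input with the original explanation (invariance) and with the transformed explanation (equivariance).

```python
import math

def shift(v, k):
    """Cyclic shift of a list by k positions (the assumed group action)."""
    k %= len(v)
    return v[-k:] + v[:-k] if k else list(v)

def model(x):
    """Toy shift-invariant model: it depends only on the sum of the inputs."""
    return sum(x) ** 2

def grad_times_input(x):
    """Gradient-times-input attribution for the toy model above."""
    g = 2.0 * sum(x)  # d/dx_i of (sum x)^2 is identical for every i
    return [g * xi for xi in x]

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def invariance_score(explain, x):
    """Mean similarity between explain(g.x) and explain(x) over all shifts g."""
    n = len(x)
    return sum(cosine(explain(shift(x, k)), explain(x)) for k in range(n)) / n

def equivariance_score(explain, x):
    """Mean similarity between explain(g.x) and g.explain(x) over all shifts g."""
    n = len(x)
    return sum(
        cosine(explain(shift(x, k)), shift(explain(x), k)) for k in range(n)
    ) / n

x = [1.0, 2.0, 3.0, 4.0]
inv = invariance_score(grad_times_input, x)
eqv = equivariance_score(grad_times_input, x)
```

In this toy setting the attribution follows the input when it is shifted, so it scores as equivariant but not invariant, even though the model output itself does not change under shifts.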
From Continuous Dynamics to Graph Neural Networks: Neural Diffusion and Beyond
Graph neural networks (GNNs) have demonstrated significant promise in
modelling relational data and have been widely applied in various fields of
interest. The key mechanism behind GNNs is so-called message passing, in which
information is iteratively aggregated at central nodes from their
neighbourhood. Such a scheme has been found to be intrinsically linked to a
physical process known as heat diffusion, where the propagation of GNNs
naturally corresponds to the evolution of heat density. Viewing message passing
as heat dynamics gives a fundamental understanding of the power and pitfalls of
GNNs and consequently informs better model design. Recently, a plethora of works
has proposed GNNs inspired by the continuous dynamics formulation, in an attempt
to mitigate the known limitations of GNNs, such as oversmoothing and
oversquashing. In this survey,
we provide the first systematic and comprehensive review of studies that
leverage the continuous perspective of GNNs. To this end, we introduce
foundational ingredients for adapting continuous dynamics to GNNs, along with a
general framework for the design of graph neural dynamics. We then review and
categorize existing works based on their driving mechanisms and underlying
dynamics. We also summarize how the limitations of classic GNNs can be
addressed under the continuous framework. We conclude by identifying multiple
open research directions.
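The link between message passing and heat diffusion mentioned in this abstract can be illustrated with a small sketch. It assumes the simplest possible setting: scalar node features, an unnormalized graph Laplacian L = D - A, and explicit Euler steps of dx/dt = -Lx with a hypothetical step size `tau`. It is not any specific model from the survey.

```python
def laplacian_step(adj, x, tau):
    """One explicit Euler step of dx/dt = -L x, with L = D - A.

    Each node moves towards its neighbours' values, which is the same
    neighbourhood aggregation that a message-passing layer performs.
    """
    n = len(x)
    new_x = []
    for i in range(n):
        # Net heat flow into node i: sum_j A_ij (x_j - x_i) = -(L x)_i
        flow = sum(adj[i][j] * (x[j] - x[i]) for j in range(n))
        new_x.append(x[i] + tau * flow)
    return new_x

def diffuse(adj, x, tau, steps):
    """Apply `steps` diffusion steps, analogous to stacking that many layers."""
    for _ in range(steps):
        x = laplacian_step(adj, x, tau)
    return x

# Path graph 0 - 1 - 2 with all the heat initially on node 0.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
x0 = [1.0, 0.0, 0.0]

x_shallow = diffuse(adj, x0, tau=0.25, steps=1)
x_deep = diffuse(adj, x0, tau=0.25, steps=50)
```

After many steps every node on this connected graph approaches the same value (the mean of the initial features), which is a small-scale picture of the oversmoothing limitation that the abstract refers to.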
Solving Tree Problems with Category Theory
Artificial Intelligence (AI) has long pursued models, theories, and
techniques to imbue machines with human-like general intelligence. Yet even the
currently predominant data-driven approaches in AI seem to lack humans'
unique ability to solve a wide range of problems. This raises the question of
whether there are principles that underlie general problem-solving
capabilities. We approach this question through the mathematical formulation of
analogies across different problems and solutions. We focus in particular on
problems that could be represented as tree-like structures. Most importantly,
we adopt a category-theoretic approach in formalising tree problems as
categories, and in proving the existence of equivalences across apparently
unrelated problem domains. We prove the existence of a functor between the
category of tree problems and the category of solutions. We also provide a
weaker version of the functor by quantifying equivalences of problem categories
using a metric on tree problems.
Comment: 10 pages, 4 figures, International Conference on Artificial General Intelligence (AGI) 201
Deep learning systems as complex networks
Thanks to the availability of large-scale digital datasets and massive
amounts of computational power, deep learning algorithms can learn
representations of data by exploiting multiple levels of abstraction. These
machine learning methods have greatly improved the state-of-the-art in many
challenging cognitive tasks, such as visual object recognition, speech
processing, natural language understanding and automatic translation. In
particular, one class of deep learning models, known as deep belief networks,
can discover intricate statistical structure in large data sets in a completely
unsupervised fashion, by learning a generative model of the data using
Hebbian-like learning mechanisms. Although these self-organizing systems can be
conveniently formalized within the framework of statistical mechanics, their
internal functioning remains opaque, because their emergent dynamics cannot be
solved analytically. In this article we propose to study deep belief networks
using techniques commonly employed in the study of complex networks, in order
to gain some insights into the structural and functional properties of the
computational graph resulting from the learning process.
Comment: 20 pages, 9 figures