86 research outputs found
A Unified Algebraic Perspective on Lipschitz Neural Networks
Important research efforts have focused on the design and training of neural
networks with a controlled Lipschitz constant. The goal is to increase, and
sometimes guarantee, robustness against adversarial attacks. Recent
promising techniques draw inspiration from different backgrounds to design
1-Lipschitz neural networks; to name a few, convex potential layers derive
from the discretization of continuous dynamical systems, and the
Almost-Orthogonal-Layer (AOL) proposes a tailored method for matrix rescaling.
However, it is now important to consider these recent and promising
contributions under a common theoretical lens to better design new
and improved layers. This paper introduces a novel algebraic perspective
unifying various types of 1-Lipschitz neural networks, including the ones
previously mentioned, along with methods based on orthogonality and spectral
methods. Interestingly, we show that many existing techniques can be derived
and generalized by finding analytical solutions of a common semidefinite
programming (SDP) condition. We also prove that AOL biases the scaled weight
matrices toward the set of orthogonal matrices in a precise mathematical
sense. Moreover, our algebraic condition, combined with the
Gershgorin circle theorem, readily leads to new and diverse parameterizations
for 1-Lipschitz network layers. Our approach, called SDP-based Lipschitz Layers
(SLL), allows us to design non-trivial yet efficient generalizations of convex
potential layers. Finally, a comprehensive set of experiments on image
classification shows that SLLs outperform previous approaches on certified
robust accuracy. Code is available at
https://github.com/araujoalexandre/Lipschitz-SLL-Networks. (ICLR 2023, spotlight paper.)
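
The abstract states the SLL construction only at a high level. As a concrete illustration, here is a minimal PyTorch sketch of a dense SLL-style residual layer, following our reading of the paper's parameterization x ↦ x − 2W T⁻¹σ(Wᵀx + b), with T obtained from a Gershgorin-style bound; the class name, the log-parameterized rescaling vector q, and the initialization are illustrative choices of ours, not the authors' released code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SLLLinear(nn.Module):
    """Sketch of a dense SDP-based Lipschitz Layer (SLL).

    Computes x -> x - 2 W T^{-1} relu(W^T x + b), where
    T = diag(t), t_i = sum_j |W^T W|_{ij} q_j / q_i.
    By the Gershgorin circle theorem this choice of T satisfies the
    SDP condition, making the residual map 1-Lipschitz.
    """
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.W = nn.Parameter(torch.randn(dim, hidden) / hidden ** 0.5)
        self.b = nn.Parameter(torch.zeros(hidden))
        self.log_q = nn.Parameter(torch.zeros(hidden))  # q > 0 via exp

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        q = torch.exp(self.log_q)
        gram_abs = (self.W.T @ self.W).abs()            # |W^T W|, hidden x hidden
        t = (gram_abs * (q[None, :] / q[:, None])).sum(dim=1)
        pre = F.relu(x @ self.W + self.b)               # (batch, hidden)
        return x - 2.0 * (pre / t) @ self.W.T           # residual update
```

Stacking several such layers keeps the overall network 1-Lipschitz, since a composition of 1-Lipschitz maps is itself 1-Lipschitz.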
Leveraging the structure of dynamical systems for data-driven modeling
The reliable prediction of the temporal behavior of complex systems is
required in numerous scientific fields. This strong interest is, however,
hindered by modeling issues: often, the governing equations describing the
physics of the system under consideration are not accessible or, when known,
their solution might require a computational time incompatible with the
prediction time constraints. Nowadays, approximating the complex system at hand
in a generic functional format and informing it ex nihilo from available
observations has become common practice, as illustrated by the enormous
amount of scientific work that has appeared in recent years. Numerous successful
examples based on deep neural networks are already available, although the
generalizability of the models and their margins of guarantee are often
overlooked. Here, we consider Long Short-Term Memory (LSTM) neural networks and
thoroughly investigate the impact of the training set and its structure on the
quality of long-term prediction. Leveraging insights from ergodic theory, we
perform a thorough computational analysis to assess the amount of data
sufficient to guarantee, a priori, a faithful model of the physical system. We
show how an informed design of the training set, based on invariants of the
system and the structure of the underlying attractor, significantly improves
the resulting models, opening up avenues for research within the context of
active learning. Further, we illustrate the non-trivial effects of memory
initialization when relying on memory-capable models. Our findings provide
evidence-based good practice on the amount and choice of data required for
effective data-driven modeling of any complex dynamical system.
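
Since the abstract describes the LSTM forecasting setup only in general terms, the following self-contained sketch shows the kind of pipeline it refers to; the Lorenz-63 system, the Euler integrator, the window length, and all hyperparameters are illustrative assumptions of ours, not the paper's actual configuration:

```python
import numpy as np
import torch
import torch.nn as nn

def lorenz_trajectory(n_steps, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Integrate the Lorenz-63 system with a simple Euler scheme
    (adequate for illustration; a real study would use a finer integrator)."""
    x = np.array([1.0, 1.0, 1.0])
    out = np.empty((n_steps, 3))
    for i in range(n_steps):
        dx = np.array([sigma * (x[1] - x[0]),
                       x[0] * (rho - x[2]) - x[1],
                       x[0] * x[1] - beta * x[2]])
        x = x + dt * dx
        out[i] = x
    return out

class Forecaster(nn.Module):
    def __init__(self, dim=3, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, dim)

    def forward(self, window):
        # The zero-initialized memory settles while the LSTM consumes
        # the window; the prediction is read out at the last step only.
        h, _ = self.lstm(window)
        return self.head(h[:, -1])

traj = torch.tensor(lorenz_trajectory(5000), dtype=torch.float32)
windows = traj.unfold(0, 50, 1)[:-1].transpose(1, 2)  # (N, 50, 3) input windows
targets = traj[50:]                                   # next state after each window
model = Forecaster()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for epoch in range(5):                                # tiny loop for illustration
    loss = nn.functional.mse_loss(model(windows), targets)
    opt.zero_grad(); loss.backward(); opt.step()
```

Note that the hidden state is zero-initialized for every window, so the early part of each window acts as a warm-up; this is one way the memory-initialization issue mentioned in the abstract can surface.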
Introduction to the special issue on deep learning approaches for machine translation
Deep learning is revolutionizing speech and natural language technologies by offering an effective way to train systems and obtain significant improvements. The main advantage of deep learning is that, given the right architecture, the system automatically learns features from data without the need to design them explicitly. This machine learning perspective is conceptually changing how speech and natural language technologies are addressed. In the case of Machine Translation (MT), deep learning was first introduced into standard statistical systems. By now, end-to-end neural MT systems have reached competitive results. This special-issue introductory paper describes how deep learning has been gradually introduced into MT. The introduction covers all topics addressed by the papers in this special issue, namely: integration of deep learning into statistical MT; development of end-to-end neural MT systems; and introduction of deep learning into interactive MT and MT evaluation. Finally, this introduction sketches some research directions that MT is taking under the guidance of deep learning.
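
For readers unfamiliar with what "end-to-end neural MT" means architecturally, here is a deliberately minimal encoder-decoder sketch; it is a generic illustration (all names and sizes are ours), not any specific system discussed in the special issue:

```python
import torch
import torch.nn as nn

class Seq2SeqNMT(nn.Module):
    """Bare-bones encoder-decoder: the basic shape of end-to-end
    neural MT, trained directly on (source, target) sentence pairs."""
    def __init__(self, src_vocab: int, tgt_vocab: int, dim: int = 256):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, dim)
        self.encoder = nn.GRU(dim, dim, batch_first=True)
        self.decoder = nn.GRU(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        # Encode the source sentence into a summary state, then decode
        # the target conditioned on it (teacher forcing during training).
        _, h = self.encoder(self.src_emb(src_ids))
        dec, _ = self.decoder(self.tgt_emb(tgt_ids), h)
        return self.out(dec)  # per-position logits over the target vocabulary
```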
Adaptation au domaine pour l'analyse morpho-syntaxique
This work seeks to understand why the performance of a part-of-speech tagger drops sharply when it is used on out-of-domain data. Using a toy experiment, we show that this behavior can be caused by a phenomenon in which lexicalized features are masked by non-lexicalized ones. We propose several models that attempt to reduce this effect.
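
As a rough illustration of such a masking phenomenon, here is a hypothetical toy setup of ours (not the paper's actual models): a tagger combining both feature families, where ablating one family at test time reveals how much the model relied on the other:

```python
import torch
import torch.nn as nn

class ToyTagger(nn.Module):
    """Toy tagger combining lexicalized (word identity) and
    non-lexicalized (e.g., suffix) features; zeroing one feature
    family at test time probes how strongly the model leaned on it."""
    def __init__(self, n_words, n_suffixes, n_tags, dim=32):
        super().__init__()
        self.word_emb = nn.Embedding(n_words, dim)
        self.suffix_emb = nn.Embedding(n_suffixes, dim)
        self.out = nn.Linear(2 * dim, n_tags)

    def forward(self, word_ids, suffix_ids, drop_lexical=False, drop_suffix=False):
        w = self.word_emb(word_ids)
        s = self.suffix_emb(suffix_ids)
        if drop_lexical:   # ablate lexicalized features
            w = torch.zeros_like(w)
        if drop_suffix:    # ablate non-lexicalized features
            s = torch.zeros_like(s)
        return self.out(torch.cat([w, s], dim=-1))
```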