86 research outputs found
A Unified Algebraic Perspective on Lipschitz Neural Networks
Important research efforts have focused on the design and training of neural
networks with a controlled Lipschitz constant. The goal is to increase, and
sometimes guarantee, robustness against adversarial attacks. Recent
promising techniques draw inspiration from different backgrounds to design
1-Lipschitz neural networks; to name a few, convex potential layers derive
from the discretization of continuous dynamical systems, and the
Almost-Orthogonal-Layer (AOL) proposes a tailored method for matrix rescaling.
However, it is now important to consider these recent and promising
contributions under a common theoretical lens to better design new
and improved layers. This paper introduces a novel algebraic perspective
unifying various types of 1-Lipschitz neural networks, including the ones
previously mentioned, along with methods based on orthogonality and spectral
methods. Interestingly, we show that many existing techniques can be derived
and generalized by finding analytical solutions of a common semidefinite
programming (SDP) condition. We also prove that AOL biases the scaled weight
matrices toward the set of orthogonal matrices in a precise mathematical
sense. Moreover, our algebraic condition, combined with the
Gershgorin circle theorem, readily leads to new and diverse parameterizations
for 1-Lipschitz network layers. Our approach, called SDP-based Lipschitz Layers
(SLL), allows us to design non-trivial yet efficient generalizations of convex
potential layers. Finally, a comprehensive set of experiments on image
classification shows that SLLs outperform previous approaches on certified
robust accuracy. Code is available at
https://github.com/araujoalexandre/Lipschitz-SLL-Networks. (ICLR 2023, spotlight paper.)
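
The abstract states the SLL construction only at a high level. As a concrete illustration, here is a minimal PyTorch sketch of a dense SLL-style residual layer, following our reading of the paper's parameterization x ↦ x − 2W T⁻¹σ(Wᵀx + b), with T obtained from a Gershgorin-style bound; the class name, the log-parameterized rescaling vector q, and the initialization are illustrative choices of ours, not the authors' released code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SLLLinear(nn.Module):
    """Sketch of a dense SDP-based Lipschitz Layer (SLL).

    Computes x -> x - 2 W T^{-1} relu(W^T x + b), where
    T = diag(t), t_i = sum_j |W^T W|_{ij} q_j / q_i.
    By the Gershgorin circle theorem this choice of T satisfies the
    SDP condition, making the residual map 1-Lipschitz.
    """
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.W = nn.Parameter(torch.randn(dim, hidden) / hidden ** 0.5)
        self.b = nn.Parameter(torch.zeros(hidden))
        self.log_q = nn.Parameter(torch.zeros(hidden))  # q > 0 via exp

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        q = torch.exp(self.log_q)
        gram_abs = (self.W.T @ self.W).abs()            # |W^T W|, hidden x hidden
        t = (gram_abs * (q[None, :] / q[:, None])).sum(dim=1)
        pre = F.relu(x @ self.W + self.b)               # (batch, hidden)
        return x - 2.0 * (pre / t) @ self.W.T           # residual update
```

Stacking several such layers keeps the overall network 1-Lipschitz, since a composition of 1-Lipschitz maps is itself 1-Lipschitz.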
Leveraging the structure of dynamical systems for data-driven modeling
The reliable prediction of the temporal behavior of complex systems is
required in numerous scientific fields. This strong interest is, however,
hindered by modeling issues: often, the governing equations describing the
physics of the system under consideration are not accessible or, when known,
their solution might require a computational time incompatible with the
prediction time constraints. Nowadays, approximating the complex system at hand
in a generic functional format and informing it ex nihilo from available
observations has become common practice, as illustrated by the enormous
amount of scientific work that has appeared in recent years. Numerous successful
examples based on deep neural networks are already available, although the
generalizability of the models and their margins of guarantee are often
overlooked. Here, we consider Long Short-Term Memory (LSTM) neural networks and
thoroughly investigate the impact of the training set and its structure on the
quality of long-term prediction. Leveraging insights from ergodic theory, we
perform a thorough computational analysis to assess the amount of data
sufficient to guarantee, a priori, a faithful model of the physical system. We
show how an informed design of the training set, based on invariants of the
system and the structure of the underlying attractor, significantly improves
the resulting models, opening up avenues for research within the context of
active learning. Further, we illustrate the non-trivial effects of memory
initialization when relying on memory-capable models. Our findings provide
evidence-based good practice on the amount and choice of data required for
effective data-driven modeling of any complex dynamical system.
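
Since the abstract describes the LSTM forecasting setup only in general terms, the following self-contained sketch shows the kind of pipeline it refers to; the Lorenz-63 system, the Euler integrator, the window length, and all hyperparameters are illustrative assumptions of ours, not the paper's actual configuration:

```python
import numpy as np
import torch
import torch.nn as nn

def lorenz_trajectory(n_steps, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Integrate the Lorenz-63 system with a simple Euler scheme
    (adequate for illustration; a real study would use a finer integrator)."""
    x = np.array([1.0, 1.0, 1.0])
    out = np.empty((n_steps, 3))
    for i in range(n_steps):
        dx = np.array([sigma * (x[1] - x[0]),
                       x[0] * (rho - x[2]) - x[1],
                       x[0] * x[1] - beta * x[2]])
        x = x + dt * dx
        out[i] = x
    return out

class Forecaster(nn.Module):
    def __init__(self, dim=3, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, dim)

    def forward(self, window):
        # The zero-initialized memory settles while the LSTM consumes
        # the window; the prediction is read out at the last step only.
        h, _ = self.lstm(window)
        return self.head(h[:, -1])

traj = torch.tensor(lorenz_trajectory(5000), dtype=torch.float32)
windows = traj.unfold(0, 50, 1)[:-1].transpose(1, 2)  # (N, 50, 3) input windows
targets = traj[50:]                                   # next state after each window
model = Forecaster()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for epoch in range(5):                                # tiny loop for illustration
    loss = nn.functional.mse_loss(model(windows), targets)
    opt.zero_grad(); loss.backward(); opt.step()
```

Note that the hidden state is zero-initialized for every window, so the early part of each window acts as a warm-up; this is one way the memory-initialization issue mentioned in the abstract can surface.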
Introduction to the special issue on deep learning approaches for machine translation
Deep learning is revolutionizing speech and natural language technologies by offering an effective way to train systems and obtain significant improvements. The main advantage of deep learning is that, given the right architecture, the system automatically learns features from data without the need to design them explicitly. This machine learning perspective is conceptually changing how speech and natural language technologies are addressed. In the case of Machine Translation (MT), deep learning was first introduced into standard statistical systems. By now, end-to-end neural MT systems have reached competitive results. This special-issue introductory paper describes how deep learning has been gradually introduced into MT. The introduction covers all topics addressed by the papers in this special issue, namely: integration of deep learning into statistical MT; development of end-to-end neural MT systems; and introduction of deep learning into interactive MT and MT evaluation. Finally, this introduction sketches some research directions that MT is taking under the guidance of deep learning.
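
For readers unfamiliar with what "end-to-end neural MT" means architecturally, here is a deliberately minimal encoder-decoder sketch; it is a generic illustration (all names and sizes are ours), not any specific system discussed in the special issue:

```python
import torch
import torch.nn as nn

class Seq2SeqNMT(nn.Module):
    """Bare-bones encoder-decoder: the basic shape of end-to-end
    neural MT, trained directly on (source, target) sentence pairs."""
    def __init__(self, src_vocab: int, tgt_vocab: int, dim: int = 256):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, dim)
        self.encoder = nn.GRU(dim, dim, batch_first=True)
        self.decoder = nn.GRU(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        # Encode the source sentence into a summary state, then decode
        # the target conditioned on it (teacher forcing during training).
        _, h = self.encoder(self.src_emb(src_ids))
        dec, _ = self.decoder(self.tgt_emb(tgt_ids), h)
        return self.out(dec)  # per-position logits over the target vocabulary
```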
Adaptation au domaine pour l'analyse morpho-syntaxique
This work seeks to understand why the performance of a part-of-speech tagger drops sharply when it is used on out-of-domain data. Using a toy experiment, we show that this behavior can be caused by a phenomenon in which lexicalized features are masked by non-lexicalized ones. We propose several models that attempt to reduce this effect.
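
As a rough illustration of such a masking phenomenon, here is a hypothetical toy setup of ours (not the paper's actual models): a tagger combining both feature families, where ablating one family at test time reveals how much the model relied on the other:

```python
import torch
import torch.nn as nn

class ToyTagger(nn.Module):
    """Toy tagger combining lexicalized (word identity) and
    non-lexicalized (e.g., suffix) features; zeroing one feature
    family at test time probes how strongly the model leaned on it."""
    def __init__(self, n_words, n_suffixes, n_tags, dim=32):
        super().__init__()
        self.word_emb = nn.Embedding(n_words, dim)
        self.suffix_emb = nn.Embedding(n_suffixes, dim)
        self.out = nn.Linear(2 * dim, n_tags)

    def forward(self, word_ids, suffix_ids, drop_lexical=False, drop_suffix=False):
        w = self.word_emb(word_ids)
        s = self.suffix_emb(suffix_ids)
        if drop_lexical:   # ablate lexicalized features
            w = torch.zeros_like(w)
        if drop_suffix:    # ablate non-lexicalized features
            s = torch.zeros_like(s)
        return self.out(torch.cat([w, s], dim=-1))
```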