2,552 research outputs found
Backpropagation Beyond the Gradient
Automatic differentiation is a key enabler of deep learning: previously, practitioners were limited to models
for which they could manually compute derivatives. Now, they can create sophisticated models with almost
no restrictions and train them using first-order, i. e. gradient, information. Popular libraries like PyTorch
and TensorFlow compute this gradient efficiently, automatically, and conveniently with a single line of
code. Under the hood, reverse-mode automatic differentiation, or gradient backpropagation, powers the
gradient computation in these libraries. Their entire design centers around gradient backpropagation.
These frameworks are specialized around one specific task—computing the average gradient in a mini-batch.
This specialization often complicates the extraction of other information like higher-order statistical moments
of the gradient, or higher-order derivatives like the Hessian. It limits practitioners and researchers to methods
that rely on the gradient. Arguably, this hampers the field from exploring the potential of higher-order
information and there is evidence that focusing solely on the gradient has not lead to significant recent
advances in deep learning optimization.
To advance algorithmic research and inspire novel ideas, information beyond the batch-averaged gradient
must be made available at the same level of computational efficiency, automation, and convenience.
This thesis presents approaches to simplify experimentation with rich information beyond the gradient
by making it more readily accessible. We present an implementation of these ideas as an extension to the
backpropagation procedure in PyTorch. Using this newly accessible information, we demonstrate possible use
cases by (i) showing how it can inform our understanding of neural network training by building a diagnostic
tool, and (ii) enabling novel methods to efficiently compute and approximate curvature information.
First, we extend gradient backpropagation for sequential feedforward models to Hessian backpropagation
which enables computing approximate per-layer curvature. This perspective unifies recently proposed block-
diagonal curvature approximations. Like gradient backpropagation, the computation of these second-order
derivatives is modular, and therefore simple to automate and extend to new operations.
Based on the insight that rich information beyond the gradient can be computed efficiently and at the
same time, we extend the backpropagation in PyTorch with the BackPACK library. It provides efficient and
convenient access to statistical moments of the gradient and approximate curvature information, often at a
small overhead compared to computing just the gradient.
Next, we showcase the utility of such information to better understand neural network training. We build
the Cockpit library that visualizes what is happening inside the model during training through various
instruments that rely on BackPACK’s statistics. We show how Cockpit provides a meaningful statistical
summary report to the deep learning engineer to identify bugs in their machine learning pipeline, guide
hyperparameter tuning, and study deep learning phenomena.
Finally, we use BackPACK’s extended automatic differentiation functionality to develop ViViT, an approach
to efficiently compute curvature information, in particular curvature noise. It uses the low-rank structure
of the generalized Gauss-Newton approximation to the Hessian and addresses shortcomings in existing
curvature approximations. Through monitoring curvature noise, we demonstrate how ViViT’s information
helps in understanding challenges to make second-order optimization methods work in practice.
This work develops new tools to experiment more easily with higher-order information in complex deep
learning models. These tools have impacted works on Bayesian applications with Laplace approximations,
out-of-distribution generalization, differential privacy, and the design of automatic differentia-
tion systems. They constitute one important step towards developing and establishing more efficient deep
learning algorithms
LIPIcs, Volume 251, ITCS 2023, Complete Volume
LIPIcs, Volume 251, ITCS 2023, Complete Volum
A top-down approach to algebraic renormalization in regularity structures based on multi-indices
We provide an algebraic framework to describe renormalization in regularity
structures based on multi-indices for a large class of semi-linear stochastic
PDEs. This framework is ``top-down", in the sense that we postulate the form of
the counterterm and use the renormalized equation to build a canonical smooth
model for it. The core of the construction is a generalization of the Hopf
algebra of derivations in [LOT23], which is extended beyond the structure group
to describe the model equation via an exponential map: This allows to implement
a renormalization procedure which resembles the preparation map approach in our
context.Comment: 65 page
Automated identification and behaviour classification for modelling social dynamics in group-housed mice
Mice are often used in biology as exploratory models of human conditions, due to their similar genetics and physiology. Unfortunately, research on behaviour has traditionally been limited to studying individuals in isolated environments and over short periods of time. This can miss critical time-effects, and, since mice are social creatures, bias results.
This work addresses this gap in research by developing tools to analyse the individual behaviour of group-housed mice in the home-cage over several days and with minimal disruption. Using data provided by the Mary Lyon Centre at MRC Harwell we designed an end-to-end system that (a) tracks and identifies mice in a cage, (b) infers their behaviour, and subsequently (c) models the group dynamics as functions of individual activities. In support of the above, we also curated and made available a large dataset of mouse localisation and behaviour classifications (IMADGE), as well as two smaller annotated datasets for training/evaluating the identification (TIDe) and behaviour inference (ABODe) systems. This research constitutes the first of its kind in terms of the scale and challenges addressed. The data source (side-view single-channel video with clutter and no identification markers for mice) presents challenging conditions for analysis, but has the potential to give richer information while using industry standard housing.
A Tracking and Identification module was developed to automatically detect, track and identify the (visually similar) mice in the cluttered home-cage using only single-channel IR video and coarse position from RFID readings. Existing detectors and trackers were combined with a novel Integer Linear Programming formulation to assign anonymous tracks to mouse identities. This utilised a probabilistic weight model of affinity between detections and RFID pickups.
The next task necessitated the implementation of the Activity Labelling module that classifies the behaviour of each mouse, handling occlusion to avoid giving unreliable classifications when the mice cannot be observed. Two key aspects of this were (a) careful feature-selection, and (b) judicious balancing of the errors of the system in line with the repercussions for our setup.
Given these sequences of individual behaviours, we analysed the interaction dynamics between mice in the same cage by collapsing the group behaviour into a sequence of interpretable latent regimes using both static and temporal (Markov) models. Using a permutation matrix, we were able to automatically assign mice to roles in the HMM, fit a global model to a group of cages and analyse abnormalities in data from a different demographic
La traduzione specializzata all’opera per una piccola impresa in espansione: la mia esperienza di internazionalizzazione in cinese di Bioretics© S.r.l.
Global markets are currently immersed in two all-encompassing and unstoppable processes: internationalization and globalization. While the former pushes companies to look beyond the borders of their country of origin to forge relationships with foreign trading partners, the latter fosters the standardization in all countries, by reducing spatiotemporal distances and breaking down geographical, political, economic and socio-cultural barriers. In recent decades, another domain has appeared to propel these unifying drives: Artificial Intelligence, together with its high technologies aiming to implement human cognitive abilities in machinery. The “Language Toolkit – Le lingue straniere al servizio dell’internazionalizzazione dell’impresa” project, promoted by the Department of Interpreting and Translation (Forlì Campus) in collaboration with the Romagna Chamber of Commerce (Forlì-Cesena and Rimini), seeks to help Italian SMEs make their way into the global market. It is precisely within this project that this dissertation has been conceived. Indeed, its purpose is to present the translation and localization project from English into Chinese of a series of texts produced by Bioretics© S.r.l.: an investor deck, the company website and part of the installation and use manual of the Aliquis© framework software, its flagship product. This dissertation is structured as follows: Chapter 1 presents the project and the company in detail; Chapter 2 outlines the internationalization and globalization processes and the Artificial Intelligence market both in Italy and in China; Chapter 3 provides the theoretical foundations for every aspect related to Specialized Translation, including website localization; Chapter 4 describes the resources and tools used to perform the translations; Chapter 5 proposes an analysis of the source texts; Chapter 6 is a commentary on translation strategies and choices
Cost-Size Semantics for Call-By-Value Higher-Order Rewriting
Higher-order rewriting is a framework in which higher-order programs can be described by transformation rules on expressions. A computation occurs by transforming an expression into another using such rules. This step-by-step computation model induced by rewriting naturally gives rise to a notion of complexity as the number of steps needed to reduce expressions to a normal form, i.e., an expression that cannot be reduced further. The study of complexity analysis focuses on the development of automatable techniques to provide bounds to this number. In this paper, we consider a form of higher-order rewriting with a call-by-value evaluation strategy, so as to model call-by-value programs. We provide a cost-size semantics: a class of algebraic interpretations to map terms to tuples which bound both the reduction cost and the size of normal forms
gaps as derived models and correctness of mice
Assume ZF + AD + V=L(R). Let be a gap with
admissible. We analyze as a natural form of
``derived model'' of a premouse , where is found in a generic extension
of . In particular, we will have , and if ``
exists'', then and in fact have the same universe. This
analysis will be employed in further work, yet to appear, toward a resolution
of a conjecture of Rudominer and Steel on the nature of , for
-small mice . We also establish some preliminary work toward this
conjecture in the present paper.Comment: 128 page
Analytical validation of innovative magneto-inertial outcomes: a controlled environment study.
peer reviewe
Twisted Higgs Bundles, Topological Recursion, and Quantum Curves
The focus of this thesis is on the interplay between Higgs bundles and topological recursion. Our interests lie in the relationship between quantum curves and the quantization of Hitchin spectral curves, and also the relationship between Eynard-Orantin differentials and the geometry of the Hitchin moduli space.
We give an overview of existing results in the literature on quantum curves, covering the necessary material to construct a quantum curve from a meromorphic SL(2, C)-Hitchin spectral curve. Starting from the quantum curve, we offer a new perspective on the quantization that includes the spectral correspondence and C∗-action. We view the quantization as a procedure that happens on the spectral curve, rather than the base. This idea frames quantization around the tautological section, rather than the Higgs field.
Previous works relating meromorphic Higgs bundles to topological recursion have considered non-singular models to allow the recursion to be done on a smooth Riemann surface. In this thesis, we start from an L-twisted Higgs bundle. By studying the deformation theory of the L-twisted moduli space, we interpret L as meromorphic data on a subbundle of an ordinary Higgs bundle. We encode this meromorphic data as a b-structure on the base Riemann surface and spectral curve. We then propose a so-called twisted recursion on the spectral curve, where the Eynard-Orantin differentials live in the twisted cotangent bundle. We show that the g = 0 twisted Eynard-Orantin differentials compute the Taylor expansion of the period matrix of a Hitchin spectral curve, mirroring a result for ordinary Higgs bundles and topological recursion. In particular, this shows that the geometry of the spectral curve is independent of the ambient space in which it resides
- …