Eigen-AD: Algorithmic Differentiation of the Eigen Library
In this work we present useful techniques and possible enhancements for
applying an Algorithmic Differentiation (AD) tool to the linear algebra library
Eigen, using our in-house AD-by-overloading (AD-O) tool dco/c++ as a case study.
After outlining performance and feasibility issues when calculating derivatives
for the official Eigen release, we propose Eigen-AD, which enables different
optimization options for an AD-O tool by providing add-on modules for Eigen.
The range of features includes a better handling of expression templates for
general performance improvements, as well as implementations of symbolically
derived expressions for calculating derivatives of certain core operations. The
software design allows an AD-O tool to provide specializations to automatically
include symbolic operations and thereby keep the look and feel of plain AD by
overloading. As a showcase, dco/c++ is provided with such a module and its
significant performance improvements are validated by benchmarks.
Comment: Updated with accepted version for ICCS 2020 conference proceedings.
The final authenticated publication is available online at
https://doi.org/10.1007/978-3-030-50371-0_51. See v1 for the original,
extended preprint. 14 pages, 7 figures
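The idea of symbolically derived expressions for core operations can be illustrated on the matrix product. The following NumPy sketch (an illustration of the general idea, not the Eigen-AD or dco/c++ API; all names are invented) applies the reverse-mode rule for C = A·B, namely Ā = C̄ Bᵀ and B̄ = Aᵀ C̄, at the matrix level instead of recording every scalar multiply-add, and checks the result against finite differences:

```python
import numpy as np

# Symbolic reverse-mode rule for the core operation C = A @ B:
#   A_bar = C_bar @ B.T,  B_bar = A.T @ C_bar
def matmul_adjoint(A, B, C_bar):
    return C_bar @ B.T, A.T @ C_bar

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 2))
C_bar = rng.standard_normal((3, 2))   # adjoint seed for the output C

A_bar, B_bar = matmul_adjoint(A, B, C_bar)

# Validate A_bar against central finite differences of the scalar
# objective sum(C_bar * (A @ B)).
eps = 1e-6
fd = np.zeros_like(A)
for i in range(A.shape[0]):
    for j in range(A.shape[1]):
        Ap, Am = A.copy(), A.copy()
        Ap[i, j] += eps
        Am[i, j] -= eps
        fd[i, j] = (np.sum(C_bar * (Ap @ B)) - np.sum(C_bar * (Am @ B))) / (2 * eps)

print(np.allclose(A_bar, fd, atol=1e-5))  # True
```

The symbolic rule costs two matrix products, whereas plain AD by overloading would tape and interpret every scalar operation inside the product.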
Index handling and assign optimization for Algorithmic Differentiation reuse index managers
For operator overloading Algorithmic Differentiation tools, the
identification of primal variables and adjoint variables is usually done via
indices. Two common schemes exist for their management and distribution. The
linear approach is easy to implement and supports memory optimization with
respect to copy statements. On the other hand, the reuse approach requires more
implementation effort but results in much smaller adjoint vectors, which are
more suitable for the vector mode of Algorithmic Differentiation. In this
paper, we present both approaches, how to implement them, and discuss their
advantages, disadvantages and properties of the resulting Algorithmic
Differentiation type. In addition, a new management scheme is presented which
supports copy optimizations and the reuse of indices, thus combining the
advantages of the other two. The implementations of all three schemes are
compared on a simple synthetic example and on a real world example using the
computational fluid dynamics solver SU2.
Comment: 20 pages, 14 figures, 4 tables
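The difference between the two index distribution schemes can be sketched as follows (a hypothetical Python illustration; class and method names are invented, not taken from the paper's implementation):

```python
class LinearIndexManager:
    """Every new assignment gets a fresh index; indices are never reclaimed."""
    def __init__(self):
        self.next_index = 1   # index 0 reserved for passive (constant) values
    def create(self):
        i = self.next_index
        self.next_index += 1
        return i
    def free(self, i):
        pass                  # linear scheme: freed indices are not reused

class ReuseIndexManager:
    """Indices of destructed variables go to a free list and are handed out
    again, keeping the largest live index (and hence the adjoint vector) small."""
    def __init__(self):
        self.next_index = 1
        self.free_list = []
    def create(self):
        if self.free_list:
            return self.free_list.pop()
        i = self.next_index
        self.next_index += 1
        return i
    def free(self, i):
        self.free_list.append(i)

# Ten temporaries created and destroyed one after another:
for mgr in (LinearIndexManager(), ReuseIndexManager()):
    for _ in range(10):
        mgr.free(mgr.create())
    print(type(mgr).__name__, "largest index handed out:", mgr.next_index - 1)
```

The linear manager hands out ten distinct indices, while the reuse manager recycles a single one; the adjoint vector only needs as many entries as the largest index ever distributed.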
Reduction of the Random Access Memory Size in Adjoint Algorithmic Differentiation by Overloading
Adjoint algorithmic differentiation by operator and function overloading is
based on the interpretation of directed acyclic graphs resulting from
evaluations of numerical simulation programs. The size of the computer system
memory required to store the graph grows in proportion to the number of
floating-point operations executed by the underlying program. It quickly
exceeds the available memory resources. Naive adjoint algorithmic
differentiation often becomes infeasible except for relatively simple numerical
simulations.
Access to the data associated with the graph can be classified as sequential
and random. The latter refers to memory access patterns defined by the
adjacency relationship between vertices within the graph. Sequentially accessed
data can be decomposed into blocks. The blocks can be streamed across the
system memory hierarchy thus extending the amount of available memory, for
example, to hard disks. Asynchronous I/O can help to mitigate the increased
cost due to accesses to slower memory. Much larger problem instances can thus
be solved without resorting to technically challenging user intervention such
as checkpointing. Randomly accessed data should not have to be decomposed. Its
block-wise streaming is likely to yield a substantial overhead in computational
cost due to data accesses across blocks. Consequently, the size of the randomly
accessed memory required by an adjoint should be kept minimal in order to
eliminate the need for decomposition. We propose a combination of dedicated
memory for adjoint values with the exploitation of remainder bandwidth as a
possible solution. Test results indicate significant savings in random access
memory size while preserving overall computational efficiency.
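The distinction between sequentially and randomly accessed data can be seen in a minimal reverse-mode tape (a toy sketch, not the paper's tool; all names are invented). The tape is written and read strictly in order and could be streamed block-wise to disk, while the adjoint vector is indexed via the recorded adjacency information and is therefore accessed randomly:

```python
tape = []      # sequential access: (result_index, [(argument_index, partial), ...])
values = []

def record(partials):
    values.append(0.0)
    tape.append((len(values) - 1, partials))
    return len(values) - 1

def add(i, j):            # v_k = v_i + v_j
    return record([(i, 1.0), (j, 1.0)])

def mul(i, j, vi, vj):    # v_k = v_i * v_j
    return record([(i, vj), (j, vi)])

# f(x, y) = (x + y) * x  at  x = 3, y = 2
x = record([]); y = record([])
vx, vy = 3.0, 2.0
s = add(x, y)
f = mul(s, x, vx + vy, vx)

# Reverse sweep: one sequential backward pass over the tape,
# with random accesses into the adjoint vector.
adjoint = [0.0] * len(values)
adjoint[f] = 1.0
for res, partials in reversed(tape):
    for arg, p in partials:
        adjoint[arg] += p * adjoint[res]

print(adjoint[x], adjoint[y])  # df/dx = 2x + y = 8.0, df/dy = x = 3.0
```

The tape grows with the number of recorded operations, but the adjoint vector only with the number of program variables; keeping the latter resident in fast memory is exactly what the proposed scheme targets.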
Automatic differentiation in machine learning: a survey
Derivatives, mostly in the form of gradients and Hessians, are ubiquitous in
machine learning. Automatic differentiation (AD), also called algorithmic
differentiation or simply "autodiff", is a family of techniques similar to but
more general than backpropagation for efficiently and accurately evaluating
derivatives of numeric functions expressed as computer programs. AD is a small
but established field with applications in areas including computational fluid
dynamics, atmospheric sciences, and engineering design optimization. Until very
recently, the fields of machine learning and AD have largely been unaware of
each other and, in some cases, have independently discovered each other's
results. Despite its relevance, general-purpose AD has been missing from the
machine learning toolbox, a situation slowly changing with its ongoing adoption
under the names "dynamic computational graphs" and "differentiable
programming". We survey the intersection of AD and machine learning, cover
applications where AD has direct relevance, and address the main implementation
techniques. By precisely defining the main differentiation techniques and their
interrelationships, we aim to bring clarity to the usage of the terms
"autodiff", "automatic differentiation", and "symbolic differentiation" as
these are encountered more and more in machine learning settings.
Comment: 43 pages, 5 figures
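A minimal example of the forward mode of AD, using dual numbers via operator overloading (a standard textbook construction, not code from the survey), shows how derivatives are propagated numerically alongside the primal computation rather than by symbolic manipulation:

```python
import math

class Dual:
    """Pair (value, derivative) propagated through arithmetic by the chain rule."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot
    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val + o.val, self.dot + o.dot)
    __radd__ = __add__
    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val * o.val, self.dot * o.val + self.val * o.dot)
    __rmul__ = __mul__

def sin(x):
    return Dual(math.sin(x.val), math.cos(x.val) * x.dot)

def f(x):
    return x * x + sin(x)   # f(x) = x^2 + sin(x),  f'(x) = 2x + cos(x)

x = Dual(1.5, 1.0)          # seed the tangent dx/dx = 1
y = f(x)
print(y.val, y.dot)         # f(1.5) and f'(1.5) = 2*1.5 + cos(1.5)
```

No expression for f'(x) is ever built: the derivative emerges from the overloaded operations, which is what distinguishes autodiff from symbolic differentiation.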
Forward-Mode Automatic Differentiation of Compiled Programs
Algorithmic differentiation (AD) is a set of techniques that provide partial
derivatives of computer-implemented functions. Such a function can be supplied
to state-of-the-art AD tools via its source code, or via an intermediate
representation produced while compiling its source code.
We present the novel AD tool Derivgrind, which augments the machine code of
compiled programs with forward-mode AD logic. Derivgrind leverages the Valgrind
instrumentation framework for a structured access to the machine code, and a
shadow memory tool to store dot values. Access to the source code is required
at most for the files in which input and output variables are defined.
Derivgrind's versatility comes at the price of scaling the run-time by a
factor between 30 and 75, measured on a benchmark based on a numerical solver
for a partial differential equation. Results of our extensive regression test
suite indicate that Derivgrind produces correct results on GCC- and
Clang-compiled programs, including a Python interpreter, with a small number of
exceptions. While we provide a list of scenarios that Derivgrind does not
handle correctly, nearly all of them are academic counterexamples or originate
from highly optimized math libraries. As long as differentiating those is
avoided, Derivgrind can be applied to an unprecedentedly wide range of
cross-language or partially closed-source software with little integration
effort.
Comment: 21 pages, 3 figures, 3 tables, 5 listings
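The shadow-memory idea behind machine-code-level forward-mode AD can be modelled abstractly as follows (a hypothetical Python sketch; Derivgrind itself instruments real machine instructions through Valgrind, and none of these names come from its API). Every memory location holding a floating-point value gets a shadow location holding its dot value, and each arithmetic instruction is augmented with the corresponding tangent update:

```python
memory = {}   # address -> primal floating-point value
shadow = {}   # address -> dot (tangent) value

def store(addr, val, dot=0.0):
    memory[addr] = val
    shadow[addr] = dot

def fmul(dst, a, b):
    # dst = a * b, augmented with the forward-mode product rule
    memory[dst] = memory[a] * memory[b]
    shadow[dst] = shadow[a] * memory[b] + memory[a] * shadow[b]

store(0x10, 3.0, 1.0)    # input x, seeded with dx = 1
store(0x18, 4.0, 0.0)    # constant 4
fmul(0x20, 0x10, 0x18)   # y = 4 * x
print(memory[0x20], shadow[0x20])  # 12.0 4.0
```

Because the augmentation happens per instruction, no source code is needed for the differentiated part of the program, which is the property that makes the approach applicable to closed-source binaries.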
Differentiable world programs
Modern artificial intelligence (AI) has created exciting new opportunities for building intelligent robots. In particular, gradient-based learning architectures (deep neural networks) have tremendously improved 3D scene understanding in terms of perception, reasoning, and action.
However, these advancements have undermined many ``classical'' techniques developed over the last few decades.
We postulate that a blend of ``classical'' and ``learned'' methods is the most promising path to developing flexible, interpretable, and actionable models of the world: a necessity for intelligent embodied agents.
``What is the ideal way to combine classical techniques with gradient-based learning architectures for a rich understanding of the 3D world?'' is the central question in this dissertation. This understanding enables a multitude of applications that fundamentally impact how embodied agents perceive and interact with their environment. This dissertation, dubbed ``differentiable world programs'', unifies efforts from multiple closely-related but currently-disjoint fields including robotics, computer vision, computer graphics, and AI.
Our first contribution---gradSLAM---is a fully differentiable dense simultaneous localization and mapping (SLAM) system. By enabling gradient computation through otherwise non-differentiable components such as nonlinear least squares optimization, ray casting, visual odometry, and dense mapping, gradSLAM opens up new avenues for integrating classical 3D reconstruction and deep learning.
Our second contribution---taskography---proposes a task-conditioned sparsification of large 3D scenes encoded as 3D scene graphs. This enables classical planners to match (and surpass) state-of-the-art learning-based planners by focusing computation on task-relevant scene attributes.
Our third and final contribution---gradSim---is a fully differentiable simulator that composes differentiable physics and graphics engines to enable physical parameter estimation and visuomotor control, solely from videos or a still image.
Music and time: tempomorphism: nested temporalities in perceived experience of music.
This thesis represents the results of a theoretical and practical investigation of acoustic and electro-acoustic elements of Western music at the start of the twenty-first century, with specific attention to soundscapes. A commentary on the development of soundscapes is drawn from a multidisciplinary overview of concepts of time, followed by an examination of concepts of time in music. As a response to Jonathan Kramer's concept of `vertical' music (a characteristic aesthetic of which is an absence of conventional harmonic teleology), particular attention is paid to those theories of multiple nested temporalities which have been referred to by Kramer in support of non-teleological musical structures.
The survey suggests that new musical concepts, such as vertical music, have emerged from sensibilities resulting from minimalism and its associated musical styles, and represent an ontological development of aesthetics characteristic of the twentieth century. An original contention of the debate is that innovations in the practice of music resulting from technological developments have led to the possibility of defining a methodology of process in addition to auditive strategies, resulting in a duality defined as 'tempomorphic'. Further observations, drawn from original creative practical research, are supplied to define tempomorphic performance, completing the contribution to knowledge offered by the investigation. Tempomorphism, therefore, is defined as a duality of process and audition: as an auditive tool, tempomorphic analysis provides a listening strategy suited to harmonically static music; as a procedural tool, it affords a methodology based primarily on duration.