439 research outputs found

    ICICLE: Interpretable Class Incremental Continual Learning

    Full text link
    Continual learning enables incremental learning of new tasks without forgetting those previously learned, resulting in positive knowledge transfer that can enhance performance on both new and old tasks. However, continual learning poses new challenges for interpretability, as the rationale behind model predictions may change over time, leading to interpretability concept drift. We address this problem by proposing Interpretable Class-InCremental LEarning (ICICLE), an exemplar-free approach that adopts a prototypical part-based approach. It consists of three crucial novelties: interpretability regularization that distills previously learned concepts while preserving user-friendly positive reasoning; proximity-based prototype initialization strategy dedicated to the fine-grained setting; and task-recency bias compensation devoted to prototypical parts. Our experimental results demonstrate that ICICLE reduces the interpretability concept drift and outperforms the existing exemplar-free methods of common class-incremental learning when applied to concept-based models. We make the code available.Comment: Under review, code will be shared after the acceptanc

    Advances in generative models for dynamic scenes

    Full text link
    Les réseaux de neurones sont un type de modèle d'apprentissage automatique (ML) qui résolvent des tâches complexes d'intelligence artificielle (AI) sans nécessiter de représentations de données élaborées manuellement. Bien qu'ils aient obtenu des résultats impressionnants dans des tâches nécessitant un traitement de la parole, d’image, et du langage, les réseaux de neurones ont encore de la difficulté à résoudre des tâches de compréhension de scènes dynamiques. De plus, l’entraînement de réseaux de neurones nécessite généralement de nombreuses données annotées manuellement, ce qui peut être un processus long et coûteux. Cette thèse est composée de quatre articles proposant des modèles génératifs pour des scènes dynamiques. La modélisation générative est un domaine du ML qui étudie comment apprendre les mécanismes par lesquels les données sont produites. La principale motivation derrière les modèles génératifs est de pouvoir, sans utiliser d’étiquettes, apprendre des représentations de données utiles; c’est un sous-produit de l'approximation du processus de génération de données. De plus, les modèles génératifs sont utiles pour un large éventail d'applications telles que la super-résolution d'images, la synthèse vocale ou le résumé de texte. Le premier article se concentre sur l'amélioration de la performance des précédents auto-encodeurs variationnels (VAE) pour la prédiction vidéo. Il s’agit d’une tâche qui consiste à générer les images futures d'une scène dynamique, compte tenu de certaines observations antérieures. Les VAE sont une famille de modèles à variables latentes qui peuvent être utilisés pour échantillonner des points de données. Comparés à d'autres modèles génératifs, les VAE sont faciles à entraîner et ont tendance à couvrir tous les modes des données, mais produisent souvent des résultats de moindre qualité. En prédiction vidéo, les VAE ont été les premiers modèles capables de produire des images futures plausibles à partir d’un contexte donné, un progrès marquant par rapport aux modèles précédents car, pour la plupart des scènes dynamiques, le futur n'est pas une fonction déterministe du passé. Cependant, les premiers VAE pour la prédiction vidéo produisaient des résultats avec des artefacts visuels visibles et ne fonctionnaient pas sur des ensembles de données réalistes complexes. Dans cet article, nous identifions certains des facteurs limitants de ces modèles, et nous proposons pour chacun d’eux une solution pour en atténuer l'impact. Grâce à ces modifications, nous montrons que les VAE pour la prédiction vidéo peuvent obtenir des résultats de qualité nettement supérieurs par rapport aux références précédentes, et qu'ils peuvent être utilisés pour modéliser des scènes de conduite autonome. Dans le deuxième article, nous proposons un nouveau modèle en cascade pour la génération vidéo basé sur les réseaux antagonistes génératifs (GAN). Après le succès des VAE pour prédiction vidéo, il a été démontré que les GAN produisaient des échantillons vidéo de meilleure qualité pour la génération vidéo conditionnelle à des classes. Cependant, les GAN nécessitent de très grandes tailles de lots ainsi que des modèles de grande capacité, ce qui rend l’entraînement des GAN pour la génération vidéo coûteux computationnellement, à la fois en termes de mémoire et en temps de calcul. Nous proposons de scinder le processus génératif en une cascade de sous-modèles, chacun d'eux résolvant un problème plus simple. Cette division nous permet de réduire considérablement le coût computationnel tout en conservant la qualité de l'échantillon, et nous démontrons que ce modèle peut s'adapter à de très grands ensembles de données ainsi qu’à des vidéos de haute résolution. Dans le troisième article, nous concevons un modèle basé sur le principe qu'une scène est composée de différents objets, mais que les transitions de trame (également appelées règles dynamiques) sont partagées entre les objets. Pour mettre en œuvre cette hypothèse de modélisation, nous concevons un modèle qui extrait d'abord les différentes entités d'une image. Ensuite, le modèle apprend à mettre à jour la représentation de l'objet d'une image à l'autre en choisissant parmi différentes transitions possibles qui sont toutes partagées entre les différents objets. Nous montrons que, lors de l'apprentissage d'un tel modèle, les règles de transition sont fondées sémantiquement, et peuvent être appliquées à des objets non vus lors de l'apprentissage. De plus, nous pouvons utiliser ce modèle pour prédire les observations multimodales futures d'une scène dynamique en choisissant différentes transitions. Dans le dernier article nous proposons un modèle génératif basé sur des techniques de rendu 3D qui permet de générer des scènes avec plusieurs objets. Nous concevons un mécanisme d'inférence pour apprendre les représentations qui peuvent être rendues avec notre modèle et nous optimisons simultanément ce mécanisme d'inférence et le moteur de rendu. Nous montrons que ce modèle possède une représentation interprétable dans laquelle des changements sémantiques appliqués à la représentation de la scène sont rendus dans la scène générée. De plus, nous montrons que, suite au processus d’entraînement, notre modèle apprend à segmenter les objets dans une scène sans annotations et que la représentation apprise peut être utilisée pour résoudre des tâches de compréhension de scène dynamique en déduisant la représentation de chaque observation.Neural networks are a type of Machine Learning (ML) models that solve complex Artificial Intelligence (AI) tasks without requiring handcrafted data representations. Although they have achieved impressive results in tasks requiring speech, image and language processing, neural networks still struggle to solve dynamic scene understanding tasks. Furthermore, training neural networks usually demands lots data that is annotated manually, which can be an expensive and time-consuming process. This thesis is comprised of four articles proposing generative models for dynamic scenes. Generative modelling is an area of ML that investigates how to learn the mechanisms by which data is produced. The main motivation for generative models is to learn useful data representations without labels as a by-product of approximating the data generation process. Furthermore, generative models are useful for a wide range of applications such as image super-resolution, voice synthesis or text summarization. The first article focuses on improving the performance of previous Variational AutoEncoders (VAEs) for video prediction, which is the task of generating future frames of a dynamic scene given some previous occurred observations. VAEs are a family of latent variable models that can be used to sample data points. Compared to other generative models, VAEs are easy to train and tend to cover all data modes, but often produce lower quality results. In video prediction VAEs were the first models that were able to produce multiple plausible future outcomes given a context, marking an advancement over previous models as for most dynamic scenes the future is not a deterministic function of the past. However, the first VAEs for video prediction produced results with visible visual artifacts and could not operate on complex realistic datasets. In this article we identify some of the limiting factors for these models, and for each of them we propose a solution to ease its impact. With our proposed modifications, we show that VAEs for video prediction can obtain significant higher quality results over previous baselines and that they can be used to model autonomous driving scenes. In the second article we propose a new cascaded model for video generation based on Generative Adversarial Networks (GANs). After the success of VAEs in video prediction, GANs were shown to produce higher quality video samples for class-conditional video generation. However, GANs require very large batch sizes and high capacity models, which makes training GANs for video generation computationally expensive, both in terms of memory and training time. We propose to split the generative process into a cascade of submodels, each of them solving a smaller generative problem. This split allows us to significantly reduce the computational requirements while retaining sample quality, and we show that this model can scale to very large datasets and video resolutions. In the third article we design a model based on the premise that a scene is comprised of different objects but that frame transitions (also known as dynamic rules) are shared among objects. To implement this modeling assumption we design a model that first extracts the different entities in a frame, and then learns to update the object representation from one frame to another by choosing among different possible transitions, all shared among objects. We show that, when learning such a model, the transition rules are semantically grounded and can be applied to objects not seen during training. Further, we can use this model for predicting multimodal future observations of a dynamic scene by choosing different transitions. In the last article we propose a generative model based on 3D rendering techniques that can generate scenes with multiple objects. We design an inference mechanism to learn representations that can be rendered with our model and we simultaneously optimize this inference mechanism and the renderer. We show that this model has an interpretable representation in which semantic changes to the scene representation are shown in the output. Furthermore, we show that, as a by product of the training process, our model learns to segment the objects in a scene without annotations and that the learned representation can be used to solve dynamic scene understanding tasks by inferring the representation of each observation

    The Geometry of Self-supervised Learning Models and its Impact on Transfer Learning

    Full text link
    Self-supervised learning (SSL) has emerged as a desirable paradigm in computer vision due to the inability of supervised models to learn representations that can generalize in domains with limited labels. The recent popularity of SSL has led to the development of several models that make use of diverse training strategies, architectures, and data augmentation policies with no existing unified framework to study or assess their effectiveness in transfer learning. We propose a data-driven geometric strategy to analyze different SSL models using local neighborhoods in the feature space induced by each. Unlike existing approaches that consider mathematical approximations of the parameters, individual components, or optimization landscape, our work aims to explore the geometric properties of the representation manifolds learned by SSL models. Our proposed manifold graph metrics (MGMs) provide insights into the geometric similarities and differences between available SSL models, their invariances with respect to specific augmentations, and their performances on transfer learning tasks. Our key findings are two fold: (i) contrary to popular belief, the geometry of SSL models is not tied to its training paradigm (contrastive, non-contrastive, and cluster-based); (ii) we can predict the transfer learning capability for a specific model based on the geometric properties of its semantic and augmentation manifolds.Comment: 22 page

    Curve Skeleton and Moments of Area Supported Beam Parametrization in Multi-Objective Compliance Structural Optimization

    Get PDF
    This work addresses the end-to-end virtual automation of structural optimization up to the derivation of a parametric geometry model that can be used for application areas such as additive manufacturing or the verification of the structural optimization result with the finite element method. A holistic design in structural optimization can be achieved with the weighted sum method, which can be automatically parameterized with curve skeletonization and cross-section regression to virtually verify the result and control the local size for additive manufacturing. is investigated in general. In this paper, a holistic design is understood as a design that considers various compliances as an objective function. This parameterization uses the automated determination of beam parameters by so-called curve skeletonization with subsequent cross-section shape parameter estimation based on moments of area, especially for multi-objective optimized shapes. An essential contribution is the linking of the parameterization with the results of the structural optimization, e.g., to include properties such as boundary conditions, load conditions, sensitivities or even density variables in the curve skeleton parameterization. The parameterization focuses on guiding the skeletonization based on the information provided by the optimization and the finite element model. In addition, the cross-section detection considers circular, elliptical, and tensor product spline cross-sections that can be applied to various shape descriptors such as convolutional surfaces, subdivision surfaces, or constructive solid geometry. The shape parameters of these cross-sections are estimated using stiffness distributions, moments of area of 2D images, and convolutional neural networks with a tailored loss function to moments of area. Each final geometry is designed by extruding the cross-section along the appropriate curve segment of the beam and joining it to other beams by using only unification operations. The focus of multi-objective structural optimization considering 1D, 2D and 3D elements is on cases that can be modeled using equations by the Poisson equation and linear elasticity. This enables the development of designs in application areas such as thermal conduction, electrostatics, magnetostatics, potential flow, linear elasticity and diffusion, which can be optimized in combination or individually. Due to the simplicity of the cases defined by the Poisson equation, no experts are required, so that many conceptual designs can be generated and reconstructed by ordinary users with little effort. Specifically for 1D elements, a element stiffness matrices for tensor product spline cross-sections are derived, which can be used to optimize a variety of lattice structures and automatically convert them into free-form surfaces. For 2D elements, non-local trigonometric interpolation functions are used, which should significantly increase interpretability of the density distribution. To further improve the optimization, a parameter-free mesh deformation is embedded so that the compliances can be further reduced by locally shifting the node positions. Finally, the proposed end-to-end optimization and parameterization is applied to verify a linear elasto-static optimization result for and to satisfy local size constraint for the manufacturing with selective laser melting of a heat transfer optimization result for a heat sink of a CPU. For the elasto-static case, the parameterization is adjusted until a certain criterion (displacement) is satisfied, while for the heat transfer case, the manufacturing constraints are satisfied by automatically changing the local size with the proposed parameterization. This heat sink is then manufactured without manual adjustment and experimentally validated to limit the temperature of a CPU to a certain level.:TABLE OF CONTENT III I LIST OF ABBREVIATIONS V II LIST OF SYMBOLS V III LIST OF FIGURES XIII IV LIST OF TABLES XVIII 1. INTRODUCTION 1 1.1 RESEARCH DESIGN AND MOTIVATION 6 1.2 RESEARCH THESES AND CHAPTER OVERVIEW 9 2. PRELIMINARIES OF TOPOLOGY OPTIMIZATION 12 2.1 MATERIAL INTERPOLATION 16 2.2 TOPOLOGY OPTIMIZATION WITH PARAMETER-FREE SHAPE OPTIMIZATION 17 2.3 MULTI-OBJECTIVE TOPOLOGY OPTIMIZATION WITH THE WEIGHTED SUM METHOD 18 3. SIMULTANEOUS SIZE, TOPOLOGY AND PARAMETER-FREE SHAPE OPTIMIZATION OF WIREFRAMES WITH B-SPLINE CROSS-SECTIONS 21 3.1 FUNDAMENTALS IN WIREFRAME OPTIMIZATION 22 3.2 SIZE AND TOPOLOGY OPTIMIZATION WITH PERIODIC B-SPLINE CROSS-SECTIONS 27 3.3 PARAMETER-FREE SHAPE OPTIMIZATION EMBEDDED IN SIZE OPTIMIZATION 32 3.4 WEIGHTED SUM SIZE AND TOPOLOGY OPTIMIZATION 36 3.5 CROSS-SECTION COMPARISON 39 4. NON-LOCAL TRIGONOMETRIC INTERPOLATION IN TOPOLOGY OPTIMIZATION 41 4.1 FUNDAMENTALS IN MATERIAL INTERPOLATIONS 43 4.2 NON-LOCAL TRIGONOMETRIC SHAPE FUNCTIONS 45 4.3 NON-LOCAL PARAMETER-FREE SHAPE OPTIMIZATION WITH TRIGONOMETRIC SHAPE FUNCTIONS 49 4.4 NON-LOCAL AND PARAMETER-FREE MULTI-OBJECTIVE TOPOLOGY OPTIMIZATION 54 5. FUNDAMENTALS IN SKELETON GUIDED SHAPE PARAMETRIZATION IN TOPOLOGY OPTIMIZATION 58 5.1 SKELETONIZATION IN TOPOLOGY OPTIMIZATION 61 5.2 CROSS-SECTION RECOGNITION FOR IMAGES 66 5.3 SUBDIVISION SURFACES 67 5.4 CONVOLUTIONAL SURFACES WITH META BALL KERNEL 71 5.5 CONSTRUCTIVE SOLID GEOMETRY 73 6. CURVE SKELETON GUIDED BEAM PARAMETRIZATION OF TOPOLOGY OPTIMIZATION RESULTS 75 6.1 FUNDAMENTALS IN SKELETON SUPPORTED RECONSTRUCTION 76 6.2 SUBDIVISION SURFACE PARAMETRIZATION WITH PERIODIC B-SPLINE CROSS-SECTIONS 78 6.3 CURVE SKELETONIZATION TAILORED TO TOPOLOGY OPTIMIZATION WITH PRE-PROCESSING 82 6.4 SURFACE RECONSTRUCTION USING LOCAL STIFFNESS DISTRIBUTION 86 7. CROSS-SECTION SHAPE PARAMETRIZATION FOR PERIODIC B-SPLINES 96 7.1 PRELIMINARIES IN B-SPLINE CONTROL GRID ESTIMATION 97 7.2 CROSS-SECTION EXTRACTION OF 2D IMAGES 101 7.3 TENSOR SPLINE PARAMETRIZATION WITH MOMENTS OF AREA 105 7.4 B-SPLINE PARAMETRIZATION WITH MOMENTS OF AREA GUIDED CONVOLUTIONAL NEURAL NETWORK 110 8. FULLY AUTOMATED COMPLIANCE OPTIMIZATION AND CURVE-SKELETON PARAMETRIZATION FOR A CPU HEAT SINK WITH SIZE CONTROL FOR SLM 115 8.1 AUTOMATED 1D THERMAL COMPLIANCE MINIMIZATION, CONSTRAINED SURFACE RECONSTRUCTION AND ADDITIVE MANUFACTURING 118 8.2 AUTOMATED 2D THERMAL COMPLIANCE MINIMIZATION, CONSTRAINT SURFACE RECONSTRUCTION AND ADDITIVE MANUFACTURING 120 8.3 USING THE HEAT SINK PROTOTYPES COOLING A CPU 123 9. CONCLUSION 127 10. OUTLOOK 131 LITERATURE 133 APPENDIX 147 A PREVIOUS STUDIES 147 B CROSS-SECTION PROPERTIES 149 C CASE STUDIES FOR THE CROSS-SECTION PARAMETRIZATION 155 D EXPERIMENTAL SETUP 15

    Stratified Principal Component Analysis

    Full text link
    This paper investigates a general family of models that stratifies the space of covariance matrices by eigenvalue multiplicity. This family, coined Stratified Principal Component Analysis (SPCA), includes in particular Probabilistic PCA (PPCA) models, where the noise component is assumed to be isotropic. We provide an explicit maximum likelihood and a geometric characterization relying on flag manifolds. A key outcome of this analysis is that PPCA's parsimony (with respect to the full covariance model) is due to the eigenvalue-equality constraint in the noise space and the subsequent inference of a multidimensional eigenspace. The sequential nature of flag manifolds enables to extend this constraint to the signal space and bring more parsimonious models. Moreover, the stratification and the induced partial order on SPCA yield efficient model selection heuristics. Experiments on simulated and real datasets substantiate the interest of equalising adjacent sample eigenvalues when the gaps are small and the number of samples is limited. They notably demonstrate that SPCA models achieve a better complexity/goodness-of-fit tradeoff than PPCA

    A Survey of Zero-shot Generalisation in Deep Reinforcement Learning

    Get PDF
    The study of zero-shot generalisation (ZSG) in deep Reinforcement Learning (RL) aims to produce RL algorithms whose policies generalise well to novel unseen situations at deployment time, avoiding overfitting to their training environments. Tackling this is vital if we are to deploy reinforcement learning algorithms in real world scenarios, where the environment will be diverse, dynamic and unpredictable. This survey is an overview of this nascent field. We rely on a unifying formalism and terminology for discussing different ZSG problems, building upon previous works. We go on to categorise existing benchmarks for ZSG, as well as current methods for tackling these problems. Finally, we provide a critical discussion of the current state of the field, including recommendations for future work. Among other conclusions, we argue that taking a purely procedural content generation approach to benchmark design is not conducive to progress in ZSG, we suggest fast online adaptation and tackling RL-specific problems as some areas for future work on methods for ZSG, and we recommend building benchmarks in underexplored problem settings such as offline RL ZSG and reward-function variation

    A perceptual sound space for auditory displays based on sung-vowel synthesis

    Get PDF
    When designing displays for the human senses, perceptual spaces are of great importance to give intuitive access to physical attributes. Similar to how perceptual spaces based on hue, saturation, and lightness were constructed for visual color, research has explored perceptual spaces for sounds of a given timbral family based on timbre, brightness, and pitch. To promote an embodied approach to the design of auditory displays, we introduce the Vowel-Type-Pitch (VTP) space, a cylindrical sound space based on human sung vowels, whose timbres can be synthesized by the composition of acoustic formants and can be categorically labeled. Vowels are arranged along the circular dimension, while voice type and pitch of the vowel correspond to the remaining two axes of the cylindrical VTP space. The decoupling and perceptual effectiveness of the three dimensions of the VTP space are tested through a vowel labeling experiment, whose results are visualized as maps on circular slices of the VTP cylinder. We discuss implications for the design of auditory and multi-sensory displays that account for human perceptual capabilities
    • …
    corecore