
    Opportunities and risks of stochastic deep learning

    This thesis studies opportunities and risks associated with stochasticity in deep learning that specifically manifest in the context of adversarial robustness and neural architecture search (NAS). On the one hand, opportunities arise because stochastic methods have a strong impact on robustness and generalisation, both from a theoretical and an empirical standpoint. In addition, they provide a framework for navigating non-differentiable search spaces and for expressing data and model uncertainty. On the other hand, the trade-offs (i.e., risks) that are coupled with these benefits need to be carefully considered. The three novel contributions that comprise the main body of this thesis are, by these standards, instances of opportunities and risks. In the context of adversarial robustness, our first contribution proves that the impact of an adversarial input perturbation on the output of a stochastic neural network (SNN) is theoretically bounded. Specifically, we demonstrate that SNNs are maximally robust when they achieve weight-covariance alignment, i.e., when the weight vectors of their classifier layer are aligned with the eigenvectors of that layer's covariance matrix. Based on our theoretical insights, we develop a novel SNN architecture with excellent empirical adversarial robustness and show that our theoretical guarantees also hold experimentally. Furthermore, we discover that SNNs partially owe their robustness to having a noisy loss landscape: gradient-based adversaries find this landscape difficult to ascend during adversarial perturbation search, and therefore fail to create strong adversarial examples. Our second contribution shows that inducing a noisy loss landscape is not an effective defence mechanism, as it is easy to circumvent. To demonstrate this point, we develop a stochastic loss-smoothing extension to state-of-the-art gradient-based adversaries that allows them to attack successfully. Interestingly, our loss-smoothing extension can also (i) succeed against non-stochastic neural networks that defend by altering their loss landscape in other ways, and (ii) strengthen gradient-free adversaries. Our third and final contribution lies in the field of few-shot learning, where we develop a stochastic NAS method for adapting pre-trained neural networks to previously unseen classes by observing only a few training examples of each new class. We determine that adapting a pre-trained backbone is not as simple as adapting all of its parameters. In fact, adapting or fine-tuning the entire architecture is sub-optimal, as many layers already encode knowledge optimally. Our NAS algorithm searches for the optimal subset of pre-trained parameters to adapt or fine-tune, which yields a significant improvement over the existing paradigm for few-shot adaptation.
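To make the loss-smoothing idea concrete, here is a minimal PyTorch sketch of one ascent step on a Monte-Carlo-smoothed loss (hypothetical names and a simple additive-noise smoother; the thesis's actual extension to state-of-the-art adversaries will differ in detail). Averaging over repeated stochastic forward passes lets a gradient-based adversary climb the expected loss rather than one noisy realisation of it:

```python
import torch

def smoothed_ascent_step(model, x, y, loss_fn, step_size, n_samples=8, sigma=0.01):
    # Average input gradients over several noise draws (and, for an SNN,
    # over its internal stochasticity) to smooth the noisy loss landscape.
    grad = torch.zeros_like(x)
    for _ in range(n_samples):
        x_noisy = (x + sigma * torch.randn_like(x)).detach().requires_grad_(True)
        loss = loss_fn(model(x_noisy), y)
        grad += torch.autograd.grad(loss, x_noisy)[0]
    grad /= n_samples
    # One signed-gradient ascent step, as in PGD-style attacks.
    return (x + step_size * grad.sign()).detach()
```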

    On the Generation of Realistic and Robust Counterfactual Explanations for Algorithmic Recourse

    The recent widespread deployment of machine learning algorithms presents many new challenges. Machine learning algorithms are usually opaque and can be particularly difficult to interpret. When humans are involved, algorithmic and automated decisions can negatively impact people's lives, and end users would therefore like to be safeguarded against potential harm. One popular way to achieve this is to provide end users access to algorithmic recourse, which gives those negatively affected by algorithmic decisions the opportunity to reverse unfavorable decisions, e.g., from a loan denial to a loan acceptance. In this thesis, we design recourse algorithms to meet various end-user needs. First, we propose methods for the generation of realistic recourses. We use generative models to suggest recourses that are likely to occur under the data distribution. To this end, we shift the recourse action from the input space to the generative model's latent space, allowing us to generate counterfactuals that lie in regions with data support. Second, we observe that small changes to the recourses prescribed to end users are likely to invalidate them once they are noisily implemented in practice. Motivated by this observation, we design methods for the generation of robust recourses and for assessing the robustness of recourse algorithms to data deletion requests. Third, the lack of a commonly used code base for counterfactual explanation and algorithmic recourse algorithms, together with the vast array of evaluation measures in the literature, makes it difficult to compare the performance of different algorithms. To solve this problem, we provide an open-source benchmarking library that streamlines the evaluation process and can be used for benchmarking, rapidly developing new methods, and setting up new experiments. In summary, our work contributes to a more reliable interaction between end users and machine learning models by covering fundamental aspects of the recourse process, and suggests new solutions towards generating realistic and robust counterfactual explanations for algorithmic recourse.
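As an illustration of moving the recourse action into a generative model's latent space, the following is a minimal sketch under assumed interfaces (encoder, decoder, and clf are hypothetical, with clf returning the probability of the favourable outcome; this is not the thesis's exact algorithm). Optimising the latent code keeps the decoded counterfactual in regions with data support while a proximity penalty keeps it close to the original input:

```python
import torch

def latent_recourse(encoder, decoder, clf, x, steps=200, lr=0.05, lam=0.1):
    # Search in latent space so the decoded counterfactual stays on the
    # data manifold while flipping the classifier's decision.
    z = encoder(x).detach().clone().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    target = torch.ones_like(clf(x))  # drive P(favourable outcome) -> 1
    for _ in range(steps):
        x_cf = decoder(z)
        p = clf(x_cf)  # assumed to output P(favourable outcome) in [0, 1]
        loss = torch.nn.functional.binary_cross_entropy(p, target) \
               + lam * (x_cf - x).norm()  # proximity to the original input
        opt.zero_grad()
        loss.backward()
        opt.step()
    return decoder(z).detach()
```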

    Tools for efficient Deep Learning

    In the era of Deep Learning (DL), there is a fast-growing demand for building and deploying Deep Neural Networks (DNNs) on various platforms. This thesis proposes five tools that address the challenges of designing DNNs that are efficient in time, resources, and power consumption. We first present Aegis and SPGC to address the challenges of improving the memory efficiency of DL training and inference. Aegis makes mixed precision training (MPT) more stable via layer-wise gradient scaling; empirical experiments show that Aegis can improve MPT accuracy by up to 4%. SPGC focuses on structured pruning: replacing standard convolution with group convolution (GConv) to avoid irregular sparsity. SPGC formulates GConv pruning as a channel permutation problem and proposes a novel heuristic polynomial-time algorithm. Common DNNs pruned by SPGC achieve up to 1% higher accuracy than prior work. This thesis also addresses the gap between DNN descriptions and executables, with Polygeist for software and POLSCA for hardware. Many novel techniques, e.g. statement splitting and memory partitioning, are explored and used to extend polyhedral optimisation. Polygeist speeds up sequential and parallel software execution by 2.53x and 9.47x respectively on Polybench/C, while POLSCA achieves a 1.5x speedup over hardware designs generated directly from high-level synthesis on Polybench/C. Moreover, this thesis presents Deacon, a framework that generates FPGA-based DNN accelerators with streaming architectures and advanced pipelining techniques to address the challenges posed by heterogeneous convolutions and residual connections. Deacon provides fine-grained pipelining, graph-level optimisation, and heuristic exploration by graph colouring. Compared with prior designs, Deacon improves resource/power consumption efficiency by 1.2x/3.5x for MobileNets and 1.0x/2.8x for SqueezeNets. All these tools are open source, and some have already gained public engagement. We believe they can make efficient deep learning applications easier to build and deploy.
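The idea behind SPGC's channel-permutation view of GConv pruning can be illustrated with a toy greedy heuristic (purely illustrative; SPGC's actual polynomial-time algorithm differs). Since group convolution keeps only block-diagonal connections between channel groups, the goal is to assign channels to groups so that as much weight importance as possible survives inside the blocks:

```python
import numpy as np

def greedy_group_assignment(importance, n_groups):
    # importance[i, j]: saliency of the weight from input channel j to
    # output channel i (n_out and n_in assumed divisible by n_groups).
    n_out, n_in = importance.shape
    out_per, in_per = n_out // n_groups, n_in // n_groups
    out_left, in_left = set(range(n_out)), set(range(n_in))
    groups = []
    for _ in range(n_groups):
        # Seed the group with the heaviest remaining single connection.
        _, i0, j0 = max((importance[i, j], i, j) for i in out_left for j in in_left)
        outs, ins = [i0], [j0]
        out_left.discard(i0)
        in_left.discard(j0)
        while len(outs) < out_per:  # grow with the most compatible channels
            i = max(out_left, key=lambda i: importance[i, ins].sum())
            outs.append(i)
            out_left.discard(i)
        while len(ins) < in_per:
            j = max(in_left, key=lambda j: importance[outs, j].sum())
            ins.append(j)
            in_left.discard(j)
        groups.append((sorted(outs), sorted(ins)))
    return groups  # read off a channel permutation, one block per group
```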

    Novel neural architectures & algorithms for efficient inference

    In the last decade, the machine learning universe embraced deep neural networks (DNNs) wholeheartedly with the advent of neural architectures such as recurrent neural networks (RNNs), convolutional neural networks (CNNs), and transformers. These models have empowered many applications, such as ChatGPT and Imagen, and have achieved state-of-the-art (SOTA) performance on many vision, speech, and language modeling tasks. However, SOTA performance comes with various issues, such as large model size, compute-intensive training, increased inference latency, and higher working memory. This thesis aims at improving the resource efficiency of neural architectures, i.e., significantly reducing the computational, storage, and energy consumption of a DNN without any significant loss in performance. Towards this goal, we explore novel neural architectures as well as training algorithms that allow low-capacity models to achieve near-SOTA performance. We divide this thesis into two dimensions: Efficient Low-Complexity Models, and Input-Hardness-Adaptive Models. Along the first dimension, Efficient Low-Complexity Models, we improve DNN performance by addressing instabilities in existing architectures and training methods. We propose novel neural architectures inspired by ordinary differential equations (ODEs) to reinforce input signals and attend to salient feature regions. In addition, we show that carefully designed training schemes improve the performance of existing neural networks. We divide this exploration into two parts. (a) Efficient Low-Complexity RNNs: we improve RNN resource efficiency by addressing poor gradients, noise amplification, and issues with backpropagation-through-time (BPTT) training. First, we improve RNNs by solving ODEs that eliminate vanishing and exploding gradients during training. To do so, we present Incremental Recurrent Neural Networks (iRNNs) that keep track of increments in the equilibrium surface. Next, we propose Time Adaptive RNNs that mitigate the noise propagation issue in RNNs by modulating the time constants in the ODE-based transition function; a generic ODE-style recurrent update is sketched below. We empirically demonstrate the superiority of ODE-based neural architectures over existing RNNs. Finally, we propose the Forward Propagation Through Time (FPTT) algorithm for training RNNs and show that FPTT yields significant gains compared to the more conventional BPTT scheme. (b) Efficient Low-Complexity CNNs: we improve CNN architectures by reducing their resource usage. CNNs require greater depth to generate high-level features, resulting in computationally expensive models. We design a novel residual block, the Global layer, that constrains the input and output features by approximately solving partial differential equations (PDEs). It yields better receptive fields than traditional convolutional blocks and thus results in shallower networks. Further, we reduce the model footprint by enforcing a novel inductive bias that formulates the output of a residual block as a spatial interpolation between high-compute anchor pixels and low-compute cheaper pixels. This results in spatially interpolated convolutional blocks (SI-CNNs) that have better compute/performance trade-offs. Finally, we propose an algorithm that enforces various distributional constraints during training in order to achieve better generalization; we refer to this scheme as distributionally constrained learning (DCL).
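As a hedged illustration of the ODE-based recurrent updates discussed above, here is a generic continuous-time RNN cell in PyTorch (a textbook form, not the exact iRNN or Time Adaptive RNN formulation):

```python
import torch
import torch.nn as nn

class ODERecurrentCell(nn.Module):
    # Hidden state follows dh/dt = -h + tanh(W h + U x); one explicit
    # Euler step with step size dt gives a residual-style update that
    # relaxes the state towards an input-dependent equilibrium.
    def __init__(self, input_dim, hidden_dim, dt=0.5):
        super().__init__()
        self.W = nn.Linear(hidden_dim, hidden_dim)
        self.U = nn.Linear(input_dim, hidden_dim, bias=False)
        self.dt = dt

    def forward(self, x_t, h):
        dh = -h + torch.tanh(self.W(h) + self.U(x_t))  # ODE right-hand side
        return h + self.dt * dh  # explicit Euler discretisation
```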
In the second dimension, Input-Hardness-Adaptive Models, we introduce the notion of the hardness of an input relative to an architecture. In the first dimension, a neural network allocates the same resources, such as compute, storage, and working memory, to all inputs, which inherently assumes that all examples are equally hard for a model. In this dimension, we challenge this assumption, reasoning that some inputs are relatively easy for a network to predict compared to others. Input hardness enables us to create selective classifiers wherein a low-capacity network handles simple inputs while abstaining from predictions on complex inputs. Next, we create hybrid models that route the hard inputs from the low-capacity abstaining network to a high-capacity expert model, and we design various architectures that adhere to this hybrid inference style (sketched below). Further, input hardness enables us to selectively distill the knowledge of a high-capacity model into a low-capacity model by cleverly discarding hard inputs during the distillation procedure. Finally, we conclude this thesis by sketching out various interesting future research directions that emerge as extensions of the different ideas explored in this work.
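A minimal sketch of this hybrid inference style follows (hypothetical models; max-softmax confidence is just one possible abstention rule):

```python
import torch

@torch.no_grad()
def hybrid_predict(small_model, large_model, x, threshold=0.9):
    # The cheap model predicts everywhere but abstains when its max
    # softmax confidence is low; "hard" inputs go to the expert model.
    probs = torch.softmax(small_model(x), dim=-1)
    conf, preds = probs.max(dim=-1)
    hard = conf < threshold
    if hard.any():
        preds[hard] = large_model(x[hard]).argmax(dim=-1)
    return preds, hard  # predictions plus the routing mask
```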

    Insights on Learning Tractable Probabilistic Graphical Models


    On discretisation drift and smoothness regularisation in neural network training

    The deep learning recipe of casting real-world problems as mathematical optimisation and tackling that optimisation by training deep neural networks with gradient-based methods has undoubtedly proven to be a fruitful one. The understanding of why deep learning works, however, has lagged behind its practical significance. We aim to take steps towards an improved understanding of deep learning, with a focus on optimisation and model regularisation. We start by investigating gradient descent (GD), the discrete-time algorithm at the basis of most popular deep learning optimisers. Understanding the dynamics of GD has been hindered by the presence of discretisation drift, the numerical integration error between GD and its often-studied continuous-time counterpart, the negative gradient flow (NGF). To add to the toolkit available for studying GD, we derive novel continuous-time flows that account for discretisation drift. Unlike the NGF, these new flows can describe learning-rate-specific behaviours of GD, such as the training instabilities observed in supervised learning and two-player games. We then translate insights from continuous time into mitigation strategies for unstable GD dynamics by constructing novel learning rate schedules and regularisers that require no additional hyperparameters. Like optimisation, smoothness regularisation is another pillar of deep learning's success, with wide use in supervised learning and generative modelling. Despite their individual significance, the interactions between smoothness regularisation and optimisation have yet to be explored. We find that smoothness regularisation affects optimisation across multiple deep learning domains, and that incorporating smoothness regularisation in reinforcement learning leads to a performance boost that can be recovered using adaptations to optimisation methods.
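For intuition, one known drift-corrected flow of this kind, obtained from backward error analysis of GD with learning rate h (an illustrative example from the literature, not necessarily identical to the flows derived in this thesis), is the first-order modified flow

\dot{\theta} = -\nabla E(\theta) - \frac{h}{4}\,\nabla \left\lVert \nabla E(\theta) \right\rVert^{2},

whose second term captures the leading-order discretisation drift of GD relative to the NGF \dot{\theta} = -\nabla E(\theta) and acts as an implicit penalty on the gradient norm.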

    LIPIcs, Volume 261, ICALP 2023, Complete Volume


    Training Feedforward Neural Networks with Bayesian Hyper-Heuristics

    Dissertation (MSc (Computer Science)), University of Pretoria, 2023. Many different heuristics have been developed and used to train feedforward neural networks (FFNNs). However, selecting the best heuristic to train FFNNs is a time-consuming and non-trivial exercise; careful, systematic selection is required to ensure that the best heuristic is used. In the past, selection was done by trial and error. A modern approach is to automate the heuristic selection process, and since a single approach is often insufficient, research has proposed the hybridisation of heuristics. One such approach is referred to as hyper-heuristics (HHs). HHs focus on dynamically finding the best heuristic, or combination of heuristics, in heuristic space by making use of heuristic performance information. One such implementation of an HH is a population-based approach that guides the search by dynamically selecting heuristics from a heuristic pool and applying them to different entities, which represent candidate solutions in the problem space and work together to find good solutions. This dissertation introduces a novel population-based Bayesian hyper-heuristic (BHH). An empirical study is conducted in which the BHH is used to train FFNNs, including an in-depth behaviour analysis and a comparison of the BHH's performance with that of ten popular low-level heuristics, each with different search behaviour. The chosen heuristic pool consists of classic gradient-based heuristics as well as meta-heuristics. The empirical process is executed on fourteen datasets comprising classification and regression problems with varying characteristics. The results are analysed for statistical significance, and the BHH is shown to train FFNNs well and to provide an automated method for finding the best heuristic at various stages of the training process.
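To convey the flavour of probabilistic heuristic selection, here is a toy Thompson-sampling selector over a heuristic pool (an illustrative stand-in with hypothetical heuristic names, not the dissertation's BHH):

```python
import random

class HeuristicSelector:
    # Toy probability-matching selector: each heuristic keeps a
    # Beta(successes + 1, failures + 1) belief over its chance of
    # improving an entity; Thompson sampling picks what to apply next.
    def __init__(self, heuristics):
        self.stats = {h: [1, 1] for h in heuristics}

    def select(self):
        return max(self.stats, key=lambda h: random.betavariate(*self.stats[h]))

    def update(self, heuristic, improved):
        self.stats[heuristic][0 if improved else 1] += 1

# Per training epoch: h = selector.select(); apply h to an entity; then
# selector.update(h, improved=entity_loss_decreased)
selector = HeuristicSelector(["sgd", "adam", "pso", "de"])
```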

    Predictive Techniques for Scene Understanding by using Deep Learning in Autonomous Driving

    Autonomous driving is considered one of the greatest technological challenges of our time. When autonomous cars conquer our roads, accidents will be drastically reduced, almost to the point of disappearing, since the technology will be thoroughly tested and will not break the driving rules, among other social and economic benefits. One of the most critical aspects of developing an autonomous vehicle is perceiving and understanding the surrounding scene. This task must be as accurate and efficient as possible in order to subsequently predict how the scene will evolve and support decision making. In this way, the actions taken by the vehicle will guarantee both the safety of the vehicle itself and its occupants, and that of the surrounding obstacles, such as pedestrians, other vehicles, or road infrastructure. In this regard, this doctoral thesis focuses on the study and development of predictive techniques for scene understanding in the context of autonomous driving. Throughout the thesis, deep learning techniques are progressively incorporated into the proposed algorithms to improve the reasoning about what is happening in the traffic scenario, as well as to model the complex interactions between the social information (the different participants or agents in the scene, such as vehicles, cyclists, or pedestrians) and the physical information (i.e., the geometric, semantic, and topological information of the high-definition map) present in the scene. The perception layer of an autonomous vehicle is modularly divided into three stages: detection, tracking, and prediction. To begin the study of the tracking and prediction stages, a Multi-Object Tracking algorithm based on classical motion-estimation and association techniques is proposed and validated on the KITTI dataset, achieving state-of-the-art metrics. In addition, a smart filter based on contextual map information is proposed, whose goal is to monitor the most relevant agents of the scene over time; these filtered agents form the preliminary input for computing unimodal predictions based on a kinematic model. To validate this smart-filter proposal, CARLA (CAR Learning to Act), one of the most promising hyper-realistic simulators for autonomous driving currently available, is used, showing how contextual map information can notably reduce the inference time of tracking and prediction algorithms based on physical methods by attending only to the truly relevant agents of the traffic scenario. After observing the limitations of a kinematics-based model for the long-term prediction of an agent, the remaining algorithms of the thesis focus on the prediction module, using the Argoverse 1 and Argoverse 2 datasets, where the agents provided in each traffic scenario are assumed to have already been tracked for a certain number of observations.
First, a model based on recurrent neural networks (in particular LSTM, Long Short-Term Memory networks) and an attention mechanism is introduced to encode the agents' past trajectories, together with a simplified map representation in the form of potential goal positions on the road, in order to compute unimodal future trajectories, all wrapped in a GAN (Generative Adversarial Network) framework and obtaining metrics similar to the state of the art in the unimodal case. Once this model is validated on Argoverse 1, several accurate and efficient baseline models are proposed on top of it (social-only; map-aware; and a final improvement based on a Transformer encoder, 1D convolutional networks, and cross-attention for feature fusion), introducing two new concepts. On the one hand, graph neural networks (in particular GCN, Graph Convolutional Networks) are used to powerfully encode agent interactions. On the other hand, preliminary trajectories are preprocessed from the map with a heuristic method. Thanks to these inputs and a more powerful encoding architecture, the baseline models are able to predict several multimodal future trajectories, i.e., covering different possible futures for the agent of interest. The proposed baseline models achieve state-of-the-art regression metrics in both the multimodal and unimodal cases while maintaining a clearly favourable efficiency trade-off with respect to other proposals. The final model of the thesis, inspired by the previous models and validated on the most recent dataset for prediction algorithms in autonomous driving (Argoverse 2), introduces several improvements to better understand the traffic scenario and to decode the information accurately and efficiently. It incorporates topological and semantic information of the preliminary future lanes obtained with the aforementioned heuristic method, deep-learning-based map encoding with GCN networks, a fusion loop for physical and social features, estimation of goal positions on the road with deep-learning-based aggregation of their surroundings, and finally a refinement module to improve the quality of the final multimodal predictions in an elegant and efficient way. Compared with the state of the art, our method achieves prediction metrics on par with the top-ranked methods on the Argoverse 2 Leaderboard while notably reducing the number of parameters and floating-point operations per second. Finally, the final model of the thesis has been validated in simulation in different autonomous driving applications. First, the model is integrated to provide predictions to a reinforcement-learning-based decision-making algorithm in the SMARTS simulator (Scalable Multi-Agent Reinforcement Learning Training School), with the studies showing how the vehicle makes better decisions when it knows the future behaviour of the scene rather than only its current or past state. Second, a successful domain adaptation study has been carried out in the hyper-realistic CARLA simulator in several challenging scenarios where scene understanding and prediction of the environment are essential, such as a highway or roundabout with dense traffic, or the sudden appearance of a vulnerable road user.
In this regard, the prediction model has been integrated with the remaining layers of the autonomous navigation architecture of the research group in which this thesis was developed, as a preliminary step towards its deployment in a real autonomous vehicle.
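As a structural illustration of the kind of prediction model discussed above, a minimal multimodal trajectory predictor might look like the following PyTorch sketch (hypothetical layer sizes; a full model adds map encoding with GCNs, feature fusion, and a refinement module):

```python
import torch
import torch.nn as nn

class MultimodalPredictor(nn.Module):
    # An LSTM encodes each agent's observed track; linear heads then emit
    # K candidate future trajectories and a confidence per mode.
    def __init__(self, obs_dim=2, hidden=64, horizon=30, k_modes=6):
        super().__init__()
        self.encoder = nn.LSTM(obs_dim, hidden, batch_first=True)
        self.traj_head = nn.Linear(hidden, k_modes * horizon * 2)
        self.conf_head = nn.Linear(hidden, k_modes)
        self.horizon, self.k = horizon, k_modes

    def forward(self, past):  # past: (B, T_obs, 2) observed xy positions
        _, (h, _) = self.encoder(past)
        h = h[-1]  # final hidden state per agent, shape (B, hidden)
        trajs = self.traj_head(h).view(-1, self.k, self.horizon, 2)
        conf = torch.softmax(self.conf_head(h), dim=-1)
        return trajs, conf  # K future trajectories and their confidences
```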