
    Truly Sparse Neural Networks at Scale

    Recently, sparse training methods have started to become established as a de facto approach for training and inference efficiency in artificial neural networks. Yet, this efficiency exists only in theory: in practice, everyone uses a binary mask to simulate sparsity, since typical deep learning software and hardware are optimized for dense matrix operations. In this paper, we take an orthogonal approach and show that we can train truly sparse neural networks to harvest their full potential. To achieve this goal, we introduce three novel contributions, specially designed for sparse neural networks: (1) a parallel training algorithm and its corresponding sparse implementation from scratch, (2) an activation function with non-trainable parameters to favour the gradient flow, and (3) a hidden-neuron importance metric to eliminate redundancies. All in all, we are able to break the record and train the largest neural network ever trained in terms of representational power, reaching the size of a bat brain. The results show that our approach achieves state-of-the-art performance while opening the path towards an environmentally friendly artificial intelligence era. (Comment: 30 pages, 17 figures)
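    The contrast the abstract draws between simulated and true sparsity can be sketched as follows. This is an illustrative example only (the layer sizes, density, and use of SciPy's CSR format are my assumptions, not the paper's implementation): a masked dense layer still stores and multiplies the full weight matrix, whereas a truly sparse representation stores only the non-zero entries.

```python
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)
n_in, n_out, batch = 1000, 1000, 32
density = 0.01  # keep ~1% of the weights

# "Simulated" sparsity: a dense weight matrix multiplied by a binary mask.
# Memory and compute still scale with the full dense matrix.
dense_w = rng.standard_normal((n_in, n_out))
mask = rng.random((n_in, n_out)) < density
masked_w = dense_w * mask

# Truly sparse: only the non-zero entries are stored and multiplied.
sparse_w = sparse.csr_matrix(masked_w)

x = rng.standard_normal((batch, n_in))
out_masked = x @ masked_w                      # dense GEMM over mostly zeros
out_sparse = np.asarray((sparse_w.T @ x.T).T)  # sparse kernel touches ~1% of entries

assert np.allclose(out_masked, out_sparse)
print(f"stored weights: dense {masked_w.size}, sparse {sparse_w.nnz}")
```

    Both paths compute the same layer output, but the sparse path stores roughly `density * n_in * n_out` values instead of the full `n_in * n_out`.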

    Machine Learning Approaches for Energy Forecasting

    This thesis focuses on the computation of time series predictions of electricity load series and applies different machine learning methods to the task. In this context, established models were tested and a novel forecasting model utilising convolutional neural networks (CNNs) was developed. The progressing digitisation of the energy market increases the availability of consumption data; at the same time, the demand for accurate consumption forecasts is growing, as such forecasts are necessary for accurately planning machine schedules of complex energy systems. Within the scope of this thesis, several forecast models were evaluated. The data used were the aggregated electricity loads of 15, 40, and 350 residential households. The predictions were calculated for 36 hours into the future with a 30-minute sampling rate. Various linear and non-linear models were trained for that task. The naïve approach of assuming that the electricity consumption of the previous day or week resembles the future consumption can already be used as a rough estimate. Computing the load forecasts with a linear ridge regression based on manually extracted features results in more accurate forecasts. Using a neural network or a random forest model for computing a non-linear regression further increases the forecast accuracy. The most accurate forecasts could be computed using the newly developed CNN model. Another advantage of this model is that it works directly with the load data; hence, no time-consuming manual feature engineering is necessary.
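    The naïve persistence baseline described above can be sketched on synthetic data. All numbers here (the daily profile, the noise level, and 48 half-hourly samples per day) are illustrative assumptions, not the thesis data:

```python
import numpy as np

rng = np.random.default_rng(1)
samples_per_day = 48  # 30-minute sampling rate
days = 30
t = np.arange(days * samples_per_day)

# Synthetic half-hourly load with a daily cycle plus noise, standing in
# for the aggregated household consumption series used in the thesis.
load = (5.0 + 2.0 * np.sin(2 * np.pi * t / samples_per_day)
        + 0.3 * rng.standard_normal(t.size))

# Persistence baseline: tomorrow's load equals today's load.
forecast = load[:-samples_per_day]
actual = load[samples_per_day:]
mae = np.mean(np.abs(actual - forecast))
print(f"persistence MAE: {mae:.3f}")
```

    The more sophisticated models in the thesis (ridge regression on extracted features, random forests, the CNN) are judged by how much they improve on this kind of baseline error.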

    Machine Learning for Informed Representation Learning

    The way we view reality and reason about the processes surrounding us is intimately connected to our perception and the representations we form about our observations and experiences. The popularity of machine learning and deep learning techniques in that regard stems from their ability to form useful representations by learning from large sets of observations. Typical application examples include image recognition or language processing for which artificial neural networks are powerful tools to extract regularity patterns or relevant statistics. In this thesis, we leverage and further develop this representation learning capability to address relevant but challenging real-world problems in geoscience and chemistry, to learn representations in an informed manner relevant to the task at hand, and reason about representation learning in neural networks, in general. Firstly, we develop an approach for efficient and scalable semantic segmentation of degraded soil in alpine grasslands in remotely-sensed images based on convolutional neural networks. To this end, we consider different grassland erosion phenomena in several Swiss valleys. We find that we are able to monitor soil degradation consistent with state-of-the-art methods in geoscience and can improve detection of affected areas. Furthermore, our approach provides a scalable method for large-scale analysis which is infeasible with established methods. Secondly, we address the question of how to identify suitable latent representations to enable generation of novel objects with selected properties. For this, we introduce a new deep generative model in the context of manifold learning and disentanglement. Our model improves targeted generation of novel objects by making use of property cycle consistency in property-relevant and property-invariant latent subspaces. We demonstrate the improvements on the generation of molecules with desired physical or chemical properties. 
Furthermore, we show that our model facilitates interpretability and exploration of the latent representation. Thirdly, in the context of recent advances in deep learning theory and the neural tangent kernel, we empirically investigate the learning of feature representations in standard convolutional neural networks and in the corresponding random feature models given by the linearisation of these networks. We find that performance differences between standard and linearised networks generally increase with the difficulty of the task but decrease with the considered width or over-parametrisation of these networks. Our results indicate interesting implications for feature learning and random feature models, as well as for the generalisation performance of highly over-parametrised neural networks. In summary, we employ and study feature learning in neural networks and review how we may use informed representation learning for challenging tasks.

    Inducing sparsity in deep neural networks through unstructured pruning for lower computational footprint

    Deep learning has revolutionised the way we deal with media analytics, opening up and improving many fields such as machine translation, autonomous driver-assistance systems, smart cities and medical imaging, to cite only a few. But to handle complex decision making, neural networks are getting bigger and bigger, resulting in heavy compute loads. This has significant implications for universal accessibility of the technology, with high costs, the potential environmental impact of increasing energy consumption, and the inability to use the models on low-power devices. A simple way to cut down the size of a neural network is to remove parameters that are not useful to the model prediction. In unstructured pruning, the goal is to remove parameters (i.e. set them to 0) based on some importance heuristic while maintaining good prediction accuracy, resulting in a high-performing network with a smaller computational footprint. Many pruning methods seek to find the optimal capacity at which the network is the most compute-efficient while reaching better generalisation. Inducing sparsity (setting weights to zero) in a neural network greatly contributes to reducing over-parametrisation, lowering the cost of running inference and reducing complexity at training time. Moreover, it can help us better understand which parts of the network account the most for learning, so as to design more efficient architectures and training procedures. This thesis assesses the integrity of unstructured pruning criteria. After presenting a use-case application for the deployment of an AI application in a real-world setting, this thesis demonstrates that unstructured pruning criteria are ill-defined and not adapted to large-scale networks due to the over-parametrisation regime during training, resulting in sparse networks lacking regularisation.
Furthermore, beyond performance accuracy alone, the fairness of networks obtained with different unstructured pruning criteria is evaluated, highlighting the need to rethink how we design unstructured pruning.
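    As one concrete instance of the importance heuristics discussed above, global magnitude pruning removes the weights with the smallest absolute value. A minimal sketch (the function name and the 90% sparsity level are my choices for illustration; the thesis evaluates such criteria rather than prescribing this one):

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out the `sparsity` fraction of weights with the smallest magnitude."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    # The k-th smallest magnitude acts as the pruning threshold.
    threshold = np.partition(flat, k - 1)[k - 1]
    return weights * (np.abs(weights) > threshold)

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64))
pruned = magnitude_prune(w, 0.9)
print(f"fraction of zeroed weights: {np.mean(pruned == 0):.3f}")
```

    In practice such criteria are applied per layer or globally, often iteratively with retraining between pruning steps.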

    Large Scale Sparse Neural Networks


    Towards Deep Learning with Competing Generalisation Objectives

    The unreasonable effectiveness of Deep Learning continues to deliver unprecedented Artificial Intelligence capabilities to billions of people. Growing datasets and technological advances keep extending the reach of expressive model architectures trained through efficient optimisations. Thus, deep learning approaches continue to provide increasingly proficient subroutines for, among others, computer vision and natural interaction through speech and text. Due to their scalable learning and inference priors, higher performance is often gained cost-effectively through largely automatic training. As a result, new and improved capabilities empower more people while the costs of access drop. The arising opportunities and challenges have profoundly influenced research. Quality attributes of scalable software became central desiderata of deep learning paradigms, including reusability, efficiency, robustness and safety. Ongoing research into continual, meta- and robust learning aims to maximise such scalability metrics in addition to multiple generalisation criteria, despite possible conflicts. A significant challenge is to satisfy competing criteria automatically and cost-effectively. In this thesis, we introduce a unifying perspective on learning with competing generalisation objectives and make three additional contributions. When autonomous learning through multi-criteria optimisation is impractical, it is reasonable to ask whether knowledge of appropriate trade-offs could make it simultaneously effective and efficient. Informed by explicit trade-offs of interest to particular applications, we developed and evaluated bespoke model architecture priors. We introduced a novel architecture for sim-to-real transfer of robotic control policies by learning progressively to generalise anew. Competing desiderata of continual learning were balanced through disjoint capacity and hierarchical reuse of previously learnt representations. 
We then proposed a new state-of-the-art meta-learning approach, showing that meta-trained hypernetworks efficiently store and flexibly reuse knowledge for new generalisation criteria through few-shot gradient-based optimisation. Finally, we characterised empirical trade-offs between the many desiderata of adversarial robustness and demonstrated a novel defensive capability of implicit neural networks to hinder many attacks simultaneously.

    Machine learning for particle identification & deep generative models towards fast simulations for the Alice Transition Radiation Detector at CERN

    This Master's thesis outlines the application of machine learning techniques, predominantly deep learning techniques, to certain aspects of particle physics. Its two main aims, particle identification and high-energy-physics detector simulation, are pertinent to research avenues pursued by physicists working with the ALICE (A Large Ion Collider Experiment) Transition Radiation Detector (TRD), within the Large Hadron Collider (LHC) at CERN (the European Organization for Nuclear Research).

    Methods in machine learning for probabilistic modelling of environment, with applications in meteorology and geology

    Earth scientists increasingly deal with ‘big data’. Where once we may have struggled to obtain a handful of relevant measurements, we now often have data being collected from multiple sources: on the ground, in the air, and from space. These observations are accumulating at a rate that far outpaces our ability to make sense of them using traditional methods with limited scalability (e.g., mental modelling, or trial-and-error improvement of process-based models). The revolution in machine learning offers a new paradigm for modelling the environment: rather than focusing on tweaking every aspect of models developed from the top down, based largely on prior knowledge, we can now set up more abstract machine learning systems that ‘do the tweaking for us’, learning models from the bottom up that are optimal in terms of how well they agree with our (rapidly growing number of) observations of reality, while still being guided by our prior beliefs. In this thesis, with the help of spatial, temporal, and spatio-temporal examples in meteorology and geology, I present methods for probabilistic modelling of environmental variables using machine learning, and explore the considerations involved in developing and adopting these technologies, as well as the benefits they stand to bring, including improved knowledge acquisition and decision-making. In each application, the common theme is that we would like to learn predictive distributions for the variables of interest that are well-calibrated and as sharp as possible (i.e., that provide answers as precise as possible while remaining honest about their uncertainty).
Achieving this requires statistical approaches, but the volume and complexity of the available data mean that scalability is an important factor: we can only realise the value of the data if it can be successfully incorporated into our models. (Funded by the Engineering and Physical Sciences Research Council, EPSRC.)
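    The calibration criterion described above (predictive intervals that cover observations at their nominal rate) can be checked empirically. A minimal sketch with hypothetical Gaussian predictive distributions, not the models developed in the thesis:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical predictive distributions: one Gaussian per data point.
mu = rng.standard_normal(n)
sigma = np.full(n, 1.0)
# Observations drawn from the model itself, so calibration should hold.
y = mu + sigma * rng.standard_normal(n)

# A well-calibrated 90% central interval should cover ~90% of observations.
lo, hi = stats.norm.interval(0.90, loc=mu, scale=sigma)
coverage = np.mean((y >= lo) & (y <= hi))
print(f"empirical coverage of the 90% interval: {coverage:.3f}")
```

    Sharpness would additionally be measured by the average interval width; among equally calibrated models, the one with narrower intervals is preferred.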