
    Backpropagation Beyond the Gradient

    Automatic differentiation is a key enabler of deep learning: previously, practitioners were limited to models for which they could manually compute derivatives. Now, they can create sophisticated models with almost no restrictions and train them using first-order, i.e. gradient, information. Popular libraries like PyTorch and TensorFlow compute this gradient efficiently, automatically, and conveniently with a single line of code. Under the hood, reverse-mode automatic differentiation, or gradient backpropagation, powers the gradient computation in these libraries, and their entire design centers on one specific task: computing the average gradient in a mini-batch. This specialization often complicates the extraction of other information, such as higher-order statistical moments of the gradient or higher-order derivatives like the Hessian, and limits practitioners and researchers to methods that rely on the gradient. Arguably, this hampers the field from exploring the potential of higher-order information, and there is evidence that focusing solely on the gradient has not led to significant recent advances in deep learning optimization. To advance algorithmic research and inspire novel ideas, information beyond the batch-averaged gradient must be made available at the same level of computational efficiency, automation, and convenience. This thesis presents approaches that simplify experimentation with rich information beyond the gradient by making it more readily accessible. We implement these ideas as an extension to the backpropagation procedure in PyTorch. Using this newly accessible information, we demonstrate possible use cases by (i) showing how it can inform our understanding of neural network training by building a diagnostic tool, and (ii) enabling novel methods to efficiently compute and approximate curvature information.
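The kind of per-sample statistics the abstract refers to can be illustrated with a minimal numpy sketch (this is not the BackPACK API; the linear model and mini-batch are made up for illustration). A standard backward pass only yields the batch-averaged gradient, but the individual gradients, and hence their variance, carry extra information:

```python
import numpy as np

# Toy linear model: loss_i = 0.5 * (w . x_i - y_i)^2, so the per-sample
# gradient is g_i = (w . x_i - y_i) * x_i.  (Hypothetical data for illustration.)
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))       # mini-batch of 8 samples, 3 features
y = rng.normal(size=8)
w = np.zeros(3)

residual = X @ w - y                       # shape (8,)
per_sample_grads = residual[:, None] * X   # shape (8, 3): one gradient per sample

grad_mean = per_sample_grads.mean(axis=0)             # what a backward pass averages to
grad_second_moment = (per_sample_grads ** 2).mean(axis=0)
grad_variance = grad_second_moment - grad_mean ** 2   # information beyond the mean

# Sanity check: averaging individual gradients recovers the batch gradient
batch_grad = X.T @ residual / len(y)
assert np.allclose(grad_mean, batch_grad)
```

BackPACK exposes quantities of exactly this kind (per-sample gradients, their variance and second moment) during a single backward pass, without materializing them by hand as above.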
First, we extend gradient backpropagation for sequential feedforward models to Hessian backpropagation, which enables computing approximate per-layer curvature. This perspective unifies recently proposed block-diagonal curvature approximations. Like gradient backpropagation, the computation of these second-order derivatives is modular, and therefore simple to automate and extend to new operations. Based on the insight that rich information beyond the gradient can be computed efficiently alongside the gradient itself, we extend the backpropagation in PyTorch with the BackPACK library. It provides efficient and convenient access to statistical moments of the gradient and approximate curvature information, often at a small overhead compared to computing just the gradient. Next, we showcase the utility of such information for better understanding neural network training. We build the Cockpit library, which visualizes what is happening inside the model during training through various instruments that rely on BackPACK’s statistics. We show how Cockpit provides a meaningful statistical summary report that helps the deep learning engineer identify bugs in their machine learning pipeline, guide hyperparameter tuning, and study deep learning phenomena. Finally, we use BackPACK’s extended automatic differentiation functionality to develop ViViT, an approach to efficiently compute curvature information, in particular curvature noise. It uses the low-rank structure of the generalized Gauss-Newton approximation to the Hessian and addresses shortcomings of existing curvature approximations. By monitoring curvature noise, we demonstrate how ViViT’s information helps in understanding the challenges of making second-order optimization methods work in practice. This work develops new tools to experiment more easily with higher-order information in complex deep learning models.
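The low-rank structure mentioned above can be seen in a toy numpy example (a hypothetical linear model with MSE loss, not ViViT's implementation): for N samples and C outputs, the generalized Gauss-Newton matrix has rank at most N*C, no matter how many parameters the model has:

```python
import numpy as np

rng = np.random.default_rng(1)
N, D, C = 4, 10, 2            # 4 samples, 10 inputs, 2 outputs (made-up sizes)
X = rng.normal(size=(N, D))

# Hypothetical model f(x) = W x with P = C*D parameters and MSE loss, whose
# Hessian w.r.t. the model output is the identity.  The generalized
# Gauss-Newton matrix is then G = sum_n J_n^T J_n, where J_n is the
# (C x P) Jacobian of f at sample n.
P = C * D
G = np.zeros((P, P))
for n in range(N):
    J = np.kron(np.eye(C), X[n][None, :])   # Jacobian w.r.t. the rows of W
    G += J.T @ J

# Low-rank structure: although G is P x P, its rank is at most N * C
eigvals = np.linalg.eigvalsh(G)
rank = int(np.sum(eigvals > 1e-10))
assert rank <= N * C
```

This is the structure ViViT exploits: rather than forming G explicitly, one can work with the N*C outer-product factors, which makes eigenvalues and per-sample (noise) contributions cheap to access.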
These tools have impacted works on Bayesian applications with Laplace approximations, out-of-distribution generalization, differential privacy, and the design of automatic differentiation systems. They constitute one important step towards developing and establishing more efficient deep learning algorithms.

    Conversations on Empathy

    In the aftermath of a global pandemic, amidst new and ongoing wars, genocide, inequality, and staggering ecological collapse, some in the public and political arena have argued that we are in desperate need of greater empathy — be this with our neighbours, refugees, war victims, the vulnerable, or disappearing animal and plant species. This interdisciplinary volume asks the crucial questions: How does a better understanding of empathy contribute, if at all, to our understanding of others? How is it implicated in the ways we perceive, understand and constitute others as subjects? Conversations on Empathy examines how empathy might be enacted and experienced either as a way to highlight forms of otherness or, instead, to overcome what might otherwise appear to be irreducible differences. It explores the ways in which empathy enables us to understand, imagine and create sameness and otherness in our everyday intersubjective encounters, focusing on a varied range of "radical others" – others who are perceived as being dramatically different from oneself. With a focus on the importance of empathy for understanding difference, the book contends that the role of empathy is critical, now more than ever, for thinking about local and global challenges of interconnectedness, care and justice.

    LIPIcs, Volume 251, ITCS 2023, Complete Volume

    LIPIcs, Volume 251, ITCS 2023, Complete Volume

    Optimal Sketching Bounds for Sparse Linear Regression

    We study oblivious sketching for $k$-sparse linear regression under various loss functions, such as an $\ell_p$ norm or a broad class of hinge-like loss functions, which includes the logistic and ReLU losses. We show that for sparse $\ell_2$ norm regression, there is a distribution over oblivious sketches with $\Theta(k\log(d/k)/\varepsilon^2)$ rows, which is tight up to a constant factor. This extends to $\ell_p$ loss with an additional additive $O(k\log(k/\varepsilon)/\varepsilon^2)$ term in the upper bound. This establishes a surprising separation from the related sparse recovery problem, which is an important special case of sparse regression. For this problem, under the $\ell_2$ norm, we observe an upper bound of $O(k\log(d)/\varepsilon + k\log(k/\varepsilon)/\varepsilon^2)$ rows, showing that sparse recovery is strictly easier to sketch than sparse regression. For sparse regression under hinge-like loss functions, including sparse logistic and sparse ReLU regression, we give the first known sketching bounds that achieve $o(d)$ rows, showing that $O(\mu^2 k\log(\mu n d/\varepsilon)/\varepsilon^2)$ rows suffice, where $\mu$ is a natural complexity parameter needed to obtain relative error bounds for these loss functions. We again show that this dimension is tight, up to lower order terms and the dependence on $\mu$. Finally, we show that similar sketching bounds can be achieved for LASSO regression, a popular convex relaxation of sparse regression, where one aims to minimize $\|Ax-b\|_2^2+\lambda\|x\|_1$ over $x\in\mathbb{R}^d$. We show that sketching dimension $O(\log(d)/(\lambda\varepsilon)^2)$ suffices and that the dependence on $d$ and $\lambda$ is tight. Comment: AISTATS 2023
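The core idea of oblivious sketching for regression can be illustrated with a plain dense Gaussian sketch on synthetic data (a generic construction, not the sparsity-aware sketches analyzed in the paper): the sketch is drawn independently of the data, yet the sketched solve is near-optimal with far fewer rows:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 2000, 5, 200        # tall regression problem, sketched down to m rows
A = rng.normal(size=(n, d))
b = A @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

# Oblivious sketch: S is drawn without looking at (A, b)
S = rng.normal(size=(m, n)) / np.sqrt(m)

x_full = np.linalg.lstsq(A, b, rcond=None)[0]
x_sketch = np.linalg.lstsq(S @ A, S @ b, rcond=None)[0]

cost = lambda x: np.linalg.norm(A @ x - b) ** 2
rel_err = cost(x_sketch) / cost(x_full) - 1.0   # relative suboptimality
assert rel_err < 0.5   # near-optimal despite a 10x row reduction
```

The paper's contribution is pinning down how small m can be as a function of the sparsity k, the dimension d, and the accuracy parameter, for losses well beyond this least-squares setting.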

    Quantum simulation of battery materials using ionic pseudopotentials

    Ionic pseudopotentials are widely used in classical simulations of materials to model the effective potential due to the nucleus and the core electrons. Modeling fewer electrons explicitly results in a reduction in the number of plane waves needed to accurately represent the states of a system. In this work, we introduce a quantum algorithm that uses pseudopotentials to reduce the cost of simulating periodic materials on a quantum computer. We use a qubitization-based quantum phase estimation algorithm that employs a first-quantization representation of the Hamiltonian in a plane-wave basis. We address the challenge of incorporating the complexity of pseudopotentials into quantum simulations by developing highly optimized compilation strategies for the qubitization of the Hamiltonian. This includes a linear combination of unitaries decomposition that leverages the form of separable pseudopotentials. Our strategies make use of quantum read-only memory subroutines as a more efficient alternative to quantum arithmetic. We estimate the computational cost of applying our algorithm to simulating lithium-excess cathode materials for batteries, where more accurate simulations are needed to inform strategies for gaining reversible access to the excess capacity they offer. We estimate the number of qubits and Toffoli gates required to perform sufficiently accurate simulations with our algorithm for three materials: lithium manganese oxide, lithium nickel-manganese oxide, and lithium manganese oxyfluoride. Our optimized compilation strategies result in a pseudopotential-based quantum algorithm with a total runtime four orders of magnitude lower than the previous state of the art for a fixed target accuracy.
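The linear-combination-of-unitaries idea behind qubitization can be shown in miniature with numpy (a hypothetical single-qubit Hamiltonian with Pauli terms, nothing like the plane-wave pseudopotential Hamiltonians in the paper): the 1-norm of the coefficients sets the normalization of the block encoding, and hence the algorithm's cost:

```python
import numpy as np

# Toy LCU decomposition: H = sum_j a_j U_j with unitary (Pauli) terms.
# The 1-norm lambda = sum_j |a_j| governs the cost of qubitization.
I = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.diag([1.0, -1.0])

coeffs = {"I": 0.5, "X": 0.25, "Z": -0.75}   # made-up coefficients
paulis = {"I": I, "X": X, "Z": Z}

H = sum(a * paulis[p] for p, a in coeffs.items())
lam = sum(abs(a) for a in coeffs.values())

# H / lambda is sub-normalized, so it can sit as a block inside a unitary
eigs = np.linalg.eigvalsh(H / lam)
assert np.all(np.abs(eigs) <= 1 + 1e-12)
```

Reducing the number of terms and the 1-norm of such a decomposition, which is what better compilation strategies for pseudopotential terms achieve at scale, directly reduces the runtime of the phase estimation.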

    Contactless excitation for electric machines: high temperature superconducting flux pumps

    With the intensification of global warming and climate change, the transformation to a neutral-emission society is accelerating. In various sectors, electrification has become the clear trend driving this movement, and electric machines play an important role in the current power generation system. It is widely believed that electric machines with very high power density are essential for future applications, which, however, can hardly be achieved with conventional technologies. Owing to the maturation of second generation (2G) high temperature superconducting (HTS) technologies, it has been recognized that superconducting machines could be competitive candidates to realize this vision. One significant obstacle hindering the implementation of superconducting machines is how to provide the required magnetic fields, or in other words, how to energise them appropriately. Conventional direct injection is not suitable for HTS machines, because the current leads would bridge the ambient temperature and the cryogenic environment, which imposes a considerable heat load on the system and increases the operational cost. Thus, HTS machines demand an efficient energisation method. As an emerging technology that can accumulate substantial flux in a closed loop without any physical contact, HTS flux pumps have been proposed as a promising solution. Among existing HTS flux pumps, rotary HTS flux pumps, also known as HTS dynamos, can output a non-zero time-averaged DC voltage and charge the rest of the circuit once a closed loop has been formed. This type of flux pump is often employed together with HTS coils, which can then potentially work in the persistent current mode and act like electromagnets with a considerable magnetic field, giving them a wide range of industrial applications.
The output characteristics of rotary HTS flux pumps have been extensively explored through experiments and finite element method (FEM) simulations, yet statistical models as an alternative approach to capturing their key characteristics have not been studied. In this thesis, a 2D FEM program has been developed to model the operation of rotary HTS flux pumps and evaluate the effects of different factors on the output voltage through parameter sweeping and analysis of variance. Typical design considerations, including the operating frequency, air gap, HTS tape width, and remanent flux density, have been investigated. In particular, a bilateral effect of the HTS tape width has been discovered and explained by examining the averaged integration of the electric field over the HTS tape. Based on the data obtained from various simulations, regression analysis has been conducted with a collection of machine learning methods. It has been demonstrated that the output voltage of a rotary HTS flux pump can be predicted promptly and with satisfactory accuracy via Gaussian process regression, providing a novel approach for future research and a powerful design tool for industrial applications of rotary HTS flux pumps. To enhance the applicability of the proposed statistical models, an updated FEM program has been built to take more parameters into account. The newly added parameters, namely the rotor radius and the width of the permanent magnet, together with the formerly included ones, cover all the key design parameters of a rotary HTS flux pump. Based on data collected from the FEM model, a well-trained deep neural network (DNN) model with a back-propagation algorithm has been put forward and validated. The proposed DNN model is capable of quantifying the output voltage of a rotary HTS flux pump instantly, with an overall accuracy of 98% with respect to the simulated values, with all design parameters explicitly specified.
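The Gaussian-process surrogate idea can be sketched with a minimal numpy implementation (synthetic one-dimensional data standing in for FEM samples of, say, operating frequency versus output voltage; the kernel, length scale, and response curve are assumptions for illustration, not values from the thesis):

```python
import numpy as np

# Minimal GP regression with an RBF kernel: train on sampled (input, voltage)
# pairs from expensive simulations, then predict at new inputs instantly.
def rbf(A, B, length=0.5):
    d2 = (A[:, None] - B[None, :]) ** 2
    return np.exp(-0.5 * d2 / length**2)

X_train = np.linspace(0.0, 1.0, 20)    # e.g. normalized operating frequency
y_train = np.sin(4 * X_train)          # hypothetical smooth response curve
noise = 1e-6                           # jitter for numerical stability

K = rbf(X_train, X_train) + noise * np.eye(len(X_train))
alpha = np.linalg.solve(K, y_train)

X_test = np.array([0.25, 0.75])
y_pred = rbf(X_test, X_train) @ alpha  # GP posterior mean at new inputs

assert np.allclose(y_pred, np.sin(4 * X_test), atol=1e-2)
```

Once trained on FEM data, a surrogate of this kind replaces a time-consuming simulation with a single matrix-vector product per query, which is what makes it attractive as a design tool.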
The model possesses a powerful ability to characterize the output behaviour of rotary HTS flux pumps by integrating all design parameters, and the output characteristics of rotary HTS flux pumps have been successfully demonstrated and visualized using this model. Compared to conventional, time-consuming FEM-based numerical models, the proposed DNN model has the advantages of fast learning, accurate computation, and strong programmability. The DNN model can therefore greatly facilitate the design and optimization process for rotary HTS flux pumps. An executable application has been developed based on the DNN model, which is believed to provide a useful tool for learners and designers of rotary HTS flux pumps. A new variant inspired by the working principles of rotary HTS flux pumps has been proposed and termed the stationary wave HTS flux pump. Its advantage is a simple structure without any moving components; it utilises a controllable current-driven electromagnet to provide the required magnetic field. It has been demonstrated that the origin of the output voltage is the asymmetric distribution of the dynamic resistance in the HTS tape, which requires the electromagnet to be placed such that its central line is not aligned with that of the HTS tape. A numerical model has been built to simulate the operation of a stationary wave HTS flux pump, based on which the output characteristics and the dynamic resistance under various parameters have been investigated. In addition, accurate and reliable statistical models have been proposed to predict the open circuit voltage and the effective dynamic resistance by adapting the previously developed machine learning techniques.
The work presented in this PhD thesis can bring more insight into HTS flux pumps as a promising emerging contactless energisation technology, and the proposed statistical models can be particularly useful for the design and optimization of such devices.

    Comparing the Performance of Different Machine Learning Models in the Evaluation of Solder Joint Fatigue Life Under Thermal Cycling

    Predicting the reliability of board-level solder joints is a challenging process for the designer because the fatigue life of solder is influenced by a large variety of design parameters and many nonlinear, coupled phenomena. Machine learning has shown promise as a way of predicting the fatigue life of board-level solder joints. In the present work, the performance of various machine learning models in predicting the fatigue life of board-level solder joints is discussed. Experimental data from many different solder joint thermal fatigue tests are used to train the different machine learning models. A web-based database for storing, sharing, and uploading data related to the performance of electronics materials, the Electronics Packaging Materials Database (EPMD), has been developed and used to store and serve the training data for the present work. Data regression is performed using artificial neural networks, random forests, gradient boosting, extreme gradient boosting (XGBoost), and adaptive boosting with neural networks (AdaBoost). While previous works have studied artificial neural networks as a way to predict the fatigue life of board-level solder joints, the results in this paper suggest that machine learning techniques based on regression trees may also be useful in predicting the fatigue life of board-level solder joints. This paper also demonstrates the need for a large collection of curated data related to board-level solder joint reliability, and presents the Electronics Packaging Materials Database to meet that need.
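The tree-based regression idea can be sketched with a from-scratch gradient-boosting loop over decision stumps (the single feature and life curve below are synthetic stand-ins; the real models are trained on the thermal fatigue test data served by the EPMD, and production work would use a library such as XGBoost rather than this toy loop):

```python
import numpy as np

# Hypothetical 1-D feature (e.g. a normalized thermal-cycle severity) and a
# synthetic fatigue-life curve; no real test data is used here.
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 200)
y = 1000 * np.exp(-2 * x)

def fit_stump(x, y):
    """Best single-split regression stump by squared error."""
    best = None
    for t in np.quantile(x, np.linspace(0.05, 0.95, 19)):
        left, right = y[x <= t], y[x > t]
        if len(left) == 0 or len(right) == 0:
            continue
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    return best[1:]

# Gradient boosting on squared loss: repeatedly fit stumps to the residuals
pred = np.full_like(y, y.mean())
for _ in range(200):
    t, lv, rv = fit_stump(x, y - pred)
    pred += 0.1 * np.where(x <= t, lv, rv)   # shrinkage factor 0.1

rmse = np.sqrt(np.mean((y - pred) ** 2))
assert rmse < np.std(y) / 2   # boosting explains most of the variance
```

Each boosting round fits a weak learner to the current residuals and adds a damped copy of it to the ensemble; this residual-fitting loop is the mechanism shared by the gradient boosting, XGBoost, and AdaBoost variants compared in the paper.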

    Open problems in deformations of Artinian algebras, Hilbert schemes and around

    We review open problems in the theory of deformations of zero-dimensional objects, such as algebras, modules, or tensors. We list both well-known problems and some new ones that emerge from applications. In view of the many advances of recent years, we can hope that all of them are within the range of current methods.

    An Independent Timing Analysis for Credit-Based Shaping in Ethernet TSN
