274 research outputs found

    Differentiable Programming Tensor Networks

    Differentiable programming is a fresh programming paradigm which composes parameterized algorithmic components and trains them using automatic differentiation (AD). The concept emerges from deep learning but is not limited to training neural networks. We present the theory and practice of programming tensor network algorithms in a fully differentiable way. By formulating the tensor network algorithm as a computation graph, one can compute higher-order derivatives of the program accurately and efficiently using AD. We present essential techniques to differentiate through tensor network contractions, including stable AD for tensor decomposition and efficient backpropagation through fixed-point iterations. As a demonstration, we compute the specific heat of the Ising model directly by taking the second-order derivative of the free energy obtained in the tensor renormalization group calculation. Next, we perform gradient-based variational optimization of infinite projected entangled pair states for the quantum antiferromagnetic Heisenberg model and obtain state-of-the-art variational energy and magnetization with moderate effort. Differentiable programming removes the laborious human effort of deriving and implementing analytical gradients for tensor network programs, which opens the door to more innovations in tensor network algorithms and applications. Comment: Typos corrected, discussion and refs added; revised version accepted for publication in PRX. Source code available at https://github.com/wangleiphy/tensorgra
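The idea of obtaining a specific heat from a second derivative taken by AD can be sketched on a much smaller problem than the one in the abstract. The example below is only an illustration under stated assumptions: it swaps the tensor renormalization group calculation for the closed-form partition function of the 1D Ising chain, ln Z per site = ln(2 cosh(βJ)), and it uses a hand-written forward-mode AD based on nested dual numbers rather than any real AD library. The names `Dual`, `second_derivative` and `specific_heat` are hypothetical. It relies on the standard relation C / k_B = β² ∂²(ln Z)/∂β².

```python
import math


class Dual:
    """Number val + eps*e with e**2 == 0; val and eps may themselves be Dual."""

    def __init__(self, val, eps=0.0):
        self.val, self.eps = val, eps

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.eps + other.eps)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule: (a + a'e)(b + b'e) = ab + (a b' + a' b)e
        return Dual(self.val * other.val,
                    self.val * other.eps + self.eps * other.val)

    __rmul__ = __mul__


def sinh(x):
    if isinstance(x, Dual):
        return Dual(sinh(x.val), cosh(x.val) * x.eps)
    return math.sinh(x)


def cosh(x):
    if isinstance(x, Dual):
        return Dual(cosh(x.val), sinh(x.val) * x.eps)
    return math.cosh(x)


def recip(x):
    if isinstance(x, Dual):
        r = recip(x.val)
        return Dual(r, (-1.0) * x.eps * r * r)
    return 1.0 / x


def log(x):
    if isinstance(x, Dual):
        return Dual(log(x.val), x.eps * recip(x.val))
    return math.log(x)


def second_derivative(f, x):
    """f''(x) by nesting two dual-number perturbations of x."""
    nested = Dual(Dual(x, 1.0), Dual(1.0, 0.0))
    return f(nested).eps.eps


def specific_heat(beta, coupling):
    """C / k_B per site for the 1D Ising chain (assumed toy model)."""
    def log_z(b):
        return log(2.0 * cosh(b * coupling))
    return beta * beta * second_derivative(log_z, beta)


if __name__ == "__main__":
    print(specific_heat(0.7, 1.0))
```

For this toy model the result can be checked against the analytic expression (βJ)² / cosh²(βJ); in a tensor network setting no such closed form is available, which is where differentiating the program itself becomes useful.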

    Development of self-adaptive back propagation and derivative free training algorithms in artificial neural networks

    Three new iterative, dynamically self-adaptive, derivative-free and training-parameter-free artificial neural network (ANN) training algorithms are developed. They are defined as self-adaptive back propagation, multi-directional and restart ANN training algorithms. The descent direction in self-adaptive back propagation training is determined implicitly by a central difference approximation scheme, which chooses its step size according to the convergence behavior of the error function. This approach trains an ANN when the gradient information of the corresponding error function is not readily available. The self-adaptive variable learning rates per epoch are determined dynamically using a constrained interpolation search. As a result, appropriate descent of the error function is achieved. The multi-directional training algorithm is self-adaptive and derivative free. It orients an initial search vector in a descent location at the early stage of training. Individual learning rates and momentum terms for all the ANN weights are determined optimally. The search directions are derived from rectilinear and Euclidean paths, which explore stiff ridges and valleys of the error surface to improve training. The restart training algorithm is derivative free. It redefines a degenerate simplex at a re-scale phase. This multi-parameter training algorithm updates ANN weights simultaneously instead of individually. The descent directions are derived from the centroid of a simplex along a reflection point opposite to the worst vertex. The algorithm is robust and has the ability to improve local search. These ANN training methods are appropriate when there is a discontinuity in the corresponding ANN error function or the Hessian matrix is ill conditioned or singular. The convergence properties of the algorithms are proved where possible. All the training algorithms successfully train exclusive OR (XOR), parity, character-recognition and forecasting problems.
    The simulation results with XOR, parity and character-recognition problems suggest that all the training algorithms improve significantly over the standard back propagation algorithm in average number of epochs, function evaluations and terminal function values. The multivariate ANN calibration problem as a regression model with a small data set is relatively difficult to train. In forecasting problems, an ANN is trained to extrapolate the data in the validation period. The extrapolation results are compared with the actual data. The trained ANN performs better than the statistical regression method in mean absolute deviation, mean squared error and relative percentage error. The restart training algorithm succeeds in training a problem where other training algorithms face difficulty. It is shown that a seasonal time series problem possesses a Hessian matrix that has a high condition number. Convergence difficulties as well as slow training are therefore not atypical. The research exploits the geometry of the error surface to identify self-adaptive optimized learning rates and momentum terms. Consequently, the algorithms converge with a high success rate. These attributes brand the training algorithms as self-adaptive, automatic, parameter free, efficient and easy to use.
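The central-difference idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis's algorithm: the self-adaptive step-size rule and the constrained interpolation search are not described in enough detail here, so the sketch assumes a fixed difference step `h` and a fixed learning rate `lr`, and it uses a hypothetical two-weight quadratic `error` in place of an ANN error function. All function names are illustrative.

```python
def central_difference_gradient(f, w, h=1e-5):
    """Approximate the gradient of f at w using only function evaluations."""
    grad = []
    for i in range(len(w)):
        up, down = list(w), list(w)
        up[i] += h
        down[i] -= h
        grad.append((f(up) - f(down)) / (2.0 * h))
    return grad


def derivative_free_descent(f, w, lr=0.1, steps=200):
    """Gradient-descent-style updates driven by the approximated gradient."""
    w = list(w)
    for _ in range(steps):
        grad = central_difference_gradient(f, w)
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


def error(w):
    """Hypothetical stand-in for an ANN error surface, minimized at (1, -2)."""
    return (w[0] - 1.0) ** 2 + 2.0 * (w[1] + 2.0) ** 2


if __name__ == "__main__":
    print(derivative_free_descent(error, [0.0, 0.0]))
```

Each update costs two function evaluations per weight, which is the price paid for not needing analytic gradients.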

    Positive Definite Kernels in Machine Learning

    This survey is an introduction to positive definite kernels and the set of methods they have inspired in the machine learning literature, namely kernel methods. We first discuss some properties of positive definite kernels as well as reproducing kernel Hilbert spaces, the natural extension of the set of functions $\{k(x,\cdot),\, x\in\mathcal{X}\}$ associated with a kernel $k$ defined on a space $\mathcal{X}$. We discuss at length the construction of kernel functions that take advantage of well-known statistical models. We provide an overview of numerous data-analysis methods which take advantage of reproducing kernel Hilbert spaces and discuss the idea of combining several kernels to improve the performance on certain tasks. We also provide a short cookbook of different kernels which are particularly useful for certain data types such as images, graphs or speech segments. Comment: draft. corrected a typo in figure
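As a small illustration of what positive definiteness means in practice, the sketch below builds the Gram matrix of a Gaussian kernel on a few points and evaluates the quadratic form c^T K c, which is non-negative for every coefficient vector c when the kernel is positive definite. The choice of the Gaussian kernel, the bandwidth and the sample points are assumptions made for the example and are not taken from the survey.

```python
import math
import random


def gaussian_kernel(x, y, bandwidth=1.0):
    """k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2))."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2.0 * bandwidth ** 2))


def gram_matrix(kernel, points):
    """Matrix K with K[i][j] = kernel(points[i], points[j])."""
    return [[kernel(p, q) for q in points] for p in points]


def quadratic_form(matrix, coefficients):
    """Sum over i, j of c_i * K[i][j] * c_j."""
    n = len(coefficients)
    return sum(coefficients[i] * matrix[i][j] * coefficients[j]
               for i in range(n) for j in range(n))


if __name__ == "__main__":
    rng = random.Random(0)
    points = [(0.0, 0.0), (1.0, 0.5), (-0.3, 2.0), (1.5, 1.5)]
    K = gram_matrix(gaussian_kernel, points)
    for _ in range(5):
        c = [rng.uniform(-1.0, 1.0) for _ in points]
        print(quadratic_form(K, c) >= 0.0)
```

Sampling a handful of coefficient vectors is only a sanity check, not a proof; a full check would examine the eigenvalues of the Gram matrix.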

    Parallel Multi-Objective Hyperparameter Optimization with Uniform Normalization and Bounded Objectives

    Machine learning (ML) methods offer a wide range of configurable hyperparameters that have a significant influence on their performance. While accuracy is a commonly used performance objective, in many settings it is not sufficient. Optimizing ML models with respect to multiple objectives such as accuracy, confidence, fairness, calibration, privacy, latency, and memory consumption is becoming crucial. Hyperparameter optimization, the approach of systematically optimizing the hyperparameters, is already challenging for a single objective and is even more challenging for multiple objectives. In addition, differences in objective scales, failures, and the presence of outlier values in objectives make the problem even harder. We propose a multi-objective Bayesian optimization (MoBO) algorithm that addresses these problems through uniform objective normalization and randomized weights in scalarization. We increase the efficiency of our approach by imposing constraints on the objective to avoid exploring unnecessary configurations (e.g., insufficient accuracy). Finally, we leverage an approach to parallelize the MoBO which results in a 5x speed-up when using 16x more workers. Comment: Preprint with appendices
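The combination of normalization and randomized scalarization can be sketched as follows. The abstract does not spell out the procedure, so this example makes its own assumptions: "uniform normalization" is taken to mean mapping each objective to its empirical-CDF rank in (0, 1], the weights are drawn uniformly from the probability simplex, and all objectives are minimized. The function names and the sample data are hypothetical.

```python
import random


def uniform_normalize(values):
    """Map each value to its empirical-CDF rank, so outliers cannot dominate."""
    n = len(values)
    return [sum(1 for other in values if other <= v) / n for v in values]


def random_simplex_weights(k, rng):
    """Draw k non-negative weights summing to 1 (normalized exponentials)."""
    draws = [rng.expovariate(1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def scalarize(objectives, rng):
    """Collapse per-configuration objective tuples into one score each."""
    columns = [list(column) for column in zip(*objectives)]
    normalized = [uniform_normalize(column) for column in columns]
    weights = random_simplex_weights(len(columns), rng)
    return [sum(w * normalized[j][i] for j, w in enumerate(weights))
            for i in range(len(objectives))]


if __name__ == "__main__":
    # (error rate, latency); the second configuration has an outlier latency.
    observed = [(0.10, 5.0), (0.30, 1e6), (0.20, 7.0), (0.05, 3.0)]
    print(scalarize(observed, random.Random(0)))
```

Because ranks are used instead of raw values, the outlier latency of 1e6 contributes at most 1.0 to the score, and redrawing the weights at each iteration steers the search toward different regions of the trade-off between objectives.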

    Stochastic Metaheuristics as Sampling Techniques using Swarm Intelligence

    Optimization problems appear in fields as varied as identification problems, supervised learning of neural networks, shortest path problems, etc. Metaheuristics [22] are a family of optimization algorithms, often applied to "hard" combinatorial problems for which no more efficient method is known. They have the advantage of being generi

    Stage-Aware Learning for Dynamic Treatments

    Recent advances in dynamic treatment regimes (DTRs) provide powerful optimal treatment searching algorithms, which are tailored to individuals' specific needs and able to maximize their expected clinical benefits. However, existing algorithms can suffer from insufficient sample size under optimal treatments, especially for chronic diseases involving many stages of decision-making. To address these challenges, we propose a novel individualized learning method which estimates the DTR with a focus on prioritizing alignment between the observed treatment trajectory and the one obtained by the optimal regime across decision stages. By relaxing the restriction that the observed trajectory must be fully aligned with the optimal treatments, our approach substantially improves the sample efficiency and stability of inverse-probability-weighted methods. In particular, the proposed learning scheme builds a more general framework which includes the popular outcome weighted learning framework as a special case. Moreover, we introduce the notion of stage importance scores along with an attention mechanism to explicitly account for heterogeneity among decision stages. We establish the theoretical properties of the proposed approach, including Fisher consistency and a finite-sample performance bound. Empirically, we evaluate the proposed method in extensive simulated environments and a real case study for the COVID-19 pandemic.