Differentiable Programming Tensor Networks
Differentiable programming is a programming paradigm that composes parameterized algorithmic components and trains them using automatic differentiation (AD). The concept emerged from deep learning but is not limited to training neural networks. We present the theory and practice of
programming tensor network algorithms in a fully differentiable way. By
formulating the tensor network algorithm as a computation graph, one can
compute higher order derivatives of the program accurately and efficiently
using AD. We present essential techniques for differentiating through tensor network contractions, including stable AD for tensor decomposition and
efficient backpropagation through fixed point iterations. As a demonstration,
we compute the specific heat of the Ising model directly by taking the second
order derivative of the free energy obtained in the tensor renormalization
group calculation. Next, we perform gradient based variational optimization of
infinite projected entangled pair states for the quantum antiferromagnetic Heisenberg model and obtain state-of-the-art variational energy and magnetization with moderate effort. Differentiable programming removes the laborious human effort of deriving and implementing analytical gradients for
tensor network programs, which opens the door to more innovations in tensor
network algorithms and applications.

Comment: Typos corrected, discussion and refs added; revised version accepted for publication in PRX. Source code available at https://github.com/wangleiphy/tensorgra
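The headline computation (a specific heat from the second derivative of a free energy) can be sketched in miniature. The block below implements forward-mode AD with hyper-dual numbers and applies it to the exactly solvable 1D Ising chain, whose free energy per site is f(T) = -T ln(2 cosh(J/T)) with k_B = 1. This illustrates only the AD idea, not the paper's tensor-network code; the `HyperDual` helper is an assumption of this sketch.

```python
import math

class HyperDual:
    """Hyper-dual number a + b*e1 + c*e2 + d*e1*e2 with e1^2 = e2^2 = 0.
    Evaluating g(HyperDual(x, 1, 1, 0)) yields g(x), g'(x) (the e1 part)
    and g''(x) (the e1*e2 part) exactly: forward-mode AD to second order."""
    def __init__(self, f, e1=0.0, e2=0.0, e12=0.0):
        self.f, self.e1, self.e2, self.e12 = f, e1, e2, e12

    def __mul__(self, o):
        o = o if isinstance(o, HyperDual) else HyperDual(o)
        return HyperDual(self.f * o.f,
                         self.f * o.e1 + self.e1 * o.f,
                         self.f * o.e2 + self.e2 * o.f,
                         self.f * o.e12 + self.e12 * o.f
                         + self.e1 * o.e2 + self.e2 * o.e1)
    __rmul__ = __mul__

    def __neg__(self):
        return HyperDual(-self.f, -self.e1, -self.e2, -self.e12)

def _chain(x, g, gp, gpp):
    # Lift a scalar function with known first/second derivatives.
    return HyperDual(g(x.f),
                     gp(x.f) * x.e1,
                     gp(x.f) * x.e2,
                     gp(x.f) * x.e12 + gpp(x.f) * x.e1 * x.e2)

def cosh(x):  return _chain(x, math.cosh, math.sinh, math.cosh)
def log(x):   return _chain(x, math.log, lambda f: 1.0 / f,
                            lambda f: -1.0 / f ** 2)
def recip(x): return _chain(x, lambda f: 1.0 / f,
                            lambda f: -1.0 / f ** 2, lambda f: 2.0 / f ** 3)

def free_energy(T, J=1.0):
    # 1D Ising free energy per site, f(T) = -T ln(2 cosh(J/T)), k_B = 1.
    return -(T * log(2.0 * cosh(J * recip(T))))

T0, J = 1.5, 1.0
fT = free_energy(HyperDual(T0, 1.0, 1.0, 0.0), J)
specific_heat = -T0 * fT.e12            # C = -T d^2 f / dT^2
exact = (J / T0) ** 2 / math.cosh(J / T0) ** 2
print(specific_heat, exact)
```

The derivative comes out exact (no finite-difference truncation), which is the practical advantage the abstract claims for AD over numerical differentiation.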
Development of self-adaptive back propagation and derivative free training algorithms in artificial neural networks
Three new iterative, dynamically self-adaptive, derivative-free and training-parameter-free artificial neural network (ANN) training algorithms are developed. They are defined as the self-adaptive back propagation, multi-directional and restart ANN training algorithms. The descent direction in self-adaptive back propagation training is determined implicitly by a central difference approximation scheme, which chooses its step size according to the convergence behavior of the error function. This approach trains an ANN when the gradient information of the corresponding error function is not readily available. The self-adaptive variable learning rates per epoch are determined dynamically using a constrained interpolation search. As a result, appropriate descent of the error function is achieved. The multi-directional training algorithm is self-adaptive and derivative free. It orients an initial search vector in a descent location at the early stage of training. Individual learning rates and momentum terms for all the ANN weights are determined optimally. The search directions are derived from rectilinear and Euclidean paths, which explore stiff ridges and valleys of the error surface to improve training. The restart training algorithm is derivative free. It redefines a degenerate simplex at a re-scale phase. This multi-parameter training algorithm updates ANN weights simultaneously instead of individually. The descent directions are derived from the centroid of a simplex along a reflection point opposite to the worst vertex. The algorithm is robust and has the ability to improve local search. These ANN training methods are appropriate when there is discontinuity in the corresponding ANN error function or when the Hessian matrix is ill-conditioned or singular. The convergence properties of the algorithms are proved where possible. All the training algorithms successfully train exclusive OR (XOR), parity, character-recognition and forecasting problems.
The simulation results with XOR, parity and character-recognition problems suggest that all the training algorithms improve significantly over the standard back propagation algorithm in the average number of epochs, function evaluations and terminal function values. The multivariate ANN calibration problem, as a regression model with a small data set, is relatively difficult to train. In forecasting problems, an ANN is trained to extrapolate the data in the validation period. The extrapolation results are compared with the actual data. The trained ANN performs better than the statistical regression method in mean absolute deviation, mean squared error and relative percentage error. The restart training algorithm succeeds in training a problem where the other training algorithms face difficulty. It is shown that a seasonal time series problem possesses a Hessian matrix with a high condition number; convergence difficulties as well as slow training are therefore not atypical. The research exploits the geometry of the error surface to identify self-adaptive optimized learning rates and momentum terms. Consequently, the algorithms converge with a high success rate. These attributes brand the training algorithms as self-adaptive, automatic, parameter-free, efficient and easy to use.
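The central-difference, derivative-free training idea can be sketched as follows. The helper names (`central_diff_grad`, `train`, `xor_loss`) and the crude halve-on-failure / grow-on-success learning-rate rule are assumptions of this sketch (a stand-in for the constrained interpolation search described above), shown on the XOR benchmark the thesis uses.

```python
import math, random

def central_diff_grad(loss, w, h=1e-5):
    """Estimate the gradient of loss at w with central differences,
    O(h^2) accurate -- no analytical derivatives required."""
    g = []
    for i in range(len(w)):
        wp, wm = w[:], w[:]
        wp[i] += h
        wm[i] -= h
        g.append((loss(wp) - loss(wm)) / (2 * h))
    return g

def train(loss, w, lr=0.5, epochs=2000):
    """Derivative-free descent with a crude self-adaptive learning rate:
    halve lr when a step fails to decrease the loss, grow it slightly
    when it succeeds. Steps that increase the loss are rejected."""
    e = loss(w)
    for _ in range(epochs):
        g = central_diff_grad(loss, w)
        w_new = [wi - lr * gi for wi, gi in zip(w, g)]
        e_new = loss(w_new)
        if e_new < e:
            w, e = w_new, e_new
            lr *= 1.1
        else:
            lr *= 0.5
    return w, e

# A 2-2-1 sigmoid network on XOR; the 9 weights are flattened into a list.
def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def xor_loss(w):
    err = 0.0
    for (x1, x2), t in DATA:
        h1 = sigmoid(w[0] * x1 + w[1] * x2 + w[2])
        h2 = sigmoid(w[3] * x1 + w[4] * x2 + w[5])
        y = sigmoid(w[6] * h1 + w[7] * h2 + w[8])
        err += (y - t) ** 2
    return err

random.seed(0)
w0 = [random.uniform(-1, 1) for _ in range(9)]
w, e = train(xor_loss, w0)
print(e)
```

Because steps are only accepted when the error decreases, the loss is monotonically non-increasing, which mirrors the "appropriate descent" property claimed in the abstract.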
Positive Definite Kernels in Machine Learning
This survey is an introduction to positive definite kernels and the set of
methods they have inspired in the machine learning literature, namely kernel
methods. We first discuss some properties of positive definite kernels as well
as reproducing kernel Hilbert spaces, the natural extension of the set of functions associated with a kernel defined on a given space. We discuss at length the construction of kernel
functions that take advantage of well-known statistical models. We provide an
overview of numerous data-analysis methods which take advantage of reproducing
kernel Hilbert spaces and discuss the idea of combining several kernels to
improve the performance on certain tasks. We also provide a short cookbook of
different kernels which are particularly useful for certain data-types such as
images, graphs or speech segments.

Comment: Draft; corrected a typo in a figure.
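As a minimal illustration of the survey's subject, the sketch below builds the Gaussian RBF kernel (a standard positive definite kernel) and uses it for kernel ridge regression, one of the simplest RKHS-based data-analysis methods. The function names and the toy sine-fitting task are assumptions of this sketch, not code from the survey.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """Gaussian RBF kernel k(x, y) = exp(-gamma * ||x - y||^2),
    a classic positive definite kernel."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def kernel_ridge_fit(X, y, gamma=1.0, alpha=1e-3):
    """Solve (K + alpha * I) c = y; the RKHS predictor is then
    f(x) = sum_i c_i k(x_i, x) (representer theorem)."""
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + alpha * np.eye(len(X)), y)

def kernel_ridge_predict(X_train, coef, X_new, gamma=1.0):
    return rbf_kernel(X_new, X_train, gamma) @ coef

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(40, 1))
y = np.sin(X[:, 0])
coef = kernel_ridge_fit(X, y, gamma=0.5, alpha=1e-4)
pred = kernel_ridge_predict(X, coef, X, gamma=0.5)
print(np.abs(pred - y).max())
```

Positive definiteness of the kernel is exactly what makes the Gram matrix K positive semidefinite, so the regularized linear system above is always solvable.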
Parallel Multi-Objective Hyperparameter Optimization with Uniform Normalization and Bounded Objectives
Machine learning (ML) methods offer a wide range of configurable
hyperparameters that have a significant influence on their performance. While
accuracy is a commonly used performance objective, in many settings, it is not
sufficient. Optimizing the ML models with respect to multiple objectives such
as accuracy, confidence, fairness, calibration, privacy, latency, and memory
consumption is becoming crucial. Hyperparameter optimization, the systematic tuning of these hyperparameters, is already challenging for a single objective and even more so for multiple objectives. In addition, differences in objective scales, evaluation failures, and the presence of outlier values in objectives make the problem even harder. We
propose a multi-objective Bayesian optimization (MoBO) algorithm that addresses
these problems through uniform objective normalization and randomized weights
in scalarization. We increase the efficiency of our approach by imposing
constraints on the objective to avoid exploring unnecessary configurations
(e.g., insufficient accuracy). Finally, we parallelize the MoBO approach, which results in a 5x speed-up when using 16x more workers.

Comment: Preprint with appendices
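The normalization-plus-randomized-scalarization idea can be sketched as follows. The rank-based uniform normalization, the Chebyshev scalarization, and all function names here are assumptions of this sketch, not the paper's MoBO implementation.

```python
import random

def rank_normalize(values):
    """Map objective values to uniform [0, 1] via their empirical ranks,
    which makes different scales and outliers comparable across objectives."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    n = len(values)
    u = [0.0] * n
    for r, i in enumerate(order):
        u[i] = (r + 0.5) / n          # mid-rank in (0, 1)
    return u

def random_weights(k, rng):
    """Random convex weights for scalarization (normalized exponentials)."""
    e = [rng.expovariate(1.0) for _ in range(k)]
    s = sum(e)
    return [x / s for x in e]

def scalarize(objectives, rng):
    """objectives: one list of values per objective (all minimized).
    Returns one scalar score per configuration using rank normalization
    and freshly randomized weights in a Chebyshev scalarization."""
    norm = [rank_normalize(obj) for obj in objectives]
    w = random_weights(len(objectives), rng)
    n = len(objectives[0])
    return [max(w[j] * norm[j][i] for j in range(len(objectives)))
            for i in range(n)]

rng = random.Random(0)
err = [0.10, 0.30, 0.05, 0.90]   # e.g. validation error per configuration
lat = [120., 15., 400., 5.]      # e.g. latency in ms per configuration
scores = scalarize([err, lat], rng)
print(scores)
```

Redrawing the weights at each Bayesian-optimization iteration is what spreads the search across the Pareto front instead of collapsing onto a single trade-off.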
Stochastic Metaheuristics as Sampling Techniques using Swarm Intelligence
Optimization problems appear in many fields, as varied as identification problems, supervised learning of neural networks, shortest path problems, etc. Metaheuristics [22] are a family of optimization algorithms, often applied to "hard" combinatorial problems for which no more efficient method is known. They have the advantage of being generic.
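As a concrete instance of a swarm-intelligence metaheuristic, the sketch below implements a minimal particle swarm optimization loop on a smooth test function. The parameter values and names are assumptions of this sketch, not taken from the text.

```python
import random

def pso(f, dim, n_particles=20, iters=200, seed=0):
    """Minimal particle swarm optimization: each particle keeps inertia and
    is pulled toward its own best point and the swarm's best point."""
    rng = random.Random(seed)
    w, c1, c2 = 0.7, 1.5, 1.5           # inertia, cognitive, social weights
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_f = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            fi = f(pos[i])
            if fi < pbest_f[i]:
                pbest[i], pbest_f[i] = pos[i][:], fi
                if fi < gbest_f:
                    gbest, gbest_f = pos[i][:], fi
    return gbest, gbest_f

# Minimize the sphere function sum(x_i^2); the optimum is 0 at the origin.
best, best_f = pso(lambda x: sum(xi * xi for xi in x), dim=3)
print(best_f)
```

The stochastic update rule is also what lets such methods be viewed as sampling techniques: the swarm's positions form a random sample that concentrates around good regions of the search space.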
Stage-Aware Learning for Dynamic Treatments
Recent advances in dynamic treatment regimes (DTRs) provide powerful optimal
treatment searching algorithms, which are tailored to individuals' specific
needs and able to maximize their expected clinical benefits. However, existing
algorithms could suffer from insufficient sample size under optimal treatments,
especially for chronic diseases involving long stages of decision-making. To
address these challenges, we propose a novel individualized learning method
which estimates the DTR with a focus on prioritizing alignment between the
observed treatment trajectory and the one obtained by the optimal regime across
decision stages. By relaxing the restriction that the observed trajectory must
be fully aligned with the optimal treatments, our approach substantially
improves the sample efficiency and stability of inverse probability weighting based methods. In particular, the proposed learning scheme builds a more
general framework which includes the popular outcome weighted learning
framework as a special case of ours. Moreover, we introduce the notion of stage
importance scores along with an attention mechanism to explicitly account for
heterogeneity among decision stages. We establish the theoretical properties of
the proposed approach, including the Fisher consistency and finite-sample
performance bound. Empirically, we evaluate the proposed method in extensive
simulated environments and a real case study of the COVID-19 pandemic.
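The sample-size problem the method addresses is easiest to see in a bare inverse-probability-weighted value estimator, sketched below. The data layout, the `ipw_value` helper, and the toy two-stage example are assumptions of this sketch, not the paper's estimator.

```python
def ipw_value(trajectories, regime):
    """Self-normalized inverse-probability-weighted estimate of the mean
    outcome under a target regime. Each trajectory is a tuple
    (states, actions, propensities, outcome), one entry per decision stage.
    Only trajectories whose observed actions match the regime at *every*
    stage contribute -- the sample-size restriction that stage-aware
    learning is designed to relax."""
    num = den = 0.0
    for states, actions, probs, outcome in trajectories:
        match = all(regime(t, s) == a
                    for t, (s, a) in enumerate(zip(states, actions)))
        weight = 1.0
        for p in probs:                 # 1 / product of propensities
            weight /= p
        if match:
            num += weight * outcome
            den += weight
    return num / den if den > 0 else float("nan")

def regime(stage, state):
    # Hypothetical target regime for illustration: always recommend action 1.
    return 1

trajectories = [
    # (states per stage, observed actions, P(a_t | history), outcome)
    ([0, 0], [1, 1], [0.5, 0.5], 2.0),
    ([0, 0], [1, 0], [0.5, 0.5], 5.0),  # deviates at stage 2: excluded
    ([0, 0], [1, 1], [0.5, 0.5], 4.0),
]
print(ipw_value(trajectories, regime))
```

With many stages, very few trajectories satisfy the all-stage match condition, so the estimator above rests on a shrinking effective sample; relaxing exact alignment, as the abstract describes, is what recovers sample efficiency.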