Representations of molecules and materials for interpolation of quantum-mechanical simulations via machine learning
Computational study of molecules and materials from first principles is a cornerstone of physics, chemistry and materials science, but limited by the cost of accurate and precise simulations. In settings involving many simulations, machine learning can reduce these costs, sometimes by orders of magnitude, by interpolating between reference simulations. This requires representations that describe any molecule or material and support interpolation. We review, discuss and benchmark state-of-the-art representations and relations between them, including smooth overlap of atomic positions, many-body tensor representation, and symmetry functions. For this, we use a unified mathematical framework based on many-body functions, group averaging and tensor products, and compare energy predictions for organic molecules, binary alloys and Al-Ga-In sesquioxides in numerical experiments controlled for data distribution, regression method and hyper-parameter optimization.
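Of the representations discussed, atom-centered symmetry functions are perhaps the simplest to illustrate. The following is a minimal sketch of a Behler-Parrinello-style radial symmetry function in NumPy; the cutoff radius, the eta values and the three-atom toy geometry are arbitrary choices for the example, not values from the review.

```python
import numpy as np

def cutoff(r, rc):
    """Smooth cosine cutoff: 1 at r = 0, decays to 0 at r >= rc."""
    return np.where(r < rc, 0.5 * (np.cos(np.pi * np.minimum(r, rc) / rc) + 1.0), 0.0)

def radial_symmetry_functions(positions, rc=6.0, etas=(0.5, 1.0, 2.0), rs=0.0):
    """Per-atom radial descriptors G_i = sum_{j != i} exp(-eta (r_ij - rs)^2) fc(r_ij).

    Returns an (n_atoms, len(etas)) array that is invariant to rotations,
    translations and permutations of identical atoms.
    """
    diffs = positions[:, None, :] - positions[None, :, :]
    r = np.linalg.norm(diffs, axis=-1)   # pairwise distance matrix
    np.fill_diagonal(r, np.inf)          # exclude self-interaction
    fc = cutoff(r, rc)
    feats = [np.sum(np.exp(-eta * (r - rs) ** 2) * fc, axis=1) for eta in etas]
    return np.stack(feats, axis=1)

# Tiny example: three atoms on a line, 1 Angstrom apart.
pos = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
X = radial_symmetry_functions(pos)
print(X.shape)  # (3, 3): one feature vector per atom
```

Because the descriptor depends only on interatomic distances, the two end atoms (which see mirror-image environments) receive identical feature vectors, which is exactly the invariance property interpolation-based learning needs.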
Fast learning rates in statistical inference through aggregation
We develop minimax optimal risk bounds for the general learning task consisting in predicting as well as the best function in a reference set G, up to the smallest possible additive term, called the convergence rate. When the reference set is finite and when n denotes the size of the training data, we provide minimax convergence rates of the form C (log|G|/n)^v with tight evaluation of the positive constant C and with exact 0 < v <= 1, the latter value depending on the convexity of the loss function and on the level of noise in the output distribution. The risk upper bounds are based on a sequential randomized algorithm, which at each step concentrates on functions having both low risk and low variance with respect to the previous step's prediction function. Our analysis puts forward the links between the probabilistic and worst-case viewpoints, and allows us to obtain risk bounds unachievable with the standard statistical learning approach. One of the key ideas of this work is to use probabilistic inequalities with respect to appropriate (Gibbs) distributions on the prediction function space instead of using them with respect to the distribution generating the data. The risk lower bounds are based on refinements of the Assouad lemma taking particular account of the properties of the loss function. Our key example to illustrate the upper and lower bounds is the L_q-regression setting, for which an exhaustive analysis of the convergence rates is given while q ranges in [1, +infinity). Comment: Published at http://dx.doi.org/10.1214/08-AOS623 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)
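The Gibbs-distribution idea behind the upper bounds is closely related to exponentially weighted aggregation over a finite reference set. Below is a hedged sketch of that mechanism, with weights proportional to exp(-cumulative loss / temperature); the temperature value and the toy constant experts are illustrative choices, not the paper's full sequential algorithm.

```python
import numpy as np

def gibbs_aggregate(predictions, cum_losses, temperature=1.0):
    """Aggregate M reference functions via a Gibbs distribution on past losses.

    predictions: (M,) array, each reference function's prediction for the next point.
    cum_losses:  (M,) array of cumulative losses observed so far.
    Weights w_g proportional to exp(-cum_loss(g)/temperature) concentrate
    on low-risk functions as data accumulate.
    """
    w = np.exp(-(cum_losses - cum_losses.min()) / temperature)  # shift for stability
    w /= w.sum()
    return float(w @ predictions)

# Sequential toy run: three constant experts, squared loss, true target 0.5.
experts = np.array([0.0, 0.5, 1.0])
cum_loss = np.zeros(3)
for _ in range(50):
    y = 0.5
    pred = gibbs_aggregate(experts, cum_loss, temperature=0.1)
    cum_loss += (experts - y) ** 2
print(round(pred, 3))  # weight mass concentrates on the expert predicting 0.5
```

The key contrast with empirical risk minimization is that the aggregate is a randomized mixture over the reference set rather than a single selected function, which is what makes the fast-rate analysis via Gibbs distributions possible.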
Asymptotic and bootstrap properties of rank regressions
The paper develops the bootstrap theory and extends the asymptotic theory of rank estimators, such as the Maximum Rank Correlation Estimator (MRC) of Han (1987), Monotone Rank Estimator (MR) of Cavanagh and Sherman (1998) or Pairwise-Difference Rank Estimators (PDR) of Abrevaya (2003). It is known that under general conditions these estimators have asymptotic normal distributions, but the asymptotic variances are difficult to find. Here we prove that the quantiles and the variances of the asymptotic distributions can be consistently estimated by the nonparametric bootstrap. We investigate the accuracy of inference based on the asymptotic approximation and the bootstrap, and provide bounds on the associated error. In the case of MRC and MR, the bound is a function of the sample size of order close to n^{-1/6}. The PDR estimators belong to a special subclass of rank estimators for which the bound is vanishing with the rate close to n^{-1/2}. The theoretical findings are illustrated with Monte Carlo experiments and a real data example. Keywords: Rank Estimators, Bootstrap, M-Estimators, U-Statistics, U-Processes
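A minimal sketch of the nonparametric bootstrap applied to Han's MRC estimator follows. The MRC objective counts concordant pairs between outcomes and the index x'beta; the grid search and the scale normalization beta_1 = 1 are simplifications chosen for this illustration, not the paper's computational method.

```python
import numpy as np

rng = np.random.default_rng(0)

def mrc_objective(beta2, y, x):
    """Han's MRC criterion with beta = (1, beta2): count concordant pairs."""
    idx = x[:, 0] + beta2 * x[:, 1]
    return int(np.sum((y[:, None] > y[None, :]) & (idx[:, None] > idx[None, :])))

def mrc_estimate(y, x, grid=np.linspace(-3, 3, 61)):
    """Grid-search MRC estimate of beta2 (first coefficient normalized to 1)."""
    scores = [mrc_objective(b, y, x) for b in grid]
    return float(grid[int(np.argmax(scores))])

# Simulate a binary-choice model y = 1{x1 + 2*x2 + eps > 0}: true beta2 = 2.
n = 100
x = rng.normal(size=(n, 2))
y = (x[:, 0] + 2.0 * x[:, 1] + rng.normal(size=n) > 0).astype(float)
beta_hat = mrc_estimate(y, x)

# Nonparametric bootstrap: resample observations with replacement,
# re-estimate, and read quantiles off the bootstrap distribution.
B = 50
boot = np.empty(B)
for b in range(B):
    s = rng.integers(0, n, size=n)
    boot[b] = mrc_estimate(y[s], x[s])
lo, hi = np.quantile(boot, [0.025, 0.975])
print(beta_hat, (lo, hi))
```

This is exactly the setting the abstract addresses: the asymptotic variance of beta_hat is hard to compute analytically, but the empirical quantiles of the bootstrap draws give consistent confidence bounds.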
Locally Weighted Polynomial Regression: Parameter Choice and Application to Forecasts of the Great Salt Lake
Relationships between hydrologic variables are often nonlinear. Usually the functional form of such a relationship is not known a priori. A multivariate, nonparametric regression methodology is provided here for approximating the underlying regression function using locally weighted polynomials. Locally weighted polynomials consider the approximation of the target function through a Taylor series expansion of the function in the neighborhood of the point of estimate. Cross-validatory procedures for the selection of the size of the neighborhood over which this approximation should take place, and for the order of the local polynomial to use, are provided and shown for some simple situations. The utility of this nonparametric regression approach is demonstrated through an application to nonparametric short-term forecasts of the biweekly Great Salt Lake volume. Blind forecasts up to four years in the future using the 1847-1993 time series of the Great Salt Lake are presented.
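The core of the methodology, a weighted local polynomial fit plus a cross-validatory choice of neighborhood size, can be sketched as follows. The tricube kernel, the candidate neighborhood sizes and the sine test function are illustrative assumptions for a one-dimensional example, not the paper's Great Salt Lake configuration.

```python
import numpy as np

def local_poly_predict(x0, x, y, k=20, degree=1):
    """Weighted degree-`degree` polynomial fit over the k nearest neighbors of x0.

    Distance-tapered (tricube) weights keep the fit local; the returned value
    is the polynomial evaluated at x0, i.e. a Taylor-style local approximation.
    """
    d = np.abs(x - x0)
    nn = np.argsort(d)[:k]
    h = d[nn].max() or 1.0                    # neighborhood radius
    w = (1 - (d[nn] / h) ** 3) ** 3           # tricube kernel weights
    A = np.vander(x[nn] - x0, degree + 1)     # columns: (x-x0)^degree, ..., 1
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y[nn] * sw, rcond=None)
    return float(coef[-1])                    # constant term = fitted value at x0

def loocv_score(x, y, k, degree=1):
    """Leave-one-out cross-validation error, used to choose the neighborhood size k."""
    preds = [local_poly_predict(x[i], np.delete(x, i), np.delete(y, i), k, degree)
             for i in range(len(x))]
    return float(np.mean((np.asarray(preds) - y) ** 2))

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 2 * np.pi, 150))
y = np.sin(x) + rng.normal(scale=0.1, size=150)
best_k = min((15, 30, 60), key=lambda k: loocv_score(x, y, k))
yhat = local_poly_predict(np.pi / 2, x, y, k=best_k)
print(best_k, round(yhat, 2))
```

Forecasting enters by taking x to be lagged values of the series and x0 the current state, so each prediction is a local polynomial fit in the neighborhood of today's conditions.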
Generalization Through the Lens of Learning Dynamics
A machine learning (ML) system must learn not only to match the output of a
target function on a training set, but also to generalize to novel situations
in order to yield accurate predictions at deployment. In most practical
applications, the user cannot exhaustively enumerate every possible input to
the model; strong generalization performance is therefore crucial to the
development of ML systems which are performant and reliable enough to be
deployed in the real world. While generalization is well-understood
theoretically in a number of hypothesis classes, the impressive generalization
performance of deep neural networks has stymied theoreticians. In deep
reinforcement learning (RL), our understanding of generalization is further
complicated by the conflict between generalization and stability in widely-used
RL algorithms. This thesis will provide insight into generalization by studying
the learning dynamics of deep neural networks in both supervised and
reinforcement learning tasks. Comment: PhD Thesis