Random model trees: an effective and scalable regression method
We present and investigate ensembles of randomized model trees as a novel regression
method. Such ensembles combine the scalability of tree-based methods with predictive
performance rivaling the state of the art in numeric prediction. An extensive empirical
investigation shows that Random Model Trees produce predictive performance competitive
with state-of-the-art methods such as Gaussian Process Regression or Additive Groves of
Regression Trees. Training and optimization of Random Model Trees scale better than
Gaussian Process Regression to larger datasets, and are consistently one to two orders
of magnitude faster than for Additive Groves.
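A minimal sketch of the idea, under assumptions: each ensemble member fits a shallow, randomized regression tree and then a ridge model inside every leaf, and predictions are averaged over the ensemble. The class name and hyper-parameters below are illustrative choices, not the authors' implementation.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.tree import DecisionTreeRegressor

class RandomModelTreeEnsemble:
    """Illustrative ensemble of randomized model trees (assumed design):
    random-split trees partition the input space, ridge models fit the leaves."""

    def __init__(self, n_trees=20, max_depth=4, alpha=1.0, random_state=0):
        self.n_trees, self.max_depth, self.alpha = n_trees, max_depth, alpha
        self.rng = np.random.default_rng(random_state)
        self.members = []  # list of (tree, {leaf_id: ridge model})

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
        n = len(X)
        for _ in range(self.n_trees):
            idx = self.rng.integers(0, n, size=n)              # bootstrap sample
            Xb, yb = X[idx], y[idx]
            tree = DecisionTreeRegressor(
                max_depth=self.max_depth,
                splitter="random",                             # randomized split points
                random_state=int(self.rng.integers(1 << 30)),
            ).fit(Xb, yb)
            leaves = tree.apply(Xb)
            models = {leaf: Ridge(alpha=self.alpha).fit(Xb[leaves == leaf],
                                                        yb[leaves == leaf])
                      for leaf in np.unique(leaves)}           # linear model per leaf
            self.members.append((tree, models))
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        preds = np.zeros((self.n_trees, len(X)))
        for i, (tree, models) in enumerate(self.members):
            leaves = tree.apply(X)
            for leaf, model in models.items():
                mask = leaves == leaf
                if mask.any():
                    preds[i, mask] = model.predict(X[mask])
        return preds.mean(axis=0)                              # average over the ensemble
```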
Quasirandom Load Balancing
We propose a simple distributed algorithm for balancing indivisible tokens on
graphs. The algorithm is completely deterministic, though it tries to imitate
(and enhance) a random algorithm by keeping the accumulated rounding errors as
small as possible.
Our new algorithm surprisingly closely approximates the idealized process
(where the tokens are divisible) on important network topologies. On
d-dimensional torus graphs with n nodes it deviates from the idealized process
only by an additive constant. In contrast to that, the randomized rounding
approach of Friedrich and Sauerwald (2009) can deviate up to Omega(polylog(n))
and the deterministic algorithm of Rabani, Sinclair and Wanka (1998) has a
deviation of Omega(n^{1/d}). This makes our quasirandom algorithm the first
known algorithm for this setting which is optimal both in time and achieved
smoothness. We further show that also on the hypercube our algorithm has a
smaller deviation from the idealized process than the previous algorithms.
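A minimal sketch of the quasirandom rounding idea, under assumptions: each round, every edge carries the flow prescribed by the idealized divisible process, rounded to an integer so that the per-edge accumulated rounding error stays as small as possible. The diffusion constant and the networkx-based interface are illustrative choices, not the paper's exact algorithm.

```python
import math
import networkx as nx

def quasirandom_balance(G: nx.Graph, load: dict, rounds: int) -> dict:
    """Deterministically balance integer token loads on graph G (sketch)."""
    load = dict(load)
    err = {e: 0.0 for e in G.edges}                   # accumulated rounding error per edge
    for _ in range(rounds):
        flows = {}
        for u, v in G.edges:
            deg = max(G.degree[u], G.degree[v])
            ideal = (load[u] - load[v]) / (2 * deg)   # idealized (divisible) flow u -> v
            lo, hi = math.floor(ideal), math.ceil(ideal)
            # round so that the accumulated deviation from the ideal flow stays smallest
            send = lo if abs(err[(u, v)] + ideal - lo) <= abs(err[(u, v)] + ideal - hi) else hi
            err[(u, v)] += ideal - send
            flows[(u, v)] = send
        for (u, v), f in flows.items():               # apply all flows synchronously
            load[u] -= f
            load[v] += f
    return load

# example: balance an initially concentrated load on a 2-dimensional torus
# G = nx.grid_graph(dim=[8, 8], periodic=True)
# final = quasirandom_balance(G, {v: 0 for v in G} | {list(G)[0]: 640}, rounds=50)
```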
Fast learning rates in statistical inference through aggregation
We develop minimax optimal risk bounds for the general learning task
consisting in predicting as well as the best function in a reference set
up to the smallest possible additive term, called the convergence rate.
When the reference set is finite and when n denotes the size of the
training data, we provide minimax convergence rates of the form
C ((log d)/n)^v, where d is the cardinality of the reference set, with tight
evaluation of the positive constant C and with exact exponent v, the latter
value depending on the convexity of the loss function and on the level of
noise in the output distribution. The risk upper bounds are based on a
sequential randomized algorithm, which at each step concentrates on functions
having both low risk and low variance with respect to the previous step's
prediction function. Our analysis puts forward the links between the
probabilistic and worst-case viewpoints, and allows us to obtain risk bounds
unachievable with the standard statistical learning approach. One of the key
ideas of this work is to use probabilistic inequalities with respect to
appropriate (Gibbs) distributions on the prediction function space instead of
using them with respect to the distribution generating the data. The risk
lower bounds are based on refinements of the Assouad lemma that take the
properties of the loss function into particular account. Our key example to
illustrate the upper and lower bounds is the L_q-regression setting, for which
an exhaustive analysis of the convergence rates is given as q ranges in
[1, +infinity).
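As an illustration of the Gibbs-distribution viewpoint (a sketch under assumptions, not the paper's exact sequential procedure), the snippet below aggregates a finite reference set by exponentially reweighting each function according to its cumulative loss, so that the mixture concentrates on low-risk functions; the learning rate eta and the squared loss are illustrative choices.

```python
import numpy as np

def gibbs_aggregate(predictions, y, eta=1.0):
    """Sequential aggregation over a finite reference set (illustrative sketch).

    predictions: array of shape (d, n) with the d reference functions'
    predictions on n sequentially observed examples; y: array of shape (n,).
    Returns the n aggregated predictions made before seeing each output.
    """
    predictions, y = np.asarray(predictions, float), np.asarray(y, float)
    d, n = predictions.shape
    cum_loss = np.zeros(d)
    out = np.zeros(n)
    for t in range(n):
        weights = np.exp(-eta * (cum_loss - cum_loss.min()))   # Gibbs weights (stabilized)
        weights /= weights.sum()
        out[t] = weights @ predictions[:, t]                   # predict with the mixture
        cum_loss += (predictions[:, t] - y[t]) ** 2            # update cumulative losses
    return out
```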
An Emulator for the Lyman-alpha Forest
We present methods for interpolating between the 1-D flux power spectrum of
the Lyman-alpha forest, as output by cosmological hydrodynamic simulations.
Interpolation is necessary for cosmological parameter estimation due to the
limited number of simulations possible. We construct an emulator for the
Lyman-alpha forest flux power spectrum from small simulations using
Latin hypercube sampling and Gaussian process interpolation. We show that this
emulator has a typical accuracy of 1.5% and a worst-case accuracy of 4%, which
compares well to the current statistical error of 3 - 5% from BOSS
DR9. We compare to the previous state of the art, quadratic polynomial
interpolation. The Latin hypercube samples the entire volume of parameter
space, while quadratic polynomial emulation samples only lower-dimensional
subspaces. The Gaussian process provides an estimate of the emulation error and
we show using test simulations that this estimate is reasonable. We construct a
likelihood function and use it to show that the posterior constraints generated
using the emulator are unbiased. We show that our Gaussian process emulator has
lower emulation error than quadratic polynomial interpolation and thus produces
tighter posterior confidence intervals, which will be essential for future
Lyman-alpha surveys such as DESI.
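A minimal sketch of the emulation strategy described above, under assumptions: draw training points with a Latin hypercube design, run a hypothetical simulator `run_simulation` at those points, and fit a Gaussian process whose predictive standard deviation serves as the emulation error estimate. The function name, parameter bounds, and kernel are illustrative, not the authors' pipeline.

```python
import numpy as np
from scipy.stats import qmc
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

def build_emulator(run_simulation, bounds, n_samples=30, seed=0):
    """bounds: array of shape (n_params, 2) with lower/upper limits per parameter."""
    bounds = np.asarray(bounds, dtype=float)
    sampler = qmc.LatinHypercube(d=bounds.shape[0], seed=seed)
    unit = sampler.random(n_samples)                          # design points in [0, 1]^d
    params = qmc.scale(unit, bounds[:, 0], bounds[:, 1])      # map to the parameter volume
    spectra = np.array([run_simulation(p) for p in params])   # e.g. binned flux power spectra
    kernel = ConstantKernel() * RBF(length_scale=np.ones(bounds.shape[0]))
    return GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(params, spectra)

# usage: mean prediction and an internal error estimate at a new parameter point
# gp = build_emulator(run_simulation, bounds)
# mean, std = gp.predict(theta.reshape(1, -1), return_std=True)
```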
Efficient Localization of Discontinuities in Complex Computational Simulations
Surrogate models for computational simulations are input-output
approximations that allow computationally intensive analyses, such as
uncertainty propagation and inference, to be performed efficiently. When a
simulation output does not depend smoothly on its inputs, the error and
convergence rate of many approximation methods deteriorate substantially. This
paper details a method for efficiently localizing discontinuities in the input
parameter domain, so that the model output can be approximated as a piecewise
smooth function. The approach comprises an initialization phase, which uses
polynomial annihilation to assign function values to different regions and thus
seed an automated labeling procedure, followed by a refinement phase that
adaptively updates a kernel support vector machine representation of the
separating surface via active learning. The overall approach avoids structured
grids and exploits any available simplicity in the geometry of the separating
surface, thus reducing the number of model evaluations required to localize the
discontinuity. The method is illustrated on examples of up to eleven
dimensions, including algebraic models and ODE/PDE systems, and demonstrates
improved scaling and efficiency over other discontinuity localization
approaches.
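The refinement phase lends itself to a short sketch, simplified and under assumptions: the polynomial-annihilation initialization is replaced here by a generic `label_point` oracle that reports which side of the discontinuity a sample lies on, after which a kernel SVM is fit to the current labels and the candidate points closest to its decision boundary are queried next.

```python
import numpy as np
from sklearn.svm import SVC

def localize_discontinuity(label_point, seed_X, n_iters=10,
                           n_candidates=2000, batch=10, seed=0):
    """Active-learning refinement of a separating surface (illustrative sketch).

    label_point(x) must return +1 or -1; seed_X must contain points from both
    regions (in the paper this seeding comes from polynomial annihilation).
    Inputs are assumed to live in the unit hypercube [0, 1]^dim.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(seed_X, dtype=float)
    y = np.array([label_point(x) for x in X])
    for _ in range(n_iters):
        svm = SVC(kernel="rbf", C=10.0).fit(X, y)       # current separating surface
        cand = rng.random((n_candidates, X.shape[1]))   # unlabeled candidate points
        margin = np.abs(svm.decision_function(cand))
        pick = cand[np.argsort(margin)[:batch]]         # candidates nearest the boundary
        X = np.vstack([X, pick])                        # query the model only there
        y = np.concatenate([y, [label_point(x) for x in pick]])
    return SVC(kernel="rbf", C=10.0).fit(X, y)
```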