Experiment Databases: Creating a New Platform for Meta-Learning Research

Authors: Hendrik Blockeel, Geoffrey Holmes, Bernhard Pfahringer, Joaquin Vanschoren
Publication venue: 'University of Porto'
Publication date: 01/01/2008

Many studies in machine learning try to investigate what makes an algorithm succeed or fail on certain datasets. However, the field is still evolving relatively quickly, and new algorithms, preprocessing methods, learning tasks and evaluation procedures continue to emerge in the literature. Thus, it is impossible for a single study to cover this expanding space of learning approaches. In this paper, we propose a community-based approach for the analysis of learning algorithms, driven by sharing meta-data from previous experiments in a uniform way. We illustrate how organizing this information in a central database can create a practical public platform for any kind of exploitation of meta-knowledge, allowing effective reuse of previous experimentation and targeted analysis of the collected results.
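To make the idea of a central store of experiment meta-data concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table layout, the column names, and the helper `mean_score_by_algorithm` are assumptions made for illustration and are not taken from the paper; the rows are placeholder values, not real results.

```python
import sqlite3

# Hypothetical schema: one row per (algorithm, dataset, metric) result.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE experiment (
        id        INTEGER PRIMARY KEY,
        algorithm TEXT NOT NULL,
        dataset   TEXT NOT NULL,
        metric    TEXT NOT NULL,
        value     REAL NOT NULL
    )
    """
)

# Placeholder rows; names and numbers are made up for illustration only.
results = [
    ("algo_a", "dataset_1", "accuracy", 0.90),
    ("algo_a", "dataset_2", "accuracy", 0.70),
    ("algo_b", "dataset_1", "accuracy", 0.80),
    ("algo_b", "dataset_2", "accuracy", 0.85),
]
conn.executemany(
    "INSERT INTO experiment (algorithm, dataset, metric, value) "
    "VALUES (?, ?, ?, ?)",
    results,
)


def mean_score_by_algorithm(connection, metric):
    """Return {algorithm: mean value} for one metric across all datasets."""
    cursor = connection.execute(
        "SELECT algorithm, AVG(value) FROM experiment "
        "WHERE metric = ? GROUP BY algorithm ORDER BY algorithm",
        (metric,),
    )
    return dict(cursor.fetchall())


summary = mean_score_by_algorithm(conn, "accuracy")
```

Storing every run in one uniform table is what lets a later study ask new questions (here, a per-algorithm average) without rerunning the experiments.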

Lirias

CiteSeerX

Research Commons@Waikato

Bounding Optimality Gap in Stochastic Optimization via Bagging: Statistical Efficiency and Stability

Authors: Henry Lam, Huajie Qian
Publication date: 05/10/2018

We study a statistical method to estimate the optimal value and the optimality gap of a given solution for stochastic optimization, as an assessment of the solution quality. Our approach is based on bootstrap aggregating, or bagging, resampled sample average approximation (SAA). We show how this approach leads to valid statistical confidence bounds for non-smooth optimization. We also demonstrate its statistical efficiency and stability, which are especially desirable in limited-data situations, and compare these properties with some existing methods. We present our theory that views SAA as a kernel in an infinite-order symmetric statistic, which can be approximated via bagging. We substantiate our theoretical findings with numerical results.
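The bagging idea in this abstract can be sketched on a toy problem. The sketch below assumes the objective min_x E[(x - ξ)²], for which the SAA optimal value has a closed form. It averages that value over random subsamples and subtracts it from the estimated objective at a candidate solution. The toy objective, the function names, and the parameters `k` and `n_bags` are assumptions for illustration. The sketch gives only a point estimate and does not reproduce the paper's confidence bounds.

```python
import random
import statistics


def saa_optimal_value(sample):
    """SAA optimal value of min_x mean((x - xi)^2) over the sample.

    The minimizer is the sample mean, so the optimal value equals the
    population-style variance of the sample.
    """
    return statistics.pvariance(sample)


def bagged_saa_value(data, k, n_bags, rng):
    """Average the SAA optimal value over n_bags subsamples of size k."""
    values = [saa_optimal_value(rng.sample(data, k)) for _ in range(n_bags)]
    return statistics.fmean(values)


def estimated_gap(data, x, k, n_bags, rng):
    """Point estimate of the optimality gap of a candidate solution x."""
    objective_at_x = statistics.fmean((x - xi) ** 2 for xi in data)
    return objective_at_x - bagged_saa_value(data, k, n_bags, rng)


rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(200)]  # synthetic data

gap_good = estimated_gap(data, x=0.0, k=20, n_bags=500, rng=rng)
gap_bad = estimated_gap(data, x=2.0, k=20, n_bags=500, rng=rng)
```

Because the minimum of a sample average tends to underestimate the true optimal value, the bagged SAA value is biased downward, so the resulting gap estimate tends to err on the conservative side. A candidate far from the optimum (`x=2.0`) should show a visibly larger estimated gap than one near it (`x=0.0`).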

arXiv.org e-Print Archive

Yield Curve Predictability, Regimes, and Macroeconomic Information: A Data-Driven Approach

Authors: Francesco Audrino, Kameliya Filipova

We propose an empirical approach to determine the various economic sources driving the US yield curve. We allow the conditional dynamics of the yield at different maturities to change in reaction to past information coming from several relevant predictor variables. We consider both endogenous, yield curve factors and exogenous, macroeconomic factors as predictors in our model, letting the data themselves choose the most important variables. We find clear, different economic patterns in the local dynamics and regime specification of the yields depending on the maturity. Moreover, we present strong empirical evidence for the accuracy of the model in fitting in-sample and predicting out-of-sample the yield curve in comparison to several alternative approaches.

Keywords: Yield curve modeling and forecasting; Macroeconomic variables; Tree-structured models; Threshold regimes; GARCH; Bagging

Research Papers in Economics

A scale-space approach with wavelets to singularity estimation

Author: Jérémie Bigot
Publication venue: 'EDP Sciences'
Publication date: 01/01/2005

This paper is concerned with the problem of determining the typical features of a curve when it is observed with noise. It has been shown that one can characterize the Lipschitz singularities of a signal by following the propagation across scales of the modulus maxima of its continuous wavelet transform. A nonparametric approach, based on appropriate thresholding of the empirical wavelet coefficients, is proposed to estimate the wavelet maxima of a signal observed with noise at various scales. In order to identify the singularities of the unknown signal, we introduce a new tool, "the structural intensity", that computes the "density" of the location of the modulus maxima of a wavelet representation along various scales. This approach is shown to be an effective technique for detecting the significant singularities of a signal corrupted by noise and for removing spurious estimates. The asymptotic properties of the resulting estimators are studied and illustrated by simulations. An application to a real data set is also proposed.

CiteSeerX

EDP Sciences OAI-PMH repository (1.2.0)

Open Archive Toulouse Archive Ouverte

Numérisation de Documents Anciens Mathématiques

HAL-INSA Toulouse

Consistency of random forests

Authors: Gérard Biau, Erwan Scornet, Jean-Philippe Vert
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/08/2015

Random forests are a learning algorithm proposed by Breiman [Mach. Learn. 45 (2001) 5–32] that combines several randomized decision trees and aggregates their predictions by averaging. Despite its wide usage and outstanding practical performance, little is known about the mathematical properties of the procedure. This disparity between theory and practice originates in the difficulty of simultaneously analyzing both the randomization process and the highly data-dependent tree structure. In the present paper, we take a step forward in forest exploration by proving a consistency result for Breiman's [Mach. Learn. 45 (2001) 5–32] original algorithm in the context of additive regression models. Our analysis also sheds an interesting light on how random forests can nicely adapt to sparsity.

1. Introduction. Random forests are an ensemble learning method for classification and regression that constructs a number of randomized decision trees during the training phase and predicts by averaging the results. Since its publication in the seminal paper of Breiman (2001), the procedure has become a major data analysis tool that performs well in practice in comparison with many standard methods. What has greatly contributed to the popularity of forests is the fact that they can be applied to a wide range of prediction problems and have few parameters to tune. Aside from being simple to use, the method is generally recognized for its accuracy and its ability to deal with small sample sizes, high-dimensional feature spaces and complex data structures. The random forest methodology has been successfully involved in many practical problems, including air quality prediction (winning code of the EMC data science global hackathon in 2012, see http://www.kaggle.com/c/dsg-hackathon), chemoinformatics [Svetnik et al. (2003)], ecology [Prasad, Iverson and Liaw (2006), Cutler et al. (2007)], 3
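The "randomize, then average" principle described above can be sketched in a few lines. This is a deliberately simplified stand-in, not Breiman's algorithm: each "tree" is a depth-one stump that splits one randomly chosen feature at its median, fitted on a bootstrap resample, and the forest prediction is the plain average of the stump predictions. All function names and the synthetic data are assumptions for illustration.

```python
import random
import statistics


def fit_stump(X, y, feature):
    """Split one feature at its median; predict the mean target per side."""
    threshold = statistics.median(row[feature] for row in X)
    left = [t for row, t in zip(X, y) if row[feature] <= threshold]
    right = [t for row, t in zip(X, y) if row[feature] > threshold]
    overall = statistics.fmean(y)
    left_mean = statistics.fmean(left) if left else overall
    right_mean = statistics.fmean(right) if right else overall
    return feature, threshold, left_mean, right_mean


def fit_forest(X, y, n_trees, rng):
    """Fit n_trees stumps, each on a bootstrap sample and a random feature."""
    n = len(X)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap resample
        X_boot = [X[i] for i in idx]
        y_boot = [y[i] for i in idx]
        feature = rng.randrange(len(X[0]))  # random feature choice
        forest.append(fit_stump(X_boot, y_boot, feature))
    return forest


def predict(forest, row):
    """Aggregate the individual stump predictions by averaging."""
    return statistics.fmean(
        left if row[feature] <= threshold else right
        for feature, threshold, left, right in forest
    )


rng = random.Random(0)
# Synthetic additive model: the target depends on the first feature only.
X = [[rng.random(), rng.random()] for _ in range(100)]
y = [row[0] for row in X]

forest = fit_forest(X, y, n_trees=50, rng=rng)
low = predict(forest, [0.1, 0.5])
high = predict(forest, [0.9, 0.5])
```

In this toy setting the second feature carries no signal, so stumps that split on it predict roughly the same value on both sides and mostly wash out in the average, while stumps on the first feature push `high` above `low`.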

arXiv.org e-Print Archive

HAL-MINES ParisTech

Hal-Diderot

Classification

Author: Ian H. Witten
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2009

In classification learning, an algorithm is presented with a set of classified examples or “instances” from which it is expected to infer a way of classifying unseen instances into one of several “classes”. Instances have a set of features or “attributes” whose values define that particular instance. Numeric prediction, or “regression,” is a variant of classification learning in which the class attribute is numeric rather than categorical. Classification learning is sometimes called supervised because the method operates under supervision by being provided with the actual outcome for each of the training instances. This contrasts with data clustering (see entry Data Clustering), where the classes are not given, and with association learning (see entry Association Learning), which seeks any association, not just one that predicts the class.
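As a small illustration of the terms above, the sketch below represents each instance as a tuple of numeric attributes paired with a class label, and classifies an unseen instance by majority vote among its nearest training instances. Nearest-neighbour voting is chosen here only because it is short; the entry does not single out any particular method. The function name and the toy data are assumptions for illustration.

```python
import math
from collections import Counter


def nearest_neighbor_classify(training, query, k=1):
    """Classify `query` by majority vote among its k nearest instances.

    `training` is a list of (attributes, class_label) pairs, where
    attributes is a tuple of numbers of the same length as `query`.
    """
    neighbors = sorted(
        training, key=lambda instance: math.dist(instance[0], query)
    )[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Made-up training instances: two numeric attributes and a class label.
training = [
    ((1.0, 1.0), "small"),
    ((1.2, 0.8), "small"),
    ((5.0, 5.2), "large"),
    ((4.8, 5.1), "large"),
]

label_a = nearest_neighbor_classify(training, (0.9, 1.1))
label_b = nearest_neighbor_classify(training, (5.1, 5.0), k=3)
```

The learning is supervised in the sense of the entry: the class labels of the training instances are given, and the classifier only has to carry them over to unseen instances.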

Research Commons@Waikato