16,184 research outputs found

Artificial neural networks to forecast air pollution

Author: Pasero E., Mesin L.
Publication venue: SCIYO
Publication date: 01/01/2010

PORTO Publications Open Repository TOrino

Nonlinear Adaptive Filtering to Forecast Air Pollution

Author: Mesin Luca
Orione Fiammetta
Pasero Eros Gian Alessandro
Publication venue: InTech Open Access
Publication date: 01/01/2011

IntechOpen

Crossref

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino

The Degrees of Freedom of Partial Least Squares Regression

Author: Akaike H.
Brown P.
Krämer N.
Lanczos C.
Leisch F.
Masashi Sugiyama
Nicole Krämer
Publication date: 01/01/2010

The derivation of statistical properties for Partial Least Squares regression can be a challenging task. The reason is that the construction of latent components from the predictor variables also depends on the response variable. While this typically leads to good performance and interpretable models in practice, it makes the statistical analysis more involved. In this work, we study the intrinsic complexity of Partial Least Squares Regression. Our contribution is an unbiased estimate of its Degrees of Freedom. It is defined as the trace of the first derivative of the fitted values, seen as a function of the response. We establish two equivalent representations that rely on the close connection of Partial Least Squares to matrix decompositions and Krylov subspace techniques. We show that the Degrees of Freedom depend on the collinearity of the predictor variables: the lower the collinearity is, the higher the Degrees of Freedom are. In particular, they are typically higher than the naive approach that defines the Degrees of Freedom as the number of components. Further, we illustrate how the Degrees of Freedom approach can be used for the comparison of different regression methods. In the experimental section, we show that our Degrees of Freedom estimate in combination with information criteria is useful for model selection. Comment: to appear in the Journal of the American Statistical Association
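The definition quoted in this abstract (Degrees of Freedom as the trace of the derivative of the fitted values with respect to the response) can be illustrated numerically. The sketch below is not the paper's estimator: it assumes a plain NIPALS-style PLS1 fit and approximates the trace by central finite differences. The function names `pls1_fitted` and `pls_dof`, the toy data, and the step size `eps` are all hypothetical choices made for illustration. Because the fitted values include the mean of the response, the intercept contributes 1 to the trace.

```python
import math
import random


def pls1_fitted(X, y, m):
    """Fitted values of a PLS1 regression with m components (NIPALS-style sketch)."""
    n, p = len(X), len(X[0])
    y_mean = sum(y) / n
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    # Centre the predictors and the response.
    Xk = [[row[j] - col_means[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    fitted = [y_mean] * n
    for _ in range(m):
        # Weight vector: direction of maximal covariance with the response.
        w = [sum(Xk[i][j] * yc[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm == 0.0:
            break
        w = [v / norm for v in w]
        # Latent component (score vector); note that it depends on y through w.
        t = [sum(Xk[i][j] * w[j] for j in range(p)) for i in range(n)]
        tt = sum(v * v for v in t)
        q = sum(t[i] * yc[i] for i in range(n)) / tt
        load = [sum(Xk[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        # Deflate X so that the next component is orthogonal to this one.
        Xk = [[Xk[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        fitted = [fitted[i] + q * t[i] for i in range(n)]
    return fitted


def pls_dof(X, y, m, eps=1e-5):
    """Approximate trace(d fitted / d y) by central finite differences."""
    trace = 0.0
    for i in range(len(y)):
        y_plus, y_minus = list(y), list(y)
        y_plus[i] += eps
        y_minus[i] -= eps
        trace += (pls1_fitted(X, y_plus, m)[i] - pls1_fitted(X, y_minus, m)[i]) / (2 * eps)
    return trace


# Hypothetical toy data: 12 observations, 3 predictors.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(12)]
y = [row[0] - 2 * row[1] + rng.gauss(0, 0.5) for row in X]

dof_by_m = {m: pls_dof(X, y, m) for m in (1, 2, 3)}
for m, dof in dof_by_m.items():
    print(f"components={m}  finite-difference DoF={dof:.3f}  naive DoF={m + 1}")
```

With as many components as predictors, PLS1 coincides with ordinary least squares on generic data, so the value for three components should be close to 4 (three predictors plus the intercept). For fewer components the fit is nonlinear in the response, which is why the trace need not equal the naive count.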

arXiv.org e-Print Archive

CiteSeerX

Crossref

Publications Server of the Weierstrass Institute for Applied Analysis and Stochastics

Repositorium für Naturwissenschaften und Technik

Research Papers in Economics

Fitting Prediction Rule Ensembles with R Package pre

Author: Fokkema Marjolein
Publication venue: 'Foundation for Open Access Statistics'
Publication date: 01/02/2020

Prediction rule ensembles (PREs) are sparse collections of rules, offering highly interpretable regression and classification models. This paper presents the R package pre, which derives PREs through the methodology of Friedman and Popescu (2008). The implementation and functionality of package pre are described and illustrated through application on a dataset on the prediction of depression. Furthermore, the accuracy and sparsity of PREs are compared with those of single trees, random forest and lasso regression in four benchmark datasets. Results indicate that pre derives ensembles with predictive accuracy comparable to that of random forests, while using a smaller number of variables for prediction.

arXiv.org e-Print Archive

Journal of Statistical Software

Leiden University Scholarly Publications

Neural network ensembles: Evaluation of aggregation algorithms

Author: Ceccatto H. A.
Granitto P. M.
Verdes P. F.
Publication date: 01/02/2005

Ensembles of artificial neural networks show improved generalization capabilities that outperform those of single networks. However, for aggregation to be effective, the individual networks must be as accurate and diverse as possible. An important problem is, then, how to tune the aggregate members in order to have an optimal compromise between these two conflicting conditions. We present here an extensive evaluation of several algorithms for ensemble construction, including new proposals and comparing them with standard methods in the literature. We also discuss a potential problem with sequential aggregation algorithms: the infrequent but damaging selection, through their heuristics, of particularly bad ensemble members. We introduce modified algorithms that cope with this problem by allowing individual weighting of aggregate members. Our algorithms and their weighted modifications are favorably tested against other methods in the literature, producing a sensible improvement in performance on most of the standard statistical databases used as benchmarks. Comment: 35 pages, 2 figures, In press AI Journal

arXiv.org e-Print Archive

CiteSeerX

Elsevier - Publisher Connector

Random Forests: some methodological insights

Author: Genuer Robin
Poggi Jean-Michel
Tuleau Christine
Publication date: 01/01/2008

This paper examines, from an experimental perspective, random forests, the increasingly used statistical method for classification and regression problems introduced by Leo Breiman in 2001. It first aims to confirm known but sparse advice for using random forests and to propose some complementary remarks for both standard problems and high dimensional ones, for which the number of variables hugely exceeds the sample size. But the main contribution of this paper is twofold: to provide some insights about the behavior of the variable importance index based on random forests and, in addition, to investigate two classical issues of variable selection. The first one is to find important variables for interpretation; the second one is more restrictive and tries to design a good prediction model. The strategy involves a ranking of explanatory variables using the random forests score of importance and a stepwise ascending variable introduction strategy.
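The two-step strategy named at the end of this abstract (rank variables by importance, then introduce them one at a time in that order) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the ranking is taken as given rather than computed from a random forest, and the function `ascending_selection`, the acceptance threshold `min_gain`, the variable names, and the toy error function `toy_error` are all hypothetical.

```python
from typing import Callable, List, Sequence


def ascending_selection(
    ranked: Sequence[str],
    error_of: Callable[[List[str]], float],
    min_gain: float = 0.0,
) -> List[str]:
    """Walk through variables in decreasing order of importance and keep a
    variable only if adding it lowers the error by more than min_gain."""
    selected: List[str] = []
    best_err = float("inf")
    for var in ranked:
        candidate = selected + [var]
        err = error_of(candidate)
        if best_err - err > min_gain:
            selected, best_err = candidate, err
    return selected


# Hypothetical stand-in for an error estimate (for example an out-of-bag
# error); each variable removes a fixed share of a baseline error of 1.0.
USEFULNESS = {"x1": 0.5, "x2": 0.3, "x3": 0.0, "x4": 0.01}


def toy_error(variables: List[str]) -> float:
    return 1.0 - sum(USEFULNESS[v] for v in variables)


# Assumed importance ranking, most important first.
ranking = ["x1", "x2", "x3", "x4"]
chosen = ascending_selection(ranking, toy_error, min_gain=0.05)
print(chosen)  # ['x1', 'x2']
```

In practice `error_of` would refit a model on the candidate subset and return an estimated prediction error; the threshold controls how parsimonious the final set is.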

arXiv.org e-Print Archive

HAL-UNICE

INRIA a CCSD electronic archive server

Variable Selection for Nonparametric Gaussian Process Priors: Models and Computational Strategies

Author: Savitsky Terrance
Sha Naijun
Vannucci Marina
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/01/2011

This paper presents a unified treatment of Gaussian process models that extends to data from the exponential dispersion family and to survival data. Our specific interest is in the analysis of data sets with predictors that have an a priori unknown form of possibly nonlinear associations to the response. The modeling approach we describe incorporates Gaussian processes in a generalized linear model framework to obtain a class of nonparametric regression models where the covariance matrix depends on the predictors. We consider, in particular, continuous, categorical and count responses. We also look into models that account for survival outcomes. We explore alternative covariance formulations for the Gaussian process prior and demonstrate the flexibility of the construction. Next, we focus on the important problem of selecting variables from the set of possible predictors and describe a general framework that employs mixture priors. We compare alternative MCMC strategies for posterior inference and achieve a computationally efficient and practical approach. We demonstrate performances on simulated and benchmark data sets. Comment: Published in at http://dx.doi.org/10.1214/11-STS354 the Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org)

arXiv.org e-Print Archive

CiteSeerX

Crossref