A Smoothed-Distribution Form of Nadaraya-Watson Estimation
Given observation pairs (x_i, y_i), i = 1, ..., n, taken to be independent observations of the random pair (X, Y), we sometimes want to form a nonparametric estimate of m(x) = E(Y | X = x). Let Y_E have the empirical distribution of the y_i, and let (X_S, Y_S) have the kernel-smoothed distribution of the (x_i, y_i). The standard estimator, the Nadaraya-Watson form m_NW(x), can then be interpreted as E(Y_E | X_S = x). The smoothed-distribution estimator m_S(x) = E(Y_S | X_S = x) is a more general form than m_NW(x) and often has better properties. Similar considerations apply to estimating Var(Y | X = x), and to local polynomial estimation. The discussion generalizes to vector (x_i, y_i).
Keywords: nonparametric regression, Nadaraya-Watson, kernel density, conditional expectation estimator, conditional variance estimator, local polynomial estimator
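Since the abstract takes the Nadaraya-Watson form as its reference point, a minimal sketch of m_NW(x) may help; the Gaussian kernel, the fixed bandwidth, and the toy data are illustrative assumptions, not part of the paper:

```python
import numpy as np

def nadaraya_watson(x0, x, y, h):
    """Nadaraya-Watson estimate of m(x0) = E(Y | X = x0).

    A weighted average of the y_i, with weights from a Gaussian kernel
    of bandwidth h centred at x0 (both illustrative choices).
    """
    w = np.exp(-0.5 * ((x0 - x) / h) ** 2)  # kernel weights K_h(x0 - x_i)
    return np.sum(w * y) / np.sum(w)

# toy usage: for noiseless linear data on a symmetric design,
# the interior estimate is exact up to floating-point error
x = np.linspace(0.0, 1.0, 200)
y = 2.0 * x
print(nadaraya_watson(0.5, x, y, 0.05))
```

The smoothed-distribution variant m_S(x) changes the distribution assigned to the y_i, not the conditioning mechanics, so the same weighted-average skeleton applies.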
Boosting in kernel regression
In this paper, we investigate the theoretical and empirical properties of
boosting with kernel regression estimates as weak learners. We show that
each step of boosting reduces the bias of the estimate by two orders of
magnitude, while it does not deteriorate the order of the variance. We
illustrate the theoretical findings by some simulated examples. Also, we
demonstrate that boosting is superior to the use of higher-order kernels,
which is a well-known method of reducing the bias of the kernel estimate.
Comment: Published at http://dx.doi.org/10.3150/08-BEJ160 in Bernoulli (http://isi.cbs.nl/bernoulli/) by the International Statistical Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm).
Nonparametric estimation of diffusion process: a closer look
A Monte Carlo simulation is performed to investigate the finite-sample properties of a nonparametric estimator based on discretely sampled observations of a continuous-time Ito diffusion process. Chapman and Pearson (2000) study the finite-sample properties of the nonparametric estimators of Aït-Sahalia (1996) and Stanton (1997), and they find that nonlinearity of the short-rate drift is not a robust stylized fact but an artifact of the estimation procedure. This paper examines the finite-sample properties of a different nonparametric estimator within Stanton's (1997) framework.
ewp-mac/050417
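Stanton's (1997) first-order drift estimator referred to here amounts to a Nadaraya-Watson regression of the increments on the lagged levels. A minimal sketch, assuming a Gaussian kernel and a simulated Ornstein-Uhlenbeck process as toy data (both illustrative choices, not the paper's exact design):

```python
import numpy as np

def drift_nw(x0, X, dt, h):
    """First-order nonparametric drift estimate in the spirit of Stanton (1997):
    mu(x0) ~ E[X_{t+dt} - X_t | X_t = x0] / dt, via Nadaraya-Watson regression
    of the increments on the lagged levels (Gaussian kernel, bandwidth h)."""
    lev, inc = X[:-1], np.diff(X)
    w = np.exp(-0.5 * ((x0 - lev) / h) ** 2)
    return (w @ inc) / (w.sum() * dt)

# toy data: Ornstein-Uhlenbeck dX = -theta*X dt + sigma dW,
# whose true drift at x0 is -theta*x0
rng = np.random.default_rng(1)
theta, sigma, dt = 1.0, 0.1, 0.01
X = np.zeros(200_000)
eps = sigma * np.sqrt(dt) * rng.standard_normal(X.size)
for t in range(1, X.size):
    X[t] = X[t - 1] * (1.0 - theta * dt) + eps[t]

print(drift_nw(0.1, X, dt, 0.02))  # close to -theta * 0.1 for large samples
```

The finite-sample issues the abstract discusses show up exactly here: in short samples the stationary density places few observations in the tails, so the estimated drift can look spuriously nonlinear there.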
On boosting kernel regression
In this paper we propose a simple multistep regression smoother, constructed iteratively by applying L2 boosting to the Nadaraya-Watson estimator. We find, in both theoretical analysis and simulation experiments, that the bias converges exponentially fast while the variance diverges exponentially slowly. The first boosting step is analysed in more detail, giving asymptotic expressions as functions of the smoothing parameter, and relationships with previous work are explored. Practical performance is illustrated by both simulated and real data.
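The multistep scheme, read as repeatedly refitting the Nadaraya-Watson smoother to the current residuals, can be sketched as follows; the Gaussian kernel, the bandwidth, and the test function are illustrative assumptions, not the paper's setup:

```python
import numpy as np

def nw(x_eval, x, y, h):
    """Nadaraya-Watson smoother evaluated at x_eval (Gaussian kernel)."""
    w = np.exp(-0.5 * ((x_eval[:, None] - x[None, :]) / h) ** 2)
    return (w @ y) / w.sum(axis=1)

def l2_boost_nw(x, y, h, n_steps):
    """L2 boosting: at each step, fit the NW smoother to the current
    residuals and add that fit to the running estimate."""
    fit = np.zeros_like(y, dtype=float)
    for _ in range(n_steps):
        fit += nw(x, x, y - fit, h)  # weak learner fitted to residuals
    return fit

# bias reduction in action: a deliberately oversmoothed NW fit (large h)
# gets closer to the target after a few boosting steps
x = np.linspace(0.0, 1.0, 100)
y = np.sin(2 * np.pi * x)
err0 = np.max(np.abs(nw(x, x, y, 0.2) - y))            # one NW pass
err5 = np.max(np.abs(l2_boost_nw(x, y, 0.2, 5) - y))   # five boosting steps
print(err0, err5)
```

Writing the smoother as a matrix S, the m-step boosting estimate is (I - (I - S)^m) y, which is where the exponential bias decay in the abstract comes from.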
Nonparametric Density and Regression Estimation for Samples of Very Large Size
Programa Oficial de Doutoramento en Estatística e Investigación Operativa (555V01).
[Abstract]
This dissertation mainly deals with the problem of bandwidth selection in the context of nonparametric density and regression estimation for samples of very large size. Some bandwidth selection methods have the disadvantage of high computational complexity. This implies that the number of operations required to compute the bandwidth grows very rapidly as the sample size increases, so that the computational cost associated with these algorithms makes them unsuitable for samples of very large size. In the present thesis, this problem is addressed through the use of subagging, an ensemble method that combines bootstrap aggregating, or bagging, with the use of subsampling. The latter reduces the computational cost associated with the process of bandwidth selection, while the former is aimed at achieving significant reductions in the variability of the bandwidth selector. Thus, subagging versions are proposed for bandwidth selection methods based on widely known criteria such as cross-validation or the bootstrap. When applying subagging to the cross-validation bandwidth selector, both for the Parzen-Rosenblatt estimator and the Nadaraya-Watson estimator, the proposed selectors are studied and their asymptotic properties derived. The empirical behavior of all the proposed bandwidth selectors is shown
through various simulation studies and applications to real datasets.

This research has been supported by MINECO Grant MTM2017-82724-R, and by the
Xunta de Galicia (Grupos de Referencia Competitiva ED431C-2016-015, ED431C-
2020-14, Centro Singular de Investigación de Galicia ED431G/01 and Centro de
Investigación del Sistema Universitario de Galicia ED431G 2019/01), all of them
through the ERDF (European Regional Development Fund). Additionally, this work
has been partially carried out during a visit to the Texas A&M University, College
Station, financed by INDITEX, with reference INDITEX-UDC 2019.
The author is grateful to the Centro de Coordinación de Alertas y Emergencias
Sanitarias for kindly providing the COVID-19 hospitalization dataset.
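The subagging idea the abstract describes can be sketched as follows. The grid search, the Gaussian kernel, and the (r/n)^(1/5) rescaling of each subsample bandwidth to the full sample size (based on the usual n^(-1/5) optimal-bandwidth rate) are illustrative assumptions, not the thesis's exact construction:

```python
import numpy as np

def cv_bandwidth(x, y, grid):
    """Least-squares leave-one-out cross-validation bandwidth for the
    Nadaraya-Watson estimator: pick h on the grid minimising LOO error."""
    best_h, best_err = grid[0], np.inf
    for h in grid:
        w = np.exp(-0.5 * ((x[:, None] - x[None, :]) / h) ** 2)
        np.fill_diagonal(w, 0.0)  # leave each point out of its own fit
        fit = (w @ y) / w.sum(axis=1)
        err = np.mean((y - fit) ** 2)
        if err < best_err:
            best_h, best_err = h, err
    return best_h

def subagged_cv_bandwidth(x, y, grid, n_sub, r, seed=0):
    """Subagging sketch: run CV on n_sub random subsamples of size r,
    rescale each selected bandwidth to the full sample size, and average.
    Averaging reduces selector variability; subsampling cuts the cost of
    the O(r^2)-per-bandwidth CV criterion relative to O(n^2)."""
    rng = np.random.default_rng(seed)
    n = x.size
    hs = []
    for _ in range(n_sub):
        idx = rng.choice(n, size=r, replace=False)
        hs.append(cv_bandwidth(x[idx], y[idx], grid) * (r / n) ** 0.2)
    return float(np.mean(hs))

# toy usage on noisy sine data
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 400)
y = np.sin(2 * np.pi * x) + 0.2 * rng.standard_normal(400)
grid = np.linspace(0.01, 0.2, 20)
h = subagged_cv_bandwidth(x, y, grid, n_sub=10, r=100)
print(h)
```

Each CV run touches only an r-by-r weight matrix, which is the source of the computational savings for very large n.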
A new Kernel Regression approach for Robustified Boosting
We investigate boosting in the context of kernel regression. Kernel
smoothers, in general, lack appealing traits like symmetry and positive
definiteness, which are critical not only for understanding theoretical aspects
but also for achieving good practical performance. We consider a
projection-based smoother (Huang and Chen, 2008) that is symmetric, positive
definite, and shrinking. Theoretical results based on the orthonormal
decomposition of the smoother reveal additional insights into the boosting
algorithm. In our asymptotic framework, we may replace the full-rank smoother
with a low-rank approximation. We demonstrate that the smoother's low rank
is bounded above by a quantity determined by the bandwidth. Our
numerical findings show that, in terms of prediction accuracy, low-rank
smoothers may outperform full-rank smoothers. Furthermore, we show that the
boosting estimator with low-rank smoother achieves the optimal convergence
rate. Finally, to improve the performance of the boosting algorithm in the
presence of outliers, we propose a novel robustified boosting algorithm which
can be used with any smoother discussed in the study. We investigate the
numerical performance of the proposed approaches using simulations and a
real-world case study.
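One way to robustify a boosting iteration against outliers is to clip large residuals before each refit, in the spirit of a Huber psi-function. The sketch below is illustrative only: it uses a plain Nadaraya-Watson weak learner rather than the projection-based smoother of Huang and Chen (2008) that the paper builds on, and it is not the paper's algorithm.

```python
import numpy as np

def nw(x_eval, x, y, h):
    """Nadaraya-Watson smoother (Gaussian kernel), the weak learner here."""
    w = np.exp(-0.5 * ((x_eval[:, None] - x[None, :]) / h) ** 2)
    return (w @ y) / w.sum(axis=1)

def robust_boost(x, y, h, n_steps, delta):
    """Boosting on clipped residuals: residuals larger than delta in
    magnitude are truncated before each refit, limiting the influence
    a single outlier can exert on the ensemble fit."""
    fit = np.zeros_like(y, dtype=float)
    for _ in range(n_steps):
        resid = np.clip(y - fit, -delta, delta)  # Huber-type truncation
        fit += nw(x, x, resid, h)
    return fit

# toy usage: one gross outlier; the clipped version is pulled far less
x = np.linspace(0.0, 1.0, 50)
y = np.zeros(50)
y[25] = 10.0
r_fit = robust_boost(x, y, 0.05, 10, 0.5)    # clipped residuals
p_fit = robust_boost(x, y, 0.05, 10, 1e9)    # effectively unclipped L2 boosting
print(np.max(np.abs(r_fit)), np.max(np.abs(p_fit)))
```

With an effectively infinite delta the loop reduces to ordinary L2 boosting, so the two calls isolate the effect of the truncation alone.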