University of Rochester School of Medicine and Dentistry
Abstract
Thesis (Ph.D.)--University of Rochester. School of Medicine & Dentistry. Dept. of Biostatistics & Computational Biology, 2017.
Nonparametrically estimating a regression function with varying degrees of smoothness
or heteroscedasticity can benefit from a smoother that uses a data-adaptive
smoothing parameter function to efficiently capture the local features. Leave-one-out
cross-validation (LOO CV) has been used to select global smoothing parameters, as
it is expected to estimate the true mean integrated squared error (MISE), but it
often leads to undersmoothing in cases with sharp changes in smoothness and heteroscedasticity.
Oracle simulations show that simply moving from a globally chosen
to a locally chosen smoothing parameter yields a reduction in MISE. We explore
LOO CV as a method of estimating the mean squared error as a function of the point
of estimation, MSE(x), in order to estimate a smoothing parameter function. We
identify a relationship between the Squared Leave-One-Out cross-validated Residuals
(SLOORs) and MSE(x) for general linear smoothers. We use this identity to estimate
MSE(x) and obtain improved smoothing parameter function estimates.
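As a minimal sketch of how SLOORs can be computed for a linear smoother, the Python below uses a Gaussian-kernel local-constant smoother and the standard leave-one-out shortcut, dividing each ordinary residual by one minus the corresponding diagonal entry of the smoother matrix. The smoother choice, the function names, the bandwidth, and the simulated data are assumptions made for illustration; the SLOOR-MSE identity itself is not stated in this abstract and is not reproduced here.

```python
import numpy as np

def nw_smoother_matrix(x, bandwidth):
    """Smoother matrix S of a Gaussian-kernel local-constant fit (fitted = S @ y)."""
    diffs = (x[:, None] - x[None, :]) / bandwidth
    weights = np.exp(-0.5 * diffs ** 2)
    # Normalize each row so the weights used at every x[i] sum to one.
    return weights / weights.sum(axis=1, keepdims=True)

def sloor(x, y, bandwidth):
    """Squared leave-one-out residuals via the shortcut (y_i - fit_i) / (1 - S_ii)."""
    S = nw_smoother_matrix(x, bandwidth)
    residuals = y - S @ y
    loo_residuals = residuals / (1.0 - np.diag(S))
    return loo_residuals ** 2

def sloor_brute_force(x, y, bandwidth):
    """Same quantity, obtained by actually refitting without each observation."""
    n = len(x)
    out = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        w = np.exp(-0.5 * ((x[i] - x[keep]) / bandwidth) ** 2)
        out[i] = (y[i] - np.sum(w * y[keep]) / np.sum(w)) ** 2
    return out

# Hypothetical heteroscedastic data, for illustration only.
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 60))
y = np.sin(6 * x) + rng.normal(scale=0.2 + 0.3 * x, size=60)

fast = sloor(x, y, bandwidth=0.1)
slow = sloor_brute_force(x, y, bandwidth=0.1)
```

The brute-force version is included only as a check that the shortcut agrees with explicit refitting for this smoother; averaging the SLOORs gives the usual global LOO CV score.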
This proposal presents a portfolio of smoothers based on local polynomials and
natural cubic smoothing splines that estimate and use a data-adaptive smoothing parameter
function by employing Local Cross-Validation (LCV). Data are locally weighted
by a proposed truncated Gaussian kernel function with sample-size-adaptive truncation
thresholds. The proposed Local Cross-Validated Polynomial smoothing algorithm
(LCVPoly) estimates and uses an adaptive bandwidth function for any specified
polynomial order. LCVPoly can further select the preferred global polynomial
order, and adaptive orders are explored to permit greater flexibility. The relationship
of the variance function estimation problem to the mean function estimation problem
is evident in the SLOOR-MSE identity. These methods only require specification of
bandwidth bounds and polynomial orders. Available methods intended to handle underlying
functions of varying smoothness are reviewed as competitors to our proposed
algorithms.
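The local weighting step described above can be sketched as a weighted least-squares polynomial fit at a single point. This is an illustrative sketch only, not LCVPoly: the fixed cutoff of three bandwidths stands in for the sample-size-adaptive truncation thresholds, whose form is not given in this abstract, and the function names and parameters are hypothetical.

```python
import numpy as np

def truncated_gaussian_weights(x, x0, bandwidth, cutoff=3.0):
    """Gaussian kernel weights, set to zero beyond `cutoff` bandwidths from x0.

    `cutoff` is a hypothetical fixed threshold used only for illustration.
    """
    u = (x - x0) / bandwidth
    w = np.exp(-0.5 * u ** 2)
    w[np.abs(u) > cutoff] = 0.0
    return w

def local_poly_fit(x, y, x0, bandwidth, order=1, cutoff=3.0):
    """Estimate the regression function at x0 by a local polynomial fit."""
    w = truncated_gaussian_weights(x, x0, bandwidth, cutoff)
    # Columns are (x - x0)^0, ..., (x - x0)^order, so the intercept is the fit at x0.
    X = np.vander(x - x0, N=order + 1, increasing=True)
    sw = np.sqrt(w)
    # Weighted least squares, solved as ordinary least squares on rescaled rows.
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta[0]

# Usage on a hypothetical noiseless line: a local linear fit recovers it.
x_demo = np.linspace(0, 1, 50)
y_demo = 2 + 3 * x_demo
estimate = local_poly_fit(x_demo, y_demo, x0=0.4, bandwidth=0.1, order=1)
```

An adaptive bandwidth function would amount to passing a different `bandwidth` at each `x0`; how that function is estimated is the subject of the proposed algorithm and is not shown here.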
While local polynomials use both bandwidth and polynomial order to control
smoothness, smoothing splines use a single smoothing parameter. Because of this, we
propose a single version of our Local Cross-Validated Spline (LCVSpline) smoothing
algorithm to estimate and use an adaptive degree-of-freedom function. As smoothing
splines are linear smoothers, the SLOOR-MSE relationship holds here as well and we
can use the result for degree-of-freedom function estimation.
Electrocardiograms (ECGs) measured over a 24-hour period are heteroscedastic
and can be very noisy, which can mask short-term cardiovascular events of interest.
This type of data can benefit from a smoother that can pick up both short-term
events and long-term changes while appropriately smoothing out the noise. Current
techniques to smooth ECG data use a moving median smoother with no guide on the
size of the moving window. We show how our proposed methods and other available
methods perform on a dataset of over 80,000 heart inter-beat intervals. In addition
to these data, we apply our methods to the well-known motorcycle acceleration
data set typically used to demonstrate spatially adaptive smoothers.
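For reference, the moving median smoother mentioned above as the current technique for ECG data can be sketched as follows. The window size is the user-chosen quantity for which no guide exists; the edge handling (shrinking the window at the ends) and the toy values are assumptions made for illustration.

```python
import numpy as np

def moving_median(values, window):
    """Centered moving median; the window is shortened at the ends of the series."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    values = np.asarray(values, dtype=float)
    half = window // 2
    out = np.empty_like(values)
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        out[i] = np.median(values[lo:hi])
    return out

# Hypothetical series with one spike; a window of 3 removes it in the interior.
smoothed = moving_median([1, 100, 2, 3, 4], window=3)
```

A single fixed window applies the same amount of smoothing everywhere, which is the limitation that a data-adaptive smoothing parameter function is meant to address.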