Search CORE

14 research outputs found

Applied Nonparametric Methods

Author: Hardle W.

Locally Weighted Polynomial Regression: Parameter Choice and Application to Forecasts of the Great Salt Lake

Author: Bosworth Ken
Lall Upmanu
Moon Young-Il
Publication venue: Hosted by Utah State University Libraries
Publication date: 01/01/1995

Relationships between hydrologic variables are often nonlinear. Usually the functional form of such a relationship is not known a priori. A multivariate, nonparametric regression methodology is provided here for approximating the underlying regression function using locally weighted polynomials. Locally weighted polynomials approximate the target function through a Taylor series expansion of the function in the neighborhood of the point of estimate. Cross-validatory procedures for selecting the size of the neighborhood over which this approximation should take place, and the order of the local polynomial to use, are provided and shown for some simple situations. The utility of this nonparametric regression approach is demonstrated through an application to nonparametric short-term forecasts of the biweekly Great Salt Lake volume. Blind forecasts up to four years in the future using the 1847-1993 time series of the Great Salt Lake are presented.
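As a rough illustration of the idea in this abstract, the sketch below fits a weighted polynomial in powers of (x - x0) to the k nearest neighbours of a point and picks k and the polynomial degree by leave-one-out cross-validation. It is a one-dimensional toy under stated assumptions, not the authors' procedure: the tricube weight function, the function names, the candidate grids and the synthetic sine data are all hypothetical choices made for the example.

```python
import math
import random


def solve_linear(a, b):
    """Solve a small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def local_poly_predict(xs, ys, x0, k, degree):
    """Locally weighted polynomial estimate at x0 from the k nearest neighbours.

    Assumes distinct x values and k > degree + 1.
    """
    nearest = sorted(zip(xs, ys), key=lambda p: abs(p[0] - x0))[:k]
    # Bandwidth: slightly beyond the farthest neighbour so its weight stays positive.
    h = max(abs(x - x0) for x, _ in nearest) * 1.0001 or 1.0
    p = degree + 1
    xtwx = [[0.0] * p for _ in range(p)]
    xtwy = [0.0] * p
    for x, y in nearest:
        u = abs(x - x0) / h
        w = (1.0 - u ** 3) ** 3                      # tricube weight (an assumed choice)
        powers = [(x - x0) ** j for j in range(p)]   # Taylor basis about x0
        for a in range(p):
            xtwy[a] += w * powers[a] * y
            for b in range(p):
                xtwx[a][b] += w * powers[a] * powers[b]
    return solve_linear(xtwx, xtwy)[0]               # intercept = fitted value at x0


def loo_cv_score(xs, ys, k, degree):
    """Mean squared leave-one-out prediction error for a given (k, degree)."""
    sq_errs = []
    for i in range(len(xs)):
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        pred = local_poly_predict(rest_x, rest_y, xs[i], k, degree)
        sq_errs.append((ys[i] - pred) ** 2)
    return sum(sq_errs) / len(sq_errs)


def select_k_and_degree(xs, ys, ks, degrees):
    """Return the (k, degree) pair with the smallest leave-one-out score, plus all scores."""
    scores = {(k, d): loo_cv_score(xs, ys, k, d) for k in ks for d in degrees}
    return min(scores, key=scores.get), scores


# Synthetic demonstration data (hypothetical, not the Great Salt Lake series).
random.seed(0)
xs = [i * 2 * math.pi / 59 for i in range(60)]
ys = [math.sin(x) + random.gauss(0, 0.1) for x in xs]
best, scores = select_k_and_degree(xs, ys, ks=[8, 15, 30], degrees=[1, 2])
print("selected (k, degree):", best)
```

Using nearest neighbours rather than a fixed-width window lets the neighbourhood adapt to uneven sampling; whether that matches the paper's exact definition of neighbourhood size is an assumption here.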

DigitalCommons@USU


Geometrically designed, variable knot regression splines: asymptotics and inference

Author: Dimitrova D. S.
Haberman S.
Kaishev V. K.
Verrall R. J.
Publication venue: Faculty of Actuarial Science & Insurance, City University London
Publication date: 01/01/2006

A new method for Computer Aided Geometric Design of least squares (LS) splines with variable knots, named GeDS, is presented. It is based on the property that the spline regression function, viewed as a parametric curve, has a control polygon and, due to the shape preserving and convex hull properties, closely follows the shape of this control polygon. The latter has vertices, whose x-coordinates are certain knot averages, known as the Greville sites and whose y-coordinates are the regression coefficients. Thus, manipulation of the position of the control polygon and hence of the spline curve may be interpreted as estimation of its knots and coefficients. These geometric ideas are implemented in the two stages of the GeDS estimation method. In stage A, a linear LS spline fit to the data is constructed, and viewed as the initial position of the control polygon of a higher order (n > 2) smooth spline curve. In stage B, the optimal set of knots of this higher order spline curve is found, so that its control polygon is as close to the initial polygon of stage A as possible and finally, the LS estimates of the regression coefficients of this curve are found. To implement stage A, an automatic adaptive knot location scheme for generating linear spline fits is developed. At each step of stage A, a knot is placed where a certain bias dominated measure is maximal. This stage is equipped with a novel stopping rule which serves as a model selector. The optimal knots defined in stage B ensure that the higher order spline curve is nearly a variation diminishing (i.e., shape preserving) spline approximation to the linear fit of stage A. Error bounds for this approximation are derived in Kaishev et al. (2006). The GeDS method produces simultaneously linear, quadratic, cubic (and possibly higher order) spline fits with one and the same number of B-spline regression functions. Large sample properties of the GeDS estimator are also explored, and asymptotic normality is established. 
Asymptotic conditions on the rate of growth of the knots with the increase of the sample size, which ensure that the bias is of negligible magnitude compared to the variance of the GeDS estimator, are given. Based on these results, pointwise asymptotic confidence intervals with GeDS are also constructed and shown to converge to the nominal coverage probability level for a reasonable number of knots and sample sizes.
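The control polygon described in this abstract can be illustrated with a small sketch. It computes the Greville sites (averages of consecutive knots) for a given knot vector and spline order, pairs them with a set of coefficients to form the polygon vertices, and evaluates the spline with the Cox-de Boor recursion. This is not the GeDS algorithm itself: the knot vector, the coefficient values and all function names are hypothetical, chosen only to show the geometry.

```python
def greville_sites(knots, order):
    """Greville sites: averages of (order - 1) consecutive knots, one per B-spline."""
    degree = order - 1
    n_coef = len(knots) - order
    return [sum(knots[i + 1:i + 1 + degree]) / degree for i in range(n_coef)]


def bspline_basis(i, order, knots, x):
    """Cox-de Boor recursion for the i-th B-spline of the given order at x."""
    if order == 1:
        return 1.0 if knots[i] <= x < knots[i + 1] else 0.0
    left = right = 0.0
    if knots[i + order - 1] > knots[i]:
        left = ((x - knots[i]) / (knots[i + order - 1] - knots[i])
                * bspline_basis(i, order - 1, knots, x))
    if knots[i + order] > knots[i + 1]:
        right = ((knots[i + order] - x) / (knots[i + order] - knots[i + 1])
                 * bspline_basis(i + 1, order - 1, knots, x))
    return left + right


def spline_value(coefs, order, knots, x):
    """Evaluate the spline with the given B-spline coefficients at x."""
    return sum(c * bspline_basis(i, order, knots, x) for i, c in enumerate(coefs))


# Hypothetical cubic spline (order 4) with one interior knot at 0.5.
knots = [0.0, 0.0, 0.0, 0.0, 0.5, 1.0, 1.0, 1.0, 1.0]
order = 4
sites = greville_sites(knots, order)
theta = [1.0, 3.0, 2.0, 4.0, 0.5]          # hypothetical regression coefficients
control_polygon = list(zip(sites, theta))  # vertices (Greville site, coefficient)
print(control_polygon)
```

For order 2 the Greville sites coincide with the knots, so the control polygon of a linear spline is the spline itself. That is consistent with the abstract's use of the stage A linear fit as the starting polygon for the higher order curve.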

City Research Online

Applied Nonparametric Methods

Author: Härdle W.K.
Publication venue: 'WARC Limited'
Publication date: 01/01/1992

Tilburg University Repository

Modelling HIV/AIDS epidemic in Nigeria

Author: Eze Jude Ikechukwu
Publication date: 01/02/2009

Nigeria is one of the countries most affected by the HIV/AIDS pandemic, third only to India and South Africa. With about 10% of the global HIV/AIDS cases estimated to be in the country, the public health and socio-economic implications are enormous. This thesis has two broad aims: the first is to develop statistical models which adequately describe the spatial distribution of the Nigerian HIV/AIDS epidemic and its associated ecological risk factors; the second, to develop models that could reconstruct the HIV incidence curve, obtain an estimate of the hidden HIV/AIDS population and a short term projection for AIDS incidence and a measure of precision of the estimates. To achieve these objectives, we first examined data from various sources and selected three sets of data based on national coverage and minimal reporting delay. The data sets are the outcome of the National HIV/AIDS Sentinel Surveillance Survey conducted in 1999, 2001, 2003 and 2005 by the Federal Ministry of Health; the outcome of the survey of 1057 health and laboratory facilities conducted by the Nigerian Institute of Medical Research in 2000; and case by case HIV screening data collected from an HIV/AIDS centre of excellence. A thorough review of methods used by WHO/UNAIDS to produce estimates of the Nigerian HIV/AIDS scenario was carried out. The Estimation and Projection Package (EPP) currently being used for modelling the epidemic partitions the population into at-risk, not-at-risk and infected sub-populations. It also requires some parameter input representing the force of infection and behaviour or high risk adjustment parameter. It may be difficult to precisely ascertain the size of these population groups and parameters in countries as large and diverse as Nigeria. Also, the accuracy of vital rates used in the EPP and Spectrum program is doubtful. Literature on ordinary back-calculation, nonparametric back-calculation, and modified back-calculation methods was reviewed in detail. 
Also, an in-depth review of disease mapping techniques including multilevel models and geostatistical methods was conducted. The existence of spatial clusters was investigated using cluster analysis and some measures of spatial autocorrelation (Moran's I and Geary's c coefficients, semivariogram and kriging) applied to the National HIV/AIDS Surveillance data. Results revealed the existence of spatial clusters with significant positive spatial autocorrelation coefficients that tended to get stronger as the epidemic developed through time. GAM and local regression fits on the data revealed spatial trends on the north-south and east-west axes. Analysis of hierarchical, spatial and ecological factor effects on the geographical variation of HIV prevalence using variance component and spatial multilevel models was performed using restricted maximum likelihood implemented in R and empirical and full Bayesian methods in WinBUGS. Results confirmed significant spatial effects and some ecological factors were significant in explaining the variation. Also, variation due to various levels of aggregation was prominent. Estimates of cumulative HIV infection in Nigeria were obtained from both parametric and nonparametric back-calculation methods. Step and spline functions were assumed for the HIV infection curve in the parametric case. Parameter estimates obtained using 3-step and 4-step models were similar but the standard errors of these parameters were higher in the 4-step model. Estimates obtained using linear, quadratic, cubic and natural splines differed and also depended on the number and positions of the knots. Cumulative HIV infection estimates obtained using the step function models were comparable with those obtained using nonparametric back-calculation methods. Estimates from nonparametric back-calculation were obtained using the EMS algorithm. 
The modified nonparametric back-calculation method makes use of HIV data instead of the AIDS incidence data that are used in parametric and ordinary nonparametric back-calculation methods. In this approach, the hazard of undergoing an HIV test is different for routine and symptom-related tests. The constant hazard of routine testing and the proportionality coefficient of symptom-related tests were estimated from the data and incorporated into the HIV induction distribution function. Estimates of HIV prevalence differ widely (about three times higher) from those obtained using parametric and ordinary nonparametric back-calculation methods. A nonparametric bootstrap procedure was used to obtain point-wise confidence intervals, and the uncertainty in estimating or predicting precisely the most recent incidence of AIDS or HIV infection was noticeable in the models but greater when AIDS data were used in the back-projection model. Analysis of case by case HIV screening data indicates that of 33349 patients who attended the HIV laboratory of a centre of excellence for the treatment of HIV/AIDS between October 2000 and August 2006, 7646 (23%) were HIV positive, with females constituting about 61% of the positive cases. The bulk of infection was found in patients aged 15-49 years: about 86 percent of infected females and 78 percent of males were in this age group. Attendance at the laboratory and the proportion of HIV positive tests witnessed a remarkable increase when screening became free of charge. Logistic regression analysis indicated a 3-way interaction between time period, age and sex. Removing the effect of time by stratifying by time period left 2-way interactions between age and sex. A correction factor for underreporting was ascertained by studying attendance at the laboratory facility over two time periods defined by the cost of HIV screening. 
Estimates of HIV prevalence obtained from corrected data using the modified nonparametric back-calculation are comparable with UN estimates obtained by a different method. The Nigerian HIV/AIDS pandemic is made up of multiple epidemics spatially located in different parts of the country with most of them having the potential of being sustained into the future given information on some risk factors. It is hoped that the findings of this research will be a ready tool in the hands of policy makers in the formulation of policy and design of programs to combat the epidemic in the country. Access to data on HIV/AIDS is highly restricted in the country and this hampers more in-depth modelling of the epidemic. Subject to data availability, we recommend that further work be done on the construction of stratification models based on sex, age and the geopolitical zones in order to estimate the infection intensity in each of the population groups. Uncertainties surrounding assumptions of infection intensity and incubation distribution can be minimized using Bayesian methods in back-projection.
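One of the spatial autocorrelation measures named in this abstract, Moran's I, has a compact closed form: I = (n / S0) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2, where S0 is the sum of all spatial weights. The sketch below evaluates it on a toy chain of four sites. The binary neighbour weights, the helper names and the values are hypothetical and are not taken from the thesis data.

```python
def morans_i(values, weights):
    """Moran's I for a list of values and a square matrix of spatial weights."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)            # total weight
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(d * d for d in dev)


def chain_weights(n):
    """Binary adjacency weights for n sites on a line (an assumed toy layout)."""
    return [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]


w = chain_weights(4)
smooth = morans_i([1.0, 2.0, 3.0, 4.0], w)       # neighbouring sites have similar values
alternating = morans_i([1.0, 4.0, 1.0, 4.0], w)  # neighbouring sites have opposite values
print(smooth, alternating)
```

Positive values indicate that neighbouring sites tend to have similar values, which is the pattern the abstract reports for the surveillance data; negative values indicate the opposite.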

Glasgow Theses Service

Pattern discovery in adverse event data

Author: Zhang Zhicheng
Publication date: 01/01/2007

Imperial Users only

Spiral - Imperial College Digital Repository

The analysis of very small samples of repeated measurements

Author: Skene Simon Scott
Publication date: 01/01/2008

The statistical analysis of repeated measures or longitudinal data always requires the accommodation of the covariance structure of the repeated measurements at some stage in the analysis. The general linear mixed model is often used for such analyses, and allows for the specification of both a mean model and a covariance structure. Often the covariance structure itself is not of direct interest, but only a means to producing valid inferences about the response. This thesis considers methods for the analysis of repeated measurements which arise from very small samples. In Part 1, existing methods of analysis are shown to be inadequate for very small samples. More precisely, statistical measures of goodness of fit are not necessarily the right measure of the appropriateness of a covariance structure and inferences based on conventional Wald type procedures (with small sample adjustments) do not approximate sufficiently well their nominal properties when data are unbalanced or incomplete. In Part 2, adaptive-estimation techniques are considered for the sample covariance matrix which smooth between unstructured and structured forms; 'direct' smoothing, a weighted average of the unstructured and structured estimates, and an estimate chosen via penalised likelihood. Whilst attractive in principle, these approaches are shown to have little success in practice, being critically dependent on the 'correct' choice of smoothing structure. Part 3 considers methods which are less dependent on the covariance structure. A generalisation of a small sample adjustment to the empirical sandwich estimator is developed which accounts for its inherent bias and increased variance. This has nominal properties but lacks power. Also, a modification to Box's correction, an ANOVA F-statistic which accounts for departures from independence, is given which has both nominal properties and acceptable power. 
Finally, Part 4 recommends the adoption of the modified Box statistic for repeated measurements data where the sample size is very small.

EThOS - Electronic Theses Online Service, United Kingdom

OpenGrey Repository

Empirical Bayes block shrinkage for wavelet regression

Author: Wang Xue

There has been great interest in recent years in the development of wavelet methods for estimating an unknown function observed in the presence of noise, following the pioneering work of Donoho and Johnstone (1994, 1995) and Donoho et al. (1995). In this thesis, a novel empirical Bayes block (EBB) shrinkage procedure is proposed and the performance of this approach with both independent identically distributed (IID) noise and correlated noise is thoroughly explored. The first part of this thesis develops a Bayesian methodology involving the non-central χ² distribution to simultaneously shrink wavelet coefficients in a block, based on the block sum of squares. A useful (and to the best of our knowledge, new) identity satisfied by the non-central χ² density is exploited. This identity leads to tractable posterior calculations for suitable families of prior distributions. Also, the families of prior distribution we work with are sufficiently flexible to represent various forms of prior knowledge. Furthermore, an efficient method for finding the hyperparameters is implemented and simulations show that this method has a high degree of computational advantage. The second part relaxes the assumption of IID noise considered in the first part of this thesis. A semi-parametric model including a parametric component and a nonparametric component is presented to deal with correlated noise situations. In the parametric component, attention is paid to the covariance structure of the noise. Two distinct parametric methods (maximum likelihood estimation and time series model identification techniques) for estimating the parameters in the covariance matrix are investigated. Both methods have been successfully implemented and are believed to be new additions to smoothing methods.
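To make the phrase "shrink wavelet coefficients in a block, based on the block sum of squares" concrete, the sketch below applies a simple James-Stein-type block rule: every coefficient in a block is multiplied by max(0, 1 - lam * L * sigma^2 / S), where L is the block length and S the block sum of squares. This is a swapped-in, much simpler rule than the empirical Bayes procedure of the thesis, which is not reproduced here. The function name, the multiplier lam, the assumption of a known noise level sigma and the example coefficients are all hypothetical.

```python
def block_shrink(coefs, block_len, sigma, lam=1.0):
    """Shrink coefficients block by block using the block sum of squares.

    lam is an assumed tuning multiplier and sigma an assumed known noise level.
    """
    out = []
    for start in range(0, len(coefs), block_len):
        block = coefs[start:start + block_len]
        energy = sum(c * c for c in block)           # block sum of squares
        if energy > 0:
            factor = max(0.0, 1.0 - lam * len(block) * sigma ** 2 / energy)
        else:
            factor = 0.0
        out.extend(factor * c for c in block)        # same factor for the whole block
    return out


# Hypothetical detail coefficients: one block carrying signal, one mostly noise.
shrunk = block_shrink([3.0, 4.0, 0.1, -0.2], block_len=2, sigma=1.0)
print(shrunk)
```

A block whose energy is large relative to the noise level is kept almost unchanged, while a block dominated by noise is set to zero as a whole. Pooling neighbouring coefficients in this way is the common motivation for block methods.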

Nottingham ePrints