14 research outputs found

    Applied Nonparametric Methods

    Locally Weighted Polynomial Regression: Parameter Choice and Application to Forecasts of the Great Salt Lake

    Relationships between hydrologic variables are often nonlinear. Usually the functional form of such a relationship is not known a priori. A multivariate, nonparametric regression methodology is provided here for approximating the underlying regression function using locally weighted polynomials. Locally weighted polynomials approximate the target function through a Taylor series expansion of the function in the neighborhood of the point of estimate. Cross-validation procedures for selecting the size of the neighborhood over which this approximation should take place, and the order of the local polynomial to use, are provided and illustrated for some simple situations. The utility of this nonparametric regression approach is demonstrated through an application to nonparametric short-term forecasts of the biweekly Great Salt Lake volume. Blind forecasts up to four years into the future using the 1847-1993 time series of the Great Salt Lake are presented.
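
The local fit and neighborhood selection described in this abstract can be sketched roughly as follows. This is a minimal univariate illustration, not the authors' implementation: the tricube weight function, the leave-one-out form of cross-validation, and the names `local_poly_fit`, `loocv_score` and `select_k` are all assumptions made for the example, and the paper itself treats the multivariate case.

```python
import numpy as np

def local_poly_fit(x, y, x0, k, degree=1):
    """Estimate f(x0) from the k nearest neighbours of x0 (hypothetical helper)."""
    dist = np.abs(x - x0)
    nearest = np.argsort(dist)[:k]          # indices of the k closest points
    h = dist[nearest].max()                 # neighbourhood radius
    if h == 0:
        h = 1.0                             # all neighbours coincide with x0
    u = dist[nearest] / h
    w = (1.0 - u**3) ** 3                   # tricube weights (an assumed choice)
    # Polynomial in (x - x0): the intercept is the Taylor-expansion estimate of f(x0).
    design = np.vander(x[nearest] - x0, degree + 1, increasing=True)
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(design * sw[:, None], y[nearest] * sw, rcond=None)
    return beta[0]

def loocv_score(x, y, k, degree=1):
    """Mean squared leave-one-out prediction error for a given k and degree."""
    n = len(x)
    keep = np.ones(n, dtype=bool)
    sq_errors = []
    for i in range(n):
        keep[i] = False                     # hold out point i
        pred = local_poly_fit(x[keep], y[keep], x[i], k, degree)
        sq_errors.append((y[i] - pred) ** 2)
        keep[i] = True
    return float(np.mean(sq_errors))

def select_k(x, y, candidates, degree=1):
    """Pick the neighbourhood size with the smallest leave-one-out error."""
    return min(candidates, key=lambda k: loocv_score(x, y, k, degree))
```

The same score can be compared across values of `degree` to choose the polynomial order alongside the neighborhood size.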

    Modelling HIV/AIDS epidemic in Nigeria

    Nigeria is one of the countries most affected by the HIV/AIDS pandemic, third only to India and South Africa. With about 10% of the global HIV/AIDS cases estimated to be in the country, the public health and socio-economic implications are enormous. This thesis has two broad aims: the first is to develop statistical models which adequately describe the spatial distribution of the Nigerian HIV/AIDS epidemic and its associated ecological risk factors; the second is to develop models that can reconstruct the HIV incidence curve and provide an estimate of the hidden HIV/AIDS population, a short-term projection of AIDS incidence, and a measure of the precision of the estimates. To achieve these objectives, we first examined data from various sources and selected three sets of data based on national coverage and minimal reporting delay. The data sets are the outcome of the National HIV/AIDS Sentinel Surveillance Survey conducted in 1999, 2001, 2003 and 2005 by the Federal Ministry of Health; the outcome of the survey of 1057 health and laboratory facilities conducted by the Nigerian Institute of Medical Research in 2000; and case-by-case HIV screening data collected from an HIV/AIDS centre of excellence. A thorough review of methods used by WHO/UNAIDS to produce estimates of the Nigerian HIV/AIDS scenario was carried out. The Estimation and Projection Package (EPP) currently being used for modelling the epidemic partitions the population into at-risk, not-at-risk and infected sub-populations. It also requires parameter inputs representing the force of infection and a behaviour or high-risk adjustment parameter. It may be difficult to ascertain precisely the size of these population groups and parameters in countries as large and diverse as Nigeria. Also, the accuracy of the vital rates used in the EPP and Spectrum program is doubtful. Literature on ordinary back-calculation, nonparametric back-calculation, and modified back-calculation methods was reviewed in detail. 
Also, an in-depth review of disease mapping techniques including multilevel models and geostatistical methods was conducted. The existence of spatial clusters was investigated using cluster analysis and measures of spatial autocorrelation (Moran's I and Geary's c coefficients, semivariogram and kriging) applied to the National HIV/AIDS Surveillance data. Results revealed the existence of spatial clusters with significant positive spatial autocorrelation coefficients that tended to get stronger as the epidemic developed through time. GAM and local regression fits to the data revealed spatial trends along the north-south and east-west axes. Analysis of hierarchical, spatial and ecological factor effects on the geographical variation of HIV prevalence using variance component and spatial multilevel models was performed using restricted maximum likelihood implemented in R and empirical and full Bayesian methods in WinBUGS. Results confirmed significant spatial effects, and some ecological factors were significant in explaining the variation. Also, variation due to various levels of aggregation was prominent. Estimates of cumulative HIV infection in Nigeria were obtained from both parametric and nonparametric back-calculation methods. Step and spline functions were assumed for the HIV infection curve in the parametric case. Parameter estimates obtained using 3-step and 4-step models were similar but the standard errors of these parameters were higher in the 4-step model. Estimates obtained using linear, quadratic, cubic and natural splines differed and also depended on the number and positions of the knots. Cumulative HIV infection estimates obtained using the step function models were comparable with those obtained using nonparametric back-calculation methods. Estimates from nonparametric back-calculation were obtained using the EMS algorithm. 
The modified nonparametric back-calculation method makes use of HIV data instead of the AIDS incidence data that are used in parametric and ordinary nonparametric back-calculation methods. In this approach, the hazard of undergoing an HIV test is different for routine and symptom-related tests. The constant hazard of routine testing and the proportionality coefficient of symptom-related tests were estimated from the data and incorporated into the HIV induction distribution function. The resulting estimates of HIV prevalence are about three times higher than those obtained using parametric and ordinary nonparametric back-calculation methods. A nonparametric bootstrap procedure was used to obtain point-wise confidence intervals. The uncertainty in estimating or predicting the most recent incidence of AIDS or HIV infection was noticeable in all the models but greater when AIDS data were used in the back-projection model. Analysis of case-by-case HIV screening data indicates that of 33349 patients who attended the HIV laboratory of a centre of excellence for the treatment of HIV/AIDS between October 2000 and August 2006, 7646 (23%) were HIV positive, with females constituting about 61% of the positive cases. The bulk of infection was found in patients aged 15-49 years: about 86 percent of infected females and 78 percent of infected males were in this age group. Attendance at the laboratory and the proportion of HIV positive tests increased markedly when screening became free of charge. Logistic regression analysis indicated a 3-way interaction between time period, age and sex. Removing the effect of time by stratifying by time period left 2-way interactions between age and sex. A correction factor for underreporting was ascertained by studying attendance at the laboratory facility over two time periods defined by the cost of HIV screening. 
Estimates of HIV prevalence obtained from corrected data using the modified nonparametric back-calculation are comparable with UN estimates obtained by a different method. The Nigerian HIV/AIDS pandemic is made up of multiple epidemics spatially located in different parts of the country, with most of them having the potential of being sustained into the future given information on some risk factors. It is hoped that the findings of this research will be a ready tool in the hands of policy makers in the formulation of policy and design of programs to combat the epidemic in the country. Access to data on HIV/AIDS is highly restricted in the country and this hampers more in-depth modelling of the epidemic. Subject to data availability, we recommend that further work be done on the construction of stratification models based on sex, age and the geopolitical zones in order to estimate the infection intensity in each of the population groups. Uncertainties surrounding assumptions of infection intensity and incubation distribution can be minimized using Bayesian methods in back-projection.
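
One of the spatial autocorrelation measures named in this abstract, the Moran I coefficient, can be sketched as below. This uses the standard textbook definition on a toy adjacency matrix; the function name `morans_i`, the binary neighbour weights and the example values are assumptions for illustration and are not taken from the thesis or its surveillance data.

```python
import numpy as np

def morans_i(values, weights):
    """Moran's I for site values and a spatial weight matrix (zero diagonal)."""
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    z = x - x.mean()                        # deviations from the overall mean
    cross = z @ w @ z                       # sum_ij w_ij * z_i * z_j
    return len(x) / w.sum() * cross / (z @ z)

# Hypothetical example: four sites along a line, each linked to its neighbours.
chain = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])
trend = morans_i([1, 2, 3, 4], chain)        # similar neighbours: positive I
alternating = morans_i([1, -1, 1, -1], chain)  # dissimilar neighbours: negative I
```

Positive values indicate that neighbouring sites tend to have similar prevalence, which is the pattern the abstract reports as strengthening over time.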

    The analysis of very small samples of repeated measurements

    The statistical analysis of repeated measures or longitudinal data always requires the accommodation of the covariance structure of the repeated measurements at some stage in the analysis. The general linear mixed model is often used for such analyses, and allows for the specification of both a mean model and a covariance structure. Often the covariance structure itself is not of direct interest, but only a means to producing valid inferences about the response. This thesis considers methods for the analysis of repeated measurements which arise from very small samples. In Part 1, existing methods of analysis are shown to be inadequate for very small samples. More precisely, statistical measures of goodness of fit are not necessarily the right measure of the appropriateness of a covariance structure, and inferences based on conventional Wald-type procedures (with small sample adjustments) do not approximate their nominal properties sufficiently well when data are unbalanced or incomplete. In Part 2, adaptive estimation techniques are considered for the sample covariance matrix which smooth between unstructured and structured forms: 'direct' smoothing, a weighted average of the unstructured and structured estimates, and an estimate chosen via penalised likelihood. Whilst attractive in principle, these approaches are shown to have little success in practice, being critically dependent on the 'correct' choice of smoothing structure. Part 3 considers methods which are less dependent on the covariance structure. A generalisation of a small sample adjustment to the empirical sandwich estimator is developed which accounts for its inherent bias and increased variance. This has nominal properties but lacks power. Also, a modification to Box's correction, an ANOVA F-statistic which accounts for departures from independence, is given which has both nominal properties and acceptable power. 
Finally, Part 4 recommends the adoption of the modified Box statistic for repeated measurements data where the sample size is very small.
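
For orientation, the classical Box correction factor that the thesis takes as its starting point can be sketched as follows. This is the standard epsilon computed from a covariance matrix of the repeated measurements, not the modified statistic developed in the thesis; the function name `box_epsilon` is an assumption for the example.

```python
import numpy as np

def box_epsilon(cov):
    """Classical Box epsilon for a p x p covariance matrix of repeated measures."""
    cov = np.asarray(cov, dtype=float)
    p = cov.shape[0]
    centre = np.eye(p) - np.ones((p, p)) / p   # removes the mean across occasions
    s = centre @ cov @ centre                  # double-centred covariance
    return np.trace(s) ** 2 / ((p - 1) * np.trace(s @ s))
```

The factor equals 1 when the covariance is spherical (for example under compound symmetry) and falls towards 1/(p - 1) as the departure grows; in the classical correction it scales both degrees of freedom of the ANOVA F-test.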

    Empirical Bayes block shrinkage for wavelet regression

    There has been great interest in recent years in the development of wavelet methods for estimating an unknown function observed in the presence of noise, following the pioneering work of Donoho and Johnstone (1994, 1995) and Donoho et al. (1995). In this thesis, a novel empirical Bayes block (EBB) shrinkage procedure is proposed and the performance of this approach with both independent identically distributed (IID) noise and correlated noise is thoroughly explored. The first part of this thesis develops a Bayesian methodology involving the non-central χ² distribution to simultaneously shrink wavelet coefficients in a block, based on the block sum of squares. A useful (and to the best of our knowledge, new) identity satisfied by the non-central χ² density is exploited. This identity leads to tractable posterior calculations for suitable families of prior distributions. Also, the families of prior distribution we work with are sufficiently flexible to represent various forms of prior knowledge. Furthermore, an efficient method for finding the hyperparameters is implemented, and simulations show that this method has a considerable computational advantage. The second part relaxes the assumption of IID noise considered in the first part of this thesis. A semi-parametric model including a parametric component and a nonparametric component is presented to deal with correlated noise situations. In the parametric component, attention is paid to the covariance structure of the noise. Two distinct parametric methods (maximum likelihood estimation and time series model identification techniques) for estimating the parameters in the covariance matrix are investigated. Both methods have been successfully implemented and are believed to be new additions to smoothing methods.