Bayes-LQAS: Classifying the Prevalence of Global Acute Malnutrition
Lot Quality Assurance Sampling (LQAS) applications in health have generally relied on frequentist interpretations for statistical validity. Yet health professionals often seek statements about the probability distribution of unknown parameters to answer questions of interest. The frequentist paradigm does not pretend to yield such information, although a Bayesian formulation might. This is the source of an error made in a recent paper published in this journal. Many applications lend themselves to a Bayesian treatment, and would benefit from such considerations in their design. We discuss Bayes-LQAS (B-LQAS), which allows for incorporation of prior information into the LQAS classification procedure, and thus shows how to correct the aforementioned error. Further, we pay special attention to the formulation of Bayes Operating Characteristic Curves and the use of prior information to improve survey designs. As a motivating example, we discuss the classification of Global Acute Malnutrition prevalence and draw parallels between the Bayes and classical classification schemes. We also illustrate the impact of informative and non-informative priors on the survey design. Results indicate that using a Bayesian approach allows the incorporation of expert information and/or historical data and is thus potentially a valuable tool for making accurate and precise classifications.
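The difference between a frequentist decision rule and a Bayesian probability statement about prevalence can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a Beta(a, b) prior with integer parameters on the prevalence p and a binomial sample of n children with x cases; the function names, the 10% threshold, and the sample values are hypothetical.

```python
from math import comb


def beta_cdf_int(t, a, b):
    """CDF of Beta(a, b) at t, valid for positive integer a and b only.

    Uses the identity I_t(a, b) = P(Binomial(a + b - 1, t) >= a).
    """
    m = a + b - 1
    return sum(comb(m, j) * t**j * (1 - t) ** (m - j) for j in range(a, m + 1))


def posterior_prob_above(x, n, threshold, a=1, b=1):
    """P(p > threshold | x cases in n) under an assumed Beta(a, b) prior."""
    # Conjugacy: a Beta(a, b) prior with binomial data gives Beta(a + x, b + n - x).
    return 1.0 - beta_cdf_int(threshold, a + x, b + n - x)


if __name__ == "__main__":
    # Hypothetical survey: 4 cases among 30 children, threshold of 10%.
    flat = posterior_prob_above(4, 30, 0.10)              # non-informative Beta(1, 1)
    informed = posterior_prob_above(4, 30, 0.10, 2, 18)   # assumed prior with mean 10%
    print(f"flat prior: {flat:.3f}, informative prior: {informed:.3f}")
```

Comparing the two printed probabilities shows how an informative prior can shift the classification for the same data, which is the kind of design consideration the abstract describes.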
Reduced-rank spatio-temporal modeling of air pollution concentrations in the Multi-Ethnic Study of Atherosclerosis and Air Pollution
There is growing evidence in the epidemiologic literature of the relationship
between air pollution and adverse health outcomes. Prediction of individual air
pollution exposure in the Environmental Protection Agency (EPA) funded
Multi-Ethnic Study of Atherosclerosis and Air Pollution (MESA Air) study
relies on a flexible spatio-temporal prediction model that integrates land-use
regression with kriging to account for spatial dependence in pollutant
concentrations. Temporal variability is captured using temporal trends
estimated via modified singular value decomposition and temporally varying
spatial residuals. This model utilizes monitoring data from existing regulatory
networks and supplementary MESA Air monitoring data to predict concentrations
for individual cohort members. In general, spatio-temporal models are limited
in their efficacy for large data sets due to computational intractability. We
develop reduced-rank versions of the MESA Air spatio-temporal model. To do so,
we apply low-rank kriging to account for spatial variation in the mean process
and discuss the limitations of this approach. As an alternative, we represent
spatial variation using thin plate regression splines. We compare the
performance of the outlined models using EPA and MESA Air monitoring data for
predicting concentrations of oxides of nitrogen (NOx), a pollutant of primary
interest in MESA Air, in the Los Angeles metropolitan area via cross-validated
R². Our findings suggest that use of reduced-rank models can improve
computational efficiency in certain cases. Low-rank kriging and thin plate
regression splines (TPRS) were competitive across the formulations considered,
although TPRS appeared to be more robust in some settings. Comment: Published at
http://dx.doi.org/10.1214/14-AOAS786 in the Annals of Applied Statistics
(http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics
(http://www.imstat.org)
Multiple Category-Lot Quality Assurance Sampling: A New Classification System with Application to Schistosomiasis Control
Background: Originally a binary classifier, Lot Quality Assurance Sampling (LQAS) has proven to be a useful tool for classification of the prevalence of Schistosoma mansoni into multiple categories (≤10%, >10% and <50%, ≥50%), and semi-curtailed sampling has been shown to effectively reduce the number of observations needed to reach a decision. To date the statistical underpinnings for Multiple Category-LQAS (MC-LQAS) have not received full treatment. We explore the analytical properties of MC-LQAS, and validate its use for the classification of S. mansoni prevalence in multiple settings in East Africa. Methodology: We outline MC-LQAS design principles and formulae for operating characteristic curves. In addition, we derive the average sample number for MC-LQAS when utilizing semi-curtailed sampling and introduce curtailed sampling in this setting. We also assess the performance of MC-LQAS designs with maximum sample sizes of n = 15 and n = 25 via a weighted kappa-statistic using S. mansoni data collected in 388 schools from four studies in East Africa. Principal Findings: Overall performance of MC-LQAS classification was high (kappa-statistic of 0.87). In three of the studies, the kappa-statistic for a design with n = 15 was greater than 0.75. In the fourth study, where these designs performed poorly (kappa-statistic less than 0.50), the majority of observations fell in regions where potential error is known to be high. Employment of semi-curtailed and curtailed sampling further reduced the sample size by as many as 0.5 and 3.5 observations per school, respectively, without increasing classification error. Conclusion/Significance: This work provides the needed analytics to understand the properties of MC-LQAS for assessing the prevalence of S. mansoni and shows that in most settings a sample size of 15 children provides a reliable classification of schools.
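An operating characteristic curve for a three-category design can be sketched as follows. This is an illustration under stated assumptions, not the paper's formulae: it assumes simple random sampling, so the number of positives X among n children is Binomial(n, p), and a rule that classifies a school as low if X <= d1, high if X > d2, and moderate otherwise. The function names and the decision rules d1 = 1 and d2 = 6 are hypothetical.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def mc_lqas_oc(p, n, d1, d2):
    """Probabilities of classifying a school as low / moderate / high.

    Assumed rule: low if X <= d1, high if X > d2, moderate otherwise,
    where X counts positives among n sampled children.
    """
    low = sum(binom_pmf(k, n, p) for k in range(0, d1 + 1))
    high = sum(binom_pmf(k, n, p) for k in range(d2 + 1, n + 1))
    return {"low": low, "moderate": 1.0 - low - high, "high": high}


if __name__ == "__main__":
    # Trace the three curves over a grid of true prevalences.
    for p in (0.05, 0.10, 0.30, 0.50, 0.70):
        oc = mc_lqas_oc(p, n=15, d1=1, d2=6)
        print(p, {name: round(prob, 3) for name, prob in oc.items()})
```

Evaluating the function over a fine grid of p gives the three curves; classification error is largest where the curves cross, near the category boundaries.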
The effect of clustering on lot quality assurance sampling: a probabilistic model to calculate sample sizes for quality assessments
Background: Traditional Lot Quality Assurance Sampling (LQAS) designs assume observations are collected using simple random sampling. Alternatively, randomly sampling clusters of observations and then individuals within clusters reduces costs but decreases the precision of the classifications. In this paper, we develop a general framework for designing the cluster (C)-LQAS system and illustrate the method with the design of data quality assessments for the community health worker program in Rwanda. Results: To determine sample size and decision rules for C-LQAS, we use the beta-binomial distribution to account for inflated risk of errors introduced by sampling clusters at the first stage. We present general theory and code for sample size calculations. The C-LQAS sample sizes provided in this paper constrain misclassification risks below user-specified limits. Multiple C-LQAS systems meet the specified risk requirements, but numerous considerations, including per-cluster versus per-individual sampling costs, help identify optimal systems for distinct applications. Conclusions: We show the utility of C-LQAS for data quality assessments, but the method generalizes to numerous applications. This paper provides the necessary technical detail and supplemental code to support the design of C-LQAS for specific programs.
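The way clustering inflates misclassification risk can be sketched with a beta-binomial model. This is a hedged illustration, not the paper's supplemental code: it assumes k clusters of m individuals each, independent cluster counts that are each BetaBinomial(m, alpha, beta) with mean p and intracluster correlation rho, and a rule that flags the lot when the total count is at most d. All function names and numeric values are hypothetical.

```python
from math import exp, lgamma


def log_beta(a, b):
    """Natural log of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def betabinom_pmf(x, m, alpha, beta):
    """P(X = x) for X ~ BetaBinomial(m, alpha, beta)."""
    log_choose = lgamma(m + 1) - lgamma(x + 1) - lgamma(m - x + 1)
    return exp(log_choose + log_beta(x + alpha, m - x + beta) - log_beta(alpha, beta))


def cluster_total_pmf(k, m, p, rho):
    """PMF of the total count over k independent clusters of size m.

    Parameterised so each cluster has mean m * p and intracluster
    correlation rho = 1 / (alpha + beta + 1); requires 0 < p, rho < 1.
    """
    alpha = p * (1 - rho) / rho
    beta = (1 - p) * (1 - rho) / rho
    one = [betabinom_pmf(x, m, alpha, beta) for x in range(m + 1)]
    total = [1.0]  # distribution of an empty sum: all mass at zero
    for _ in range(k):
        new = [0.0] * (len(total) + m)
        for i, pi in enumerate(total):
            for j, pj in enumerate(one):
                new[i + j] += pi * pj  # discrete convolution
        total = new
    return total


def prob_at_most(d, k, m, p, rho):
    """P(total <= d) under the assumed cluster design."""
    return sum(cluster_total_pmf(k, m, p, rho)[: d + 1])


if __name__ == "__main__":
    # Same decision rule, weak versus strong clustering (illustrative values).
    for rho in (0.01, 0.1, 0.3):
        print(rho, round(prob_at_most(6, k=3, m=4, p=0.8, rho=rho), 4))
```

For a fixed decision rule the tail probability grows with rho in this example, which is why a cluster design needs a larger sample or a different rule to keep risks below the specified limits.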
Risk Factors for Long-Term Coronary Artery Calcium Progression in the Multi-Ethnic Study of Atherosclerosis.
Background: Coronary artery calcium (CAC) detected by noncontrast cardiac computed tomography scanning is a measure of coronary atherosclerosis burden. Increasing CAC levels have been strongly associated with increased coronary events. Prior studies of cardiovascular disease risk factors and CAC progression have been limited by short follow-up or restricted to patients with advanced disease. Methods and results: We examined cardiovascular disease risk factors and CAC progression in a prospective multiethnic cohort study. CAC was measured 1 to 4 times (mean 2.5 scans) over 10 years in 6810 adults without preexisting cardiovascular disease. Mean CAC progression was 23.9 Agatston units/year. An innovative application of mixed-effects models investigated associations between cardiovascular disease risk factors and CAC progression. This approach adjusted for time-varying factors, was flexible with respect to follow-up time and number of observations per participant, and allowed simultaneous control of factors associated with both baseline CAC and CAC progression. Models included age, sex, study site, scanner type, and race/ethnicity. Associations were observed between CAC progression and age (14.2 Agatston units/year per 10 years [95% CI 13.0 to 15.5]), male sex (17.8 Agatston units/year [95% CI 15.3 to 20.3]), hypertension (13.8 Agatston units/year [95% CI 11.2 to 16.5]), diabetes (31.3 Agatston units/year [95% CI 27.4 to 35.3]), and other factors. Conclusions: CAC progression analyzed over 10 years of follow-up, with a novel analytical approach, demonstrated strong relationships with risk factors for incident cardiovascular events. Longitudinal CAC progression analyzed in this framework can be used to evaluate novel cardiovascular risk factors.
Prevalence, Awareness, Treatment, and Control of Hypertension in United States Counties, 2001–2009
Hypertension is an important and modifiable risk factor for cardiovascular disease and mortality. Over the last decade, national levels of controlled hypertension have increased, but little information on hypertension prevalence and trends in hypertension treatment and control exists at the county level. We estimate trends in prevalence, awareness, treatment, and control of hypertension in US counties using data from the National Health and Nutrition Examination Survey (NHANES) in five two-year waves from 1999–2008 including 26,349 adults aged 30 years and older and from the Behavioral Risk Factor Surveillance System (BRFSS) from 1997–2009 including 1,283,722 adults aged 30 years and older. Hypertension was defined as systolic blood pressure (BP) of at least 140 mm Hg, self-reported use of antihypertensive treatment, or both. Hypertension control was defined as systolic BP less than 140 mm Hg. The median prevalence of total hypertension in 2009 was estimated at 37.6% (range: 26.5 to 54.4%) in men and 40.1% (range: 28.5 to 57.9%) in women. Within-state differences in the county prevalence of uncontrolled hypertension were as high as 7.8 percentage points in 2009. Awareness, treatment, and control were highest in the southeastern US, and increased between 2001 and 2009 on average. The median county-level control in men was 57.7% (range: 43.4 to 65.9%) and in women was 57.1% (range: 43.0 to 65.46%) in 2009, with highest rates in white men and black women. While control of hypertension is on the rise, prevalence of total hypertension continues to increase in the US. Concurrent increases in treatment and control of hypertension are promising, but efforts to decrease the prevalence of hypertension are needed.
Comparing the performance of cluster random sampling and integrated threshold mapping for targeting trachoma control, using computer simulation.
BACKGROUND: Implementation of trachoma control strategies requires reliable district-level estimates of trachomatous inflammation-follicular (TF), generally collected using the recommended gold-standard cluster randomized surveys (CRS). Integrated Threshold Mapping (ITM) has been proposed as an integrated and cost-effective means of rapidly surveying trachoma in order to classify districts according to treatment thresholds. ITM differs from CRS in a number of important ways, including the use of a school-based sampling platform for children aged 1-9 and a different age distribution of participants. This study uses computerised sampling simulations to compare the performance of these survey designs and evaluate the impact of varying key parameters. METHODOLOGY/PRINCIPAL FINDINGS: Realistic pseudo gold standard data for 100 districts were generated that maintained the relative risk of disease between important sub-groups and incorporated empirical estimates of disease clustering at the household, village and district level. To simulate the different sampling approaches, 20 clusters were selected from each district, with individuals sampled according to the protocol for ITM and CRS. Results showed that ITM generally under-estimated the true prevalence of TF over a range of epidemiological settings and introduced more district misclassification according to treatment thresholds than did CRS. However, the extent of underestimation and resulting misclassification was found to be dependent on three main factors: (i) the district prevalence of TF; (ii) the relative risk of TF between enrolled and non-enrolled children within clusters; and (iii) the enrollment rate in schools. CONCLUSIONS/SIGNIFICANCE: Although in some contexts the two methodologies may be equivalent, ITM can introduce a bias-dependent shift as prevalence of TF increases, resulting in a greater risk of misclassification around treatment thresholds. 
In addition to strengthening the evidence base around choice of trachoma survey methodologies, this study illustrates the use of a simulation approach to address operational research questions not only for trachoma but also for other neglected tropical diseases (NTDs).
Historical Prediction Modeling Approach for Estimating Long-Term Concentrations of PM2.5 in Cohort Studies Before the 1999 Implementation of Widespread Monitoring
Introduction: Recent cohort studies use exposure prediction models to estimate the association between long-term residential concentrations of PM2.5 and health. Because these prediction models rely on PM2.5 monitoring data, predictions for times before extensive spatial monitoring present a challenge to understanding long-term exposure effects. The Environmental Protection Agency (EPA) Federal Reference Method (FRM) network for PM2.5 was established in 1999. We evaluated a novel statistical approach to produce high quality exposure predictions from 1980-2010 for epidemiological applications.
Methods: We developed spatio-temporal prediction models using geographic predictors and annual average PM2.5 data from 1999 through 2010 from the FRM and the Interagency Monitoring of Protected Visual Environments (IMPROVE) networks. The model consists of a spatially-varying long-term mean, a spatially-varying temporal trend, and spatially-varying and temporally-independent spatio-temporal residuals structured using a universal kriging framework. Temporal trends in annual averages of PM2.5 before 1999 were estimated by using a) extrapolation based on PM2.5 data for 1999-2010 in FRM/IMPROVE, b) PM2.5 sulfate data for 1987-2010 in the Clean Air Status and Trends Network, and c) visibility data for 1980-2010 across the Weather-Bureau-Army-Navy network. We validated the resulting models using PM2.5 data collected before 1999 from IMPROVE, California Air Resources Board dichotomous sampler monitoring (CARB dichot), the Southern California Children’s Health Study (CHS), and the Inhalable Particulate Network (IPN).
Results: The PM2.5 prediction model performed well across three trend estimation approaches when validated using IMPROVE and CHS data (R² = 0.84–0.91). Model performance using CARB dichot and IPN data was worse than with IMPROVE data, most likely due to inconsistent sampling methods and smaller numbers of monitoring sites.
Discussion: Our prediction modeling approach will allow health effects estimation associated with long-term exposures to PM2.5 over extended time periods of up to 30 years.