Analysis of the Total Food Folate Intake Data from the National Health and Nutrition Examination Survey (NHANES) Using Generalized Linear Model
The National Health and Nutrition Examination Survey (NHANES) is a respected nationwide program that assesses the health and nutritional status of adults and children in the United States. Recent research has found that folic acid plays an important role in preventing birth defects. In this paper, we use the generalized estimating equation (GEE) method to fit a generalized linear model (GLM) with a compound-symmetric correlation matrix to the NHANES data and investigate the factors that significantly influence the intake of food folic acid.
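The compound-symmetric (exchangeable) working correlation used in the GEE fit can be sketched in a few lines. This is a minimal, hypothetical illustration in plain Python (the function names are ours, not the paper's): it builds R = (1 - rho) I + rho J for a cluster of size n and computes a simple moment estimate of rho from Pearson residuals grouped by cluster, whatever unit the correlation is defined over. A full GEE fit would alternate this update with re-estimation of the regression coefficients.

```python
def exchangeable_corr(n, rho):
    """Compound-symmetric (exchangeable) working correlation matrix:
    1 on the diagonal and rho everywhere else, i.e. R = (1 - rho) I + rho J."""
    return [[1.0 if j == k else rho for k in range(n)] for j in range(n)]


def estimate_rho(residuals_by_cluster):
    """Simple moment estimator of rho from Pearson residuals grouped by cluster.

    Ignores the small-sample correction for the number of regression
    parameters; clusters of size 1 contribute no pairs.
    """
    all_res = [r for cluster in residuals_by_cluster for r in cluster]
    phi = sum(r * r for r in all_res) / len(all_res)  # scale (dispersion) estimate
    num = 0.0
    pairs = 0
    for cluster in residuals_by_cluster:
        n = len(cluster)
        for j in range(n):
            for k in range(n):
                if j != k:
                    num += cluster[j] * cluster[k]  # cross-products within a cluster
        pairs += n * (n - 1)
    return num / (pairs * phi)
```

In practice one would rely on an established implementation, for example the `GEE` class in statsmodels with an `Exchangeable` covariance structure, or `geeglm(..., corstr = "exchangeable")` in R's geepack.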
Longitudinal models of iron status in a population-based cohort of mothers and children in southwest England
Longitudinal data require special statistical methods because observations on the same subject tend to be correlated, although subjects can usually be assumed to be independent of one another. When subjects are observed at individually varying sets of times, with or without missing data, as is the case for the ALSPAC data during pregnancy, the resulting data are referred to as unbalanced. This further complicates the analysis.
The aim of this thesis is to contribute to longitudinal research on this topic by using mixed-effects models, which provide a powerful and flexible tool for the analysis of both balanced and unbalanced data.
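One reason mixed-effects models cope naturally with unbalanced data is that each subject's marginal covariance is built from that subject's own measurement times. The sketch below assumes a random-intercept-and-random-slope linear mixed model (a common choice, not necessarily the one used in the thesis) and computes V_i = Z_i D Z_i^T + sigma^2 I in plain Python; subjects with different numbers or timings of visits simply produce matrices of different sizes.

```python
def marginal_cov(times, D, sigma2):
    """Marginal covariance V_i = Z_i D Z_i^T + sigma2 * I for one subject
    in a random-intercept-and-slope linear mixed model.

    times:  this subject's own measurement times (any length)
    D:      2x2 covariance matrix of (random intercept, random slope)
    sigma2: residual (within-subject) variance
    """
    Z = [[1.0, t] for t in times]  # random-effects design: intercept, slope
    n = len(times)
    V = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(n):
            # (Z D Z^T)[a][b] = sum over p, q of Z[a][p] * D[p][q] * Z[b][q]
            V[a][b] = sum(Z[a][p] * D[p][q] * Z[b][q]
                          for p in range(2) for q in range(2))
            if a == b:
                V[a][b] += sigma2  # independent residual error on the diagonal
    return V
```

For instance, a subject seen at times 0 and 1 gets a 2x2 matrix, while a subject seen at times 0, 2 and 5 gets a 3x3 matrix from the same D and sigma2.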
Although progress has been made in the study reported in this thesis, further extensions are required. Longitudinal data typically require structured covariance models, and the overall findings indicate that when the number of occasions is large and some values are missing, a polynomial function of time is inadequate to describe the change over time. This study highlights an approach that applies cubic splines in longitudinal modelling, with an emphasis on graphical representation for exploratory analysis and for the assessment of model fit.
Cubic splines provide a flexible tool for longitudinal data. The main objective of this study is to investigate a methodology for incorporating cubic splines into linear mixed models when modelling longitudinal data with a large number of time points and missing values.
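A cubic spline enters a linear mixed model through its basis functions, which become columns of the design matrix. The sketch below uses the truncated power basis 1, t, t^2, t^3, (t - k)_+^3 for each knot k (one common choice; the thesis may use another basis, and the knot locations are hypothetical). Because each row depends only on that subject's own observation times, missing visits simply mean fewer rows.

```python
def cubic_spline_basis(t, knots):
    """Truncated-power cubic spline basis evaluated at time t:
    1, t, t^2, t^3 and (t - k)_+^3 for each knot k."""
    basis = [1.0, t, t ** 2, t ** 3]
    basis += [max(t - k, 0.0) ** 3 for k in knots]  # zero before each knot
    return basis


def design_matrix(times, knots):
    """Stack basis rows for one subject's (possibly irregular) times."""
    return [cubic_spline_basis(t, knots) for t in times]
```

In a penalized-spline formulation the coefficients on the (t - k)_+^3 columns are often treated as random effects, which lets standard mixed-model software control smoothness; whether this matches the thesis's approach is an assumption.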
The Structure of US Food Demand
An exactly aggregable system of Gorman Engel curves for U.S. food consumption is developed and implemented. Box-Cox transformations of prices and income nest the functional form, and the model nests demand systems of rank up to three. The model is estimated by nonlinear three-stage least squares with annual time series data on 21 foods, 17 nutrients, age and race demographics, and the distribution of income for 1919-1941 and 1947-2000. Results are consistent with full rank three. Point estimates for the Box-Cox parameters on income and prices are 0.86 and 1.09, respectively, strongly rejecting zero and one in both cases. No statistical evidence of serial correlation, specification errors, or parameter instability is found.
Keywords: aggregation, food demand, functional form, parameter stability, rank, specification errors
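The Box-Cox parameter nests two familiar special cases: lambda = 1 gives a (shifted) linear form and lambda -> 0 gives the logarithm. Rejecting both zero and one for income (0.86) and prices (1.09) therefore rules out purely linear and purely logarithmic specifications. A minimal sketch of the transformation itself (not of the full demand system):

```python
import math


def box_cox(x, lam):
    """Box-Cox transform of a positive value x with parameter lam.

    lam = 1 gives x - 1 (linear up to a shift); lam = 0 is defined as log(x),
    the limit of (x**lam - 1) / lam as lam -> 0.
    """
    if x <= 0:
        raise ValueError("Box-Cox requires x > 0")
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam
```

Applied to the reported estimates, an income term would be `box_cox(income, 0.86)` and a price term `box_cox(price, 1.09)`, where `income` and `price` are placeholders.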
Child diet over three seasons in rural Zambia: Assessments of usual nutrient intake adequacy, components of intake variation and dietary diversity score performance
Inadequate dietary intakes are a key underlying cause of undernutrition, which places children at risk for impaired growth and development. Current estimates of the prevalence of nutrient inadequacies are needed for the design of interventions to improve child diet. Estimates of nutrient intake variance components and validation of dietary diversity scores (DDS) among children are needed to design studies of nutrient intakes or population-level dietary adequacy, respectively. We conducted seven repeat 24-hour dietary recalls over six months among 4- to 8-year-old rural Zambian children (n=202). Participating children were enrolled in the non-intervention arm of a biofortified maize efficacy trial. We calculated observed nutrient intakes, frequencies of food consumption, usual intakes over six months, usual intakes by survey round and 7- and 10-food group DDS by survey round. Usual nutrient intakes over six months were used to estimate the prevalence of inadequacy of eleven micronutrients. We estimated within-person, between-person and seasonal components of variance in observed nutrient intakes. The performance of each DDS relative to overall nutrient intake adequacy and to usual intakes of five selected micronutrients was assessed by season. Children’s diets were heavily plant based and included few animal source foods. Estimated prevalence of inadequate calcium, vitamin B12, folate and iron intakes was >99%, 76%, 57% and 25%, respectively. Mean nutrient intakes differed significantly among the three agricultural seasons, and season accounted for 3%–23% of total intake variance. Within- to between-person variance ratios were high due to low between-person variance. DDS were associated with overall intake adequacy, but this association was significantly weaker in the late lean season than in the late post-harvest or early lean seasons. The heavily plant-based diet of rural Zambian children places them at risk for inadequate nutrient intakes.
Because nutrient intakes vary by season, future studies estimating usual intakes should include repeat observations in multiple seasons. The 10-food group DDS is recommended over the 7-food group DDS for use as a population-level indicator of dietary adequacy.
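Within- and between-person variance components of the kind described above can be illustrated with a one-way random-effects ANOVA. This simplified sketch assumes a balanced design (the same number of recalls per child) and ignores the seasonal component the study also estimated; it is not the study's actual estimation procedure. A large within-to-between ratio, as reported for these children, implies that many repeat recalls are needed to characterize an individual's usual intake.

```python
def variance_components(recalls):
    """One-way random-effects ANOVA estimates from a balanced table of
    repeat recalls: recalls[i][j] = intake of child i on recall day j.

    Returns (within-person variance, between-person variance, ratio).
    """
    n = len(recalls)       # number of children
    k = len(recalls[0])    # recalls per child (balanced design assumed)
    means = [sum(row) / k for row in recalls]
    grand = sum(means) / n
    ss_within = sum((x - m) ** 2 for row, m in zip(recalls, means) for x in row)
    ms_within = ss_within / (n * (k - 1))
    ms_between = k * sum((m - grand) ** 2 for m in means) / (n - 1)
    var_within = ms_within
    var_between = max(0.0, (ms_between - ms_within) / k)  # truncate negative estimates at zero
    ratio = var_within / var_between if var_between > 0 else float("inf")
    return var_within, var_between, ratio
```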
Statistical models for estimating the intake of nutrients and foods from complex survey data
Background: The consequences of poor nutrition are well known and of wide concern. Governments and public health agencies utilise food and diet surveillance data to make decisions that lead to improvements in nutrition. These surveys often utilise complex sample designs for efficient data collection. There are several challenges in the statistical analysis of dietary intake data collected using complex survey designs, which have not been fully addressed by current methods. Firstly, the distribution of intake can be highly skewed due to outlier observations and a large proportion of zero observations, which arise when the food diary does not capture consumption within the period of observation. Secondly, dietary data are subject to variability arising from day-to-day individual variation in food consumption and from measurement error, which must be accounted for in the estimation procedure to obtain correct inferences. Thirdly, the complex sample design needs to be incorporated into the estimation procedure to allow extrapolation of results to the target population. This thesis aims to develop novel statistical methods to address these challenges, applied to the analysis of iron intake data from the UK National Diet and Nutrition Survey Rolling Programme (NDNS RP) and UK national prescription data for iron deficiency medication.
Methods: 1) To assess the nutritional status of particular population groups, a two-part model with a generalised gamma (GG) distribution was developed for intakes that show high frequencies of zero observations. The two-part model accommodated the sources of data variation of dietary intake with a random intercept in each component; these intercepts could be correlated, allowing a correlation between the probability of consuming and the amount consumed.
2) To identify population groups at risk of low nutrient intakes, a linear quantile mixed-effects model was developed to model quantiles of the distribution of intake as a function of explanatory variables. The proposed approach was illustrated by comparing the quantiles of iron intake with Lower Reference Nutrient Intake (LRNI) recommendations using NDNS RP data.
This thesis extended the estimation procedures of both the two-part model with GG distribution and the linear quantile mixed-effects model to incorporate the complex sample design in three steps: the likelihood was weighted by the sampling weights; bootstrap methods were used to estimate the variance; and, finally, the variance estimation of the model parameters was stratified by the survey strata.
3) To evaluate the allocation of resources to alleviate nutritional deficiencies, a linear quantile mixed-effects model was used to analyse the distribution of expenditure on iron deficiency medication across health boards in the UK. Expenditure is likely to depend on the iron status of the region; therefore, for a fair comparison among health boards, iron status was estimated using the method developed in objective 2) and used in the specification of the median amount spent. Because each health board comprises a set of general practices (GPs), a random intercept was used to induce correlation between the expenditures of GPs within the same health board.
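The two-part structure in objective 1) separates whether iron is consumed from how much is consumed. A minimal sketch, assuming Stacy's parameterisation of the generalised gamma (scale a, shapes d and p; the thesis may use a different one) and ignoring the correlated random intercepts, computes the GG log density and the expected daily intake P(consume) x E[amount | consumed] at fixed covariate values. With p = 1 the GG reduces to the ordinary gamma distribution, and with d = p = 1 to the exponential.

```python
import math


def gg_logpdf(y, a, d, p):
    """Log density of Stacy's generalised gamma distribution for y > 0,
    with scale a > 0 and shape parameters d > 0, p > 0:
    f(y) = p / (a**d * Gamma(d / p)) * y**(d - 1) * exp(-(y / a)**p)."""
    return (math.log(p) - d * math.log(a) - math.lgamma(d / p)
            + (d - 1) * math.log(y) - (y / a) ** p)


def gg_mean(a, d, p):
    """E[Y] = a * Gamma((d + 1) / p) / Gamma(d / p), computed on the log scale."""
    return a * math.exp(math.lgamma((d + 1) / p) - math.lgamma(d / p))


def two_part_mean(prob_consume, a, d, p):
    """Expected intake under a two-part model at fixed covariates:
    P(consume) * E[amount | consumed]."""
    return prob_consume * gg_mean(a, d, p)
```

With random intercepts in both parts, the population mean would additionally require integrating this expression over their joint distribution.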
Finally, the approaches in objectives 1) and 2) were compared with the traditional approach based on weighted linear regression modelling used in the NDNS RP reports. All analyses were implemented using SAS and R.
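The survey-design steps listed in the Methods (weighted likelihood, bootstrap variance, stratification by survey strata) can be illustrated on the simplest possible estimator. In this hypothetical sketch the weighted mean maximises a weighted normal log-likelihood, and its standard error comes from a bootstrap that resamples units with replacement within each stratum. It deliberately ignores primary sampling units and finite-population corrections, which a production analysis (for example a Rao-Wu rescaling bootstrap) would handle.

```python
import random


def weighted_mean(values, weights):
    """Design-weighted mean: solves sum_i w_i * (y_i - mu) = 0 for mu."""
    return sum(w * y for y, w in zip(values, weights)) / sum(weights)


def stratified_bootstrap_se(data, n_boot=500, seed=1):
    """Bootstrap standard error of the weighted mean, resampling units with
    replacement separately within each survey stratum.

    data: list of (stratum, value, weight) tuples
    """
    rng = random.Random(seed)
    strata = {}
    for stratum, y, w in data:
        strata.setdefault(stratum, []).append((y, w))
    estimates = []
    for _ in range(n_boot):
        sample = []
        for units in strata.values():
            # draw as many units per stratum as the stratum originally had
            sample.extend(rng.choice(units) for _ in range(len(units)))
        estimates.append(weighted_mean([y for y, _ in sample], [w for _, w in sample]))
    centre = sum(estimates) / n_boot
    var = sum((e - centre) ** 2 for e in estimates) / (n_boot - 1)
    return var ** 0.5
```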
Results: The two-part model with GG distribution, fitted to the amount of iron consumed from selected episodically consumed foods, showed that females tended to have greater odds of consuming iron from these foods but consumed smaller amounts. With increasing age group, consumption tended to increase relative to the reference group, though the odds of consumption varied. Iron consumption also appeared to depend on National Statistics Socio-Economic Classification (NSSEC) group, with lower social groups generally consuming less. The quantiles of iron intake estimated using the linear quantile mixed-effects model showed that more than 25% of females aged 11-50y are below the LRNI, and that girls aged 11-18y are the group at highest risk of deficiency in the UK. Predictions of spending on iron medication in the UK based on the linear quantile mixed-effects model showed that areas of higher iron intake had lower spending on treating iron deficiency. In a geographical display of expenditure, Northern Ireland featured the lowest amount spent. A comparison with the methods proposed here showed that the traditional approach based on weighted regression analysis could result in spurious associations.
Discussion: This thesis developed novel approaches to the analysis of complex dietary survey data to address three important objectives of diet surveillance, namely the estimation of mean food intake by population group, the identification of groups at high risk of nutrient deficiency, and the allocation of resources to alleviate nutrient deficiencies. The methods provided models of good fit to dietary data, accounted for the sources of data variability, and extended the estimation procedures to incorporate the complex sample survey design. The use of a GG distribution for modelling intake is an important improvement over existing methods, as it includes many distributions with different shapes and its support is restricted to non-negative values. The two-part model accommodated the sources of data variation of dietary intake with a random intercept in each component; these intercepts could be correlated, allowing a correlation between the probability of consuming and the amount consumed. This also improves on existing approaches that assume zero correlation. The linear quantile mixed-effects model utilises the asymmetric Laplace distribution, which can also accommodate many different distributional shapes, and likelihood-based estimation is robust to model misspecification. This method is an important improvement over existing methods used in nutritional research, as it explicitly models the quantiles in terms of explanatory variables using a novel quantile regression model with random effects. The application of these models to UK national data confirmed the association between poorer diets and lower social class, identified females aged 11-50y as a group at high risk of iron deficiency, and highlighted Northern Ireland as the region with the lowest expenditure on iron prescriptions.
Medical Research Council
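The link between the asymmetric Laplace distribution and quantile regression can be made concrete. For fixed scale sigma, maximising the ALD likelihood in the location mu is equivalent to minimising the check loss rho_tau(u) = u(tau - I(u < 0)), whose minimiser is a tau-th quantile. The sketch below (plain Python, fixed effects only, no random effects or survey weights) shows the check loss, the ALD log density, and a brute-force sample quantile obtained by minimising summed check loss over the observed values.

```python
import math


def check_loss(u, tau):
    """Quantile-regression check function rho_tau(u) = u * (tau - I(u < 0))."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def ald_logpdf(y, mu, sigma, tau):
    """Log density of the asymmetric Laplace distribution with location mu,
    scale sigma and skewness tau; mu is its tau-th quantile."""
    return math.log(tau * (1 - tau) / sigma) - check_loss((y - mu) / sigma, tau)


def empirical_quantile_by_check_loss(ys, tau):
    """Observed value minimising the summed check loss (a tau-th sample quantile)."""
    return min(ys, key=lambda q: sum(check_loss(y - q, tau) for y in ys))
```

For example, on the values 1 to 5 the summed check loss is minimised at 3 for tau = 0.5 and at 5 for tau = 0.9.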
A new multivariate measurement error model with zero-inflated dietary data, and its application to dietary assessment
In the United States the preferred method of obtaining dietary intake data is
the 24-hour dietary recall, yet the measure of most interest is usual or
long-term average daily intake, which is impossible to measure. Thus, usual
dietary intake is assessed with considerable measurement error. Also, diet
comprises numerous foods, nutrients and other components, each of which has
distinctive attributes. Sometimes, it is useful to examine intake of these
components separately, but increasingly nutritionists are interested in
exploring them collectively to capture overall dietary patterns. Consumption of
these components varies widely: some are consumed by almost everyone on almost
every day, while others are consumed episodically, so that 24-hour recall data
are zero-inflated. In addition, they are often correlated with each other.
Finally, it is often preferable to analyze the amount of a dietary component
relative to the amount of energy (calories) in a diet because dietary
recommendations often vary with energy level. The quest to understand overall
dietary patterns of usual intake has to this point reached a standstill. There
are no statistical methods or models available to model such complex
multivariate data with its measurement error and zero inflation. This paper
proposes the first such model, and it proposes the first workable solution to
fit such a model. After describing the model, we use survey-weighted MCMC
computations to fit the model, with uncertainty estimation coming from balanced
repeated replication.
Comment: Published at http://dx.doi.org/10.1214/10-AOAS446 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org).
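Balanced repeated replication (BRR), used above for uncertainty estimation, can be sketched for the textbook design with exactly two primary sampling units (PSUs) per stratum. In each replicate one PSU per stratum is kept with doubled weight and the other is dropped, following the signs in a column of a Hadamard matrix; the variance is the mean squared deviation of the replicate estimates from the full-sample estimate. The design and the weighted-mean estimator are illustrative assumptions; the paper applies BRR to a far more complex survey-weighted MCMC fit.

```python
def sylvester_hadamard(order):
    """Hadamard matrix of power-of-two order via Sylvester's construction."""
    h = [[1]]
    while len(h) < order:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def weighted_mean(units):
    """Weighted mean of a list of (value, weight) pairs."""
    return sum(y * w for y, w in units) / sum(w for _, w in units)


def brr_variance(strata, estimator):
    """BRR variance for a design with exactly two PSUs per stratum.

    strata:    list of (psu_a, psu_b); each PSU is a list of (value, weight)
    estimator: maps a list of (value, weight) pairs to a point estimate
    """
    n_strata = len(strata)
    order = 1
    while order <= n_strata:  # need one column per stratum besides the all-ones column
        order *= 2
    h = sylvester_hadamard(order)
    full = estimator([u for psu_a, psu_b in strata for u in psu_a + psu_b])
    replicates = []
    for row in h:
        sample = []
        for s, (psu_a, psu_b) in enumerate(strata):
            # column s + 1 decides which PSU is kept (column 0 is all ones)
            kept = psu_a if row[s + 1] == 1 else psu_b
            # kept PSU gets doubled weights; the other PSU gets weight zero
            sample.extend((y, 2 * w) for y, w in kept)
        replicates.append(estimator(sample))
    return sum((r - full) ** 2 for r in replicates) / len(replicates)
```

Fay's variant, which scales weights by factors between 0 and 2 instead of dropping PSUs entirely, is a common alternative.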
Transformations to Additivity in Measurement Error Models
In many problems one wants to model the relationship between a response Y and a covariate X. Sometimes it is difficult, expensive, or even impossible to observe X directly, but one can instead observe a substitute variable W that is easier to obtain. By far the most common model for the relationship between the covariate of interest X and the substitute W is W = X + U, where U represents measurement error. This assumption of additive measurement error may be unreasonable for certain data sets. We propose a new model, namely h(W) = h(X) + U, where h(·) is a monotone transformation function selected from some family H of monotone functions. The idea of the new model is that, on the correct scale, measurement error is additive. We propose two methods for selecting the transformation from H. The first selects a transformation that makes the within-individual means and standard deviations of the replicated W's uncorrelated. The second selects the transformation so that the errors (U's) fit a prespecified distribution. The transformation families used are the parametric power transformations and a cubic spline family. Several data examples are presented to illustrate the methods.
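The first selection criterion can be sketched with the power transformation family: transform each subject's replicates, compute the per-subject means and standard deviations, and choose the power for which the two are least correlated. This is a hypothetical illustration using a grid search over lambda (our assumption; the paper's optimisation and its cubic spline family are not reproduced).

```python
import math


def power_transform(w, lam):
    """Box-Cox-type power transformation of a positive value (log when lam == 0)."""
    return math.log(w) if lam == 0 else (w ** lam - 1.0) / lam


def _mean(xs):
    return sum(xs) / len(xs)


def _sd(xs):
    m = _mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))


def _corr(xs, ys, eps=1e-12):
    mx, my = _mean(xs), _mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx < eps or syy < eps:
        return 0.0  # one variable is (numerically) constant: no linear association
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


def select_lambda(replicates, grid):
    """Pick the power from grid for which within-subject means and standard
    deviations of the transformed replicates are least correlated.

    replicates: list of per-subject lists of replicate measurements W (W > 0)
    """
    def abs_corr(lam):
        transformed = [[power_transform(w, lam) for w in subj] for subj in replicates]
        return abs(_corr([_mean(t) for t in transformed],
                         [_sd(t) for t in transformed]))
    return min(grid, key=abs_corr)
```

For example, replicates of the form x * exp(-0.1) and x * exp(0.1) have additive error on the log scale, so from the grid [1.0, 0.5, 0.0] the criterion selects lambda = 0.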
Relaxation Penalties and Priors for Plausible Modeling of Nonidentified Bias Sources
In designed experiments and surveys, known laws or design features provide
checks on the most relevant aspects of a model and identify the target
parameters. In contrast, in most observational studies in the health and social
sciences, the primary study data do not identify and may not even bound target
parameters. Discrepancies between target and analogous identified parameters
(biases) are then of paramount concern, which forces a major shift in modeling
strategies. Conventional approaches are based on conditional testing of
equality constraints, which correspond to implausible point-mass priors. When
these constraints are not identified by available data, however, no such
testing is possible. In response, implausible constraints can be relaxed into
penalty functions derived from plausible prior distributions. The resulting
models can be fit within familiar full or partial likelihood frameworks. The
absence of identification renders all analyses part of a sensitivity analysis.
In this view, results from single models are merely examples of what might be
plausibly inferred. Nonetheless, just one plausible inference may suffice to
demonstrate inherent limitations of the data. Points are illustrated with
misclassified data from a study of sudden infant death syndrome. Extensions to
confounding, selection bias and more complex data structures are outlined.
Comment: Published at http://dx.doi.org/10.1214/09-STS291 in Statistical
Science (http://www.imstat.org/sts/) by the Institute of Mathematical
Statistics (http://www.imstat.org).
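The idea of relaxing an implausible equality constraint into a penalty can be shown with a deliberately simple normal model; this is our illustration, not the paper's sudden infant death syndrome analysis. Suppose the data identify only theta + delta, where theta is the target log odds ratio and delta a bias. The conventional constraint delta = 0 is a point-mass prior; replacing it with the quadratic penalty -(delta - m)^2 / (2v), the log of a normal prior, and profiling out delta gives the estimate b_obs - m with variance se^2 + v. Because the data carry no information about delta, the interval width is driven by the chosen prior, so results would be reported across several plausible (m, v) settings, in the spirit of the sensitivity analysis described above.

```python
import math


def relaxed_bias_estimate(b_obs, se_obs, bias_mean=0.0, bias_var=0.0, z=1.96):
    """Estimate a target log odds ratio theta when the data identify only
    theta + delta, with b_obs ~ N(theta + delta, se_obs**2).

    bias_var = 0 reproduces the conventional constraint delta = bias_mean.
    bias_var > 0 relaxes it into the penalty -(delta - bias_mean)**2 / (2 * bias_var),
    i.e. a normal prior on delta. Maximising the penalised log-likelihood over
    delta and profiling gives theta_hat = b_obs - bias_mean with variance
    se_obs**2 + bias_var.
    """
    est = b_obs - bias_mean
    se = math.sqrt(se_obs ** 2 + bias_var)
    return est, se, (est - z * se, est + z * se)
```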
Bayesian Methods in Nutrition Epidemiology and Regression-based Predictive Models in Healthcare
This dissertation has two main parts. In the first part, we propose a bivariate nonlinear measurement error model to understand the distribution of dietary intake and extend it to a multivariate model to capture dietary patterns in nutrition epidemiology. In the second part, we propose regression-based predictive models to accurately predict surgery duration in healthcare.
Understanding the distribution of episodically consumed dietary components is an important problem in public health. Short-term measurements of episodically consumed dietary components have zero-inflated, skewed distributions. So-called two-part models have been developed for such data. However, there is much greater public health interest in usual intake adjusted for caloric intake. Recently, a nonlinear mixed-effects model has been developed and fit by maximum likelihood using nonlinear mixed-effects programs. However, the fitting is slow and unstable. In Chapter II, we develop a Monte Carlo-based fitting method. We demonstrate numerically that our methods lead to increased speed of computation, converge to reasonable solutions, and have the flexibility to be used in either a frequentist or a Bayesian manner. Diet consists of numerous foods, nutrients and other components, each of which has distinctive attributes. Increasingly, nutritionists are interested in exploring them collectively to capture overall dietary patterns. We thus extend the bivariate model described in Chapter III to the multivariate level. We use survey-weighted MCMC computations to fit the model, with uncertainty estimation coming from balanced repeated replication. The methodology is illustrated through an application estimating the population distribution of the Healthy Eating Index-2005 (HEI-2005), a multi-component dietary quality index, among children aged 2-8 in the United States.
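The Monte Carlo idea behind turning a two-part model with correlated person-level effects into a usual-intake distribution can be sketched as follows. Everything here is a simplified, hypothetical specification (probit consumption part, log-normal amount part, no energy adjustment, no day-to-day error, illustrative parameter names); it is not the dissertation's model or fitting algorithm. Each draw simulates one person's correlated random effects and converts them into P(consume) x usual amount.

```python
import math
import random


def std_normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def simulate_usual_intake(beta_prob, beta_amount, sd1, sd2, rho, n=10000, seed=0):
    """Monte Carlo draws of usual intake for an episodically consumed food.

    Hypothetical two-part model on the linear-predictor scale:
      P(consume) = Phi(beta_prob + u1)      (probit part)
      amount     = exp(beta_amount + u2)    (log-normal amount part)
    with (u1, u2) bivariate normal, SDs sd1 and sd2, correlation rho.
    Usual intake for a simulated person = P(consume) * amount.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        u1 = sd1 * z1
        u2 = sd2 * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)  # correlated with u1
        draws.append(std_normal_cdf(beta_prob + u1) * math.exp(beta_amount + u2))
    return draws
```

Sorting the returned draws gives percentiles of the simulated usual-intake distribution.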
The second part of this dissertation aims to predict surgery duration accurately. Prior research has identified current procedural terminology (CPT) codes as the most important factor in predicting surgical case durations, but there has been little reporting of a general predictive methodology that uses them effectively. In Chapter IV, we propose two regression-based predictive models. However, the naively constructed design matrix is singular, so we devise a systematic procedure to construct a full-rank design matrix. Using surgical data from a central Texas hospital, we compare the proposed models with a few benchmark methods and demonstrate that our models lead to a remarkable reduction in prediction errors.
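A design matrix becomes singular when some columns are linear combinations of others, for example when indicator columns for mutually exclusive categories (say, hypothetical CPT-code groups) sum to the intercept column. The dissertation's own procedure is not described here, so the following is only a generic, hypothetical sketch: it scans columns in order and keeps each one only if it increases the rank, using exact rational arithmetic to avoid floating-point tolerance issues.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a matrix (list of rows) by exact Gauss-Jordan elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    ncols = len(m[0]) if m else 0
    rank = 0
    for col in range(ncols):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column: it depends on earlier columns
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def full_rank_columns(design):
    """Indices of columns kept by a greedy scan that retains a column only
    if it is linearly independent of the columns already kept."""
    keep = []
    for j in range(len(design[0])):
        candidate = keep + [j]
        sub = [[row[c] for c in candidate] for row in design]
        if matrix_rank(sub) == len(candidate):
            keep.append(j)
    return keep
```

With an intercept and two indicators that always sum to one, the third column is dropped, which matches the familiar reference-category coding.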