89 research outputs found
Estimation of a multivariate mean under model selection uncertainty
Model selection uncertainty arises when a model is selected from one data set and then used for statistical inference, because the "correct" model is not selected with certainty. When selection and inference are based on the same data set, additional problems arise from the correlation between the two stages (selection and inference). In this paper model selection uncertainty is considered and model averaging is proposed. The proposal is related to the theory of James and Stein on estimating three or more parameters from independent normal observations. We suggest that a model averaging scheme that takes the selection procedure into account could be more appropriate than model selection alone. Some properties of this model averaging estimator are investigated; in particular, we show using Stein's results that it is a minimax estimator and can outperform Stein-type estimators.
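As background for the James and Stein connection mentioned above, here is a minimal sketch of the classical positive-part James-Stein estimator for a normal mean vector with known variance, shrinking toward the origin. It is not the model averaging estimator proposed in the paper; the function names `james_stein` and `compare_risk` and the simulation settings are assumptions made for illustration.

```python
import random


def james_stein(x, sigma2=1.0):
    """Positive-part James-Stein estimate from one draw x ~ N(theta, sigma2 * I)."""
    p = len(x)
    if p < 3:
        raise ValueError("James-Stein shrinkage requires at least 3 dimensions")
    norm2 = sum(v * v for v in x)
    if norm2 == 0.0:
        return [0.0] * p
    # Shrinkage factor, truncated at zero (the "positive-part" variant).
    shrink = max(0.0, 1.0 - (p - 2) * sigma2 / norm2)
    return [shrink * v for v in x]


def squared_error(estimate, theta):
    return sum((e - t) ** 2 for e, t in zip(estimate, theta))


def compare_risk(theta, n_rep=5000, seed=0):
    """Monte Carlo risk of the raw observation versus the James-Stein estimate."""
    rng = random.Random(seed)
    raw_loss = js_loss = 0.0
    for _ in range(n_rep):
        x = [rng.gauss(t, 1.0) for t in theta]
        raw_loss += squared_error(x, theta)
        js_loss += squared_error(james_stein(x), theta)
    return raw_loss / n_rep, js_loss / n_rep
```

Calling `compare_risk([0.5] * 10)` returns the simulated quadratic risk of both estimators for a hypothetical ten-dimensional mean; the shrinkage estimate should show the smaller value.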
Modeling hierarchical relationships in epidemiological studies: a Bayesian networks approach
Hierarchical relationships between risk factors are seldom taken into account in epidemiological studies, although some authors have stressed the importance of doing so and proposed a conceptual framework in which each level of the hierarchy is modeled separately. The objective of this paper was to implement a simple version of their framework and to propose an alternative procedure based on a Bayesian network (BN). These approaches were illustrated by modeling the risk of diarrhea infection for 2740 children aged 0 to 59 months in Cameroon. The authors implemented a (naïve) logistic regression, a step-level logistic regression and a BN. While the first approach is inadequate, the two other approaches both account for the hierarchical structure but lead to different estimates and interpretations. The BN implementation showed that a child in a family in the poorest group has probabilities of 89%, 40% and 18% of having poor sanitation, being malnourished and having diarrhea, respectively. An advantage of the BN approach is that it enables one to determine the probability that a risk factor (and/or the outcome) is in a given state, given the states of the others. Although the BN considered here is very simple, the method can deal with more complicated models.
Keywords: Bayesian networks; hierarchical model; diarrhea infection; disease determinants; logistic regression
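The kind of query described above (the probability of one node given the states of others) can be sketched for a small discrete network by brute-force enumeration. The structure below (wealth influencing sanitation and malnutrition, which in turn influence diarrhea) and every number in the tables are hypothetical placeholders chosen for illustration; they are not the network or the estimates from the study.

```python
from itertools import product

# Hypothetical conditional probability tables (placeholders, NOT study estimates).
P_WEALTH = {"poorest": 0.2, "other": 0.8}
P_POOR_SANITATION = {"poorest": 0.9, "other": 0.3}  # P(poor sanitation | wealth)
P_MALNOURISHED = {"poorest": 0.4, "other": 0.2}     # P(malnourished | wealth)
P_DIARRHEA = {                                      # P(diarrhea | poor_san, malnourished)
    (True, True): 0.30,
    (True, False): 0.20,
    (False, True): 0.15,
    (False, False): 0.05,
}


def joint(wealth, poor_san, malnourished, diarrhea):
    """Joint probability of one full assignment, as a product of the local tables."""
    p = P_WEALTH[wealth]
    p *= P_POOR_SANITATION[wealth] if poor_san else 1 - P_POOR_SANITATION[wealth]
    p *= P_MALNOURISHED[wealth] if malnourished else 1 - P_MALNOURISHED[wealth]
    p_d = P_DIARRHEA[(poor_san, malnourished)]
    p *= p_d if diarrhea else 1 - p_d
    return p


def query(target, evidence):
    """P(target | evidence) by enumerating all states; both arguments are dicts."""
    names = ("wealth", "poor_san", "malnourished", "diarrhea")
    domains = (tuple(P_WEALTH), (True, False), (True, False), (True, False))
    numerator = denominator = 0.0
    for values in product(*domains):
        state = dict(zip(names, values))
        if any(state[k] != v for k, v in evidence.items()):
            continue  # state contradicts the evidence
        p = joint(**state)
        denominator += p
        if all(state[k] == v for k, v in target.items()):
            numerator += p
    return numerator / denominator
```

For example, `query({"diarrhea": True}, {"wealth": "poorest"})` gives the outcome probability for the poorest group under these placeholder tables, and `query({"poor_san": True}, {"diarrhea": True, "wealth": "poorest"})` reverses the direction of reasoning, which a separate regression per level does not provide directly.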
Multidimensional Nature of Undernutrition: A Statistical Approach
The statistical assessment of undernutrition is usually restricted to a pairwise analysis of anthropometric indicators. The main objective of this study was to model the associations between underweight, stunting and wasting and to check whether the multidimensionality of undernutrition can be justified from a purely statistical point of view. 3742 children aged 0 to 59 months were enrolled in a cross-sectional household survey (2004 Cameroon Demographic and Health Surveys (DHS)). The saturated loglinear model and the multiple correspondence analysis (MCA) showed no three-factor interaction and a highly significant association between underweight and stunting (P=0) and between underweight and wasting (P=0), but not between stunting and wasting (P=0.430). Cronbach's alpha coefficient between weight-for-age, height-for-age and weight-for-height was 0.62 (95% CI 0.59, 0.64). Thus, the study of these associations is not as straightforward as it might first appear. The lack of three-factor interaction and the value of Cronbach's alpha coefficient indicate that undernutrition is indeed (statistically) multidimensional. The three indicators are not statistically redundant; thus, for Cameroon, the choice of a particular anthropometric indicator should depend on the goal of the policy maker, since this study shows that no single indicator is suitable for all situations.
Keywords: Stunting; Wasting; Underweight; anthropometric measures; Z-score; Loglinear models
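For readers unfamiliar with the reliability coefficient reported above, the following is a minimal sketch of the standard Cronbach's alpha formula, alpha = k/(k-1) * (1 - sum of item variances / variance of the total score). The function name `cronbach_alpha` and the input layout (one list of scores per indicator) are assumptions; the study's own computation and its confidence interval are not reproduced here.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for k indicators, each given as an equal-length list of scores."""
    k = len(items)
    if k < 2:
        raise ValueError("at least two indicators are required")
    n = len(items[0])
    if any(len(col) != n for col in items):
        raise ValueError("all indicators must have the same number of observations")
    # Total score per subject (for example, per child) across the k indicators.
    totals = [sum(row) for row in zip(*items)]
    item_variance_sum = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))
```

With three lists of z-scores (weight-for-age, height-for-age, weight-for-height) for the same children, `cronbach_alpha([waz, haz, whz])` returns the coefficient; values well below 1 suggest the indicators do not measure a single common dimension.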
Estimating and Correcting the Effects of Model Selection Uncertainty
Most statistical analyses are carried out without knowledge of the true model; that is, the model that generated the data is unknown, and the data are first used to select a model from a set of plausible models by means of a model selection criterion. The data are then usually used to draw conclusions about some variables. In doing so, model uncertainty, i.e. the fact that the model selection step was carried out with the same data, is ignored, even though this is known to lead to invalid conclusions. This thesis investigates several aspects of the problem from both the Bayesian and the frequentist point of view and makes new proposals for dealing with it. We examine Bayesian model averaging (BMA) and show that its frequentist performance is not always well defined, because in some cases it is unclear whether BMA is truly Bayesian. We illustrate this point with a "fully Bayesian model averaging" that is applicable when the quantity of interest is parametric. We present a framework that exposes the complexity of estimators obtained after model selection ("post-model-selection estimators") and investigate their properties in the context of linear regression for a variety of model selection procedures. We show that no model selection criterion is uniformly better than all others in terms of the risk function. Key ingredients of the problem are identified and used to show that even consistent model selection criteria do not solve the problem of model selection uncertainty. We also argue that conditioning the analysis on the subset of the sample space that led to a particular model is incomplete. We consider the problem from the frequentist point of view. Although model averaging and model selection are usually regarded as two separate approaches, we propose regarding the latter as a special case of model averaging in which the (random) weights take the value 1 for the selected model and 0 for all others. From this perspective, and since the optimal weights cannot be determined in practice, neither of the two methods can be expected to outperform the other consistently. This leads us to propose alternative weights for averaging that are intended to improve post-model-selection estimation. The innovation consists in taking the model selection procedure into account when determining the weights. We compare the different methods for some simple cases (linear regression and frequency estimation). We show that bootstrap methods do not provide good estimators of the properties of post-model-selection estimators. Returning to the Bayesian view, we point out that as long as the analysis is conditional on the data, model selection uncertainty is not a problem, only the uncertainty of the model itself. However, if one is interested in the frequentist properties of Bayesian post-model-selection estimators, the situation is analogous to that in frequentist analysis. Here we again propose an alternative to the usual BMA in which the weights depend on the model selection criteria and thus take the selection procedure into account. We also show that the properties of model averaging and post-model-selection estimators can only be derived under an assumed true model. Under such an assumption, however, one would simply use the true model without applying model selection or model averaging. This circularity is what makes the problem so difficult to deal with. Traditional exploratory frequentist data analysis and model building can be regarded as an informal model selection in which the exact selection procedure is hard to reconstruct, which makes it particularly difficult to draw valid conclusions. Without entering into the debate over the advantages and disadvantages of Bayesian and frequentist methods, we wish to emphasize that Bayesian methods are preferable for avoiding model selection uncertainty as long as the frequentist properties of the resulting estimator are not of interest.
Using weight-for-age for predicting wasted children
Background: Scales for measuring body weight are more common in Cameroon health centres than measuring boards for height. Even when the latter exist, their quality is often inadequate; thus the height measurement is not always available or accurate. Objective: To construct statistical models for predicting wasting from weight-for-age. Methods: 3742 children aged 0 to 59 months were enrolled in a cross-sectional household survey (2004 Cameroon Demographic and Health Surveys (DHS)) covering the entire national territory of Cameroon. Results: There was a highly significant association between underweight and wasting. For all discriminant statistical methods used, the test error rates (using an independent testing sample) were less than 5%; the area under the receiver operating characteristic (ROC) curve (AUC) was 0.86. Conclusions: Weight-for-age can be used to accurately classify a child whose wasting status is unknown. The result is useful in Cameroon, where height measurement is too often not feasible, hence the need to predict wasting from other measurements.
Keywords: Anthropometric measures; nutritional status; discriminant analysis; underweight; wasting
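The AUC quoted above can be read as the probability that a randomly chosen wasted child receives a higher risk score than a randomly chosen non-wasted child. A minimal sketch of that pairwise (Mann-Whitney) computation follows; the function name `roc_auc` is an assumption, and using the negated weight-for-age z-score as the risk score is an illustrative choice, not necessarily the discriminant rule used in the study.

```python
def roc_auc(scores, labels):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count one half."""
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

Given a list `waz` of weight-for-age z-scores and a list `wasted` of 0/1 labels from a held-out sample (both hypothetical names), `roc_auc([-z for z in waz], wasted)` scores lower weight-for-age as higher risk. The quadratic pair loop is kept for clarity; a rank-based version would be preferable for large samples.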
Post-model selection inference and model averaging
Although model selection is routinely used in practice nowadays, little is known about its precise effects on any subsequent inference that is carried out. The same goes for the effects induced by the closely related technique of model averaging. This paper is concerned with the use of the same data first to select a model and then to carry out inference, in particular point estimation and point prediction. The properties of the resulting estimator, called a post-model-selection estimator (PMSE), are hard to derive. Using selection criteria such as hypothesis testing, AIC, BIC, HQ and Cp, we illustrate that, in terms of risk function, no single PMSE dominates the others. The same conclusion holds more generally for any penalised likelihood information criterion. We also compare various model averaging schemes and show that no single one dominates the others in terms of risk function. Since PMSEs can be regarded as a special case of model averaging with 0-1 random weights, we propose a connection between the two theories, in the frequentist approach, by taking account of the selection procedure when performing model averaging. We illustrate the point by simulating a simple linear regression model.
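The view of a PMSE as model averaging with 0-1 weights can be sketched for a simple linear regression with two candidate models (intercept only, and intercept plus slope). The smooth alternative shown uses the standard Akaike weights, which is a swapped-in illustration and not the selection-aware weighting proposed in the paper; all function names are assumptions.

```python
import math


def fit_models(x, y):
    """Slope estimates and Gaussian AIC values for the two candidate models."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    syy = sum((yi - y_bar) ** 2 for yi in y)
    slope = sxy / sxx
    rss0 = syy                # residual sum of squares, intercept-only model
    rss1 = syy - slope * sxy  # residual sum of squares, model with slope
    # AIC up to an additive constant; parameter counts include the error variance.
    aic = [n * math.log(rss0 / n) + 2 * 2, n * math.log(rss1 / n) + 2 * 3]
    return [0.0, slope], aic


def selection_weights(aic):
    """0-1 weights: the selected (minimum-AIC) model gets weight 1, all others 0."""
    best = min(range(len(aic)), key=aic.__getitem__)
    return [1.0 if i == best else 0.0 for i in range(len(aic))]


def akaike_weights(aic):
    """Smooth weights proportional to exp(-AIC difference / 2); not the paper's weights."""
    lowest = min(aic)
    raw = [math.exp(-0.5 * (a - lowest)) for a in aic]
    total = sum(raw)
    return [r / total for r in raw]


def combine(estimates, weights):
    """Weighted average of the per-model estimates."""
    return sum(w * e for w, e in zip(weights, estimates))
```

After `slopes, aic = fit_models(x, y)`, the PMSE of the slope is `combine(slopes, selection_weights(aic))` and a smooth model-averaged estimate is `combine(slopes, akaike_weights(aic))`. Both sets of weights are functions of the data and therefore random, which is why the sampling properties of either estimator are hard to derive.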