5 research outputs found

    Model selection methods in the linear mixed model for longitudinal data

    The increased use of repeated measures in longitudinal studies has created a need for more research on modeling this type of data. In this dissertation, we extend three candidate model selection methods from the univariate linear model to the linear mixed model and investigate their behavior. Mallows' Cp statistic was developed for the univariate linear model in 1964. Here we propose a Cp statistic for the linear mixed model and show that it can be a promising method for fixed effects selection. Of all the methods investigated in this dissertation, the Cp statistic gave the most favorable results for fixed effects selection and was the least computationally demanding. The KIC statistic, a symmetric divergence information criterion, appears to be promising as a model selection method for both fixed effects and covariance structure. In selecting the correct covariance structure, the KIC tended to hold middle ground between the AIC and the BIC. In selecting fixed effects, the KIC appears to perform significantly better than either the AIC or the BIC when no interaction effect is present. The predicted sum of squares (PRESS) statistic has been developed for the linear mixed model and is available in the SAS statistical software, but its abilities as a model selection method had not been sufficiently evaluated. From our study, it appears that the PRESS statistic adds little as a fixed effects selection method compared to the Cp or the KIC, while being more computationally intensive. All three criteria are investigated using simulation studies and a large example dataset evaluating health outcomes in the elderly to determine their reliability. As a by-product of this research, the reliability of standard selection criteria in the linear mixed model, namely the AIC and BIC, is also evaluated. 
Numerous areas of future research on model selection methods in the linear mixed model are identified.
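
The "middle ground" behavior of the KIC can be illustrated with the generic large-sample forms of the three criteria, which differ only in their penalty on the number of parameters k: 2k for the AIC, 3k for the KIC (the form given by Cavanaugh, 1999), and k ln(n) for the BIC. The sketch below is not the mixed-model version studied in the dissertation; the candidate names, log-likelihoods, and sample size are hypothetical values chosen for illustration.

```python
import math

def aic(loglik, k):
    # Akaike information criterion: penalty of 2 per parameter
    return -2.0 * loglik + 2.0 * k

def kic(loglik, k):
    # Symmetric-divergence criterion (large-sample form): penalty of 3 per parameter
    return -2.0 * loglik + 3.0 * k

def bic(loglik, k, n):
    # Bayesian information criterion: penalty of ln(n) per parameter
    return -2.0 * loglik + k * math.log(n)

# Hypothetical candidates: name -> (maximized log-likelihood, number of parameters)
candidates = {
    "small": (-520.0, 4),
    "medium": (-515.5, 6),
    "large": (-514.8, 9),
}
n = 200  # hypothetical sample size

best_aic = min(candidates, key=lambda m: aic(*candidates[m]))
best_kic = min(candidates, key=lambda m: kic(*candidates[m]))
best_bic = min(candidates, key=lambda m: bic(*candidates[m], n))

for name, (loglik, k) in candidates.items():
    print(name, round(aic(loglik, k), 2), round(kic(loglik, k), 2),
          round(bic(loglik, k, n), 2))
print("selected:", best_aic, best_kic, best_bic)
```

Because ln(n) exceeds 3 once n is larger than about 20, the KIC penalty sits between the AIC and BIC penalties for all but very small samples.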

    Lag length selection for vector error correction models

    This thesis investigates the problem of model identification in a vector autoregressive framework. The study reviews the existing research and conducts an extensive simulation-based analysis of thirteen information-theoretic criteria (IC), one of which is a novel derivation. The simulation exercise evaluates seven alternative error restricted vector autoregressive models with four different lag lengths. Alternative sample sizes and parameterisations are also evaluated and compared with results in the existing literature. The results of the comparative analysis provide strong support for the efficiency-based criterion of Akaike. In particular, the novel criterion, referred to as a modified corrected Akaike information criterion, demonstrates useful finite-sample properties in its selection capability.
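
As a rough illustration of lag length selection by information criteria, the sketch below uses the common textbook forms for a K-variable VAR with T observations: each criterion adds a penalty proportional to the pK² lag coefficients to the log-determinant of the residual covariance matrix. These are not the thirteen criteria compared in the thesis, and the modified corrected Akaike criterion is not reproduced here. The function name `select_lag` and the log-determinant values are hypothetical.

```python
import math

def select_lag(logdets, T, K):
    """Return the lag chosen by AIC, HQ, and SC.

    logdets maps lag p -> ln det of the residual covariance of a VAR(p)
    (hypothetical inputs here; in practice they come from fitted models).
    """
    penalties = {
        "AIC": 2.0 / T,
        "HQ": 2.0 * math.log(math.log(T)) / T,
        "SC": math.log(T) / T,
    }
    chosen = {}
    for name, c in penalties.items():
        # criterion(p) = ln det(Sigma_p) + c * p * K^2
        scores = {p: ld + c * p * K * K for p, ld in logdets.items()}
        chosen[name] = min(scores, key=scores.get)
    return chosen

# Hypothetical log-determinants for lags 1..4 of a bivariate system
logdets = {1: -1.00, 2: -1.12, 3: -1.22, 4: -1.27}
result = select_lag(logdets, T=100, K=2)
print(result)
```

With these illustrative inputs the lighter AIC penalty favors a longer lag than the Schwarz criterion, which is the usual trade-off between efficiency-based and consistency-based criteria.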

    Uncertainty Assessment of Hydrogeological Models Based on Information Theory

    There is a great deal of uncertainty in hydrogeological modeling. Overparametrized models increase uncertainty because the information in the observations is distributed across all of the parameters. The present study proposes a new option to reduce this uncertainty. One way to achieve this goal is to select a model that performs well with as few calibrated parameters as possible (a parsimonious model) and to calibrate it using many sources of information. Akaike's Information Criterion (AIC), proposed by Hirotugu Akaike in 1973, is a statistical-probabilistic criterion based on information theory that allows us to select a parsimonious model. AIC formulates the problem of parsimonious model selection as an optimization problem across a set of proposed conceptual models. The AIC assessment is relatively new in groundwater modeling, and applying it with different sources of observations is a challenge. In this dissertation, important findings on the application of AIC in hydrogeological modeling using different sources of observations are discussed. AIC is tested on groundwater models using three sets of synthetic data: hydraulic pressure, horizontal hydraulic conductivity, and tracer concentration. The impact of the following factors is analyzed: number of observations, types of observations, and order of calibrated parameters. These analyses reveal not only that the number of observations determines how complex a model can be but also that the diversity of the observations allows for further complexity in the parsimonious model. However, a truly parsimonious model was only achieved when the order of calibrated parameters was properly considered. This means that parameters which provide bigger improvements in model fit should be considered first. 
The approach of obtaining a parsimonious model by applying AIC with different types of information was successfully applied to an unbiased lysimeter model using two different types of real data: evapotranspiration and seepage water. This additional independent model assessment made it possible to underpin the general validity of this AIC approach.
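
The effect of parameter ordering can be sketched with a toy calculation. It assumes the Gaussian least-squares form of AIC, n ln(RSS/n) + 2k up to an additive constant, and it assumes that each parameter reduces the residual sum of squares by a fixed, independent amount. The parameter names, reductions, and sample size are hypothetical and are not taken from the dissertation, whose procedure may differ.

```python
import math

def aic_least_squares(rss, n, k):
    # Gaussian least-squares AIC, up to an additive constant
    return n * math.log(rss / n) + 2 * k

def best_size_along_path(order, reductions, rss0, n):
    """Add parameters in the given order and return the number of
    parameters at which AIC is lowest along that path."""
    rss = rss0
    scores = [aic_least_squares(rss, n, 0)]
    for k, name in enumerate(order, start=1):
        rss -= reductions[name]  # assumed independent improvement in fit
        scores.append(aic_least_squares(rss, n, k))
    return scores.index(min(scores))

# Hypothetical reductions in residual sum of squares per calibrated parameter
reductions = {"conductivity": 60.0, "recharge": 25.0,
              "porosity": 0.2, "dispersivity": 0.1}
rss0, n = 100.0, 30

# Largest improvements first versus smallest improvements first
good_order = sorted(reductions, key=reductions.get, reverse=True)
poor_order = sorted(reductions, key=reductions.get)

k_good = best_size_along_path(good_order, reductions, rss0, n)
k_poor = best_size_along_path(poor_order, reductions, rss0, n)
print(k_good, k_poor)
```

In this toy setting the path that adds the most influential parameters first reaches its lowest AIC with two parameters, while the reversed path only reaches its lowest AIC after all four have been calibrated.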

    Optimization of model selection criteria for small sample sizes

    In this work we propose a model selection criterion based on Kullback's symmetric divergence. The developed criterion, called KICc, is a bias-corrected version of the asymptotic criterion KIC (Cavanaugh, Statistics and Probability Letters, vol. 42, 1999). 
The correction is of particular use when the sample size is small or when the number of fitted parameters is a moderate to large fraction of the sample size. KICc is an exactly unbiased estimator for linear regression models and approximately unbiased for autoregressive and nonlinear regression models. The two criteria KIC and KICc are developed under the assumption that the true model is correctly specified or overfitted by the candidate models. We investigate the bias properties and the model selection performance of the two criteria in the underfitted case. An extension of KICc, called PKIC, is also developed for the case of a future experiment where the data of interest are missing or indirectly observed. The KICc is applied to the problem of denoising by orthogonal projection and thresholding. The threshold is obtained as the absolute value of the kth largest coefficient that minimizes KICc. Finally, we propose a computational optimization of a cross-validation-based model selection criterion that uses the Bayesian predictive density as candidate model and marginal likelihood as a cost function. The developed criterion, CVBPD, is a consistent model selection criterion for linear regression.
ORSAY-PARIS 11-BU Sciences (914712101) / Sudoc, France