Model selection methods in the linear mixed model for longitudinal data
The increased use of repeated measures in longitudinal studies has created a need for more research on modeling this type of data. In this dissertation, we extend three candidate model selection methods from the univariate linear model to the linear mixed model and investigate their behavior. Mallows' Cp statistic was developed for the univariate linear model in 1964. Here we propose a Cp statistic for the linear mixed model and show that it can be a promising method for fixed-effects selection. Of all the methods investigated in this dissertation, the Cp statistic gave the most favorable results for fixed-effects selection and is the least computationally demanding of the candidate methods. The KIC statistic, a symmetric-divergence information criterion explored here, appears promising as a selection method for both fixed effects and covariance structure. In selecting the correct covariance structure, the KIC tended to hold middle ground between the AIC and the BIC. For fixed effects, the KIC appears to perform significantly better than either the AIC or the BIC when no interaction effect is present. The predicted sum of squares (PRESS) statistic has been developed for the linear mixed model and is available in the SAS statistical software, but its abilities as a model selection method had not been sufficiently evaluated. From our study, it appears that the PRESS statistic adds little as a fixed-effects selection method compared to the Cp or the KIC while being more computationally intensive. All three criteria are investigated using simulation studies and a large example dataset on health outcomes in the elderly to determine their reliability. As a by-product of this research, the reliability of the standard selection criteria in the linear mixed model, namely the AIC and BIC, is also evaluated.
Numerous areas of future research within the context of model selection methods in the linear mixed model are identified
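The criteria compared above each have a closed form in the ordinary (univariate) linear model. The sketch below is illustrative only: it uses synthetic data, assumes Gaussian errors, and takes Cavanaugh's symmetric-divergence form KIC = -2 log L + 3p as the KIC; the dissertation's mixed-model extensions are not reproduced here.

```python
import math
import random

random.seed(0)
n = 60
x = [i / n for i in range(n)]
# Synthetic data: the true model has a slope, so the slope model should win.
y = [2.0 + 3.0 * xi + random.gauss(0.0, 0.5) for xi in x]

def rss_mean_only(y):
    """Residual sum of squares for the intercept-only model."""
    m = sum(y) / len(y)
    return sum((yi - m) ** 2 for yi in y)

def rss_slope(x, y):
    """Residual sum of squares for simple linear regression (closed form)."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    a = my - b * mx
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

def criteria(rss, p, n, sigma2_full):
    """Cp, AIC, BIC, and (assumed symmetric-divergence) KIC for one candidate."""
    # Gaussian log-likelihood at the MLE, up to the usual constants.
    loglik = -0.5 * n * (math.log(2 * math.pi * rss / n) + 1)
    cp = rss / sigma2_full - n + 2 * p          # Mallows' Cp
    aic = -2 * loglik + 2 * p
    bic = -2 * loglik + p * math.log(n)
    kic = -2 * loglik + 3 * p                   # Cavanaugh's KIC (assumption)
    return cp, aic, bic, kic

rss0, rss1 = rss_mean_only(y), rss_slope(x, y)
sigma2_full = rss1 / (n - 2)   # error variance estimated from the largest model
crit0 = criteria(rss0, 1, n, sigma2_full)   # intercept-only
crit1 = criteria(rss1, 2, n, sigma2_full)   # intercept + slope
```

Note that for the largest model, Cp reduces algebraically to p, which is a quick sanity check on the implementation.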
Predictive Inference Based on Markov Chain Monte Carlo Output
In Bayesian inference, predictive distributions are typically available only in the form of samples generated via Markov chain Monte Carlo or related algorithms. In this paper, we conduct a systematic analysis of how to make and evaluate probabilistic forecasts from such simulation output. Based on proper scoring rules, we develop a notion of consistency that allows us to assess the adequacy of methods for estimating the stationary distribution underlying the simulation output. We then provide asymptotic results that account for the salient features of Bayesian posterior simulators and derive conditions under which choices from the literature satisfy our notion of consistency. Importantly, these conditions depend on the scoring rule being used, so that the choices of approximation method and scoring rule are intertwined. While the logarithmic rule requires fairly stringent conditions, the continuous ranked probability score yields consistent approximations under minimal assumptions. These results are illustrated in a simulation study and an economic data example. Overall, mixture-of-parameters approximations that exploit the parametric structure of Bayesian models perform particularly well. Under the continuous ranked probability score, the empirical distribution function is a simple and appealing alternative option
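The empirical-distribution-function option under the continuous ranked probability score can be illustrated with the standard kernel form CRPS(F, y) = E|X - y| - (1/2) E|X - X'|, evaluated on simulated draws standing in for MCMC output. A minimal sketch (the draws and evaluation points are illustrative, not from the paper):

```python
import random

random.seed(1)
# Stand-in for MCMC output: i.i.d. draws from a (here known) predictive distribution.
draws = [random.gauss(0.0, 1.0) for _ in range(4000)]

def crps_ecdf(draws, y):
    """CRPS of the empirical distribution function of the draws at outcome y.

    Uses the kernel form CRPS(F, y) = E|X - y| - 0.5 * E|X - X'|.
    """
    m = len(draws)
    term1 = sum(abs(x - y) for x in draws) / m
    # E|X - X'| over all pairs, computed in O(m log m) via the sorted-sample identity.
    s = sorted(draws)
    acc, running = 0.0, 0.0
    for i, xi in enumerate(s):
        acc += i * xi - running   # sum of (xi - xk) over k < i
        running += xi
    term2 = 2.0 * acc / (m * m)
    return term1 - 0.5 * term2

score_center = crps_ecdf(draws, 0.0)   # outcome near the predictive center
score_tail = crps_ecdf(draws, 3.0)     # outcome in the tail scores worse
```

A lower CRPS is better, so an outcome well inside the predictive distribution should score lower than one in its tail.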
Bayesian model averaging on hydraulic conductivity estimation and groundwater head prediction
Characterization of aquifer heterogeneity is inherently difficult because of the insufficiency of data and the inflexibility and non-uniqueness of parameterization methods. Groundwater predictions are greatly affected by multiple interpretations of aquifer properties and by the uncertainties of model parameters. This study introduces a Bayesian model averaging (BMA) method along with multiple generalized parameterization (GP) methods to identify hydraulic conductivity, and along with multiple simulation models to predict groundwater head and quantify the prediction uncertainty. Two major issues with BMA are discussed. The first concerns the use of Occam's window in typical BMA applications: Occam's window accepts models only in a very narrow range, tending to single out the best model and discard other good ones. A variance window is proposed to replace Occam's window to cope with this problem. The second concerns the use of the Kashyap information criterion (KIC) in the approximation of posterior model probabilities, which tends to prefer highly uncertain models because it incorporates the Fisher information matrix. The Bayesian information criterion (BIC) is recommended because it avoids controversial results and is computationally efficient. Numerical examples are designed to test the Bayesian model averaging method on hydraulic conductivity identification and groundwater head prediction. The proposed methodologies are then applied to hydraulic conductivity identification in the Alamitos Gap area, and to hydraulic conductivity estimation and groundwater head prediction for the "1,500-foot" sand in East Baton Rouge Parish, Louisiana. The results show that the GP method provides great flexibility in parameterization with small conditional variance. The use of the variance window is necessary to avoid a dominant model when many models perform equally well.
Compared to the KIC, the BIC is able to give an unbiased posterior model probability. It is also concluded that the stated uncertainty increases when multiple models are included under the BMA framework, but risk is reduced by avoiding overconfidence in the solution of any single model
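The BIC-based approximation to posterior model probabilities discussed above is commonly computed from BIC differences as w_k ∝ exp(-ΔBIC_k / 2), and the BMA prediction is the weight-averaged prediction across models. A minimal sketch with hypothetical BIC values and predictions (not from this study):

```python
import math

# Hypothetical BIC values for three competing conceptual models (illustration only).
bics = {"model_A": 1052.3, "model_B": 1049.8, "model_C": 1060.1}

def bma_weights(bics):
    """Approximate posterior model probabilities from BIC differences:
    w_k proportional to exp(-dBIC_k / 2), with dBIC_k = BIC_k - min BIC."""
    b_min = min(bics.values())
    raw = {k: math.exp(-(v - b_min) / 2.0) for k, v in bics.items()}
    z = sum(raw.values())
    return {k: r / z for k, r in raw.items()}

weights = bma_weights(bics)

# BMA prediction: each model's (hypothetical) head prediction, weight-averaged.
preds = {"model_A": 12.1, "model_B": 11.7, "model_C": 13.0}
bma_pred = sum(weights[k] * preds[k] for k in preds)
```

The weights sum to one, and the BMA prediction always lies within the range spanned by the individual models' predictions.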
Discriminative, generative, and imitative learning
Thesis (Ph.D.)--Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2002. Includes bibliographical references (leaves 201-212).
I propose a common framework that combines three different paradigms in machine learning: generative, discriminative and imitative learning. A generative probabilistic distribution is a principled way to model many machine learning and machine perception problems. Therein, one provides domain-specific knowledge in terms of structure and parameter priors over the joint space of variables. Bayesian networks and Bayesian statistics provide a rich and flexible language for specifying this knowledge and subsequently refining it with data and observations. The final result is a distribution that is a good generator of novel exemplars. Conversely, discriminative algorithms adjust a possibly non-distributional model to data, optimizing for a specific task such as classification or prediction. This typically leads to superior performance yet compromises the flexibility of generative modeling. I present Maximum Entropy Discrimination (MED) as a framework to combine both discriminative estimation and generative probability densities. Calculations involve distributions over parameters, margins, and priors and are provably and uniquely solvable for the exponential family. Extensions include regression, feature selection, and transduction. SVMs are also naturally subsumed and can be augmented with, for example, feature selection to obtain substantial improvements. To extend to mixtures of exponential families, I derive a discriminative variant of the Expectation-Maximization (EM) algorithm for latent discriminative learning (or latent MED). While EM and Jensen lower-bound the log-likelihood, a dual upper bound is made possible via a novel reverse-Jensen inequality.
The variational upper bound on the latent log-likelihood has the same form as the EM bounds, is efficiently computable, and is globally guaranteed. It permits powerful discriminative learning with the wide range of contemporary probabilistic mixture models (mixtures of Gaussians, mixtures of multinomials, and hidden Markov models). We provide empirical results on standardized datasets that demonstrate the viability of the hybrid discriminative-generative approaches of MED and reverse-Jensen bounds over state-of-the-art discriminative techniques or generative approaches. Subsequently, imitative learning is presented as another variation on generative modeling which also learns from exemplars from an observed data source. However, the distinction is that the generative model is an agent interacting in a much more complex surrounding external world, where it is not efficient to model the aggregate space generatively. I demonstrate that imitative learning (under appropriate conditions) can be adequately addressed as a discriminative prediction task which outperforms the usual generative approach. This discriminative-imitative learning approach is applied with a generative perceptual system to synthesize a real-time agent that learns to engage in social interactive behavior.
by Tony Jebara, Ph.D
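The Jensen lower bound on the mixture log-likelihood that EM exploits can be checked numerically: for any responsibilities q, log Σ_k π_k f_k(x) ≥ Σ_k q_k log(π_k f_k(x) / q_k), with equality exactly when q equals the posterior responsibilities. A small sketch with an assumed two-component Gaussian mixture and made-up observations (the reverse-Jensen upper bound itself is not reproduced here):

```python
import math

def gauss_pdf(x, mu, sigma):
    """Density of a univariate Gaussian."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# An assumed two-component Gaussian mixture and a handful of observations.
pis, mus, sigma = [0.4, 0.6], [-1.0, 2.0], 1.0
data = [-1.2, 0.3, 1.8, 2.5]

def loglik(x):
    """Exact mixture log-likelihood of one observation."""
    return math.log(sum(p * gauss_pdf(x, m, sigma) for p, m in zip(pis, mus)))

def jensen_bound(x, q):
    """Jensen lower bound: sum_k q_k * log(pi_k f_k(x) / q_k) <= loglik(x)."""
    return sum(qk * math.log(p * gauss_pdf(x, m, sigma) / qk)
               for qk, p, m in zip(q, pis, mus) if qk > 0)

def responsibilities(x):
    """Posterior component responsibilities (the E-step quantities)."""
    w = [p * gauss_pdf(x, m, sigma) for p, m in zip(pis, mus)]
    z = sum(w)
    return [wi / z for wi in w]

total_ll = sum(loglik(x) for x in data)
bound_uniform = sum(jensen_bound(x, [0.5, 0.5]) for x in data)          # loose bound
bound_posterior = sum(jensen_bound(x, responsibilities(x)) for x in data)  # tight
```

With the posterior responsibilities the bound is tight, which is exactly why the EM E-step makes the subsequent M-step increase the true log-likelihood.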
Bayesian Saltwater Intrusion Prediction and Remediation Design under Uncertainty
Groundwater resources are vital for sustainable economic and demographic development. Reliable prediction of groundwater head and contaminant transport is necessary for sustainable management of groundwater resources. However, groundwater simulation models are subject to uncertainty in their predictions. The goals of this research are to: (1) quantify the uncertainty in groundwater model predictions and (2) investigate the impact of the quantified uncertainty on aquifer remediation designs. To pursue the first goal, this study generalizes the Bayesian model averaging (BMA) method, introducing the hierarchical Bayesian model averaging (HBMA) method, which segregates and prioritizes sources of uncertainty in a hierarchical structure and conducts BMA for saltwater intrusion prediction. A BMA tree of models is developed to understand the impact of individual sources of uncertainty and of uncertainty propagation on model predictions. The uncertainty analysis using HBMA leads to finding the best modeling proposition and to calculating the relative and absolute model weights. To pursue the second goal, chance-constrained (CC) programming is proposed to deal with uncertainty in the remediation design. Prior studies of CC programming for groundwater remediation design were limited to parameter estimation uncertainty. This study combines CC programming with the BMA and HBMA methods, proposing the BMA-CC and HBMA-CC frameworks, to also include model structure uncertainty in the CC programming. The results show that the prediction variances from parameter estimation uncertainty are much smaller than those from model structure uncertainty. Ignoring model structure uncertainty in the remediation design may lead to overestimating the design reliability, which can cause design failure
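The chance-constrained idea described above has a standard deterministic equivalent when the predicted head is treated as Gaussian: P(head ≤ h_max) ≥ α becomes μ(q) + z_α·σ ≤ h_max, so a larger prediction variance (as when model structure uncertainty is added to parameter uncertainty) forces a more conservative design. A sketch with made-up numbers, not the study's models; the linear head-versus-pumping-rate response is an assumption for illustration:

```python
import math

def z_quantile(alpha):
    """Standard-normal quantile via bisection on the erf-based CDF (stdlib only)."""
    lo, hi = -10.0, 10.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if 0.5 * (1.0 + math.erf(mid / math.sqrt(2.0))) < alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def min_pumping_rate(mu0, dmu_dq, sigma, h_max, alpha):
    """Deterministic equivalent of P(head <= h_max) >= alpha for a Gaussian head
    prediction whose mean decreases linearly with pumping rate q:
        mu0 - dmu_dq * q + z_alpha * sigma <= h_max."""
    z = z_quantile(alpha)
    return max(0.0, (mu0 + z * sigma - h_max) / dmu_dq)

# Illustrative numbers only: head in meters, pumping rate q in arbitrary units.
# Small sigma ~ parameter uncertainty alone; larger sigma ~ model structure added.
q_param_only = min_pumping_rate(5.0, 0.1, sigma=0.3, h_max=4.0, alpha=0.95)
q_with_model = min_pumping_rate(5.0, 0.1, sigma=0.9, h_max=4.0, alpha=0.95)
```

The larger total variance requires more pumping to meet the same reliability target, mirroring the study's point that ignoring model structure uncertainty overstates design reliability.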
Extended Entropy Maximisation and Queueing Systems with Heavy-Tailed Distributions
Numerous studies of queueing systems, such as Internet traffic flows, have shown them to be bursty, self-similar and/or long-range dependent because of the heavy (long) tails of the various distributions of interest, including intermittent intervals and queue lengths. Other studies have addressed server vacations, taken when there are no customers in the system or when the server fails. These patterns are important for capacity planning, performance prediction, and optimization of networks, and they have a negative impact on effective network functioning. Heavy-tailed distributions have commonly been used by telecommunication engineers to create workloads for simulation studies, which, regrettably, may exhibit peculiar queueing characteristics. To cost-effectively examine the impacts of different network patterns on heavy-tailed queues, new and reliable analytical approaches need to be developed. This thesis establishes a new analytical framework based on optimizing entropy functionals, such as those of Shannon, Rényi, Tsallis, and others suggested within statistical physics and information theory, subject to suitable linear and non-linear system constraints. In both discrete and continuous time domains, new heavy-tailed analytic performance distributions are developed, with a focus on those exhibiting the power-law behaviour seen in many Internet scenarios.
Two major novel approaches are expounded: the unification of information geometry with classical queueing systems, and the unification of information-length theory with transient queueing systems. After the conclusions, open problems and limitations arising from this thesis are identified as future work
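The entropy-maximization principle underlying this framework can be checked in a toy case: with Shannon entropy and a fixed mean queue length, the maximizing distribution over queue lengths is geometric, p(n) = (1-ρ)ρⁿ, and any perturbation that preserves both the normalization and the mean lowers the entropy. A minimal sketch with a truncated support and an illustrative mean (the heavy-tailed, generalized-entropy cases of the thesis are not reproduced here):

```python
import math

def geometric_queue(mean_len, n_max=2000):
    """Max-entropy queue-length distribution for a fixed mean: p(n) = (1-rho) rho^n,
    with rho = mean / (1 + mean), truncated at n_max (tail mass is negligible)."""
    rho = mean_len / (1.0 + mean_len)
    return [(1.0 - rho) * rho ** n for n in range(n_max)]

def shannon_entropy(p):
    """Shannon entropy -sum p log p (natural log)."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

p = geometric_queue(mean_len=1.5)
h_geom = shannon_entropy(p)

# Perturb while preserving both normalization and the mean:
# adding (eps, -2*eps, eps) to states 0, 1, 2 changes the sum by 0
# and the mean by 0*eps - 2*eps + 2*eps = 0.
eps = 1e-2
q = list(p)
q[0] += eps
q[1] -= 2 * eps
q[2] += eps
h_pert = shannon_entropy(q)
```

The perturbed distribution satisfies the same constraints yet has strictly lower entropy, consistent with the geometric form being the constrained maximizer.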
MODELING HETEROTACHY IN PHYLOGENETICS
Heterotachy, the variation of substitution rates across sites and over time, has been shown to be a frequent phenomenon in real data. Failure to model heterotachy can potentially cause phylogenetic artefacts. Currently, several models handle heterotachy: the mixture branch length (MBL) model and several variant forms of the covarion model. In this project, our objective is to find a model that efficiently handles the heterotachous signals present in the data, and thereby improves phylogenetic inference.
In order to achieve our goal, two individual studies were conducted. In the first study, we make comparisons among the MBL, covarion and homotachous models using AIC, BIC and cross-validation. Based on our results, we conclude that the MBL model, in which sites have different branch lengths along the entire tree, is an over-parameterized model. Real data indicate that the heterotachous signals which interfere with phylogenetic inference are generally limited to a small area of the tree. In the second study, we relax the assumption of the homogeneity of the covarion parameters over sites, and develop a mixture covarion model using a Dirichlet process. In order to evaluate different heterogeneous models, we design several posterior predictive discrepancy tests to study different aspects of molecular evolution using stochastic mappings. The posterior predictive discrepancy tests demonstrate that the covarion mixture +Γ model is able to adequately model the substitution variation within and among sites.
Our research permits a detailed view of heterotachy in real datasets and gives directions for future heterotachous models. The posterior predictive discrepancy tests provide diagnostic tools to assess models in detail. Furthermore, both of our studies reveal the non-specificity of heterogeneous models and, consequently, the presence of interactions between different heterogeneous models. Our studies strongly suggest that the different heterogeneous features present in the data should be handled simultaneously in phylogenetic analyses
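A posterior predictive discrepancy test of the kind described can be sketched generically: choose a statistic sensitive to the heterogeneity of interest, simulate replicate datasets under the fitted homogeneous model, and see how often the replicates look as extreme as the data. The toy example below uses a variance-ratio statistic on Gaussian "rates" as a crude stand-in for the stochastic-mapping statistics in the thesis, and a plug-in point estimate rather than posterior draws:

```python
import random

random.seed(2)

def variance(xs):
    """Population variance of a sample."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def discrepancy(data):
    """Ratio of second-half to first-half variance: sensitive to a rate shift
    over 'time', a crude stand-in for a heterotachy-style statistic."""
    h = len(data) // 2
    return variance(data[h:]) / variance(data[:h])

# Observed data whose 'substitution rate' shifts mid-sequence (illustrative only).
observed = ([random.gauss(0.0, 1.0) for _ in range(50)] +
            [random.gauss(0.0, 3.0) for _ in range(50)])
d_obs = discrepancy(observed)

# Replicates simulated under a homogeneous model fitted to the pooled variance
# (a plug-in approximation; a full test would average over posterior draws).
sigma_hat = variance(observed) ** 0.5
reps = [discrepancy([random.gauss(0.0, sigma_hat) for _ in range(100)])
        for _ in range(500)]

# Posterior predictive p-value: fraction of replicates at least as extreme.
pval = sum(d >= d_obs for d in reps) / len(reps)
```

A small p-value flags the homogeneous model as failing to reproduce the heterogeneity the statistic measures, which is the logic of the discrepancy tests above.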