
    Formal and Informal Model Selection with Incomplete Data

    Model selection and assessment with incomplete data pose challenges in addition to the ones encountered with complete data. There are two main reasons for this. First, many models describe characteristics of the complete data, even though only an incomplete subset is observed. Direct comparison between model and data is then less than straightforward. Second, many commonly used models are more sensitive to assumptions than in the complete-data situation, and some of their properties vanish when they are fitted to incomplete, unbalanced data. These and other issues are brought forward using two key examples, one of a continuous and one of a categorical nature. We argue that model assessment ought to consist of two parts: (i) assessment of a model's fit to the observed data and (ii) assessment of the sensitivity of inferences to unverifiable assumptions, that is, to how a model describes the unobserved data given the observed ones. Comment: Published at http://dx.doi.org/10.1214/07-STS253 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org).
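
    The paper's two-part view of model assessment can be made concrete with a delta-adjustment style sensitivity analysis: fit to the observed data, then vary an assumption about the unobserved data and watch the inference move. The sketch below is a minimal illustration of that idea on simulated data; the delta-shift mechanism and all numbers are assumptions for illustration, not the paper's own procedure.

```python
# Minimal sketch of a delta-adjustment sensitivity analysis for a missing
# continuous outcome. The simulated data and the delta grid are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n = 500
y = rng.normal(loc=10.0, scale=2.0, size=n)   # complete outcome
observed = rng.random(n) > 0.3                # ~30% of values missing

# (i) Fit to the observed data: here simply the observed-case mean.
mean_obs = y[observed].mean()

# (ii) Sensitivity to the unverifiable missing-data assumption: impute the
# unobserved values as "observed mean + delta" and see how the estimate of
# the overall mean moves as delta departs from 0 (delta = 0 is MAR-like).
for delta in (-2.0, -1.0, 0.0, 1.0, 2.0):
    y_filled = np.where(observed, y, mean_obs + delta)
    print(f"delta={delta:+.1f} -> estimated mean {y_filled.mean():.3f}")
```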

    Bayesian model selection techniques as decision support for shaping a statistical analysis plan of a clinical trial: An example from a vertigo phase III study with longitudinal count data as primary endpoint

    Background: A statistical analysis plan (SAP) is a critical link between how a clinical trial is conducted and the clinical study report. To secure objective study results, regulatory bodies expect that the SAP will meet requirements in pre-specifying inferential analyses and other important statistical techniques. To write a good SAP for model-based sensitivity and ancillary analyses involves non-trivial decisions on, and justification of, many aspects of the chosen setting. In particular, trials with longitudinal count data as primary endpoints pose challenges for model choice and model validation. In the random effects setting, frequentist strategies for model assessment and model diagnosis are complex, not easily implemented, and subject to several limitations. Therefore, it is of interest to explore Bayesian alternatives which provide the needed decision support to finalize a SAP. Methods: We focus on generalized linear mixed models (GLMMs) for the analysis of longitudinal count data. A series of distributions with over- and under-dispersion is considered. Additionally, the structure of the variance components is modified. We perform a simulation study to investigate the discriminatory power of Bayesian tools for model criticism in different scenarios derived from the model setting. We apply the findings to the data from an open clinical trial on vertigo attacks. These data are seen as pilot data for an ongoing phase III trial. To fit GLMMs we use a novel Bayesian computational approach based on integrated nested Laplace approximations (INLAs). The INLA methodology enables the direct computation of leave-one-out predictive distributions. These distributions are crucial for Bayesian model assessment. We evaluate competing GLMMs for longitudinal count data according to the deviance information criterion (DIC) or probability integral transform (PIT), and by using proper scoring rules (e.g. the logarithmic score). Results: The instruments under study provide excellent tools for preparing decisions within the SAP in a transparent way when structuring the primary analysis, sensitivity or ancillary analyses, and specific analyses for secondary endpoints. The mean logarithmic score and DIC discriminate well between different model scenarios. It becomes obvious that the naive choice of a conventional random effects Poisson model is often inappropriate for real-life count data. The findings are used to specify an appropriate mixed model employed in the sensitivity analyses of an ongoing phase III trial. Conclusions: The proposed Bayesian methods are not only appealing for inference but notably provide a sophisticated insight into different aspects of model performance, such as forecast verification or calibration checks, and can be applied within the model selection process. The mean of the logarithmic score is a robust tool for model ranking and is not sensitive to sample size. Therefore, these Bayesian model selection techniques offer helpful decision support for shaping sensitivity and ancillary analyses in a statistical analysis plan of a clinical trial with longitudinal count data as the primary endpoint.
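
    As a concrete illustration of two of the tools named above, the sketch below computes the mean logarithmic score and randomized PIT values for count data in Python (INLA itself is an R package). It assumes each leave-one-out predictive distribution has been summarised as a Poisson mean, which is a simplification made only for this sketch.

```python
# Mean logarithmic score and randomized PIT for count data, given
# leave-one-out predictive distributions summarised as Poisson means.
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(1)
y_obs = rng.poisson(lam=4.0, size=200)           # observed counts
mu_loo = np.full_like(y_obs, 4.0, dtype=float)   # LOO predictive means

# Mean logarithmic score: average of -log p(y_i) under the predictive.
log_score = -poisson.logpmf(y_obs, mu_loo).mean()

# Randomized PIT for discrete data: F(y-1) + v * p(y), v ~ Uniform(0,1).
# Near-uniform PIT values indicate a well-calibrated predictive model.
v = rng.random(y_obs.size)
pit = poisson.cdf(y_obs - 1, mu_loo) + v * poisson.pmf(y_obs, mu_loo)

print(f"mean log score: {log_score:.3f}")
print("PIT deciles:", np.round(np.quantile(pit, np.linspace(0.1, 0.9, 9)), 2))
```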

    Assessment of contributing factors to the reduction of diarrhea in rural communities of Para, Brazil

    In developing communities, diarrhea has been reported at elevated levels compared with communities in more developed regions. Diarrheal diseases were linked to over one million deaths worldwide in 2012. While multiple transmission pathways exist for diarrheal diseases, water has been the focus for many aid organizations. Point-of-use (POU) water treatment methods are a common tool used by aid organizations in efforts to provide potable water. The CAWST biosand filter is a POU tool that has shown removal effectiveness of pathogenic microorganisms ranging from 90% to 99%. However, minimal literature was found reporting on the effectiveness of the filter within the larger complex system found in all communities. Therefore, a hypothesis was formulated that the intervention of a CAWST biosand filter is the most significant factor in the reduction of the diarrheal health burden within households in developing regions. Communities located along the Amazon River in Para, Brazil were selected for study. Structural Equation Modeling (SEM) was utilized to represent the complex set of relationships within the communities. The Mahalanobis-Taguchi Strategy (MTS) was also used to confirm variable significance in the SEM model. Results show that while the biosand filter does aid in the reduction of diarrheal occurrences, it is not the most significant factor. Results varied on which factor influenced diarrheal occurrences the most, but consistently included education, economic status, and sanitation. Further, the MTS analysis identified education as the largest factor influencing household health. Continued work is needed to further understand these factors and their relationships to diarrhea reduction.
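
    The core quantity behind MTS is the Mahalanobis distance of an observation from a reference group. A minimal sketch follows, with invented household indicators standing in for the study's variables.

```python
# Mahalanobis distances, the core quantity of the Mahalanobis-Taguchi
# Strategy (MTS): build a reference ("normal") space and score new
# observations against it. Variables and data are invented for illustration.
import numpy as np

rng = np.random.default_rng(2)
# Columns: education, economic status, sanitation (standardised scores).
reference = rng.normal(size=(100, 3))           # reference group
candidates = rng.normal(loc=1.5, size=(5, 3))   # households to score

mu = reference.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(reference, rowvar=False))

diff = candidates - mu
d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)  # squared distances
print("Mahalanobis distances:", np.round(np.sqrt(d2), 2))
```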

    Novel techniques for kinetic model identification and improvement

    Physics-based kinetic models are regarded as key tools for supporting the design and control of chemical processes and for understanding which degrees of freedom ultimately determine the observed behaviour of chemical systems. These models are formulated as sets of differential and algebraic equations in which many state variables and parameters may be involved. Nonetheless, the translation of the available experimental evidence into an appropriate set of model equations is a time- and resource-intensive task that relies significantly on the presence of experienced scientists. Automated reactor platforms are increasingly being applied in research laboratories to generate large amounts of kinetic data with minimum human intervention. However, in most cases, these platforms do not implement software for the online identification of physics-based kinetic models. While automated reactor technologies have significantly improved the efficiency of the data collection process, the analysis of the data for modelling purposes still represents a tedious process that is mainly carried out a posteriori by the scientist. This project focuses on how to systematically solve some relevant problems in kinetic modelling studies that would normally require the intervention of experienced modellers. Specifically, the following challenges are considered: i) the selection of a robust model parametrisation to reduce the chance of numerical failures in the course of the model identification process; ii) the experimental design and parameter estimation problems in conditions of structural model uncertainty; iii) the improvement of approximated models embracing the available experimental evidence. The work presented in this Thesis paves the way towards fully automated kinetic modelling platforms through the development of intelligent algorithms for experimental design and model building under system uncertainty. The project aims to define comprehensive and systematic modelling frameworks that make the modelling activity more efficient and less sensitive to human error and bias.
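
    As a toy instance of the physics-based kinetic models described above, the sketch below writes a first-order A -> B reaction as an ODE and estimates its rate constant from synthetic concentration data; the reaction, noise level, and all numbers are assumptions for illustration only.

```python
# Fit the rate constant of a first-order A -> B reaction (dCa/dt = -k*Ca)
# to noisy synthetic concentration measurements.
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import least_squares

def rhs(t, c, k):
    ca, cb = c
    return [-k * ca, k * ca]

t_meas = np.linspace(0.0, 10.0, 15)
k_true = 0.45
sol = solve_ivp(rhs, (0, 10), [1.0, 0.0], t_eval=t_meas, args=(k_true,))
ca_meas = sol.y[0] + np.random.default_rng(3).normal(0, 0.02, t_meas.size)

def residuals(theta):
    fit = solve_ivp(rhs, (0, 10), [1.0, 0.0], t_eval=t_meas, args=(theta[0],))
    return fit.y[0] - ca_meas

est = least_squares(residuals, x0=[0.1], bounds=(0.0, np.inf))
print(f"estimated k = {est.x[0]:.3f} (true {k_true})")
```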

    Self-Validated Ensemble Modelling

    An important objective when performing designed experiments is to build models that predict future performance of the system under study; e.g. predict future yields of a bio-process used to manufacture therapeutic proteins. Because experimentation is costly, experimental designs are structured to be efficient in terms of the number of trials while providing substantial information about the behavior of the physical system. The strategy for building accurate predictive models on larger data sets is to partition the data into a training set, used to fit the model, and a validation set to assess prediction performance. Models are selected that have the lowest prediction error on the validation set. However, designed experiments are usually small in sample size and have a fixed structure, which precludes partitioning of any kind; the entire set must be used for training. Contemporary methods use information criteria like the AICc or BIC with model algorithms such as Forward Selection or Lasso to select candidate models. These surrogate prediction measures often produce models with poor prediction performance relative to models selected using a validation procedure such as cross-validation. This approach also relies on a single fit from a model algorithm, which we show to be insufficient. We propose a novel approach that allows the original data set to function as both a training set and a validation set. We accomplish this auto-validation strategy by employing a unique fractionally re-weighted bootstrapping technique. The weighting scheme is structured to induce anti-correlation between the original set and the auto-validation copy. We randomly assign new fractional weights using the bootstrap algorithm and fit a predictive model. This procedure is iterated many times, producing a new model each time. The final model is the average of these models. We refer to this new methodology as Self-Validated Ensemble Modeling (SVEM). In this dissertation we investigate the performance of the SVEM algorithm across various scenarios: different model selection algorithms, different designs with varying sample sizes, model noise levels, and sparsity. This investigation shows that SVEM outperforms contemporary one-shot model selection approaches.
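
    A minimal sketch of the auto-validation idea follows: each observation receives a fractional training weight and an anti-correlated validation weight, a weighted model is fitted and scored, and the ensemble averages over iterations. The -log(U) / -log(1 - U) weight pair is one common fractional-weight bootstrap choice, assumed here; selecting a ridge penalty stands in for the model selection step, purely for illustration.

```python
# Mini fractionally re-weighted bootstrap in the spirit of SVEM:
# anti-correlated training/validation weights, per-iteration model
# selection, and averaging of the fitted coefficients.
import numpy as np

rng = np.random.default_rng(4)
n, p = 24, 6                                   # small designed experiment
X = rng.normal(size=(n, p))
beta_true = np.array([2.0, -1.0, 0.0, 0.0, 0.5, 0.0])
y = X @ beta_true + rng.normal(0, 0.5, n)

def ridge(X, y, w, lam):
    # weighted ridge: solve (X'WX + lam*I) b = X'Wy
    XtW = X.T * w
    return np.linalg.solve(XtW @ X + lam * np.eye(X.shape[1]), XtW @ y)

lams, n_boot, coefs = [0.01, 0.1, 1.0, 10.0], 200, []
for _ in range(n_boot):
    u = rng.random(n)
    w_train = -np.log(u)          # fractional training weights
    w_valid = -np.log(1.0 - u)    # anti-correlated validation weights
    # pick the penalty with the lowest validation-weighted error
    errs = [np.sum(w_valid * (y - X @ ridge(X, y, w_train, lam)) ** 2)
            for lam in lams]
    coefs.append(ridge(X, y, w_train, lams[int(np.argmin(errs))]))

beta_svem = np.mean(coefs, axis=0)   # the ensemble-averaged model
print("averaged coefficients:", np.round(beta_svem, 2))
```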

    Untangling hotel industry’s inefficiency: An SFA approach applied to a renowned Portuguese hotel chain

    The present paper explores the technical efficiency of four hotels from the Teixeira Duarte Group, a renowned Portuguese hotel chain. An efficiency ranking is established for these four hotel units located in Portugal using Stochastic Frontier Analysis. This methodology makes it possible to discriminate between measurement error and systematic inefficiencies in the estimation process, enabling investigation of the main causes of inefficiency. Several suggestions concerning efficiency improvement are offered for each hotel studied.
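
    For readers unfamiliar with SFA, the sketch below fits the standard normal/half-normal stochastic production frontier by maximum likelihood on synthetic data; the specification, inputs, and numbers are generic assumptions and may differ from the study's actual model.

```python
# Normal/half-normal stochastic frontier: y = b0 + b1*x + v - u, with
# v ~ N(0, sv^2) noise and u ~ |N(0, su^2)| one-sided inefficiency.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(5)
n = 300
x = rng.normal(size=n)                 # log input (e.g. labour)
u = np.abs(rng.normal(0, 0.4, n))      # inefficiency, u >= 0
v = rng.normal(0, 0.2, n)              # measurement error
y = 1.0 + 0.7 * x + v - u              # log output

def negloglik(theta):
    b0, b1, log_sv, log_su = theta
    sv, su = np.exp(log_sv), np.exp(log_su)
    sigma, lam = np.hypot(sv, su), su / sv
    eps = y - b0 - b1 * x
    # ln f(eps) = ln2 - ln(sigma) + ln phi(eps/sigma) + ln Phi(-eps*lam/sigma)
    ll = (np.log(2) - np.log(sigma) + norm.logpdf(eps / sigma)
          + norm.logcdf(-eps * lam / sigma))
    return -ll.sum()

res = minimize(negloglik, x0=[0.0, 0.0, np.log(0.3), np.log(0.3)],
               method="Nelder-Mead")
b0, b1, log_sv, log_su = res.x
print(f"beta=({b0:.2f}, {b1:.2f}), sigma_v={np.exp(log_sv):.2f}, "
      f"sigma_u={np.exp(log_su):.2f}")
```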

    Bayesian inference for protein signalling networks

    Cellular response to a changing chemical environment is mediated by a complex system of interactions involving molecules such as genes, proteins and metabolites. In particular, genetic and epigenetic variation ensure that cellular response is often highly specific to individual cell types, or to different patients in the clinical setting. Conceptually, cellular systems may be characterised as networks of interacting components together with biochemical parameters specifying rates of reaction. Taken together, the network and parameters form a predictive model of cellular dynamics which may be used to simulate the effect of hypothetical drug regimens. In practice, however, both network topology and reaction rates remain partially or entirely unknown, depending on individual genetic variation and environmental conditions. Prediction under parameter uncertainty is a classical statistical problem. Yet, doubly uncertain prediction, where both parameters and the underlying network topology are unknown, leads to highly non-trivial probability distributions which currently require gross simplifying assumptions to analyse. Recent advances in molecular assay technology now permit high-throughput data-driven studies of cellular dynamics. This thesis sought to develop novel statistical methods in this context, focussing primarily on the problems of (i) elucidating biochemical network topology from assay data and (ii) prediction of dynamical response to therapy when both network and parameters are uncertain
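
    The "doubly uncertain" prediction problem can be illustrated in miniature by averaging predictions over candidate topologies, each weighted by an approximate posterior model probability. The sketch below uses BIC-based weights and two toy regressor sets as stand-in "topologies"; the thesis's actual models and inference are far richer.

```python
# Toy Bayesian model averaging over candidate "topologies": weight each
# candidate by exp(-BIC/2) (an approximation to the marginal likelihood)
# and mix their predictions accordingly.
import numpy as np

rng = np.random.default_rng(6)
n = 80
a = rng.normal(size=n)                  # upstream signal
b = 0.9 * a + rng.normal(0, 0.3, n)     # protein influenced by a
y = 1.5 * b + rng.normal(0, 0.3, n)     # downstream readout

candidates = {"a -> y": a.reshape(-1, 1), "b -> y": b.reshape(-1, 1)}
bics, preds = {}, {}
for name, X in candidates.items():
    X1 = np.column_stack([np.ones(n), X])
    beta, rss = np.linalg.lstsq(X1, y, rcond=None)[:2]
    bics[name] = n * np.log(rss[0] / n) + X1.shape[1] * np.log(n)
    preds[name] = X1 @ beta

bmin = min(bics.values())
w = {m: np.exp(-0.5 * (bics[m] - bmin)) for m in bics}   # stable weights
z = sum(w.values())
y_bma = sum(w[m] / z * preds[m] for m in preds)          # averaged prediction
for m in bics:
    print(f"{m}: posterior weight ~ {w[m] / z:.3f}")
```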

    Vol. 15, No. 1 (Full Issue)


    Pharmacology-based toxicity assessment: towards quantitative risk prediction in humans

    This thesis focuses on the implications of empirical evidence generation for the evaluation of safety and toxicity during drug development. A shift in paradigm is proposed to 1) ensure that pharmacological concepts are incorporated into the evaluation of safety and toxicity; 2) facilitate the integration of historical evidence and thereby the translation of findings across species; and 3) promote the use of experimental protocols tailored to address specific safety and toxicity questions. Nonlinear mixed-effects modelling is recommended as a tool to account for such requirements. Our goal was to explore the feasibility of a model-based approach to toxicology assessment and risk prediction in humans and, where possible, to compare the performance of this approach to traditional safety assessment approaches. The investigational plan of the thesis was divided into two sections, in which the development of methodology is followed by a case study with real data. A variety of analysis strategies and protocol designs are investigated, under the constraint that proposals to deviate from existing protocols be minimal. We finally compile recommendations for protocol optimisation and data analysis/interpretation strategies to facilitate the implementation of model-based techniques in safety pharmacology and toxicology research.
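
    A minimal sketch of the nonlinear mixed-effects structure the abstract recommends: a one-compartment oral PK model simulated with log-normal random effects on clearance and volume. The model, dose, and variability values are generic assumptions, not the thesis's actual analysis.

```python
# Simulate concentration-time profiles from a one-compartment oral PK
# model with between-subject (random-effect) and residual variability.
import numpy as np

rng = np.random.default_rng(7)
n_subjects, dose, ka = 8, 100.0, 1.2          # dose in mg, ka in 1/h
t = np.linspace(0.5, 24.0, 12)                # sampling times (h)

cl_pop, v_pop = 5.0, 40.0                     # population CL (L/h), V (L)
for i in range(n_subjects):
    cl = cl_pop * np.exp(rng.normal(0, 0.3))  # between-subject variability
    v = v_pop * np.exp(rng.normal(0, 0.2))
    ke = cl / v                               # elimination rate constant
    conc = dose * ka / (v * (ka - ke)) * (np.exp(-ke * t) - np.exp(-ka * t))
    conc *= np.exp(rng.normal(0, 0.1, t.size))  # proportional residual error
    print(f"subject {i + 1}: Cmax ~ {conc.max():.2f} mg/L")
```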