Comparison of Estimators in GLM with Binary Data
Maximum likelihood estimates (MLE) of regression parameters in generalized linear models (GLM) are biased, and the bias is non-negligible when the sample size is small. This study focuses on GLMs with binary data and multiple observations on the response for each predictor value when the sample size is small. The performance of the estimation methods of Cordeiro and McCullagh (1991), Firth (1993) and Pardo et al. (2005) for GLMs with binary data is compared in an extensive Monte Carlo simulation study. The performance of these methods on three real data sets is also compared.
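Firth's (1993) method, one of the estimators compared above, reduces the finite-sample bias of the MLE by penalizing the likelihood with the Jeffreys prior, which for logistic regression amounts to solving an adjusted score equation. The sketch below is a hedged NumPy illustration (not the paper's code); the function name and Fisher-scoring details are assumptions:

```python
import numpy as np

def firth_logistic(X, y, n_iter=50, tol=1e-8):
    # Firth (1993) bias-reduced logistic regression via the adjusted score
    # U*(b) = X' (y - p + h (1/2 - p)), where h are the leverages of the
    # weighted design matrix. Solved by Fisher scoring.
    n, k = X.shape
    beta = np.zeros(k)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        W = p * (1.0 - p)
        XtWX = X.T @ (W[:, None] * X)
        # diagonal of the hat matrix W^{1/2} X (X'WX)^{-1} X' W^{1/2}
        h = np.diag((W[:, None] * X) @ np.linalg.solve(XtWX, X.T))
        score = X.T @ (y - p + h * (0.5 - p))
        step = np.linalg.solve(XtWX, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta
```

For an intercept-only model the adjusted score reduces to the familiar "add one half" correction: with k successes in n trials the estimate is log((k + 1/2) / (n - k + 1/2)), which stays finite even when k is 0 or n.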
On the Properties of Simulation-based Estimators in High Dimensions
Considering the increasing size of available data, the need for statistical
methods that control the finite-sample bias is growing. This is mainly due to
the frequent settings where the number of variables is large and allowed to
increase with the sample size, causing standard inferential procedures to incur
significant losses in performance. Moreover, the complexity of
statistical models is also increasing thereby entailing important computational
challenges in constructing new estimators or in implementing classical ones. A
trade-off between numerical complexity and statistical properties is often
accepted. However, numerically efficient estimators that are altogether
unbiased, consistent and asymptotically normal in high dimensional problems
would generally be ideal. In this paper, we set a general framework from which
such estimators can easily be derived for wide classes of models. This
framework is based on the concepts that underlie simulation-based estimation
methods such as indirect inference. The approach allows various extensions
compared to previous results as it is adapted to possibly inconsistent
estimators and is applicable to discrete models and/or models with a large
number of parameters. We consider the Iterative Bootstrap (IB) algorithm for
efficiently computing simulation-based estimators and establish its convergence
properties. Within this framework we also prove the properties of
simulation-based estimators, more specifically the unbiasedness, consistency
and asymptotic normality when the number of parameters is allowed to increase
with the sample size. Therefore, an important implication of the proposed
approach is that it allows one to obtain unbiased estimators in finite samples.
Finally, we study the approach as applied to three common models: logistic
regression, negative binomial regression and lasso regression.
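The Iterative Bootstrap update can be sketched in a few lines: starting from the initial (possibly biased) estimate, repeatedly add the gap between that estimate and the average of the estimator applied to data simulated at the current value. The example below is a hedged illustration, not the paper's implementation; it bias-corrects the plug-in (divisor-n) variance estimator of a normal sample, whose IB fixed point is approximately the unbiased divisor-(n-1) estimator. The function names and the common-random-numbers simplification are assumptions:

```python
import numpy as np

def biased_var(x):
    # plug-in (MLE) variance with divisor n: biased downward in finite samples
    return np.mean((x - np.mean(x)) ** 2)

def iterative_bootstrap(pi_hat, simulate, estimator, n_iter=50):
    # IB update: theta <- theta + pi_hat - mean_h estimator(simulated sample h)
    theta = pi_hat
    for _ in range(n_iter):
        sims = simulate(theta)                     # (H, n) simulated samples
        theta = theta + pi_hat - np.mean([estimator(s) for s in sims])
    return theta

rng = np.random.default_rng(0)
n, H = 30, 2000
x = rng.normal(0.0, 2.0, size=n)                   # true variance is 4.0
pi_hat = biased_var(x)

# common random numbers across iterations keep the update map deterministic
z = rng.standard_normal((H, n))
simulate = lambda theta: np.sqrt(max(theta, 1e-12)) * z

theta_ib = iterative_bootstrap(pi_hat, simulate, biased_var)
```

The corrected estimate ends up close to pi_hat * n / (n - 1), i.e. the IB recovers the standard finite-sample bias correction without knowing it in closed form.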
Bayesian Estimation of Mixed Multinomial Logit Models: Advances and Simulation-Based Evaluations
Variational Bayes (VB) methods have emerged as a fast and
computationally-efficient alternative to Markov chain Monte Carlo (MCMC)
methods for scalable Bayesian estimation of mixed multinomial logit (MMNL)
models. It has been established that VB is substantially faster than MCMC, with
practically no compromise in predictive accuracy. In this paper, we address
two critical gaps concerning the usage and understanding of VB for MMNL. First,
extant VB methods are limited to utility specifications involving only
individual-specific taste parameters. Second, the finite-sample properties of
VB estimators and the relative performance of VB, MCMC and maximum simulated
likelihood estimation (MSLE) are not known. To address the former, this study
extends several VB methods for MMNL to admit utility specifications including
both fixed and random utility parameters. To address the latter, we conduct an
extensive simulation-based evaluation to benchmark the extended VB methods
against MCMC and MSLE in terms of estimation times, parameter recovery and
predictive accuracy. The results suggest that all VB variants, except those
relying on an alternative variational lower bound constructed with the modified
Jensen's inequality, perform as well as MCMC and MSLE at prediction and
parameter recovery. In particular, VB with
nonconjugate variational message passing and the delta-method (VB-NCVMP-Delta)
is up to 16 times faster than MCMC and MSLE. Thus, VB-NCVMP-Delta can be an
attractive alternative to MCMC and MSLE for fast, scalable and accurate
estimation of MMNL models.
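The quantity all three estimators (VB, MCMC, MSLE) target is the mixed-logit choice probability: a multinomial logit probability averaged over draws of the random coefficients. A simplified simulation sketch of that core computation, with normally distributed random coefficients and hypothetical function and argument names (this is not the VB machinery evaluated in the paper):

```python
import numpy as np

def mmnl_choice_probs(X, beta, Z, mu, sigma, n_draws=1000, rng=None):
    # X: (J, Kf) alternative attributes with fixed coefficients beta (Kf,)
    # Z: (J, Kr) attributes with random coefficients ~ N(mu, diag(sigma^2))
    rng = rng or np.random.default_rng(0)
    draws = mu + sigma * rng.standard_normal((n_draws, len(mu)))
    V = X @ beta + draws @ Z.T                  # (n_draws, J) utilities
    V = V - V.max(axis=1, keepdims=True)        # numerical stabilisation
    P = np.exp(V)
    P = P / P.sum(axis=1, keepdims=True)        # logit probabilities per draw
    return P.mean(axis=0)                       # simulated MMNL probabilities

X = np.array([[1.0, 0.5], [0.0, 1.0], [2.0, 0.0]])   # 3 alternatives
Z = np.array([[1.0], [0.5], [0.0]])
p = mmnl_choice_probs(X, beta=np.array([0.3, -0.2]), Z=Z,
                      mu=np.array([0.5]), sigma=np.array([1.0]))
```

MSLE maximizes the log of these simulated probabilities over (beta, mu, sigma); VB and MCMC instead approximate or sample the posterior over the same parameters.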
Modeling association between DNA copy number and gene expression with constrained piecewise linear regression splines
DNA copy number and mRNA expression are widely used data types in cancer
studies, which provide more insight in combination than separately. Whereas in
existing literature the form of the relationship between these two types of
markers is fixed a priori, in this paper we model their association. We employ
piecewise linear regression splines (PLRS), which combine good interpretation
with sufficient flexibility to identify any plausible type of relationship. The
specification of the model leads to estimation and model selection in a
constrained, nonstandard setting. We provide methodology for testing the effect
of DNA on mRNA and choosing the appropriate model. Furthermore, we present a
novel approach to obtain reliable confidence bands for constrained PLRS, which
incorporates model uncertainty. The procedures are applied to colorectal and
breast cancer data. Common assumptions are found to be potentially misleading
for biologically relevant genes. More flexible models may bring more insight
into the interaction between the two markers.

Comment: Published at http://dx.doi.org/10.1214/12-AOAS605 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org).
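A PLRS fit can be illustrated with a truncated-line (hinge) basis and ordinary least squares: each knot adds a max(0, x - k) column, so the fitted curve is continuous and piecewise linear with a slope change at each knot. The sketch below omits the paper's constraints and model-selection machinery; the knot location, simulated data, and function name are assumptions for illustration:

```python
import numpy as np

def plrs_basis(x, knots):
    # design matrix: intercept, x, and one hinge term max(0, x - k) per knot
    cols = [np.ones_like(x), x] + [np.maximum(0.0, x - k) for k in knots]
    return np.column_stack(cols)

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(-2.0, 2.0, 200))
# true relationship: slope 0.2 below the knot at 0, slope 1.5 above it
y = np.where(x < 0.0, 0.2 * x, 1.5 * x) + 0.1 * rng.standard_normal(200)

B = plrs_basis(x, knots=[0.0])
coef, *_ = np.linalg.lstsq(B, y, rcond=None)
slope_left = coef[1]                 # slope before the knot
slope_right = coef[1] + coef[2]      # slope after the knot
```

In the constrained setting of the paper one would additionally restrict the coefficients (e.g. require non-negative slope changes for a monotone dose-response), which turns the least-squares step into a quadratic program.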
Entropy inference and the James-Stein estimator, with application to nonlinear gene association networks
We present a procedure for effective estimation of entropy and mutual
information from small-sample data, and apply it to the problem of inferring
high-dimensional gene association networks. Specifically, we develop a
James-Stein-type shrinkage estimator, resulting in a procedure that is highly
efficient statistically as well as computationally. Despite its simplicity, we
show that it outperforms eight other entropy estimation procedures across a
diverse range of sampling scenarios and data-generating models, even in cases
of severe undersampling. We illustrate the approach by analyzing E. coli gene
expression data and computing an entropy-based gene-association network. A
computer program implementing the proposed shrinkage estimator is available.

Comment: 18 pages, 3 figures, 1 table.
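The James-Stein-type shrinkage idea is to shrink the observed cell frequencies toward a uniform target with a data-driven intensity and then plug the shrunken frequencies into the entropy formula. A minimal sketch following that description (the estimator in the paper may differ in details):

```python
import numpy as np

def js_entropy(counts):
    # entropy of James-Stein-shrunken frequencies: shrink the maximum-likelihood
    # frequencies toward the uniform distribution with an estimated intensity
    counts = np.asarray(counts, dtype=float)
    n = counts.sum()
    p_ml = counts / n                         # maximum-likelihood frequencies
    t = 1.0 / counts.size                     # shrinkage target: uniform
    # estimated optimal shrinkage intensity, clipped to [0, 1]
    num = 1.0 - np.sum(p_ml ** 2)
    den = (n - 1.0) * np.sum((t - p_ml) ** 2)
    lam = 1.0 if den == 0.0 else min(1.0, num / den)
    p = lam * t + (1.0 - lam) * p_ml
    p = p[p > 0]
    return -np.sum(p * np.log(p))             # plug-in entropy (in nats)
```

In small samples the shrinkage pulls sparse, noisy frequency estimates toward uniformity, which counteracts the downward bias of the naive plug-in entropy; mutual information and hence the association network follow from the same shrunken frequencies.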
Marginal Likelihood Estimation with the Cross-Entropy Method
We consider an adaptive importance sampling approach to estimating the marginal likelihood, a quantity that is fundamental in Bayesian model comparison and Bayesian model averaging. This approach is motivated by the difficulty of obtaining an accurate estimate through existing algorithms that use Markov chain Monte Carlo (MCMC) draws, where the draws are typically costly to obtain and highly correlated in high-dimensional settings. In contrast, we use the cross-entropy (CE) method, a versatile adaptive Monte Carlo algorithm originally developed for rare-event simulation. The main advantage of the importance sampling approach is that random samples can be obtained from some convenient density at little additional cost. As we are generating independent draws instead of correlated MCMC draws, the increase in simulation effort is much smaller should one wish to reduce the numerical standard error of the estimator. Moreover, the importance density derived via the CE method is in a well-defined sense optimal. We demonstrate the utility of the proposed approach in two empirical applications involving women's labor market participation and U.S. macroeconomic time series. In both applications the proposed CE method compares favorably to existing estimators.
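For a Gaussian importance family the CE recipe reduces to iterated weighted moment matching, followed by a final importance-sampling average of likelihood-times-prior over proposal draws. The sketch below applies it to a toy conjugate normal model whose marginal likelihood is available in closed form; the data, tuning constants, and function names are assumptions, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(42)
y = rng.normal(1.0, 1.0, size=10)     # toy data: y_i ~ N(theta, 1), theta ~ N(0, 1)
n = len(y)

def log_target(theta):
    # log(likelihood x prior): the unnormalised marginal-likelihood integrand
    ll = (-0.5 * ((y[None, :] - theta[:, None]) ** 2).sum(axis=1)
          - 0.5 * n * np.log(2.0 * np.pi))
    lp = -0.5 * theta ** 2 - 0.5 * np.log(2.0 * np.pi)
    return ll + lp

def log_q(theta, m, s):
    # log-density of the Gaussian importance distribution N(m, s^2)
    return -0.5 * ((theta - m) / s) ** 2 - np.log(s) - 0.5 * np.log(2.0 * np.pi)

m, s = 0.0, 2.0                        # initial proposal
for _ in range(5):                     # CE step = weighted moment matching
    th = m + s * rng.standard_normal(5000)
    lw = log_target(th) - log_q(th, m, s)
    w = np.exp(lw - lw.max())
    w = w / w.sum()
    m = np.sum(w * th)
    s = np.sqrt(np.sum(w * (th - m) ** 2))

th = m + s * rng.standard_normal(20000)   # final importance-sampling estimate
lw = log_target(th) - log_q(th, m, s)
log_Z = lw.max() + np.log(np.mean(np.exp(lw - lw.max())))
```

Because the fitted proposal nearly matches the (Gaussian) posterior, the independent draws give a low-variance estimate of log Z; in this conjugate case it can be checked against the analytic marginal likelihood.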