50,581 research outputs found

    Statistical and Computational Tradeoffs in Stochastic Composite Likelihood

    Maximum likelihood estimators are often of limited practical use due to the intensive computation they require. We propose a family of alternative estimators that maximize a stochastic variation of the composite likelihood function. Each of the estimators resolves the computation-accuracy tradeoff differently, and taken together they span a continuous spectrum of computation-accuracy tradeoff resolutions. We prove the consistency of the estimators and provide formulas for their asymptotic variance, statistical robustness, and computational complexity. We discuss experimental results in the context of Boltzmann machines and conditional random fields. The theoretical and experimental studies demonstrate the effectiveness of the estimators when the computational resources are insufficient. They also demonstrate that in some cases reduced computational complexity is associated with robustness, thereby increasing statistical accuracy. Comment: 30 pages, 97 figures, 2 authors

    Maximum Likelihood Estimation of Stochastic Frontier Models with Endogeneity

    We propose and study a maximum likelihood estimator of stochastic frontier models with endogeneity in cross-section data when the composite error term may be correlated with inputs and environmental variables. Our framework is a generalization of the normal half-normal stochastic frontier model with endogeneity. We derive the likelihood function in closed form using three fundamental assumptions: the existence of control functions that fully capture the dependence between regressors and unobservables; the conditional independence of the two error components given the control functions; and the conditional distribution of the stochastic inefficiency term given the control functions being a folded normal distribution. We also provide a Battese-Coelli estimator of technical efficiency. Our estimator is computationally fast and easy to implement. We study some of its asymptotic properties, and we showcase its finite sample behavior in Monte-Carlo simulations and an empirical application to farmers in Nepal.

    When Composite Likelihood Meets Stochastic Approximation

    A composite likelihood is an inference function derived by multiplying a set of likelihood components. This approach provides a flexible framework for drawing inference when the likelihood function of a statistical model is computationally intractable. While composite likelihood has computational advantages, it can still be demanding when dealing with numerous likelihood components and a large sample size. This paper tackles this challenge by employing an approximation of the conventional composite likelihood estimator, which is derived from an optimization procedure relying on stochastic gradients. This novel estimator is shown to be asymptotically normally distributed around the true parameter. In particular, based on the relative divergence rates of the sample size and the number of iterations of the optimization, the variance of the limiting distribution is shown to compound two sources of uncertainty: the sampling variability of the data and the optimization noise, with the latter depending on the sampling distribution used to construct the stochastic gradients. The advantages of the proposed framework are illustrated through simulation studies on two working examples: an Ising model for binary data and a gamma frailty model for count data. Finally, a real-data application is presented, showing its effectiveness in a large-scale mental health survey.

    How Many Communities Are There?

    Stochastic blockmodels and variants thereof are among the most widely used approaches to community detection for social networks and relational data. A stochastic blockmodel partitions the nodes of a network into disjoint sets, called communities. The approach is inherently related to clustering with mixture models and raises a similar model selection problem for the number of communities. The Bayesian information criterion (BIC) is a popular solution; however, for stochastic blockmodels, the conditional independence assumption given the communities of the endpoints among different edges is usually violated in practice. In this regard, we propose composite likelihood BIC (CL-BIC) to select the number of communities, and we show it is robust against possible misspecifications in the underlying stochastic blockmodel assumptions. We derive the requisite methodology and illustrate the approach using both simulated and real data. Supplementary materials containing the relevant computer code are available online. Comment: 26 pages, 3 figures

    Asymptotic Analysis of Generative Semi-Supervised Learning

    Semi-supervised learning has emerged as a popular framework for improving modeling accuracy while controlling labeling cost. Based on an extension of stochastic composite likelihood, we quantify the asymptotic accuracy of generative semi-supervised learning. In doing so, we complement distribution-free analysis by providing an alternative framework to measure the value associated with different labeling policies and resolve the fundamental question of how much data to label and in what manner. We demonstrate our approach with both simulation studies and real-world experiments using naive Bayes for text classification and MRFs and CRFs for structured prediction in NLP. Comment: 12 pages, 9 figures

    Composite Likelihood for Stochastic Migration Model with Unobserved Factor

    We introduce the conditional Maximum Composite Likelihood (MCL) estimation method for the stochastic factor ordered Probit model of credit rating transitions of firms. This model is recommended for internal credit risk assessment procedures in banks and financial institutions under the Basel III regulations. Its exact likelihood function involves a high-dimensional integral, which can be approximated numerically before maximization. However, the estimated migration risk and required capital tend to be sensitive to the quality of this approximation, potentially leading to statistical regulatory arbitrage. The proposed conditional MCL estimator circumvents this problem and maximizes the composite log-likelihood of the factor ordered Probit model. We present three conditional MCL estimators of different complexity and examine their consistency and asymptotic normality when n and T tend to infinity. The performance of these estimators at finite T is examined and compared with a granularity-based approach in a simulation study. The use of the MCL estimator is also illustrated in an empirical application.

    Some challenges for statistics

    The paper gives a highly personal sketch of some current trends in statistical inference. After an account of the challenges that new forms of data bring, there is a brief overview of some topics in stochastic modelling. The paper then turns to sparsity, illustrated using Bayesian wavelet analysis based on a mixture model and metabolite profiling. Modern likelihood methods including higher order approximation and composite likelihood inference are then discussed, followed by some thoughts on statistical education.

    Scalable Bayesian nonparametric regression via a Plackett-Luce model for conditional ranks

    We present a novel Bayesian nonparametric regression model for covariates X and continuous, real response variable Y. The model is parametrized in terms of marginal distributions for Y and X and a regression function which tunes the stochastic ordering of the conditional distributions F(y|x). By adopting an approximate composite likelihood approach, we show that the resulting posterior inference can be decoupled for the separate components of the model. This procedure can scale to very large datasets and allows standard existing software from Bayesian nonparametric density estimation and Plackett-Luce ranking estimation to be applied. As an illustration, we show an application of our approach to a US Census dataset, with over 1,300,000 data points and more than 100 covariates.