Statistical and Computational Tradeoffs in Stochastic Composite Likelihood
Maximum likelihood estimators are often of limited practical use due to the
intensive computation they require. We propose a family of alternative
estimators that maximize a stochastic variation of the composite likelihood
function. Each of the estimators resolves the computation-accuracy tradeoff
differently, and taken together they span a continuous spectrum of
computation-accuracy tradeoff resolutions. We prove the consistency of the
estimators, provide formulas for their asymptotic variance, statistical
robustness, and computational complexity. We discuss experimental results in
the context of Boltzmann machines and conditional random fields. The
theoretical and experimental studies demonstrate the effectiveness of the
estimators when the computational resources are insufficient. They also
demonstrate that in some cases reduced computational complexity is associated
with robustness, thereby increasing statistical accuracy.
Comment: 30 pages, 97 figures, 2 author
Maximum Likelihood Estimation of Stochastic Frontier Models with Endogeneity
We propose and study a maximum likelihood estimator of stochastic frontier
models with endogeneity in cross-section data when the composite error term may
be correlated with inputs and environmental variables. Our framework is a
generalization of the normal half-normal stochastic frontier model with
endogeneity. We derive the likelihood function in closed form using three
fundamental assumptions: the existence of control functions that fully capture
the dependence between regressors and unobservables; the conditional
independence of the two error components given the control functions; and the
conditional distribution of the stochastic inefficiency term given the control
functions being a folded normal distribution. We also provide a Battese-Coelli
estimator of technical efficiency. Our estimator is computationally fast and
easy to implement. We study some of its asymptotic properties, and we showcase
its finite-sample behavior in Monte Carlo simulations and an empirical
application to farmers in Nepal.
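For reference, the normal half-normal baseline that this model generalizes has a well-known closed-form likelihood (the Aigner-Lovell-Schmidt form). Below is a minimal sketch of that baseline only, without endogeneity or control functions; the function name is ours:

```python
import numpy as np
from scipy.stats import norm

def sf_loglik(eps, sigma_v, sigma_u):
    """Log-density of the composite error eps = v - u for a production
    frontier, with v ~ N(0, sigma_v^2) and u ~ half-normal(sigma_u):
    f(eps) = (2/sigma) * phi(eps/sigma) * Phi(-lambda * eps / sigma)."""
    sigma = np.hypot(sigma_v, sigma_u)   # sqrt(sigma_v^2 + sigma_u^2)
    lam = sigma_u / sigma_v
    return (np.log(2.0 / sigma)
            + norm.logpdf(eps / sigma)
            + norm.logcdf(-lam * eps / sigma))

# Sanity check: the closed-form density integrates to one.
grid = np.linspace(-12, 12, 200001)
dens = np.exp(sf_loglik(grid, sigma_v=1.0, sigma_u=1.5))
area = dens.sum() * (grid[1] - grid[0])
print(round(area, 4))   # ~ 1.0
```

Maximizing this over a sample of residuals gives the classic stochastic frontier estimator; the paper's contribution is deriving an analogous closed form when the error components are correlated with the regressors.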
When Composite Likelihood Meets Stochastic Approximation
A composite likelihood is an inference function derived by multiplying a set
of likelihood components. This approach provides a flexible framework for
drawing inference when the likelihood function of a statistical model is
computationally intractable. While composite likelihood has computational
advantages, it can still be demanding when dealing with numerous likelihood
components and a large sample size. This paper tackles this challenge by
employing an approximation of the conventional composite likelihood estimator,
which is derived from an optimization procedure relying on stochastic
gradients. This novel estimator is shown to be asymptotically normally
distributed around the true parameter. In particular, depending on the relative
rates at which the sample size and the number of optimization iterations
diverge, the variance of the limiting distribution is shown to compound two
sources of uncertainty: the sampling variability of the data and the
optimization noise, with the latter depending on the sampling distribution used
to construct the stochastic gradients. The advantages of the proposed framework
are illustrated through simulation studies on two working examples: an Ising
model for binary data and a gamma frailty model for count data. Finally, a
real-data application is presented, showing its effectiveness in a large-scale
mental health survey.
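The core idea, replacing the full sum over likelihood components with stochastic gradients, can be sketched in a few lines. This toy (a Gaussian mean with per-observation components and Robbins-Monro step sizes) is our own illustration, not the paper's Ising or gamma frailty examples:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy: N(mu, 1) data; the likelihood "components" are per-observation
# terms. Instead of summing all n components per iteration, sample a
# small batch, i.e. use a stochastic gradient of the composite
# log-likelihood, which adds optimization noise on top of sampling noise.
n = 50000
x = rng.normal(2.5, 1.0, n)

mu, batch = 0.0, 64
for t in range(1, 3001):
    idx = rng.integers(0, n, batch)       # randomly sampled components
    grad = (x[idx] - mu).mean()           # d/dmu of avg log N(x; mu, 1)
    mu += grad / t**0.6                   # Robbins-Monro decaying step
print(round(mu, 2))
```

The limiting variance compounds the two noise sources the abstract describes: the data's sampling variability (fixed by n) and the optimization noise (controlled by the batch size and the step-size schedule).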
How Many Communities Are There?
Stochastic blockmodels and variants thereof are among the most widely used
approaches to community detection for social networks and relational data. A
stochastic blockmodel partitions the nodes of a network into disjoint sets,
called communities. The approach is inherently related to clustering with
mixture models and raises a similar model selection problem for the number of
communities. The Bayesian information criterion (BIC) is a popular solution;
however, for stochastic blockmodels the assumption that distinct edges are
conditionally independent given the communities of their endpoints is usually
violated in practice. In this regard, we propose composite likelihood BIC
(CL-BIC) to select the number of communities, and we show it is robust against
possible misspecifications in the underlying stochastic blockmodel assumptions.
We derive the requisite methodology and illustrate the approach using both
simulated and real data. Supplementary materials containing the relevant
computer code are available online.
Comment: 26 pages, 3 figure
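CL-BIC follows the familiar penalized-log-likelihood recipe: minimize k·log(n) − 2·loglik over candidate model orders. A generic BIC sketch on a deliberately simple surrogate problem (polynomial degree selection, not blockmodels); everything here is illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)

# Generic BIC model selection on a toy problem (polynomial degree),
# illustrating the penalized-log-likelihood recipe that CL-BIC adapts
# to composite likelihoods of stochastic blockmodels.
n = 400
x = rng.uniform(-1, 1, n)
y = 1 + 2*x - 3*x**2 + rng.normal(0, 0.3, n)    # true degree is 2

def bic(deg):
    resid = y - np.polyval(np.polyfit(x, y, deg), x)
    sigma2 = resid.var()
    loglik = -0.5 * n * (np.log(2*np.pi*sigma2) + 1)  # Gaussian profile
    k = deg + 2                                  # coefficients + variance
    return k*np.log(n) - 2*loglik

best = min(range(6), key=bic)
print(best)
```

The paper's point is that the naive BIC penalty is miscalibrated when the likelihood components are dependent, as they are across edges of a network; CL-BIC adjusts the criterion so selection remains consistent under that dependence.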
Asymptotic Analysis of Generative Semi-Supervised Learning
Semi-supervised learning has emerged as a popular framework for improving
modeling accuracy while controlling labeling cost. Based on an extension of
stochastic composite likelihood we quantify the asymptotic accuracy of
generative semi-supervised learning. In doing so, we complement
distribution-free analysis by providing an alternative framework to measure the
value associated with different labeling policies and resolve the fundamental
question of how much data to label and in what manner. We demonstrate our
approach with both simulation studies and real world experiments using naive
Bayes for text classification and MRFs and CRFs for structured prediction in
NLP.
Comment: 12 pages, 9 figure
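The generative semi-supervised setting can be sketched with a toy model: a generative mixture fitted to a few labeled points plus many unlabeled ones, where the unlabeled data contribute through their marginal likelihood. This is purely illustrative (1-D Gaussians and EM under an equal-priors assumption, not the paper's stochastic composite likelihood machinery):

```python
import numpy as np

rng = np.random.default_rng(3)

# Two unit-variance 1-D Gaussian classes, 10 labeled points, 4000
# unlabeled. EM on the unlabeled marginal likelihood refines the
# labeled-only estimates of the class means.
mu_true = (-2.0, 2.0)
xl = np.r_[rng.normal(mu_true[0], 1, 5), rng.normal(mu_true[1], 1, 5)]
yl = np.r_[np.zeros(5, int), np.ones(5, int)]
xu = np.r_[rng.normal(mu_true[0], 1, 2000), rng.normal(mu_true[1], 1, 2000)]

mu = np.array([xl[yl == 0].mean(), xl[yl == 1].mean()])  # labeled only
for _ in range(25):                     # EM steps, equal priors assumed
    r = np.exp(-0.5 * (xu[:, None] - mu) ** 2)
    r /= r.sum(1, keepdims=True)        # responsibilities of each class
    for k in (0, 1):
        mu[k] = ((xl[yl == k].sum() + r[:, k] @ xu)
                 / ((yl == k).sum() + r[:, k].sum()))
print(np.round(mu, 1))
```

With only 10 labels the labeled-only means are noisy; after EM the 4000 unlabeled points dominate and the estimates tighten, which is the kind of labeling-policy value the asymptotic analysis quantifies.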
Composite Likelihood for Stochastic Migration Model with Unobserved Factor
We introduce the conditional Maximum Composite Likelihood (MCL) estimation
method for the stochastic factor ordered Probit model of credit rating
transitions of firms. This model is recommended for internal credit risk
assessment procedures in banks and financial institutions under the Basel III
regulations. Its exact likelihood function involves a high-dimensional
integral, which can be approximated numerically before maximization. However,
the estimated migration risk and required capital tend to be sensitive to the
quality of this approximation, potentially leading to statistical regulatory
arbitrage. The proposed conditional MCL estimator circumvents this problem and
maximizes the composite log-likelihood of the factor ordered Probit model. We
present three conditional MCL estimators of different complexity and examine
their consistency and asymptotic normality when n and T tend to infinity. The
performance of these estimators at finite T is examined and compared with a
granularity-based approach in a simulation study. The use of the MCL estimator
is also illustrated in an empirical application.
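The building block of the model is the ordered Probit cell probability: thresholds partition a latent Gaussian index into rating categories. A minimal sketch of those transition probabilities, with the unobserved factor omitted and the function name ours:

```python
import numpy as np
from scipy.stats import norm

def transition_probs(index, cuts):
    """Ordered-probit category probabilities for a latent index
    z = index + eps with eps ~ N(0, 1): P(y = j) is the probability
    that z falls between consecutive thresholds."""
    c = np.r_[-np.inf, cuts, np.inf]
    return norm.cdf(c[1:] - index) - norm.cdf(c[:-1] - index)

# One row of a rating-transition matrix for a firm with latent index 0.4
# and three thresholds (four rating categories).
p = transition_probs(index=0.4, cuts=[-1.0, 0.5, 1.8])
print(np.round(p, 3), p.sum())
```

In the paper's model the index also loads on an unobserved common factor, which is what makes the exact likelihood a high-dimensional integral; the composite likelihood sidesteps that integral by combining low-dimensional conditional blocks like the one above.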
Some challenges for statistics
The paper gives a highly personal sketch of some current trends in statistical inference. After an account of the challenges that new forms of data bring, there is a brief overview of some topics in stochastic modelling. The paper then turns to sparsity, illustrated using Bayesian wavelet analysis based on a mixture model and metabolite profiling. Modern likelihood methods including higher order approximation and composite likelihood inference are then discussed, followed by some thoughts on statistical education.
Scalable Bayesian nonparametric regression via a Plackett-Luce model for conditional ranks
We present a novel Bayesian nonparametric regression model for covariates X
and continuous, real response variable Y. The model is parametrized in terms of
marginal distributions for Y and X and a regression function which tunes the
stochastic ordering of the conditional distributions F(y|x). By adopting an
approximate composite likelihood approach, we show that the resulting posterior
inference can be decoupled for the separate components of the model. This
procedure can scale to very large datasets and allows for the use of standard,
existing, software from Bayesian nonparametric density estimation and
Plackett-Luce ranking estimation to be applied. As an illustration, we show an
application of our approach to a US Census dataset, with over 1,300,000 data
points and more than 100 covariates.
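The Plackett-Luce component assigns a probability to a ranking by choosing items top-down, each with probability proportional to its score among items not yet placed. A small self-contained sketch (the function name and scores are ours):

```python
import numpy as np
from itertools import permutations

def pl_log_prob(ranking, w):
    """Plackett-Luce log-probability of `ranking` (best to worst) given
    positive item scores w: each item is chosen with probability
    proportional to its score among the items not yet placed."""
    w = np.asarray(w, dtype=float)
    remaining = list(ranking)
    logp = 0.0
    for item in ranking:
        logp += np.log(w[item] / w[remaining].sum())
        remaining.remove(item)
    return logp

# Sanity check: PL probabilities over all rankings of 3 items sum to 1.
w = [2.0, 1.0, 0.5]
total = sum(np.exp(pl_log_prob(list(p), w))
            for p in permutations(range(3)))
print(round(total, 6))   # -> 1.0
```

Because each choice in the product only involves the items still unplaced, the log-likelihood decomposes into simple terms, which is what lets the composite likelihood approach decouple the ranking component from the marginal-density components and scale to datasets of this size.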