Focused information criterion and model averaging for generalized additive partial linear models
We study model selection and model averaging in generalized additive partial
linear models (GAPLMs). Polynomial splines are used to approximate the nonparametric
functions. The corresponding estimators of the linear parameters are shown to
be asymptotically normal. We then develop a focused information criterion (FIC)
and a frequentist model average (FMA) estimator on the basis of the
quasi-likelihood principle and examine theoretical properties of the FIC and
FMA. The major advantages of the proposed procedures over the existing ones are
their computational expediency and theoretical reliability. Simulation
experiments have provided evidence of the superiority of the proposed
procedures. The approach is further applied to a real-world data example.Comment: Published in at http://dx.doi.org/10.1214/10-AOS832 the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org
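For concreteness, the model and the criterion can be sketched as follows; this display only mirrors the abstract's setup (a focus parameter $\mu$ and candidate submodels $S$), not the paper's exact quasi-likelihood expressions:

$$ g\{\mu(X, Z)\} = X^\top\beta + \sum_{j=1}^{d}\eta_j(Z_j), \qquad \mathrm{FIC}(S) = \widehat{\mathrm{bias}}{}^2_S(\hat\mu_S) + \widehat{\mathrm{var}}(\hat\mu_S), \qquad \hat\mu_{\mathrm{FMA}} = \sum_S w_S\,\hat\mu_S, $$

where $g$ is a known link, $\beta$ enters linearly, the $\eta_j$ are unknown smooth functions approximated by polynomial splines, and the model-average weights satisfy $w_S \ge 0$ and $\sum_S w_S = 1$.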
Variable selection in semiparametric regression modeling
In this paper, we are concerned with how to select significant variables in
semiparametric modeling. Variable selection for semiparametric regression
models consists of two components: model selection for nonparametric components
and selection of significant variables for the parametric portion. Thus,
semiparametric variable selection is much more challenging than parametric
variable selection (e.g., linear and generalized linear models) because
traditional variable selection procedures, such as stepwise regression and
best subset selection, now require separate model selection for the
nonparametric components for each submodel. This leads to a very heavy
computational burden. In this paper, we propose a class of variable selection
procedures for semiparametric regression models using nonconcave penalized
likelihood. We establish the rate of convergence of the resulting estimate.
With proper choices of penalty functions and regularization parameters, we show
the asymptotic normality of the resulting estimate and further demonstrate that
the proposed procedures perform as well as an oracle procedure. A
semiparametric generalized likelihood ratio test is proposed to select
significant variables in the nonparametric component. We investigate the
asymptotic behavior of the proposed test and demonstrate that its limiting null
distribution follows a chi-square distribution which is independent of the
nuisance parameters. Extensive Monte Carlo simulation studies are conducted to
examine the finite sample performance of the proposed variable selection
procedures.
Comment: Published at http://dx.doi.org/10.1214/009053607000000604 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
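The canonical nonconcave penalty in this line of work is the SCAD penalty of Fan and Li (2001). A minimal sketch is below; the function and the conventional constant a = 3.7 are standard, but the snippet is illustrative rather than the paper's implementation:

```python
import numpy as np

def scad_penalty(t, lam, a=3.7):
    """SCAD penalty p_lambda(|t|): L1-like near zero, constant far from zero,
    with a quadratic interpolation in between (Fan and Li, 2001)."""
    t = np.abs(np.asarray(t, dtype=float))
    linear = lam * t                                          # |t| <= lam
    quad = (2*a*lam*t - t**2 - lam**2) / (2*(a - 1))          # lam < |t| <= a*lam
    const = lam**2 * (a + 1) / 2                              # |t| > a*lam
    return np.where(t <= lam, linear, np.where(t <= a*lam, quad, const))
```

The flat tail is the point: unlike the L1 penalty, SCAD stops penalizing large coefficients, which removes their estimation bias and is what makes oracle-type results possible.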
Marginal empirical likelihood and sure independence feature screening
We study a marginal empirical likelihood approach in scenarios when the
number of variables grows exponentially with the sample size. The marginal
empirical likelihood ratios as functions of the parameters of interest are
systematically examined, and we find that the marginal empirical likelihood
ratio evaluated at zero can be used to differentiate whether an explanatory
variable is contributing to a response variable or not. Based on this finding,
we propose a unified feature screening procedure for linear models and
generalized linear models. Unlike most existing feature screening approaches,
which rely on the magnitudes of some marginal estimators to identify true
signals, the proposed screening approach can further incorporate the level of
uncertainty of those estimators. This merit inherits the self-studentization
property of the empirical likelihood approach and extends the insights of
existing feature screening methods. Moreover, we show that our screening
approach is less restrictive in its distributional assumptions and can be
conveniently adapted to a broad range of
scenarios such as models specified using general moment conditions. Our
theoretical results and extensive numerical examples by simulations and data
analysis demonstrate the merits of the marginal empirical likelihood approach.
Comment: Published at http://dx.doi.org/10.1214/13-AOS1139 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
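A minimal sketch of the key quantity, the marginal empirical likelihood ratio evaluated at zero, in the univariate case via Owen's dual formulation; using g_ij = X_ij * Y_i as the marginal estimating function for the linear-model screen is an illustrative reading of the abstract, not the paper's exact construction:

```python
import numpy as np
from scipy.optimize import brentq

def el_ratio_at_zero(g):
    """-2 log empirical likelihood ratio for H0: E[g] = 0 (univariate),
    via the dual problem: solve sum g_i / (1 + lam * g_i) = 0 for lam."""
    g = np.asarray(g, dtype=float)
    if g.min() >= 0.0 or g.max() <= 0.0:
        return np.inf                       # 0 outside the convex hull of the g_i
    eps = 1e-10
    lo = (-1.0 + eps) / g.max()             # keep all weights 1 + lam*g_i > 0
    hi = (-1.0 + eps) / g.min()
    lam = brentq(lambda l: np.sum(g / (1.0 + l * g)), lo, hi)
    return 2.0 * np.sum(np.log1p(lam * g))

# Screening: rank predictors by the marginal EL ratio for E[X_j * Y] = 0;
# large values flag variables whose marginal signal is inconsistent with zero.
# scores = [el_ratio_at_zero(X[:, j] * y) for j in range(X.shape[1])]
```

Because the EL ratio is self-studentized, this ranking accounts for the variability of each marginal estimate rather than its magnitude alone, which is the merit the abstract emphasizes.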
Challenges of Big Data Analysis
Big Data bring new opportunities to modern society and challenges to data
scientists. On the one hand, Big Data hold great promise for discovering
subtle population patterns and heterogeneities that cannot be detected with small-scale
data. On the other hand, the massive sample size and high dimensionality of Big
Data introduce unique computational and statistical challenges, including
scalability and storage bottlenecks, noise accumulation, spurious correlation,
incidental endogeneity, and measurement errors. These challenges are
distinctive and require a new computational and statistical paradigm. This
article gives an overview of the salient features of Big Data and of how these
features reshape statistical and computational methods as well as computing
architectures. We also provide various new perspectives on Big Data analysis
and computation. In particular, we emphasize the viability of the sparsest
solution in high-confidence sets and point out that the exogeneity assumptions
underlying most statistical methods for Big Data cannot be validated due to
incidental endogeneity; violating them can lead to wrong statistical
inferences and, consequently, wrong scientific conclusions.
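The spurious-correlation point is easy to reproduce: with the sample size held fixed, the maximum absolute sample correlation between a response and a growing number of completely independent predictors steadily increases. A minimal simulation (illustrative, not from the article):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50                                      # fixed sample size
for p in (10, 100, 1000, 10_000):           # growing dimensionality
    X = rng.standard_normal((n, p))         # predictors independent of y
    y = rng.standard_normal(n)
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    r = Xc.T @ yc / np.sqrt((Xc**2).sum(axis=0) * (yc**2).sum())
    print(f"p={p:6d}  max |corr| = {np.abs(r).max():.3f}")
```

Even though every predictor is pure noise, the largest observed correlation grows with p, so marginal screening rules calibrated for small p will select spurious variables in high dimensions.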
Penalized Composite Quasi-Likelihood for Ultrahigh-Dimensional Variable Selection
In high-dimensional model selection problems, penalized simple least-squares
approaches have been extensively used. This paper addresses the question of
both robustness and efficiency of penalized model selection methods, and
proposes a data-driven weighted linear combination of convex loss functions,
together with a weighted L1-penalty. It is completely data-adaptive and does
not require prior knowledge of the error distribution. The weighted
L1-penalty is used both to ensure the convexity of the penalty term and to
ameliorate the bias caused by the L1-penalty. In the setting with
dimensionality much larger than the sample size, we establish a strong oracle
property of the proposed method that possesses both model selection
consistency and estimation efficiency for the true non-zero coefficients. As
specific examples, we introduce a robust composite L1-L2 method and an
optimal composite quantile method and evaluate their performance in both
simulated and real data examples.
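Schematically, the proposal minimizes a weighted combination of convex losses under a weighted L1 penalty; the notation below only mirrors the abstract (loss weights $w_k$, penalty weights $d_j$), not the paper's exact normalization:

$$ \hat\beta = \arg\min_{\beta}\; \sum_{k=1}^{K} w_k \sum_{i=1}^{n} \rho_k\!\left(y_i - x_i^\top\beta\right) + \lambda \sum_{j=1}^{p} d_j\,\lvert\beta_j\rvert, $$

where the $\rho_k$ are convex losses: absolute and squared loss for the composite L1-L2 method, or check losses $\rho_{\tau_k}(u) = u\{\tau_k - \mathbf{1}(u<0)\}$ (each with its own intercept) for the composite quantile method. Choosing the $w_k$ from the data is what delivers efficiency without knowing the error distribution.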
SCAD-penalized regression in high-dimensional partially linear models
We consider the problem of simultaneous variable selection and estimation in
partially linear models with a divergent number of covariates in the linear
part, under the assumption that the vector of regression coefficients is
sparse. We apply the SCAD penalty to achieve sparsity in the linear part and
use polynomial splines to estimate the nonparametric component. Under
reasonable conditions, it is shown that consistency in terms of variable
selection and estimation can be achieved simultaneously for the linear and
nonparametric components. Furthermore, the SCAD-penalized estimators of the
nonzero coefficients are shown to have the asymptotic oracle property, in the
sense that they are asymptotically normal with the same means and covariances that
they would have if the zero coefficients were known in advance. The finite
sample behavior of the SCAD-penalized estimators is evaluated with simulation
and illustrated with a data set.
Comment: Published at http://dx.doi.org/10.1214/07-AOS580 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
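A minimal sketch of the two building blocks, a polynomial spline basis for the nonparametric component and the profiling step that reduces the problem to a penalized linear regression; the truncated power basis and the function names are illustrative choices, not the paper's code:

```python
import numpy as np

def spline_basis(z, knots, degree=3):
    """Truncated power basis for the nonparametric component f(z)
    (one simple polynomial-spline choice; B-splines work equally well)."""
    cols = [z**d for d in range(1, degree + 1)]
    cols += [np.maximum(z - k, 0.0)**degree for k in knots]
    return np.column_stack([np.ones_like(z)] + cols)

def profile_out(B, M):
    """Residualize M against the spline basis B, i.e. project onto the
    orthogonal complement of col(B)."""
    Q, _ = np.linalg.qr(B)
    return M - Q @ (Q.T @ M)

# Model y = X @ beta + f(z) + eps: with B = spline_basis(z, knots), regress
# profile_out(B, y) on profile_out(B, X) under a SCAD penalty on beta
# (see the SCAD sketch above for the penalty function itself).
```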
Simulation-based Estimation Methods for Financial Time Series Models
This chapter overviews some recent advances in simulation-based methods for estimating financial time series models that are widely used in financial economics. Simulation-based methods have proven to be particularly useful when the likelihood function and moments do not have tractable forms, and hence the maximum likelihood (ML) method and the generalized method of moments (GMM) are difficult to use. They are also capable of improving the finite sample performance of the traditional methods. Both frequentist and Bayesian simulation-based methods are reviewed. Frequentist simulation-based methods cover various forms of simulated maximum likelihood (SML), the simulated generalized method of moments (SGMM), the efficient method of moments (EMM), and the indirect inference (II) method. Bayesian simulation-based methods cover various MCMC algorithms. Each simulation-based method is discussed in the context of a specific financial time series model as a motivating example. Empirical applications, based on real exchange rates, interest rates and equity data, illustrate how the simulation-based methods are implemented. In particular, SML is applied to a discrete time stochastic volatility model, EMM to a continuous time stochastic volatility model, MCMC to a credit risk model, and the II method to a term structure model.
Keywords: Generalized method of moments, Maximum likelihood, MCMC, Indirect inference, Credit risk, Stock price, Exchange rate, Interest rate.
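To give a flavor of the simulated-moments idea, the simplest of the frequentist methods reviewed, the sketch below chooses parameters so that moments of series simulated from the model match sample moments of the data; `simulate` is a hypothetical user-supplied simulator and the moment set is illustrative:

```python
import numpy as np
from scipy.optimize import minimize

def moments(x):
    """Auxiliary moments: mean, variance, lag-1 autocovariance."""
    xc = x - x.mean()
    return np.array([x.mean(), xc @ xc / len(x), xc[1:] @ xc[:-1] / len(x)])

def smm_objective(theta, data, simulate, n_paths=10):
    """Distance between data moments and averaged simulated moments.
    Identity weighting here; efficient GMM would weight by a covariance estimate."""
    rng = np.random.default_rng(0)          # common random numbers across theta
    m_sim = np.mean([moments(simulate(theta, len(data), rng))
                     for _ in range(n_paths)], axis=0)
    d = moments(data) - m_sim
    return d @ d

# theta_hat = minimize(smm_objective, theta0, args=(data, simulate)).x
```

Holding the random seed fixed across evaluations of theta keeps the objective smooth in the parameters, a standard device in simulation-based estimation.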