Focused information criterion and model averaging for generalized additive partial linear models
We study model selection and model averaging in generalized additive partial
linear models (GAPLMs). Polynomial splines are used to approximate the nonparametric
functions. The corresponding estimators of the linear parameters are shown to
be asymptotically normal. We then develop a focused information criterion (FIC)
and a frequentist model average (FMA) estimator on the basis of the
quasi-likelihood principle and examine theoretical properties of the FIC and
FMA. The major advantages of the proposed procedures over the existing ones are
their computational expediency and theoretical reliability. Simulation
experiments have provided evidence of the superiority of the proposed
procedures. The approach is further applied to a real-world data example.Comment: Published in at http://dx.doi.org/10.1214/10-AOS832 the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org
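For concreteness, the model and the criterion can be sketched as follows; this display only mirrors the abstract's setup (a focus parameter $\mu$ and candidate submodels $S$), not the paper's exact quasi-likelihood expressions:

$$ g\{\mu(X, Z)\} = X^\top\beta + \sum_{j=1}^{d}\eta_j(Z_j), \qquad \mathrm{FIC}(S) = \widehat{\mathrm{bias}}{}^2_S(\hat\mu_S) + \widehat{\mathrm{var}}(\hat\mu_S), \qquad \hat\mu_{\mathrm{FMA}} = \sum_S w_S\,\hat\mu_S, $$

where $g$ is a known link, $\beta$ enters linearly, the $\eta_j$ are unknown smooth functions approximated by polynomial splines, and the model-average weights satisfy $w_S \ge 0$ and $\sum_S w_S = 1$.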
Variable selection in semiparametric regression modeling
In this paper, we are concerned with how to select significant variables in
semiparametric modeling. Variable selection for semiparametric regression
models consists of two components: model selection for nonparametric components
and selection of significant variables for the parametric portion. Thus,
semiparametric variable selection is much more challenging than parametric
variable selection (e.g., linear and generalized linear models) because
traditional variable selection procedures, such as stepwise regression and
best subset selection, now require separate model selection for the
nonparametric components for each submodel. This leads to a very heavy
computational burden. In this paper, we propose a class of variable selection
procedures for semiparametric regression models using nonconcave penalized
likelihood. We establish the rate of convergence of the resulting estimate.
With proper choices of penalty functions and regularization parameters, we show
the asymptotic normality of the resulting estimate and further demonstrate that
the proposed procedures perform as well as an oracle procedure. A
semiparametric generalized likelihood ratio test is proposed to select
significant variables in the nonparametric component. We investigate the
asymptotic behavior of the proposed test and demonstrate that its limiting null
distribution follows a chi-square distribution which is independent of the
nuisance parameters. Extensive Monte Carlo simulation studies are conducted to
examine the finite sample performance of the proposed variable selection
procedures.
Comment: Published at http://dx.doi.org/10.1214/009053607000000604 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
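The canonical nonconcave penalty in this line of work is the SCAD penalty of Fan and Li (2001). A minimal sketch is below; the function and the conventional constant a = 3.7 are standard, but the snippet is illustrative rather than the paper's implementation:

```python
import numpy as np

def scad_penalty(t, lam, a=3.7):
    """SCAD penalty p_lambda(|t|): L1-like near zero, constant far from zero,
    with a quadratic interpolation in between (Fan and Li, 2001)."""
    t = np.abs(np.asarray(t, dtype=float))
    linear = lam * t                                          # |t| <= lam
    quad = (2*a*lam*t - t**2 - lam**2) / (2*(a - 1))          # lam < |t| <= a*lam
    const = lam**2 * (a + 1) / 2                              # |t| > a*lam
    return np.where(t <= lam, linear, np.where(t <= a*lam, quad, const))
```

The flat tail is the point: unlike the L1 penalty, SCAD stops penalizing large coefficients, which removes their estimation bias and is what makes oracle-type results possible.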
Marginal empirical likelihood and sure independence feature screening
We study a marginal empirical likelihood approach in scenarios when the
number of variables grows exponentially with the sample size. The marginal
empirical likelihood ratios as functions of the parameters of interest are
systematically examined, and we find that the marginal empirical likelihood
ratio evaluated at zero can be used to differentiate whether an explanatory
variable is contributing to a response variable or not. Based on this finding,
we propose a unified feature screening procedure for linear models and
generalized linear models. Unlike most existing feature screening approaches,
which rely on the magnitudes of some marginal estimators to identify true
signals, the proposed screening approach can further incorporate the level of
uncertainty of those estimators. This merit inherits the self-studentization
property of the empirical likelihood approach and extends the insights of
existing feature screening methods. Moreover, we show that our screening
approach is less restrictive in its distributional assumptions and can be
conveniently adapted to a broad range of
scenarios such as models specified using general moment conditions. Our
theoretical results and extensive numerical examples by simulations and data
analysis demonstrate the merits of the marginal empirical likelihood approach.
Comment: Published at http://dx.doi.org/10.1214/13-AOS1139 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
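A minimal sketch of the key quantity, the marginal empirical likelihood ratio evaluated at zero, in the univariate case via Owen's dual formulation; using g_ij = X_ij * Y_i as the marginal estimating function for the linear-model screen is an illustrative reading of the abstract, not the paper's exact construction:

```python
import numpy as np
from scipy.optimize import brentq

def el_ratio_at_zero(g):
    """-2 log empirical likelihood ratio for H0: E[g] = 0 (univariate),
    via the dual problem: solve sum g_i / (1 + lam * g_i) = 0 for lam."""
    g = np.asarray(g, dtype=float)
    if g.min() >= 0.0 or g.max() <= 0.0:
        return np.inf                       # 0 outside the convex hull of the g_i
    eps = 1e-10
    lo = (-1.0 + eps) / g.max()             # keep all weights 1 + lam*g_i > 0
    hi = (-1.0 + eps) / g.min()
    lam = brentq(lambda l: np.sum(g / (1.0 + l * g)), lo, hi)
    return 2.0 * np.sum(np.log1p(lam * g))

# Screening: rank predictors by the marginal EL ratio for E[X_j * Y] = 0;
# large values flag variables whose marginal signal is inconsistent with zero.
# scores = [el_ratio_at_zero(X[:, j] * y) for j in range(X.shape[1])]
```

Because the EL ratio is self-studentized, this ranking accounts for the variability of each marginal estimate rather than its magnitude alone, which is the merit the abstract emphasizes.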
Challenges of Big Data Analysis
Big Data bring new opportunities to modern society and challenges to data
scientists. On the one hand, Big Data hold great promise for discovering
subtle population patterns and heterogeneities that cannot be detected with small-scale
data. On the other hand, the massive sample size and high dimensionality of Big
Data introduce unique computational and statistical challenges, including
scalability and storage bottlenecks, noise accumulation, spurious correlation,
incidental endogeneity, and measurement errors. These challenges are
distinctive and require a new computational and statistical paradigm. This
article gives an overview of the salient features of Big Data and of how these
features reshape statistical and computational methods as well as computing
architectures. We also provide various new perspectives on Big Data analysis
and computation. In particular, we emphasize the viability of the sparsest
solution in high-confidence sets and point out that the exogeneity assumptions
underlying most statistical methods for Big Data cannot be validated due to
incidental endogeneity; violating them can lead to wrong statistical
inferences and, consequently, wrong scientific conclusions.
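The spurious-correlation point is easy to reproduce: with the sample size held fixed, the maximum absolute sample correlation between a response and a growing number of completely independent predictors steadily increases. A minimal simulation (illustrative, not from the article):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50                                      # fixed sample size
for p in (10, 100, 1000, 10_000):           # growing dimensionality
    X = rng.standard_normal((n, p))         # predictors independent of y
    y = rng.standard_normal(n)
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    r = Xc.T @ yc / np.sqrt((Xc**2).sum(axis=0) * (yc**2).sum())
    print(f"p={p:6d}  max |corr| = {np.abs(r).max():.3f}")
```

Even though every predictor is pure noise, the largest observed correlation grows with p, so marginal screening rules calibrated for small p will select spurious variables in high dimensions.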
Penalized Composite Quasi-Likelihood for Ultrahigh-Dimensional Variable Selection
In high-dimensional model selection problems, penalized simple least-squares
approaches have been extensively used. This paper addresses the question of
both robustness and efficiency of penalized model selection methods, and
proposes a data-driven weighted linear combination of convex loss functions,
together with a weighted L1-penalty. It is completely data-adaptive and does
not require prior knowledge of the error distribution. The weighted
L1-penalty is used both to ensure the convexity of the penalty term and to
ameliorate the bias caused by the L1-penalty. In the setting with
dimensionality much larger than the sample size, we establish a strong oracle
property of the proposed method that possesses both model selection
consistency and estimation efficiency for the true non-zero coefficients. As
specific examples, we introduce a robust composite L1-L2 method and an
optimal composite quantile method and evaluate their performance in both
simulated and real data examples.
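Schematically, the proposal minimizes a weighted combination of convex losses under a weighted L1 penalty; the notation below only mirrors the abstract (loss weights $w_k$, penalty weights $d_j$), not the paper's exact normalization:

$$ \hat\beta = \arg\min_{\beta}\; \sum_{k=1}^{K} w_k \sum_{i=1}^{n} \rho_k\!\left(y_i - x_i^\top\beta\right) + \lambda \sum_{j=1}^{p} d_j\,\lvert\beta_j\rvert, $$

where the $\rho_k$ are convex losses: absolute and squared loss for the composite L1-L2 method, or check losses $\rho_{\tau_k}(u) = u\{\tau_k - \mathbf{1}(u<0)\}$ (each with its own intercept) for the composite quantile method. Choosing the $w_k$ from the data is what delivers efficiency without knowing the error distribution.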
SCAD-penalized regression in high-dimensional partially linear models
We consider the problem of simultaneous variable selection and estimation in
partially linear models with a divergent number of covariates in the linear
part, under the assumption that the vector of regression coefficients is
sparse. We apply the SCAD penalty to achieve sparsity in the linear part and
use polynomial splines to estimate the nonparametric component. Under
reasonable conditions, it is shown that consistency in terms of variable
selection and estimation can be achieved simultaneously for the linear and
nonparametric components. Furthermore, the SCAD-penalized estimators of the
nonzero coefficients are shown to have the asymptotic oracle property, in the
sense that they are asymptotically normal with the same means and covariances that
they would have if the zero coefficients were known in advance. The finite
sample behavior of the SCAD-penalized estimators is evaluated with simulation
and illustrated with a data set.
Comment: Published at http://dx.doi.org/10.1214/07-AOS580 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
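A minimal sketch of the two building blocks, a polynomial spline basis for the nonparametric component and the profiling step that reduces the problem to a penalized linear regression; the truncated power basis and the function names are illustrative choices, not the paper's code:

```python
import numpy as np

def spline_basis(z, knots, degree=3):
    """Truncated power basis for the nonparametric component f(z)
    (one simple polynomial-spline choice; B-splines work equally well)."""
    cols = [z**d for d in range(1, degree + 1)]
    cols += [np.maximum(z - k, 0.0)**degree for k in knots]
    return np.column_stack([np.ones_like(z)] + cols)

def profile_out(B, M):
    """Residualize M against the spline basis B, i.e. project onto the
    orthogonal complement of col(B)."""
    Q, _ = np.linalg.qr(B)
    return M - Q @ (Q.T @ M)

# Model y = X @ beta + f(z) + eps: with B = spline_basis(z, knots), regress
# profile_out(B, y) on profile_out(B, X) under a SCAD penalty on beta
# (see the SCAD sketch above for the penalty function itself).
```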
Simulation-based Estimation Methods for Financial Time Series Models
This chapter overviews some recent advances in simulation-based methods for estimating financial time series models that are widely used in financial economics. Simulation-based methods have proven to be particularly useful when the likelihood function and moments do not have tractable forms, and hence the maximum likelihood (ML) method and the generalized method of moments (GMM) are difficult to use. They are also capable of improving the finite sample performance of the traditional methods. Both frequentist and Bayesian simulation-based methods are reviewed. Frequentist simulation-based methods cover various forms of simulated maximum likelihood (SML), the simulated generalized method of moments (SGMM), the efficient method of moments (EMM), and the indirect inference (II) method. Bayesian simulation-based methods cover various MCMC algorithms. Each simulation-based method is discussed in the context of a specific financial time series model as a motivating example. Empirical applications, based on real exchange rates, interest rates and equity data, illustrate how the simulation-based methods are implemented. In particular, SML is applied to a discrete time stochastic volatility model, EMM to a continuous time stochastic volatility model, MCMC to a credit risk model, and the II method to a term structure model.
Keywords: Generalized method of moments, Maximum likelihood, MCMC, Indirect inference, Credit risk, Stock price, Exchange rate, Interest rate.
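To give a flavor of the simulated-moments idea, the simplest of the frequentist methods reviewed, the sketch below chooses parameters so that moments of series simulated from the model match sample moments of the data; `simulate` is a hypothetical user-supplied simulator and the moment set is illustrative:

```python
import numpy as np
from scipy.optimize import minimize

def moments(x):
    """Auxiliary moments: mean, variance, lag-1 autocovariance."""
    xc = x - x.mean()
    return np.array([x.mean(), xc @ xc / len(x), xc[1:] @ xc[:-1] / len(x)])

def smm_objective(theta, data, simulate, n_paths=10):
    """Distance between data moments and averaged simulated moments.
    Identity weighting here; efficient GMM would weight by a covariance estimate."""
    rng = np.random.default_rng(0)          # common random numbers across theta
    m_sim = np.mean([moments(simulate(theta, len(data), rng))
                     for _ in range(n_paths)], axis=0)
    d = moments(data) - m_sim
    return d @ d

# theta_hat = minimize(smm_objective, theta0, args=(data, simulate)).x
```

Holding the random seed fixed across evaluations of theta keeps the objective smooth in the parameters, a standard device in simulation-based estimation.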