Adaptive robust variable selection
Heavy-tailed high-dimensional data are commonly encountered in various
scientific fields and pose great challenges to modern statistical analysis. A
natural procedure to address this problem is to use penalized quantile
regression with weighted $L_1$-penalty, called weighted robust Lasso
(WR-Lasso), in which weights are introduced to ameliorate the bias problem
induced by the $L_1$-penalty. In the ultra-high dimensional setting, where the
dimensionality can grow exponentially with the sample size, we investigate the
model selection oracle property and establish the asymptotic normality of the
WR-Lasso. We show that only mild conditions on the model error distribution are
needed. Our theoretical results also reveal that adaptive choice of the weight
vector is essential for the WR-Lasso to enjoy these nice asymptotic properties.
To make the WR-Lasso practically feasible, we propose a two-step procedure,
called adaptive robust Lasso (AR-Lasso), in which the weight vector in the
second step is constructed based on the $L_1$-penalized quantile regression
estimate from the first step. This two-step procedure is justified
theoretically to possess the oracle property and the asymptotic normality.
Numerical studies demonstrate the favorable finite-sample performance of the
AR-Lasso.
Comment: Published at http://dx.doi.org/10.1214/13-AOS1191 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org).
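To make the two-step idea concrete, here is a minimal sketch in Python of an AR-Lasso-style fit, not the authors' implementation: a pilot $L_1$-penalized quantile regression supplies coefficient-specific weights, and the weighted penalty is imposed in a second quantile fit by rescaling the design columns. The weight rule 1/(|pilot| + eps) and all tuning constants are illustrative assumptions.

import numpy as np
from sklearn.linear_model import QuantileRegressor

def ar_lasso(X, y, tau=0.5, alpha1=0.1, alpha2=0.1, eps=1e-4):
    """Two-step sketch: pilot L1 quantile fit, then a weighted refit.
    The weight rule 1 / (|pilot| + eps) is an illustrative choice."""
    # Step 1: L1-penalized quantile regression gives a pilot estimate.
    pilot = QuantileRegressor(quantile=tau, alpha=alpha1, solver="highs").fit(X, y)
    w = 1.0 / (np.abs(pilot.coef_) + eps)            # adaptive weights
    # Step 2: weighted L1 penalty, imposed by rescaling column j by 1 / w_j.
    Xw = X / w
    fit = QuantileRegressor(quantile=tau, alpha=alpha2, solver="highs").fit(Xw, y)
    return fit.coef_ / w, fit.intercept_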
One-step estimator paths for concave regularization
The statistics literature of the past 15 years has established many favorable
properties for sparse diminishing-bias regularization: techniques which can
roughly be understood as providing estimation under penalty functions spanning
the range of concavity between the $\ell_0$ and $\ell_1$ norms. However, lasso
$\ell_1$-regularized estimation remains the standard tool for industrial `Big
Data' applications because of its minimal computational cost and the presence
of easy-to-apply rules for penalty selection. In response, this article
proposes a simple new algorithm framework that requires no more computation
than a lasso path: the path of one-step estimators (POSE) does penalized
regression estimation on a grid of decreasing penalties, but adapts
coefficient-specific weights to decrease as a function of the coefficient
estimated in the previous path step. This provides sparse diminishing-bias
regularization at no extra cost over the fastest lasso algorithms. Moreover,
our `gamma lasso' implementation of POSE is accompanied by a reliable heuristic
for the fit degrees of freedom, so that standard information criteria can be
applied in penalty selection. We also provide novel results on the distance
between weighted-$\ell_1$ and $\ell_0$ penalized predictors; this allows us to build
intuition about POSE and other diminishing-bias regularization schemes. The
methods and results are illustrated in extensive simulations and in application
of logistic regression to evaluating the performance of hockey players.
Comment: Data and code are in the gamlr package for R. Supplemental appendix
is at https://github.com/TaddyLab/pose/raw/master/paper/supplemental.pd
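As a rough illustration of a one-step weighted path, and not the gamlr package's exact update, the sketch below re-weights the $\ell_1$ penalty at each step of a decreasing penalty grid using the previous step's coefficients; the weight rule 1/(1 + gamma*|beta|) is assumed here in the spirit of the gamma lasso.

import numpy as np
from sklearn.linear_model import Lasso

def pose_path(X, y, lambdas, gamma=1.0):
    """Path of one-step estimators: at each penalty level, coefficient-specific
    weights shrink as a function of the previous step's estimate."""
    beta_prev = np.zeros(X.shape[1])
    path = []
    for lam in sorted(lambdas, reverse=True):          # decreasing penalty grid
        w = 1.0 / (1.0 + gamma * np.abs(beta_prev))    # assumed weight rule
        Xw = X / w                                     # weighted L1 via column rescaling
        fit = Lasso(alpha=lam, max_iter=10000).fit(Xw, y)
        beta_prev = fit.coef_ / w
        path.append(beta_prev.copy())
    return np.array(path)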
APPLE: Approximate Path for Penalized Likelihood Estimators
In high-dimensional data analysis, penalized likelihood estimators are shown
to provide superior results in both variable selection and parameter
estimation. A new algorithm, APPLE, is proposed for calculating the Approximate
Path for Penalized Likelihood Estimators. Both the convex penalty (such as
LASSO) and the nonconvex penalty (such as SCAD and MCP) cases are considered.
The APPLE efficiently computes the solution path for the penalized likelihood
estimator using a hybrid of the modified predictor-corrector method and the
coordinate-descent algorithm. APPLE is compared with several well-known
packages via simulation and analysis of two gene expression data sets.
Comment: 24 pages, 9 figures
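APPLE itself hybridizes a modified predictor-corrector step with coordinate descent; as background only, the sketch below shows the coordinate-descent building block with the standard MCP thresholding rule for standardized columns. It is a generic illustration of that inner update, not the APPLE path algorithm.

import numpy as np

def mcp_threshold(z, lam, gamma=3.0):
    """Univariate MCP update for a standardized coordinate."""
    if abs(z) <= gamma * lam:
        soft = np.sign(z) * max(abs(z) - lam, 0.0)     # soft-thresholding
        return soft / (1.0 - 1.0 / gamma)
    return z                                           # large coefficients are left unshrunk

def cd_mcp(X, y, lam, gamma=3.0, n_iter=100):
    """Cyclic coordinate descent for least squares + MCP; assumes columns of X
    are centered and scaled so that X[:, j] @ X[:, j] == n."""
    n, p = X.shape
    beta = np.zeros(p)
    r = y - X @ beta
    for _ in range(n_iter):
        for j in range(p):
            z = X[:, j] @ r / n + beta[j]              # partial-residual fit
            new = mcp_threshold(z, lam, gamma)
            r += X[:, j] * (beta[j] - new)             # keep residual in sync
            beta[j] = new
    return beta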
Maximum Likelihood Estimation of Stochastic Frontier Models with Endogeneity
We propose and study a maximum likelihood estimator of stochastic frontier
models with endogeneity in cross-section data when the composite error term may
be correlated with inputs and environmental variables. Our framework is a
generalization of the normal half-normal stochastic frontier model with
endogeneity. We derive the likelihood function in closed form using three
fundamental assumptions: the existence of control functions that fully capture
the dependence between regressors and unobservables; the conditional
independence of the two error components given the control functions; and the
conditional distribution of the stochastic inefficiency term given the control
functions being a folded normal distribution. We also provide a Battese-Coelli
estimator of technical efficiency. Our estimator is computationally fast and
easy to implement. We study some of its asymptotic properties, and we showcase
its finite sample behavior in Monte-Carlo simulations and an empirical
application to farmers in Nepal.
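For orientation, the baseline normal half-normal frontier without endogeneity, which the proposed model generalizes, already has a closed-form log-likelihood; the LaTeX sketch below records that standard form, with the control-function extension left to the paper. The Battese-Coelli efficiency score is then the conditional mean of exp(-u_i) given the composite error, evaluated at the MLE.

% Baseline normal half-normal stochastic frontier (no endogeneity):
% y_i = x_i'\beta + v_i - u_i,  v_i ~ N(0, \sigma_v^2),  u_i ~ |N(0, \sigma_u^2)|,
% with \sigma^2 = \sigma_v^2 + \sigma_u^2, \lambda = \sigma_u / \sigma_v,
% and \varepsilon_i = y_i - x_i'\beta:
\ell(\beta, \sigma, \lambda) = \sum_{i=1}^{n} \left[
  \ln\frac{2}{\sigma}
  + \ln \phi\!\left(\frac{\varepsilon_i}{\sigma}\right)
  + \ln \Phi\!\left(-\frac{\lambda \varepsilon_i}{\sigma}\right) \right].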
Variance Estimation Using Refitted Cross-validation in Ultrahigh Dimensional Regression
Variance estimation is a fundamental problem in statistical modeling. In
ultrahigh dimensional linear regressions where the dimensionality is much
larger than sample size, traditional variance estimation techniques are not
applicable. Recent advances on variable selection in ultrahigh dimensional
linear regressions make this problem accessible. One of the major problems in
ultrahigh dimensional regression is the high spurious correlation between the
unobserved realized noise and some of the predictors. As a result, the realized
noises are actually predicted when extra irrelevant variables are selected,
leading to a serious underestimate of the noise level. In this paper, we propose
a two-stage refitted procedure via a data splitting technique, called refitted
cross-validation (RCV), to attenuate the influence of irrelevant variables with
high spurious correlations. Our asymptotic results show that the resulting
procedure performs as well as the oracle estimator, which knows in advance the
mean regression function. The simulation studies lend further support to our
theoretical claims. The naive two-stage estimator, which fits the selected
variables in the first stage, and the plug-in one-stage estimators using LASSO
and SCAD are also studied and compared. Their performances can be improved by
the proposed RCV method.
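A minimal sketch of refitted cross-validation in Python, assuming a lasso is used as the first-stage selector (the method allows other selectors); names and tuning choices are illustrative.

import numpy as np
from sklearn.linear_model import LassoCV, LinearRegression

def rcv_variance(X, y, seed=0):
    """Select variables on one half of the data, refit by OLS and estimate the
    noise variance on the other half, then swap the halves and average."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    halves = (idx[: len(y) // 2], idx[len(y) // 2 :])
    sigma2 = []
    for a, b in (halves, halves[::-1]):
        sel = np.flatnonzero(LassoCV(cv=5).fit(X[a], y[a]).coef_)   # stage 1: selection
        if sel.size == 0:
            resid, dof = y[b] - y[b].mean(), len(b) - 1
        else:
            ols = LinearRegression().fit(X[b][:, sel], y[b])        # stage 2: refit
            resid = y[b] - ols.predict(X[b][:, sel])
            dof = max(len(b) - sel.size - 1, 1)
        sigma2.append(resid @ resid / dof)
    return float(np.mean(sigma2))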
Doubly Robust Inference when Combining Probability and Non-probability Samples with High-dimensional Data
Non-probability samples are becoming increasingly popular in survey statistics but
may suffer from selection biases that limit the generalizability of results to
the target population. We consider integrating a non-probability sample with a
probability sample which provides high-dimensional representative covariate
information of the target population. We propose a two-step approach for
variable selection and finite population inference. In the first step, we use
penalized estimating equations with folded-concave penalties to select
important variables for the sampling score of selection into the
non-probability sample and the outcome model. We show that the penalized
estimating equation approach enjoys the selection consistency property for
general probability samples. The major technical hurdle is due to the possible
dependence of the sample under the finite population framework. To overcome
this challenge, we construct martingales, which enable us to apply the Bernstein
concentration inequality for martingales. In the second step, we focus on a
doubly robust estimator of the finite population mean and re-estimate the
nuisance model parameters by minimizing the asymptotic squared bias of the
doubly robust estimator. This estimating strategy mitigates the possible
first-step selection error and renders the doubly robust estimator root-n
consistent if either the sampling probability or the outcome model is correctly
specified.
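For orientation, a generic doubly robust estimator of the finite population mean in this setting combines an inverse sampling-score residual correction over the non-probability sample with a design-weighted outcome prediction over the probability sample. The sketch below assumes the fitted sampling scores and outcome predictions are supplied, and it omits the paper's bias-minimizing re-estimation of the nuisance parameters.

import numpy as np

def dr_mean(y_B, pi_B, m_B, m_A, d_A, N=None):
    """Generic doubly robust mean: non-probability sample B with fitted sampling
    scores pi_B and outcome predictions m_B; probability sample A with survey
    design weights d_A and outcome predictions m_A."""
    if N is None:
        N = d_A.sum()                           # design-based estimate of the population size
    correction = np.sum((y_B - m_B) / pi_B)     # residual term from sample B
    prediction = np.sum(d_A * m_A)              # design-weighted prediction from sample A
    return (correction + prediction) / N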
Publication Bias in Meta-Analysis: Confidence Intervals for Rosenthal's Fail-Safe Number
The purpose of the present paper is to assess the efficacy of confidence
intervals for Rosenthal's fail-safe number. Although Rosenthal's estimator is
widely used by researchers, its statistical properties are largely unexplored.
First, we developed statistical theory that allowed us to produce
confidence intervals for Rosenthal's fail-safe number. This was done by
distinguishing whether the number of studies analysed in a meta-analysis is fixed
or random. Each case produces different variance estimators. For a given number
of studies and a given distribution, we provided five variance estimators.
Confidence intervals are examined with a normal approximation and a
nonparametric bootstrap. The accuracy of the different confidence interval
estimates was then tested by methods of simulation under different
distributional assumptions. The half normal distribution variance estimator has
the best probability coverage. Finally, we provide a table of lower confidence
intervals for Rosenthal's estimator.
Comment: Published in the International Scholarly Research Notices in December
201
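As background for what the intervals are built around, Rosenthal's fail-safe number is conventionally computed from the studies' z-scores as in the sketch below (one-tailed alpha = 0.05 by default); this is the classical point estimate only, not the paper's variance estimators or bootstrap intervals.

import numpy as np
from scipy.stats import norm

def fail_safe_n(z_scores, alpha=0.05):
    """Rosenthal's fail-safe number: how many unpublished null-result studies
    would be needed to push the combined (Stouffer) test above alpha."""
    z = np.asarray(z_scores, dtype=float)
    z_alpha = norm.ppf(1.0 - alpha)             # 1.645 for one-tailed alpha = 0.05
    return z.sum() ** 2 / z_alpha ** 2 - z.size

# Example: fail_safe_n([2.1, 1.8, 2.5, 1.9, 2.2]) is roughly 36.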