Efficient quantile regression for heteroscedastic models
Quantile regression (QR) provides estimates of a range of conditional quantiles. This stands in contrast to traditional regression techniques, which focus on a single conditional mean function. Lee et al. [Regularization of case-specific parameters for robustness and efficiency. Statist Sci. 2012;27(3):350–372] proposed efficient QR by rounding the sharp corner of the loss. The main modification generally involves an asymmetric ℓ₂ adjustment of the loss function around zero. We extend the idea of ℓ₂-adjusted QR to linear heterogeneous models. The ℓ₂ adjustment is constructed to diminish as the sample size grows. Conditions to retain consistency properties are also provided.
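The abstract describes the corner-rounding only verbally. Below is a minimal sketch of one such smoothing, assuming an asymmetric quadratic piece on [-c(1-τ), cτ] whose slope matches the two linear arms of the check loss; the constant c and all names are illustrative, not the exact construction of Lee et al.

    import numpy as np

    def check_loss(r, tau):
        """Standard check loss: rho_tau(r) = r * (tau - 1{r < 0})."""
        r = np.asarray(r, dtype=float)
        return r * (tau - (r < 0))

    def adjusted_check_loss(r, tau, c):
        """Corner-rounded check loss (illustrative): the V-shaped tip is
        replaced on [-c*(1 - tau), c*tau] by the quadratic r^2 / (2c),
        and the linear arms are shifted so the pieces join continuously
        with matching slopes."""
        r = np.asarray(r, dtype=float)
        out = r ** 2 / (2.0 * c)  # quadratic middle piece
        out = np.where(r > c * tau, tau * r - c * tau ** 2 / 2.0, out)
        out = np.where(r < -c * (1.0 - tau),
                       -(1.0 - tau) * r - c * (1.0 - tau) ** 2 / 2.0, out)
        return out

As c shrinks to zero the adjusted loss converges to the check loss, consistent with the requirement that the adjustment diminish as the sample size grows.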
Rediscovering a little known fact about the t-test and the F-test: Algebraic, Geometric, Distributional and Graphical Considerations
We discuss the role that the null hypothesis should play in the construction
of a test statistic used to make a decision about that hypothesis. To construct
the test statistic for a point null hypothesis about a binomial proportion, a
common recommendation is to act as if the null hypothesis is true. We argue
that, on the surface, the one-sample t-test of a point null hypothesis about a
Gaussian population mean does not appear to follow the recommendation. We show
how simple algebraic manipulations of the usual t-statistic lead to an
equivalent test procedure consistent with the recommendation. We provide
geometric intuition regarding this equivalence and we consider extensions to
testing nested hypotheses in Gaussian linear models. We discuss an application
to graphical residual diagnostics where the form of the test statistic makes a
practical difference. By examining the formulation of the test statistic from
multiple perspectives in this familiar example, we provide simple, concrete
illustrations of some important issues that can guide the formulation of
effective solutions to more complex statistical problems.
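The equivalence the abstract alludes to can be checked numerically: measuring spread around the null value μ₀ instead of around the sample mean gives a statistic that is a monotone function of t². The snippet below is a sketch of that identity, not necessarily the paper's own development; the data are illustrative.

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(loc=0.3, scale=1.2, size=25)  # illustrative sample
    mu0 = 0.0                                    # point null for the mean
    n = x.size

    # Usual t-statistic: the variance estimate is centered at the sample
    # mean, so on the surface it does not "act as if the null is true".
    t2 = n * (x.mean() - mu0) ** 2 / x.var(ddof=1)

    # Statistic that does act as if the null is true: spread is measured
    # around mu0 rather than around the sample mean.
    u = n * (x.mean() - mu0) ** 2 / np.sum((x - mu0) ** 2)

    # sum((x - mu0)^2) = (n - 1) * s^2 + n * (xbar - mu0)^2 implies
    # t^2 = (n - 1) * u / (1 - u), strictly increasing in u, so the two
    # formulations reject for exactly the same samples.
    assert np.isclose(t2, (n - 1) * u / (1 - u))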
Bayesian Synthesis: Combining subjective analyses, with an application to ozone data
Bayesian model averaging enables one to combine the disparate predictions of
a number of models in a coherent fashion, leading to superior predictive
performance. The improvement in performance arises from averaging models that
make different predictions. In this work, we tap into perhaps the biggest
driver of different predictions---different analysts---in order to gain the
full benefits of model averaging. In a standard implementation of our method,
several data analysts work independently on portions of a data set, eliciting
separate models which are eventually updated and combined through a specific
weighting method. We call this modeling procedure Bayesian Synthesis. The
methodology helps to alleviate concerns about the sizable gap between the
foundational underpinnings of the Bayesian paradigm and the practice of
Bayesian statistics. In experimental work we show that human modeling has
predictive performance superior to that of many automatic modeling techniques,
including AIC, BIC, Smoothing Splines, CART, Bagged CART, Bayes CART, BMA and
LARS, and only slightly inferior to that of BART. We also show that Bayesian
Synthesis further improves predictive performance. Additionally, we examine the
predictive performance of a simple average across analysts, which we dub Convex
Synthesis, and find that it also produces an improvement.
Published in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org) at http://dx.doi.org/10.1214/10-AOAS444.
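The abstract names two combination schemes without detail: Convex Synthesis (the simple average across analysts) and a weighting method for Bayesian Synthesis. The sketch below implements the simple average and, as a stand-in for the paper's weighting method, weights proportional to each analyst's predictive likelihood on held-out data; the stand-in rule and all numbers are assumptions.

    import numpy as np

    rng = np.random.default_rng(0)

    # Held-out responses and predictions from three hypothetical analysts
    # of varying quality (all values illustrative).
    y_holdout = rng.normal(size=50)
    preds = np.stack([y_holdout + rng.normal(0.0, s, 50)
                      for s in (0.5, 1.0, 2.0)])

    # Convex Synthesis: the simple average across analysts.
    convex = preds.mean(axis=0)

    # Stand-in weighting (not the paper's method): weight analysts by
    # their Gaussian predictive log-likelihood on the held-out data.
    loglik = -0.5 * ((y_holdout - preds) ** 2).sum(axis=1)
    w = np.exp(loglik - loglik.max())
    w /= w.sum()
    weighted = w @ preds  # weighted combination of analyst predictions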
Regularization of Case-Specific Parameters for Robustness and Efficiency
Regularization methods allow one to handle a variety of inferential problems
where there are more covariates than cases. This allows one to consider a
potentially enormous number of covariates for a problem. We exploit the power
of these techniques, supersaturating models by augmenting the "natural"
covariates in the problem with an additional indicator for each case in the
data set. We attach a penalty term for these case-specific indicators which is
designed to produce a desired effect. For regression methods with squared error
loss, an ℓ₁ penalty produces a regression which is robust to outliers and
high leverage cases; for quantile regression methods, an ℓ₂ penalty
decreases the variance of the fit enough to overcome an increase in bias. The
paradigm thus allows us to robustify procedures which lack robustness and to
increase the efficiency of procedures which are robust. We provide a general
framework for the inclusion of case-specific parameters in regularization
problems, describing the impact on the effective loss for a variety of
regression and classification problems. We outline a computational strategy by
which existing software can be modified to solve the augmented regularization
problem, providing conditions under which such modification will converge to
the optimum solution. We illustrate the benefits of including case-specific
parameters in the context of mean regression and quantile regression through
analysis of NHANES and linguistic data sets.
Published in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org) at http://dx.doi.org/10.1214/11-STS377.
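For squared error loss, the supersaturated formulation is simple enough to sketch: augment the fit with one parameter per case and penalize those parameters. The alternating scheme below, with an ℓ₁ penalty handled by soft-thresholding between ordinary least-squares steps, illustrates the computational strategy the abstract describes but is not the paper's exact algorithm; names and the stopping rule are assumptions.

    import numpy as np

    def soft(z, lam):
        """Soft-thresholding, the proximal step for the l1 penalty."""
        return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

    def case_specific_ls(X, y, lam, n_iter=100):
        """min over (beta, gamma) of
               0.5 * ||y - X beta - gamma||^2 + lam * ||gamma||_1,
        where gamma holds one case-specific parameter per observation,
        solved by alternating an existing least-squares solver with a
        soft-thresholding update."""
        gamma = np.zeros(len(y))
        for _ in range(n_iter):
            beta, *_ = np.linalg.lstsq(X, y - gamma, rcond=None)
            gamma = soft(y - X @ beta, lam)
        return beta, gamma  # large |gamma_i| flags outlying cases

Absorbing part of a large residual into γ_i caps that case's influence on β, which is how the ℓ₁-penalized indicators robustify squared error loss.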
Bayesian Restricted Likelihood Methods: Conditioning on Insufficient Statistics in Bayesian Regression
Bayesian methods have proven themselves to be successful across a wide range
of scientific problems and have many well-documented advantages over competing
methods. However, these methods run into difficulties for two major and
prevalent classes of problems: handling data sets with outliers and dealing
with model misspecification. We outline the drawbacks of previous solutions to
both of these problems and propose a new method as an alternative. When working
with the new method, the data is summarized through a set of insufficient
statistics, targeting inferential quantities of interest, and the prior
distribution is updated with the summary statistics rather than the complete
data. By careful choice of conditioning statistics, we retain the main benefits
of Bayesian methods while reducing the sensitivity of the analysis to features
of the data not captured by the conditioning statistics. For reducing
sensitivity to outliers, classical robust estimators (e.g., M-estimators) are
natural choices for conditioning statistics. A major contribution of this work
is the development of a data augmented Markov chain Monte Carlo (MCMC)
algorithm for the linear model and a large class of summary statistics. We
demonstrate the method on simulated and real data sets containing outliers and
subject to model misspecification. Success is manifested in better predictive
performance for data points of interest as compared to competing methods.
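The paper's data-augmented MCMC is beyond the scope of an abstract, but the core idea of conditioning on an insufficient, robust statistic is easy to illustrate. The sketch below uses plain ABC-style rejection with the sample median as conditioning statistic in a normal location model; it is a crude stand-in for the paper's algorithm, and every numeric choice is illustrative.

    import numpy as np

    rng = np.random.default_rng(1)

    # Contaminated sample: conditioning on a robust summary (the median)
    # rather than the full data should shield inference from outliers.
    y = np.concatenate([rng.normal(2.0, 1.0, 95), rng.normal(30.0, 1.0, 5)])
    t_obs, n = np.median(y), y.size

    # Rejection sampler targeting pi(theta | median(y) = t_obs) under a
    # N(0, 5^2) prior and N(theta, 1) model: draw theta from the prior,
    # simulate a data set, keep theta when the simulated median is close
    # to the observed one.
    draws = []
    while len(draws) < 500:
        theta = rng.normal(0.0, 5.0)
        if abs(np.median(rng.normal(theta, 1.0, n)) - t_obs) < 0.1:
            draws.append(theta)

    print(np.mean(draws))  # near 2: the outliers barely move the posterior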
Efficient Model Selection in Linear and Non-Linear Quantile Regression by Cross-Validation
The check loss function is used to define quantile regression. In cross-validation, it is also employed as the validation function when the underlying truth is unknown. However, our empirical study indicates that validation with the check loss often leads to choosing overfitted models. In this work, we suggest a modified, or L2-adjusted, check loss which rounds the sharp corner in the middle of the check loss. This guards against overfitted models to some extent. Through various simulation settings of linear and non-linear regression, the improvement of the check loss by the L2 adjustment is examined empirically. The adjustment is devised to shrink to zero as the sample size grows.
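A generic K-fold scorer makes the proposal concrete: the modification amounts to swapping the validation loss from the plain check loss to its L2-adjusted version (such as the smoothed loss sketched earlier). The fitter interface, fold scheme, and defaults below are illustrative assumptions.

    import numpy as np

    def check_loss(r, tau):
        """Plain check loss used as the validation function."""
        r = np.asarray(r, dtype=float)
        return r * (tau - (r < 0))

    def cv_score(fit, X, y, tau, k=5, loss=check_loss, seed=0):
        """K-fold cross-validated score for a quantile-regression fitter
        fit(X_train, y_train, tau) -> predict(X_new). Passing an
        L2-adjusted check loss as `loss` is the modification the
        abstract proposes."""
        idx = np.random.default_rng(seed).permutation(len(y))
        total = 0.0
        for fold in np.array_split(idx, k):
            train = np.setdiff1d(idx, fold)
            predict = fit(X[train], y[train], tau)
            total += loss(y[fold] - predict(X[fold]), tau).sum()
        return total / len(y)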
