
    Efficient quantile regression for heteroscedastic models

    Quantile regression (QR) provides estimates of a range of conditional quantiles. This stands in contrast to traditional regression techniques, which focus on a single conditional mean function. Lee et al. [Regularization of case-specific parameters for robustness and efficiency. Statist Sci. 2012;27(3):350–372] proposed efficient QR by rounding the sharp corner of the loss. The main modification generally involves an asymmetric ℓ₂ adjustment of the loss function around zero. We extend the idea of ℓ₂-adjusted QR to linear heterogeneous models. The ℓ₂ adjustment is constructed to diminish as the sample size grows. Conditions to retain consistency properties are also provided.
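    To make the rounding concrete, here is a minimal sketch (ours, not code from the paper) of one asymmetric ℓ₂ adjustment of the check loss: the loss is made quadratic on an interval around zero whose endpoints are chosen so its derivative matches the check-loss slopes, τ on the right and -(1-τ) on the left. The width parameter c below is an assumed knob standing in for the adjustment that shrinks as the sample size grows; the paper's exact construction may differ.

```python
import numpy as np

def check_loss(r, tau):
    """Standard check (pinball) loss at quantile level tau."""
    r = np.asarray(r, dtype=float)
    return r * (tau - (r < 0))

def l2_adjusted_check_loss(r, tau, c):
    """Check loss with the corner at zero rounded by a quadratic.

    Quadratic on [-(1 - tau) * c, tau * c]; the endpoints are chosen so
    the derivative r / c matches the check-loss slopes there, making the
    loss continuously differentiable.  c would shrink toward 0 with n.
    """
    r = np.asarray(r, dtype=float)
    upper = tau * c
    lower = -(1.0 - tau) * c
    return np.where(
        r > upper, tau * r - tau**2 * c / 2.0,
        np.where(r < lower, -(1.0 - tau) * r - (1.0 - tau)**2 * c / 2.0,
                 r**2 / (2.0 * c)))
```

    As c → 0 the adjusted loss converges to the ordinary check loss, which is one way to see why consistency can be retained when the adjustment diminishes with sample size.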

    Rediscovering a little known fact about the t-test and the F-test: Algebraic, Geometric, Distributional and Graphical Considerations

    We discuss the role that the null hypothesis should play in the construction of a test statistic used to make a decision about that hypothesis. To construct the test statistic for a point null hypothesis about a binomial proportion, a common recommendation is to act as if the null hypothesis is true. We argue that, on the surface, the one-sample t-test of a point null hypothesis about a Gaussian population mean does not appear to follow the recommendation. We show how simple algebraic manipulations of the usual t-statistic lead to an equivalent test procedure consistent with the recommendation. We provide geometric intuition regarding this equivalence and we consider extensions to testing nested hypotheses in Gaussian linear models. We discuss an application to graphical residual diagnostics where the form of the test statistic makes a practical difference. By examining the formulation of the test statistic from multiple perspectives in this familiar example, we provide simple, concrete illustrations of some important issues that can guide the formulation of effective solutions to more complex statistical problems. Comment: 22 pages, 5 figures
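    The listing does not reproduce the paper's algebra, but the flavor of such an equivalence can be sketched. Below is a small numerical illustration (our construction; it may not be the exact reformulation the paper studies) comparing the usual one-sample t-statistic with a variant whose variance estimate acts as if the null were true; the two statistics are monotonically related, so thresholding either yields the same test decisions.

```python
import numpy as np

rng = np.random.default_rng(0)
mu0 = 0.0  # hypothesized mean under the point null

def t_usual(x, mu0):
    """Usual one-sample t-statistic: scale estimated around x-bar."""
    n = len(x)
    return np.sqrt(n) * (x.mean() - mu0) / x.std(ddof=1)

def t_null(x, mu0):
    """Same form, but with the scale estimated as if H0 were true."""
    n = len(x)
    s0 = np.sqrt(np.mean((x - mu0) ** 2))
    return np.sqrt(n) * (x.mean() - mu0) / s0

# The squared statistics satisfy t_null^2 = n * t^2 / (n - 1 + t^2),
# a strictly increasing map, so the two tests order samples identically.
for _ in range(3):
    x = rng.normal(loc=0.3, scale=1.0, size=25)
    t, tn = t_usual(x, mu0), t_null(x, mu0)
    n = len(x)
    assert np.isclose(tn**2, n * t**2 / (n - 1 + t**2))
```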

    Bayesian Synthesis: Combining subjective analyses, with an application to ozone data

    Bayesian model averaging enables one to combine the disparate predictions of a number of models in a coherent fashion, leading to superior predictive performance. The improvement in performance arises from averaging models that make different predictions. In this work, we tap into perhaps the biggest driver of different predictions, different analysts, in order to gain the full benefits of model averaging. In a standard implementation of our method, several data analysts work independently on portions of a data set, eliciting separate models which are eventually updated and combined through a specific weighting method. We call this modeling procedure Bayesian Synthesis. The methodology helps to alleviate concerns about the sizable gap between the foundational underpinnings of the Bayesian paradigm and the practice of Bayesian statistics. In experimental work we show that human modeling has predictive performance superior to that of many automatic modeling techniques, including AIC, BIC, Smoothing Splines, CART, Bagged CART, Bayes CART, BMA and LARS, and only slightly inferior to that of BART. We also show that Bayesian Synthesis further improves predictive performance. Additionally, we examine the predictive performance of a simple average across analysts, which we dub Convex Synthesis, and find that it also produces an improvement. Comment: Published at http://dx.doi.org/10.1214/10-AOAS444 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org).
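    The abstract describes Convex Synthesis as a simple average across analysts, which is easy to state in code. A minimal sketch follows (function and variable names are ours), assuming each analyst supplies predictions for the same test cases; the weighting used by full Bayesian Synthesis is not specified in the abstract and is not reproduced here.

```python
import numpy as np

def convex_synthesis(predictions):
    """Equal-weight average of analysts' predictions.

    `predictions` is a list of 1-D arrays, one per analyst, all aligned
    on the same test cases.  Bayesian Synthesis, per the abstract, uses
    a specific weighting method rather than equal weights.
    """
    return np.stack(predictions, axis=0).mean(axis=0)

# Hypothetical usage: three analysts' predictions for five test cases.
preds = [np.array([1.0, 2.1, 0.4, 3.3, 1.8]),
         np.array([0.8, 2.4, 0.2, 3.0, 2.1]),
         np.array([1.2, 1.9, 0.5, 3.6, 1.7])]
combined = convex_synthesis(preds)
```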

    Regularization of Case-Specific Parameters for Robustness and Efficiency

    Regularization methods allow one to handle a variety of inferential problems where there are more covariates than cases. This allows one to consider a potentially enormous number of covariates for a problem. We exploit the power of these techniques, supersaturating models by augmenting the "natural" covariates in the problem with an additional indicator for each case in the data set. We attach a penalty term for these case-specific indicators which is designed to produce a desired effect. For regression methods with squared error loss, an ℓ₁ penalty produces a regression which is robust to outliers and high leverage cases; for quantile regression methods, an ℓ₂ penalty decreases the variance of the fit enough to overcome an increase in bias. The paradigm thus allows us to robustify procedures which lack robustness and to increase the efficiency of procedures which are robust. We provide a general framework for the inclusion of case-specific parameters in regularization problems, describing the impact on the effective loss for a variety of regression and classification problems. We outline a computational strategy by which existing software can be modified to solve the augmented regularization problem, providing conditions under which such modification will converge to the optimum solution. We illustrate the benefits of including case-specific parameters in the context of mean regression and quantile regression through analysis of NHANES and linguistic data sets. Comment: Published at http://dx.doi.org/10.1214/11-STS377 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org).
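    As a concrete illustration of the supersaturation recipe for squared error loss, here is a short sketch (ours, not the authors' software) that minimizes 0.5·||y - Xβ - γ||² + λ·||γ||₁ over the coefficients β and the case-specific parameters γ by block coordinate descent: given β, each γᵢ is a soft-thresholded residual; given γ, β is an ordinary least-squares fit to the adjusted responses.

```python
import numpy as np

def soft_threshold(r, lam):
    """Elementwise solution of min_g 0.5 * (r - g)**2 + lam * |g|."""
    return np.sign(r) * np.maximum(np.abs(r) - lam, 0.0)

def case_specific_ls(X, y, lam, n_iter=100):
    """Least squares augmented with one penalized indicator per case.

    Large residuals are absorbed by the case-specific parameters g, so
    the coefficient fit b behaves like a robust (Huber-type) regression.
    """
    n, p = X.shape
    g = np.zeros(n)
    for _ in range(n_iter):
        b, *_ = np.linalg.lstsq(X, y - g, rcond=None)  # beta-step
        g = soft_threshold(y - X @ b, lam)             # gamma-step
    return b, g
```

    Cases whose residuals exceed the threshold λ pick up nonzero indicators and pull on the coefficient fit with bounded influence only, which is the mechanism by which the squared-error procedure is robustified.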

    Bayesian Restricted Likelihood Methods: Conditioning on Insufficient Statistics in Bayesian Regression

    Bayesian methods have proven themselves to be successful across a wide range of scientific problems and have many well-documented advantages over competing methods. However, these methods run into difficulties for two major and prevalent classes of problems: handling data sets with outliers and dealing with model misspecification. We outline the drawbacks of previous solutions to both of these problems and propose a new method as an alternative. When working with the new method, the data are summarized through a set of insufficient statistics, targeting inferential quantities of interest, and the prior distribution is updated with the summary statistics rather than the complete data. By careful choice of conditioning statistics, we retain the main benefits of Bayesian methods while reducing the sensitivity of the analysis to features of the data not captured by the conditioning statistics. For reducing sensitivity to outliers, classical robust estimators (e.g., M-estimators) are natural choices for conditioning statistics. A major contribution of this work is the development of a data-augmented Markov chain Monte Carlo (MCMC) algorithm for the linear model and a large class of summary statistics. We demonstrate the method on simulated and real data sets containing outliers and subject to model misspecification. Success is manifested in better predictive performance for data points of interest as compared to competing methods.
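    The paper's data-augmented MCMC is not reproduced here, but the conditioning idea can be sketched with a crude approximate Bayesian computation (ABC) stand-in, plainly a different algorithm: draw parameters from the prior, simulate data, and keep the draws whose robust summaries land closest to the observed ones. The priors, summaries, and tolerances below are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

def robust_summary(x):
    """Insufficient conditioning statistics: median and MAD, so data
    features beyond these summaries cannot sway the posterior."""
    med = np.median(x)
    return np.array([med, np.median(np.abs(x - med))])

def abc_restricted_posterior(x_obs, n_prior=5000, keep_frac=0.02):
    """Rejection-ABC approximation to conditioning on robust summaries."""
    s_obs = robust_summary(x_obs)
    n = len(x_obs)
    mu = rng.normal(0.0, 10.0, size=n_prior)   # hypothetical vague prior
    sigma = rng.gamma(2.0, 2.0, size=n_prior)  # hypothetical vague prior
    dist = np.empty(n_prior)
    for i in range(n_prior):
        x_sim = rng.normal(mu[i], sigma[i], size=n)
        dist[i] = np.linalg.norm(robust_summary(x_sim) - s_obs)
    keep = np.argsort(dist)[: int(keep_frac * n_prior)]
    return mu[keep], sigma[keep]

# Data with a few gross outliers: the retained draws track the clean bulk.
x_obs = np.concatenate([rng.normal(1.0, 2.0, 95), rng.normal(25.0, 1.0, 5)])
mu_draws, sigma_draws = abc_restricted_posterior(x_obs)
```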

    Efficient Model Selection in Linear and Non-Linear Quantile Regression by Cross-Validation

    The check loss function is used to define quantile regression. In the context of cross-validation, it is also employed as a validation function when the underlying truth is unknown. However, our empirical study indicates that validation with the check loss often leads to choosing over-fitted models. In this work, we suggest a modified, L2-adjusted check loss which rounds the sharp corner in the middle of the check loss. This guards against over-fitting to some extent. Through various simulation settings of linear and non-linear regression, the improvement from the L2 adjustment of the check loss is examined empirically. The adjustment is devised to shrink to zero as the sample size grows.
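    As a sketch of the resulting selection procedure (naming conventions are ours; `fit` is an assumed model-fitting callable returning a `predict` function, not an interface from the paper), K-fold cross-validation scored with the adjusted check loss, rounded as in the sketch under the first abstract above, looks like:

```python
import numpy as np

def adjusted_check_loss(r, tau, c):
    """Check loss with its corner rounded by a quadratic on
    [-(1 - tau) * c, tau * c]; c is meant to shrink to 0 with n."""
    r = np.asarray(r, dtype=float)
    return np.where(
        r > tau * c, tau * r - tau**2 * c / 2.0,
        np.where(r < -(1 - tau) * c,
                 -(1 - tau) * r - (1 - tau)**2 * c / 2.0,
                 r**2 / (2.0 * c)))

def cv_score(fit, X, y, tau, c, k=5, seed=0):
    """Mean adjusted-check-loss over K validation folds."""
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(y)) % k
    scores = []
    for j in range(k):
        tr, va = folds != j, folds == j
        predict = fit(X[tr], y[tr], tau)
        scores.append(adjusted_check_loss(y[va] - predict(X[va]), tau, c).mean())
    return float(np.mean(scores))
```

    A model-selection loop would then call cv_score for each candidate and pick the minimizer; the rounding keeps the validation criterion from favoring over-fitted quantile fits as strongly as the raw check loss does.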