
    Quantile Based Estimation of Scale and Dependence

    Garth Tarr
    The sample quantile has a long history in statistics. The aim of this thesis is to explore some further applications of quantiles as simple, convenient and robust alternatives to classical procedures. Chapter 1 addresses the need for reliable confidence intervals for quantile regression coefficients, particularly in small samples. We demonstrate the competitive performance of the xy-pair quantile bootstrap approach in a broad range of model designs, with a focus on small and moderate sample sizes. Chapter 2 forms the core of this thesis with its investigation into robust estimation of scale. Common robust estimators of scale, such as the interquartile range and the median absolute deviation from the median, are inefficient when the observations come from a Gaussian distribution. We present a new robust scale estimator, Pn, which is proportional to the interquartile range of the pairwise means. Pn trades some robustness for high efficiency when the underlying distribution is Gaussian. Chapter 3 extends our robust scale estimator to the bivariate setting. We show that the resulting covariance estimator inherits the robustness and efficiency properties of the underlying scale estimator. We also consider the problem of estimating scale and autocovariance in dependent processes. We establish the asymptotic normality of Pn under short-range and mildly long-range dependent Gaussian processes. In the case of extreme long-range dependence, we prove a non-normal limit result for the interquartile range; simulation suggests that an equivalent result holds for Pn. Chapter 4 looks at the problem of estimating covariance and precision matrices under cellwise contamination. A pairwise approach is shown to perform well under much higher levels of contamination than standard robust techniques allow. Our approach works well with high levels of scattered contamination and has the advantage of being able to impose sparsity on the resulting precision matrix.
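The scale estimator and its bivariate extension can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the thesis code: the consistency constant is derived from the Gaussian case (a pairwise mean of two independent N(mu, sigma^2) draws has standard deviation sigma/sqrt(2)), no small-sample correction is applied, the quartiles come from Python's `statistics.quantiles` with `method="inclusive"`, and the covariance step assumes the bivariate extension goes through the Gnanadesikan-Kettenring identity cov(U, V) = (var(U + V) - var(U - V)) / 4 after standardising each variable. The names `pn` and `pn_cov` are hypothetical.

```python
import math
import random
import statistics
from itertools import combinations
from statistics import NormalDist

# At the Gaussian, (X_i + X_j) / 2 has sd sigma / sqrt(2), so its IQR equals
# 2 * z_0.75 * sigma / sqrt(2) = sqrt(2) * z_0.75 * sigma. Dividing by
# sqrt(2) * z_0.75 (about 0.954) makes Pn consistent for sigma.
C_PN = 1.0 / (math.sqrt(2.0) * NormalDist().inv_cdf(0.75))  # about 1.048


def pn(x):
    """Pn sketch: scaled IQR of the pairwise means (x_i + x_j) / 2 for i < j.

    Uses O(n^2) memory and applies no small-sample correction (assumption).
    """
    if len(x) < 3:
        raise ValueError("need at least 3 observations")
    pair_means = [(a + b) / 2.0 for a, b in combinations(x, 2)]
    q1, _, q3 = statistics.quantiles(pair_means, n=4, method="inclusive")
    return C_PN * (q3 - q1)


def pn_cov(x, y):
    """Robust covariance via the Gnanadesikan-Kettenring identity with Pn^2
    in place of the variance; x and y are first standardised by Pn so the
    estimate scales correctly with the units of x and y."""
    sx, sy = pn(x), pn(y)
    u = [a / sx for a in x]
    v = [b / sy for b in y]
    plus = pn([a + b for a, b in zip(u, v)])
    minus = pn([a - b for a, b in zip(u, v)])
    return sx * sy * (plus ** 2 - minus ** 2) / 4.0


# Simulated illustration: true sigma = 1 for x and y, true covariance 0.5.
rng = random.Random(1)
x = [rng.gauss(0.0, 1.0) for _ in range(400)]
y = [0.5 * a + math.sqrt(0.75) * rng.gauss(0.0, 1.0) for a in x]
print(round(pn(x), 3), round(pn_cov(x, y), 3))
```

Because only the central half of the pairwise means enters the estimate, a single gross outlier moves Pn very little, while the averaging inside each pair is what recovers efficiency relative to the plain interquartile range.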

    mplot: An R Package for Graphical Model Stability and Variable Selection Procedures

    The mplot package provides an easy-to-use implementation of model stability and variable inclusion plots (Müller and Welsh 2010; Murray, Heritier, and Müller 2013) as well as the adaptive fence (Jiang, Rao, Gu, and Nguyen 2008; Jiang, Nguyen, and Rao 2009) for linear and generalised linear models. We provide a number of innovations on the standard procedures and address many practical implementation issues, including the addition of redundant variables, interactive visualisations and the approximation of logistic models with linear models. An option is provided that combines our bootstrap approach with glmnet for higher-dimensional models. The plots and graphical user interface leverage state-of-the-art web technologies to facilitate interaction with the results. The speed of implementation comes from the leaps package and cross-platform multicore support.
    Comment: 28 pages, 9 figures
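A variable inclusion plot tracks, across bootstrap resamples, how often each variable appears in the model that minimises a penalised criterion as the penalty multiplier changes. The sketch below computes those inclusion proportions for a linear model by exhaustive subset search, using n log(RSS/n) + lambda * (number of parameters) as the criterion, so lambda = 2 behaves like AIC and lambda = log(n) like BIC. It is not the mplot implementation: it resamples rows with a plain nonparametric bootstrap, which may differ from the package's own resampling scheme, and every function name here is hypothetical.

```python
import math
import random
from itertools import combinations


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        if abs(m[piv][c]) < 1e-12:
            raise ValueError("singular system")
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def rss(X, y, subset):
    """Residual sum of squares of OLS with an intercept plus columns in subset."""
    rows = [[1.0] + [x[j] for j in subset] for x in X]
    q = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(q)] for a in range(q)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(q)]
    beta = solve(xtx, xty)
    return sum((yi - sum(bk * rk for bk, rk in zip(beta, r))) ** 2
               for r, yi in zip(rows, y))


def inclusion_proportions(X, y, lambdas, n_boot=100, seed=0):
    """Share of bootstrap resamples in which each variable is selected, per penalty."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    subsets = [s for k in range(p + 1) for s in combinations(range(p), k)]
    counts = {lam: [0] * p for lam in lambdas}
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        # Fit every subset once per resample, then reuse the RSS for each penalty.
        fits = [(s, rss(Xb, yb, s)) for s in subsets]
        for lam in lambdas:
            best, _ = min(fits, key=lambda f: n * math.log(f[1] / n) + lam * (len(f[0]) + 1))
            for j in best:
                counts[lam][j] += 1
    return {lam: [c / n_boot for c in counts[lam]] for lam in lambdas}


# Simulated illustration: only the first of three predictors matters.
rng = random.Random(42)
X = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(60)]
y = [2.0 * x[0] + rng.gauss(0.0, 1.0) for x in X]
props = inclusion_proportions(X, y, lambdas=[0.0, 2.0, math.log(60), 10.0])
print(props)
```

Plotting each variable's proportion against lambda gives the inclusion curves: a strong signal stays near 1 across the grid, while noise variables drop away as the penalty grows. Exhaustive search is only practical for a handful of predictors, which is why the package relies on leaps (and optionally glmnet) for larger problems.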

    Robust Variable Selection under Cellwise Contamination

    Cellwise outliers are widespread in data, and traditional robust methods may fail when applied to datasets under such contamination. We propose a variable selection procedure that uses a pairwise robust estimator to obtain an initial empirical covariance matrix among the response and potentially many predictors. We then replace the primary design matrix and the response vector with their robust counterparts based on the estimated covariance matrix. Finally, we adopt the adaptive Lasso to obtain variable selection results. The proposed approach is robust to cellwise outliers in regular and high-dimensional settings, and empirical results show good performance in comparison with recently proposed alternative robust approaches, particularly in the challenging setting where contamination rates are high but the magnitude of outliers is moderate. Real data applications demonstrate the practical utility of the proposed method.
    Comment: 17 pages, 4 figures
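The three steps can be sketched under explicit assumptions. The sketch takes the pairwise robust covariance matrix as given, with the response in the last row and column (an assumption), and builds pseudo-data from a Cholesky factor L of the predictor block: with X~ = L^T and y~ = L^{-1} S_xy, the pseudo-design reproduces the estimated covariances exactly (X~^T X~ = S_xx, X~^T y~ = S_xy). It then runs an adaptive Lasso by coordinate descent with weights 1/|b0|^gamma taken from the unpenalised fit. The Cholesky construction is one convenient choice and not necessarily the paper's. A pairwise covariance estimate also need not be positive definite; in practice it may have to be projected onto a positive definite matrix first, whereas this sketch simply raises an error. All function names are hypothetical.

```python
import math


def cholesky(a):
    """Lower-triangular L with L L^T = a (a must be symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("covariance block is not positive definite")
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L


def pseudo_data(sigma):
    """Robust counterparts (X~, y~) with X~^T X~ = S_xx and X~^T y~ = S_xy."""
    p = len(sigma) - 1  # response assumed in the last row/column
    sxx = [row[:p] for row in sigma[:p]]
    sxy = [sigma[i][p] for i in range(p)]
    L = cholesky(sxx)
    y_t = []
    for i in range(p):  # forward substitution solves L y~ = S_xy
        y_t.append((sxy[i] - sum(L[i][k] * y_t[k] for k in range(i))) / L[i][i])
    x_t = [[L[j][i] for j in range(p)] for i in range(p)]  # X~ = L^T
    return x_t, y_t


def soft(z, t):
    """Soft-thresholding operator sign(z) * max(|z| - t, 0)."""
    return math.copysign(max(abs(z) - t, 0.0), z)


def weighted_lasso(X, y, lam, weights, n_iter=1000, tol=1e-12):
    """Coordinate descent for 0.5 * ||y - X b||^2 + lam * sum_j weights[j] * |b_j|."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(n_iter):
        biggest = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            rho = sum(X[i][j] * resid[i] for i in range(n)) + col_sq[j] * beta[j]
            new = soft(rho, lam * weights[j]) / col_sq[j]
            step = new - beta[j]
            if step != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * step
                beta[j] = new
                biggest = max(biggest, abs(step))
        if biggest < tol:
            break
    return beta


def robust_adaptive_lasso(sigma, lam, gamma=1.0, eps=1e-8):
    x_t, y_t = pseudo_data(sigma)
    p = len(y_t)
    # Unpenalised fit: X~ is upper triangular and invertible, so back-substitute.
    b0 = [0.0] * p
    for i in reversed(range(p)):
        b0[i] = (y_t[i] - sum(x_t[i][k] * b0[k] for k in range(i + 1, p))) / x_t[i][i]
    weights = [1.0 / max(abs(b), eps) ** gamma for b in b0]  # adaptive weights
    return weighted_lasso(x_t, y_t, lam, weights)


# Illustration with a known covariance: beta = (1.5, 0, -2), unit error variance.
sxx = [[1.0, 0.3, 0.0], [0.3, 1.0, 0.0], [0.0, 0.0, 1.0]]
beta_true = [1.5, 0.0, -2.0]
sxy = [sum(sxx[i][k] * beta_true[k] for k in range(3)) for i in range(3)]
sigma = [sxx[i] + [sxy[i]] for i in range(3)] + [sxy + [7.25]]
est = robust_adaptive_lasso(sigma, lam=0.01)
print(est)
```

Because the Lasso only sees the data through cross-products, any pseudo-data with the right Gram matrix and cross-product vector gives the same fit; this is what lets a covariance estimated pairwise, cell by cell, stand in for the contaminated raw design.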

    Predicting Hemolytic Uremic Syndrome and Renal Replacement Therapy in Shiga Toxin-producing Escherichia coli-infected Children.

    BACKGROUND: Shiga toxin-producing Escherichia coli (STEC) infections are leading causes of pediatric acute renal failure. Identifying hemolytic uremic syndrome (HUS) risk factors is needed to guide care.
    METHODS: We conducted a multicenter, historical cohort study to identify features associated with development of HUS (primary outcome) and need for renal replacement therapy (RRT) (secondary outcome) in STEC-infected children without HUS at initial presentation. Children aged […] eligible.
    RESULTS: Of 927 STEC-infected children, 41 (4.4%) had HUS at presentation; of the remaining 886, 126 (14.2%) developed HUS. Predictors (all shown as odds ratio [OR] with 95% confidence interval [CI]) of HUS included younger age (0.77 [.69-.85] per year), leukocyte count ≥13.0 × 10³/μL (2.54 [1.42-4.54]), higher hematocrit (1.83 [1.21-2.77] per 5% increase) and serum creatinine (10.82 [1.49-78.69] per 1 mg/dL increase), platelet count <250 × 10³/μL (1.92 [1.02-3.60]), lower serum sodium (1.12 [1.02-1.23] per 1 mmol/L decrease), and intravenous fluid administration initiated ≥4 days following diarrhea onset (2.50 [1.14-5.46]). A longer interval from diarrhea onset to index visit was associated with reduced HUS risk (OR, 0.70 [95% CI, .54-.90]). RRT predictors (all shown as OR [95% CI]) included female sex (2.27 [1.14-4.50]), younger age (0.83 [.74-.92] per year), lower serum sodium (1.15 [1.04-1.27] per 1 mmol/L decrease), higher leukocyte count ≥13.0 × 10³/μL (2.35 [1.17-4.72]) and creatinine (7.75 [1.20-50.16] per 1 mg/dL increase) concentrations, and initial intravenous fluid administration ≥4 days following diarrhea onset (2.71 [1.18-6.21]).
    CONCLUSIONS: The complex nature of STEC infection makes predicting its course a challenge. The risk factors we identified highlight the importance of avoiding dehydration and performing close clinical and laboratory monitoring.
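Odds ratios reported "per year" or "per 1 mmol/L" can be rescaled to other increments. Assuming, as a single per-unit OR implies, that the log-odds is linear in the predictor, the OR for a k-unit change is OR^k, and the confidence limits transform the same way; reversing the direction (k negative) inverts the OR and swaps the limits. This is a minimal arithmetic sketch applied to the reported figures, which are themselves rounded, so the rescaled values are approximate; it does not apply to the dichotomised predictors (for example leukocyte count ≥13.0 × 10³/μL). The function names are hypothetical.

```python
def rescale_or(odds_ratio, k):
    """OR for a k-unit change, given a per-unit OR (log-odds linear in the predictor)."""
    return odds_ratio ** k


def rescale_interval(point, lower, upper, k):
    """Rescale an OR and its CI; a negative k flips direction, so the limits swap."""
    p, lo, hi = (rescale_or(v, k) for v in (point, lower, upper))
    return (p, hi, lo) if k < 0 else (p, lo, hi)


# HUS, younger age: 0.77 [0.69-0.85] per year -> per 5 years of age.
age_5y = rescale_interval(0.77, 0.69, 0.85, 5)       # about (0.271, 0.156, 0.444)
# HUS, serum sodium: 1.12 [1.02-1.23] per 1 mmol/L decrease -> per 1 mmol/L increase.
sodium_up = rescale_interval(1.12, 1.02, 1.23, -1)   # about (0.893, 0.813, 0.980)
print(age_5y, sodium_up)
```

For example, the age result reads as roughly a 73% reduction in the odds of HUS for a child five years older, all other model terms held fixed.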

    Regularized Predictive Models for Beef Eating Quality of Individual Meals

    Faced with changing markets and evolving consumer demands, beef industries are investing in grading systems to maximise value extraction throughout their entire supply chain. The Meat Standards Australia (MSA) system is a customer-oriented total quality management system that stands out internationally by predicting quality grades of specific muscles processed by a designated cooking method. The model currently underpinning the MSA system requires laborious effort to estimate, and its predictions may be less accurate for unbalanced data sets in which many "muscle x cook" combinations have few observations and/or few predictors of palatability are available. This paper proposes a novel predictive method for beef eating quality that bridges a spectrum of muscle x cook-specific models. At one extreme, each muscle x cook combination is modelled independently; at the other extreme, a pooled predictive model is obtained across all muscle x cook combinations. Via a data-driven regularization method, we cover all muscle x cook-specific models along this spectrum. We demonstrate that the proposed predictive method attains considerable accuracy improvements relative to independent or pooled approaches on unique MSA data sets.
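The idea of a spectrum between independent and pooled models can be illustrated with a deliberately simple, hypothetical stand-in: an intercept-only model per muscle x cook group with a ridge-type penalty that shrinks each group mean mu_g towards a shared centre m, minimising sum_g sum_i (y_gi - mu_g)^2 + lambda * sum_g (mu_g - m)^2. Setting lambda = 0 recovers independent group means; as lambda grows, every mu_g approaches the pooled mean. The data-driven step here is leave-one-out cross-validation over a grid of lambda values. The paper's actual regularization, its palatability predictors and the MSA data are not reproduced; the group labels and scores below are invented for illustration only.

```python
def fit_shrunk_means(groups, lam):
    """Closed-form minimiser of sum (y - mu_g)^2 + lam * sum (mu_g - m)^2.

    groups maps a group label to a non-empty list of scores; returns (mu, m).
    """
    if lam < 0:
        raise ValueError("lam must be non-negative")
    sums = {g: sum(v) for g, v in groups.items()}
    ns = {g: len(v) for g, v in groups.items()}
    # Setting both gradients to zero gives m = sum S_g/(n_g+lam) / sum n_g/(n_g+lam).
    m = (sum(sums[g] / (ns[g] + lam) for g in groups)
         / sum(ns[g] / (ns[g] + lam) for g in groups))
    mu = {g: (sums[g] + lam * m) / (ns[g] + lam) for g in groups}
    return mu, m


def loo_cv_error(groups, lam):
    """Leave-one-out squared prediction error; singleton groups are skipped."""
    sq_err, count = 0.0, 0
    for g, values in groups.items():
        if len(values) < 2:
            continue
        for i, y in enumerate(values):
            held_out = dict(groups)
            held_out[g] = values[:i] + values[i + 1:]
            mu, _ = fit_shrunk_means(held_out, lam)
            sq_err += (y - mu[g]) ** 2
            count += 1
    return sq_err / count


def choose_lambda(groups, grid):
    return min(grid, key=lambda lam: loo_cv_error(groups, lam))


# Invented toy scores for three hypothetical muscle x cook groups.
groups = {
    "muscle_A/grill": [72.0, 68.0, 75.0, 70.0, 69.0],
    "muscle_B/roast": [55.0, 61.0],
    "muscle_C/slow_cook": [49.0, 52.0, 50.0],
}
grid = [0.0, 0.5, 1.0, 2.0, 5.0, 10.0, 100.0]
best = choose_lambda(groups, grid)
print(best, fit_shrunk_means(groups, best)[0])
```

Small groups are pulled towards the shared centre more strongly than large ones, because the weight on their own data, n_g / (n_g + lambda), is smaller; this is the behaviour that helps sparsely observed combinations.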

    CR-Lasso: Robust cellwise regularized sparse regression

    Cellwise contamination remains a challenging problem for data scientists, particularly in research fields that require the selection of sparse features. Traditional robust methods may be neither feasible nor efficient for such contaminated datasets. We propose CR-Lasso, a robust Lasso-type cellwise regularization procedure that performs feature selection in the presence of cellwise outliers by minimising a regression loss and a cell deviation measure simultaneously. To evaluate the approach, we conduct empirical studies comparing its selection and prediction performance with several sparse regression methods. We show that CR-Lasso is competitive under the settings considered. We illustrate the effectiveness of the proposed method on real data through an analysis of a bone mineral density dataset.
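The abstract does not define CR-Lasso's cell deviation measure, so the sketch below substitutes a hypothetical toy objective of the same flavour to show what "minimising a regression loss and a cell deviation measure simultaneously" can look like: 0.5 * ||y - (X - Delta) beta||^2 + lambda * ||beta||_1 + tau * ||Delta||_1, where the sparse matrix Delta holds per-cell shifts. Alternating exact coordinate updates of beta (a Lasso on the cleaned design X - Delta) and of each cell of Delta never increase this objective. It is an illustration of joint minimisation only, not the CR-Lasso algorithm; the objective, tuning values and names are all assumptions.

```python
import math
import random


def soft(z, t):
    """Soft-thresholding operator sign(z) * max(|z| - t, 0)."""
    return math.copysign(max(abs(z) - t, 0.0), z)


def objective(X, y, beta, delta, lam, tau):
    """0.5 * ||y - (X - delta) beta||^2 + lam * ||beta||_1 + tau * ||delta||_1."""
    loss = 0.0
    for xi, di, yi in zip(X, delta, y):
        fitted = sum((a - d) * b for a, d, b in zip(xi, di, beta))
        loss += 0.5 * (yi - fitted) ** 2
    return (loss + lam * sum(abs(b) for b in beta)
            + tau * sum(abs(d) for di in delta for d in di))


def update_beta(X, y, beta, delta, lam):
    """One sweep of Lasso coordinate descent on the cleaned design Z = X - delta."""
    Z = [[a - d for a, d in zip(xi, di)] for xi, di in zip(X, delta)]
    resid = [yi - sum(z * b for z, b in zip(zi, beta)) for zi, yi in zip(Z, y)]
    for j in range(len(beta)):
        col_sq = sum(zi[j] ** 2 for zi in Z)
        if col_sq == 0.0:
            continue
        rho = sum(zi[j] * r for zi, r in zip(Z, resid)) + col_sq * beta[j]
        new = soft(rho, lam) / col_sq
        step = new - beta[j]
        for i, zi in enumerate(Z):
            resid[i] -= zi[j] * step
        beta[j] = new


def update_delta(X, y, beta, delta, tau):
    """Exact minimisation over each cell delta_ij in turn, holding beta fixed."""
    for xi, di, yi in zip(X, delta, y):
        e = yi - sum((a - d) * b for a, d, b in zip(xi, di, beta))  # row residual
        for j, b in enumerate(beta):
            if b == 0.0:
                di[j] = 0.0  # cell does not enter the loss, so the penalty sets it to 0
                continue
            base = e - di[j] * b            # residual without this cell's shift
            u = -soft(base, tau / abs(b))   # optimal value of delta_ij * b
            di[j] = u / b
            e = base + u


def toy_cellwise_lasso(X, y, lam, tau, n_iter=50):
    p = len(X[0])
    beta = [0.0] * p
    delta = [[0.0] * p for _ in X]
    history = [objective(X, y, beta, delta, lam, tau)]
    for _ in range(n_iter):
        update_beta(X, y, beta, delta, lam)
        update_delta(X, y, beta, delta, tau)
        history.append(objective(X, y, beta, delta, lam, tau))
    return beta, delta, history


# Simulated illustration with one planted cellwise outlier in a relevant column.
rng = random.Random(7)
beta_true = [2.0, 0.0, -1.5, 0.0]
X = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(80)]
y = [sum(a * b for a, b in zip(xi, beta_true)) + rng.gauss(0.0, 0.5) for xi in X]
X[3][0] += 25.0  # contaminate a single cell after generating y
beta_hat, delta_hat, history = toy_cellwise_lasso(X, y, lam=5.0, tau=3.0)
print([round(b, 3) for b in beta_hat], round(delta_hat[3][0], 3))
```

The objective is not jointly convex because Delta and beta multiply each other, so the alternating scheme only guarantees a monotone decrease towards a stationary point; a real method would also need a principled choice of tau and of how cell deviations are measured.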