380 research outputs found

    Finding an unknown number of multivariate outliers

    We use the forward search to provide robust Mahalanobis distances to detect the presence of outliers in a sample of multivariate normal data. Theoretical results on order statistics and on estimation in truncated samples provide the distribution of our test statistic. We also introduce several new robust distances with associated distributional results. Comparisons of our procedure with tests using other robust Mahalanobis distances show the good size and high power of our procedure. We also provide a unification of results on correction factors for estimation from truncated samples
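
The distance this abstract is built on can be illustrated with a minimal sketch. It is not the forward search itself: it only computes squared Mahalanobis distances for all observations from a mean and covariance estimated on a chosen subset, which is the quantity a subset-based robust procedure would monitor. The function name `squared_mahalanobis` and the `subset` argument are assumptions made for illustration, and no truncation correction factor is applied.

```python
import numpy as np


def squared_mahalanobis(X, subset=None):
    """Squared Mahalanobis distances of every row of X.

    The mean and covariance are estimated from the rows in `subset`
    (all rows when `subset` is None). This is an illustrative sketch:
    no consistency correction for estimation from a truncated sample
    is applied.
    """
    X = np.asarray(X, dtype=float)
    fit = X if subset is None else X[subset]
    mean = fit.mean(axis=0)
    # Unbiased sample covariance of the fitting subset (p x p).
    cov = np.atleast_2d(np.cov(fit, rowvar=False))
    diff = X - mean
    # Solve cov * z = diff^T rather than forming the inverse explicitly.
    solved = np.linalg.solve(cov, diff.T).T
    return np.einsum("ij,ij->i", diff, solved)
```

Estimating location and scatter from a subset believed to be outlier free, then measuring every observation against that fit, keeps a gross outlier from masking itself by inflating the covariance estimate.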

    A Parametric Framework for the Comparison of Methods of Very Robust Regression

    There are several methods for obtaining very robust estimates of regression parameters that asymptotically resist 50% of outliers in the data. Differences in the behaviour of these algorithms depend on the distance between the regression data and the outliers. We introduce a parameter λ that defines a parametric path in the space of models and enables us to study, in a systematic way, the properties of estimators as the groups of data move from being far apart to close together. We examine, as a function of λ, the variance and squared bias of five estimators and we also consider their power when used in the detection of outliers. This systematic approach provides tools for gaining knowledge and better understanding of the properties of robust estimators. Comment: Published at http://dx.doi.org/10.1214/13-STS437 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Building Regression Models with the Forward Search

    We give an example of the use of the forward search in building a regression model. The standard backwards elimination of variables is supplemented by forward plots of added variable t statistics that exhibit the effect of each observation on the process of model building. Attention is also paid to the effect of individual observations on selection of a transformation. Variable selection using AIC is mentioned, as is the analysis of multivariate data

    Robust Bayesian regression with the forward search: theory and data analysis

    The frequentist forward search yields a flexible and informative form of robust regression. The device of fictitious observations provides a natural way to include prior information in the search. However, this extension is not straightforward, requiring weighted regression. Bayesian versions of forward plots are used to exhibit the presence of multiple outliers in a data set from banking with 1903 observations and nine explanatory variables which shows, in this case, the clear advantages from including prior information in the forward search. Use of observation weights from frequentist robust regression is shown to provide a simple general method for robust Bayesian regression

    The Box-Cox transformation: review and extensions

    The Box-Cox power transformation family for non-negative responses in linear models has a long and interesting history in both statistical practice and theory, which we summarize. The relationship between generalized linear models and log transformed data is illustrated. Extensions investigated include the transform both sides model and the Yeo-Johnson transformation for observations that can be positive or negative. The paper also describes an extended Yeo-Johnson transformation that allows positive and negative responses to have different power transformations. Analyses of data show this to be necessary. Robustness enters in the fan plot for which the forward search provides an ordering of the data. Plausible transformations are checked with an extended fan plot. These procedures are used to compare parametric power transformations with nonparametric transformations produced by smoothing
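
A minimal sketch of the transformation described in this abstract, using the standard Yeo-Johnson formula and allowing separate parameters for non-negative and negative responses. The function name and the argument names `lam_pos` and `lam_neg` are assumptions made for illustration; the normalisation by the Jacobian that is used when comparing values of the parameters is omitted, and the paper's own parametrisation of the extended form may differ.

```python
import math


def yeo_johnson(y, lam_pos, lam_neg=None):
    """Yeo-Johnson transformation of a single response y.

    With lam_neg=None the same parameter is used on both sides of zero
    (the ordinary Yeo-Johnson transformation). Passing a different
    lam_neg gives an extended form in which negative responses have
    their own power parameter.
    """
    if lam_neg is None:
        lam_neg = lam_pos
    if y >= 0:
        if abs(lam_pos) < 1e-12:
            # Limit as the parameter tends to 0: log(y + 1).
            return math.log1p(y)
        return ((y + 1.0) ** lam_pos - 1.0) / lam_pos
    if abs(lam_neg - 2.0) < 1e-12:
        # Limit as the parameter tends to 2: -log(1 - y).
        return -math.log1p(-y)
    return -((1.0 - y) ** (2.0 - lam_neg) - 1.0) / (2.0 - lam_neg)
```

With both parameters equal to 1 the transformation is the identity, which gives a convenient check: `yeo_johnson(3.0, 1.0)` returns `3.0` and `yeo_johnson(-3.0, 1.0)` returns `-3.0`.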

    The analysis of transformations for profit-and-loss data

    We analyse data on the performance of investment funds, 99 out of 309 of which report a loss, and on the profitability of 1405 firms, 407 of which report losses. The problem in both cases is to use regression to predict performance from sets of explanatory variables. In one case, it is clear from scatter plots of the data that the negative responses have a lower variance than the positive responses and a different relationship with the explanatory variables. Because the data include negative responses, the Box–Cox transformation cannot be used. We develop a robust version of an extension to the Yeo–Johnson transformation which allows different transformations for positive and negative responses. Tests and graphical methods from our robust analysis enable the detection of outliers, the assessment of values of the two transformation parameters and the building of simple regression models. Performance comparisons are made with non-parametric transformations

    Robust regression with density power divergence: theory, comparisons, and data analysis

    Minimum density power divergence estimation provides a general framework for robust statistics, depending on a parameter α, which determines the robustness properties of the method. The usual estimation method is numerical minimization of the power divergence. The paper considers the special case of linear regression. We developed an alternative estimation procedure using the methods of S-estimation. The rho function so obtained is proportional to one minus a suitably scaled normal density raised to the power α. We used the theory of S-estimation to determine the asymptotic efficiency and breakdown point for this new form of S-estimation. Two sets of comparisons were made. In one, S power divergence is compared with other S-estimators using four distinct rho functions. Plots of efficiency against breakdown point show that the properties of S power divergence are close to those of Tukey's biweight. The second set of comparisons is between S power divergence estimation and numerical minimization. Monitoring these two procedures in terms of breakdown point shows that the numerical minimization yields a procedure with larger robust residuals and a lower empirical breakdown point, thus providing an estimate of α leading to more efficient parameter estimates
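
The rho function mentioned in this abstract can be sketched as follows. The abstract says only that it is proportional to one minus a suitably scaled normal density raised to the power α; the specific scaling used here (dividing the standard normal density by its value at zero, so that rho runs from 0 to 1) is an assumption, as are the function names `dpd_rho` and `dpd_psi`. No tuning constant for a target breakdown point is included.

```python
import math


def dpd_rho(u, alpha):
    """One minus a scaled normal density raised to the power alpha.

    The standard normal density divided by its value at zero is
    exp(-u**2 / 2); raising it to the power alpha gives
    exp(-alpha * u**2 / 2). The scaling to a supremum of 1 is an
    assumption of this sketch.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    return 1.0 - math.exp(-0.5 * alpha * u * u)


def dpd_psi(u, alpha):
    """Derivative of dpd_rho with respect to u."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    return alpha * u * math.exp(-0.5 * alpha * u * u)
```

Under this scaling the function is bounded and its derivative tends to zero for large residuals, which is the redescending behaviour that makes a comparison with Tukey's biweight natural.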

    Determinants of Elevated Blood Pressure in Children: a Case-Control Study in Vitória-ES

    Arterial hypertension (AH) is one of the major public health problems worldwide because of its strong impact on cardiovascular morbidity and mortality. Clinical and epidemiological studies have shown that AH is reaching early stages of life, and they support the hypothesis that the rise in blood pressure actually begins in childhood. Unlike in adults, for whom the determinants of the development of AH are well established, in children they are poorly understood and sometimes conflicting across the studies in the literature. This work aimed to identify predictors of elevated blood pressure in children aged 7 to 10 years. A case-control study was carried out on a representative sample of children aged 7 to 10 years from the city of Vitória/ES. The case group consisted of 159 children with elevated blood pressure (systolic or diastolic BP at or above the 95th percentile) and the control group of 636 children with BP at normal levels (BP below the 90th percentile), for a total of 795 children. The children were matched by age and sex. The variables studied were sociodemographic (race/skin colour, type of school, socioeconomic classification and mother's schooling) and child-related (excess weight, birth weight, gestational age, exclusive breastfeeding, time spent in sedentary activities, daily time of physical activity, number of hours of sleep per day and exposure to tobacco). Significant differences between cases and controls were observed for gestational age (OR = 1.8, 95% CI 1.0-3.0; p = 0.038), type of school (OR = 1.9, 95% CI 1.1-3.2; p = 0.021) and exposure to tobacco (OR = 0.5, 95% CI 0.3-0.8; p = 0.005). Children born prematurely or attending public school have about twice the odds of elevated blood pressure, and children whose mothers do not smoke (OR = 0.5, 95% CI 0.3-0.8; p = 0.005) have 50% lower odds of presenting elevated blood pressure levels

    Statistical and Proactive Analysis of an Inter-Laboratory Comparison: The Radiocarbon Dating of the Shroud of Turin.

    We review the sampling and results of the radiocarbon dating of the archaeological cloth known as the Shroud of Turin, in the light of recent statistical analyses of both published and raw data. The statistical analyses highlight an inter-laboratory heterogeneity of the means and a monotone spatial variation of the ages of subsamples that suggest the presence of contaminants unevenly removed by the cleaning pretreatments. We consider the significance and overall impact of the statistical analyses on assessing the reliability of the dating results and the design of correct sampling. These analyses suggest that the 1988 radiocarbon dating does not match the current accuracy requirements. Should this be the case, it would be interesting to know the accurate age of the Shroud of Turin. Taking into account the whole body of scientific data, we discuss whether it makes sense to date the Shroud again