A Conversation with Professor Tadeusz Cali\'{n}ski
Tadeusz Cali\'{n}ski was born in Pozna\'{n}, Poland in 1928. Despite the
absence of formal secondary education for Poles during the Second World War, he
entered the University of Pozna\'{n} in 1948, initially studying agronomy and
in later years mathematics. From 1953 to 1988 he taught statistics, biometry
and experimental design at the Agricultural University of Pozna\'{n}. During
this period he founded and developed the Pozna\'{n} inter-university school of
mathematical statistics and biometry, which has become one of the most
important schools of this type in Poland and beyond. He has supervised 24 Ph.D.
students, many of whom are currently professors at a variety of universities.
He is now Professor Emeritus. Among many awards, in 1995 Professor Cali\'{n}ski
received the Order of Polonia Restituta for his outstanding achievements in the
fields of Education and Science. In 2012 the Polish Statistical Society awarded
him The Jerzy Sp{\l}awa-Neyman Medal for his contribution to the development of
research in statistics in Poland. Professor Cali\'{n}ski in addition has
Doctoral Degrees honoris causa from the Agricultural University of Pozna\'{n}
and the Warsaw University of Life Sciences. His research interests include
mathematical statistics and biometry, with applications to agriculture, natural
sciences, biology and genetics. He has published over 140 articles in
scientific journals as well as, with Sanpei Kageyama, two important books on
the randomization approach to the design and analysis of experiments. He has
been extremely active and successful in initiating and contributing to fruitful
international research cooperation between Polish statisticians and
biometricians and their colleagues in various countries, particularly in the
Netherlands, France, Italy, Great Britain, Germany, Japan and Portugal. The
conversations in addition cover the history of biometry and experimental design
in Poland and the early influence of British statisticians.
Comment: Published at http://dx.doi.org/10.1214/15-STS522 in the Statistical
Science (http://www.imstat.org/sts/) by the Institute of Mathematical
Statistics (http://www.imstat.org).
A Parametric Framework for the Comparison of Methods of Very Robust Regression
There are several methods for obtaining very robust estimates of regression
parameters that asymptotically resist 50% of outliers in the data. Differences
in the behaviour of these algorithms depend on the distance between the
regression data and the outliers. We introduce a parameter $\lambda$ that
defines a parametric path in the space of models and enables us to study, in a
systematic way, the properties of estimators as the groups of data move from
being far apart to close together. We examine, as a function of $\lambda$, the
variance and squared bias of five estimators and we also consider their power
when used in the detection of outliers. This systematic approach provides tools
for gaining knowledge and better understanding of the properties of robust
estimators.
Comment: Published at http://dx.doi.org/10.1214/13-STS437 in the Statistical
Science (http://www.imstat.org/sts/) by the Institute of Mathematical
Statistics (http://www.imstat.org).
Finding an unknown number of multivariate outliers
We use the forward search to provide robust Mahalanobis distances to detect the presence of outliers in a sample of multivariate normal data. Theoretical results on order statistics and on estimation in truncated samples provide the distribution of our test statistic. We also introduce several new robust distances with associated distributional results. Comparisons of our procedure with tests using other robust Mahalanobis distances show the good size and high power of our procedure. We also provide a unification of results on correction factors for estimation from truncated samples.
Income Inequality in OECD Countries: Data and Explanations
There is much disagreement about both the facts and the explanations of income inequality. Even if we confine attention to OECD countries, we find people arguing that there has been a great U-turn, with inequality rising sharply after its post-war fall, and others who believe that the speed of change is glacial. In order to evaluate the historical record, we need data for a long run of years. The present paper reviews evidence covering the period 1945-2001 for nine OECD countries. It is widely believed that rising inequality is attributable to technological change and to globalisation. The second part of the paper argues that these are only part of a complex story. Household incomes depend on public policy and on sources of income apart from work. What is happening at the top of the distribution may need to be explained quite differently.
Optimal response and covariate-adaptive biased-coin designs for clinical trials with continuous multivariate or longitudinal responses
Adaptive randomization of the sequential construction of optimum experimental designs is used to derive biased-coin designs for longitudinal clinical trials with continuous responses. The designs, coming from a very general rule, target pre-specified allocation proportions for the ranked treatment effects. Many of the properties of the designs are similar to those of well understood designs for univariate responses. A numerical study illustrates this similarity in a comparison of four designs for longitudinal trials. Designs for multivariate responses can likewise be found, requiring only the appropriate information matrix. Some new results in the theory of optimum experimental design for multivariate responses are presented.
Building Regression Models with the Forward Search
We give an example of the use of the forward search in building a regression model. The standard backwards elimination of variables is supplemented by forward plots of added variable t statistics that exhibit the effect of each observation on the process of model building. Attention is also paid to the effect of individual observations on selection of a transformation. Variable selection using AIC is mentioned, as is the analysis of multivariate data.
Robust Bayesian regression with the forward search: theory and data analysis
The frequentist forward search yields a flexible and informative form of robust regression. The device of fictitious observations provides a natural way to include prior information in the search. However, this extension is not straightforward, requiring weighted regression. Bayesian versions of forward plots are used to exhibit the presence of multiple outliers in a data set from banking with 1903 observations and nine explanatory variables, which shows, in this case, the clear advantages of including prior information in the forward search. Use of observation weights from frequentist robust regression is shown to provide a simple general method for robust Bayesian regression.
fsdaSAS: a package for robust regression for very large datasets including the batch forward search
The forward search (FS) is a general method of robust data fitting that moves smoothly from very robust to maximum likelihood estimation. The regression procedures are included in the MATLAB toolbox FSDA. The work on a SAS version of the FS originates from the need for the analysis of large datasets expressed by law enforcement services operating in the European Union that use our SAS software for detecting data anomalies that may point to fraudulent customs returns. Specific to our SAS implementation, the fsdaSAS package, we describe the approximation used to provide fast analyses of large datasets using an FS which progresses through the inclusion of batches of observations, rather than progressing one observation at a time. We do, however, test for outliers one observation at a time. We demonstrate that our SAS implementation becomes appreciably faster than the MATLAB version as the sample size increases and is also able to analyse larger datasets. The series of fits provided by the FS leads to the adaptive data-dependent choice of maximally efficient robust estimates. This also allows the monitoring of residuals and parameter estimates for fits of differing robustness levels. We mention that our fsdaSAS also applies the idea of monitoring to several robust estimators for regression for a range of values of breakdown point or nominal efficiency, leading to adaptive values for these parameters. We have also provided a variety of plots linked through brushing. Further programmed analyses include the robust transformations of the response in regression. Our package also provides the SAS community with methods of monitoring robust estimators for multivariate data, including multivariate data transformations.
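As a rough illustration of the idea in this abstract, the following Python sketch grows a least-squares fitting subset from a small robustly chosen start to the full sample, adding either one observation or a batch at each step. It is not the FSDA or fsdaSAS code: the function name `forward_search`, its parameters (`batch`, `n_starts`, `seed`) and the random elemental-subset start are assumptions made for this example, and the one-at-a-time outlier tests mentioned in the abstract are omitted.

```python
import numpy as np

def forward_search(X, y, batch=1, n_starts=200, seed=0):
    """Return the list of fitting subsets (index arrays) visited by the search.

    Hypothetical sketch: names and defaults are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape

    # Robust start: among random subsets of size p, keep the one whose exact
    # fit gives the smallest median squared residual over all observations.
    best, best_med = None, np.inf
    for _ in range(n_starts):
        idx = rng.choice(n, size=p, replace=False)
        beta, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
        med = np.median((y - X @ beta) ** 2)
        if med < best_med:
            best, best_med = idx, med

    subset = np.sort(best)
    history = [subset]
    while len(subset) < n:
        # Fit by least squares on the current subset only.
        beta, *_ = np.linalg.lstsq(X[subset], y[subset], rcond=None)
        r2 = (y - X @ beta) ** 2
        # The next subset holds the observations with the smallest squared
        # residuals; batch > 1 adds several observations per step.
        m_next = min(len(subset) + batch, n)
        subset = np.sort(np.argsort(r2)[:m_next])
        history.append(subset)
    return history
```

With `batch=1` the subset grows one observation at a time; a larger `batch` trades resolution of the search for fewer fits, which is the kind of approximation the abstract describes for large datasets. The final subset is the whole sample, so the last fit is ordinary least squares.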
The Box-Cox transformation: review and extensions
The Box-Cox power transformation family for non-negative responses in linear models has a long and interesting history in both statistical practice and theory, which we summarize. The relationship between generalized linear models and log transformed data is illustrated. Extensions investigated include the transform both sides model and the Yeo-Johnson transformation for observations that can be positive or negative. The paper also describes an extended Yeo-Johnson transformation that allows positive and negative responses to have different power transformations. Analyses of data show this to be necessary. Robustness enters in the fan plot for which the forward search provides an ordering of the data. Plausible transformations are checked with an extended fan plot. These procedures are used to compare parametric power transformations with nonparametric transformations produced by smoothing.
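For readers unfamiliar with the transformations named in this abstract, a minimal Python sketch of the standard (unnormalized) Box-Cox and Yeo-Johnson formulas follows. The function names and the two-parameter form `lam_pos`, `lam_neg` used for the extended version are assumptions for illustration; the paper's own parameterization and its normalized versions may differ.

```python
import numpy as np

def box_cox(y, lam):
    """Box-Cox: (y**lam - 1) / lam, with log(y) as the limit at lam = 0."""
    y = np.asarray(y, dtype=float)
    if np.any(y < 0) or (lam <= 0 and np.any(y == 0)):
        raise ValueError("Box-Cox needs y >= 0 (y > 0 when lam <= 0)")
    return np.log(y) if lam == 0 else (y ** lam - 1) / lam

def yeo_johnson(y, lam_pos, lam_neg=None):
    """Yeo-Johnson; a separate lam_neg gives an 'extended' variant (assumed form)."""
    y = np.asarray(y, dtype=float)
    lam_neg = lam_pos if lam_neg is None else lam_neg
    out = np.empty_like(y)
    pos = y >= 0
    # y >= 0: Box-Cox applied to y + 1.
    if lam_pos == 0:
        out[pos] = np.log1p(y[pos])
    else:
        out[pos] = ((y[pos] + 1) ** lam_pos - 1) / lam_pos
    # y < 0: Box-Cox with power 2 - lam applied to 1 - y, with the sign reversed.
    if lam_neg == 2:
        out[~pos] = -np.log1p(-y[~pos])
    else:
        out[~pos] = -((1 - y[~pos]) ** (2 - lam_neg) - 1) / (2 - lam_neg)
    return out
```

Calling `yeo_johnson(y, 1.0)` leaves the data unchanged, while `yeo_johnson(y, 0.5, 1.5)` applies different powers to the non-negative and negative observations.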