Search CORE

569 research outputs found

The Influence Function of Penalized Regression Estimators

Author: Alfons Andreas
Croux Christophe
Öllerer Viktoria
Publication venue: Informa UK Limited
Publication date: 06/01/2015

To perform regression analysis in high dimensions, lasso or ridge estimation is a common choice. However, it has been shown that these methods are not robust to outliers. Therefore, alternatives such as penalized M-estimation or the sparse least trimmed squares (LTS) estimator have been proposed. The robustness of these regression methods can be measured with the influence function, which quantifies the effect of infinitesimal perturbations in the data. Furthermore, it can be used to compute the asymptotic variance and the mean squared error. In this paper we compute the influence function, the asymptotic variance and the mean squared error for penalized M-estimators and the sparse LTS estimator. The asymptotic biasedness of the estimators makes the calculations nonstandard. We show that only M-estimators with a loss function with a bounded derivative are robust against regression outliers. In particular, the lasso has an unbounded influence function.
Comment: appears in Statistics: A Journal of Theoretical and Applied Statistics, 201
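
The contrast between the lasso and a penalized M-estimator with a bounded-derivative loss can be checked numerically with an empirical sensitivity curve, a finite-sample analogue of the influence function. The Python sketch below assumes a single predictor without intercept, a Huber loss with tuning constant c = 1.345 as the bounded-derivative example, and a ternary search on an assumed bracket [-100, 100] as the optimizer. All function names are hypothetical; this illustrates the qualitative claim, not the paper's derivations.

```python
import math


def soft_threshold(z, lam):
    """S(z, lam) = sign(z) * max(|z| - lam, 0)."""
    return math.copysign(max(abs(z) - lam, 0.0), z)


def lasso_1d(x, y, lam):
    """Closed-form minimiser of (1/2n) * sum (y_i - b*x_i)^2 + lam*|b| (no intercept)."""
    n = len(x)
    sxy = sum(xi * yi for xi, yi in zip(x, y)) / n
    sxx = sum(xi * xi for xi in x) / n
    return soft_threshold(sxy, lam) / sxx


def huber_rho(r, c=1.345):
    """Huber loss: quadratic near zero, linear in the tails, so its derivative is bounded by c."""
    return 0.5 * r * r if abs(r) <= c else c * (abs(r) - 0.5 * c)


def huber_lasso_1d(x, y, lam, c=1.345, lo=-100.0, hi=100.0, iters=200):
    """Minimise (1/n) * sum rho_c(y_i - b*x_i) + lam*|b| by ternary search.

    The objective is convex in b, so ternary search converges to the minimiser,
    provided it lies inside the assumed bracket [lo, hi].
    """
    n = len(x)

    def objective(b):
        return sum(huber_rho(yi - b * xi, c) for xi, yi in zip(x, y)) / n + lam * abs(b)

    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if objective(m1) < objective(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)


def sensitivity(estimator, x, y, x0, y0, lam):
    """Empirical sensitivity curve: (n + 1) * (T(sample + {(x0, y0)}) - T(sample))."""
    n = len(x)
    return (n + 1) * (estimator(x + [x0], y + [y0], lam) - estimator(x, y, lam))


# Clean data on the line y = 2x, plus one added observation at x0 = 1 with growing y0.
x = [i / 10 for i in range(1, 21)]
y = [2 * xi for xi in x]
for y0 in (10.0, 100.0, 1000.0):
    print(y0,
          round(sensitivity(lasso_1d, x, y, 1.0, y0, lam=0.1), 3),
          round(sensitivity(huber_lasso_1d, x, y, 1.0, y0, lam=0.1), 3))
```

In this toy setting the lasso's sensitivity grows roughly in proportion to y0, whereas the Huber version levels off once the added point's residual exceeds c, because from then on its contribution to the estimating equation is the constant c * x0.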

arXiv.org e-Print Archive

EUR Research Repository

The shooting S-estimator for robust regression

Author: Alfons Andreas
Croux Christophe
Öllerer Viktoria
Publication date: 03/06/2015

To perform multiple regression, the least squares estimator is commonly used. However, this estimator is not robust to outliers. Therefore, robust methods such as S-estimation have been proposed. These estimators flag any observation with a large residual as an outlier and downweight it in the remainder of the procedure. However, a large residual may be caused by an outlier in a single predictor variable, and downweighting the complete observation results in a loss of information. Therefore, we propose the shooting S-estimator, a regression estimator that is especially designed for situations where a large number of observations suffer from contamination in a small number of predictor variables. The shooting S-estimator combines the ideas of the coordinate descent algorithm with simple S-regression, which makes it robust against componentwise contamination, at the cost of losing the regression equivariance property.
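
The "shooting" structure (cycle over the predictors and regress the current partial residuals on one predictor at a time) can be sketched in Python. This is not the authors' estimator: in place of simple S-regression it swaps in a Huber M-estimate of the slope, computed by iteratively reweighted least squares with a normalised-MAD residual scale, and it fits no intercept. All function names are hypothetical.

```python
import statistics


def huber_weight(u, c=1.345):
    """IRLS weight psi(u)/u for the Huber loss."""
    return 1.0 if abs(u) <= c else c / abs(u)


def robust_simple_slope(x, y, c=1.345, iters=50):
    """Huber M-estimate of the slope of y on x through the origin (stand-in for simple S-regression)."""
    # Start from the median of the pointwise slopes, which is already fairly robust.
    b = statistics.median(yi / xi for xi, yi in zip(x, y) if xi != 0)
    for _ in range(iters):
        r = [yi - b * xi for xi, yi in zip(x, y)]
        # Normalised MAD of the residuals as scale; fall back to 1 if it is zero.
        s = statistics.median(abs(ri) for ri in r) / 0.6745 or 1.0
        w = [huber_weight(ri / s, c) for ri in r]
        num = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
        den = sum(wi * xi * xi for wi, xi in zip(w, x))
        b = num / den
    return b


def shooting_fit(X, y, sweeps=20):
    """Coordinate-wise ('shooting') fit: update one coefficient at a time from partial residuals."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            # Partial residuals: remove the current fit of every predictor except j.
            partial = [y[i] - sum(beta[k] * X[i][k] for k in range(p) if k != j)
                       for i in range(n)]
            beta[j] = robust_simple_slope([row[j] for row in X], partial)
    return beta


# Hypothetical exact-fit design in which the two predictors have disjoint supports.
X = [[float(i), 0.0] for i in range(1, 11)] + [[0.0, float(i)] for i in range(1, 11)]
y = [1.5 * a + 2.0 * b for a, b in X]
print(shooting_fit(X, y))  # [1.5, 2.0]
```

Because every coordinate update computes its own weights, an observation can receive a small weight in one simple regression while keeping full weight in the others, rather than being downweighted as a whole.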

arXiv.org e-Print Archive

Lirias

Crossref

EUR Research Repository

robustHD: An R package for robust regression with high-dimensional data

Author: Alfons Andreas
Publication venue: The Open Journal
Publication date: 03/11/2021

EUR Research Repository

An Object-Oriented Framework for Statistical Simulation: The R Package simFrame

Author: Andreas Alfons
Matthias Templ
Peter Filzmoser

Simulation studies are widely used by statisticians to gain insight into the quality of developed methods. Usually, some guidelines regarding, e.g., simulation designs, contamination, missing data models or evaluation criteria are necessary in order to draw meaningful conclusions. The R package simFrame is an object-oriented framework for statistical simulation, which allows researchers to make use of a wide range of simulation designs with minimal programming effort. Its object-oriented implementation provides clear interfaces for extensions by the user. Since statistical simulation is an embarrassingly parallel process, the framework supports parallel computing to increase computational performance. Furthermore, an appropriate plot method is selected automatically depending on the structure of the simulation results. In this paper, the implementation of simFrame is discussed in great detail, and the functionality of the framework is demonstrated in examples for different simulation designs.
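
The workflow the abstract describes (generate data, contaminate it, apply the methods under study, collect evaluation criteria, and run independent replications in parallel) can be sketched in a few lines. This is a hypothetical Python sketch, not simFrame's R interface: the data model (standard normal samples), the contamination scheme (replacing a fraction eps of observations by a fixed value), the two estimators compared (mean and median) and all function names are assumptions for illustration.

```python
import random
import statistics
from concurrent.futures import ThreadPoolExecutor


def one_replication(rep, n=100, eps=0.1, outlier=10.0, seed=2024):
    """One replication with its own seeded RNG, so results do not depend on scheduling."""
    rng = random.Random(seed + rep)
    sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Contamination: overwrite eps * n observations with a fixed outlying value.
    k = int(eps * n)
    sample[:k] = [outlier] * k
    # Evaluation criterion: absolute error with respect to the true location 0.
    return {"mean": abs(statistics.fmean(sample)), "median": abs(statistics.median(sample))}


def run_simulation(reps=50, workers=4, **kwargs):
    """Run independent replications in parallel and average each criterion over them."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda r: one_replication(r, **kwargs), range(reps)))
    return {name: statistics.fmean(res[name] for res in results) for name in results[0]}


print(run_simulation())
```

Seeding each replication separately keeps the results identical for any number of workers. Threads keep the sketch self-contained; for CPU-bound replications in CPython a `ProcessPoolExecutor` would be the usual choice, which needs a picklable top-level function (for example via `functools.partial`) instead of the lambda.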

Research Papers in Economics

Sparse least trimmed squares regression

Author: Alfons Andreas
Croux Christophe
Gelper Sarah

Sparse model estimation is a topic of high importance in modern data analysis due to the increasing availability of data sets with a large number of variables. Another common problem in applied statistics is the presence of outliers in the data. This paper combines robust regression and sparse model estimation. A robust and sparse estimator is introduced by adding an L1 penalty on the coefficient estimates to the well-known least trimmed squares (LTS) estimator. The breakdown point of this sparse LTS estimator is derived, and a fast algorithm for its computation is proposed. Both the simulation study and the real data example show that the sparse LTS has better prediction performance than its competitors in the presence of leverage points.
Keywords: Breakdown point; Outliers; Penalized regression; Robust regression; Trimming
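
In the form assumed here, the sparse LTS objective for a subset H of size h is the sum of squared residuals over H plus h * lam * sum_j |b_j|, minimised over both H and b. A minimal Python sketch of a concentration-step ("C-step") search for this objective follows. It assumes no intercept, a plain coordinate-descent lasso for the fit on each subset, and a handful of random starting subsets of size p + 1. The abstract does not spell out the authors' fast algorithm, so this illustrates the objective rather than their implementation; all names are hypothetical.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, penalty, sweeps=100):
    """Coordinate descent for sum_i (y_i - x_i'b)^2 + penalty * sum_j |b_j| (no intercept)."""
    p = len(X[0])
    b = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            # Inner product of predictor j with the partial residuals that exclude predictor j.
            rho = sum(row[j] * (yi - sum(row[k] * b[k] for k in range(p) if k != j))
                      for row, yi in zip(X, y))
            sxx = sum(row[j] ** 2 for row in X)
            # Threshold is penalty / 2 because the squared loss is not scaled by 1/2.
            b[j] = soft_threshold(rho, penalty / 2) / sxx if sxx > 0 else 0.0
    return b


def squared_residuals(X, y, b):
    return [(yi - sum(xij * bj for xij, bj in zip(row, b))) ** 2 for row, yi in zip(X, y)]


def sparse_lts(X, y, h, lam, n_starts=20, max_csteps=20, seed=1):
    """Approximate minimiser of sum_{i in H} r_i^2 + h * lam * sum_j |b_j| over subsets of size h."""
    rng = random.Random(seed)
    n, p = len(y), len(X[0])
    best = None
    for _ in range(n_starts):
        # Assumed choice: a small random starting subset of size p + 1.
        subset = sorted(rng.sample(range(n), p + 1))
        for _ in range(max_csteps):
            b = lasso_cd([X[i] for i in subset], [y[i] for i in subset], h * lam)
            r2 = squared_residuals(X, y, b)
            # C-step: keep the h observations with the smallest squared residuals.
            new_subset = sorted(sorted(range(n), key=r2.__getitem__)[:h])
            if new_subset == subset:
                break
            subset = new_subset
        # Refit on the final subset so that coefficients and subset belong together.
        b = lasso_cd([X[i] for i in subset], [y[i] for i in subset], h * lam)
        r2 = squared_residuals(X, y, b)
        objective = sum(r2[i] for i in subset) + h * lam * sum(abs(bj) for bj in b)
        if best is None or objective < best[0]:
            best = (objective, b, subset)
    return best[1], best[2]


# Hypothetical data: exact linear model y = 2*x1 - x2, with two gross outliers in y.
X = [[float(i % 5 - 2), float((3 * i) % 7 - 3)] for i in range(20)]
y = [2.0 * a - b for a, b in X]
y[0] += 50.0
y[1] += 50.0
coef, subset = sparse_lts(X, y, h=15, lam=0.01)
print([round(c, 3) for c in coef], 0 in subset, 1 in subset)
```

Each C-step cannot increase the objective when the lasso subproblem is solved exactly, which is why iterating them from several starts and keeping the best result is a common strategy for trimmed estimators.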

Research Papers in Economics

Open Science Perspectives on Machine Learning for the Identification of Careless Responding: A New Hope or Phantom Menace?

Author: Alfons Andreas
Welz Max
Publication date: 01/02/2024

Powerful methods for identifying careless respondents in survey data are not only important to ensure the validity of subsequent data analyses but also instrumental for studying the psychological processes that drive humans to respond carelessly. Conversely, a deeper understanding of the phenomenon of careless responding enables the development of improved methods for the identification of careless respondents. While machine learning has gained substantial attention and popularity in many scientific fields, it is largely unexplored for the detection of careless responding. On the one hand, machine learning algorithms can be highly powerful tools due to their flexibility. On the other hand, science based on machine learning has been criticized in the literature for a lack of reproducibility. We assess the potential and the pitfalls of machine learning approaches for identifying careless respondents from an open science perspective. In particular, we discuss possible sources of reproducibility issues when applying machine learning in the context of careless responding, and we give practical guidelines on how to avoid them. Furthermore, we illustrate the high potential of an unsupervised machine learning method for the identification of careless respondents in a proof-of-concept simulation experiment. Finally, we stress the necessity of building an open data repository with labeled benchmark data sets, which would enable the evaluation of methods in a more realistic setting and make it possible to train supervised learning methods. Without such a data repository, the true potential of machine learning for the identification of careless responding may fail to be unlocked.

EUR Research Repository

Generating a Close-to-Reality Synthetic Population of Ghana

Author: Alfons Andreas
Frazier Tyler
Publication venue: W&M ScholarWorks
Publication date: 01/01/2012

The purpose of this research is to generate a close-to-reality synthetic human population for use in a geosimulation of urban dynamics. Two commonly accepted approaches to generating synthetic human populations are Iterative Proportional Fitting (IPF) and Resampling with Replacement. While these methods are effective at reproducing one instance of the probability model describing the survey, that instance has extremely small variability among subgroups and is very unlikely to be the real population. IPF and Resampling with Replacement also rely on pure replication of units from the underlying sample, which can lead to unrealistic model behavior. In this work we present a sequential logic for estimating variables using multinomial logistic regressions and the conditional probabilities among variables in order to generate combinations that were not represented in the original survey but are likely to occur in the real population. We also present a model-based approach to imputing missing responses and apply the methodology to the Ghana Living Standards Survey 5 (GLSS5) in order to generate a comprehensive synthetic population for the Republic of Ghana, including household and person variables such as household size, tribal affiliation, educational attainment and annual income, among others. The R language and environment for statistical computing was used, together with the packages VIM and simPopulation, to develop and execute the code. Contingency coefficients, cumulative distributions, mosaic plots and box plots are presented to demonstrate the effectiveness of the new method in its application to Ghana.
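
IPF, one of the two baseline approaches the abstract mentions, alternately rescales the rows and columns of a seed contingency table (for example, cross-tabulated survey counts) until its totals match known marginal totals. A minimal two-way Python sketch follows; the function name, seed table and targets are hypothetical, and the row and column targets are assumed to share the same grand total.

```python
def ipf(seed, row_targets, col_targets, max_iter=100, tol=1e-10):
    """Two-way iterative proportional fitting of `seed` to the given row and column totals."""
    table = [list(map(float, row)) for row in seed]
    for _ in range(max_iter):
        # Row step: scale each row so that it sums to its target.
        for i, target in enumerate(row_targets):
            s = sum(table[i])
            if s > 0:
                table[i] = [v * target / s for v in table[i]]
        # Column step: scale each column so that it sums to its target.
        for j, target in enumerate(col_targets):
            s = sum(row[j] for row in table)
            if s > 0:
                for row in table:
                    row[j] *= target / s
        # Columns now match; stop once the rows still match as well.
        if max(abs(sum(row) - t) for row, t in zip(table, row_targets)) < tol:
            break
    return table


# Hypothetical example: uniform seed, row totals (30, 70), column totals (40, 60).
# A uniform seed yields the independence table, here [[12, 18], [28, 42]] up to rounding.
print(ipf([[1, 1], [1, 1]], [30, 70], [40, 60]))
```

Each fitted cell is a target count for one combination of categories; turning those counts into synthetic individuals is a separate step.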

Lirias

DepositOnce

College of William & Mary: W&M Publish

Economic analysis of site-specific wheat management with respect to grain quality and separation of the different quality fractions

Author: Gandorfer Markus
Meyer-Aurich Andreas
Wagner Peter
Weersink Alfons

The paper analyzes site-specific and uniform management options for wheat production with respect to grain quality. Besides site-specific fertilization, the paper examines the economic potential of segregating different grain qualities. Yield and quality responses to fertilizer were taken from field experiments in Germany to calculate site-specific response functions. The economic optima were calculated for uniform management (UM), completely separate management of the subfields (SM), site-specific fertilization (SSF) and grain segregation (GS) for different price structures according to grain quality. The results show that, across all price structures, the highest economic potential was found with SM or SSF compared to UM. However, these management practices require the ability to manage subfields separately (SM) or specific fertilization equipment and fertilizer algorithms (SSF). GS did not have a higher economic potential than UM. However, if the required grain qualities are not met for the whole field, GS can substantially reduce profit losses by separating part of the grain and selling it at higher prices. This may save the farmer more than 50 € ha⁻¹. In situations where higher grain qualities can only be obtained at the expense of yield penalties, premiums for higher grain qualities can create incentives for fertilizer rates beyond the yield-maximizing rate. GS technologies may even amplify this effect.
Keywords: site-specific nitrogen management, wheat quality, grain segregation, Crop Production/Industries
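
The point about quality premiums can be made concrete with a toy profit calculation. The Python sketch below assumes a quadratic yield response and a linear protein response to nitrogen, a price premium above a protein threshold, and a fixed nitrogen price; every functional form and number in it is hypothetical and not taken from the paper's field data.

```python
def yield_t_ha(n):
    """Hypothetical quadratic yield response (t/ha); maximal at n = 100 kg N/ha."""
    return 5.0 + 0.04 * n - 0.0002 * n * n


def protein_pct(n):
    """Hypothetical linear protein response (%)."""
    return 10.0 + n / 50.0


def grain_price(protein, base=120.0, premium_price=150.0, threshold=13.0):
    """Hypothetical price schedule (EUR/t) with a premium above a protein threshold."""
    return premium_price if protein >= threshold else base


def profit(n, n_price=1.0, with_premium=True):
    """Revenue minus nitrogen cost (EUR/ha) at nitrogen rate n (kg N/ha)."""
    price = grain_price(protein_pct(n)) if with_premium else 120.0
    return price * yield_t_ha(n) - n_price * n


rates = range(0, 251)
yield_max = max(rates, key=yield_t_ha)
best_without = max(rates, key=lambda n: profit(n, with_premium=False))
best_with = max(rates, key=profit)
print(yield_max, best_without, best_with)  # -> 100 79 150
```

With these made-up numbers, the premium pushes the profit-maximizing rate (150 kg N/ha) above the yield-maximizing rate (100 kg N/ha) at the cost of some yield, which is the mechanism the abstract describes.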

Research Papers in Economics

Cost Efficient Tillage and Rotation Options for Mitigating GHG Emissions from Agriculture in Eastern Canada

Author: Deen Bill
Janovicek Ken
Meyer-Aurich Andreas
Weersink Alfons

The economic efficiency of cropping options to mitigate GHG emissions from agriculture in Eastern Canada was analyzed. Data on yield response to tillage (moldboard plow and chisel plow) and six corn-based rotations were obtained from a 20-year field experiment in Ontario. Budgets were constructed for each cropping system; GHG emissions were measured for soil carbon and estimated for nitrous oxide according to IPCC methodology. Complex crop rotations with legumes, such as corn-corn-soybeans-wheat with underseeded red clover, have higher net returns and substantially lower GHG emissions (by more than 1 Mg ha⁻¹ year⁻¹) than continuous corn. Reduced tillage lowers GHG emissions through lower input use, but no soil sequestration effect of tillage could be found. Rotation had a much bigger effect on the mitigation potential of GHG emissions than tillage. However, opportunity costs of more than $200 per Mg CO2 eq ha⁻¹ year⁻¹ indicate the limits to increasing the mitigation potential beyond the level of the economically best cropping system.
Keywords: Environmental Economics and Policy
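
Two calculations behind this kind of analysis can be sketched: converting fertilizer nitrogen into CO2-equivalent direct N2O emissions in the style of the IPCC Tier 1 default method, and the opportunity cost of moving from a more profitable system to one with lower emissions. In the Python sketch below, the emission factor (0.01 kg N2O-N per kg N applied, the IPCC 2006 default for direct emissions), the 100-year global warming potential of 298 (IPCC AR4), the omission of indirect emissions, the function names and all input numbers are assumptions; the abstract does not give the paper's exact parameterization.

```python
N2O_N_TO_N2O = 44.0 / 28.0  # mass ratio converting N2O-N to N2O
EF1 = 0.01                  # assumed kg N2O-N per kg N applied (IPCC 2006 Tier 1 default, direct only)
GWP_N2O = 298.0             # assumed 100-year GWP of N2O (IPCC AR4)


def direct_n2o_co2eq(n_applied_kg_ha):
    """Direct N2O emissions from applied N, in kg CO2-eq per ha and year."""
    return n_applied_kg_ha * EF1 * N2O_N_TO_N2O * GWP_N2O


def abatement_cost(profit_base, profit_alt, emissions_base, emissions_alt):
    """Opportunity cost per unit of emission reduction (e.g. $ per Mg CO2-eq)."""
    reduction = emissions_base - emissions_alt
    if reduction <= 0:
        raise ValueError("alternative system does not reduce emissions")
    return (profit_base - profit_alt) / reduction


# Hypothetical inputs: 150 kg N/ha; net returns in $/ha and emissions in Mg CO2-eq/ha.
print(round(direct_n2o_co2eq(150.0), 1))        # -> 702.4
print(abatement_cost(500.0, 400.0, 3.0, 2.5))   # -> 200.0
```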

Research Papers in Economics

Effectiveness of Best Management Cropping Systems to Abate Greenhouse Gas Emissions

Author: Jayasundara Susantha
Meyer-Aurich Andreas
Wagner-Riddle Claudia
Weersink Alfons

Best management practices (BMPs) for cropping systems that involve conservation tillage and nutrient management are proposed as potential win-win solutions for both farmers and the environment. While originally targeted as a means of improving soil and water quality, these BMPs may also contribute to the mitigation of greenhouse gases (GHGs). Mitigation efforts have focused primarily on the ability of BMPs to sequester carbon and on the potential revenue that carbon sequestration may represent for farmers. Increasingly, evidence from experimental stations calls into question the potential for C-sequestration with reduced tillage in soils in Eastern Canada. However, there are other ways in which BMPs can reduce GHG emissions: lowering fuel and nitrogen fertilizer consumption and, potentially, lowering emissions of nitrous oxide from the soil. This article examines the profitability and emission reduction potential of best management cropping practices for Ontario.
Keywords: Agricultural and Food Policy, Farm Management

Research Papers in Economics