The Measure and Mismeasure of Fairness: A Critical Review of Fair Machine Learning
The nascent field of fair machine learning aims to ensure that decisions
guided by algorithms are equitable. Over the last several years, three formal
definitions of fairness have gained prominence: (1) anti-classification,
meaning that protected attributes---like race, gender, and their proxies---are
not explicitly used to make decisions; (2) classification parity, meaning that
common measures of predictive performance (e.g., false positive and false
negative rates) are equal across groups defined by the protected attributes;
and (3) calibration, meaning that conditional on risk estimates, outcomes are
independent of protected attributes. Here we show that all three of these
fairness definitions suffer from significant statistical limitations. Requiring
anti-classification or classification parity can, perversely, harm the very
groups they were designed to protect; and calibration, though generally
desirable, provides little guarantee that decisions are equitable. In contrast
to these formal fairness criteria, we argue that it is often preferable to
treat similarly risky people similarly, based on the most statistically
accurate estimates of risk that one can produce. Such a strategy, while not
universally applicable, often aligns well with policy objectives; notably, this
strategy will typically violate both anti-classification and classification
parity. In practice, it requires significant effort to construct suitable risk
estimates. One must carefully define and measure the targets of prediction to
avoid retrenching biases in the data. But, importantly, one cannot generally
address these difficulties by requiring that algorithms satisfy popular
mathematical formalizations of fairness. By highlighting these challenges in
the foundation of fair machine learning, we hope to help researchers and
practitioners productively advance the area.
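As a concrete, hypothetical illustration of the three criteria (not from the paper itself), the sketch below generates synthetic data and checks each definition in turn; the variable names, group risk distributions, and the 0.5 threshold are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
group = rng.integers(0, 2, n)           # protected attribute (0/1), hypothetical
risk = rng.beta(2 + group, 5 - group)   # risk estimates; distributions differ by group
y = rng.binomial(1, risk)               # observed binary outcome
decision = risk > 0.5                   # a threshold rule on estimated risk

# (1) Anti-classification: the rule above never reads `group` directly,
#     though `risk` may still act as a proxy for it.

# (2) Classification parity: compare false positive rates across groups.
for g in (0, 1):
    negatives = (group == g) & (y == 0)
    print(f"group {g}: FPR = {decision[negatives].mean():.3f}")

# (3) Calibration: within each risk bin, the outcome rate should match
#     the risk estimate regardless of group.
bins = np.digitize(risk, [0.2, 0.4, 0.6, 0.8])
for g in (0, 1):
    rates = []
    for b in range(5):
        m = (group == g) & (bins == b)
        rates.append(round(float(y[m].mean()), 3) if m.any() else None)
    print(f"group {g}: outcome rate per risk bin = {rates}")
```

Because the groups' risk distributions differ, a single threshold of this kind is typically well calibrated yet violates classification parity, which is exactly the tension the abstract describes.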
Empirical Models of Auctions
Many important economic questions arising in auctions can be answered only with knowledge of the underlying primitive distributions governing bidder demand and information. An active literature has developed aiming to estimate these primitives by exploiting restrictions from economic theory as part of the econometric model used to interpret auction data. We review some highlights of this recent literature, focusing on identification and empirical applications. We describe three insights that underlie much of the recent methodological progress in this area and discuss some of the ways these insights have been extended to richer models allowing more convincing empirical applications. We discuss several recent empirical studies using these methods to address a range of important economic questions.
Keywords: Auctions, Identification, Estimation, Testing
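One identification argument central to this literature is the inversion of the equilibrium first-order condition in first-price auctions under independent private values (the approach of Guerre, Perrigne, and Vuong), which recovers each bidder's value from the observed bid distribution. The code below is a minimal sketch of that inversion using off-the-shelf kernel estimates, not any particular paper's implementation.

```python
import numpy as np
from scipy.stats import gaussian_kde

def gpv_pseudo_values(bids, n_bidders):
    """Recover pseudo private values from first-price auction bids via the
    equilibrium first-order condition
        v = b + G(b) / ((n - 1) * g(b)),
    with the bid CDF G and density g estimated nonparametrically."""
    bids = np.asarray(bids, dtype=float)
    g_hat = gaussian_kde(bids)                              # bid density g(b)
    G_hat = np.array([g_hat.integrate_box_1d(-np.inf, b) for b in bids])
    return bids + G_hat / ((n_bidders - 1) * g_hat(bids))

# Simulated check: with values ~ U(0,1) and n symmetric bidders, the
# equilibrium bid is b(v) = v * (n - 1) / n, so the recovered values
# should be roughly uniform on (0, 1).
rng = np.random.default_rng(1)
n = 3
values = rng.uniform(0, 1, 2000)
bids = values * (n - 1) / n
print(np.quantile(gpv_pseudo_values(bids, n), [0.25, 0.5, 0.75]))
```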
Structural Econometric Methods in Auctions: A Guide to the Literature
Auction models have proved to be attractive to structural econometricians who, since the late 1980s, have made substantial progress in identifying and estimating these rich game-theoretic models of bidder behavior. We provide a guide to the literature in which we contrast the various informational structures (paradigms) commonly assumed by researchers and uncover the evolution of the field. We highlight major contributions within each paradigm and benchmark modifications and extensions to these core models. Lastly, we discuss special topics that have received substantial attention among auction researchers in recent years, including auctions for multiple objects, auctions with risk averse bidders, testing between common and private value paradigms, unobserved auction-specific heterogeneity, and accounting for an unobserved number of bidders as well as endogenous entry.
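For orientation, the benchmark against which these paradigms are usually contrasted is the symmetric independent private values (IPV) first-price auction. The following are textbook results rather than anything specific to this survey: the equilibrium bid function and the inverted first-order condition on which structural estimation builds.

```latex
% Symmetric IPV first-price auction with n bidders and value CDF F.
% Equilibrium bid for a bidder with value v, values i.i.d. with support
% starting at \underline{v}:
\[
  \beta(v) = v - \frac{\int_{\underline{v}}^{v} F(x)^{n-1}\,dx}{F(v)^{n-1}},
\]
% and the inverted first-order condition used in structural estimation,
% where G and g are the CDF and density of observed bids:
\[
  v = b + \frac{G(b)}{(n-1)\,g(b)}.
\]
```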
Statistical aspects of credit scoring
This thesis is concerned with statistical aspects of credit scoring, the process of determining how likely an applicant for credit is to default with repayments. In Chapters 1-4 a detailed introduction to credit scoring methodology is presented, including evaluation of previous published work on credit scoring and a review of discrimination and classification techniques.
In Chapter 5 we describe different approaches to measuring the absolute and relative performance of credit scoring models. Two significance tests are proposed for comparing the bad rate amongst the accepts (or the error rate) from two classifiers.
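The abstract does not state the form of these tests; purely for orientation, a naive independent-samples comparison of two bad rates is a standard two-proportion z-test, sketched below. The thesis's proposed tests presumably also account for the fact that both classifiers score the same applicants, which this sketch ignores.

```python
import numpy as np
from scipy.stats import norm

def two_proportion_z_test(bad1, n1, bad2, n2):
    """Textbook two-proportion z-test for comparing the bad rate among
    accepts from two scorecards; NOT the specific tests proposed in the
    thesis, which are not given in the abstract."""
    p1, p2 = bad1 / n1, bad2 / n2
    pooled = (bad1 + bad2) / (n1 + n2)
    se = np.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return z, 2 * norm.sf(abs(z))      # two-sided p-value

# Hypothetical figures: scorecard A has 120 bads among 2000 accepts,
# scorecard B has 90 bads among 2000 accepts.
z, p = two_proportion_z_test(120, 2000, 90, 2000)
print(f"z = {z:.2f}, p = {p:.3f}")
```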
In Chapter 6 we consider different approaches to reject inference, the procedure of allocating class membership probabilities to the rejects. One reason for needing reject inference is to reduce the sample selection bias that results from using a sample consisting only of accepted applicants to build new scorecards. We show that the characteristic vectors for the rejects do not contain information about the parameters of the observed data likelihood, unless extra information or assumptions are included. Methods of reject inference which incorporate additional information are proposed.
In Chapter 7 we make comparisons of a range of different parametric and nonparametric classification techniques for credit scoring: linear regression, logistic regression, projection pursuit regression, Poisson regression, decision trees and decision graphs. We conclude that classifier performance is fairly insensitive to the particular technique adopted.
In Chapter 8 we describe the application of the k-NN method to credit scoring. We propose using an adjusted version of the Euclidean distance metric, which is designed to incorporate knowledge of class separation contained in the data. We evaluate properties of the k-NN classifier through empirical studies and make comparisons with existing techniques.
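The exact adjustment is not given in the abstract. One plausible reading, sketched below, weights each feature by a Fisher-style class-separation ratio, which is equivalent to running ordinary k-NN on rescaled features; the weighting scheme and function names are assumptions for illustration only.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def fisher_weights(X, y):
    """Per-feature weights from a Fisher-style class-separation ratio:
    (gap between class means)^2 / (sum of within-class variances).
    One plausible way to fold class-separation knowledge into a
    Euclidean metric; the thesis's exact adjustment may differ."""
    X0, X1 = X[y == 0], X[y == 1]
    num = (X0.mean(axis=0) - X1.mean(axis=0)) ** 2
    den = X0.var(axis=0) + X1.var(axis=0) + 1e-12
    return num / den

# A weighted Euclidean distance equals ordinary Euclidean distance on
# rescaled features, since sum(w_j * (x_j - x'_j)^2) is the squared norm
# of sqrt(w) * x minus sqrt(w) * x'.
rng = np.random.default_rng(2)
X = rng.normal(size=(500, 5))
y = (X[:, 0] + 0.1 * rng.normal(size=500) > 0).astype(int)  # toy labels
w = np.sqrt(fisher_weights(X, y))
knn = KNeighborsClassifier(n_neighbors=11).fit(X * w, y)
print(knn.score(X * w, y))
```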
Statistical learning in complex and temporal data: distances, two-sample testing, clustering, classification and Big Data
Programa Oficial de Doutoramento en Estatística e Investigación Operativa. 555V01
[Abstract]
This thesis deals with the problem of statistical learning in complex objects, with emphasis on time series data. The problem is approached by facilitating the introduction of domain knowledge of the underlying phenomena by means of distances and features.
A distance-based two-sample test is proposed, and its performance is studied under a wide range of scenarios. Distances for time series classification and clustering are also shown to increase statistical power when applied to two-sample testing. Our test compares favorably to other methods regarding its flexibility against different alternatives. A new distance for time series is defined by considering an innovative way of comparing lagged distributions of the series. This distance inherits the good empirical performance of existing methods while removing some of their limitations.
A forecast method based on time series features is proposed. The method works by combining individual standard forecasting algorithms using a weighted average. These weights come from a learning model fitted on a large training set. A distributed classification algorithm is proposed, based on comparing, using a distance, the empirical distribution functions between the dataset that each computing node receives and the common test set.
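The abstract leaves the test statistic unspecified. A representative distance-based two-sample test is the energy statistic with a permutation null, sketched below purely as an illustration of the approach; it is a generic procedure, not the thesis's exact test.

```python
import numpy as np
from scipy.spatial.distance import cdist

def energy_statistic(X, Y):
    """Energy-distance two-sample statistic:
    2*E|X-Y| - E|X-X'| - E|Y-Y'| (up to the usual sample-size scaling,
    which is constant across permutations of fixed group sizes)."""
    return (2 * cdist(X, Y).mean()
            - cdist(X, X).mean()
            - cdist(Y, Y).mean())

def permutation_test(X, Y, n_perm=999, seed=0):
    """Permutation p-value for the energy statistic."""
    rng = np.random.default_rng(seed)
    Z = np.vstack([X, Y])
    n = len(X)
    obs = energy_statistic(X, Y)
    count = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(Z))
        count += energy_statistic(Z[idx[:n]], Z[idx[n:]]) >= obs
    return (count + 1) / (n_perm + 1)

# Toy check: two Gaussian samples whose means differ by 0.5.
rng = np.random.default_rng(3)
X = rng.normal(0.0, 1.0, size=(60, 2))
Y = rng.normal(0.5, 1.0, size=(60, 2))
print(permutation_test(X, Y, n_perm=199))
```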