The way forward
The growing use of data mining by both government and commercial entities brings great promise as well as great challenges. These practices hold the promise of facilitating an information environment that is fair, accurate and efficient. At the same time, they may lead to practices that are invasive and discriminatory, in ways the law has yet to grasp. This point is demonstrated by showing how common measures for mitigating privacy concerns, in particular a priori limiting measures such as access controls, anonymity and purpose specification, increasingly fail to address privacy and discrimination issues in this novel context. Instead, a focus on (a posteriori) accountability and transparency may be more useful. This requires improved detection of discrimination and privacy violations, as well as the design and implementation of discrimination-free and privacy-preserving techniques, which in turn calls for further (technological) research. Yet even then, new situations and new mechanisms may arise through which privacy violations or discrimination take place. Novel predictive models can prove to be little more than sophisticated tools that mask "classic" forms of discrimination by hiding it behind new proxies. Discrimination may also shift to new kinds of population segments, dispersed throughout society and connected only by attributes they happen to have in common. Such groups lack the political force to defend their interests and may not even know what is happening. With regard to privacy, the adequacy of the envisaged European legal framework is discussed in the light of data mining and profiling. The European Union is currently revising its data protection legislation, and the chapter examines whether the new proposals adequately address the issues raised in this book.
What is data mining and how does it work?
Due to recent technological developments, it has become possible to generate and store ever larger datasets. The value of data, however, is determined not by its sheer amount but by the ability to interpret and analyze it, and to base future policies and decisions on the outcome of that analysis. The amounts of data collected nowadays not only offer unprecedented opportunities to improve decision procedures for companies and governments, but also pose great challenges: many pre-existing data analysis tools do not scale up to current data sizes. The research field of data mining emerged from this need. In this chapter we position data mining with respect to other data analysis techniques and introduce the most important classes of techniques developed in the area: pattern mining, classification, and clustering and outlier detection. Related, supporting techniques such as pre-processing and database coupling are also discussed.
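As a purely illustrative sketch (our addition, not taken from the chapter), the snippet below shows two of the technique classes named above, classification and clustering, applied to a small synthetic dataset. It assumes Python with NumPy and scikit-learn; all names and parameters are hypothetical choices made for the example.

```python
# Minimal sketch: classification (learning to predict a known label) and
# clustering (discovering group structure without labels) on synthetic data.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.cluster import KMeans

# Synthetic dataset: 500 two-dimensional points drawn from 3 groups.
X, y = make_blobs(n_samples=500, centers=3, random_state=0)

# Classification: fit a decision tree on a training split, evaluate on a test split.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = DecisionTreeClassifier(max_depth=3).fit(X_train, y_train)
print("classification accuracy:", clf.score(X_test, y_test))

# Clustering: recover group structure without using the labels at all.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print("cluster sizes:", np.bincount(labels))
```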
Why unbiased computational processes can lead to discriminative decision procedures (Chapter 3)
Nowadays, more and more decision procedures are supported or even guided by automated processes. An important technique in this automation is data mining. In this chapter we study how such automatically generated decision-support models may exhibit discriminatory behavior towards certain groups, for instance on the basis of gender or ethnicity. Surprisingly, such behavior may be observed even when sensitive information is removed or suppressed and the whole procedure is guided solely by neutral criteria such as predictive accuracy. The reason for this phenomenon is that most data mining methods rest on assumptions that are not always satisfied in reality, namely that the data is correct and represents the population well. In this chapter we discuss the implicit modeling assumptions made by most data mining algorithms and show situations in which they are not satisfied. We then outline three realistic scenarios in which an unbiased process can lead to discriminatory models, and illustrate with examples the effects of the implicit assumptions not being fulfilled. The chapter concludes with an outline of the main challenges and problems to be solved.
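The proxy effect described in this chapter can be made concrete with a small, hypothetical simulation (ours, not the chapter's own experiments). In the sketch below the sensitive attribute is withheld from the model entirely, yet a correlated "district" feature lets the learned classifier reproduce the historical bias; the data, feature names and numbers are invented for illustration, and scikit-learn is assumed to be available.

```python
# Illustrative only: the sensitive attribute is NOT given to the model,
# but a correlated proxy feature ("district") lets the bias leak back in.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 10_000
group = rng.integers(0, 2, n)                    # sensitive attribute (0/1), withheld below
district = (group + (rng.random(n) < 0.1)) % 2   # proxy: agrees with group 90% of the time
income = rng.normal(50.0, 10.0, n)               # a genuinely neutral feature
# Biased historical decisions: group 0 is accepted more often at equal income.
accept = income + 10.0 * (group == 0) + rng.normal(0.0, 5.0, n) > 55.0

X = np.column_stack([income, district])          # note: "group" itself is excluded
model = LogisticRegression(max_iter=1000).fit(X, accept)
pred = model.predict(X)

# Predicted acceptance rates still differ between the sensitive groups.
for g in (0, 1):
    print(f"group {g}: predicted acceptance rate = {pred[group == g].mean():.2f}")
```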
Techniques for discrimination-free predictive models
In this chapter, we give an overview of the techniques we ourselves developed for constructing discrimination-free classifiers. In discrimination-free classification the goal is to learn a predictive model that classifies future data objects as accurately as possible, yet whose predicted labels are uncorrelated with a given sensitive attribute. For example, the task could be to learn a gender-neutral model that predicts whether a potential client of a bank has a high income or not. The techniques we developed for discrimination-aware classification can be divided into three categories: (1) removing the discrimination directly from the historical dataset before an off-the-shelf classification technique is applied; (2) changing the learning procedures themselves by restricting the search space to non-discriminatory models; and (3) adjusting the discriminatory models, learnt by off-the-shelf classifiers on discriminatory historical data, in a post-processing phase. Experiments show that even under such a strong constraint as discrimination-freeness, very accurate models can still be learnt. In particular, we study a case of income prediction where the available historical data exhibits a wage gap between the genders. Due to legal restrictions, however, our predictions should be gender-neutral. The discrimination-aware techniques succeed in significantly reducing gender discrimination without impairing accuracy too much.
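As a rough sketch of category (1), pre-processing the historical data before applying an off-the-shelf learner, the example below uses a simple reweighing scheme on synthetic data: each example receives weight P(s)·P(y)/P(s,y), so that in the weighted data the sensitive attribute s and the class label y are statistically independent. This is a hedged illustration in the spirit of that category, not the authors' exact procedure; the data, feature names and the use of scikit-learn are our assumptions.

```python
# Sketch of a pre-processing ("reweighing") approach: weight the historical
# examples so that the sensitive attribute and the label become independent
# in the weighted data, then train an off-the-shelf classifier with those weights.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 10_000
gender = rng.integers(0, 2, n)                  # sensitive attribute (0/1)
experience = rng.normal(10.0, 3.0, n)           # a neutral feature
# Biased historical labels: a wage gap in favour of gender 0 at equal experience.
high_income = experience + 3.0 * (gender == 0) + rng.normal(0.0, 2.0, n) > 11.0

# Reweighing: w(s, y) = P(s) * P(y) / P(s, y), estimated from the data itself.
weights = np.empty(n)
for s in (0, 1):
    for y in (False, True):
        cell = (gender == s) & (high_income == y)
        weights[cell] = (gender == s).mean() * (high_income == y).mean() / cell.mean()

X = np.column_stack([experience, gender])
baseline = LogisticRegression(max_iter=1000).fit(X, high_income)
reweighed = LogisticRegression(max_iter=1000).fit(X, high_income, sample_weight=weights)

# The reweighed model's predicted positive rates per gender should lie markedly
# closer together than the baseline's, at a modest cost in raw accuracy.
for name, model in (("baseline", baseline), ("reweighed", reweighed)):
    rates = [model.predict(X)[gender == g].mean() for g in (0, 1)]
    print(f"{name}: positive rate gender 0 = {rates[0]:.2f}, gender 1 = {rates[1]:.2f}")
```

Categories (2) and (3) would instead restrict the learner's search space or post-process the learned model, and are not shown here.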