A KDD process for discrimination discovery
Practitioners and legal scholars will accept analytical methods for discrimination discovery only if the data mining and machine learning communities provide case studies, methodological refinements, and a consolidated KDD process. We summarize here an approach along these directions.
On Discrimination Discovery and Removal in Ranked Data using Causal Graph
Predictive models learned from historical data are widely used to help
companies and organizations make decisions. However, their automated decisions
may treat some groups unfairly, raising concerns about fairness and
discrimination. In this paper, we study the fairness-aware ranking problem,
which aims to discover discrimination in ranked datasets and reconstruct a
fair ranking. Existing methods in fairness-aware ranking are mainly based on
statistical parity, which cannot measure the true discriminatory effect since
discrimination is causal. On the other hand, existing methods in causal-based
anti-discrimination learning focus on classification problems and cannot be
directly applied to handle the ranked data. To address these limitations, we
propose to map the rank position to a continuous score variable that represents
the qualification of the candidates. Then, we build a causal graph that
consists of both the discrete profile attributes and the continuous score. The
path-specific effect technique is extended to the mixed-variable causal graph
to identify both direct and indirect discrimination. The relationship between
the path-specific effects for the ranked data and those for the binary decision
is theoretically analyzed. Finally, algorithms for discovering and removing
discrimination from a ranked dataset are developed. Experiments on a real
dataset show the effectiveness of our approaches.
Comment: 9 pages
Fairness Beyond Disparate Treatment & Disparate Impact: Learning Classification without Disparate Mistreatment
Automated data-driven decision-making systems are increasingly being used to
assist, or even replace, humans in many settings. These systems function by
learning from historical decisions, often taken by humans. In order to maximize
the utility of these systems (or, classifiers), their training involves
minimizing the errors (or, misclassifications) over the given historical data.
However, it is quite possible that the optimally trained classifier makes
decisions for people belonging to different social groups with different
misclassification rates (e.g., misclassification rates for females are higher
than for males), thereby placing these groups at an unfair disadvantage. To
account for and avoid such unfairness, in this paper, we introduce a new notion
of unfairness, disparate mistreatment, which is defined in terms of
misclassification rates. We then propose intuitive measures of disparate
mistreatment for decision boundary-based classifiers, which can be easily
incorporated into their formulation as convex-concave constraints. Experiments
on synthetic as well as real world datasets show that our methodology is
effective at avoiding disparate mistreatment, often at a small cost in terms of
accuracy.
Comment: To appear in Proceedings of the 26th International World Wide Web
Conference (WWW), 2017. Code available at:
https://github.com/mbilalzafar/fair-classificatio
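The abstract above defines disparate mistreatment in terms of misclassification rates that differ across social groups. As a minimal sketch of how such a gap could be measured (not the paper's constrained training method, and not taken from its code), the following computes false positive and false negative rates per group and their difference between two groups. The function names `group_error_rates` and `mistreatment_gaps` and the toy labels are hypothetical, introduced only for illustration.

```python
from collections import defaultdict


def group_error_rates(y_true, y_pred, groups):
    """Return {group: (false_positive_rate, false_negative_rate)}.

    Labels are assumed to be binary (0 or 1). Hypothetical helper,
    not part of the paper's released code.
    """
    counts = defaultdict(lambda: {"fp": 0, "fn": 0, "neg": 0, "pos": 0})
    for truth, pred, group in zip(y_true, y_pred, groups):
        c = counts[group]
        if truth == 1:
            c["pos"] += 1
            if pred == 0:
                c["fn"] += 1  # positive example classified as negative
        else:
            c["neg"] += 1
            if pred == 1:
                c["fp"] += 1  # negative example classified as positive

    rates = {}
    for group, c in counts.items():
        # Rates are undefined (NaN) when a group has no examples of a class.
        fpr = c["fp"] / c["neg"] if c["neg"] else float("nan")
        fnr = c["fn"] / c["pos"] if c["pos"] else float("nan")
        rates[group] = (fpr, fnr)
    return rates


def mistreatment_gaps(y_true, y_pred, groups, group_a, group_b):
    """Return (FPR gap, FNR gap) between group_a and group_b.

    A gap of zero in both components means the two groups are
    misclassified at the same rates under this simple measure.
    """
    rates = group_error_rates(y_true, y_pred, groups)
    fpr_a, fnr_a = rates[group_a]
    fpr_b, fnr_b = rates[group_b]
    return fpr_a - fpr_b, fnr_a - fnr_b


# Toy data (made up for illustration): four examples per group.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 1, 0, 0]
groups = ["f", "f", "f", "f", "m", "m", "m", "m"]

gaps = mistreatment_gaps(y_true, y_pred, groups, "f", "m")
```

A nonzero gap in either component indicates that one group is misclassified more often than the other. The paper goes further by adding constraints on such gaps to the training of the classifier, which this sketch does not attempt.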
Virtual Astronomy, Information Technology, and the New Scientific Methodology
All sciences, including astronomy, are now entering the era of information abundance. The exponentially increasing volume and complexity of modern data sets promise to transform scientific practice, but also pose a number of common technological challenges. The Virtual Observatory concept is the astronomical community's response to these challenges: it aims to harness the progress in information technology in the service of astronomy, and at the same time provide a valuable testbed for information technology and applied computer science. Challenges broadly fall into two categories: data handling (or "data farming"), including issues such as archives, intelligent storage, databases, interoperability, fast networks, etc., and data mining, data understanding, and knowledge discovery, which include issues such as automated clustering and classification, multivariate correlation searches, pattern recognition, visualization in hyperdimensional parameter spaces, etc., as well as various applications of machine learning in these contexts. Such techniques are forming a methodological foundation for science with massive and complex data sets in general, and are likely to have a much broader impact on modern society, commerce, the information economy, security, etc. There is a powerful emerging synergy between computationally enabled science and science-driven computing, which will drive progress in science, scholarship, and many other venues in the 21st century.