Fairness-aware Machine Learning in Educational Data Mining
Fairness is an essential requirement of every educational system, which is reflected in a variety of educational activities. With the extensive use of Artificial Intelligence (AI) and Machine Learning (ML) techniques in education, researchers and educators can analyze educational (big) data and propose new (technical) methods in order to support teachers, students, or administrators of (online) learning systems in the organization of teaching and learning. Educational data mining (EDM) is the result of the application and development of data mining (DM) and ML techniques to deal with educational problems, such as student performance prediction and student grouping. However, ML-based decisions in education can be based on protected attributes, such as race or gender, leading to discrimination against individual students or subgroups of students. Therefore, ensuring fairness in ML models also contributes to equity in educational systems. On the other hand, bias can also appear in the data obtained from learning environments. Hence, bias-aware exploratory educational data analysis is important to support unbiased decision-making in EDM.
In this thesis, we address the aforementioned issues and propose methods that mitigate discriminatory outcomes of ML algorithms in EDM tasks. Specifically, we make the following contributions:
We perform bias-aware exploratory analysis of educational datasets using Bayesian networks to identify the relationships among attributes in order to understand bias in the datasets. We focus the exploratory data analysis on features having a direct or indirect relationship with the protected attributes w.r.t. prediction outcomes.
We perform a comprehensive evaluation of the sufficiency of various group fairness measures in predictive models for student performance prediction problems. A variety of experiments on various educational datasets with different fairness measures are performed to provide users with a broad view of unfairness from diverse aspects.
We deal with the student grouping problem in collaborative learning. We introduce the fair-capacitated clustering problem that takes into account cluster fairness and cluster cardinalities. We propose two approaches, namely hierarchical clustering and partitioning-based clustering, to obtain fair-capacitated clustering.
We introduce the multi-fair capacitated (MFC) students-topics grouping problem that satisfies students' preferences while ensuring balanced group cardinalities and maximizing the diversity of members regarding the protected attribute. We propose three approaches: a greedy heuristic approach, a knapsack-based approach using vanilla maximal 0-1 knapsack formulation, and an MFC knapsack approach based on group fairness knapsack formulation.
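The bias-aware exploratory step above looks for relationships between protected attributes and other features. A much simpler illustration of that idea (not the thesis's Bayesian-network method) is an empirical mutual-information check between a protected attribute and a feature; the attribute names below are hypothetical:

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Empirical mutual information I(X; Y) in bits between two
    discrete attribute columns of equal length."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )

# Toy records: protected attribute (gender) vs. a study-track feature.
gender = ["F", "F", "F", "M", "M", "M", "M", "F"]
track  = ["A", "A", "B", "B", "B", "B", "A", "A"]
score = mutual_information(gender, track)
# A score near 0 suggests independence; larger values flag a
# relationship worth inspecting before training a predictive model.
```

A Bayesian network generalizes this pairwise check to a full graph of direct and indirect dependencies, which is what the exploratory analysis in the thesis relies on.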
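Two of the group fairness measures commonly evaluated in student performance prediction can be sketched directly from predictions and group labels. This is an illustrative implementation of the standard definitions, with toy data, not the thesis's experimental code:

```python
def statistical_parity_diff(y_pred, groups, protected="B"):
    """P(yhat=1 | protected group) - P(yhat=1 | other groups)."""
    prot = [p for p, g in zip(y_pred, groups) if g == protected]
    rest = [p for p, g in zip(y_pred, groups) if g != protected]
    return sum(prot) / len(prot) - sum(rest) / len(rest)

def equal_opportunity_diff(y_true, y_pred, groups, protected="B"):
    """Difference in true positive rates: the gap in the chance that
    a truly passing student is also predicted to pass."""
    def tpr(select):
        pairs = [(t, p) for t, p, g in zip(y_true, y_pred, groups) if select(g)]
        pos = [p for t, p in pairs if t == 1]
        return sum(pos) / len(pos)
    return tpr(lambda g: g == protected) - tpr(lambda g: g != protected)

# Toy predictions for eight students in two demographic groups.
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
spd = statistical_parity_diff(y_pred, groups)
eod = equal_opportunity_diff(y_true, y_pred, groups)
```

Both measures are zero under perfect group fairness; negative values here indicate that group B receives fewer positive predictions, in absolute terms and among truly positive students respectively.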
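The grouping problems above combine preferences, capacities, and fairness. The greedy baseline can be sketched as follows; note that this plain preference-and-capacity pass ignores diversity entirely, which is exactly the shortcoming that the fairness-aware knapsack formulations address (names and data are illustrative):

```python
def greedy_assign(students, capacities):
    """students: list of (name, protected_value, topic_prefs_best_first).
    capacities: dict topic -> maximum group size.
    Greedy pass: each student gets the best-ranked topic that still
    has a seat. Returns topic -> list of (name, protected_value)."""
    groups = {t: [] for t in capacities}
    for name, prot, prefs in students:
        for topic in prefs:
            if len(groups[topic]) < capacities[topic]:
                groups[topic].append((name, prot))
                break
    return groups

students = [
    ("s1", "F", ["T1", "T2"]),
    ("s2", "F", ["T1", "T2"]),
    ("s3", "M", ["T1", "T2"]),
    ("s4", "M", ["T2", "T1"]),
]
groups = greedy_assign(students, {"T1": 2, "T2": 2})
# Greedy honors preferences and capacities but here produces one
# all-F and one all-M group -- zero diversity on the protected
# attribute, motivating the MFC knapsack-based approaches.
```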
In short, the findings described in this thesis demonstrate the importance of fairness-aware ML in educational settings. We show that bias-aware data analysis, fairness measures, and fairness-aware ML models are essential aspects to ensure fairness in EDM and the educational environment.
Funding: Ministry of Science and Culture of Lower Saxony/LernMINT/51410078/E
Model-based and actual independence for fairness-aware classification
The goal of fairness-aware classification is to categorize data while taking into account potential issues of fairness, discrimination, neutrality, and/or independence. For example, when applying data mining technologies to university admissions, admission criteria must be non-discriminatory and fair with regard to sensitive features, such as gender or race. In this context, such fairness can be formalized as statistical independence between classification results and sensitive features. The main purpose of this paper is to analyze this formal fairness in order to achieve better trade-offs between fairness and prediction accuracy, which is important for applying fairness-aware classifiers in practical use. We focus on a fairness-aware classifier, Calders and Verwer’s two-naive-Bayes (CV2NB) method, which has been shown to be superior to other classifiers in terms of fairness. We hypothesize that this superiority is due to the difference in types of independence. That is, because CV2NB achieves actual independence, rather than satisfying model-based independence like the other classifiers, it can account for model bias and a deterministic decision rule. We empirically validate this hypothesis by modifying two fairness-aware classifiers, a prejudice remover method and a reject option-based classification (ROC) method, so as to satisfy actual independence. The fairness of these two modified methods was drastically improved, showing the importance of maintaining actual independence, rather than model-based independence. We additionally extend an approach adopted in the ROC method so as to make it applicable to classifiers other than those with generative models, such as SVMs.
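The distinction between model-based and actual independence can be made concrete with a simple post-processing sketch: instead of trusting the model's assumptions, pick a per-group decision threshold so that each group's empirical positive rate on the sample at hand matches a common target. This is an illustrative construction in the spirit of threshold-based post-processing, not the paper's CV2NB or ROC method, and it ignores score ties:

```python
def group_thresholds(scores, sens, base_rate):
    """Choose a per-group threshold so each group's empirical positive
    rate equals base_rate -- 'actual' independence on this sample,
    rather than independence only under the model's assumptions."""
    thresholds = {}
    for g in set(sens):
        g_scores = sorted((s for s, a in zip(scores, sens) if a == g),
                          reverse=True)
        k = round(base_rate * len(g_scores))  # positives to grant in group g
        # Threshold at the k-th highest score => exactly top-k positive.
        thresholds[g] = g_scores[k - 1] if k > 0 else float("inf")
    return thresholds

# Toy classifier scores for two sensitive groups A and B.
scores = [0.9, 0.8, 0.3, 0.2, 0.7, 0.6, 0.5, 0.1]
sens   = ["A", "A", "A", "A", "B", "B", "B", "B"]
thresholds = group_thresholds(scores, sens, 0.5)
decisions = [s >= thresholds[a] for s, a in zip(scores, sens)]
# Both groups now receive positive decisions at exactly the 0.5 rate.
```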
Discrete Methods in Statistics: Feature Selection and Fairness-Aware Data Mining
This dissertation is a detailed investigation of issues that arise in models that change discretely. Models are often constructed by either including or excluding features based on some criteria. These discrete changes are challenging to analyze due to correlation between features. Feature selection is the problem of identifying an appropriate set of features to include in a model, while fairness-aware data mining is the problem of removing the influence of protected features from a model. This dissertation provides frameworks for understanding each problem and algorithms for accomplishing the desired goal.
The feature selection problem is addressed through the framework of sequential hypothesis testing. We elucidate the statistical challenges in repeatedly using inference in this domain and demonstrate how current methods fail to address them. Our algorithms build on classically motivated multiple-testing procedures to control measures of false rejections when using hypothesis testing during forward stepwise regression. Furthermore, these methods have much higher power than recent proposals from the conditional inference literature.
The fairness-aware data mining community is grappling with fundamental questions concerning fairness in statistical modeling. Tension exists between identifying explainable differences between groups and discriminatory ones. We provide a framework for understanding the connections between fairness and the use of protected information in modeling. With this discussion in hand, generating fair estimates is straightforward.
Impartial Predictive Modeling: Ensuring Fairness in Arbitrary Models
Fairness-aware data mining aims to prevent algorithms from discriminating against protected groups. The literature has come to an impasse as to what constitutes explainable variability as opposed to discrimination. This stems from incomplete discussions of fairness in statistics. We demonstrate that fairness is achieved by ensuring impartiality with respect to sensitive characteristics. As these characteristics are determined outside of the model, the correct description of the statistical task is to ensure impartiality. We provide a framework for impartiality by accounting for different perspectives on the data generating process. This framework yields a set of impartial estimates that are applicable in a wide variety of situations and post-processing tools to correct estimates from arbitrary models. This effectively separates prediction and fairness goals, allowing modelers to focus on generating highly predictive models without incorporating the constraint of fairness.
Fairly Adaptive Negative Sampling for Recommendations
Pairwise learning strategies are prevalent for optimizing recommendation models on implicit feedback data; user preference is usually learned by discriminating between positive items (i.e., clicked by a user) and negative items (i.e., obtained by negative sampling). However, the size of different item groups (specified by item attribute) is usually unevenly distributed. We empirically find that the commonly used uniform negative sampling strategy for pairwise algorithms (e.g., BPR) can inherit such data bias and oversample the majority item group as negative instances, severely countering group fairness on the item side. In this paper, we propose a Fairly adaptive Negative sampling approach (FairNeg), which improves item group fairness via adaptively adjusting the group-level negative sampling distribution in the training process. In particular, it first perceives the model's unfairness status at each step and then adjusts the group-wise sampling distribution with an adaptive momentum update strategy for better facilitating fairness optimization. Moreover, a negative sampling distribution Mixup mechanism is proposed, which gracefully incorporates existing importance-aware sampling techniques intended for mining informative negative samples, thus allowing for achieving multiple optimization purposes. Extensive experiments on four public datasets show our proposed method's superiority in group fairness enhancement and fairness-utility tradeoff.
Comment: Accepted by TheWebConf202
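The adaptive momentum update of the group-wise sampling distribution can be sketched as follows. This is an illustrative interpretation, not the paper's exact update rule: a per-group unfairness signal (positive meaning the group is currently over-sampled as negatives) is smoothed with momentum and used to nudge the sampling probabilities, which are then renormalized:

```python
def momentum_update(probs, signal, velocity, beta=0.9, lr=0.2):
    """probs: item group -> current negative-sampling probability.
    signal: item group -> signed unfairness signal (positive means the
    group is over-sampled as negatives and should be sampled less).
    velocity: previous momentum state (empty dict on the first step).
    Returns updated (probs, velocity); probs renormalized to sum to 1."""
    new_v, new_p = {}, {}
    for g in probs:
        # Exponential moving average of the unfairness signal.
        new_v[g] = beta * velocity.get(g, 0.0) + (1 - beta) * signal[g]
        # Step against the smoothed signal, keeping probabilities positive.
        new_p[g] = max(1e-6, probs[g] - lr * new_v[g])
    z = sum(new_p.values())
    return {g: p / z for g, p in new_p.items()}, new_v

# Toy case: the popular item group is being over-sampled as negatives.
probs = {"popular": 0.7, "niche": 0.3}
signal = {"popular": 1.0, "niche": -1.0}
new_probs, vel = momentum_update(probs, signal, {})
# The popular group's sampling probability decreases toward parity.
```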
Discrimination and Class Imbalance Aware Online Naive Bayes
Fairness-aware mining of massive data streams is a growing and challenging
concern in the contemporary domain of machine learning. Many stream learning
algorithms are used to replace humans at critical decision-making points e.g.,
hiring staff, assessing credit risk, etc. This calls for handling massive
incoming information with minimum response delay while ensuring fair and high
quality decisions. Recent discrimination-aware learning methods are optimized
based on overall accuracy. However, the overall accuracy is biased in favor of
the majority class; therefore, state-of-the-art methods mainly diminish
discrimination by partially or completely ignoring the minority class. In this
context, we propose a novel adaptation of Na\"ive Bayes to mitigate
discrimination embedded in the streams while maintaining high predictive
performance for both the majority and minority classes. Our proposed algorithm
is simple, fast, and attains multi-objective optimization goals. To handle
class imbalance and concept drifts, a dynamic instance weighting module is
proposed, which gives more importance to recent instances and less importance
to obsolete instances based on their membership in minority or majority class.
We conducted experiments on a range of streaming and static datasets and
deduced that our proposed methodology outperforms existing state-of-the-art
fairness-aware methods in terms of both discrimination score and balanced
accuracy
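The combination of online Naive Bayes updates with instance weighting can be sketched minimally. This is an illustrative toy, not the paper's algorithm: old evidence decays on every update (recency), and the caller can pass a larger weight for minority-class instances (imbalance handling); the feature values are hypothetical:

```python
from collections import defaultdict

class WeightedOnlineNB:
    """Minimal online Naive Bayes for discrete features with weighted,
    decaying counts -- loosely mirroring dynamic instance weighting."""

    def __init__(self, decay=0.99):
        self.decay = decay
        self.class_w = defaultdict(float)   # class -> total weight
        self.feat_w = defaultdict(float)    # (class, index, value) -> weight

    def learn(self, x, y, weight=1.0):
        # Fade all previous evidence, then add the new weighted instance.
        for k in self.class_w:
            self.class_w[k] *= self.decay
        for k in self.feat_w:
            self.feat_w[k] *= self.decay
        self.class_w[y] += weight
        for i, v in enumerate(x):
            self.feat_w[(y, i, v)] += weight

    def predict(self, x):
        total = sum(self.class_w.values())
        def score(c):
            s = self.class_w[c] / total
            for i, v in enumerate(x):
                # Laplace-style smoothing over the weighted counts.
                s *= (self.feat_w[(c, i, v)] + 1) / (self.class_w[c] + 2)
            return s
        return max(self.class_w, key=score)

nb = WeightedOnlineNB()
nb.learn(("sunny", "hot"), "no")
nb.learn(("rain", "cool"), "yes", weight=2.0)  # boosted minority instance
pred = nb.predict(("rain", "cool"))
```

The boosted weight lets a single minority-class instance outweigh the fading majority evidence, which is the intuition behind balancing accuracy across classes in the stream.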