Text Classification in an Under-Resourced Language via Lexical Normalization and Feature Pooling
Automatic classification of textual content in an under-resourced language is challenging, since lexical resources and preprocessing tools are not available for such languages. Their bag-of-words (BoW) representation is usually highly sparse and noisy, and text classification built on such a representation yields poor performance. In this paper, we explore the effectiveness of lexical normalization of terms and statistical feature pooling for improving text classification in an under-resourced language. We focus on classifying citizen feedback on government services provided through SMS texts, which are written predominantly in Roman Urdu (an informal, forward-transliterated version of the Urdu language). Our proposed methodology normalizes lexical variations of terms using phonetic and string similarity. It subsequently employs a supervised feature extraction technique to obtain category-specific, highly discriminating features. Our experiments with classifiers reveal that lexical normalization plus feature pooling achieves a significant improvement in classification performance over standard representations.
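The abstract describes normalizing lexical variants via phonetic and string similarity before feature extraction. Below is a minimal Python sketch of that idea, not the paper's implementation: it groups terms by a crude phonetic key and collapses near-duplicates using difflib's similarity ratio. The key function, the 0.7 threshold, and the Roman Urdu example terms are illustrative assumptions.

```python
# Minimal sketch of phonetic + string-similarity normalization (illustrative,
# not the paper's method).
from difflib import SequenceMatcher


def phonetic_key(term: str) -> str:
    """Crude phonetic key: keep the first letter, drop vowels, collapse repeats."""
    if not term:
        return ""
    term = term.lower()
    chars = [term[0]] + [c for c in term[1:] if c not in "aeiou"]
    collapsed = []
    for c in chars:
        if not collapsed or collapsed[-1] != c:
            collapsed.append(c)
    return "".join(collapsed)


def normalize(terms, sim_threshold=0.7):
    """Map each term to a canonical variant that shares its phonetic key."""
    canon = {}    # phonetic key -> canonical surface form
    mapping = {}  # term -> normalized term
    for t in terms:
        key = phonetic_key(t)
        if key not in canon:
            canon[key] = t
            mapping[t] = t
        elif SequenceMatcher(None, t, canon[key]).ratio() >= sim_threshold:
            mapping[t] = canon[key]
        else:
            mapping[t] = t
    return mapping


# Example: common Roman Urdu spelling variants collapse to one surface form.
print(normalize(["bijli", "bijlee", "bijly", "paani", "pani"]))
```

In a full pipeline, the normalized terms would replace the raw tokens before building the BoW representation, reducing its sparsity ahead of the supervised feature-pooling step.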
On Discrimination Discovery and Removal in Ranked Data using Causal Graph
Predictive models learned from historical data are widely used to help companies and organizations make decisions. However, they may treat certain groups unfairly, raising concerns about fairness and discrimination. In this paper, we study the fairness-aware ranking problem, which aims to discover discrimination in ranked datasets and reconstruct a fair ranking. Existing methods in fairness-aware ranking are mainly based on statistical parity, which cannot measure the true discriminatory effect since discrimination is causal in nature. On the other hand, existing methods in causal-based anti-discrimination learning focus on classification problems and cannot be directly applied to ranked data. To address these limitations, we propose to map the rank position to a continuous score variable that represents the qualification of the candidates. Then, we build a causal graph that consists of both the discrete profile attributes and the continuous score. The path-specific effect technique is extended to the mixed-variable causal graph to identify both direct and indirect discrimination. The relationship between the path-specific effects for the ranked data and those for the binary decision is analyzed theoretically. Finally, algorithms for discovering and removing discrimination from a ranked dataset are developed. Experiments on a real dataset show the effectiveness of our approaches.
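One concrete step the abstract mentions is mapping rank positions to a continuous qualification score so that causal-effect machinery for continuous variables can be applied. The sketch below illustrates that mapping with a simple linear transform (an assumption; the abstract does not fix a specific formula) and contrasts it with a purely associational group gap, which is what the path-specific-effect analysis on the causal graph is meant to replace.

```python
# Illustrative rank-to-score mapping; the paper's exact mapping and the
# path-specific-effect estimation on the causal graph are not reproduced here.
import numpy as np


def rank_to_score(ranks: np.ndarray, n: int) -> np.ndarray:
    """Map rank 1..n (1 = best) to a score in (0, 1), higher = better."""
    return 1.0 - (ranks - 0.5) / n


# Toy ranked dataset: rank and a protected attribute per candidate.
ranks = np.array([1, 2, 3, 4, 5, 6, 7, 8])
protected = np.array([0, 0, 1, 0, 1, 1, 0, 1])  # 1 = protected group
scores = rank_to_score(ranks, len(ranks))

# A purely associational gap in mean score between groups. The paper instead
# estimates direct and indirect path-specific causal effects on a
# mixed-variable causal graph, which this simple difference cannot capture.
gap = scores[protected == 0].mean() - scores[protected == 1].mean()
print(f"mean score gap (non-protected - protected): {gap:.3f}")
```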
AdaFair: Cumulative Fairness Adaptive Boosting
The widespread use of ML-based decision making in domains with high societal impact, such as recidivism prediction, job hiring, and loan credit, has raised many concerns regarding potential discrimination. In particular, in certain cases it has been observed that ML algorithms can produce different decisions based on sensitive attributes such as gender or race and can therefore lead to discrimination. Although several fairness-aware ML approaches have been proposed, their focus has largely been on preserving overall classification accuracy while improving fairness in predictions for both protected and non-protected groups (defined based on the sensitive attribute(s)). Overall accuracy, however, is not a good indicator of performance in the case of class imbalance, as it is biased towards the majority class. As we will see in our experiments, many of the fairness-related datasets suffer from class imbalance, and therefore tackling fairness also requires tackling the imbalance problem. To this end, we propose AdaFair, a fairness-aware classifier based on AdaBoost that further updates the weights of the instances in each boosting round, taking into account a cumulative notion of fairness based on all current ensemble members, while explicitly tackling class imbalance by optimizing the number of ensemble members for balanced classification error. Our experiments show that our approach can achieve parity in true positive and true negative rates for both protected and non-protected groups, while it significantly outperforms existing fairness-aware methods by up to 25% in terms of balanced error.
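The core mechanism described above is an AdaBoost-style loop whose instance reweighting also reacts to a cumulative fairness measure over the ensemble built so far. The following Python sketch shows one way that idea could look; the TPR-gap fairness measure, the extra boost factor, and the toy data are illustrative assumptions and not the authors' AdaFair implementation, which additionally selects the number of ensemble members that minimizes balanced error.

```python
# Sketch of cumulative-fairness-aware boosting (illustrative, not AdaFair itself).
import numpy as np
from sklearn.tree import DecisionTreeClassifier


def fit_fair_boost(X, y, sensitive, n_rounds=20, fairness_eps=0.05):
    n = len(y)
    w = np.full(n, 1.0 / n)
    learners, alphas = [], []
    ensemble_score = np.zeros(n)

    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = np.clip(np.sum(w * (pred != y)) / np.sum(w), 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)
        learners.append(stump)
        alphas.append(alpha)

        # Cumulative ensemble prediction so far (weak-learner votes in {-1, +1}).
        ensemble_score += alpha * np.where(pred == 1, 1.0, -1.0)
        ens_pred = (ensemble_score > 0).astype(int)

        # Cumulative fairness: gap in true positive rates between the groups.
        tpr = [np.mean(ens_pred[(y == 1) & (sensitive == g)] == 1) for g in (0, 1)]
        gap = tpr[0] - tpr[1]

        # Standard AdaBoost update ...
        w *= np.exp(alpha * (pred != y))
        # ... plus an extra boost for misclassified positives of the group the
        # cumulative ensemble currently under-serves (illustrative rule).
        if abs(gap) > fairness_eps:
            hurt_group = 1 if gap > 0 else 0
            boost_mask = (sensitive == hurt_group) & (y == 1) & (ens_pred == 0)
            w[boost_mask] *= 1.0 + abs(gap)
        w /= w.sum()

    return learners, alphas


# Toy usage with random data (both groups contain positive examples).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
sensitive = rng.integers(0, 2, size=200)
y = (X[:, 0] + 0.3 * sensitive + rng.normal(scale=0.5, size=200) > 0).astype(int)
learners, alphas = fit_fair_boost(X, y, sensitive)
print(f"trained {len(learners)} weak learners")
```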
Data preprocessing techniques for classification without discrimination
Techniques for Discrimination-Free Predictive Models