Text Classification in an Under-Resourced Language via Lexical Normalization and Feature Pooling
Automatic classification of textual content in an under-resourced language is challenging, since lexical resources and preprocessing tools are not available for such languages. Their bag-of-words (BoW) representation is usually highly sparse and noisy, and text classification built on such a representation yields poor performance. In this paper, we explore the effectiveness of lexical normalization of terms and statistical feature pooling for improving text classification in an under-resourced language. We focus on classifying citizen feedback on government services provided through SMS texts, which are written predominantly in Roman Urdu (an informal, forward-transliterated version of the Urdu language). Our proposed methodology normalizes lexical variations of terms using phonetic and string similarity. It subsequently employs a supervised feature extraction technique to obtain category-specific, highly discriminating features. Our experiments with classifiers reveal that lexical normalization plus feature pooling achieves a significant improvement in classification performance over standard representations.
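The abstract describes normalizing lexical variants via phonetic and string similarity before feature extraction. Below is a minimal Python sketch of that idea, not the paper's implementation: it groups terms by a crude phonetic key and collapses near-duplicates using difflib's similarity ratio. The key function, the 0.7 threshold, and the Roman Urdu example terms are illustrative assumptions.

```python
# Minimal sketch of phonetic + string-similarity normalization (illustrative,
# not the paper's method).
from difflib import SequenceMatcher


def phonetic_key(term: str) -> str:
    """Crude phonetic key: keep the first letter, drop vowels, collapse repeats."""
    if not term:
        return ""
    term = term.lower()
    chars = [term[0]] + [c for c in term[1:] if c not in "aeiou"]
    collapsed = []
    for c in chars:
        if not collapsed or collapsed[-1] != c:
            collapsed.append(c)
    return "".join(collapsed)


def normalize(terms, sim_threshold=0.7):
    """Map each term to a canonical variant that shares its phonetic key."""
    canon = {}    # phonetic key -> canonical surface form
    mapping = {}  # term -> normalized term
    for t in terms:
        key = phonetic_key(t)
        if key not in canon:
            canon[key] = t
            mapping[t] = t
        elif SequenceMatcher(None, t, canon[key]).ratio() >= sim_threshold:
            mapping[t] = canon[key]
        else:
            mapping[t] = t
    return mapping


# Example: common Roman Urdu spelling variants collapse to one surface form.
print(normalize(["bijli", "bijlee", "bijly", "paani", "pani"]))
```

In a full pipeline, the normalized terms would replace the raw tokens before building the BoW representation, reducing its sparsity ahead of the supervised feature-pooling step.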
On Discrimination Discovery and Removal in Ranked Data using Causal Graph
Predictive models learned from historical data are widely used to help companies and organizations make decisions. However, they may treat certain groups unfairly, raising concerns about fairness and discrimination. In this paper, we study the fairness-aware ranking problem, which aims to discover discrimination in ranked datasets and reconstruct a fair ranking. Existing methods in fairness-aware ranking are mainly based on statistical parity, which cannot measure the true discriminatory effect since discrimination is causal in nature. On the other hand, existing methods in causal-based anti-discrimination learning focus on classification problems and cannot be directly applied to ranked data. To address these limitations, we propose to map the rank position to a continuous score variable that represents the qualification of the candidates. Then, we build a causal graph that consists of both the discrete profile attributes and the continuous score. The path-specific effect technique is extended to the mixed-variable causal graph to identify both direct and indirect discrimination. The relationship between the path-specific effects for the ranked data and those for the binary decision is analyzed theoretically. Finally, algorithms for discovering and removing discrimination from a ranked dataset are developed. Experiments on a real dataset show the effectiveness of our approaches.
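One concrete step the abstract mentions is mapping rank positions to a continuous qualification score so that causal-effect machinery for continuous variables can be applied. The sketch below illustrates that mapping with a simple linear transform (an assumption; the abstract does not fix a specific formula) and contrasts it with a purely associational group gap, which is what the path-specific-effect analysis on the causal graph is meant to replace.

```python
# Illustrative rank-to-score mapping; the paper's exact mapping and the
# path-specific-effect estimation on the causal graph are not reproduced here.
import numpy as np


def rank_to_score(ranks: np.ndarray, n: int) -> np.ndarray:
    """Map rank 1..n (1 = best) to a score in (0, 1), higher = better."""
    return 1.0 - (ranks - 0.5) / n


# Toy ranked dataset: rank and a protected attribute per candidate.
ranks = np.array([1, 2, 3, 4, 5, 6, 7, 8])
protected = np.array([0, 0, 1, 0, 1, 1, 0, 1])  # 1 = protected group
scores = rank_to_score(ranks, len(ranks))

# A purely associational gap in mean score between groups. The paper instead
# estimates direct and indirect path-specific causal effects on a
# mixed-variable causal graph, which this simple difference cannot capture.
gap = scores[protected == 0].mean() - scores[protected == 1].mean()
print(f"mean score gap (non-protected - protected): {gap:.3f}")
```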
AdaFair: Cumulative Fairness Adaptive Boosting
The widespread use of ML-based decision making in domains with high societal impact, such as recidivism prediction, job hiring, and loan credit, has raised many concerns regarding potential discrimination. In particular, in certain cases it has been observed that ML algorithms can produce different decisions based on sensitive attributes such as gender or race and can therefore lead to discrimination. Although several fairness-aware ML approaches have been proposed, their focus has largely been on preserving overall classification accuracy while improving fairness in predictions for both protected and non-protected groups (defined based on the sensitive attribute(s)). Overall accuracy, however, is not a good indicator of performance in the case of class imbalance, as it is biased towards the majority class. As we will see in our experiments, many of the fairness-related datasets suffer from class imbalance, and therefore tackling fairness also requires tackling the imbalance problem. To this end, we propose AdaFair, a fairness-aware classifier based on AdaBoost that further updates the weights of the instances in each boosting round, taking into account a cumulative notion of fairness based on all current ensemble members, while explicitly tackling class imbalance by optimizing the number of ensemble members for balanced classification error. Our experiments show that our approach can achieve parity in true positive and true negative rates for both protected and non-protected groups, while it significantly outperforms existing fairness-aware methods by up to 25% in terms of balanced error.
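The core mechanism described above is an AdaBoost-style loop whose instance reweighting also reacts to a cumulative fairness measure over the ensemble built so far. The following Python sketch shows one way that idea could look; the TPR-gap fairness measure, the extra boost factor, and the toy data are illustrative assumptions and not the authors' AdaFair implementation, which additionally selects the number of ensemble members that minimizes balanced error.

```python
# Sketch of cumulative-fairness-aware boosting (illustrative, not AdaFair itself).
import numpy as np
from sklearn.tree import DecisionTreeClassifier


def fit_fair_boost(X, y, sensitive, n_rounds=20, fairness_eps=0.05):
    n = len(y)
    w = np.full(n, 1.0 / n)
    learners, alphas = [], []
    ensemble_score = np.zeros(n)

    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = np.clip(np.sum(w * (pred != y)) / np.sum(w), 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)
        learners.append(stump)
        alphas.append(alpha)

        # Cumulative ensemble prediction so far (weak-learner votes in {-1, +1}).
        ensemble_score += alpha * np.where(pred == 1, 1.0, -1.0)
        ens_pred = (ensemble_score > 0).astype(int)

        # Cumulative fairness: gap in true positive rates between the groups.
        tpr = [np.mean(ens_pred[(y == 1) & (sensitive == g)] == 1) for g in (0, 1)]
        gap = tpr[0] - tpr[1]

        # Standard AdaBoost update ...
        w *= np.exp(alpha * (pred != y))
        # ... plus an extra boost for misclassified positives of the group the
        # cumulative ensemble currently under-serves (illustrative rule).
        if abs(gap) > fairness_eps:
            hurt_group = 1 if gap > 0 else 0
            boost_mask = (sensitive == hurt_group) & (y == 1) & (ens_pred == 0)
            w[boost_mask] *= 1.0 + abs(gap)
        w /= w.sum()

    return learners, alphas


# Toy usage with random data (both groups contain positive examples).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
sensitive = rng.integers(0, 2, size=200)
y = (X[:, 0] + 0.3 * sensitive + rng.normal(scale=0.5, size=200) > 0).astype(int)
learners, alphas = fit_fair_boost(X, y, sensitive)
print(f"trained {len(learners)} weak learners")
```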
Data preprocessing techniques for classification without discrimination
Techniques for Discrimination-Free Predictive Models