Explaining classification performance and bias via network structure and sampling technique
Social networks are important carriers of information. For instance, the political leaning of our friends can serve as a proxy for our own political preferences. This explanatory power is leveraged in many scenarios, ranging from business decision-making to scientific research, to infer missing attributes using machine learning. However, the factors affecting the performance and the direction of bias of these algorithms are not well understood. To this end, we systematically study how structural properties of the network and of the training sample influence the results of collective classification. Our main findings show that (i) mean classification performance can be predicted, both empirically and analytically, from structural properties such as homophily, class balance, edge density, and sample size; (ii) small training samples are enough for heterophilic networks to achieve high and unbiased classification performance, even with imperfect model estimates; (iii) homophilic networks are more prone to bias and low performance as group-size differences increase; and (iv) when sampling budgets are small, partial crawls achieve the most accurate model estimates, while degree sampling achieves the highest overall performance. Our findings help practitioners better understand and evaluate their results when sampling budgets are small or when no ground truth is available.
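The collective classification described above can be illustrated with a minimal sketch: starting from a small training sample of observed labels, each unlabeled node adopts the majority label of its neighbors, iterated until stable. This is a generic neighbor-vote baseline, not the authors' exact method; the graph, labels, and function name below are hypothetical.

```python
# Minimal sketch of collective classification on a homophilic toy
# network: unlabeled nodes take the majority label of their neighbors.
from collections import Counter

def collective_classify(adj, seed_labels, iterations=5):
    """Infer missing node labels from a small training sample.

    adj:         dict mapping node -> list of neighbors
    seed_labels: dict mapping node -> observed label (training sample)
    """
    labels = dict(seed_labels)
    for _ in range(iterations):
        updated = {}
        for node in adj:
            if node in seed_labels:
                continue  # observed labels stay fixed
            votes = Counter(labels[n] for n in adj[node] if n in labels)
            if votes:
                updated[node] = votes.most_common(1)[0][0]
        labels.update(updated)
    return labels

# Hypothetical homophilic network: two clusters joined by one bridge (c-d),
# with one labeled seed node per cluster.
adj = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"],
    "d": ["c", "e", "f"], "e": ["d", "f"], "f": ["d", "e"],
}
labels = collective_classify(adj, {"a": "blue", "e": "red"})
# Labels propagate within each cluster: b and c become "blue",
# d and f become "red".
```

With high homophily, as in this toy graph, even two seed labels suffice to recover the cluster structure, which mirrors the paper's observation that sample-size requirements depend on structural properties of the network.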
Towards Assumption-free Bias Mitigation
Despite their impressive predictive ability, machine learning models can discriminate against certain demographics and exhibit unfair prediction behavior. To alleviate this discrimination, extensive studies focus on eliminating the unequal distribution of sensitive attributes via multiple approaches. However, due to privacy concerns, sensitive attributes are often unavailable or missing in real-world scenarios, so several existing works attempt to alleviate bias without them. These studies face challenges: either inaccurate prediction of the sensitive attributes, or the need to mitigate the unequal distribution of manually defined non-sensitive attributes related to bias. The latter requires strong assumptions about the correlation between sensitive and non-sensitive attributes. Because data distributions and task goals vary, such assumptions may not hold and may require domain expertise. In this work, we propose an assumption-free framework that automatically detects the related attributes by modeling feature interactions for bias mitigation. The proposed framework aims to mitigate the unfair impact of the identified biased feature interactions. Experimental results on four real-world datasets demonstrate that the proposed framework can significantly alleviate unfair prediction behavior by accounting for biased feature interactions.
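The "unfair prediction behaviors" this abstract targets are typically quantified with group fairness metrics. As a hedged illustration (a standard metric, not this paper's specific method), the sketch below computes the demographic parity gap, the difference in positive-prediction rates between two groups; the function name and data are hypothetical.

```python
# Illustrative fairness metric: demographic parity gap, i.e. the
# absolute difference in positive-prediction rates between two groups.
def demographic_parity_gap(preds, groups):
    """preds: 0/1 predictions; groups: group membership per example."""
    rates = {}
    for g in set(groups):
        idx = [i for i, gg in enumerate(groups) if gg == g]
        rates[g] = sum(preds[i] for i in idx) / len(idx)
    a, b = rates.values()  # assumes exactly two groups
    return abs(a - b)

# Hypothetical predictions for members of two groups "x" and "y".
preds  = [1, 1, 0, 1, 0, 0, 1, 0]
groups = ["x", "x", "x", "x", "y", "y", "y", "y"]
gap = demographic_parity_gap(preds, groups)
# Group "x" positive rate is 3/4, group "y" is 1/4, so the gap is 0.5.
```

A bias-mitigation framework like the one described would aim to drive such a gap toward zero while preserving accuracy; crucially, the method above evaluates bias only after the fact, whereas the abstract's contribution is identifying which feature interactions produce it.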
- …