Learning Fair Naive Bayes Classifiers by Discovering and Eliminating Discrimination Patterns
As machine learning is increasingly used to make real-world decisions, recent
research efforts aim to define and ensure fairness in algorithmic decision
making. Existing methods often assume a fixed set of observable features to
define individuals, but do not address the case in which some of these
features are unobserved at test time. In this paper, we study fairness of naive Bayes
classifiers, which allow partial observations. In particular, we introduce the
notion of a discrimination pattern, which refers to an individual receiving
different classifications depending on whether some sensitive attributes were
observed. Then a model is considered fair if it has no such pattern. We propose
an algorithm to discover and mine for discrimination patterns in a naive Bayes
classifier, and show how to learn maximum likelihood parameters subject to
these fairness constraints. Our approach iteratively discovers and eliminates
discrimination patterns until a fair model is learned. An empirical evaluation
on three real-world datasets demonstrates that we can remove exponentially many
discrimination patterns by adding only a small fraction of them as constraints.
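As a concrete illustration, the discrimination-pattern check on a naive Bayes classifier can be sketched as follows. All attributes and probability values below are invented for illustration, and the 0.05 threshold is an arbitrary choice rather than the paper's; the sketch only shows the core comparison between the posterior with and without the sensitive attribute observed.

```python
# Toy naive Bayes with one sensitive attribute S and one feature X.
# All parameters are made-up illustrative numbers, not from the paper.
prior = {1: 0.6, 0: 0.4}                            # P(C)
p_s = {1: {1: 0.7, 0: 0.3}, 0: {1: 0.2, 0: 0.8}}    # P(S=s | C=c)
p_x = {1: {1: 0.9, 0: 0.1}, 0: {1: 0.4, 0: 0.6}}    # P(X=x | C=c)

def posterior(x, s=None):
    """P(C=1 | X=x [, S=s]); S is marginalized out when unobserved,
    which for naive Bayes simply drops its factor (it sums to 1)."""
    scores = {}
    for c in (0, 1):
        like = p_x[c][x] * (p_s[c][s] if s is not None else 1.0)
        scores[c] = prior[c] * like
    return scores[1] / (scores[0] + scores[1])

def discrimination_degree(x, s, threshold=0.05):
    """(x, s) forms a discrimination pattern when observing S shifts
    the posterior for C by more than the threshold."""
    delta = abs(posterior(x, s) - posterior(x))
    return delta, delta > threshold
```

Here observing `S = 1` alongside `X = 1` pushes the posterior from roughly 0.77 to roughly 0.92, so the pair would be flagged as a discrimination pattern at this threshold.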
How are you doing? : emotions and personality in Facebook
User generated content on social media sites is a rich source of information about latent variables of their users. Proper mining of this content provides a shortcut to emotion and personality detection without requiring users to fill out questionnaires. This in turn increases the application potential of personalized services that rely on the knowledge of such latent variables. In this paper we contribute to this emerging domain by studying the relation between emotions expressed in approximately 1 million Facebook (FB) status updates and the users' age, gender and personality. Additionally, we investigate the relations between emotion expression and the time when the status updates were posted. In particular, we find that female users are more emotional in their status posts than male users. In addition, we find a relation between age and sharing of emotions: older FB users share their feelings more often than younger users. In terms of seasons, people post about emotions less frequently in summer. On the other hand, December is a time when people are more likely to share their positive feelings with their friends. We also examine the relation between users' personality and their posts. We find that users who have an open personality express their emotions more frequently, while neurotic users are more reserved about sharing their feelings.
VirtualIdentity : privacy preserving user profiling
User profiling from user generated content (UGC) is a common practice that supports the business models of many social media companies. Existing systems require that the UGC is fully exposed to the module that constructs the user profiles. In this paper we show that it is possible to build user profiles without ever accessing the user's original data, and without exposing the trained machine learning models for user profiling (which are the intellectual property of the company) to the users of the social media site. We present VirtualIdentity, an application that uses secure multi-party cryptographic protocols to detect the age, gender and personality traits of users by classifying their user-generated text and personal pictures with trained support vector machine models in a privacy-preserving manner.
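The idea of computing on data a server never sees in the clear can be illustrated with additive secret sharing, a standard building block of secure multi-party computation. This is a toy sketch under simplifying assumptions, not the paper's actual protocol: it hides only the user's integer feature vector (not the model weights), and it assumes two non-colluding parties that each receive one random-looking share.

```python
import random

MOD = 2 ** 32  # work in a fixed modulus so each share alone looks random

def share(x):
    """Split each integer feature into two additive shares mod MOD.
    Neither share reveals anything about x on its own."""
    r = [random.randrange(MOD) for _ in x]
    return r, [(xi - ri) % MOD for xi, ri in zip(x, r)]

def partial_score(share_vec, w):
    """Each party computes the dot product of the weights with its
    share only, never seeing the user's raw features."""
    return sum(wi * si for wi, si in zip(w, share_vec)) % MOD

def reconstruct(p1, p2):
    """Adding the two partial results recovers w . x (mod MOD)."""
    return (p1 + p2) % MOD
```

Because the dot product is linear, the two partial results sum to the true linear score, which is the kind of quantity a linear SVM needs for classification.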
Scalable adaptive label propagation in Grappa
Nodes of a social graph often represent entities with specific labels, denoting properties such as age-group or gender. The design of algorithms that assign labels to unlabeled nodes by leveraging node proximity and the a priori labels of seed nodes is of significant interest. A semi-supervised approach to this problem is the ``Label Propagation Algorithm (LPA)'', in which the labels of a subset of nodes are iteratively propagated through the network to infer yet unknown node labels. While LPA for node labelling is extremely fast and simple, it works well only under an assumption of node homophily -- connected nodes tend to share the same label -- which often does not hold. In this paper we propose a novel algorithm, ``Adaptive Label Propagation'', that dynamically adapts to the underlying characteristics of homophily, heterophily, or otherwise, of the connections of the network, and applies suitable label propagation strategies accordingly. Moreover, our adaptive label propagation approach is scalable, as demonstrated by its implementation in Grappa, a distributed shared-memory system. Our experiments on social graphs from Facebook, YouTube, LiveJournal, Orkut and Netlog demonstrate that our approach not only improves labelling accuracy but also computes results for millions of users within a few seconds.
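A minimal single-machine sketch of the adaptive idea (not Grappa's distributed implementation) might look like the following. The 0.5 homophily cutoff, the binary 0/1 labels, and the fixed iteration count are simplifying assumptions for illustration: homophily is estimated from edges between seed nodes, and when it is low the propagation flips neighbor labels instead of copying them.

```python
from collections import Counter

def edge_homophily(graph, labels):
    """Fraction of edges between labeled (seed) nodes that connect
    same-label endpoints; defaults to 1.0 when no such edge exists."""
    same = total = 0
    for u, nbrs in graph.items():
        for v in nbrs:
            if u < v and u in labels and v in labels:
                total += 1
                same += labels[u] == labels[v]
    return same / total if total else 1.0

def propagate(graph, seeds, iters=10):
    labels = dict(seeds)
    homophilous = edge_homophily(graph, seeds) >= 0.5
    for _ in range(iters):
        new = dict(labels)
        for u in graph:
            if u in seeds:
                continue  # seed labels stay fixed
            votes = Counter()
            for v in graph[u]:
                if v in labels:
                    lab = labels[v]
                    if not homophilous:
                        lab = 1 - lab  # heterophily: expect the opposite label
                    votes[lab] += 1
            if votes:  # ties break by first-seen neighbor label
                new[u] = votes.most_common(1)[0][0]
        labels = new
    return labels
```

On a toy graph made of two triangles joined by one edge, seeding one node in each triangle is enough for both communities to settle on their seed's label within a couple of iterations.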
Tidying Up the Conversational Recommender Systems' Biases
The growing popularity of language models has sparked interest in
conversational recommender systems (CRS) within both industry and research
circles. However, concerns regarding biases in these systems have emerged.
While individual components of CRS have been subject to bias studies, a
literature gap remains in understanding specific biases unique to CRS and how
these biases may be amplified or reduced when integrated into complex CRS
models. In this paper, we provide a concise review of biases in CRS by
surveying recent literature. We examine the presence of biases throughout the
system's pipeline and consider the challenges that arise from combining
multiple models. Our study investigates biases in classic recommender systems
and their relevance to CRS. Moreover, we address specific biases in CRS,
considering variations with and without natural language understanding
capabilities, along with biases related to dialogue systems and language
models. Through our findings, we highlight the necessity of adopting a holistic
perspective when dealing with biases in complex CRS models.
Advancing Cultural Inclusivity: Optimizing Embedding Spaces for Balanced Music Recommendations
Popularity bias in music recommendation systems -- where artists and tracks
with the highest listen counts are recommended more often -- can also propagate
biases along demographic and cultural axes. In this work, we identify these
biases in recommendations for artists from underrepresented cultural groups in
prototype-based matrix factorization methods. Unlike traditional matrix
factorization methods, prototype-based approaches are interpretable. This
allows us to directly link the observed bias in recommendations for minority
artists (the effect) to specific properties of the embedding space (the cause).
We mitigate popularity bias in music recommendation by capturing both
users' and songs' cultural nuances in the embedding space. To address these
challenges while maintaining recommendation quality, we propose two novel
enhancements to the embedding space: i) an approach that filters out the
irrelevant prototypes used to represent each user and item, improving
generalizability, and ii) regularization techniques that reinforce a
more uniform distribution of prototypes within the embedding space. Our results
demonstrate significant improvements in reducing popularity bias and enhancing
demographic and cultural fairness in music recommendations while achieving
competitive -- if not better -- overall performance.
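The two enhancements can be sketched as follows. This is an illustrative reading, not the paper's exact formulation: the 0.05 relevance threshold is invented, and the uniformity term is one common choice of spread-encouraging penalty (log of the mean Gaussian-kernel similarity between prototypes), which may differ from the authors' regularizer.

```python
import numpy as np

def filter_prototypes(weights, threshold=0.05):
    """Zero-out near-irrelevant prototype weights for a user or item
    and renormalize, so each entity is explained by few prototypes.
    Assumes at least one weight survives the threshold."""
    w = np.where(weights >= threshold, weights, 0.0)
    return w / w.sum()

def uniformity_penalty(prototypes, t=2.0):
    """Log of the mean Gaussian-kernel similarity between distinct
    prototypes; minimizing it pushes prototypes apart, encouraging a
    more uniform spread in the embedding space."""
    diffs = prototypes[:, None, :] - prototypes[None, :, :]
    sq = (diffs ** 2).sum(-1)
    mask = ~np.eye(len(prototypes), dtype=bool)  # drop self-similarities
    return np.log(np.exp(-t * sq[mask]).mean())
```

Spread-out prototypes yield a lower (more negative) penalty than tightly clustered ones, so adding this term to the training loss pushes the prototype set toward uniform coverage of the space.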
Position: Cracking the Code of Cascading Disparity Towards Marginalized Communities
The rise of foundation models holds immense promise for advancing AI, but
this progress may amplify existing risks and inequalities, leaving marginalized
communities behind. In this position paper, we argue that disparities towards
marginalized communities - performance, representation, privacy, robustness,
interpretability and safety - are not isolated concerns but rather
interconnected elements of a cascading disparity phenomenon. We contrast
foundation models with traditional models and highlight the potential for
exacerbated disparity against marginalized communities. Moreover, we emphasize
the unique threat of cascading impacts in foundation models, where
interconnected disparities can trigger long-lasting negative consequences,
specifically to the people on the margin. We define marginalized communities
within the machine learning context and explore the multifaceted nature of
disparities. We analyze the sources of these disparities, tracing them from
data creation, training and deployment procedures to highlight the complex
technical and socio-technical landscape. To mitigate the pressing crisis, we
conclude with a set of calls to action to mitigate disparity at its source.
Comment: 14 pages, 1 figure
Promoting Fair Vaccination Strategies Through Influence Maximization: A Case Study on COVID-19 Spread
The aftermath of the COVID-19 pandemic saw more severe outcomes for racial
minority groups and economically deprived communities. Such disparities can be
explained by several factors, including unequal access to healthcare, as well
as the inability of low income groups to reduce their mobility due to work or
social obligations. Moreover, senior citizens were found to be more susceptible
to severe symptoms, largely due to age-related health reasons. Adapting vaccine
distribution strategies to consider a range of demographics is therefore
essential to address these disparities. In this study, we propose a novel
approach that utilizes influence maximization (IM) on mobility networks to
develop vaccination strategies which incorporate demographic fairness. By
considering factors such as race, social status, age, and associated risk
factors, we aim to optimize vaccine distribution to achieve various fairness
definitions for one or more protected attributes at a time. Through extensive
experiments conducted on COVID-19 spread in three major metropolitan areas
across the United States, we demonstrate the effectiveness of our proposed
approach in reducing disease transmission and promoting fairness in vaccination
distribution.
Comment: 10 pages, 4 figures
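A toy version of fairness-aware influence maximization can be sketched as a greedy seed selection under per-group budgets. This is an illustrative stand-in, not the authors' method: coverage here is just one-hop reach on a contact network, and the per-group quota is only one of the possible fairness definitions the paper considers.

```python
def coverage(graph, seeds):
    """Nodes covered = the vaccinated seeds plus their direct contacts."""
    covered = set(seeds)
    for s in seeds:
        covered.update(graph[s])
    return covered

def fair_greedy(graph, group, quotas):
    """Pick seeds greedily by marginal coverage, never exceeding the
    vaccine budget allotted to each demographic group."""
    seeds, remaining = [], dict(quotas)
    budget = sum(quotas.values())
    while len(seeds) < budget:
        best, best_gain = None, -1
        base = coverage(graph, seeds)
        for v in graph:
            if v in seeds or remaining[group[v]] == 0:
                continue  # already chosen, or its group's budget is spent
            gain = len(coverage(graph, seeds + [v]) - base)
            if gain > best_gain:
                best, best_gain = v, gain
        if best is None:
            break  # all remaining groups exhausted
        seeds.append(best)
        remaining[group[best]] -= 1
    return seeds
```

Without the quotas, plain greedy would spend the whole budget on the best-connected group; the quota forces at least one seed into each protected group even when its members reach fewer contacts.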
