4 research outputs found
Partial Bandit and Semi-Bandit: Making the Most Out of Scarce Users' Feedback
Recent works on Multi-Armed Bandits (MAB) and Combinatorial Multi-Armed
Bandits (COM-MAB) show good results on a global accuracy metric. In the case of
recommender systems, this can be achieved through personalization. However,
with a combinatorial online learning approach, personalization requires a large
amount of user feedback. Such feedback can be hard to acquire when users must
be solicited directly and frequently. For many fields of activity undergoing
the digitization of their business, online learning is unavoidable, so a number
of approaches allowing implicit user feedback retrieval have been implemented.
Nevertheless, this implicit feedback can be misleading or inefficient for the
agent's learning. Herein, we propose a novel approach that reduces the amount
of explicit feedback required by COM-MAB algorithms while providing levels of
global accuracy and learning efficiency similar to those of classical
competitive methods. We evaluate this approach using three distinct
feedback-handling strategies. Despite the limited amount of feedback returned
by users (as low as 20% of the total), our approach obtains results similar to
those of state-of-the-art approaches.
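The mechanism described above — a combinatorial bandit that keeps learning even though explicit feedback arrives only occasionally — can be illustrated with a small sketch. This is not the authors' algorithm: it is a plain combinatorial UCB agent, and the class name, the 20% response rate, and the click probabilities are all illustrative assumptions.

```python
import math
import random

class SparseFeedbackCUCB:
    """Combinatorial UCB sketch: recommend k of n items each round, but
    update the estimates only on rounds where explicit feedback arrives."""

    def __init__(self, n_arms, k):
        self.n_arms, self.k = n_arms, k
        self.counts = [0] * n_arms
        self.means = [0.0] * n_arms
        self.t = 0

    def select(self):
        self.t += 1

        def ucb(a):
            if self.counts[a] == 0:
                return float("inf")  # force one pull of every untried arm
            return self.means[a] + math.sqrt(2 * math.log(self.t) / self.counts[a])

        return sorted(range(self.n_arms), key=ucb, reverse=True)[: self.k]

    def update(self, arms, rewards):
        # Called only on the (scarce) rounds where the user responded.
        for a, r in zip(arms, rewards):
            self.counts[a] += 1
            self.means[a] += (r - self.means[a]) / self.counts[a]


# Toy run: Bernoulli click probabilities per item; the user returns explicit
# feedback on only ~20% of rounds, matching the rate quoted in the abstract.
random.seed(0)
click_prob = [0.9, 0.8, 0.3, 0.2, 0.1]
agent = SparseFeedbackCUCB(n_arms=5, k=2)
for _ in range(2000):
    chosen = agent.select()
    if random.random() < 0.2:  # the user answers roughly one round in five
        agent.update(chosen, [int(random.random() < click_prob[a]) for a in chosen])
```

The design point the abstract makes is visible here: the agent still converges toward the high-probability items, just more slowly, because each unanswered round simply leaves the estimates untouched.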
Ballooning Multi-Armed Bandits
In this paper, we introduce Ballooning Multi-Armed Bandits (BL-MAB), a novel
extension of the classical stochastic MAB model. In the BL-MAB model, the set
of available arms grows (or balloons) over time. In contrast to the classical
MAB setting where the regret is computed with respect to the best arm overall,
the regret in a BL-MAB setting is computed with respect to the best available
arm at each time. We first observe that the existing stochastic MAB algorithms
result in linear regret for the BL-MAB model. We prove that, if the best arm is
equally likely to arrive at any time instant, a sub-linear regret cannot be
achieved. Next, we show that if the best arm is more likely to arrive in the
early rounds, one can achieve sub-linear regret. Our proposed algorithm
determines (1) the fraction of the time horizon for which the newly arriving
arms should be explored and (2) the sequence of arm pulls in the exploitation
phase from among the explored arms. Making reasonable assumptions on the
arrival distribution of the best arm in terms of the thinness of the
distribution's tail, we prove that the proposed algorithm achieves sub-linear
instance-independent regret. We further quantify explicit dependence of regret
on the arrival distribution parameters. We reinforce our theoretical findings
with extensive simulation results. We conclude by showing that our algorithm
would achieve sub-linear regret even if (a) the distributional parameters are
not exactly known, but are obtained using a reasonable learning mechanism or
(b) the best arm is not more likely to arrive early, but a large fraction of
arms is likely to arrive relatively early.
Comment: A full version of this paper is accepted in the Artificial
Intelligence Journal (AIJ, Elsevier). A preliminary version was published as an
extended abstract in AAMAS 2020 (Proceedings of the 19th International
Conference on Autonomous Agents and MultiAgent Systems, 2020).
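The two-part structure the abstract describes — (1) a fraction of the horizon during which newly arriving arms are explored, (2) exploitation among the explored arms afterwards — can be sketched as a simple explore-then-exploit loop. This is illustrative only: the paper derives the exploration fraction from the tail of the best arm's arrival distribution, whereas here `explore_frac`, the arrival schedule, and the arm means are made-up inputs.

```python
import random

def ballooning_explore_exploit(horizon, explore_frac, arrivals, seed=0):
    """Explore-then-exploit sketch of the BL-MAB setting (illustrative):
    arms that arrive during the first explore_frac fraction of the horizon
    are explored round-robin; afterwards the empirically best explored arm
    is pulled. Arms arriving after the cutoff are never explored.

    arrivals: list of (arrival_round, true_mean) pairs, one per arm.
    Returns the total reward collected over the horizon.
    """
    rng = random.Random(seed)
    cutoff = int(explore_frac * horizon)   # rounds devoted to exploration
    counts = [0] * len(arrivals)
    sums = [0.0] * len(arrivals)
    explored = []                          # arms that arrived before the cutoff
    total = 0.0
    for t in range(horizon):
        for i, (arrival, _) in enumerate(arrivals):
            if arrival == t and t < cutoff:
                explored.append(i)
        if not explored:
            continue                       # no arm available yet
        if t < cutoff:
            arm = min(explored, key=lambda i: counts[i])  # spread pulls evenly
        else:
            arm = max(explored, key=lambda i: sums[i] / max(counts[i], 1))
        reward = 1.0 if rng.random() < arrivals[arm][1] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total += reward
    return total

# The best early arm (mean 0.9) arrives in round 10, gets explored, and is
# then exploited; the 0.95 arm arrives after the cutoff and is never tried —
# which is why the arrival distribution of the best arm matters for regret.
arms = [(0, 0.5), (10, 0.9), (800, 0.95)]
total = ballooning_explore_exploit(horizon=1000, explore_frac=0.3, arrivals=arms)
```

The late-arriving 0.95 arm makes the paper's negative result concrete: if the best arm can arrive at any time, no exploration cutoff catches it, and sub-linear regret against the best available arm becomes unachievable.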
Human-Centered Machine Learning: Algorithm Design and Human Behavior
Machine learning is increasingly engaged in a large number of important daily decisions and has great potential to reshape various sectors of our modern society. To fully realize this potential, it is important to understand the role that humans play in the design of machine learning algorithms and investigate the impacts of the algorithm on humans.
Towards understanding such interactions between humans and algorithms, this dissertation takes a human-centric perspective and focuses on investigating the interplay between human behavior and algorithm design. Accounting for the roles of humans in algorithm design creates unique challenges. For example, humans might be strategic or exhibit behavioral biases when generating data or responding to algorithms, violating the standard independence assumption in algorithm design. How do we design algorithms that take such human behavior into account? Moreover, humans possess various ethical values, e.g., humans want to be treated fairly and care about privacy. How do we design algorithms that align with human values?
My dissertation addresses these challenges by combining theoretical and empirical approaches. From the theoretical perspective, we explore how to design algorithms that account for human behavior and respect human values. In particular, we formulate models of human behavior in the data generation process and design algorithms that can leverage data with human biases. Moreover, we investigate the long-term impacts of algorithmic decisions and design algorithms that mitigate the reinforcement of existing inequalities.
From the empirical perspective, we have conducted behavioral experiments to understand human behavior in the context of data generation and information design. We have further developed more realistic human models based on empirical data and studied algorithm design building on the updated behavior models.
Exploring Diversity and Fairness in Machine Learning
With algorithms, artificial intelligence, and machine learning becoming ubiquitous in our society, we need to start thinking about the implications and ethical concerns of new machine learning models. In fact, two types of biases that impact machine learning models are social injustice bias (bias created by society) and measurement bias (bias created by unbalanced sampling). Biases against groups of individuals found in machine learning models can be mitigated through the use of diversity and fairness constraints. This dissertation introduces models to help humans make decisions by enforcing diversity and fairness constraints.
This work starts with a call to action. Bias is rife in hiring, and since algorithms are used by many companies to filter applicants, we need to pay special attention to this application. Inspired by this hiring application, I introduce new multi-armed bandit frameworks to help assign human resources in the hiring process while enforcing diversity through a submodular utility function. These frameworks increase diversity while using fewer resources than the original admission decisions of the Computer Science graduate program at the University of Maryland. Moving outside of hiring, I present a contextual multi-armed bandit algorithm that enforces group fairness by learning a societal bias term and correcting for it. This algorithm is tested on two real-world datasets and shows marked improvement over other in-use algorithms. Additionally, I examine fairness in traditional machine learning domain adaptation; I provide the first theoretical analysis of this setting and test the resulting model on two real-world datasets. Finally, I explore extensions to my core work, delving into suicidality, comprehension of fairness definitions, and student evaluations.
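The diversity mechanism mentioned above — selection driven by a submodular utility — is commonly implemented as a greedy loop, since greedy maximization of a monotone submodular function carries the classic (1 − 1/e) approximation guarantee. The concrete utility below (a concave per-group term multiplying a quality score) is a stand-in assumption, not the dissertation's actual function, and the candidate pool is invented for illustration.

```python
import math

def diverse_greedy(candidates, k):
    """Greedy maximization of a submodular utility: each candidate is a
    (quality, group) pair, and the marginal gain of a candidate shrinks as
    more members of its group are selected (sqrt gives diminishing returns),
    so the greedy selection spreads across groups."""
    selected, per_group = [], {}
    for _ in range(k):
        def gain(c):
            q, g = c
            n = per_group.get(g, 0)
            # Concave group term: the (n+1)-th pick from a group adds less.
            return q * (math.sqrt(n + 1) - math.sqrt(n))
        best = max((c for c in candidates if c not in selected), key=gain)
        selected.append(best)
        per_group[best[1]] = per_group.get(best[1], 0) + 1
    return selected

# Three strong group-A candidates versus two decent group-B candidates:
# the diminishing-returns term pulls a group-B candidate into the top two.
pool = [(0.9, "A"), (0.85, "A"), (0.8, "A"), (0.7, "B"), (0.6, "B")]
picked = diverse_greedy(pool, k=2)  # → [(0.9, 'A'), (0.7, 'B')]
```

A purely quality-ranked top-2 would take the two best group-A candidates; the submodular term trades a little quality (0.85 → 0.7) for group coverage, which is the diversity behavior the abstract describes.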