Diverse Conventions for Human-AI Collaboration
Conventions are crucial for strong performance in cooperative multi-agent
games, because they allow players to coordinate on a shared strategy without
explicit communication. Unfortunately, standard multi-agent reinforcement
learning techniques, such as self-play, converge to conventions that are
arbitrary and non-diverse, leading to poor generalization when interacting with
new partners. In this work, we present a technique for generating diverse
conventions by (1) maximizing their rewards during self-play, while (2)
minimizing their rewards when playing with previously discovered conventions
(cross-play), stimulating conventions to be semantically different. To ensure
that learned policies act in good faith despite the adversarial optimization of
cross-play, we introduce \emph{mixed-play}, where an initial state is randomly
generated by sampling self-play and cross-play transitions and the player
learns to maximize the self-play reward from this initial state. We analyze the
benefits of our technique on various multi-agent collaborative games, including
Overcooked, and find that our technique can adapt to the conventions of humans,
surpassing human-level performance when paired with real users.Comment: 25 pages, 9 figures, 37th Conference on Neural Information Processing
Systems (NeurIPS 2023
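To make the self-play/cross-play trade-off above concrete, here is a minimal, self-contained Python sketch of that kind of objective: the self-play return is maximized while the cross-play return against previously discovered conventions is penalized. The one-shot coordination game, the expected_return helper, and the xp_weight parameter are illustrative assumptions, and the mixed-play component is omitted; this is not the authors' implementation.

import numpy as np

def expected_return(pi_a, pi_b):
    # Expected reward in a one-shot coordination game: both players score 1
    # only when they choose the same action.
    return float(np.dot(pi_a, pi_b))

def diversity_objective(policy, prior_conventions, xp_weight=1.0):
    # Maximize self-play return while penalizing cross-play return against
    # previously discovered conventions, pushing the new convention to differ.
    self_play = expected_return(policy, policy)
    cross_play = (np.mean([expected_return(policy, q) for q in prior_conventions])
                  if prior_conventions else 0.0)
    return self_play - xp_weight * cross_play

# With the convention "always pick action 0" already found, committing to
# action 1 scores higher under the diversity objective than copying it.
prior = [np.array([1.0, 0.0])]
print(diversity_objective(np.array([0.0, 1.0]), prior))  # 1.0
print(diversity_objective(np.array([1.0, 0.0]), prior))  # 0.0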
Optimization of Item Selection with Prediction Uncertainty
Selecting items from a candidate pool to maximize the total return is a classical problem, faced frequently by people in everyday life and by engineers in the information technology industry, e.g., digital advertising, e-commerce, and web search. For example, web UI designers try to find the best design among many candidates to display to users, and Google needs to select personalized, engaging ads based on users' historical online behavior. Each of these industries represents a market of hundreds of billions of dollars, so even a small improvement in item selection efficiency can drive hundreds of millions of dollars of growth in the real world. In these applications, the true value of each item is unknown and can only be estimated from observed historical data. There is a large volume of research on building prediction models trained on historical data to estimate item values. Depending on data volume and computational resource restrictions, engineers choose different models, e.g., deep neural networks, gradient boosted trees, or logistic regression. We do not dive deeply into that area in this dissertation. Instead, our focus is how to maximize the total return given these predictions, especially by taking the prediction uncertainties into account during value optimization.
In large-scale real applications, the candidate pool can be extraordinarily large. It is infeasible to pick items from the pool and gather interactive feedback for exploration. In fact, not only is exploration infeasible, but even estimating the value of each item with a complex model is nearly impossible given the need for real-time responses. For example, Apple needs to estimate users' favorite apps and recommend them when users visit the App Store, and Google needs to select ads to display given a user's search query. Millions of candidates must be scored by prediction models, and it is very challenging to support model prediction at such scale under low-latency constraints. Moreover, to achieve good prediction accuracy, the models used in industry keep growing more complex, e.g., the numbers of hidden neurons and layers in deep neural networks increase rapidly in real applications, which also increases latency significantly. All of this makes it infeasible to evaluate every candidate with a single complex model in large-scale applications. To solve this problem, engineers usually adopt a cascading waterfall filtering method that filters items sequentially: instead of one complex model estimating the values of all candidates, multiple stages filter candidates one after another. For example, a simple model in the first stage estimates candidate values to choose a small subset of all candidates; the selected items are then passed to a later stage and scored by a more complex model, as sketched below. Intuitively, cascading waterfall filtering provides a good trade-off between infrastructure cost and prediction accuracy: it can save computational resources substantially while still selecting the most promising items accurately. However, there is no systematic study of how to efficiently choose the number of waterfall stages and how many items to keep at each stage.
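The following is a minimal sketch of the cascading waterfall filtering idea just described, not the dissertation's system: a cheap scoring function prunes a large pool before a more expensive one re-ranks the survivors. The stage models, budgets, and toy scoring functions are made-up placeholders.

def waterfall_select(candidates, stages):
    # stages: list of (score_fn, keep_k) pairs, ordered cheap -> expensive.
    pool = list(candidates)
    for score_fn, keep_k in stages:
        pool.sort(key=score_fn, reverse=True)   # rank by this stage's model
        pool = pool[:keep_k]                    # pass only the top-k onward
    return pool

# Toy scoring functions standing in for a simple and a complex value model.
cheap_model = lambda x: x % 7
complex_model = lambda x: -(x - 50) ** 2
print(waterfall_select(range(10_000), [(cheap_model, 500), (complex_model, 10)]))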
Engineers tune the settings of such systems heuristically, through personal experience or online experiments, which is very inefficient, especially when the system is dynamic and changes rapidly. In this dissertation, we propose a theoretical framework for the cascading waterfall filtering problem and develop a mathematical algorithm that obtains the optimal solutions. Our method achieves a dramatic improvement in an important real-world application that uses a cascading waterfall filtering system to select a few items from tens of millions of candidates.
There are also cases in which the candidate pool is relatively small. For instance, the number of web UI candidates is usually less than one hundred. In that setting we are able to explore during the item selection process. A typical form of exploration is online experimentation, widely used to test and select items in real applications, where interactive feedback is available to evaluate items. In an online experiment, we usually segment users randomly into several groups, show each group a different candidate, and then compare the candidates' overall performance to find the item with the largest value. Among all designs, A/B testing, which segments users into two statistically equivalent groups to measure the difference between two versions of a single variable, is the most popular. For instance, to compare the impact of one ad versus another, we need to see the effect of exposing a user to the first ad and not the second, and then compare with the converse situation. However, a user cannot both see the first ad and not see it. Consequently, we need to create two "statistically equivalent populations" and expose users randomly to one or the other. This method is straightforward, but its defect is also obvious: to measure both versions, it cannot expose all users to the best version, which leads to potential value loss. Multi-armed bandit algorithms, e.g., Randomized Probability Matching (RPM) and Upper Confidence Bounds (UCB), whose objective is to maximize the total return during the experiment, have been proposed as improvements. However, these methods do not take into account the statistical confidence level of the experiment's final result and its impact on the subsequent item selection in the post-experimental stage. To solve this problem, we develop algorithms that strike a good trade-off between reducing statistical uncertainty and maximizing cumulative reward, with the goal of maximizing the total expected reward of item selection over the entire duration, including both the experimental stage and the post-experimental stage. The proposed algorithms demonstrate consistent and statistically significant improvements across different settings, significantly outperforming both A/B testing and multi-armed bandit algorithms.
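For concreteness, below is a minimal Python sketch of the UCB baseline mentioned above for the small-pool, interactive-feedback setting, assuming Bernoulli rewards. It only illustrates the exploration/exploitation trade-off; the helper names, horizon, and toy success rates are illustrative assumptions, not the dissertation's proposed algorithm.

import math, random

def ucb_pick(counts, means, t, c=2.0):
    # Choose the arm with the highest upper confidence bound (UCB1-style).
    def bound(i):
        if counts[i] == 0:
            return float("inf")          # force one initial pull per arm
        return means[i] + math.sqrt(c * math.log(t) / counts[i])
    return max(range(len(counts)), key=bound)

def run_ucb(true_rates, horizon=10_000, seed=0):
    random.seed(seed)
    k = len(true_rates)
    counts, means, total = [0] * k, [0.0] * k, 0.0
    for t in range(1, horizon + 1):
        i = ucb_pick(counts, means, t)
        reward = 1.0 if random.random() < true_rates[i] else 0.0
        counts[i] += 1
        means[i] += (reward - means[i]) / counts[i]   # incremental mean
        total += reward
    return total, counts

# Toy success rates; most pulls should end up on the best (0.6) arm.
total, counts = run_ucb([0.1, 0.3, 0.6])
print(total, counts)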
How Technology Impacts and Compares to Humans in Socially Consequential Arenas
One of the main promises of technology development is for it to be adopted by
people, organizations, societies, and governments -- incorporated into their
life, work stream, or processes. Often, this is socially beneficial as it
automates mundane tasks, frees up more time for other more important things, or
otherwise improves the lives of those who use the technology. However, these
beneficial results do not apply in every scenario and may not impact everyone
in a system the same way. Sometimes a technology is developed that both
produces benefits and inflicts some harm. These harms may come at a higher cost to
some people than others, raising the question: {\it how are benefits and harms
weighed when deciding if and how a socially consequential technology gets
developed?} The most natural way to answer this question, and in fact how
people first approach it, is to compare the new technology to what used to
exist. As such, in this work, I make comparative analyses between humans and
machines in three scenarios and seek to understand how sentiment about a
technology, performance of that technology, and the impacts of that technology
combine to influence how one decides to answer my main research question.
Comment: Doctoral thesis proposal. arXiv admin note: substantial text overlap with arXiv:2110.08396, arXiv:2108.12508, arXiv:2006.1262
Multi-Objective Learning for Multi-Modal Natural Language Generation
One of the important goals of Artificial Intelligence (AI) is to mimic the human ability to leverage knowledge or skills from previously learned tasks to quickly learn a new task. For example, humans can reapply the learned skill of balancing a bicycle when learning to ride a motorbike. In a similar spirit, the field of Natural Language Processing (NLP) has several tasks including machine translation, textual summarization, image/video captioning, sentiment analysis, dialog systems, natural language inference, question answering, etc. While these different NLP tasks are often trained separately, leveraging knowledge or skills from related tasks, via joint training or by training one task after another in a sequential fashion, can have potential advantages. To this end, this dissertation explores various NLP tasks (especially multi-modal text generation and pair-wise classification tasks, covering both natural language generation (NLG) and natural language understanding (NLU)) that leverage information from related auxiliary tasks in an effective way via novel multi-objective learning strategies. These proposed learning strategies can be broadly classified into three paradigms: multi-task learning, multi-reward reinforcement learning, and continual learning. In multi-task learning, we mainly focus on intuitively finding which related auxiliary tasks can benefit the multi-modal video caption generation task and the textual summarization task, and we explore effective ways of sharing parameters across these related tasks via joint training. In multi-reward reinforcement learning, we teach various skills to multi-modal text generation models in the form of rewards; for example, we teach the entailment skill to a video captioning model with entailment rewards. Further, we propose novel and effective ways of inducing multiple skills by `dynamically' choosing the auxiliary tasks (in MTL) or rewards (in RL) during training in an automatic way using multi-armed bandit based approaches, as sketched below. Finally, in continual learning, we explore sharing information across various tasks in a sequential way, where the model continually evolves during sequential training without losing performance on previously learned tasks. This kind of sharing allows later tasks to benefit from previously trained tasks, and in some cases vice versa. For this, we propose a novel method that continually changes the model architecture to accommodate new tasks while retaining performance on old tasks. We empirically evaluate our method on three natural language inference tasks.
Doctor of Philosophy
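Below is a minimal sketch of how a multi-armed bandit can dynamically pick which auxiliary task (or reward) to train on at each step, as referenced in the abstract above. The EXP3-style update, learning rate, and toy gains are illustrative assumptions, not the dissertation's exact method.

import math, random

class TaskBandit:
    # EXP3-style sampler over auxiliary tasks (illustrative only): tasks whose
    # recent use improved the main objective are sampled more often.
    def __init__(self, n_tasks, lr=0.1):
        self.weights = [0.0] * n_tasks
        self.lr = lr

    def probs(self):
        m = max(self.weights)
        exp_w = [math.exp(w - m) for w in self.weights]
        z = sum(exp_w)
        return [e / z for e in exp_w]

    def sample(self):
        return random.choices(range(len(self.weights)), weights=self.probs())[0]

    def update(self, task, gain):
        # Importance-weighted update keeps the feedback unbiased even for
        # rarely sampled tasks.
        p = self.probs()[task]
        self.weights[task] += self.lr * gain / p

# Toy demo standing in for a training loop: pretend auxiliary task 2
# consistently yields the largest validation gain for the main task.
random.seed(0)
bandit = TaskBandit(n_tasks=3)
for _ in range(200):
    task = bandit.sample()
    gain = 1.0 if task == 2 else 0.2   # placeholder for a measured gain
    bandit.update(task, gain)
print([round(p, 2) for p in bandit.probs()])  # mass shifts toward task 2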
Auditing black-box prediction models for data minimization compliance
In this paper, we focus on auditing black-box prediction models for compliance with
the GDPR’s data minimization principle. This principle restricts prediction models
to use the minimal information that is necessary for performing the task at hand.
Given the challenge of the black-box setting, our key idea is to check if each of the
prediction model’s input features is individually necessary by assigning it some
constant value (i.e., applying a simple imputation) across all prediction instances,
and measuring the extent to which the model outcomes would change. We introduce
a metric for data minimization that is based on model instability under simple
imputations. We extend the applicability of this metric from a finite sample model
to a distributional setting by introducing a probabilistic data minimization guarantee,
which we derive using a Bayesian approach. Furthermore, we address the auditing
problem under a constraint on the number of queries to the prediction system. We
formulate the problem of allocating a budget of system queries to feasible simple
imputations (for investigating model instability) as a multi-armed bandit framework
with probabilistic success metrics. We define two bandit problems for providing a
probabilistic data minimization guarantee at a given confidence level: a decision
problem given a data minimization level, and a measurement problem given a fixed
query budget. We design efficient algorithms for these auditing problems using
novel exploration strategies that expand classical bandit strategies. Our experiments
with real-world prediction systems show that our auditing algorithms significantly
outperform simpler benchmarks in both measurement and decision problems.
Published version
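As a rough illustration of the simple-imputation idea in the abstract above, the sketch below fixes one feature to a constant across all instances and measures how often a black-box model's predictions change. The instability function, the toy model, and the raw flip rate are illustrative assumptions rather than the paper's exact metric or its bandit-based auditing procedure.

import numpy as np

def instability(predict, X, feature, constant):
    # Fraction of instances whose prediction flips when `feature` is
    # replaced by `constant` across the whole sample (simple imputation).
    X_imputed = X.copy()
    X_imputed[:, feature] = constant
    return float(np.mean(predict(X) != predict(X_imputed)))

# Toy "black box" that only uses feature 0: imputing feature 1 changes
# nothing, hinting that feature 1 may not be necessary for the task.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
black_box = lambda Z: (Z[:, 0] > 0).astype(int)
print(instability(black_box, X, feature=1, constant=0.0))  # 0.0
print(instability(black_box, X, feature=0, constant=0.0))  # about 0.5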
Estimating Dependency, Monitoring and Knowledge Discovery in High-Dimensional Data Streams
Data Mining – known as the process of extracting knowledge from massive data sets – leads to phenomenal impacts on our society, and now affects nearly every aspect of our lives: from the layout in our local grocery store, to the ads and product recommendations we receive, the availability of treatments for common diseases, the prevention of crime, or the efficiency of industrial production processes.
However, Data Mining remains difficult when (1) data is high-dimensional, i.e., has many attributes, and when (2) data comes as a stream. Extracting knowledge from high-dimensional data streams is impractical because one must cope with two orthogonal sets of challenges. On the one hand, the effects of the so-called "curse of dimensionality" bog down the performance of statistical methods and give rise to increasingly complex Data Mining problems. On the other hand, the statistical properties of data streams may evolve in unexpected ways, a phenomenon known in the community as "concept drift". Thus, one needs to update one's knowledge about the data over time, i.e., to monitor the stream.
While previous work addresses high-dimensional data sets and data streams to some extent, the intersection of both has received much less attention. Nevertheless, extracting knowledge in this setting is advantageous for many industrial applications: identifying patterns from high-dimensional data streams in real-time may lead to larger production volumes, or reduce operational costs. The goal of this dissertation is to bridge this gap.
We first focus on dependency estimation, a fundamental task of Data Mining. Typically, one estimates dependency by quantifying the strength of statistical relationships. We identify the requirements for dependency estimation in high-dimensional data streams and propose a new estimation framework, Monte Carlo Dependency Estimation (MCDE), that fulfils them all. We show that MCDE leads to efficient dependency monitoring.
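For intuition, here is a minimal sketch of a Monte Carlo, slicing-based dependency estimate in the spirit of MCDE, assuming a two-sample Kolmogorov-Smirnov statistic as the contrast measure; the function name, slice scheme, and parameters are illustrative stand-ins, not the MCDE framework itself.

import numpy as np
from scipy.stats import ks_2samp

def mc_dependency(data, n_iter=200, slice_frac=0.5, rng=None):
    # Average, over random slices, how much conditioning on one dimension
    # shifts the distribution of another dimension (KS statistic).
    rng = rng if rng is not None else np.random.default_rng(0)
    n, d = data.shape
    scores = []
    for _ in range(n_iter):
        target = int(rng.integers(d))                            # dimension to test
        cond = int(rng.choice([j for j in range(d) if j != target]))
        order = np.argsort(data[:, cond])                        # rank by the conditioning dim
        width = max(2, int(slice_frac * n))
        start = int(rng.integers(0, n - width + 1))
        inside = order[start:start + width]                      # points inside the slice
        res = ks_2samp(data[inside, target], data[:, target])
        scores.append(res.statistic)
    return float(np.mean(scores))

# Example: a dependent pair should score clearly higher than an independent one.
rng = np.random.default_rng(1)
x = rng.normal(size=2000)
dependent = np.column_stack([x, x + 0.1 * rng.normal(size=2000)])
independent = rng.normal(size=(2000, 2))
print(mc_dependency(dependent), mc_dependency(independent))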
Then, we generalise the task of monitoring by introducing the Scaling Multi-Armed Bandit (S-MAB) algorithms, extending the Multi-Armed Bandit (MAB) model. We show that our algorithms can efficiently monitor statistics by leveraging user-specific criteria.
Finally, we describe applications of our contributions to Knowledge Discovery. We propose an algorithm, Streaming Greedy Maximum Random Deviation (SGMRD), which exploits our new methods to extract patterns, e.g., outliers, from high-dimensional data streams. We also present a new approach, which we name kj-Nearest Neighbours (kj-NN), to detect outlying documents within massive text corpora.
We support our algorithmic contributions with theoretical guarantees, as well as extensive experiments against both synthetic and real-world data. We demonstrate the benefits of our methods against real-world use cases. Overall, this dissertation establishes fundamental tools for Knowledge Discovery in high-dimensional data streams, which help with many applications in the industry, e.g., anomaly detection, or predictive maintenance.
To facilitate the application of our results and future research, we publicly release our implementations, experiments, and benchmark data via open-source platforms.