Bounded-Leakage Differential Privacy
We introduce and study a relaxation of differential privacy [Dwork et al., 2006] that accounts for mechanisms that leak some additional, bounded information about the database. We apply this notion to reason about two distinct settings where differential privacy is of limited use. First, we consider cases, such as the 2020 US Census [Abowd, 2018], in which some information about the database is released exactly or with small noise. Second, we consider the accumulation of privacy harms for an individual across studies that may not even include that individual's data. The tools that we develop for bounded-leakage differential privacy allow us to reason about privacy loss in these settings, and to show that individuals preserve some meaningful protections.
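As a toy illustration of the first setting only (this is not the paper's formal mechanism; the database representation, the choice of the record count as the leaked statistic, and the Laplace mechanism are all assumptions for the sketch):

```python
import math
import random

def laplace_noise(scale):
    """Sample Laplace(0, scale) via the standard inverse-CDF construction."""
    u = random.random() - 0.5
    sign = 1.0 if u >= 0 else -1.0
    return -scale * sign * math.log(1.0 - 2.0 * abs(u))

def release(db, epsilon):
    """Toy release for a database of 0/1 records: the total record count is
    published EXACTLY (the bounded leakage), while the sensitive sum is
    protected by an epsilon-DP Laplace mechanism (sensitivity 1)."""
    exact_total = len(db)                               # leaked exactly
    noisy_sum = sum(db) + laplace_noise(1.0 / epsilon)  # DP-protected
    return exact_total, noisy_sum
```

The point of the sketch is only the shape of the output: one statistic leaves the mechanism with no noise at all, so plain differential privacy cannot describe the combined release, while bounded-leakage differential privacy can.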
Comparative Learning: A Sample Complexity Theory for Two Hypothesis Classes
In many learning theory problems, a central role is played by a hypothesis class: we might assume that the data is labeled according to a hypothesis in the class (usually referred to as the realizable setting), or we might evaluate the learned model by comparing it with the best hypothesis in the class (the agnostic setting). Taking a step beyond these classic setups that involve only a single hypothesis class, we study a variety of problems that involve two hypothesis classes simultaneously.
We introduce comparative learning as a combination of the realizable and agnostic settings in PAC learning: given two binary hypothesis classes S and B, we assume that the data is labeled according to a hypothesis in the source class S and require the learned model to achieve an accuracy comparable to the best hypothesis in the benchmark class B. Even when both S and B have infinite VC dimensions, comparative learning can still have a small sample complexity. We show that the sample complexity of comparative learning is characterized by the mutual VC dimension VC(S,B) which we define to be the maximum size of a subset shattered by both S and B. We also show a similar result in the online setting, where we give a regret characterization in terms of the analogous mutual Littlestone dimension Ldim(S,B). These results also hold for partial hypotheses.
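On a tiny finite domain, the mutual VC dimension from the definition above can be checked by brute force (helper names and the label-tuple encoding of hypothesis classes are illustrative choices, not from the paper):

```python
from itertools import combinations, product

def shatters(H, subset):
    """Does class H (a set of label tuples over the whole domain) realize
    every possible 0/1 labeling of the points indexed by `subset`?"""
    patterns = {tuple(h[i] for i in subset) for h in H}
    return len(patterns) == 2 ** len(subset)

def mutual_vc(S, B, domain_size):
    """VC(S, B): the size of the largest subset of the domain shattered by
    BOTH S and B.  Exponential brute force, for toy domains only."""
    for k in range(domain_size, 0, -1):
        for subset in combinations(range(domain_size), k):
            if shatters(S, subset) and shatters(B, subset):
                return k
    return 0

# On 3 points: S is all labelings (VC dimension 3), B is the class of
# threshold labelings (VC dimension 1); VC(S, B) <= min(VC(S), VC(B)),
# and here it equals 1.
S = set(product([0, 1], repeat=3))
B = {(0, 0, 0), (0, 0, 1), (0, 1, 1), (1, 1, 1)}
```

Since a subset shattered by both classes is in particular shattered by each, VC(S, B) is always at most min(VC(S), VC(B)), which the toy example exhibits.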
We additionally show that the insights necessary to characterize the sample complexity of comparative learning can be applied to other tasks involving two hypothesis classes. In particular, we characterize the sample complexity of realizable multiaccuracy and multicalibration using the mutual fat-shattering dimension, an analogue of the mutual VC dimension for real-valued hypotheses. This not only solves an open problem proposed by Hu, Peale, and Reingold (2022), but also leads to independently interesting results extending classic ones about regression, boosting, and covering numbers to our two-hypothesis-class setting.
Bidding Strategies for Proportional Representation in Advertisement Campaigns
Many companies rely on advertising platforms such as Google, Facebook, or Instagram to recruit a large and diverse applicant pool for job openings. Prior works have shown that equitable bidding may not result in equitable outcomes due to heterogeneous levels of competition for different types of individuals. Suggestions have been made to address this problem via revisions to the advertising platform. However, it may be challenging to convince platforms to undergo a costly revamp of their system, and in addition it might not offer the flexibility necessary to capture the many fairness notions and other constraints that advertisers would like to ensure. Instead, we consider alterations that make no change to the platform mechanism and instead change the bidding strategies used by advertisers. We compare two natural fairness objectives: one in which the advertisers must treat groups equally when bidding in order to achieve a yield with group-parity guarantees, and another in which the bids are not constrained and only the yield must satisfy parity constraints. We show that requiring parity with respect to both bids and yield can result in an arbitrarily large decrease in efficiency compared to requiring equal yield proportions alone. We find that autobidding is a natural way to realize this latter objective, and we show how existing work in this area can be extended to provide efficient bidding strategies that achieve high utility while satisfying group-parity constraints, as well as deterministic and randomized rounding techniques to uphold these guarantees. Finally, we demonstrate the effectiveness of our proposed solutions on data adapted from a real-world employment dataset.
Comment: Foundations of Responsible Computing (FORC 2023).
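The rounding step can be sketched with the simplest possible scheme, independent randomized rounding (a minimal sketch under that assumption; the paper's deterministic and randomized techniques give stronger guarantees than expectation alone):

```python
import random

def randomized_round(fractional):
    """Round each fractional slot allocation x in [0, 1] to {0, 1} with
    probability x.  Independent rounding preserves every group's expected
    yield, so any linear group-parity constraint satisfied by the
    fractional solution holds in expectation for the rounded one.
    (Illustrative only -- not the rounding analyzed in the paper.)"""
    return [1 if random.random() < x else 0 for x in fractional]
```

Because E[round(x)] = x for every slot, expected group yields are unchanged by the rounding; concentrating the realized yields around those expectations is what the stronger techniques are for.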
Leximax Approximations and Representative Cohort Selection
Finding a representative cohort from a broad pool of candidates is a goal that arises in many contexts such as choosing governing committees and consumer panels. While there are many ways to define the degree to which a cohort represents a population, a very appealing solution concept is lexicographic maximality (leximax), which offers a natural (Pareto-optimal-like) interpretation: no group's utility can be increased without decreasing the utility of a group that is already worse off. However, a leximax solution can be highly sensitive to small variations in the utility of certain groups. In this work, we explore new notions of approximate leximax solutions with three distinct motivations: better algorithmic efficiency, exploiting significant utility improvements, and robustness to noise. Among other definitional contributions, we give a new notion of an approximate leximax that satisfies a similarly appealing semantic interpretation and relate it to algorithmically feasible approximate leximax notions. When group utilities are linear over cohort candidates, we give an efficient polynomial-time algorithm for finding a leximax distribution over cohort candidates in both the exact and the approximate setting. Furthermore, we show that finding an integer solution to leximax cohort selection with linear utilities is NP-hard.
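For a finite, explicitly enumerated set of candidate cohorts, the exact integral leximax objective reduces to lexicographic comparison of ascending-sorted utility vectors (a brute-force illustration of the objective only, not the paper's polynomial-time LP-based algorithm over distributions):

```python
def leximax_key(utilities):
    """Ascending sort: leximax compares the worst-off group first."""
    return tuple(sorted(utilities))

def leximax_select(candidates):
    """Pick the candidate (a tuple of per-group utilities) whose sorted
    utility vector is lexicographically largest: maximize the worst-off
    group's utility, then the second worst-off, and so on."""
    return max(candidates, key=leximax_key)
```

For example, among utility vectors (1, 5, 5), (2, 2, 9), and (2, 3, 3), leximax prefers (2, 3, 3): its worst-off utility 2 ties with (2, 2, 9), and its second-worst utility 3 beats 2. This also makes the sensitivity concern above concrete: perturbing a single worst-off utility slightly can flip the winner.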