63 research outputs found
Approximation algorithm for the kinetic robust K-center problem
Two complications frequently arise in real-world applications: motion and the contamination of data by outliers. We consider a fundamental clustering problem, the k-center problem, within the context of these two issues. We are given a finite point set S of size n and an integer k. In the standard k-center problem, the objective is to compute a set of k center points that minimizes the maximum distance from any point of S to its closest center, or equivalently, the smallest radius such that S can be covered by k disks of this radius. In the discrete k-center problem the disk centers are drawn from the points of S, and in the absolute k-center problem the disk centers are unrestricted. We generalize this problem in two ways. First, we assume that points are in continuous motion, and the objective is to maintain a solution over time. Second, we assume that a robustness parameter 0 < t ⩽ 1 is given, and the objective is to compute the smallest radius such that there exist k disks of this radius that cover at least ⌈tn⌉ points of S. We present a kinetic data structure (in the KDS framework) that maintains a (3+ε)-approximation for the robust discrete k-center problem and a (4+ε)-approximation for the robust absolute k-center problem, both under the assumption that k is a constant. We also improve on a previous 8-approximation for the non-robust discrete kinetic k-center problem, for arbitrary k, and show that our data structure achieves a (4+ε)-approximation. All these results hold in any metric space of constant doubling dimension, which includes Euclidean space of constant dimension.
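For reference, the static discrete k-center objective the abstract describes can be sketched with the classic greedy farthest-point 2-approximation (Gonzalez's algorithm). This is a standard textbook baseline for the non-kinetic, non-robust problem, not the paper's kinetic data structure:

```python
import math

def greedy_k_center(points, k):
    """Gonzalez's greedy 2-approximation for the static discrete
    k-center problem: repeatedly add the point farthest from the
    current set of centers."""
    centers = [points[0]]
    while len(centers) < k:
        # Pick the point maximizing the distance to its nearest center.
        farthest = max(points,
                       key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(farthest)
    # Covering radius: max distance from any point to its closest center.
    radius = max(min(math.dist(p, c) for c in centers) for p in points)
    return centers, radius
```

The returned radius is at most twice the optimal covering radius; the robust variant would instead minimize the radius needed to cover only ⌈tn⌉ of the points.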
Fair Meta-Learning: Learning How to Learn Fairly
Data sets for fairness-relevant tasks can lack examples or be biased
with respect to a specific label in a sensitive attribute. We demonstrate the
usefulness of weight-based meta-learning approaches in such situations. For
models that can be trained through gradient descent, we demonstrate that there
are parameter configurations from which models can be optimized in a small
number of gradient steps, and with minimal data, to be both fair and
accurate. To learn such weight sets, we adapt the popular MAML algorithm to
Fair-MAML by including a fairness regularization term. In practice,
Fair-MAML allows practitioners to train fair machine learning models from only
a few examples when data from related tasks is available. We empirically
exhibit the value of this technique by comparing to relevant baselines.
Comment: arXiv admin note: substantial text overlap with arXiv:1908.0909
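A minimal sketch of a fairness-regularized inner-loop update of the kind the abstract describes is below. The logistic model, the demographic-parity penalty, and all parameter names are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fair_inner_step(w, X, y, s, gamma=1.0, lr=0.1):
    """One MAML-style inner gradient step on a logistic model whose
    task loss is cross-entropy plus gamma times a fairness penalty.
    Here the penalty is the squared gap in mean predicted score
    between the two groups of sensitive attribute s (demographic
    parity); this is one illustrative choice of regularizer."""
    p = sigmoid(X @ w)
    # Gradient of the cross-entropy term.
    grad_ce = X.T @ (p - y) / len(y)
    # Demographic-parity gap and its gradient.
    g0, g1 = (s == 0), (s == 1)
    gap = p[g1].mean() - p[g0].mean()
    dp = p * (1 - p)  # derivative of the sigmoid
    grad_gap = ((X[g1] * dp[g1, None]).mean(axis=0)
                - (X[g0] * dp[g0, None]).mean(axis=0))
    grad = grad_ce + gamma * 2 * gap * grad_gap
    return w - lr * grad
```

In a full MAML loop, the outer objective would differentiate through a few such inner steps so that the learned initialization adapts quickly to new tasks while keeping the fairness penalty small.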
Runaway Feedback Loops in Predictive Policing
Predictive policing systems are increasingly used to determine how to
allocate police across a city in order to best prevent crime. Discovered crime
data (e.g., arrest counts) are used to help update the model, and the process
is repeated. Such systems have been empirically shown to be susceptible to
runaway feedback loops, where police are repeatedly sent back to the same
neighborhoods regardless of the true crime rate.
In response, we develop a mathematical model of predictive policing that
explains why this feedback loop occurs, show empirically that this model exhibits
such problems, and demonstrate how to change the inputs to a predictive
policing system (in a black-box manner) so the runaway feedback loop does not
occur, allowing the true crime rate to be learned. Our results are
quantitative: we can establish a link (in our model) between the degree to
which runaway feedback causes problems and the disparity in crime rates between
areas. Moreover, we can also demonstrate the way in which \emph{reported}
incidents of crime (those reported by residents) and \emph{discovered}
incidents of crime (i.e. those directly observed by police officers dispatched
as a result of the predictive policing algorithm) interact: in brief, while
reported incidents can attenuate the degree of runaway feedback, they cannot
entirely remove it without the interventions we suggest.
Comment: Extended version accepted to the 1st Conference on Fairness,
Accountability and Transparency, 2018. Adds further treatment of reported as
well as discovered incidents.
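The flavor of the runaway feedback loop can be illustrated with a toy urn-style simulation. This is a caricature for intuition only, not the paper's model; the two-region setup, prior counts, and parameters are assumptions:

```python
import random

def simulate_feedback(true_rates, steps=2000, seed=0):
    """Toy two-region simulation of a runaway feedback loop: each
    day one patrol is sent to a region with probability proportional
    to discovered-crime counts so far, and a crime is discovered
    there with that region's true rate. Because allocation depends
    only on discovered counts, early imbalances are reinforced."""
    rng = random.Random(seed)
    counts = [1, 1]  # prior discovered-crime counts per region
    for _ in range(steps):
        total = counts[0] + counts[1]
        region = 0 if rng.random() < counts[0] / total else 1
        if rng.random() < true_rates[region]:
            counts[region] += 1
    return counts
```

Running this with even mildly different true rates tends to concentrate patrols in one region, which is the qualitative behavior the paper formalizes and shows how to correct.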
Fairness in representation: quantifying stereotyping as a representational harm
While harms of allocation have been increasingly studied as part of the
subfield of algorithmic fairness, harms of representation have received
considerably less attention. In this paper, we formalize two notions of
stereotyping and show how they manifest in later allocative harms within the
machine learning pipeline. We also propose mitigation strategies and
demonstrate their effectiveness on synthetic datasets.
Comment: 9 pages, 6 figures, SIAM International Conference on Data Mining
Problems with Shapley-value-based explanations as feature importance measures
Game-theoretic formulations of feature importance have become popular as a
way to "explain" machine learning models. These methods define a cooperative
game between the features of a model and distribute influence among these input
elements using some form of the game's unique Shapley values. Justification for
these methods rests on two pillars: their desirable mathematical properties,
and their applicability to specific motivations for explanations. We show that
mathematical problems arise when Shapley values are used for feature importance
and that the solutions to mitigate these necessarily induce further complexity,
such as the need for causal reasoning. We also draw on additional literature to
argue that Shapley values do not provide explanations which suit human-centric
goals of explainability.
Comment: Accepted to ICML 202
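For context, the Shapley values these explanation methods rely on can be computed exactly for a tiny cooperative game by enumerating coalitions. This is an illustrative sketch of the standard definition; feature-attribution methods in practice approximate it, and the critiques above concern how the game itself is defined:

```python
from itertools import combinations
from math import factorial

def shapley_values(players, value):
    """Exact Shapley values of a cooperative game, by summing each
    player's weighted marginal contribution over all coalitions.
    Feasible only for a handful of players (exponential cost)."""
    n = len(players)
    phi = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for r in range(len(others) + 1):
            for S in combinations(others, r):
                weight = (factorial(len(S)) * factorial(n - len(S) - 1)
                          / factorial(n))
                phi[p] += weight * (value(set(S) | {p}) - value(set(S)))
    return phi
```

By the efficiency axiom, the values sum to the worth of the grand coalition; the paper's point is that such axioms alone do not make the resulting attributions good explanations.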
Realistic Compression of Kinetic Sensor Data
We introduce a realistic analysis for a framework for storing and
processing kinetic data observed by sensor networks. The massive data
sets generated by these networks motivate a significant need for
compression. We are interested in the kinetic data generated by a finite
set of objects moving through space. Our previously introduced framework
and accompanying compression algorithm assumed a given set of sensors,
each of which continuously observes these moving objects in its
surrounding region. The model relies purely on sensor observations; it
allows points to move freely and requires no advance notification of
motion plans. Here, we extend the initial theoretical analysis of this
framework and compression scheme to a more realistic setting. We extend
the current understanding of empirical entropy to introduce definitions
for joint empirical entropy, conditional empirical entropy, and
empirical independence. We also introduce a notion of limited
independence between the outputs of the sensors in the system. We show
that, even with this notion of limited independence and in both the
statistical and empirical settings, the previously introduced
compression algorithm achieves an encoding size on the order of the
optimal.
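The empirical-entropy notions underlying such bounds can be illustrated with a short sketch. These are the standard zeroth-order definitions; the function names are ours, not the paper's:

```python
from collections import Counter
from math import log2

def empirical_entropy(stream):
    """Empirical (zeroth-order) entropy of a symbol stream, in bits
    per symbol: a lower bound on the per-symbol size achievable by
    any code that treats symbols independently."""
    n = len(stream)
    return -sum((c / n) * log2(c / n) for c in Counter(stream).values())

def joint_empirical_entropy(a, b):
    """Empirical entropy of two aligned sensor streams viewed jointly.
    Subadditivity H(A,B) <= H(A) + H(B) always holds, with equality
    exactly when the streams are empirically independent."""
    return empirical_entropy(list(zip(a, b)))
```

Conditional empirical entropy follows as H(A|B) = H(A,B) - H(B), and the limited-independence assumption in the text bounds how far the joint entropy of all sensor outputs can fall below the sum of the individual entropies.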
- …