63 research outputs found

    Approximation algorithm for the kinetic robust K-center problem

    Two complications frequently arise in real-world applications: motion and the contamination of data by outliers. We consider a fundamental clustering problem, the k-center problem, in the context of these two issues. We are given a finite point set S of size n and an integer k. In the standard k-center problem, the objective is to compute a set of k center points that minimizes the maximum distance from any point of S to its closest center, or equivalently, the smallest radius such that S can be covered by k disks of this radius. In the discrete k-center problem the disk centers are drawn from the points of S, and in the absolute k-center problem the disk centers are unrestricted. We generalize this problem in two ways. First, we assume that the points are in continuous motion, and the objective is to maintain a solution over time. Second, we are given a robustness parameter 0 < t ⩽ 1, and the objective is to compute the smallest radius such that there exist k disks of this radius that cover at least ⌈tn⌉ points of S. We present a kinetic data structure (in the KDS framework) that maintains a (3+ε)-approximation for the robust discrete k-center problem and a (4+ε)-approximation for the robust absolute k-center problem, both under the assumption that k is a constant. We also improve on a previous 8-approximation for the non-robust discrete kinetic k-center problem, for arbitrary k, and show that our data structure achieves a (4+ε)-approximation. All these results hold in any metric space of constant doubling dimension, which includes Euclidean space of constant dimension.
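
    To make the objective concrete, here is a minimal Python sketch of Gonzalez's classic farthest-point greedy heuristic, a well-known 2-approximation for the static, non-robust discrete k-center problem. It is only a reference point: the kinetic and robust variants above require the paper's KDS machinery, which this static routine does not attempt.

        import math

        def greedy_k_center(points, k):
            # Gonzalez's farthest-point heuristic: repeatedly add the point
            # farthest from the centers chosen so far. Returns the chosen
            # centers and the covering radius they achieve.
            centers = [points[0]]                      # seed with an arbitrary point
            dist = [math.dist(p, centers[0]) for p in points]
            for _ in range(k - 1):
                i = max(range(len(points)), key=lambda j: dist[j])
                centers.append(points[i])
                dist = [min(d, math.dist(p, points[i])) for p, d in zip(points, dist)]
            return centers, max(dist)

        pts = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10)]
        centers, radius = greedy_k_center(pts, k=2)    # radius covers all of pts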

    Fair Meta-Learning: Learning How to Learn Fairly

    Data sets for fairness-relevant tasks can lack examples or be biased according to a specific label in a sensitive attribute. We demonstrate the usefulness of weight-based meta-learning approaches in such situations. For models that can be trained through gradient descent, we demonstrate that there are parameter configurations from which models can be optimized, in a small number of gradient steps and with minimal data, to be both fair and accurate. To learn such weight sets, we adapt the popular MAML algorithm to Fair-MAML by including a fairness regularization term. In practice, Fair-MAML allows practitioners to train fair machine learning models from only a few examples when data from related tasks is available. We empirically demonstrate the value of this technique by comparing to relevant baselines.
    Comment: arXiv admin note: substantial text overlap with arXiv:1908.0909
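
    The recipe described above, MAML's inner/outer loop with a fairness penalty added to each task loss, can be sketched briefly. In the hypothetical Python/PyTorch sketch below, the linear model, the demographic-parity-style penalty, and the weight gamma are our own illustrative assumptions, not the paper's exact regularizer.

        import torch
        import torch.nn.functional as F

        def task_loss(w, X, y, a, gamma=1.0):
            # Binary cross-entropy plus a group prediction-gap penalty
            # (an assumed stand-in for Fair-MAML's fairness term).
            logits = X @ w
            p = torch.sigmoid(logits)
            gap = (p[a == 1].mean() - p[a == 0].mean()).abs()
            return F.binary_cross_entropy_with_logits(logits, y) + gamma * gap

        def fair_maml_update(w, tasks, inner_lr=0.1, outer_lr=0.01):
            # One second-order MAML meta-step; each task is a
            # ((Xs, ys, As), (Xq, yq, Aq)) support/query pair, with the
            # binary sensitive attribute in As and Aq.
            meta_loss = 0.0
            for (Xs, ys, As), (Xq, yq, Aq) in tasks:
                g = torch.autograd.grad(task_loss(w, Xs, ys, As), w,
                                        create_graph=True)[0]
                w_adapted = w - inner_lr * g        # fast adaptation on support
                meta_loss = meta_loss + task_loss(w_adapted, Xq, yq, Aq)
            meta_grad = torch.autograd.grad(meta_loss / len(tasks), w)[0]
            return (w - outer_lr * meta_grad).detach().requires_grad_()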

    Runaway Feedback Loops in Predictive Policing

    Predictive policing systems are increasingly used to determine how to allocate police across a city in order to best prevent crime. Discovered crime data (e.g., arrest counts) are used to update the model, and the process is repeated. Such systems have been empirically shown to be susceptible to runaway feedback loops, where police are repeatedly sent back to the same neighborhoods regardless of the true crime rate. In response, we develop a mathematical model of predictive policing that proves why this feedback loop occurs, show empirically that this model exhibits such problems, and demonstrate how to change the inputs to a predictive policing system (in a black-box manner) so that the runaway feedback loop does not occur, allowing the true crime rate to be learned. Our results are quantitative: we can establish a link (in our model) between the degree to which runaway feedback causes problems and the disparity in crime rates between areas. Moreover, we can also demonstrate the way in which "reported" incidents of crime (those reported by residents) and "discovered" incidents of crime (those directly observed by police officers dispatched as a result of the predictive policing algorithm) interact: in brief, while reported incidents can attenuate the degree of runaway feedback, they cannot entirely remove it without the interventions we suggest.
    Comment: Extended version accepted to the 1st Conference on Fairness, Accountability and Transparency, 2018. Adds further treatment of reported as well as discovered incidents
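
    The feedback loop itself is easy to reproduce in a toy simulation. The urn-style Python model below is our simplification of the setting described above (the rates and allocation rule are illustrative, not taken from the paper): patrols are allocated in proportion to discovered crime, and only patrolled regions generate new discoveries.

        import random

        def simulate(days=10_000, rate_a=0.5, rate_b=0.4, seed=0):
            rng = random.Random(seed)
            discovered = {"A": 1, "B": 1}        # start both regions viable
            for _ in range(days):
                p_a = discovered["A"] / (discovered["A"] + discovered["B"])
                region = "A" if rng.random() < p_a else "B"
                rate = rate_a if region == "A" else rate_b
                if rng.random() < rate:          # crime found while patrolling
                    discovered[region] += 1
            return discovered

        # Despite nearly equal true rates, the discovered-crime counts (and
        # hence the patrol allocation) typically collapse onto one region.
        print(simulate())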

    Fairness in representation: quantifying stereotyping as a representational harm

    While harms of allocation have been increasingly studied within the subfield of algorithmic fairness, harms of representation have received considerably less attention. In this paper, we formalize two notions of stereotyping and show how they manifest as later allocative harms within the machine learning pipeline. We also propose mitigation strategies and demonstrate their effectiveness on synthetic datasets.
    Comment: 9 pages, 6 figures, SIAM International Conference on Data Mining

    Problems with Shapley-value-based explanations as feature importance measures

    Game-theoretic formulations of feature importance have become popular as a way to "explain" machine learning models. These methods define a cooperative game between the features of a model and distribute influence among these input elements using some form of the game's unique Shapley values. Justification for these methods rests on two pillars: their desirable mathematical properties, and their applicability to specific motivations for explanations. We show that mathematical problems arise when Shapley values are used for feature importance, and that the solutions that mitigate them necessarily induce further complexity, such as the need for causal reasoning. We also draw on additional literature to argue that Shapley values do not provide explanations that suit human-centric goals of explainability.
    Comment: Accepted to ICML 2020
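
    For intuition about what these methods compute, here is a small self-contained Python sketch of exact Shapley values by coalition enumeration. The value function v, which scores a feature subset (e.g., the model's output with absent features replaced by a baseline), is precisely the modeling choice where the ambiguities discussed in the paper enter.

        from itertools import combinations
        from math import factorial

        def shapley_values(value, features):
            # Exact Shapley values: phi[f] is f's average marginal
            # contribution over all orderings of the features.
            n = len(features)
            phi = {f: 0.0 for f in features}
            for f in features:
                rest = [g for g in features if g != f]
                for r in range(n):
                    for S in combinations(rest, r):
                        weight = (factorial(len(S)) * factorial(n - len(S) - 1)
                                  / factorial(n))
                        phi[f] += weight * (value(set(S) | {f}) - value(set(S)))
            return phi

        # Toy game: the model outputs 1 only when both features are present.
        v = lambda S: 1.0 if {"x1", "x2"} <= S else 0.0
        print(shapley_values(v, ["x1", "x2"]))   # {'x1': 0.5, 'x2': 0.5}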

    Realistic Compression of Kinetic Sensor Data

    We introduce a realistic analysis of a framework for storing and processing kinetic data observed by sensor networks. The massive data sets generated by these networks motivate a significant need for compression. We are interested in the kinetic data generated by a finite set of objects moving through space. Our previously introduced framework and accompanying compression algorithm assumed a given set of sensors, each of which continuously observes these moving objects in its surrounding region. The model relies purely on sensor observations; it allows points to move freely and requires no advance notification of motion plans. Here, we extend the initial theoretical analysis of this framework and compression scheme to a more realistic setting. We extend the current understanding of empirical entropy by introducing definitions of joint empirical entropy, conditional empirical entropy, and empirical independence. We also introduce a notion of limited independence between the outputs of the sensors in the system. We show that, even under this notion of limited independence, and in both the statistical and empirical settings, the previously introduced compression algorithm achieves an encoding size on the order of the optimal.
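
    As a rough guide to the quantities involved, the plug-in (zeroth-order) estimates below show how joint and conditional empirical entropy can be computed from sensor output strings. The paper's formal definitions refine these, so treat this Python sketch as an assumed simplification.

        from collections import Counter
        from math import log2

        def empirical_entropy(seq):
            # H0(seq): the best bits-per-symbol achievable by any fixed
            # (memoryless) symbol code on this particular sequence.
            counts, n = Counter(seq), len(seq)
            return -sum(c / n * log2(c / n) for c in counts.values())

        def joint_empirical_entropy(x, y):
            # Plug-in entropy of the paired outputs of two sensors.
            return empirical_entropy(list(zip(x, y)))

        def conditional_empirical_entropy(x, y):
            # H(X | Y) = H(X, Y) - H(Y): the extra bits needed for X once
            # Y is known; zero when X is determined by Y.
            return joint_empirical_entropy(x, y) - empirical_entropy(y)

        x, y = "aabbaabb", "abababab"
        print(empirical_entropy(x), conditional_empirical_entropy(x, y))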