8,969 research outputs found

    A Bayesian approach for on-line max and min auditing

    In this paper we consider the on-line max and min query auditing problem: given a private association between fields in a data set, a sequence of max and min queries that have already been posed about the data, their corresponding answers, and a new query, deny the answer if private information would be inferred, or give the true answer otherwise. We give a probabilistic definition of privacy and demonstrate that max and min queries, without the “no duplicates” assumption, can be audited by means of a Bayesian network. Moreover, we show how our auditing approach can handle user prior knowledge.

    Learning Fair Naive Bayes Classifiers by Discovering and Eliminating Discrimination Patterns

    As machine learning is increasingly used to make real-world decisions, recent research efforts aim to define and ensure fairness in algorithmic decision making. Existing methods often assume a fixed set of observable features to define individuals, but lack a discussion of certain features not being observed at test time. In this paper, we study fairness of naive Bayes classifiers, which allow partial observations. In particular, we introduce the notion of a discrimination pattern, which refers to an individual receiving different classifications depending on whether some sensitive attributes were observed. A model is then considered fair if it has no such pattern. We propose an algorithm to discover and mine for discrimination patterns in a naive Bayes classifier, and show how to learn maximum likelihood parameters subject to these fairness constraints. Our approach iteratively discovers and eliminates discrimination patterns until a fair model is learned. An empirical evaluation on three real-world datasets demonstrates that we can remove exponentially many discrimination patterns by only adding a small fraction of them as constraints.
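    The idea of a discrimination pattern can be illustrated with a minimal sketch. It assumes binary features and a binary class, and it measures the change in the naive Bayes posterior when a sensitive attribute is added to the evidence. The function names, the example probabilities, and the threshold `delta` are hypothetical choices for illustration and are not taken from the abstract.

```python
def posterior(prior, likelihoods, evidence):
    """P(D=1 | evidence) for a binary naive Bayes model.

    prior:       P(D=1)
    likelihoods: {feature: (P(f=1 | D=1), P(f=1 | D=0))}
    evidence:    {feature: 0 or 1}; unobserved features are simply left out,
                 which marginalizes them in a naive Bayes model.
    """
    p1, p0 = prior, 1.0 - prior
    for feature, value in evidence.items():
        l1, l0 = likelihoods[feature]
        p1 *= l1 if value else 1.0 - l1
        p0 *= l0 if value else 1.0 - l0
    return p1 / (p1 + p0)


def discrimination_degree(prior, likelihoods, sensitive, nonsensitive):
    """Posterior with the sensitive evidence minus posterior without it."""
    with_sensitive = posterior(prior, likelihoods, {**nonsensitive, **sensitive})
    without_sensitive = posterior(prior, likelihoods, nonsensitive)
    return with_sensitive - without_sensitive


# Hypothetical model: "s" is a sensitive attribute, "a" is an ordinary feature.
example_likelihoods = {"s": (0.8, 0.4), "a": (0.6, 0.3)}
degree = discrimination_degree(0.5, example_likelihoods, {"s": 1}, {"a": 1})

delta = 0.1  # assumed tolerance; a larger gap is flagged as a pattern
is_pattern = abs(degree) > delta
```

    Under these assumed numbers, observing the sensitive attribute shifts the posterior by more than the tolerance, so the pair of assignments would be flagged.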

    Explain3D: Explaining Disagreements in Disjoint Datasets

    Data plays an important role in applications, analytic processes, and many aspects of human activity. As data grows in size and complexity, we are met with an imperative need for tools that promote understanding and explanations over data-related operations. Data management research on explanations has focused on the assumption that data resides in a single dataset, under one common schema. But the reality of today's data is that it is frequently un-integrated, coming from different sources with different schemas. When different datasets provide different answers to semantically similar questions, understanding the reasons for the discrepancies is challenging and cannot be handled by the existing single-dataset solutions. In this paper, we propose Explain3D, a framework for explaining the disagreements across disjoint datasets (3D). Explain3D focuses on identifying the reasons for the differences in the results of two semantically similar queries operating on two datasets with potentially different schemas. Our framework leverages the queries to perform a semantic mapping across the relevant parts of their provenance; discrepancies in this mapping point to causes of the queries' differences. Exploiting the queries gives Explain3D an edge over traditional schema matching and record linkage techniques, which are query-agnostic. Our work makes the following contributions: (1) We formalize the problem of deriving optimal explanations for the differences of the results of semantically similar queries over disjoint datasets. (2) We design a 3-stage framework for solving the optimal explanation problem. (3) We develop a smart-partitioning optimizer that improves the efficiency of the framework by orders of magnitude. (4) We experiment with real-world and synthetic data to demonstrate that Explain3D can derive precise explanations efficiently.

    Regulation and Evolution of Compliance in Common Pool Resources

    The paper jointly models the evolution of compliance with regulation and the evolution of a CPR stock, by combining replicator dynamics describing compliance with harvesting rules, with resource stock dynamics. This evolutionary approach suggests that coexistence, in long-run equilibrium, of both cooperative and non-cooperative rules under regulation is possible. Stock effects on profits and a certain structure of auditing probabilities could imply the emergence of a limit cycle in areas of low stock levels, as an equilibrium outcome for compliance and the biomass stock. It might be easier for the regulator to obtain full compliance under precommitment to fixed auditing probabilities.
    Keywords: common pool resources (CPR), harvesting, regulation, replicator dynamics, compliance

    A concise history of analytical accounting: examining the use of mathematical notions in our discipline.

    The paper offers a succinct survey of analytical-mathematical methods as employed in bookkeeping and accounting during some five millennia.
    Keywords: history of analytical accounting, use of mathematical notions, matrix algebra, information perspective, clean surplus theory, mathematical agency theory

    Sensitivity analysis in multilinear probabilistic models

    Sensitivity methods for the analysis of the outputs of discrete Bayesian networks have been extensively studied and implemented in different software packages. These methods usually focus on the study of sensitivity functions and on the impact of a parameter change on the Chan–Darwiche distance. Although not fully recognized, the majority of these results rely heavily on the multilinear structure of atomic probabilities in terms of the conditional probability parameters associated with this type of network. By defining a statistical model through the polynomial expression of its associated defining conditional probabilities, we develop here a unifying approach to sensitivity methods applicable to a large suite of models including extensions of Bayesian networks, for instance context-specific ones. Our algebraic approach enables us to prove that, for models whose defining polynomial is multilinear, both the Chan–Darwiche distance and any divergence in the family of ϕ-divergences are minimized for a certain class of multi-parameter contemporaneous variations when parameters are proportionally covaried.
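    A small sketch may help make proportional covariation and the Chan–Darwiche distance concrete. It is deliberately restricted to a single categorical distribution with one varied parameter, which is far narrower than the multi-parameter, multilinear setting the abstract addresses. The function names and the example numbers are hypothetical, and the equal-split scheme is included only as a point of comparison.

```python
import math


def proportional_covariation(dist, index, new_value):
    """Set dist[index] to new_value and rescale the other entries
    so they keep their original ratios and the total stays 1."""
    old_value = dist[index]
    scale = (1.0 - new_value) / (1.0 - old_value)
    return [new_value if i == index else p * scale for i, p in enumerate(dist)]


def equal_split_covariation(dist, index, new_value):
    """Comparison scheme: spread the remaining mass equally over the other entries."""
    share = (1.0 - new_value) / (len(dist) - 1)
    return [new_value if i == index else share for i in range(len(dist))]


def chan_darwiche_distance(p, q):
    """ln max_w q(w)/p(w) - ln min_w q(w)/p(w), for strictly positive p and q."""
    ratios = [qi / pi for pi, qi in zip(p, q)]
    return math.log(max(ratios)) - math.log(min(ratios))


original = [0.2, 0.3, 0.5]  # assumed example distribution
proportional = proportional_covariation(original, 0, 0.4)
equal_split = equal_split_covariation(original, 0, 0.4)

cd_proportional = chan_darwiche_distance(original, proportional)
cd_equal_split = chan_darwiche_distance(original, equal_split)
```

    In this toy case the proportional scheme yields the smaller distance of the two, which is consistent with the minimization result the abstract describes, though it does not demonstrate the general theorem.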