A Bayesian approach for on-line max and min auditing
In this paper we consider the on-line max and min query auditing problem: given a private association between fields in a data set, a sequence of max and min queries that have already been posed about the data, their corresponding answers, and a new query, deny the answer if private information would be inferred, or give the true answer otherwise. We give a probabilistic definition of privacy and demonstrate that max and min queries, without the "no duplicates" assumption, can be audited by means of a Bayesian network. Moreover, we show how our auditing approach is able to manage user prior knowledge.
Learning Fair Naive Bayes Classifiers by Discovering and Eliminating Discrimination Patterns
As machine learning is increasingly used to make real-world decisions, recent
research efforts aim to define and ensure fairness in algorithmic decision
making. Existing methods often assume a fixed set of observable features to
define individuals, but lack a discussion of certain features not being
observed at test time. In this paper, we study fairness of naive Bayes
classifiers, which allow partial observations. In particular, we introduce the
notion of a discrimination pattern, which refers to an individual receiving
different classifications depending on whether some sensitive attributes were
observed. Then a model is considered fair if it has no such pattern. We propose
an algorithm to discover and mine for discrimination patterns in a naive Bayes
classifier, and show how to learn maximum likelihood parameters subject to
these fairness constraints. Our approach iteratively discovers and eliminates
discrimination patterns until a fair model is learned. An empirical evaluation
on three real-world datasets demonstrates that we can remove exponentially many
discrimination patterns by only adding a small fraction of them as constraints.
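
The notion of a discrimination pattern can be illustrated with a minimal sketch. It assumes a toy naive Bayes model with one sensitive feature `S` and one non-sensitive feature `Y`, and it measures discrimination as the change in the posterior when `S` is additionally observed. The probabilities, the threshold `delta`, and the helper names are hypothetical and chosen only for illustration; they are not taken from the abstract.

```python
from math import prod

# Hypothetical naive Bayes model with a binary class D.
# All numbers are made up for illustration.
prior = {1: 0.4, 0: 0.6}            # P(D = d)
likelihood = {                       # P(feature = 1 | D = d)
    "S": {1: 0.7, 0: 0.3},           # sensitive feature
    "Y": {1: 0.8, 0: 0.5},           # non-sensitive feature
}

def posterior(evidence):
    """Return P(D = 1 | evidence); evidence maps observed features to 0 or 1.

    Features missing from `evidence` are left out of the product, which
    marginalizes them out in a naive Bayes model.
    """
    def joint(d):
        terms = [
            likelihood[f][d] if v == 1 else 1.0 - likelihood[f][d]
            for f, v in evidence.items()
        ]
        return prior[d] * prod(terms)

    j1, j0 = joint(1), joint(0)
    return j1 / (j1 + j0)

def discrimination_degree(sensitive, non_sensitive):
    """Change in P(D = 1) caused by additionally observing the sensitive part."""
    return posterior({**sensitive, **non_sensitive}) - posterior(non_sensitive)

delta = 0.05  # assumed tolerance; a pattern is flagged when |degree| > delta
degree = discrimination_degree({"S": 1}, {"Y": 1})
is_pattern = abs(degree) > delta
```

Because unobserved features simply drop out of the product, the same model can score both the full and the partial observation, which is what makes naive Bayes convenient for this comparison.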
Explain3D: Explaining Disagreements in Disjoint Datasets
Data plays an important role in applications, analytic processes, and many
aspects of human activity. As data grows in size and complexity, we are met
with an imperative need for tools that promote understanding and explanations
over data-related operations. Data management research on explanations has
focused on the assumption that data resides in a single dataset, under one
common schema. But the reality of today's data is that it is frequently
un-integrated, coming from different sources with different schemas. When
different datasets provide different answers to semantically similar questions,
understanding the reasons for the discrepancies is challenging and cannot be
handled by the existing single-dataset solutions.
In this paper, we propose Explain3D, a framework for explaining the
disagreements across disjoint datasets (3D). Explain3D focuses on identifying
the reasons for the differences in the results of two semantically similar
queries operating on two datasets with potentially different schemas. Our
framework leverages the queries to perform a semantic mapping across the
relevant parts of their provenance; discrepancies in this mapping point to
causes of the queries' differences. Exploiting the queries gives Explain3D an
edge over traditional schema matching and record linkage techniques, which are
query-agnostic. Our work makes the following contributions: (1) We formalize
the problem of deriving optimal explanations for the differences of the results
of semantically similar queries over disjoint datasets. (2) We design a 3-stage
framework for solving the optimal explanation problem. (3) We develop a
smart-partitioning optimizer that improves the efficiency of the framework by
orders of magnitude. (4) We experiment with real-world and synthetic data to
demonstrate that Explain3D can derive precise explanations efficiently.
Regulation and Evolution of Compliance in Common Pool Resources
The paper jointly models the evolution of compliance with regulation and the evolution of a CPR stock, by combining replicator dynamics describing compliance with harvesting rules, with resource stock dynamics. This evolutionary approach suggests that coexistence, in long-run equilibrium, of both cooperative and non-cooperative rules under regulation is possible. Stock effects on profits and a certain structure of auditing probabilities could imply the emergence of a limit cycle in areas of low stock levels, as an equilibrium outcome for compliance and the biomass stock. It might be easier for the regulator to obtain full compliance under precommitment to fixed auditing probabilities.
Keywords: Common pool resources (CPR), harvesting, regulation, replicator dynamics, compliance
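
A minimal sketch of how replicator dynamics for compliance might be coupled with stock dynamics is given below. Every functional form here is an assumption made for illustration and is not taken from the paper: logistic stock growth, profits linear in effort, a fixed audit probability with a fixed fine, and forward Euler integration. The function `simulate` and all parameter values are hypothetical.

```python
def simulate(x0=0.5, s0=0.6, dt=0.01, steps=5000):
    """Euler-integrate a two-strategy replicator equation coupled to a stock.

    x is the share of compliant harvesters, s is the resource stock.
    All parameter values below are hypothetical illustration values.
    """
    r, K = 0.5, 1.0                  # logistic growth rate and carrying capacity
    q, price, cost = 1.0, 1.0, 0.2   # catchability, unit price, unit effort cost
    e_c, e_nc = 0.1, 0.3             # effort of compliant / non-compliant harvesters
    audit_prob, fine = 0.3, 0.2      # fixed audit probability and fine

    x, s = x0, s0
    path = [(x, s)]
    for _ in range(steps):
        # Profits depend on the stock; violators face an expected fine.
        profit_c = (price * q * s - cost) * e_c
        profit_nc = (price * q * s - cost) * e_nc - audit_prob * fine
        # Replicator equation for two strategies.
        dx = x * (1.0 - x) * (profit_c - profit_nc)
        # Stock grows logistically and is reduced by the total harvest.
        total_effort = x * e_c + (1.0 - x) * e_nc
        ds = r * s * (1.0 - s / K) - q * s * total_effort
        x += dt * dx
        s += dt * ds
        path.append((x, s))
    return path

trajectory = simulate()
```

In this toy setting compliance pays off more when the stock is low, so the compliant share and the stock feed back on each other; whether the path settles or cycles depends entirely on the assumed forms and parameters.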
A concise history of analytical accounting: examining the use of mathematical notions in our discipline.
The paper offers a succinct survey of analytical-mathematical methods as employed in bookkeeping and accounting during some five millennia.
Keywords: History of analytical accounting, use of mathematical notions, matrix algebra, information perspective, clean surplus theory, mathematical agency theory.
Sensitivity analysis in multilinear probabilistic models
Sensitivity methods for the analysis of the outputs of discrete Bayesian networks have been extensively studied and implemented in different software packages. These methods usually focus on the study of sensitivity functions and on the impact of a parameter change to the Chan–Darwiche distance. Although not fully recognized, the majority of these results rely heavily on the multilinear structure of atomic probabilities in terms of the conditional probability parameters associated with this type of network. By defining a statistical model through the polynomial expression of its associated defining conditional probabilities, we develop here a unifying approach to sensitivity methods applicable to a large suite of models including extensions of Bayesian networks, for instance context-specific ones. Our algebraic approach enables us to prove that for models whose defining polynomial is multilinear both the Chan–Darwiche distance and any divergence in the family of φ-divergences are minimized for a certain class of multi-parameter contemporaneous variations when parameters are proportionally covaried.
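
The two ingredients named in this abstract can be sketched on a single probability vector. The sketch assumes the usual definition of the Chan–Darwiche distance as the log of the largest probability ratio minus the log of the smallest, and it assumes that proportional covariation rescales the untouched parameters by a common factor so the vector still sums to one. The function names and example numbers are hypothetical, and the sketch does not handle a full network.

```python
import math

def chan_darwiche_distance(p, q):
    """CD distance between two distributions over the same finite space.

    Assumes both distributions are strictly positive on every outcome.
    """
    ratios = [qi / pi for pi, qi in zip(p, q)]
    return math.log(max(ratios)) - math.log(min(ratios))

def proportional_covariation(theta, index, new_value):
    """Set theta[index] to new_value and rescale the remaining entries
    by a common factor so that the vector still sums to one."""
    scale = (1.0 - new_value) / (1.0 - theta[index])
    return [new_value if i == index else t * scale for i, t in enumerate(theta)]

theta = [0.2, 0.3, 0.5]                              # hypothetical parameters
varied = proportional_covariation(theta, 0, 0.4)     # raise the first entry to 0.4
distance = chan_darwiche_distance(theta, varied)
```

Under proportional covariation every untouched entry has the same ratio to its original value, so only two distinct ratios enter the distance: that of the varied parameter and that of the rest.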