6,131 research outputs found
On the Complexity of t-Closeness Anonymization and Related Problems
An important issue in releasing individual data is to protect the sensitive
information from being leaked and maliciously utilized. Famous privacy
preserving principles that aim to ensure both data privacy and data integrity,
such as k-anonymity and l-diversity, have been extensively studied both
theoretically and empirically. Nonetheless, these widely-adopted principles are
still insufficient to prevent attribute disclosure if the attacker has partial
knowledge about the overall sensitive data distribution. The t-closeness
principle has been proposed to fix this, which also has the benefit of
supporting numerical sensitive attributes. However, in contrast to
k-anonymity and l-diversity, the theoretical aspect of t-closeness has
not been well investigated.
We initiate the first systematic theoretical study on the t-closeness
principle under the commonly-used attribute suppression model. We prove that
for every constant t such that 0 ≤ t < 1, it is NP-hard to find an optimal
t-closeness generalization of a given table. The proof consists of several
reductions, each of which works for different values of t, which together
cover the full range. To complement this negative result, we also provide exact
and fixed-parameter algorithms. Finally, we answer some open questions
regarding the complexity of k-anonymity and l-diversity left in the
literature.
Comment: An extended abstract to appear in DASFAA 201
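As an illustration of the t-closeness principle discussed in the abstract above, here is a minimal sketch of a checker. It assumes a categorical sensitive attribute and measures closeness with the total variation distance; the abstract does not say which distance the paper uses, so that choice, along with the function names and the toy data, is an assumption made for this example only.

```python
from collections import Counter


def distribution(values):
    """Empirical distribution of a list of sensitive values."""
    counts = Counter(values)
    n = len(values)
    return {value: count / n for value, count in counts.items()}


def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def satisfies_t_closeness(groups, t):
    """True if every group's distribution is within distance t of the overall one.

    `groups` is a list of equivalence classes, each given as the list of
    sensitive values of its records (hypothetical input format).
    """
    overall = distribution([v for group in groups for v in group])
    # Small tolerance guards against floating-point rounding at the boundary.
    return all(
        total_variation(distribution(group), overall) <= t + 1e-12
        for group in groups
    )


groups = [["flu", "flu", "cancer"], ["flu", "cancer", "cancer"]]
print(satisfies_t_closeness(groups, 0.2))
print(satisfies_t_closeness(groups, 0.1))
```

In this toy table each group is at distance 1/6 from the overall distribution, so the check passes for t = 0.2 and fails for t = 0.1. The sketch only verifies a given partition; finding an optimal one is the problem the abstract shows to be NP-hard.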
Feature-Based Diversity Optimization for Problem Instance Classification
Understanding the behaviour of heuristic search methods is a challenge. This
even holds for simple local search methods such as 2-OPT for the Traveling
Salesperson problem. In this paper, we present a general framework that is able
to construct a diverse set of instances that are hard or easy for a given
search heuristic. Such a diverse set is obtained by using an evolutionary
algorithm for constructing hard or easy instances that are diverse with respect
to different features of the underlying problem. Examining the constructed
instance sets, we show that many combinations of two or three features give a
good classification of the TSP instances in terms of whether they are hard for
2-OPT to solve.
Comment: 20 pages, 18 figures
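For readers unfamiliar with the 2-OPT heuristic mentioned in the abstract above, the following is a minimal sketch of the textbook local search on Euclidean instances. It is not the framework from the paper; the function names and the four-city instance are illustrative assumptions.

```python
import math


def tour_length(points, tour):
    """Length of the closed tour visiting points in the given order."""
    n = len(tour)
    return sum(math.dist(points[tour[i]], points[tour[(i + 1) % n]]) for i in range(n))


def two_opt(points, tour):
    """Repeatedly replace two tour edges with two shorter ones until no move helps."""
    tour = list(tour)
    n = len(tour)
    improved = True
    while improved:
        improved = False
        for i in range(n - 1):
            for j in range(i + 2, n):
                if i == 0 and j == n - 1:
                    continue  # these two edges share a city, so no valid move
                a, b = tour[i], tour[i + 1]
                c, d = tour[j], tour[(j + 1) % n]
                # Change in length if edges (a,b),(c,d) become (a,c),(b,d).
                delta = (
                    math.dist(points[a], points[c])
                    + math.dist(points[b], points[d])
                    - math.dist(points[a], points[b])
                    - math.dist(points[c], points[d])
                )
                if delta < -1e-12:
                    # Reversing the segment between the edges applies the move.
                    tour[i + 1 : j + 1] = reversed(tour[i + 1 : j + 1])
                    improved = True
    return tour


points = [(0, 0), (1, 1), (1, 0), (0, 1)]
start = [0, 1, 2, 3]  # a self-crossing tour around the unit square
result = two_opt(points, start)
print(tour_length(points, start), tour_length(points, result))
```

On this instance a single move uncrosses the tour, reducing its length from 2 + 2√2 to 4. Instances that are hard for 2-OPT are those where such local moves stall far from the optimum.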
Clustering with diversity
We consider the {\em clustering with diversity} problem: given a set of
colored points in a metric space, partition them into clusters such that each
cluster has at least ℓ points, all of which have distinct colors.
We give a 2-approximation to this problem for any ℓ when the objective
is to minimize the maximum radius of any cluster. We show that the
approximation ratio is optimal unless P = NP, by providing a matching
lower bound. Several extensions to our algorithm have also been developed for
handling outliers. This problem is mainly motivated by applications in
privacy-preserving data publication.
Comment: Extended abstract accepted in ICALP 2010. Keywords: Approximation
algorithm, k-center, k-anonymity, l-diversity
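The problem statement in the abstract above can be made concrete with a small sketch that checks whether a clustering is feasible and evaluates its objective. It does not implement the 2-approximation itself. The names, the input format, and the convention that a cluster's center is one of its own points are assumptions for this example.

```python
import math


def is_feasible(clusters, colors, min_size):
    """Each cluster needs at least `min_size` points, all with distinct colors."""
    for cluster in clusters:
        cluster_colors = [colors[i] for i in cluster]
        if len(cluster) < min_size:
            return False
        if len(set(cluster_colors)) != len(cluster_colors):
            return False
    return True


def max_radius(clusters, points):
    """Largest cluster radius, taking the best center among each cluster's points."""

    def radius(cluster):
        return min(
            max(math.dist(points[center], points[i]) for i in cluster)
            for center in cluster
        )

    return max(radius(cluster) for cluster in clusters)


points = [(0, 0), (1, 0), (10, 0), (11, 0)]
colors = ["red", "blue", "red", "blue"]

good = [[0, 1], [2, 3]]  # two colors per cluster, nearby points
bad = [[0, 2], [1, 3]]   # each cluster repeats a color

print(is_feasible(good, colors, 2), max_radius(good, points))
print(is_feasible(bad, colors, 2))
```

Here the first clustering is feasible with maximum radius 1, while the second violates the distinct-colors requirement.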
Multiwinner Voting with Fairness Constraints
Multiwinner voting rules are used to select a small representative subset of
candidates or items from a larger set given the preferences of voters. However,
if candidates have sensitive attributes such as gender or ethnicity (when
selecting a committee), or specified types such as political leaning (when
selecting a subset of news items), an algorithm that chooses a subset by
optimizing a multiwinner voting rule may be unbalanced in its selection -- it
may under- or over-represent a particular gender or political orientation in the
examples above. We introduce an algorithmic framework for multiwinner voting
problems when there is an additional requirement that the selected subset
should be "fair" with respect to a given set of attributes. Our framework
provides the flexibility to (1) specify fairness with respect to multiple,
non-disjoint attributes (e.g., ethnicity and gender) and (2) specify a score
function. We study the computational complexity of this constrained multiwinner
voting problem for monotone and submodular score functions and present several
approximation algorithms and matching hardness of approximation results for
various attribute group structures and types of score functions. We also present
simulations that suggest that adding fairness constraints may not affect the
scores significantly when compared to the unconstrained case.
Comment: The conference version of this paper appears in IJCAI-ECAI 201
Submodular Optimization with Submodular Cover and Submodular Knapsack Constraints
We investigate two new optimization problems -- minimizing a submodular
function subject to a submodular lower bound constraint (submodular cover) and
maximizing a submodular function subject to a submodular upper bound constraint
(submodular knapsack). We are motivated by a number of real-world applications
in machine learning including sensor placement and data subset selection, which
require maximizing a certain submodular function (like coverage or diversity)
while simultaneously minimizing another (like cooperative cost). These problems
are often posed as minimizing the difference between submodular functions [14,
35] which is in the worst case inapproximable. We show, however, that by
phrasing these problems as constrained optimization, which is more natural for
many applications, we achieve a number of bounded approximation guarantees. We
also show that both these problems are closely related and an approximation
algorithm solving one can be used to obtain an approximation guarantee for the
other. We provide hardness results for both problems thus showing that our
approximation factors are tight up to log-factors. Finally, we empirically
demonstrate the performance and good scalability properties of our algorithms.
Comment: 23 pages. A short version of this appeared in Advances of NIPS-201
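The two problems described in the abstract above can be written compactly as follows. The symbols are assumptions introduced here for illustration: $V$ is the ground set, $f$ and $g$ are submodular set functions on $V$, and $c$ and $b$ are the cover and budget thresholds.

```latex
\[
\text{(submodular cover)} \qquad
\min_{X \subseteq V} \; f(X) \quad \text{subject to} \quad g(X) \ge c
\]
\[
\text{(submodular knapsack)} \qquad
\max_{X \subseteq V} \; g(X) \quad \text{subject to} \quad f(X) \le b
\]
```

Written this way, the two problems differ only in which function is the objective and which is the constraint, which is consistent with the abstract's remark that an approximation algorithm for one can be used to obtain a guarantee for the other.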
Multiwinner Elections with Diversity Constraints
We develop a model of multiwinner elections that combines performance-based
measures of the quality of the committee (such as, e.g., Borda scores of the
committee members) with diversity constraints. Specifically, we assume that the
candidates have certain attributes (such as being a male or a female, being
junior or senior, etc.) and the goal is to elect a committee that, on the one
hand, has as high a score as possible under a given performance measure, but that, on
the other hand, meets certain requirements (e.g., of the form "at least 30%
of the committee members are junior candidates and at least 40% are
females"). We analyze the computational complexity of computing winning
committees in this model, obtaining polynomial-time algorithms (exact and
approximate) and NP-hardness results. We focus on several natural classes of
voting rules and diversity constraints.
Comment: A short version of this paper appears in the proceedings of AAAI-1
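The model in the abstract above can be illustrated with a brute-force sketch for tiny instances. It assumes an additive performance measure (each candidate has a fixed score, as with Borda scores summed over voters) and lower-bound constraints on attribute counts. The function name, input format, and data are hypothetical, and exhaustive search is used only for clarity; it is not one of the paper's algorithms.

```python
from itertools import combinations


def best_committee(scores, attributes, size, lower_bounds):
    """Highest-scoring committee of the given size meeting all lower bounds.

    scores: candidate -> numeric score (additive performance measure)
    attributes: candidate -> set of attribute labels
    lower_bounds: attribute label -> minimum number of members having it
    Returns (committee, score), or (None, -inf) if no committee is feasible.
    """
    best, best_score = None, float("-inf")
    for committee in combinations(sorted(scores), size):
        feasible = all(
            sum(1 for c in committee if label in attributes[c]) >= bound
            for label, bound in lower_bounds.items()
        )
        total = sum(scores[c] for c in committee)
        if feasible and total > best_score:
            best, best_score = committee, total
    return best, best_score


scores = {"a": 10, "b": 9, "c": 5, "d": 4}
attributes = {
    "a": {"senior", "male"},
    "b": {"senior", "male"},
    "c": {"junior", "female"},
    "d": {"junior", "male"},
}

print(best_committee(scores, attributes, 2, {}))
print(best_committee(scores, attributes, 2, {"junior": 1, "female": 1}))
```

Without constraints the two top-scoring candidates win with a total of 19. Requiring one junior and one female member forces candidate "c" onto the committee and lowers the best achievable score to 15.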
Max-sum diversity via convex programming
Diversity maximization is an important concept in information retrieval,
computational geometry and operations research. Usually, it is a variant of the
following problem: Given a ground set, constraints, and a function $f(\cdot)$
that measures diversity of a subset, the task is to select a feasible subset $S$
such that $f(S)$ is maximized. The \emph{sum-dispersion} function
$f(S) = \sum_{\{x,y\} \subseteq S} d(x,y)$, which is the sum of the pairwise distances in $S$, is
in this context a prominent diversification measure. The corresponding
diversity maximization is the \emph{max-sum} or \emph{sum-sum diversification}.
Many recent results deal with the design of constant-factor approximation
algorithms for diversification problems involving the sum-dispersion function under
a matroid constraint. In this paper, we present a PTAS for the max-sum
diversification problem under a matroid constraint for distances $d(\cdot,\cdot)$
of \emph{negative type}. Distances of negative type are, for
example, metric distances stemming from the $\ell_2$ and $\ell_1$ norms, as well
as the cosine or spherical, or Jaccard distance, which are popular similarity
metrics in web and image search.
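To make the sum-dispersion objective from the abstract above concrete, here is a minimal sketch that evaluates it and runs a simple greedy heuristic under a cardinality constraint (a special case of a matroid constraint). The greedy rule is a stand-in chosen for brevity and is not the PTAS from the paper. The function names and the Euclidean toy data are assumptions.

```python
import math
from itertools import combinations


def sum_dispersion(points, subset):
    """Sum of pairwise Euclidean distances among the selected points."""
    return sum(math.dist(points[i], points[j]) for i, j in combinations(subset, 2))


def greedy_max_sum(points, k):
    """Greedy heuristic: seed with the farthest pair, then add the point that
    increases the sum-dispersion the most. Assumes 2 <= k <= len(points)."""
    n = len(points)
    chosen = list(
        max(
            combinations(range(n), 2),
            key=lambda pair: math.dist(points[pair[0]], points[pair[1]]),
        )
    )
    while len(chosen) < k:
        remaining = [i for i in range(n) if i not in chosen]
        best = max(
            remaining,
            key=lambda i: sum(math.dist(points[i], points[j]) for j in chosen),
        )
        chosen.append(best)
    return sorted(chosen)


points = [(0, 0), (10, 0), (5, 1), (5, 8)]
selected = greedy_max_sum(points, 3)
print(selected, sum_dispersion(points, selected))
```

On this instance the heuristic keeps the two far-apart points and the one far from both of them, dropping the point that lies almost between them.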