Robust Group Linkage
We study the problem of group linkage: linking records that refer to entities
in the same group. Applications for group linkage include finding businesses in
the same chain, finding conference attendees from the same affiliation, finding
players from the same team, etc. Group linkage faces challenges not present for
traditional record linkage. First, although different members in the same group
can share some similar global values of an attribute, they represent different
entities so can also have distinct local values for the same or different
attributes, requiring a high tolerance for value diversity. Second, groups can
be huge (with tens of thousands of records), requiring high scalability even
after using good blocking strategies.
We present a two-stage algorithm: the first stage identifies cores containing
records that are very likely to belong to the same group, while being robust to
possible erroneous values; the second stage collects strong evidence from the
cores and leverages it for merging more records into the same group, while
being tolerant to differences in local values of an attribute. Experimental
results show the high effectiveness and efficiency of our algorithm on various
real-world data sets.
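
The two-stage idea described above can be illustrated with a toy sketch. This is not the paper's algorithm: the abstract gives no details about how cores are built or how evidence is scored, so the attribute names (`name`, `domain`, `phone`), the `min_shared` threshold, and the helpers `find_cores` and `expand_cores` are all hypothetical. The pairwise comparison is quadratic and would not scale to the group sizes the abstract mentions; it is only meant to show the shape of a strict first stage followed by a more tolerant second stage.

```python
from collections import defaultdict
from itertools import combinations

def find_cores(records, min_shared=2):
    """Stage 1 (toy): link two records only if they agree on at least
    min_shared attributes, then keep connected components of size > 1."""
    parent = list(range(len(records)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(records)), 2):
        shared = sum(
            1 for key, value in records[i].items()
            if value is not None and records[j].get(key) == value
        )
        if shared >= min_shared:
            parent[find(i)] = find(j)

    components = defaultdict(list)
    for i in range(len(records)):
        components[find(i)].append(i)
    return [sorted(c) for c in components.values() if len(c) > 1]

def expand_cores(records, cores, global_attr):
    """Stage 2 (toy): attach each leftover record to the core whose members
    use the same value of global_attr, ignoring all other (local) values."""
    value_to_core = {}
    for core_id, members in enumerate(cores):
        for i in members:
            value = records[i].get(global_attr)
            if value is not None:
                value_to_core.setdefault(value, core_id)

    groups = [list(members) for members in cores]
    assigned = {i for members in cores for i in members}
    for i, record in enumerate(records):
        if i in assigned:
            continue
        core_id = value_to_core.get(record.get(global_attr))
        if core_id is not None:
            groups[core_id].append(i)
    return groups

# Hypothetical example data: two chains and one unrelated business.
records = [
    {"name": "Acme Hardware", "domain": "acme.example", "phone": "555-0100"},
    {"name": "Acme Hardware", "domain": "acme.example", "phone": "555-0101"},
    {"name": "Acme Hardware Downtown", "domain": "acme.example", "phone": "555-0102"},
    {"name": "Bolt Cafe", "domain": "bolt.example", "phone": "555-0200"},
    {"name": "Bolt Cafe", "domain": "bolt.example", "phone": "555-0201"},
    {"name": "Corner Shop", "domain": "corner.example", "phone": "555-0300"},
]
cores = find_cores(records)
groups = expand_cores(records, cores, global_attr="domain")
```

In this sketch, record 2 is left out of the first-stage core because it matches the other Acme records on only one attribute, and it is merged in the second stage because it shares the core's `domain` value while differing in its local `name` and `phone`.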
k-anonymous Microdata Release via Post Randomisation Method
The problem of the release of anonymized microdata is an important topic in
the fields of statistical disclosure control (SDC) and privacy preserving data
publishing (PPDP), yet it remains largely unsolved. In these research
fields, k-anonymity has been widely studied as an anonymity notion for mainly
deterministic anonymization algorithms, and some probabilistic relaxations have
been developed. However, they are not sufficient due to their limitations,
i.e., being weaker than the original k-anonymity or requiring strong parametric
assumptions. First we propose Pk-anonymity, a new probabilistic k-anonymity,
and prove that Pk-anonymity is a mathematical extension of k-anonymity rather
than a relaxation. Furthermore, Pk-anonymity requires no parametric
assumptions. This property is significant because it enables us to compare
privacy levels of probabilistic microdata release
algorithms with deterministic ones. Second, we apply Pk-anonymity to the post
randomization method (PRAM), which is an SDC algorithm based on randomization.
PRAM is proven to satisfy Pk-anonymity in a controlled way, i.e., one can
control PRAM's parameter so that Pk-anonymity is satisfied. On the other hand,
PRAM is also known to satisfy ε-differential privacy, a recent
popular and strong privacy notion. This fact means that our results
significantly enhance PRAM since it implies the satisfaction of both important
notions: k-anonymity and ε-differential privacy.
Comment: 22 pages, 4 figures
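
As a rough illustration of how PRAM randomizes a categorical attribute, the sketch below builds a transition matrix and applies it to each value independently. The uniform off-diagonal matrix, the `retain_prob` parameter, and the function names are assumptions made for this example; the abstract does not specify the matrix or how its parameter is chosen to reach Pk-anonymity, and this sketch does not compute a Pk-anonymity level. The `epsilon_of` helper uses the usual log-ratio bound for randomized-response-style mechanisms: the largest log-ratio between two entries of the same column of the matrix.

```python
import math
import random

def pram_matrix(categories, retain_prob):
    """Hypothetical transition matrix: keep a value with probability
    retain_prob, otherwise replace it uniformly with another category.
    Assumes 0 < retain_prob < 1 and at least two categories."""
    off_diagonal = (1.0 - retain_prob) / (len(categories) - 1)
    return {
        a: {b: (retain_prob if a == b else off_diagonal) for b in categories}
        for a in categories
    }

def apply_pram(values, matrix, rng):
    """Replace each value independently by sampling from its matrix row."""
    released = []
    for value in values:
        row = matrix[value]
        targets = list(row)
        weights = [row[t] for t in targets]
        released.append(rng.choices(targets, weights=weights, k=1)[0])
    return released

def epsilon_of(matrix):
    """Largest log-ratio between two entries in the same column, i.e. how
    much one input value can change the probability of any output value."""
    categories = list(matrix)
    worst = 0.0
    for b in categories:
        column = [matrix[a][b] for a in categories]
        worst = max(worst, math.log(max(column) / min(column)))
    return worst

categories = ["A", "B", "C"]
matrix = pram_matrix(categories, retain_prob=0.7)
rng = random.Random(0)  # fixed seed so the run is repeatable
released = apply_pram(["A", "A", "B", "C", "C"], matrix, rng)
eps = epsilon_of(matrix)
```

With three categories and `retain_prob=0.7`, each off-diagonal entry is about 0.15, so the bound is ln(0.7 / 0.15), roughly 1.54. Lowering `retain_prob` toward 1/3 drives this value toward 0 at the cost of more distortion in the released data.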