Visualizing and Interacting with Concept Hierarchies
Concept Hierarchies and Formal Concept Analysis (FCA) are theoretically
well-grounded and extensively tested methods. They rely on line diagrams called
Galois lattices for visualizing and analysing object-attribute sets. Galois
lattices are visually appealing and conceptually rich for experts. However, they
have significant drawbacks due to their concept-oriented overall structure:
analysing what they show is difficult for non-experts, navigation is
cumbersome, interaction is poor, and scalability is a serious bottleneck for
visual interpretation, even for experts. In this paper we introduce semantic
probes as a means to overcome many of these problems and extend usability and
application possibilities of traditional FCA visualization methods. Semantic
probes are visual, user-centred objects which extract and organize reduced
Galois sub-hierarchies. They are simpler, clearer, and they provide better
navigation support through a rich set of interaction possibilities. Since
probe-driven sub-hierarchies are limited to the user's focus, scalability is
under control and interpretation is facilitated. After some successful
experiments, several applications are being developed, with the remaining
problem of finding a compromise between simplicity and conceptual expressivity.
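To make the object-attribute setting concrete, the following is a minimal sketch of how the formal concepts of a small context can be enumerated, and how a sub-hierarchy might be restricted to a user's focus. The toy context, the brute-force enumeration, and the `focus_subhierarchy` helper are all illustrative assumptions; in particular, the helper is only a crude stand-in for a focus-limited view and is not the semantic probe construction from the paper.

```python
from itertools import combinations


def extent(context, attrs):
    """Objects that have every attribute in attrs."""
    return frozenset(o for o, a in context.items() if attrs <= a)


def intent(context, objs, all_attrs):
    """Attributes shared by every object in objs (all attributes if objs is empty)."""
    shared = set(all_attrs)
    for o in objs:
        shared &= context[o]
    return frozenset(shared)


def formal_concepts(context):
    """Brute-force enumeration: close every attribute subset (fine for tiny contexts only)."""
    all_attrs = frozenset().union(*context.values())
    concepts = set()
    for size in range(len(all_attrs) + 1):
        for subset in combinations(sorted(all_attrs), size):
            e = extent(context, frozenset(subset))
            concepts.add((e, intent(context, e, all_attrs)))
    return concepts


def focus_subhierarchy(concepts, focus_attrs):
    """Hypothetical helper: keep only concepts whose intent contains the focus attributes."""
    return {(e, i) for e, i in concepts if focus_attrs <= i}


# Toy context (an assumption for illustration): object -> set of attributes.
context = {
    "duck": {"flies", "swims"},
    "eagle": {"flies", "hunts"},
    "shark": {"swims", "hunts"},
}

if __name__ == "__main__":
    concepts = formal_concepts(context)
    for e, i in sorted(focus_subhierarchy(concepts, frozenset({"flies"})), key=lambda c: len(c[1])):
        print(sorted(e), sorted(i))
```

The full lattice grows quickly with the number of attributes, which is the scalability issue the abstract describes; restricting attention to a focus keeps the displayed set small.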
Privacy via the Johnson-Lindenstrauss Transform
Suppose that party A collects private information about its users, where each
user's data is represented as a bit vector. Suppose that party B has a
proprietary data mining algorithm that requires estimating the distance between
users, such as clustering or nearest neighbors. We ask if it is possible for
party A to publish some information about each user so that B can estimate the
distance between users without being able to infer any private bit of a user.
Our method involves projecting each user's representation into a random,
lower-dimensional space via a sparse Johnson-Lindenstrauss transform and then
adding Gaussian noise to each entry of the lower-dimensional representation. We
show that the method preserves differential privacy, where the more privacy is
desired, the larger the variance of the Gaussian noise. Further, we show how to
approximate the true distances between users via only the lower-dimensional,
perturbed data. Finally, we consider other perturbation methods such as
randomized response and draw comparisons to sketch-based methods. While the
goal of releasing user-specific data to third parties is broader than
preserving distances, this work shows that distance computation with privacy
is an achievable goal.
Comment: 24 pages
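The project-then-perturb pipeline described in this abstract can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the paper's construction: the sparse projection uses Achlioptas-style entries (+1, 0, -1 with probabilities 1/6, 2/3, 1/6, scaled by sqrt(3/k)), the function names are hypothetical, and `sigma` is left as a free parameter rather than calibrated to a privacy budget. The distance estimate subtracts the expected noise contribution, 2·k·sigma², from the squared distance between the published vectors.

```python
import math
import random


def sparse_jl_matrix(d, k, rng):
    """k x d sparse projection (assumed Achlioptas-style); each entry has variance 1/k."""
    scale = math.sqrt(3.0 / k)
    return [[scale * rng.choice((1, 0, 0, 0, 0, -1)) for _ in range(d)]
            for _ in range(k)]


def publish(bits, matrix, sigma, rng):
    """Project a bit vector and add independent Gaussian noise to every entry."""
    return [sum(p * b for p, b in zip(row, bits)) + rng.gauss(0.0, sigma)
            for row in matrix]


def estimate_sq_distance(y_u, y_v, sigma):
    """Estimate the squared distance from published vectors only.

    Each published vector carries noise of variance sigma^2 per entry, so the
    difference carries 2 * sigma^2 per entry; subtract that expected bias.
    """
    k = len(y_u)
    raw = sum((a - b) ** 2 for a, b in zip(y_u, y_v))
    return raw - 2 * k * sigma ** 2


if __name__ == "__main__":
    rng = random.Random(7)
    d, k, sigma = 32, 256, 0.5           # illustrative sizes, not from the paper
    matrix = sparse_jl_matrix(d, k, rng)
    user_a = [rng.randint(0, 1) for _ in range(d)]
    user_b = [rng.randint(0, 1) for _ in range(d)]
    true_sq = sum((a - b) ** 2 for a, b in zip(user_a, user_b))  # Hamming distance for bits
    est = estimate_sq_distance(publish(user_a, matrix, sigma, rng),
                               publish(user_b, matrix, sigma, rng), sigma)
    print("true:", true_sq, "estimated:", round(est, 2))
```

Increasing `sigma` makes each published vector less informative about any single bit, at the price of a noisier distance estimate; increasing `k` reduces the projection error but also accumulates more noise terms.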
A Framework for High-Accuracy Privacy-Preserving Mining
To preserve client privacy in the data mining process, a variety of
techniques based on random perturbation of data records have been proposed
recently. In this paper, we present a generalized matrix-theoretic model of
random perturbation, which facilitates a systematic approach to the design of
perturbation mechanisms for privacy-preserving mining. Specifically, we
demonstrate that (a) the prior techniques differ only in their settings for the
model parameters, and (b) through appropriate choice of parameter settings, we
can derive new perturbation techniques that provide highly accurate mining
results even under strict privacy guarantees. We also propose a novel
perturbation mechanism wherein the model parameters are themselves
characterized as random variables, and demonstrate that this feature provides
significant improvements in privacy at a very marginal cost in accuracy.
While our model is valid for random-perturbation-based privacy-preserving
mining in general, we specifically evaluate its utility here with regard to
frequent-itemset mining on a variety of real datasets. The experimental results
indicate that our mechanisms incur substantially lower identity and support
errors than the prior techniques.
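As a rough illustration of what a matrix-theoretic view of random perturbation looks like, the sketch below builds a column-stochastic perturbation matrix, applies it to a vector of true category counts, and recovers the counts from the perturbed ones. The specific matrix shape (one value on the diagonal, another everywhere else, controlled by a ratio `gamma`) and all function names are assumptions chosen for simplicity; they are not presented as the paper's parameter settings. The code works with expected counts, so it is deterministic; a real client would sample a perturbed value per record.

```python
def perturbation_matrix(n, gamma):
    """n x n matrix A with A[v][u] = P(value u is reported as v).

    A record keeps its value with probability gamma * off and moves to each
    other value with probability off, so every column sums to 1.
    """
    off = 1.0 / (gamma + n - 1)
    return [[gamma * off if u == v else off for u in range(n)] for v in range(n)]


def expected_perturbed_counts(matrix, counts):
    """Expected counts the miner observes: y = A x."""
    return [sum(a * c for a, c in zip(row, counts)) for row in matrix]


def reconstruct_counts(perturbed, n, gamma):
    """Invert y = A x in closed form for this matrix shape.

    With A = off * ((gamma - 1) * I + J) and N = sum(x) = sum(y):
    y_v = off * ((gamma - 1) * x_v + N), hence x_v = (y_v / off - N) / (gamma - 1).
    Requires gamma != 1 (gamma == 1 is uniform noise and is not invertible).
    """
    off = 1.0 / (gamma + n - 1)
    total = sum(perturbed)
    return [(y / off - total) / (gamma - 1) for y in perturbed]


if __name__ == "__main__":
    true_counts = [50, 30, 20]            # illustrative data, not from the paper
    gamma = 4.0
    A = perturbation_matrix(len(true_counts), gamma)
    observed = expected_perturbed_counts(A, true_counts)
    print("observed:", [round(y, 3) for y in observed])
    print("reconstructed:", [round(x, 3) for x in reconstruct_counts(observed, 3, gamma)])
```

In this toy setup, a `gamma` closer to 1 randomizes records more heavily, while the reconstruction divides by `gamma - 1` and so amplifies sampling error more; that tension is the privacy/accuracy trade-off the abstract refers to.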