Blowfish Privacy: Tuning Privacy-Utility Trade-offs using Policies
Privacy definitions provide ways of trading off the privacy of individuals
in a statistical database for the utility of downstream analysis of the data.
In this paper, we present Blowfish, a class of privacy definitions inspired by
the Pufferfish framework, that provides a rich interface for this trade-off. In
particular, we allow data publishers to extend differential privacy using a
policy, which specifies (a) secrets, or information that must be kept secret,
and (b) constraints that may be known about the data. While the secret
specification allows increased utility by lessening protection for certain
individual properties, the constraint specification provides added protection
against an adversary who knows correlations in the data (arising from
constraints). We formalize policies and present novel algorithms that can
handle general specifications of sensitive information and certain count
constraints. We show that there are reasonable policies under which our privacy
mechanisms for k-means clustering, histograms and range queries introduce
significantly less noise than their differentially private counterparts. We
quantify the privacy-utility trade-offs for various policies analytically and
empirically on real datasets.
Comment: Full version of the paper at SIGMOD'14 Snowbird, Utah US
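For orientation, the differentially private baseline that the abstract compares against can be sketched with the standard Laplace mechanism for a histogram. This is not the Blowfish mechanism itself; it is a minimal illustration under stated assumptions. The function names `laplace_noise` and `noisy_histogram` are hypothetical, and the sensitivity of 2 assumes neighboring datasets differ in the value of one record (one bin loses a count, another gains one).

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def noisy_histogram(values, bins, epsilon, sensitivity=2.0, seed=None):
    """Return per-bin counts perturbed with Laplace(sensitivity / epsilon) noise.

    Assumption: changing one record's value alters two bin counts by 1 each,
    so the L1 sensitivity of the histogram is 2.
    """
    rng = random.Random(seed)
    counts = {b: 0 for b in bins}
    for v in values:
        counts[v] += 1
    scale = sensitivity / epsilon
    return {b: c + laplace_noise(scale, rng) for b, c in counts.items()}


if __name__ == "__main__":
    data = ["a", "b", "a", "c", "a", "b"]
    print(noisy_histogram(data, bins=["a", "b", "c"], epsilon=1.0, seed=0))
```

A smaller `epsilon` gives stronger privacy and a larger noise scale. The abstract's claim is that, under suitable policies, Blowfish mechanisms need less noise than this kind of baseline.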
A PAC-Bayesian Analysis of Graph Clustering and Pairwise Clustering
We formulate weighted graph clustering as a prediction problem: given a
subset of edge weights we analyze the ability of graph clustering to predict
the remaining edge weights. This formulation enables practical and theoretical
comparison of different approaches to graph clustering as well as comparison of
graph clustering with other possible ways to model the graph. We adapt the
PAC-Bayesian analysis of co-clustering (Seldin and Tishby, 2008; Seldin, 2009)
to derive a PAC-Bayesian generalization bound for graph clustering. The bound
shows that graph clustering should optimize a trade-off between empirical data
fit and the mutual information that clusters preserve on the graph nodes. A
similar trade-off derived from information-theoretic considerations was already
shown to produce state-of-the-art results in practice (Slonim et al., 2005;
Yom-Tov and Slonim, 2009). This paper supports the empirical evidence by
providing a better theoretical foundation, suggesting formal generalization
guarantees, and offering a more accurate way to deal with finite sample issues.
We derive a bound minimization algorithm and show that it provides good results
in real-life problems and that the derived PAC-Bayesian bound is reasonably
tight.
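The prediction view of graph clustering described above can be illustrated with a small sketch. It is not the paper's bound-minimization algorithm. It assumes a hard clustering, predicts a held-out edge weight as the mean observed weight between the two clusters involved, and measures the information the clusters keep about the nodes as the entropy of the cluster sizes (which equals the mutual information between nodes and clusters when nodes are uniform and assignments are deterministic). The names `predict_weight` and `cluster_entropy` are hypothetical.

```python
import math


def predict_weight(observed, assign, i, j):
    """Predict the weight of edge (i, j) from observed edges between the same cluster pair."""
    target = {assign[i], assign[j]}
    pool = [
        w
        for (a, b), w in observed.items()
        if {assign[a], assign[b]} == target and {a, b} != {i, j}
    ]
    # With no observed edge between these clusters, fall back to 0.0 (an assumption).
    return sum(pool) / len(pool) if pool else 0.0


def cluster_entropy(assign):
    """Entropy (in nats) of the cluster-size distribution under a hard assignment."""
    n = len(assign)
    sizes = {}
    for c in assign.values():
        sizes[c] = sizes.get(c, 0) + 1
    return -sum((s / n) * math.log(s / n) for s in sizes.values())


if __name__ == "__main__":
    assign = {0: "A", 1: "A", 2: "B", 3: "B"}
    observed = {(0, 1): 0.9, (2, 3): 0.8, (0, 2): 0.1, (1, 3): 0.2}
    print(predict_weight(observed, assign, 0, 3))  # held-out edge between A and B
    print(cluster_entropy(assign))
```

Finer clusterings can fit the observed weights more closely but carry more information about the nodes, which is the trade-off the abstract refers to.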
A Convex Relaxation for Weakly Supervised Classifiers
This paper introduces a general multi-class approach to weakly supervised
classification. Inferring the labels and learning the parameters of the model
is usually done jointly through a block-coordinate descent algorithm such as
expectation-maximization (EM), which may lead to local minima. To avoid this
problem, we propose a cost function based on a convex relaxation of the
soft-max loss. We then propose an algorithm specifically designed to
efficiently solve the corresponding semidefinite program (SDP). Empirically,
our method compares favorably to standard ones on different datasets for
multiple instance learning and semi-supervised learning as well as on
clustering tasks.
Comment: Appears in Proceedings of the 29th International Conference on
Machine Learning (ICML 2012)
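The local-minimum problem that motivates the convex relaxation can be seen in a toy case. The sketch below does not implement the paper's SDP method. It swaps in plain k-means, a hard-assignment relative of EM, as an example of block-coordinate descent: one block updates the labels with the centers fixed, the other updates the centers with the labels fixed. The function names and the four-point dataset are illustrative assumptions.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, centers, iters=100):
    """Alternate label and center updates until the centers stop changing."""
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(iters):
        # Block 1: centers fixed, assign each point to its nearest center.
        labels = [
            min(range(len(centers)), key=lambda k: sq_dist(p, centers[k]))
            for p in points
        ]
        # Block 2: labels fixed, move each center to the mean of its members.
        new_centers = []
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centers.append(
                    tuple(sum(xs) / len(members) for xs in zip(*members))
                )
            else:
                new_centers.append(centers[k])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    cost = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, centers, cost


if __name__ == "__main__":
    # Corners of a 4-by-1 rectangle: the natural split is left versus right.
    pts = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]
    print(kmeans(pts, [(0.0, 0.5), (4.0, 0.5)])[2])  # left/right initialization
    print(kmeans(pts, [(2.0, 0.0), (2.0, 1.0)])[2])  # top/bottom initialization
```

Both runs stop at a fixed point of the alternation, but the top/bottom initialization stays at a much higher cost than the left/right one. A convex objective removes this dependence on initialization.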
Unsupervised Learning from Narrated Instruction Videos
We address the problem of automatically learning the main steps to complete a
certain task, such as changing a car tire, from a set of narrated instruction
videos. The contributions of this paper are three-fold. First, we develop a new
unsupervised learning approach that takes advantage of the complementary nature
of the input video and the associated narration. The method solves two
clustering problems, one in text and one in video, applied one after the other
and linked by joint constraints to obtain a single coherent sequence of steps
in both modalities. Second, we collect and annotate a new challenging dataset
of real-world instruction videos from the Internet. The dataset contains about
800,000 frames for five different tasks that include complex interactions
between people and objects, and are captured in a variety of indoor and outdoor
settings. Third, we experimentally demonstrate that the proposed method can
automatically discover, in an unsupervised manner, the main steps to achieve
the task and locate the steps in the input videos.
Comment: Appears in: 2016 IEEE Conference on Computer Vision and Pattern
Recognition (CVPR 2016). 21 pages