Research outputs

    Explicit Learning Curves for Transduction and Application to Clustering and Compression Algorithms

    Inductive learning is based on inferring a general rule from a finite data set and using it to label new data. In transduction one attempts to solve the problem of using a labeled training set to label a set of unlabeled points, which are given to the learner prior to learning. Although transduction seems at the outset to be an easier task than induction, there have not been many provably useful algorithms for transduction. Moreover, the precise relation between induction and transduction has not yet been determined. The main theoretical developments related to transduction were presented by Vapnik more than twenty years ago. One of Vapnik's basic results is a rather tight error bound for transductive classification based on an exact computation of the hypergeometric tail. While tight, this bound is given implicitly via a computational routine. Our first contribution is a somewhat looser but explicit characterization of a slightly extended PAC-Bayesian version of Vapnik's transductive bound. This characterization is obtained using concentration inequalities for the tail of sums of random variables obtained by sampling without replacement. We then derive error bounds for compression schemes such as (transductive) support vector machines and for transduction algorithms based on clustering. The main observation used for deriving these new error bounds and algorithms is that the unlabeled test points, which in the transductive setting are known in advance, can be used to construct useful data-dependent prior distributions over the hypothesis space.
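
    The hypergeometric tail at the heart of Vapnik's implicit bound is straightforward to compute: under a uniformly random split of the full sample into training and test sets, the number of errors that land in the training set follows a hypergeometric distribution. A minimal sketch of that tail computation, assuming scipy; the function name and the toy numbers are purely illustrative and not taken from the paper:

```python
from scipy.stats import hypergeom

def training_error_tail(m, u, total_errors, k):
    """Probability that a uniformly random split of N = m + u points,
    total_errors of which are misclassified, places at most k of those
    errors among the m training points.

    scipy's parameterization: hypergeom(M, n, N) draws N items without
    replacement from a population of M containing n 'successes'.
    """
    return hypergeom.cdf(k, m + u, total_errors, m)

# Toy illustration: 100 training and 100 test points, 30 errors overall.
# How likely is a random split to hide all but 5 errors in the test set?
print(training_error_tail(m=100, u=100, total_errors=30, k=5))
```

    The explicit characterization derived in the paper replaces calls to such a routine with a closed-form bound obtained from concentration inequalities for sampling without replacement.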

    Secure and Efficient Matrix Multiplication with MapReduce

    MapReduce is one of the most popular distributed programming paradigms; it allows processing big data sets in parallel on a cluster. MapReduce users often outsource data and computations to a public cloud, which raises inherent security concerns. In this paper, we consider the problem of matrix multiplication and one of the most efficient matrix multiplication algorithms: the Strassen-Winograd (SW) algorithm. Our first contribution is a distributed MapReduce algorithm based on SW. Then, we tackle the security concerns that occur when outsourcing matrix multiplication to an honest-but-curious cloud, i.e., one that executes tasks dutifully but tries to learn as much information as possible. Our main contribution is a secure distributed MapReduce algorithm called S2M3 (Secure Strassen-Winograd Matrix Multiplication with MapReduce), which guarantees that none of the cloud nodes can learn the input or the output data. We formally prove the security properties of S2M3 and present an empirical evaluation demonstrating its efficiency.
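
    For reference, the Strassen-Winograd recursion that S2M3 distributes trades one product of n-by-n matrices for 7 half-size products and 15 additions. A minimal single-machine sketch, assuming square power-of-two dimensions; it illustrates only the arithmetic schedule, not the paper's distributed MapReduce protocol or its security layer:

```python
import numpy as np

def sw_multiply(A, B, leaf=64):
    """Strassen-Winograd: 7 recursive multiplications, 15 additions.
    Assumes A and B are square with power-of-two size."""
    n = A.shape[0]
    if n <= leaf:                       # fall back to the naive product
        return A @ B
    h = n // 2
    a11, a12, a21, a22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    b11, b12, b21, b22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]

    s1 = a21 + a22; s2 = s1 - a11; s3 = a11 - a21; s4 = a12 - s2
    t1 = b12 - b11; t2 = b22 - t1; t3 = b22 - b12; t4 = t2 - b21

    p1 = sw_multiply(a11, b11, leaf)
    p2 = sw_multiply(a12, b21, leaf)
    p3 = sw_multiply(s4, b22, leaf)
    p4 = sw_multiply(a22, t4, leaf)
    p5 = sw_multiply(s1, t1, leaf)
    p6 = sw_multiply(s2, t2, leaf)
    p7 = sw_multiply(s3, t3, leaf)

    u2 = p1 + p6; u3 = u2 + p7
    c11 = p1 + p2
    c12 = u2 + p5 + p3
    c21 = u3 - p4
    c22 = u3 + p5
    return np.block([[c11, c12], [c21, c22]])

# Sanity check against numpy on a random 256 x 256 instance.
A = np.random.rand(256, 256); B = np.random.rand(256, 256)
assert np.allclose(sw_multiply(A, B), A @ B)
```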

    Secure Joins with MapReduce

    MapReduce is one of the most popular programming paradigms for processing big data sets. Our goal is to add privacy guarantees to the two standard join-computation algorithms for MapReduce: the cascade algorithm and the hypercube algorithm. We assume that the data is externalized to an honest-but-curious server and that a user is allowed to query the join result. We design, implement, and prove the security of two approaches: (i) Secure-Private, assuming that the public cloud and the user do not collude, and (ii) Collusion-Resistant-Secure-Private, which resists collusions between the public cloud and the user, i.e., when the public cloud knows the secret key of the user.
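
    As background, both algorithms build on the standard MapReduce repartition join: mappers key each tuple by the join attribute, and reducers combine the co-grouped tuples from the two relations; the cascade algorithm chains such binary joins to evaluate a multiway join. A minimal, non-secure sketch of one binary join step, with relation names and attributes that are illustrative only (the paper's contribution, the security layer on top of this, is not shown):

```python
from collections import defaultdict
from itertools import product

def map_phase(relation_name, tuples, key_index):
    """Mapper: emit (join key, (origin relation, tuple))."""
    for t in tuples:
        yield t[key_index], (relation_name, t)

def reduce_phase(grouped):
    """Reducer: for each key, output the cross product of the two sides."""
    for key, values in grouped.items():
        left = [t for origin, t in values if origin == "R"]
        right = [t for origin, t in values if origin == "S"]
        for r, s in product(left, right):
            yield r + s[1:]            # drop the duplicated join key

# Toy relations R(a, b) and S(b, c), joined on attribute b.
R = [(1, "x"), (2, "y")]
S = [("x", 10), ("x", 20), ("z", 30)]
grouped = defaultdict(list)
for k, v in list(map_phase("R", R, 1)) + list(map_phase("S", S, 0)):
    grouped[k].append(v)
print(list(reduce_phase(grouped)))     # [(1, 'x', 10), (1, 'x', 20)]
```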

    Stable transductive learning

    We develop a new error bound for transductive learning algorithms. The slack term in the new bound is a function of a relaxed notion of transductive stability, which measures the sensitivity of the algorithm to most pairwise exchanges of training and test set points. Our bound is based on a novel concentration inequality for symmetric functions of permutations. We also present a simple sampling technique that can estimate, with high probability, the weak stability of transductive learning algorithms with respect to a given dataset. We demonstrate the usefulness of our estimation technique on a well-known transductive learning algorithm.
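
    The sampling idea can be pictured as follows: repeatedly swap a random training point with a random test point, retrain, and record how much the algorithm's predictions move. A hedged sketch of such a Monte Carlo estimate; the `train_fn` interface, the use of pseudo-labels for swapped-in test points, and the disagreement measure are illustrative assumptions, not the paper's exact procedure:

```python
import random

def estimate_stability(train_fn, X_train, y_train, X_test,
                       n_samples=200, seed=0):
    """Monte Carlo estimate of sensitivity to train/test exchanges:
    swap one training point with one (pseudo-labeled) test point,
    retrain, and measure how predictions on the test set change."""
    rng = random.Random(seed)
    base = train_fn(X_train, y_train)
    base_preds = [base(x) for x in X_test]
    y_test = base_preds                  # assumption: pseudo-labels
    diffs = []
    for _ in range(n_samples):
        i = rng.randrange(len(X_train))
        j = rng.randrange(len(X_test))
        X2 = X_train[:i] + [X_test[j]] + X_train[i + 1:]
        y2 = y_train[:i] + [y_test[j]] + y_train[i + 1:]
        model = train_fn(X2, y2)
        disagreement = sum(model(x) != p for x, p in zip(X_test, base_preds))
        diffs.append(disagreement / len(X_test))
    return sum(diffs) / n_samples

# Example usage with a toy 1-nearest-neighbour learner on 1-D points.
def one_nn(X, y):
    return lambda x: y[min(range(len(X)), key=lambda i: abs(X[i] - x))]

X_tr, y_tr = [0.0, 1.0, 4.0, 5.0], [0, 0, 1, 1]
X_te = [0.5, 4.5, 2.0]
print(estimate_stability(one_nn, X_tr, y_tr, X_te))
```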