
    Privacy Preserving Multi-Server k-means Computation over Horizontally Partitioned Data

    k-means is one of the most popular clustering algorithms in data mining. Much recent research has focused on settings where the dataset is split among multiple parties, or where it is too large for the data owner to handle. In the latter case, servers are usually hired to perform the clustering: the data owner divides the dataset among the servers, which jointly run k-means and return the cluster labels to the owner. The major challenge is to prevent the servers from learning substantial information about the owner's actual data. Several earlier algorithms provide cryptographic solutions for privacy-preserving k-means. We provide a new method for performing k-means over a large dataset using multiple servers. Our technique avoids heavy cryptographic computation and instead uses a simple randomization technique to preserve the privacy of the data. The resulting k-means computation has exactly the same efficiency and accuracy as k-means computed over the original dataset without any randomization. We argue that our algorithm is secure against an honest-but-curious, passive adversary. Comment: 19 pages, 4 tables. International Conference on Information Systems Security. Springer, Cham, 201

    Efficient and Error-Correcting Data Structures for Membership and Polynomial Evaluation

    We construct efficient data structures that are resilient against a constant fraction of adversarial noise. Our model requires that the decoder answers most queries correctly with high probability and, for the remaining queries, the decoder with high probability either answers correctly or declares "don't know." Furthermore, if there is no noise on the data structure, it answers all queries correctly with high probability. Our model is the common generalization of a model proposed recently by de Wolf and the notion of "relaxed locally decodable codes" developed in the PCP literature. We measure the efficiency of a data structure in terms of its length, measured by the number of bits in its representation, and query-answering time, measured by the number of bit-probes to the (possibly corrupted) representation. In this work, we study two data structure problems: membership and polynomial evaluation. We show that these two problems have constructions that are simultaneously efficient and error-correcting. Comment: An abridged version of this paper appears in STACS 201