35,424 research outputs found

    A high performance k-NN approach using binary neural networks

    Get PDF
    This paper evaluates a novel k-nearest neighbour (k-NN) classifier built from binary neural networks. The binary neural approach uses robust encoding to map standard ordinal, categorical and numeric data sets onto a binary neural network. The binary neural network uses high speed pattern matching to recall a candidate set of matching records, which are then processed by a conventional k-NN approach to determine the k-best matches. We compare various configurations of the binary approach to a conventional approach for memory overheads, training speed, retrieval speed and retrieval accuracy. We demonstrate the superior performance with respect to speed and memory requirements of the binary approach compared to the standard approach and we pinpoint the optimal configurations. (C) 2003 Elsevier Ltd. All rights reserved

    Achieving Secure and Efficient Cloud Search Services: Cross-Lingual Multi-Keyword Rank Search over Encrypted Cloud Data

    Full text link
    Multi-user multi-keyword ranked search scheme in arbitrary language is a novel multi-keyword rank searchable encryption (MRSE) framework based on Paillier Cryptosystem with Threshold Decryption (PCTD). Compared to previous MRSE schemes constructed based on the k-nearest neighbor searcha-ble encryption (KNN-SE) algorithm, it can mitigate some draw-backs and achieve better performance in terms of functionality and efficiency. Additionally, it does not require a predefined keyword set and support keywords in arbitrary languages. However, due to the pattern of exact matching of keywords in the new MRSE scheme, multilingual search is limited to each language and cannot be searched across languages. In this pa-per, we propose a cross-lingual multi-keyword rank search (CLRSE) scheme which eliminates the barrier of languages and achieves semantic extension with using the Open Multilingual Wordnet. Our CLRSE scheme also realizes intelligent and per-sonalized search through flexible keyword and language prefer-ence settings. We evaluate the performance of our scheme in terms of security, functionality, precision and efficiency, via extensive experiments

    SANNS: Scaling Up Secure Approximate k-Nearest Neighbors Search

    Get PDF
    The kk-Nearest Neighbor Search (kk-NNS) is the backbone of several cloud-based services such as recommender systems, face recognition, and database search on text and images. In these services, the client sends the query to the cloud server and receives the response in which case the query and response are revealed to the service provider. Such data disclosures are unacceptable in several scenarios due to the sensitivity of data and/or privacy laws. In this paper, we introduce SANNS, a system for secure kk-NNS that keeps client's query and the search result confidential. SANNS comprises two protocols: an optimized linear scan and a protocol based on a novel sublinear time clustering-based algorithm. We prove the security of both protocols in the standard semi-honest model. The protocols are built upon several state-of-the-art cryptographic primitives such as lattice-based additively homomorphic encryption, distributed oblivious RAM, and garbled circuits. We provide several contributions to each of these primitives which are applicable to other secure computation tasks. Both of our protocols rely on a new circuit for the approximate top-kk selection from nn numbers that is built from O(n+k2)O(n + k^2) comparators. We have implemented our proposed system and performed extensive experimental results on four datasets in two different computation environments, demonstrating more than 1831×18-31\times faster response time compared to optimally implemented protocols from the prior work. Moreover, SANNS is the first work that scales to the database of 10 million entries, pushing the limit by more than two orders of magnitude.Comment: 18 pages, to appear at USENIX Security Symposium 202

    Distributed sparse signal recovery in networked systems

    Get PDF
    In this dissertation, two classes of distributed algorithms are developed for sparse signal recovery in large sensor networks. All the proposed approaches consist of local computation (LC) and global computation (GC) steps carried out by a group of distributed local sensors, and do not require the local sensors to know the global sensing matrix. These algorithms are based on the original approximate message passing (AMP) and iterative hard thresholding (IHT) algorithms in the area of compressed sensing (CS), also known as sparse signal recovery. For distributed AMP (DiAMP), we develop a communication-efficient algorithm GCAMP. Numerical results demonstrate that it outperforms the modified thresholding algorithm (MTA), another popular GC algorithm for Top-K query from distributed large databases. For distributed IHT (DIHT), there is a step size μ\mu which depends on the 2\ell_2 norm of the global sensing matrix A. The exact computation of A2\|A\|_2 is non-separable. We propose a new method, based on the random matrix theory (RMT), to give a very tight statistical upper bound of A2\|A\|_2, and the calculation of that upper bound is separable without any communication cost. In the GC step of DIHT, we develop another algorithm named GC.K, which is also communication-efficient and outperforms MTA. Then, by adjusting the metric of communication cost, which enables transmission of quantized data, and taking advantage of the correlation of data in adjacent iterations, we develop quantized adaptive GCAMP (Q-A-GCAMP) and quantized adaptive GC.K (Q-A-GC.K) algorithms, leading to a significant improvement on communication savings. Furthermore, we prove that state evolution (SE), a fundamental property of AMP that in high dimensionality limit, the output data are asymptotically Gaussian regardless of the distribution of input data, also holds for DiAMP. In addition, compared with the most recent theoretical results that SE holds for sensing matrices with independent subgaussian entries, we prove that the universality of SE can be extended to far more general sensing matrices. These two theoretical results provide strong guarantee of AMP\u27s performance, and greatly broaden its potential applications

    Distributed top-k aggregation queries at large

    Get PDF
    Top-k query processing is a fundamental building block for efficient ranking in a large number of applications. Efficiency is a central issue, especially for distributed settings, when the data is spread across different nodes in a network. This paper introduces novel optimization methods for top-k aggregation queries in such distributed environments. The optimizations can be applied to all algorithms that fall into the frameworks of the prior TPUT and KLEE methods. The optimizations address three degrees of freedom: 1) hierarchically grouping input lists into top-k operator trees and optimizing the tree structure, 2) computing data-adaptive scan depths for different input sources, and 3) data-adaptive sampling of a small subset of input sources in scenarios with hundreds or thousands of query-relevant network nodes. All optimizations are based on a statistical cost model that utilizes local synopses, e.g., in the form of histograms, efficiently computed convolutions, and estimators based on order statistics. The paper presents comprehensive experiments, with three different real-life datasets and using the ns-2 network simulator for a packet-level simulation of a large Internet-style network
    corecore