35,424 research outputs found
A high performance k-NN approach using binary neural networks
This paper evaluates a novel k-nearest neighbour (k-NN) classifier built from binary neural networks. The binary neural approach uses robust encoding to map standard ordinal, categorical and numeric data sets onto a binary neural network. The binary neural network uses high speed pattern matching to recall a candidate set of matching records, which are then processed by a conventional k-NN approach to determine the k-best matches. We compare various configurations of the binary approach to a conventional approach for memory overheads, training speed, retrieval speed and retrieval accuracy. We demonstrate the superior performance with respect to speed and memory requirements of the binary approach compared to the standard approach and we pinpoint the optimal configurations. (C) 2003 Elsevier Ltd. All rights reserved
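The two-stage idea described in this abstract (a fast binary match recalls a candidate set, then conventional k-NN ranks only those candidates) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's method: the one-hot binning in `encode`, the `min_overlap` threshold, and all function names are hypothetical stand-ins for the paper's robust encoding and binary neural network.

```python
import math

def encode(record, bins_per_attr=4, lo=0.0, hi=1.0):
    """Hypothetical binary encoding: one set bit (attribute, bin) per numeric attribute."""
    bits = set()
    for i, x in enumerate(record):
        b = int((x - lo) / (hi - lo) * bins_per_attr)
        b = min(bins_per_attr - 1, max(0, b))  # clamp to a valid bin
        bits.add((i, b))
    return bits

def two_stage_knn(data, query, k, min_overlap=1):
    """Return indices of the k nearest records among the recalled candidates."""
    q_bits = encode(query)
    # Stage 1: cheap binary match keeps records sharing enough set bits with the query.
    candidates = [idx for idx, rec in enumerate(data)
                  if len(encode(rec) & q_bits) >= min_overlap]
    # Stage 2: conventional k-NN (Euclidean distance) on the candidates only.
    ranked = sorted(candidates, key=lambda idx: math.dist(data[idx], query))
    return ranked[:k]

data = [(0.1, 0.1), (0.15, 0.12), (0.9, 0.9), (0.5, 0.5)]
print(two_stage_knn(data, (0.12, 0.1), k=2))
```

Because stage 1 can discard true neighbours that fall into different bins, a sketch like this trades some recall for speed; the bin count and overlap threshold control that trade-off.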
Achieving Secure and Efficient Cloud Search Services: Cross-Lingual Multi-Keyword Rank Search over Encrypted Cloud Data
The multi-user multi-keyword ranked search scheme in arbitrary languages is a
novel multi-keyword rank searchable encryption (MRSE) framework based on
the Paillier Cryptosystem with Threshold Decryption (PCTD). Compared to previous
MRSE schemes constructed based on the k-nearest neighbor searchable encryption
(KNN-SE) algorithm, it can mitigate some drawbacks and achieve better
performance in terms of functionality and efficiency. Additionally, it does not
require a predefined keyword set and supports keywords in arbitrary languages.
However, due to the pattern of exact matching of keywords in the new MRSE
scheme, multilingual search is limited to each language and cannot be performed
across languages. In this paper, we propose a cross-lingual multi-keyword rank
search (CLRSE) scheme which eliminates the language barrier and achieves
semantic extension using the Open Multilingual Wordnet. Our CLRSE scheme
also realizes intelligent and personalized search through flexible keyword and
language preference settings. We evaluate the performance of our scheme in
terms of security, functionality, precision and efficiency, via extensive
experiments
SANNS: Scaling Up Secure Approximate k-Nearest Neighbors Search
The k-Nearest Neighbor Search (k-NNS) is the backbone of several
cloud-based services such as recommender systems, face recognition, and
database search on text and images. In these services, the client sends the
query to the cloud server and receives the response in which case the query and
response are revealed to the service provider. Such data disclosures are
unacceptable in several scenarios due to the sensitivity of data and/or privacy
laws.
In this paper, we introduce SANNS, a system for secure k-NNS that keeps
the client's query and the search result confidential. SANNS comprises two
protocols: an optimized linear scan and a protocol based on a novel sublinear
time clustering-based algorithm. We prove the security of both protocols in the
standard semi-honest model. The protocols are built upon several
state-of-the-art cryptographic primitives such as lattice-based additively
homomorphic encryption, distributed oblivious RAM, and garbled circuits. We
provide several contributions to each of these primitives which are applicable
to other secure computation tasks. Both of our protocols rely on a new circuit
for the approximate top-k selection from numbers that is built from comparators.
We have implemented our proposed system and performed extensive experimental
results on four datasets in two different computation environments,
demonstrating more than faster response time compared to
optimally implemented protocols from the prior work. Moreover, SANNS is the
first work that scales to the database of 10 million entries, pushing the limit
by more than two orders of magnitude. Comment: 18 pages, to appear at USENIX Security Symposium 202
Distributed sparse signal recovery in networked systems
In this dissertation, two classes of distributed algorithms are developed for sparse signal recovery in large sensor networks. All the proposed approaches consist of local computation (LC) and global computation (GC) steps carried out by a group of distributed local sensors, and do not require the local sensors to know the global sensing matrix. These algorithms are based on the original approximate message passing (AMP) and iterative hard thresholding (IHT) algorithms in the area of compressed sensing (CS), also known as sparse signal recovery. For distributed AMP (DiAMP), we develop a communication-efficient algorithm GCAMP. Numerical results demonstrate that it outperforms the modified thresholding algorithm (MTA), another popular GC algorithm for Top-K query from distributed large databases. For distributed IHT (DIHT), there is a step size which depends on the norm of the global sensing matrix A. The exact computation of this norm is non-separable. We propose a new method, based on random matrix theory (RMT), to give a very tight statistical upper bound on this norm, and the calculation of that upper bound is separable without any communication cost. In the GC step of DIHT, we develop another algorithm named GC.K, which is also communication-efficient and outperforms MTA. Then, by adjusting the metric of communication cost, which enables transmission of quantized data, and taking advantage of the correlation of data in adjacent iterations, we develop quantized adaptive GCAMP (Q-A-GCAMP) and quantized adaptive GC.K (Q-A-GC.K) algorithms, leading to a significant improvement in communication savings.
Furthermore, we prove that state evolution (SE), a fundamental property of AMP stating that in the high-dimensional limit the output data are asymptotically Gaussian regardless of the distribution of input data, also holds for DiAMP. In addition, compared with the most recent theoretical results that SE holds for sensing matrices with independent subgaussian entries, we prove that the universality of SE can be extended to far more general sensing matrices. These two theoretical results provide a strong guarantee of AMP's performance, and greatly broaden its potential applications
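To make the step-size remark in this abstract concrete, here is a minimal sketch of plain, centralized iterative hard thresholding, not the dissertation's distributed DIHT. It assumes the standard update x ← H_s(x + μ Aᵀ(y − Ax)), where H_s keeps the s largest-magnitude entries. As a labelled simplification, μ is set from the squared Frobenius norm of A, which upper-bounds the squared spectral norm; the function name `iht` and the iteration count are illustrative choices.

```python
def iht(A, y, s, iters=200):
    """Centralized IHT sketch. A: list of rows, y: measurements, s: sparsity level."""
    m, n = len(A), len(A[0])
    # Assumption: a conservative step size from the squared Frobenius norm,
    # which is never smaller than the squared spectral norm of A.
    mu = 1.0 / sum(a * a for row in A for a in row)
    x = [0.0] * n
    for _ in range(iters):
        # Residual r = y - A x
        r = [y[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(m)]
        # Gradient step g = x + mu * A^T r
        g = [x[j] + mu * sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]
        # Hard thresholding: keep the s largest-magnitude entries, zero the rest
        keep = set(sorted(range(n), key=lambda j: abs(g[j]), reverse=True)[:s])
        x = [g[j] if j in keep else 0.0 for j in range(n)]
    return x

A = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
x_hat = iht(A, [0.0, 2.0, 0.0], s=1)
print(x_hat)
```

Computing the norm here requires the whole matrix in one place, which is exactly the non-separable step that the abstract says is replaced by a separable statistical upper bound in the distributed setting.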
Distributed top-k aggregation queries at large
Top-k query processing is a fundamental building block for efficient ranking in a large number of applications. Efficiency is a central issue, especially for distributed settings, when the data is spread across different nodes in a network. This paper introduces novel optimization methods for top-k aggregation queries in such distributed environments. The optimizations can be applied to all algorithms that fall into the frameworks of the prior TPUT and KLEE methods. The optimizations address three degrees of freedom: 1) hierarchically grouping input lists into top-k operator trees and optimizing the tree structure, 2) computing data-adaptive scan depths for different input sources, and 3) data-adaptive sampling of a small subset of input sources in scenarios with hundreds or thousands of query-relevant network nodes. All optimizations are based on a statistical cost model that utilizes local synopses, e.g., in the form of histograms, efficiently computed convolutions, and estimators based on order statistics. The paper presents comprehensive experiments, with three different real-life datasets and using the ns-2 network simulator for a packet-level simulation of a large Internet-style network
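For orientation, the three-phase pattern of the prior TPUT method mentioned in this abstract can be sketched as below. This is a simplified single-process simulation under stated assumptions (non-negative local scores, at least one node, sum aggregation) and it implements none of the paper's optimizations. The names `tput_top_k` and `kth_partial_sum` are hypothetical.

```python
def kth_partial_sum(seen, k):
    """k-th largest partial sum over the scores received so far (0.0 if fewer than k)."""
    partial = {}
    for scores in seen:
        for item, score in scores.items():
            partial[item] = partial.get(item, 0) + score
    ranked = sorted(partial.values(), reverse=True)
    return ranked[k - 1] if len(ranked) >= k else 0.0

def tput_top_k(nodes, k):
    """nodes: list of dicts mapping item -> non-negative local score."""
    m = len(nodes)
    seen = [dict() for _ in nodes]  # what the coordinator has received per node

    # Phase 1: each node reports its local top-k items.
    for i, local in enumerate(nodes):
        for item in sorted(local, key=local.get, reverse=True)[:k]:
            seen[i][item] = local[item]
    tau1 = kth_partial_sum(seen, k)

    # Phase 2: each node reports every item scoring at least T = tau1 / m.
    T = tau1 / m
    for i, local in enumerate(nodes):
        for item, score in local.items():
            if score >= T:
                seen[i][item] = score
    tau2 = kth_partial_sum(seen, k)

    # Prune: an unreported local score is below T, so T bounds it from above.
    candidates = set()
    for item in set().union(*seen):
        upper = sum(s[item] if item in s else T for s in seen)
        if upper >= tau2:
            candidates.add(item)

    # Phase 3: fetch exact scores for the surviving candidates only.
    totals = {item: sum(local.get(item, 0.0) for local in nodes) for item in candidates}
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:k]

nodes = [{"a": 10, "b": 8, "c": 1}, {"a": 1, "b": 7, "c": 9}, {"a": 9, "c": 2, "d": 3}]
print(tput_top_k(nodes, k=2))
```

The fixed uniform threshold T = tau1 / m is the kind of choice that data-adaptive scan depths, as described in the abstract, aim to improve on.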