4,424 research outputs found
SANNS: Scaling Up Secure Approximate k-Nearest Neighbors Search
The -Nearest Neighbor Search (-NNS) is the backbone of several
cloud-based services such as recommender systems, face recognition, and
database search on text and images. In these services, the client sends the
query to the cloud server and receives the response in which case the query and
response are revealed to the service provider. Such data disclosures are
unacceptable in several scenarios due to the sensitivity of data and/or privacy
laws.
In this paper, we introduce SANNS, a system for secure -NNS that keeps
client's query and the search result confidential. SANNS comprises two
protocols: an optimized linear scan and a protocol based on a novel sublinear
time clustering-based algorithm. We prove the security of both protocols in the
standard semi-honest model. The protocols are built upon several
state-of-the-art cryptographic primitives such as lattice-based additively
homomorphic encryption, distributed oblivious RAM, and garbled circuits. We
provide several contributions to each of these primitives which are applicable
to other secure computation tasks. Both of our protocols rely on a new circuit
for the approximate top- selection from numbers that is built from comparators.
We have implemented our proposed system and performed extensive experimental
results on four datasets in two different computation environments,
demonstrating more than faster response time compared to
optimally implemented protocols from the prior work. Moreover, SANNS is the
first work that scales to the database of 10 million entries, pushing the limit
by more than two orders of magnitude.Comment: 18 pages, to appear at USENIX Security Symposium 202
Query-driven learning for predictive analytics of data subspace cardinality
Fundamental to many predictive analytics tasks is the ability to estimate the cardinality (number of data items) of multi-dimensional data subspaces, defined by query selections over datasets. This is crucial for data analysts dealing with, e.g., interactive data subspace explorations, data subspace visualizations, and in query processing optimization. However, in many modern data systems, predictive analytics may be (i) too costly money-wise, e.g., in clouds, (ii) unreliable, e.g., in modern Big Data query engines, where accurate statistics are difficult to obtain/maintain, or (iii) infeasible, e.g., for privacy issues. We contribute a novel, query-driven, function estimation model of analyst-defined data subspace cardinality. The proposed estimation model is highly accurate in terms of prediction and accommodating the well-known selection queries: multi-dimensional range and distance-nearest neighbors (radius) queries. Our function estimation model: (i) quantizes the vectorial query space, by learning the analysts’ access patterns over a data space, (ii) associates query vectors with their corresponding cardinalities of the analyst-defined data subspaces, (iii) abstracts and employs query vectorial similarity to predict the cardinality of an unseen/unexplored data subspace, and (iv) identifies and adapts to possible changes of the query subspaces based on the theory of optimal stopping. The proposed model is decentralized, facilitating the scaling-out of such predictive analytics queries. The research significance of the model lies in that (i) it is an attractive solution when data-driven statistical techniques are undesirable or infeasible, (ii) it offers a scale-out, decentralized training solution, (iii) it is applicable to different selection query types, and (iv) it offers a performance that is superior to that of data-driven approaches
Scalable aggregation predictive analytics: a query-driven machine learning approach
We introduce a predictive modeling solution that provides high quality predictive analytics over aggregation queries in Big Data environments. Our predictive methodology is generally applicable in environments in which large-scale data owners may or may not restrict access to their data and allow only aggregation operators like COUNT to be executed over their data. In this context, our methodology is based on historical queries and their answers to accurately predict ad-hoc queries’ answers. We focus on the widely used set-cardinality, i.e., COUNT, aggregation query, as COUNT is a fundamental operator for both internal data system optimizations and for aggregation-oriented data exploration and predictive analytics. We contribute a novel, query-driven Machine Learning (ML) model whose goals are to: (i) learn the query-answer space from past issued queries, (ii) associate the query space with local linear regression & associative function estimators, (iii) define query similarity, and (iv) predict the cardinality of the answer set of unseen incoming queries, referred to the Set Cardinality Prediction (SCP) problem. Our ML model incorporates incremental ML algorithms for ensuring high quality prediction results. The significance of contribution lies in that it (i) is the only query-driven solution applicable over general Big Data environments, which include restricted-access data, (ii) offers incremental learning adjusted for arriving ad-hoc queries, which is well suited for query-driven data exploration, and (iii) offers a performance (in terms of scalability, SCP accuracy, processing time, and memory requirements) that is superior to data-centric approaches. We provide a comprehensive performance evaluation of our model evaluating its sensitivity, scalability and efficiency for quality predictive analytics. In addition, we report on the development and incorporation of our ML model in Spark showing its superior performance compared to the Spark’s COUNT method
Machine Learning in Wireless Sensor Networks: Algorithms, Strategies, and Applications
Wireless sensor networks monitor dynamic environments that change rapidly
over time. This dynamic behavior is either caused by external factors or
initiated by the system designers themselves. To adapt to such conditions,
sensor networks often adopt machine learning techniques to eliminate the need
for unnecessary redesign. Machine learning also inspires many practical
solutions that maximize resource utilization and prolong the lifespan of the
network. In this paper, we present an extensive literature review over the
period 2002-2013 of machine learning methods that were used to address common
issues in wireless sensor networks (WSNs). The advantages and disadvantages of
each proposed algorithm are evaluated against the corresponding problem. We
also provide a comparative guide to aid WSN designers in developing suitable
machine learning solutions for their specific application challenges.Comment: Accepted for publication in IEEE Communications Surveys and Tutorial
Assessing metric structures on GPGPU environments
Similarity search consists on retrieving objects within a database that are similar or relevant to a particular query. It is a topic of great interest to scientific community because of its many fields of application, such as searching for words and images on the World Wide Web, pattern recognition, detection of plagiarism, multimedia databases, among others. It is modeled through metric spaces, in which objects are represented in a black-box that contains only the distance between objects; calculating the distance function is costly and search systems operate at a high query rate. Metrical structures have been developed to optimize this process; such structures work as indexes and preprocess data to decrease the distance evaluations during the search.
Processing large volumes of data makes unfeasible the use of such structures without using parallel processing environments. Technologies based on multi- CPU and GPU architectures are among the most force due to its costs and performance.XV Workshop de Procesamiento Distribuido y Paralelo (WPDP)Red de Universidades con Carreras en Informática (RedUNCI
Assessing metric structures on GPGPU environments
Similarity search consists on retrieving objects within a database that are similar or relevant to a particular query. It is a topic of great interest to scientific community because of its many fields of application, such as searching for words and images on the World Wide Web, pattern recognition, detection of plagiarism, multimedia databases, among others. It is modeled through metric spaces, in which objects are represented in a black-box that contains only the distance between objects; calculating the distance function is costly and search systems operate at a high query rate. Metrical structures have been developed to optimize this process; such structures work as indexes and preprocess data to decrease the distance evaluations during the search.
Processing large volumes of data makes unfeasible the use of such structures without using parallel processing environments. Technologies based on multi- CPU and GPU architectures are among the most force due to its costs and performance.XV Workshop de Procesamiento Distribuido y Paralelo (WPDP)Red de Universidades con Carreras en Informática (RedUNCI
- …