22 research outputs found

    Secure Similarity Search

    Get PDF
    One of the most substantial ways to protect users' sensitive information is encryption. This paper is about a keyword index search system over encrypted documents. Search with errors over encrypted data has been considered impossible, because a one-bit difference between plaintexts may translate into an enormous difference between the corresponding ciphertexts. We propose a novel approach to handle search with errors over encrypted data. We develop two similarity search schemes, implement prototypes, and provide substantial analysis. We define security requirements for similarity search over encrypted data. The first scheme achieves perfect privacy in similarity search, while the second scheme is more efficient.

    SSS-V2: Secure Similarity Search

    Get PDF
    Encrypting information has been regarded as one of the most substantial approaches to protecting users' sensitive information in a radically changing internet technology era. Prior research has considered similarity search over encrypted documents infeasible, because a single-bit difference in a plaintext results in an enormous difference in the corresponding ciphertext. However, we propose a novel idea of Secure Similarity Search (SSS) over encrypted documents by applying character-wise encryption with approximate string matching to keyword index search systems. To this end, we define the security requirements of similarity search over encrypted data, propose two similarity search schemes, and formally prove the security of the schemes. The first scheme is more efficient, while the second scheme achieves perfect similarity search privacy. Surprisingly, the second scheme turns out to be faster than other keyword index search schemes with keyword-wise encryption, while enjoying the same level of security. The SSS schemes support "like" queries (e.g., 'ab%') and queries with misprints, in that the character-wise encryption preserves the degree of similarity between two plaintexts and renders approximate string matching between the corresponding ciphertexts possible without decryption.
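
    To illustrate the core idea (character-wise encryption that preserves similarity, so approximate string matching can run directly on ciphertexts), here is a minimal Python sketch. The keyed per-character HMAC mapping and the function names are illustrative assumptions, not the paper's actual SSS constructions.

```python
# Illustrative sketch only: a keyed, deterministic per-character mapping stands
# in for the paper's character-wise encryption. Because equal plaintext
# characters map to equal tokens, Levenshtein distance over token sequences
# mirrors the distance over plaintexts, so fuzzy matching needs no decryption.
import hmac
import hashlib

def encrypt_keyword(keyword: str, key: bytes) -> list:
    """Map each character to a keyed token (deterministic per character)."""
    return [hmac.new(key, ch.encode(), hashlib.sha256).hexdigest()[:8]
            for ch in keyword]

def edit_distance(a: list, b: list) -> int:
    """Plain Levenshtein distance over sequences of ciphertext tokens."""
    dp = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, y in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (x != y))
    return dp[-1]

key = b"index-secret"
stored = encrypt_keyword("encryption", key)
query = encrypt_keyword("encrpytion", key)   # a query containing a misprint
print(edit_distance(stored, query))          # small distance, so likely a match
```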

    Homomorphic Encryption for Speaker Recognition: Protection of Biometric Templates and Vendor Model Parameters

    Full text link
    Data privacy is crucial when dealing with biometric data. Accounting for the latest European data privacy regulation and payment service directive, biometric template protection is essential for any commercial application. By ensuring unlinkability across biometric service operators, irreversibility of leaked encrypted templates, and renewability of, e.g., voice models following the i-vector paradigm, biometric voice-based systems are prepared for the latest EU data privacy legislation. Employing Paillier cryptosystems, Euclidean and cosine comparators are known to satisfy data privacy demands without loss of discrimination or calibration performance. Bridging gaps from template protection to speaker recognition, two architectures are proposed for the two-covariance comparator, which serves as a generative model in this study. The first architecture preserves the privacy of biometric data capture subjects. In the second architecture, the model parameters of the comparator are encrypted as well, such that biometric service providers can supply the same comparison modules, employing different key pairs, to multiple biometric service operators. An experimental proof-of-concept and complexity analysis are carried out on data from the 2013-2014 NIST i-vector machine learning challenge.
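
    As a rough illustration of the kind of computation the Paillier-based comparators enable, the sketch below encrypts a toy i-vector and lets a server accumulate an encrypted dot product with plaintext model weights. It assumes the python-paillier package (phe); the vectors, key size, and the reduction of the comparator to a single dot product are simplifying assumptions, not the paper's two-covariance architecture.

```python
# Hedged sketch assuming the python-paillier package (phe). A client encrypts a
# toy i-vector; the server computes an encrypted inner product with plaintext
# comparator weights using Paillier's additive homomorphism (ciphertext +
# ciphertext, ciphertext * plaintext). The real two-covariance comparator is
# more involved than a single dot product.
from phe import paillier

public_key, private_key = paillier.generate_paillier_keypair(n_length=2048)

client_ivector = [0.12, -0.53, 0.31, 0.08]   # toy biometric template
server_weights = [0.40, -0.10, 0.25, 0.90]   # toy comparator parameters

# Client side: encrypt the template before handing it to the service operator.
encrypted_ivector = [public_key.encrypt(x) for x in client_ivector]

# Server side: the dot product is evaluated without ever seeing the template.
encrypted_score = sum(c * w for c, w in zip(encrypted_ivector, server_weights))

# Only the key holder can recover the comparison score.
print(private_key.decrypt(encrypted_score))
```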

    Secure Metric-Based Index for Similarity Cloud

    Get PDF
    We propose a similarity index that ensures data privacy and thus is suitable for search systems outsourced to a cloud. The proposed solution can exploit existing efficient metric indexes based on a fixed set of reference points. The method has been fully implemented as a security extension of an existing established approach called M-Index. This Encrypted M-Index supports evaluation of standard range and nearest-neighbor queries in both precise and approximate manners. In the first part of this work, we analyze various levels of privacy in existing and future similarity search systems; the proposed solution tries to keep a reasonable privacy level while relocating only the necessary amount of work from the server to an authorized client. The Encrypted M-Index has been tested on three real data sets with a focus on various cost components.
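
    The following sketch shows the pivot-based filtering principle that metric indexes built on a fixed set of reference points rely on: stored distances to the pivots plus the triangle inequality prune candidates for a range query. All names and values are illustrative placeholders; the actual Encrypted M-Index adds encryption and a client-side refinement step not modeled here.

```python
# Sketch of pivot-based filtering with a fixed set of reference points: the
# index keeps only object ids and their distances to the pivots, and the
# triangle inequality prunes objects for a range query. Names and data are
# placeholders; the Encrypted M-Index adds encryption and client-side
# refinement of the surviving candidates.
import math

pivots = [(0.0, 0.0), (1.0, 1.0)]                       # fixed reference points
objects = {"o1": (0.1, 0.2), "o2": (0.9, 0.8), "o3": (0.5, 0.4)}

# Server-side index: object id -> distances to the pivots.
index = {oid: [math.dist(o, p) for p in pivots] for oid, o in objects.items()}

def range_query_candidates(query, radius):
    """Lower bound d(q, o) >= |d(q, p) - d(o, p)|; prune when it exceeds radius."""
    q_to_pivots = [math.dist(query, p) for p in pivots]
    candidates = []
    for oid, o_to_pivots in index.items():
        lower_bound = max(abs(qp - op) for qp, op in zip(q_to_pivots, o_to_pivots))
        if lower_bound <= radius:
            candidates.append(oid)   # refined later by an authorized client
    return candidates

print(range_query_candidates((0.2, 0.2), radius=0.3))
```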

    Improving k-nn search and subspace clustering based on local intrinsic dimensionality

    Get PDF
    In several novel applications such as multimedia and recommender systems, data is often represented as object feature vectors in high-dimensional spaces. High-dimensional data is always a challenge for state-of-the-art algorithms because of the so-called curse of dimensionality. As the dimensionality increases, the discriminative ability of similarity measures diminishes to the point where many data analysis algorithms that depend on them, such as similarity search and clustering, lose their effectiveness. One way to handle this challenge is by selecting the most important features, which is essential for providing compact object representations as well as improving the overall search and clustering performance. Having compact feature vectors can further reduce the storage space and the computational complexity of search and learning tasks. Support-Weighted Intrinsic Dimensionality (support-weighted ID) is a new, promising feature selection criterion that estimates the contribution of each feature to the overall intrinsic dimensionality. Support-weighted ID identifies relevant features locally for each object and penalizes those features that have locally lower discriminative power as well as higher density. In fact, support-weighted ID measures the ability of each feature to locally discriminate between objects in the dataset. Based on support-weighted ID, this dissertation introduces three main research contributions. First, it proposes NNWID-Descent, a similarity graph construction method that utilizes the support-weighted ID criterion to identify and retain relevant features locally for each object and enhance the overall graph quality. Second, with the aim of improving the accuracy and performance of cluster analysis, it introduces k-LIDoids, a subspace clustering algorithm that extends the utility of support-weighted ID within a clustering framework in order to gradually select the subset of informative and important features per cluster. k-LIDoids is able to construct clusters together with finding a low-dimensional subspace for each cluster. Finally, using the compact object and cluster representations from NNWID-Descent and k-LIDoids, the dissertation defines LID-Fingerprint, a new binary fingerprinting and multi-level indexing framework for high-dimensional data. LID-Fingerprint can be used to hide information as a way of defending against passive adversaries, as well as to provide efficient and secure similarity search and retrieval for data stored on the cloud. When compared to other state-of-the-art algorithms, the good practical performance provides evidence for the effectiveness of the proposed algorithms for data in high-dimensional spaces.
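
    As background for the support-weighted ID criterion, the sketch below computes the standard maximum-likelihood (Hill-type) estimate of local intrinsic dimensionality from k-nearest-neighbor distances. It illustrates only the underlying LID quantity under toy assumptions; the support weighting and the NNWID-Descent and k-LIDoids algorithms are not reproduced.

```python
# Illustrative only: the maximum-likelihood (Hill-type) estimator of local
# intrinsic dimensionality from k-nearest-neighbor distances, the quantity
# that support-weighted ID builds upon. The feature weighting and the
# NNWID-Descent / k-LIDoids algorithms themselves are not reproduced here.
import numpy as np

def local_id_mle(knn_distances):
    """knn_distances: ascending distances from one object to its k nearest
    neighbors. Returns the MLE estimate of local intrinsic dimensionality."""
    d = np.asarray(knn_distances, dtype=float)
    w = d[-1]                                   # distance to the k-th neighbor
    return -1.0 / np.mean(np.log(d[:-1] / w))   # -(1/(k-1) * sum log(d_i/d_k))^-1

rng = np.random.default_rng(0)
points = rng.normal(size=(2000, 3))             # data that truly lives in 3-D
query = points[0]
dists = np.sort(np.linalg.norm(points[1:] - query, axis=1))[:100]
print(local_id_mle(dists))                      # should come out near 3
```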

    SoK: Cryptographically Protected Database Search

    Full text link
    Protected database search systems cryptographically isolate the roles of reading from, writing to, and administering the database. This separation limits unnecessary administrator access and protects data in the case of system breaches. Since protected search was introduced in 2000, the area has grown rapidly; systems are offered by academia, start-ups, and established companies. However, there is no best protected search system or set of techniques. Design of such systems is a balancing act between security, functionality, performance, and usability. This challenge is made more difficult by ongoing database specialization, as some users will want the functionality of SQL, NoSQL, or NewSQL databases. This database evolution will continue, and the protected search community should be able to quickly provide functionality consistent with newly invented databases. At the same time, the community must accurately and clearly characterize the tradeoffs between different approaches. To address these challenges, we provide the following contributions: 1) An identification of the important primitive operations across database paradigms. We find there is a small number of base operations that can be used and combined to support a large number of database paradigms. 2) An evaluation of the current state of protected search systems in implementing these base operations. This evaluation describes the main approaches and tradeoffs for each base operation. Furthermore, it puts protected search in the context of unprotected search, identifying key gaps in functionality. 3) An analysis of attacks against protected search for different base queries. 4) A roadmap and tools for transforming a protected search system into a protected database, including an open-source performance evaluation platform and initial user opinions of protected search. Comment: 20 pages, to appear in IEEE Security and Privacy.
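
    As a toy illustration of one of the base operations such systems build on, the sketch below realizes equality search with keyed deterministic tokens. This is a generic textbook construction chosen for brevity, not the design of any particular system surveyed in the SoK.

```python
# Toy illustration of one protected-search base operation (equality search)
# using keyed deterministic tokens. This is a generic construction for
# exposition only, not the design of any particular surveyed system.
import hmac
import hashlib

def token(key: bytes, value: str) -> str:
    return hmac.new(key, value.encode(), hashlib.sha256).hexdigest()

key = b"data-owner-secret"

# Writer role: store tokens instead of plaintext values.
encrypted_index = {token(key, "alice"): "row-17", token(key, "bob"): "row-42"}

# Reader role: submits a token; the server matches it without learning "alice".
print(encrypted_index.get(token(key, "alice")))   # -> row-17
```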

    Don't forget private retrieval: distributed private similarity search for large language models

    Full text link
    While the flexible capabilities of large language models (LLMs) allow them to answer a range of queries based on existing learned knowledge, information retrieval to augment generation is an important tool for allowing LLMs to answer questions about information not included in pre-training data. Such private information is increasingly being generated in a wide array of distributed contexts by organizations and individuals. Until now, performing such information retrieval using neural embeddings of queries and documents has leaked information about queries and database content unless both were stored locally. We present Private Retrieval Augmented Generation (PRAG), an approach that uses multi-party computation (MPC) to securely transmit queries to a distributed set of servers containing a privately constructed database, returning top-k and approximate top-k documents. This is a first-of-its-kind approach to dense information retrieval that ensures no server observes a client's query or can see the database content. The approach introduces a novel MPC-friendly protocol for inverted file approximate search (IVF) that allows for fast document search over distributed and private data with sublinear communication complexity. This work presents new avenues through which data for use in LLMs can be accessed and used without needing to centralize it or forgo privacy.
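
    To make the IVF search pattern concrete, the following plaintext sketch clusters document embeddings, probes only the closest clusters for a query, and ranks the candidates; this is the sublinear access pattern the paper's protocol reproduces under MPC. The data, cluster count, and scoring are illustrative assumptions, and the secret-sharing layer that actually hides the query and database is omitted.

```python
# Plaintext sketch of the inverted-file (IVF) access pattern: cluster the
# document embeddings, probe only the closest clusters for a query, and rank
# the candidates. The secret-sharing / MPC layer that hides the query and
# database contents from the servers is deliberately omitted.
import numpy as np

rng = np.random.default_rng(1)
docs = rng.normal(size=(10_000, 64))                 # toy document embeddings
docs /= np.linalg.norm(docs, axis=1, keepdims=True)

# Build the inverted file: assign every document to its nearest centroid.
n_clusters = 100
centroids = docs[rng.choice(len(docs), n_clusters, replace=False)]
assignments = np.argmax(docs @ centroids.T, axis=1)
inverted_file = {c: np.where(assignments == c)[0] for c in range(n_clusters)}

def ivf_top_k(query, k=5, n_probe=4):
    """Approximate top-k: scan only the n_probe most promising clusters,
    which is sublinear work compared with a full scan of the database."""
    probe = np.argsort(query @ centroids.T)[-n_probe:]
    candidates = np.concatenate([inverted_file[c] for c in probe])
    scores = docs[candidates] @ query
    return candidates[np.argsort(scores)[-k:][::-1]]

query = rng.normal(size=64)
query /= np.linalg.norm(query)
print(ivf_top_k(query))                              # ids of the top documents
```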

    Efficient and secure document similarity search cloud utilizing mapreduce

    Get PDF
    Document similarity has important real-life applications such as finding duplicate web sites and identifying plagiarism. While basic techniques such as k-similarity algorithms have long been known, the overwhelming amount of data being collected, as in big data settings, calls for novel algorithms to find highly similar documents in a reasonably short amount of time. In particular, pairwise comparison of documents sharing a common feature necessitates prohibitively high storage and computation power. The widespread availability of cloud computing provides users easy access to high storage and processing power. Furthermore, outsourcing data to the cloud guarantees reliability and availability, while privacy and security concerns are not always properly addressed. This leads to the problem of protecting the privacy of sensitive data against adversaries, including the cloud operator. Generally, traditional document similarity algorithms tend to compare the query document against all the documents in a data set that share the same terms (words). In our work, we propose a new filtering technique that works on plaintext data and decreases the number of comparisons between the query set and the search set needed to find highly similar documents. The technique, referred to as the ZOLIP algorithm, is efficient and scalable, but does not provide security. We also design and implement three secure similarity search algorithms for text documents, namely Secure Sketch Search, Secure Minhash Search, and Secure ZOLIP. The first algorithm utilizes locality-sensitive hashing techniques and cosine similarity. The second algorithm uses the MinHash algorithm, while the last one uses the encrypted ZOLIP signature, which is the secure version of the ZOLIP algorithm. We utilize the Hadoop distributed file system and the MapReduce parallel programming model to scale our techniques to the big data setting. Our experimental results on real data show that some of the proposed methods perform better than previous work in the literature in terms of the number of joins and, therefore, speed.
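
    Since Secure Minhash Search builds on MinHash signatures, the sketch below shows the unencrypted building block: character shingles, salted-hash signatures, and the collision-rate estimate of Jaccard similarity. Parameters and helper names are illustrative; the paper's encryption and MapReduce layers are not shown.

```python
# Sketch of the unencrypted MinHash building block: fixed-length signatures
# whose collision rate estimates the Jaccard similarity of the documents'
# shingle sets. The encryption and MapReduce layers of the paper are not shown.
import hashlib

def shingles(text, n=3):
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def minhash_signature(shingle_set, num_hashes=64):
    """One salted hash per signature position; keep the minimum value."""
    return [min(int(hashlib.sha1(f"{i}:{s}".encode()).hexdigest(), 16)
                for s in shingle_set)
            for i in range(num_hashes)]

def estimated_jaccard(sig_a, sig_b):
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

a = minhash_signature(shingles("document similarity with mapreduce"))
b = minhash_signature(shingles("document similarity using mapreduce"))
print(estimated_jaccard(a, b))   # high value, so a near-duplicate candidate pair
```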

    Chameleon: A Secure Cloud-Enabled and Queryable System with Elastic Properties

    Get PDF
    There are two dominant themes that have become increasingly important in our technological society. First, the recurrent use of cloud-based solutions, which provide infrastructures, computation platforms, and storage as services. Second, the use of large application logs for analytics and operational monitoring in critical systems. Moreover, auditing activities, debugging of applications, and inspection of events generated by errors or potentially unexpected operations - including those generated as alerts by intrusion detection systems - are common situations where extensive logs must be analyzed and easy access is required. More often than not, a part of the generated logs can be deemed sensitive, requiring a privacy-enhancing and queryable solution. In this dissertation, our main goal is to propose a novel approach for storing encrypted critical data in an elastic and scalable cloud-based storage, focusing on handling JSON-based ciphered documents. To this end, we make use of Searchable and Homomorphic Encryption methods to allow operations on the ciphered documents. Additionally, our solution allows the user to be nearly oblivious to the system's internals, providing transparency while in use. The achieved end goal is a unified middleware system capable of providing improved system usability, privacy, and rich querying over the data. This objective is addressed while maintaining server-side auditable logs, allowing for searchable capabilities by the log owner or authorized users, with integrity and authenticity proofs. Our proposed solution, named Chameleon, provides rich querying facilities on ciphered data - including conjunctive keyword, ordering correlation, and boolean queries - while supporting field searching and nested aggregations. These operations allow our solution to provide data analytics over ciphered JSON documents, using Elasticsearch as our storage and search engine.
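
    To give a flavor of how conjunctive keyword queries can run over ciphered JSON documents, the sketch below replaces each field value with a keyed deterministic token that a storage/search engine such as Elasticsearch could match without seeing plaintext. The token scheme and field names are simplifying assumptions; Chameleon's homomorphic operations, ordering correlation, and integrity/authenticity proofs are not modeled.

```python
# Simplified sketch: each JSON field value is replaced by a keyed deterministic
# token, so a storage/search engine can answer conjunctive keyword queries
# without seeing plaintext. Chameleon's homomorphic operations, ordering
# correlation, and integrity/authenticity proofs are not modeled here.
import hmac
import hashlib
import json

KEY = b"log-owner-key"

def field_token(field, value):
    return hmac.new(KEY, f"{field}={value}".encode(), hashlib.sha256).hexdigest()

def encrypt_log(doc):
    """Index-side view of a log record: tokens only (the full record would be
    stored separately as an encrypted blob)."""
    return {field: field_token(field, str(value)) for field, value in doc.items()}

store = [encrypt_log({"severity": "alert", "service": "auth"}),
         encrypt_log({"severity": "info", "service": "auth"})]

# Conjunctive query: severity == "alert" AND service == "auth".
query = {"severity": field_token("severity", "alert"),
         "service": field_token("service", "auth")}
matches = [doc for doc in store if all(doc.get(f) == t for f, t in query.items())]
print(json.dumps(matches, indent=2))
```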