15 research outputs found
Efficient and secure ranked multi-keyword search on encrypted cloud data
Information search and document retrieval from a remote database (e.g. cloud server) requires submitting the search terms to the database holder. However, the search terms may contain sensitive information that must be kept secret from the database holder. Moreover, the privacy concerns apply to the relevant documents retrieved by the user in the later stage since they may also contain sensitive data and reveal information about sensitive search terms. A related protocol, Private Information Retrieval (PIR), provides useful cryptographic tools to hide the queried search terms and the data retrieved from the database while returning most relevant documents to the user. In this paper, we propose a practical privacy-preserving ranked keyword search scheme based on PIR that allows multi-keyword queries with ranking capability. The proposed scheme increases the security of the keyword search scheme while still satisfying efficient computation and communication requirements. To the best of our knowledge the majority of previous works are not efficient for assumed scenario where documents are large files. Our scheme outperforms the most efficient proposals in literature in terms of time complexity by several orders of magnitude
A practical and secure multi-keyword search method over encrypted cloud data
Cloud computing technologies become more and more popular every year, as many organizations tend to outsource their data utilizing robust and fast services of clouds while lowering the cost of hardware ownership. Although its benefits are welcomed, privacy is still a remaining concern that needs to be addressed. We propose an efficient privacy-preserving search method over encrypted cloud data that utilizes minhash functions. Most of the work in literature can only support a single feature search in queries which reduces the effectiveness. One of the main advantages of our proposed method is the capability of multi-keyword search in a single query. The proposed method is proved to satisfy adaptive semantic security definition. We also combine an effective ranking capability that is based on term frequency-inverse document frequency (tf-idf) values of keyword document pairs. Our analysis demonstrates that the proposed scheme is proved to be privacy-preserving, efficient and effective
A game theoretic model for digital identity and trust in online communities
Digital identity and trust management mechanisms play an important role on the Internet. They help users make decisions on trustworthiness of digital identities in online communities or ecommerce environments, which have significant security consequences. This work aims to contribute to construction of an analytical foundation for digital identity and trust by adopting a quantitative approach. A game theoretic model is developed to quantify community effects and other factors in trust decisions. The model captures factors such as peer pressure and personality traits. The existence and uniqueness of a Nash equilibrium solution is studied and shown for the trust game defined. In addition, synchronous and asynchronous update algorithms are shown to converge to the Nash equilibrium solution. A numerical analysis is provided for a number
of scenarios that illustrate the interplay between user behavior and community effects
Fuzzy vault scheme for fingerprint verification: implementation, analysis and improvements
Fuzzy vault is a well-known technique that is used in biometric authentication applications. This thesis handles the fuzzy vault scheme and improves it to strengthen against previously suggested attacks while analyzing the effects of these improvements on the performance. We compare the performances of two different methods used in the implementation of fuzzy vault, namely brute force and Reed Solomon decoding with fingerprint biometric data. We show that the locations of fake (chaff) points leak some valuable information and propose a new chaff point placement technique that prevents that information leakage. A novel method for chaff point creation that decreases the success rate of the brute force attack from 100% to less than 3.3% is also proposed in this work. Moreover, a special hash function that allows us to perform matching in the hash space which protects the biometric information against the 'correlation attack' is proposed. Security analysis of this method is also presented in this thesis. We implemented the scheme with and without the hash function to calculate false accept and false reject rates in different settings
Privacy-preserving targeted advertising scheme for IPTV using the cloud
In this paper, we present a privacy-preserving scheme for targeted advertising via the Internet Protocol TV (IPTV). The scheme uses a communication model involving a collection of viewers/subscribers, a content provider (IPTV), an advertiser, and a cloud server. To provide high quality directed advertising service, the advertiser can utilize not only demographic information of subscribers, but also their watching habits. The latter includes watching history, preferences for IPTV content and watching rate, which are published on the cloud server periodically (e.g. weekly) along with anonymized demographics. Since the published data may leak sensitive information about subscribers, it is safeguarded using cryptographic techniques in addition to the anonymization of demographics. The techniques used by the advertiser, which can be manifested in its queries to the cloud, are considered (trade) secrets and therefore are protected as well. The cloud is oblivious to the published data, the queries of the advertiser as well as its own responses to these queries. Only a legitimate advertiser, endorsed with a so-called {\em trapdoor} by the IPTV, can query the cloud and utilize the query results. The performance of the proposed scheme is evaluated with experiments, which show that the scheme is suitable for practical usage
Improved fuzzy vault scheme for fingerprint verification
Fuzzy vault is a well-known technique to address the privacy concerns in biometric identification applications. We revisit the fuzzy vault scheme to address implementation, efficiency, and security issues encountered in its realization. We use the fingerprint data as a case study. We compare the performances of two different methods used in the implementation of fuzzy vault, namely brute force and Reed Solomon decoding. We show that the locations of fake (chaff) points in the vault leak information on the genuine points and propose
a new chaff point placement technique that makes distinguishing genuine points impossible. We also propose a novel method for creation of chaff points that decreases the success rate of the brute force attack from 100% to less than 3.5%. While this paper lays out a complete guideline as to how the fuzzy vault is implemented in an efficient and secure way, it also points out that more research is needed to thwart the proposed attacks by presenting ideas for future research
Private search over big data leveraging distributed file system and parallel processing
In this work, we identify the security and privacy problems associated with a certain Big Data application, namely secure keyword-based search over encrypted cloud data and emphasize the actual challenges and technical difficulties in the Big Data setting. More specifically, we provide definitions from which privacy requirements can be derived. In addition, we adapt an existing work on privacy-preserving keyword-based search method to the Big Data setting, in which, not only data is huge but also changing and accumulating very fast. Our proposal is scalable in the sense that it can leverage distributed file systems and parallel programming techniques such as the Hadoop Distributed File System (HDFS) and the MapReduce programming model, to work with very large data sets. We also propose a lazy idf-updating method that can efficiently handle the relevancy scores of the documents in a dynamically changing, large data set. We empirically show the efficiency and accuracy of the method through extensive set of experiments on real data
A Unified Framework for Secure Search Over Encrypted Cloud Data
This paper presents a unified framework that supports different types of privacy-preserving search queries over encrypted cloud data. In the framework, users can perform any of the multi-keyword search, range search and k-nearest neighbor search operations in a privacy-preserving manner. All three types of queries are transformed into predicate-based search leveraging bucketization, locality sensitive hashing and homomorphic encryption techniques. The proposed framework is implemented using Hadoop MapReduce, and its efficiency and accuracy are evaluated using publicly available real data sets. The implementation results show that the proposed framework can effectively be used in moderate sized data sets and it is scalable for much larger data sets provided that the number of computers in the Hadoop cluster is increased. To the best of our knowledge, the proposed framework is the first privacy-preserving solution, in which three different types of search queries are effectively applied over encrypted data
Secure sketch search for document similarity
Document similarity search is an important problem that has many applications especially in outsourced data. With the wide spread of cloud computing, users tend to outsource their data to remote servers which are not necessarily trusted. This leads to the problem of protecting the privacy of sensitive data. We design and implement two secure similarity search schemes for textual documents utilizing locality sensitive hashing techniques for cosine similarity. While the first one provides very fast search time results and a decent level of privacy, the second method enjoys enhanced security properties such as hiding the search and access patterns but with higher latency
Efficient top-k similarity document search utilizing distributed file systems and cosine similarity
Document similarity has important real life applications such as finding duplicate web sites and identifying plagiarism. While the basic techniques such as k-similarity algorithms have been long known, overwhelming amount of data, being collected such as in big data setting, calls for novel algorithms to find highly similar documents in reasonably short amount of time. In particular, pairwise comparison of documents’ features, a key operation in calculating document similarity, necessitates prohibitively high storage and computation power. In this paper, we propose a new filtering technique that decreases the number of comparisons between the query set and the search set to find highly similar documents. The proposed filtering technique utilizes Z-order prefix, based on the cosine similarity measure, in which only the most important features are used first to find highly similar documents. We propose a three-phase approach, where the phases are near duplicate detection, common important terms and join phase. We utilize the Hadoop distributed file system and the MapReduce parallel programming model to scale our techniques to big data setting. Our experimental results on real data show that the proposed method performs better than the previous work in the literature in terms of the number of joins, and therefore, speed