5,250 research outputs found
Latent Semantic Indexing (LSI) Based Distributed System and Search On Encrypted Data
Latent semantic indexing (LSI) was initially introduced to overcome the issues of synonymy and polysemy of the traditional vector space model (VSM). LSI, however, has challenges of its own, mainly scalability. Despite being introduced in 1990, there are few attempts that provide an efficient solution for LSI, most of the literature is focuses on LSI’s applications rather than improving the original algorithm. In this work we analyze the first framework to provide scalable implementation of LSI and report its performance on the distributed environment of RAAD.
The possibility of adopting LSI in the field of searching over encrypted data is also investigated. The importance of that field is stemmed from the need for cloud computing as an effective computing paradigm that provides an affordable access to high computational power. Encryption is usually applied to prevent unauthorized access to the data (the host is assumed to be curious), however this limits accessibility to the data given that search over encryption is yet to catch with the latest techniques adopted by the Information Retrieval (IR) community. In this work we propose a system that uses LSI for indexing and free-query text for retrieving.
The results show that the available LSI framework does scale on large datasets, however it had some limitations with respect to factors like dictionary size and memory limit. When replicating the exact settings of the baseline on RAAD, it performed relatively slower. This could be resulted by the fact that RAAD uses a distributed file system or because of network latency. The results also show that the proposed system for applying LSI on encrypted data retrieved documents in the same order as the baseline (unencrypted data)
State of The Art and Hot Aspects in Cloud Data Storage Security
Along with the evolution of cloud computing and cloud storage towards matu-
rity, researchers have analyzed an increasing range of cloud computing security
aspects, data security being an important topic in this area. In this paper, we
examine the state of the art in cloud storage security through an overview of
selected peer reviewed publications. We address the question of defining cloud
storage security and its different aspects, as well as enumerate the main vec-
tors of attack on cloud storage. The reviewed papers present techniques for key
management and controlled disclosure of encrypted data in cloud storage, while
novel ideas regarding secure operations on encrypted data and methods for pro-
tection of data in fully virtualized environments provide a glimpse of the toolbox
available for securing cloud storage. Finally, new challenges such as emergent
government regulation call for solutions to problems that did not receive enough
attention in earlier stages of cloud computing, such as for example geographical
location of data. The methods presented in the papers selected for this review
represent only a small fraction of the wide research effort within cloud storage
security. Nevertheless, they serve as an indication of the diversity of problems
that are being addressed
Private search over big data leveraging distributed file system and parallel processing
In this work, we identify the security and privacy problems associated with a certain Big Data application, namely secure keyword-based search over encrypted cloud data and emphasize the actual challenges and technical difficulties in the Big Data setting. More specifically, we provide definitions from which privacy requirements can be derived. In addition, we adapt an existing work on privacy-preserving keyword-based search method to the Big Data setting, in which, not only data is huge but also changing and accumulating very fast. Our proposal is scalable in the sense that it can leverage distributed file systems and parallel programming techniques such as the Hadoop Distributed File System (HDFS) and the MapReduce programming model, to work with very large data sets. We also propose a lazy idf-updating method that can efficiently handle the relevancy scores of the documents in a dynamically changing, large data set. We empirically show the efficiency and accuracy of the method through extensive set of experiments on real data
Achieving Secure and Efficient Cloud Search Services: Cross-Lingual Multi-Keyword Rank Search over Encrypted Cloud Data
Multi-user multi-keyword ranked search scheme in arbitrary language is a
novel multi-keyword rank searchable encryption (MRSE) framework based on
Paillier Cryptosystem with Threshold Decryption (PCTD). Compared to previous
MRSE schemes constructed based on the k-nearest neighbor searcha-ble encryption
(KNN-SE) algorithm, it can mitigate some draw-backs and achieve better
performance in terms of functionality and efficiency. Additionally, it does not
require a predefined keyword set and support keywords in arbitrary languages.
However, due to the pattern of exact matching of keywords in the new MRSE
scheme, multilingual search is limited to each language and cannot be searched
across languages. In this pa-per, we propose a cross-lingual multi-keyword rank
search (CLRSE) scheme which eliminates the barrier of languages and achieves
semantic extension with using the Open Multilingual Wordnet. Our CLRSE scheme
also realizes intelligent and per-sonalized search through flexible keyword and
language prefer-ence settings. We evaluate the performance of our scheme in
terms of security, functionality, precision and efficiency, via extensive
experiments
Chameleon: A Secure Cloud-Enabled and Queryable System with Elastic Properties
There are two dominant themes that have become increasingly more important in our
technological society. First, the recurrent use of cloud-based solutions which provide
infrastructures, computation platforms and storage as services. Secondly, the use of applicational
large logs for analytics and operational monitoring in critical systems. Moreover,
auditing activities, debugging of applications and inspection of events generated by errors
or potential unexpected operations - including those generated as alerts by intrusion
detection systems - are common situations where extensive logs must be analyzed, and
easy access is required. More often than not, a part of the generated logs can be deemed
as sensitive, requiring a privacy-enhancing and queryable solution.
In this dissertation, our main goal is to propose a novel approach of storing encrypted
critical data in an elastic and scalable cloud-based storage, focusing on handling JSONbased
ciphered documents. To this end, we make use of Searchable and Homomorphic
Encryption methods to allow operations on the ciphered documents. Additionally, our
solution allows for the user to be near oblivious to our system’s internals, providing
transparency while in use. The achieved end goal is a unified middleware system capable
of providing improved system usability, privacy, and rich querying over the data. This
previously mentioned objective is addressed while maintaining server-side auditable logs,
allowing for searchable capabilities by the log owner or authorized users, with integrity
and authenticity proofs.
Our proposed solution, named Chameleon, provides rich querying facilities on ciphered
data - including conjunctive keyword, ordering correlation and boolean queries
- while supporting field searching and nested aggregations. The aforementioned operations
allow our solution to provide data analytics upon ciphered JSON documents, using
Elasticsearch as our storage and search engine.O uso recorrente de soluções baseadas em nuvem tornaram-se cada vez mais importantes
na nossa sociedade. Tais soluções fornecem infraestruturas, computação e armazenamento
como serviços, para alem do uso de logs volumosos de sistemas e aplicações para
análise e monitoramento operacional em sistemas crĂticos. Atividades de auditoria, debugging
de aplicações ou inspeção de eventos gerados por erros ou possĂveis operações
inesperadas - incluindo alertas por sistemas de detecção de intrusão - são situações comuns
onde logs extensos devem ser analisados com facilidade. Frequentemente, parte dos
logs gerados podem ser considerados confidenciais, exigindo uma solução que permite
manter a confidencialidades dos dados durante procuras.
Nesta dissertação, o principal objetivo é propor uma nova abordagem de armazenar
logs crĂticos num armazenamento elástico e escalável baseado na cloud. A solução proposta
suporta documentos JSON encriptados, fazendo uso de Searchable Encryption e
métodos de criptografia homomórfica com provas de integridade e autenticação. O objetivo
alcançado é um sistema de middleware unificado capaz de fornecer privacidade,
integridade e autenticidade, mantendo registos auditáveis do lado do servidor e permitindo
pesquisas pelo proprietário dos logs ou usuários autorizados. A solução proposta,
Chameleon, visa fornecer recursos de consulta atuando em cima de dados cifrados - incluindo
queries conjuntivas, de ordenação e booleanas - suportando pesquisas de campo
e agregações aninhadas. As operações suportadas permitem à nossa solução suportar data
analytics sobre documentos JSON cifrados, utilizando o Elasticsearch como armazenamento
e motor de busca
- …