4,169 research outputs found
Proximity Full-Text Search by Means of Additional Indexes with Multi-component Keys: In Pursuit of Optimal Performance
Full-text search engines are important tools for information retrieval. In a
proximity full-text search, a document is relevant if it contains query terms
near each other, especially if the query terms are frequently occurring words.
For each word in a text, we use additional indexes to store information about
nearby words that are at distances from the given word of less than or equal to
the MaxDistance parameter. We showed that additional indexes with
three-component keys can be used to improve the average query execution time by
up to 94.7 times if the queries consist of high-frequency occurring words. In
this paper, we present a new search algorithm with even more performance gains.
We consider several strategies for selecting multi-component key indexes for a
specific query and compare these strategies with the optimal strategy. We also
present the results of search experiments, which show that three-component key
indexes enable much faster searches in comparison with two-component key
indexes.
This is a pre-print of a contribution "Veretennikov A.B. (2019) Proximity
Full-Text Search by Means of Additional Indexes with Multi-component Keys: In
Pursuit of Optimal Performance." published in "Manolopoulos Y., Stupnikov S.
(eds) Data Analytics and Management in Data Intensive Domains. DAMDID/RCDL
2018. Communications in Computer and Information Science, vol 1003" published
by Springer, Cham. This book constitutes the refereed proceedings of the 20th
International Conference on Data Analytics and Management in Data Intensive
Domains, DAMDID/RCDL 2018, held in Moscow, Russia, in October 2018. The 9
revised full papers presented together with three invited papers were carefully
reviewed and selected from 54 submissions. The final authenticated version is
available online at https://doi.org/10.1007/978-3-030-23584-0_7.Comment: Revised paper of "Veretennikov A.B. Proximity full-text search with a
response time guarantee by means of additional indexes with multi-component
keys", Selected Papers of the XX International Conference on Data Analytics
and Management in Data Intensive Domains (DAMDID/RCDL 2018), Moscow, Russia,
October 9-12, 2018, http://ceur-ws.org/Vol-2277,
http://ceur-ws.org/Vol-2277/paper23.pd
An Improved Algorithm for Fast K-Word Proximity Search Based on Multi-Component Key Indexes
A search query consists of several words. In a proximity full-text search, we
want to find documents that contain these words near each other. This task
requires much time when the query consists of high-frequently occurring words.
If we cannot avoid this task by excluding high-frequently occurring words from
consideration by declaring them as stop words, then we can optimize our
solution by introducing additional indexes for faster execution. In a previous
work, we discussed how to decrease the search time with multi-component key
indexes. We had shown that additional indexes can be used to improve the
average query execution time up to 130 times if queries consisted of
high-frequently occurring words. In this paper, we present another search
algorithm that overcomes some limitations of our previous algorithm and
provides even more performance gain.
This is a pre-print of a contribution published in Arai K., Kapoor S., Bhatia
R. (eds) Intelligent Systems and Applications. IntelliSys 2020. Advances in
Intelligent Systems and Computing, vol 1251, published by Springer, Cham. The
final authenticated version is available online at:
https://doi.org/10.1007/978-3-030-55187-2_3
An Improved Algorithm for Fast K-Word Proximity Search Based on Multi-Component Key Indexes
A search query consists of several words. In a proximity full-text search, we want to find documents that contain these words near each other. This task requires much time when the query consists of high-frequently occurring words. If we cannot avoid this task by excluding high-frequently occurring words from consideration by declaring them as stop words, then we can optimize our solution by introducing additional indexes for faster execution. In a previous work, we discussed how to decrease the search time with multi-component key indexes. We had shown that additional indexes can be used to improve the average query execution time up to 130 times if queries consisted of high-frequently occurring words. In this paper, we present another search algorithm that overcomes some limitations of our previous algorithm and provides even more performance gain. © 2021, Springer Nature Switzerland AG
Ранжирование документов при полнотекстовом поиске с учетом расстояния с использованием индексов с многокомпонентными ключами
The problem of proximity full-text search is considered. If a search query contains high-frequently occurring words, then multi-component key indexes deliver improvement of the search speed in comparison with ordinary inverted indexes. It was shown that we can increase the search speed up to 130 times in cases when queries consist of high-frequently occurring words. In this paper, we are investigating how the multi-component key indexes architecture affects the quality of the search. We consider several well-known methods of relevance ranking; these methods are of different authors. Using these methods we perform the search in the ordinary inverted index and then in the index that is enhanced with multi-component key indexes. The results show that with multi-component key indexes we obtain search results that are very near in terms of relevance ranking to the search results that are obtained by means of ordinary inverted indexes. © 2021 Udmurt State University. All rights reserved
A Practical Framework for Storing and Searching Encrypted Data on Cloud Storage
Security has become a significant concern with the increased popularity of
cloud storage services. It comes with the vulnerability of being accessed by
third parties. Security is one of the major hurdles in the cloud server for the
user when the user data that reside in local storage is outsourced to the
cloud. It has given rise to security concerns involved in data confidentiality
even after the deletion of data from cloud storage. Though, it raises a serious
problem when the encrypted data needs to be shared with more people than the
data owner initially designated. However, searching on encrypted data is a
fundamental issue in cloud storage. The method of searching over encrypted data
represents a significant challenge in the cloud.
Searchable encryption allows a cloud server to conduct a search over
encrypted data on behalf of the data users without learning the underlying
plaintexts. While many academic SE schemes show provable security, they usually
expose some query information, making them less practical, weak in usability,
and challenging to deploy. Also, sharing encrypted data with other authorized
users must provide each document's secret key. However, this way has many
limitations due to the difficulty of key management and distribution.
We have designed the system using the existing cryptographic approaches,
ensuring the search on encrypted data over the cloud. The primary focus of our
proposed model is to ensure user privacy and security through a less
computationally intensive, user-friendly system with a trusted third party
entity. To demonstrate our proposed model, we have implemented a web
application called CryptoSearch as an overlay system on top of a well-known
cloud storage domain. It exhibits secure search on encrypted data with no
compromise to the user-friendliness and the scheme's functional performance in
real-world applications.Comment: 146 Pages, Master's Thesis, 6 Chapters, 96 Figures, 11 Table
Chameleon: A Secure Cloud-Enabled and Queryable System with Elastic Properties
There are two dominant themes that have become increasingly more important in our
technological society. First, the recurrent use of cloud-based solutions which provide
infrastructures, computation platforms and storage as services. Secondly, the use of applicational
large logs for analytics and operational monitoring in critical systems. Moreover,
auditing activities, debugging of applications and inspection of events generated by errors
or potential unexpected operations - including those generated as alerts by intrusion
detection systems - are common situations where extensive logs must be analyzed, and
easy access is required. More often than not, a part of the generated logs can be deemed
as sensitive, requiring a privacy-enhancing and queryable solution.
In this dissertation, our main goal is to propose a novel approach of storing encrypted
critical data in an elastic and scalable cloud-based storage, focusing on handling JSONbased
ciphered documents. To this end, we make use of Searchable and Homomorphic
Encryption methods to allow operations on the ciphered documents. Additionally, our
solution allows for the user to be near oblivious to our system’s internals, providing
transparency while in use. The achieved end goal is a unified middleware system capable
of providing improved system usability, privacy, and rich querying over the data. This
previously mentioned objective is addressed while maintaining server-side auditable logs,
allowing for searchable capabilities by the log owner or authorized users, with integrity
and authenticity proofs.
Our proposed solution, named Chameleon, provides rich querying facilities on ciphered
data - including conjunctive keyword, ordering correlation and boolean queries
- while supporting field searching and nested aggregations. The aforementioned operations
allow our solution to provide data analytics upon ciphered JSON documents, using
Elasticsearch as our storage and search engine.O uso recorrente de soluções baseadas em nuvem tornaram-se cada vez mais importantes
na nossa sociedade. Tais soluções fornecem infraestruturas, computação e armazenamento
como serviços, para alem do uso de logs volumosos de sistemas e aplicações para
análise e monitoramento operacional em sistemas críticos. Atividades de auditoria, debugging
de aplicações ou inspeção de eventos gerados por erros ou possíveis operações
inesperadas - incluindo alertas por sistemas de detecção de intrusão - são situações comuns
onde logs extensos devem ser analisados com facilidade. Frequentemente, parte dos
logs gerados podem ser considerados confidenciais, exigindo uma solução que permite
manter a confidencialidades dos dados durante procuras.
Nesta dissertação, o principal objetivo é propor uma nova abordagem de armazenar
logs críticos num armazenamento elástico e escalável baseado na cloud. A solução proposta
suporta documentos JSON encriptados, fazendo uso de Searchable Encryption e
métodos de criptografia homomórfica com provas de integridade e autenticação. O objetivo
alcançado é um sistema de middleware unificado capaz de fornecer privacidade,
integridade e autenticidade, mantendo registos auditáveis do lado do servidor e permitindo
pesquisas pelo proprietário dos logs ou usuários autorizados. A solução proposta,
Chameleon, visa fornecer recursos de consulta atuando em cima de dados cifrados - incluindo
queries conjuntivas, de ordenação e booleanas - suportando pesquisas de campo
e agregações aninhadas. As operações suportadas permitem à nossa solução suportar data
analytics sobre documentos JSON cifrados, utilizando o Elasticsearch como armazenamento
e motor de busca
Multi-Stage Search Architectures for Streaming Documents
The web is becoming more dynamic due to the increasing engagement and contribution of Internet users in the age of social media. A more dynamic web presents new challenges for web search--an important application of Information Retrieval (IR). A stream of new documents constantly flows into the web at a high rate, adding to the old content. In many cases, documents quickly lose their relevance. In these time-sensitive environments, finding relevant content in response to user queries requires a real-time search service; immediate availability of content for search and a fast ranking, which requires an optimized search architecture. These aspects of today's web are at odds with how academic IR researchers have traditionally viewed the web, as a collection of static documents. Moreover, search architectures have received little attention in the IR literature. Therefore, academic IR research, for the most part, does not provide a mechanism to efficiently handle a high-velocity stream of documents, nor does it facilitate real-time ranking.
This dissertation addresses the aforementioned shortcomings. We present an efficient mech- anism to index a stream of documents, thereby enabling immediate availability of content. Our indexer works entirely in main memory and provides a mechanism to control inverted list con- tiguity, thereby enabling faster retrieval. Additionally, we consider document ranking with a machine-learned model, dubbed "Learning to Rank" (LTR), and introduce a novel multi-stage search architecture that enables fast retrieval and allows for more design flexibility. The stages of our architecture include candidate generation (top k retrieval), feature extraction, and docu- ment re-ranking. We compare this architecture with a traditional monolithic architecture where candidate generation and feature extraction occur together. As we lay out our architecture, we present optimizations to each stage to facilitate low-latency ranking. These optimizations include a fast approximate top k retrieval algorithm, document vectors for feature extraction, architecture- conscious implementations of tree ensembles for LTR using predication and vectorization, and algorithms to train tree-based LTR models that are fast to evaluate. We also study the efficiency- effectiveness tradeoffs of these techniques, and empirically evaluate our end-to-end architecture on microblog document collections. We show that our techniques improve efficiency without degrading quality
Secure Remote Storage of Logs with Search Capabilities
Dissertação de Mestrado em Engenharia InformáticaAlong side with the use of cloud-based services, infrastructure and storage, the use of application logs
in business critical applications is a standard practice nowadays. Such application logs must be stored
in an accessible manner in order to used whenever needed. The debugging of these applications is a
common situation where such access is required. Frequently, part of the information contained in logs
records is sensitive.
This work proposes a new approach of storing critical logs in a cloud-based storage recurring to
searchable encryption, inverted indexing and hash chaining techniques to achieve, in a unified way, the
needed privacy, integrity and authenticity while maintaining server side searching capabilities by the logs
owner.
The designed search algorithm enables conjunctive keywords queries plus a fine-grained search
supported by field searching and nested queries, which are essential in the referred use case. To the
best of our knowledge, the proposed solution is also the first to introduce a query language that enables
complex conjunctive keywords and a fine-grained search backed by field searching and sub queries.A gerac¸ ˜ao de logs em aplicac¸ ˜oes e a sua posterior consulta s˜ao fulcrais para o funcionamento de qualquer
neg´ocio ou empresa. Estes logs podem ser usados para eventuais ac¸ ˜oes de auditoria, uma vez
que estabelecem uma baseline das operac¸ ˜oes realizadas. Servem igualmente o prop´ osito de identificar
erros, facilitar ac¸ ˜oes de debugging e diagnosticar bottlennecks de performance. Tipicamente, a maioria
da informac¸ ˜ao contida nesses logs ´e considerada sens´ıvel.
Quando estes logs s˜ao armazenados in-house, as considerac¸ ˜oes relacionadas com anonimizac¸ ˜ao,
confidencialidade e integridade s˜ao geralmente descartadas. Contudo, com o advento das plataformas
cloud e a transic¸ ˜ao quer das aplicac¸ ˜oes quer dos seus logs para estes ecossistemas, processos de
logging remotos, seguros e confidenciais surgem como um novo desafio. Adicionalmente, regulac¸ ˜ao
como a RGPD, imp˜oe que as instituic¸ ˜oes e empresas garantam o armazenamento seguro dos dados.
A forma mais comum de garantir a confidencialidade consiste na utilizac¸ ˜ao de t ´ecnicas criptogr ´aficas
para cifrar a totalidade dos dados anteriormente `a sua transfer ˆencia para o servidor remoto. Caso sejam
necess´ arias capacidades de pesquisa, a abordagem mais simples ´e a transfer ˆencia de todos os dados
cifrados para o lado do cliente, que proceder´a `a sua decifra e pesquisa sobre os dados decifrados.
Embora esta abordagem garanta a confidencialidade e privacidade dos dados, rapidamente se torna
impratic ´avel com o crescimento normal dos registos de log. Adicionalmente, esta abordagem n˜ao faz
uso do potencial total que a cloud tem para oferecer.
Com base nesta tem´ atica, esta tese prop˜oe o desenvolvimento de uma soluc¸ ˜ao de armazenamento
de logs operacionais de forma confidencial, integra e autˆ entica, fazendo uso das capacidades de armazenamento
e computac¸ ˜ao das plataformas cloud. Adicionalmente, a possibilidade de pesquisa sobre
os dados ´e mantida. Essa pesquisa ´e realizada server-side diretamente sobre os dados cifrados e sem
acesso em momento algum a dados n˜ao cifrados por parte do servidor..
Security and Privacy in Mobile Computing: Challenges and Solutions
abstract: Mobile devices are penetrating everyday life. According to a recent Cisco report [10], the number of mobile connected devices such as smartphones, tablets, laptops, eReaders, and Machine-to-Machine (M2M) modules will hit 11.6 billion by 2021, exceeding the world's projected population at that time (7.8 billion). The rapid development of mobile devices has brought a number of emerging security and privacy issues in mobile computing. This dissertation aims to address a number of challenging security and privacy issues in mobile computing.
This dissertation makes fivefold contributions. The first and second parts study the security and privacy issues in Device-to-Device communications. Specifically, the first part develops a novel scheme to enable a new way of trust relationship called spatiotemporal matching in a privacy-preserving and efficient fashion. To enhance the secure communication among mobile users, the second part proposes a game-theoretical framework to stimulate the cooperative shared secret key generation among mobile users. The third and fourth parts investigate the security and privacy issues in mobile crowdsourcing. In particular, the third part presents a secure and privacy-preserving mobile crowdsourcing system which strikes a good balance among object security, user privacy, and system efficiency. The fourth part demonstrates a differentially private distributed stream monitoring system via mobile crowdsourcing. Finally, the fifth part proposes VISIBLE, a novel video-assisted keystroke inference framework that allows an attacker to infer a tablet user's typed inputs on the touchscreen by recording and analyzing the video of the tablet backside during the user's input process. Besides, some potential countermeasures to this attack are also discussed. This dissertation sheds the light on the state-of-the-art security and privacy issues in mobile computing.Dissertation/ThesisDoctoral Dissertation Electrical Engineering 201
- …