3,971 research outputs found
Content-Aware DataGuides for Indexing Large Collections of XML Documents
XML is well-suited for modelling structured data with
textual content. However, most indexing approaches perform
structure and content matching independently, combining
the retrieved path and keyword occurrences in a third
step. This paper shows that retrieval in XML documents can
be accelerated significantly by processing text and structure
simultaneously during all retrieval phases. To this end,
the Content-Aware DataGuide (CADG) enhances the wellknown
DataGuide with (1) simultaneous keyword and path
matching and (2) a precomputed content/structure join. Extensive
experiments prove the CADG to be 50-90% faster
than the DataGuide for various sorts of query and document,
including difficult cases such as poorly structured
queries and recursive document paths. A new query classification
scheme identifies precise query characteristics with
a predominant influence on the performance of the individual
indices. The experiments show that the CADG is applicable
to many real-world applications, in particular large
collections of heterogeneously structured XML documents
Trade-offs in Private Search
Encrypted search -- performing queries on protected data -- is a well researched problem. However, existing solutions have inherent inefficiency that raises questions of practicality. Here, we step back from the goal of achieving maximal privacy guarantees in an encrypted search scenario to consider efficiency as a priority. We propose a privacy framework for search that allows tuning and optimization of the trade-offs between privacy and efficiency. As an instantiation of the privacy framework we introduce a tunable search system based on the SADS scheme and provide detailed measurements demonstrating the trade-offs of the constructed system. We also analyze other existing encrypted search schemes with respect to this framework. We further propose a protocol that addresses the challenge of document content retrieval in a search setting with relaxed privacy requirements
Accessibility in User Reviews for Mobile Apps: An Automated Detection Approach
In recent years, mobile accessibility has become an important trend with the goal of allowing all users the possibility of using any app without many restrictions. Recent work demonstrated that user reviews include insights that are useful for app evolution. However, with the increase in the amount of received reviews, manually analyzing them is tedious and time-consuming, especially when searching for accessibility reviews. The goal of this thesis is to support the automated identification of accessibility in user reviews, to help practitioners in prioritizing their handling, and thus, creating more inclusive apps. Particularly, we design a model that takes as input accessibility user reviews, learns their keyword-based features, in order to make a binary decision, for a given review, on whether it is about accessibility or not. The model is evaluated using a total of 5326 mobile app reviews. The findings show that (1) our approach can accurately identify accessibility reviews, outperforming two baselines, namely keyword-based detector and a random classifier; (2) our model achieves F1-measure of 90.7\% with relatively small training dataset; however, F1-measure value improves as we add to the training dataset
Bloom Filters Optimized Wu-Manber for Intrusion Detection
With increasing number and severity of attacks, monitoring ingress and egress network traffic is becoming essential everyday task. Intrusion detection systems are the main tools for capturing and searching network traffic for potential harm. Signature-based intrusion detection systems are the most widely used, and they simply use a pattern matching algorithms to locate attack signatures in intercepted network traffic. Pattern matching algorithms are very expensive in terms of running time and memory usage, leaving intrusion detection systems unable to detect attacks in real-time. We propose a Bloom filters optimized Wu-Manber pattern matching algorithm to speed up intrusion detection. The Bloom filter programs the hash table into a vector, which is quickly queried to exclude unnecessary searches. On average hash table searches are avoided 10.6% of the time. The proposed algorithm achieves a best-case speedup of 66% and worst-case speedup of 33% over Wu-Manber at the cost of 0.33% memory usage increase
Bloom Filters Optimized Wu-Manber for Intrusion Detection
With increasing number and severity of attacks, monitoring ingress and egress network traffic is becoming essential everyday task. Intrusion detection systems are the main tools for capturing and searching network traffic for potential harm. Signature -based intrusion detection systems are the most widely used, and they simply use a pattern matching algorithms to locate attack signatures in intercepted network traffic. Pattern matching algorithms are very expensive in terms of running time and memory usage, leaving intrusion detection systems unable to detect attacks in real-time. We propose a Bloom filters optimized Wu-Manber pattern matching algorithm to speed up intrusion detection. The Bloom filter programs the hash table into a vector, which is quickly queried to exclude unnecessary searches. On average hash table searches are avoided 10.6% of the time. The proposed algorithm achieves a best -case speedup of 66% and worst -case speedup of 33% over Wu-Manber at the cost of 0.33% memory usage increase
A New Multi-threaded and Interleaving Approach to Enhance String Matching for Intrusion Detection Systems
String matching algorithms are computationally intensive operations in computer science. The algorithms find the occurrences of one or more strings patterns in a larger string or text. String matching algorithms are important for network security, biomedical applications, Web search, and social networks. Nowadays, the high network speeds and large storage capacity put a high requirement on string matching methods to perform the task in a short time. Traditionally, Aho-Corasick algorithm, which is used to find the string matches, is executed sequentially. In this paper, a new multi-threaded and interleaving approach of Aho-Corasick using graphics processing units (GPUs) is designed and implemented to achieve high-speed string matching. Compute Unified Device Architecture (CUDA) programming language is used to implement the proposed parallel version. Experimental results show that our approach achieves more than 5X speedup over the sequential and other parallel implementations. Hence, a wide range of applications can benefit from our solution to perform string matching faster than ever before
- …