Search CORE

3,971 research outputs found

Content-Aware DataGuides for Indexing Large Collections of XML Documents

Author: Bry François
Meuss Holger
Schulz Klaus U.
Weigel Felix
Publication venue
Publication date: 01/01/2003
Field of study

XML is well-suited for modelling structured data with textual content. However, most indexing approaches perform structure and content matching independently, combining the retrieved path and keyword occurrences in a third step. This paper shows that retrieval in XML documents can be accelerated significantly by processing text and structure simultaneously during all retrieval phases. To this end, the Content-Aware DataGuide (CADG) enhances the wellknown DataGuide with (1) simultaneous keyword and path matching and (2) a precomputed content/structure join. Extensive experiments prove the CADG to be 50-90% faster than the DataGuide for various sorts of query and document, including difficult cases such as poorly structured queries and recursive document paths. A new query classification scheme identifies precise query characteristics with a predominant influence on the performance of the individual indices. The experiments show that the CADG is applicable to many real-world applications, in particular large collections of heterogeneously structured XML documents

CiteSeerX

Trade-offs in Private Search

Author: Bellovin Steven Michael
Malkin Tal G.
Pappas Vasileios
Raykova Mariana Petrova
Vo Binh D.
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2010
Field of study

Encrypted search -- performing queries on protected data -- is a well researched problem. However, existing solutions have inherent inefficiency that raises questions of practicality. Here, we step back from the goal of achieving maximal privacy guarantees in an encrypted search scenario to consider efficiency as a priority. We propose a privacy framework for search that allows tuning and optimization of the trade-offs between privacy and efficiency. As an instantiation of the privacy framework we introduce a tunable search system based on the SADS scheme and provide detailed measurements demonstrating the trade-offs of the constructed system. We also analyze other existing encrypted search schemes with respect to this framework. We further propose a protocol that addresses the challenge of document content retrieval in a search setting with relaxed privacy requirements

Accessibility in User Reviews for Mobile Apps: An Automated Detection Approach

Author: Tamjeed Murtaza
Publication venue: RIT Scholar Works
Publication date: 01/08/2020
Field of study

In recent years, mobile accessibility has become an important trend with the goal of allowing all users the possibility of using any app without many restrictions. Recent work demonstrated that user reviews include insights that are useful for app evolution. However, with the increase in the amount of received reviews, manually analyzing them is tedious and time-consuming, especially when searching for accessibility reviews. The goal of this thesis is to support the automated identification of accessibility in user reviews, to help practitioners in prioritizing their handling, and thus, creating more inclusive apps. Particularly, we design a model that takes as input accessibility user reviews, learns their keyword-based features, in order to make a binary decision, for a given review, on whether it is about accessibility or not. The model is evaluated using a total of 5326 mobile app reviews. The findings show that (1) our approach can accurately identify accessibility reviews, outperforming two baselines, namely keyword-based detector and a random classifier; (2) our model achieves F1-measure of 90.7\% with relatively small training dataset; however, F1-measure value improves as we add to the training dataset

RIT Scholar Works

Bloom Filters Optimized Wu-Manber for Intrusion Detection

Author: Al-Khamaiseh Koloud
Aldwairi Monther
Alharbi Fatima
Shah Babar
Publication venue: (Print) 1558-7215
Publication date: 01/01/2016
Field of study

With increasing number and severity of attacks, monitoring ingress and egress network traffic is becoming essential everyday task. Intrusion detection systems are the main tools for capturing and searching network traffic for potential harm. Signature-based intrusion detection systems are the most widely used, and they simply use a pattern matching algorithms to locate attack signatures in intercepted network traffic. Pattern matching algorithms are very expensive in terms of running time and memory usage, leaving intrusion detection systems unable to detect attacks in real-time. We propose a Bloom filters optimized Wu-Manber pattern matching algorithm to speed up intrusion detection. The Bloom filter programs the hash table into a vector, which is quickly queried to exclude unnecessary searches. On average hash table searches are avoided 10.6% of the time. The proposed algorithm achieves a best-case speedup of 66% and worst-case speedup of 33% over Wu-Manber at the cost of 0.33% memory usage increase

Embry-Riddle Aeronautical University

Bloom Filters Optimized Wu-Manber for Intrusion Detection

Author: Al-Khamaiseh Koloud
Aldwairi Monther
Alharbi Fatima
Shah Babar
Publication venue: ZU Scholars
Publication date: 01/01/2016
Field of study

With increasing number and severity of attacks, monitoring ingress and egress network traffic is becoming essential everyday task. Intrusion detection systems are the main tools for capturing and searching network traffic for potential harm. Signature -based intrusion detection systems are the most widely used, and they simply use a pattern matching algorithms to locate attack signatures in intercepted network traffic. Pattern matching algorithms are very expensive in terms of running time and memory usage, leaving intrusion detection systems unable to detect attacks in real-time. We propose a Bloom filters optimized Wu-Manber pattern matching algorithm to speed up intrusion detection. The Bloom filter programs the hash table into a vector, which is quickly queried to exclude unnecessary searches. On average hash table searches are avoided 10.6% of the time. The proposed algorithm achieves a best -case speedup of 66% and worst -case speedup of 33% over Wu-Manber at the cost of 0.33% memory usage increase

A New Multi-threaded and Interleaving Approach to Enhance String Matching for Intrusion Detection Systems

Author: AlHajouj Bushra
Jarrah Moath
Shatnawi Ali M
Publication venue: 'Auricle Technologies, Pvt., Ltd.'
Publication date: 17/04/2022
Field of study

String matching algorithms are computationally intensive operations in computer science. The algorithms find the occurrences of one or more strings patterns in a larger string or text. String matching algorithms are important for network security, biomedical applications, Web search, and social networks. Nowadays, the high network speeds and large storage capacity put a high requirement on string matching methods to perform the task in a short time. Traditionally, Aho-Corasick algorithm, which is used to find the string matches, is executed sequentially. In this paper, a new multi-threaded and interleaving approach of Aho-Corasick using graphics processing units (GPUs) is designed and implemented to achieve high-speed string matching. Compute Unified Device Architecture (CUDA) programming language is used to implement the proposed parallel version. Experimental results show that our approach achieves more than 5X speedup over the sequential and other parallel implementations. Hence, a wide range of applications can benefit from our solution to perform string matching faster than ever before

International Journal of Communication Networks and Information Security (IJCNIS)