122 research outputs found

    Designing optimal- and fast-on-average pattern matching algorithms

    Given a pattern w and a text t, the speed of a pattern matching algorithm over t with regard to w is the ratio of the length of t to the number of text accesses performed to search for w in t. We first propose a general method for computing the limit of the expected speed of pattern matching algorithms, with regard to w, over iid texts. Next, we show how to determine the greatest speed that can be achieved among a large class of algorithms, together with an algorithm running at this speed. Since the complexity of this determination makes it impossible to deal with patterns of length greater than 4, we propose a polynomial heuristic. Finally, our approaches are compared with 9 pre-existing pattern matching algorithms from both a theoretical and a practical point of view, i.e. both in terms of limit expected speed on iid texts and in terms of observed average speed on real data. In all cases, the pre-existing algorithms are outperformed.
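
    The speed defined in this abstract (text length divided by the number of text accesses) can be measured for any concrete algorithm by instrumenting it. The following is a minimal sketch, assuming Horspool's algorithm as the example; the function name `horspool_speed` is hypothetical and this is not the optimal algorithm proposed by the authors.

```python
def horspool_speed(pattern: str, text: str) -> tuple[list[int], float]:
    """Search `pattern` in `text` with Horspool's algorithm.

    Returns the match positions and the observed speed, i.e.
    len(text) divided by the number of text character accesses.
    """
    m, n = len(pattern), len(text)
    if m == 0 or n < m:
        raise ValueError("pattern must be non-empty and no longer than text")

    # Bad-character shift: distance from the rightmost occurrence of each
    # character (excluding the last pattern position) to the pattern end.
    shift = {c: m - 1 - i for i, c in enumerate(pattern[:-1])}

    matches: list[int] = []
    accesses = 0
    pos = 0
    while pos <= n - m:
        j = m - 1
        # Compare right to left, counting every text character read.
        while j >= 0:
            accesses += 1
            if text[pos + j] != pattern[j]:
                break
            j -= 1
        if j < 0:
            matches.append(pos)
        # The shift reuses text[pos + m - 1], which was already read as the
        # first comparison of this window, so it is not counted again.
        pos += shift.get(text[pos + m - 1], m)
    return matches, n / accesses


if __name__ == "__main__":
    found, speed = horspool_speed("abc", "xxxabcxxx")
    print(found, speed)
```

    A speed above 1 means the algorithm skipped some text characters entirely; a naive left-to-right scan can never exceed 1.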

    Comparison of search algorithms in Javanese-Indonesian dictionary application

    This study compares the performance of the Boyer-Moore, Knuth-Morris-Pratt, and Horspool algorithms in searching for the meaning of words in a Javanese-Indonesian dictionary search application, in terms of accuracy and processing time. Performance testing is used to evaluate the algorithm implementations in the application. The test results show that the Boyer-Moore and Knuth-Morris-Pratt algorithms have an accuracy rate of 100%, and the Horspool algorithm 85.3%. In terms of processing time, the Knuth-Morris-Pratt algorithm is the fastest, with an average of 25 ms, followed by Horspool at 39.9 ms and Boyer-Moore at 44.2 ms. In the complexity test, the Boyer-Moore algorithm has an overall number of n 26n2, and Knuth-Morris-Pratt and Horspool 20n2 each.
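
    For readers unfamiliar with the Knuth-Morris-Pratt algorithm named in this abstract, here is a minimal sketch of it, together with a naive reference search of the kind one could use to check accuracy. The function names are hypothetical and the code is not taken from the study's application.

```python
def kmp_search(pattern: str, text: str) -> list[int]:
    """Return all start positions of `pattern` in `text` (Knuth-Morris-Pratt)."""
    if not pattern:
        return []

    # fail[i] = length of the longest proper prefix of pattern[:i + 1]
    # that is also a suffix of it.
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k

    matches: list[int] = []
    k = 0  # number of pattern characters currently matched
    for i, c in enumerate(text):
        while k > 0 and c != pattern[k]:
            k = fail[k - 1]
        if c == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            k = fail[k - 1]  # keep going to find overlapping matches
    return matches


def naive_search(pattern: str, text: str) -> list[int]:
    """Reference implementation used only to check the result above."""
    if not pattern:
        return []
    return [i for i in range(len(text) - len(pattern) + 1)
            if text.startswith(pattern, i)]


if __name__ == "__main__":
    print(kmp_search("aba", "ababa"))
```

    Because KMP never moves backwards in the text, it reads each text character once, which is one reason it can be competitive on short dictionary entries.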

    An Algorithm to Compute the Character Access Count Distribution for Pattern Matching Algorithms

    We propose a framework for the exact probabilistic analysis of window-based pattern matching algorithms, such as Boyer-Moore, Horspool, Backward DAWG Matching, Backward Oracle Matching, and more. In particular, we develop an algorithm that efficiently computes the distribution of a pattern matching algorithm's running time cost (such as the number of text character accesses) for any given pattern in a random text model. Text models range from simple uniform models to higher-order Markov models or hidden Markov models (HMMs). Furthermore, we provide an algorithm to compute the exact distribution of differences in running time cost of two pattern matching algorithms. Methodologically, we use extensions of finite automata which we call deterministic arithmetic automata (DAAs) and probabilistic arithmetic automata (PAAs) [Marschall2008]. Given an algorithm, a pattern, and a text model, a PAA is constructed from which the sought distributions can be derived using dynamic programming. To our knowledge, this is the first time that substring- or suffix-based pattern matching algorithms are analyzed exactly by computing the whole distribution of running time cost. Experimentally, we compare Horspool's algorithm, Backward DAWG Matching, and Backward Oracle Matching on prototypical patterns of short length and provide statistics on the size of minimal DAAs for these computations.
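
    To make the notion of a character access count distribution concrete, the sketch below computes it exactly for Horspool's algorithm under a uniform text model by enumerating every text of a given length. This brute-force enumeration is an assumption chosen for illustration; it is exponential in the text length and is not the PAA-based dynamic programming method the abstract describes. The function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def horspool_accesses(pattern: str, text: str) -> int:
    """Count the text character accesses Horspool's algorithm performs."""
    m, n = len(pattern), len(text)
    shift = {c: m - 1 - i for i, c in enumerate(pattern[:-1])}
    accesses = 0
    pos = 0
    while pos <= n - m:
        j = m - 1
        while j >= 0:
            accesses += 1
            if text[pos + j] != pattern[j]:
                break
            j -= 1
        pos += shift.get(text[pos + m - 1], m)
    return accesses


def access_distribution(pattern: str, alphabet: str, n: int) -> dict[int, Fraction]:
    """Exact distribution of the access count over uniform texts of length n."""
    counts: Counter[int] = Counter()
    for letters in product(alphabet, repeat=n):
        counts[horspool_accesses(pattern, "".join(letters))] += 1
    total = len(alphabet) ** n
    return {cost: Fraction(c, total) for cost, c in sorted(counts.items())}


if __name__ == "__main__":
    for cost, prob in access_distribution("ab", "ab", 6).items():
        print(cost, prob)
```

    Enumeration is only feasible for very short texts and small alphabets, which is exactly the limitation an automaton-based approach is designed to avoid.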

    A Survey of Software-based String Matching Algorithms for Forensic Analysis

    Employing a fast string matching algorithm is essential for minimizing the overhead of extracting structured files from a raw disk image. In this paper, we summarize the concept, implementation, and main features of ten software-based string matching algorithms, and evaluate their applicability for forensic analysis. We compare the selected algorithms from the perspective of forensic analysis by evaluating their performance for file carving. According to the experimental results, the Shift-Or algorithm (R. Baeza-Yates & Gonnet, 1992) and the Karp-Rabin algorithm (Karp & Rabin, 1987) have the shortest search times for identifying the locations of specified headers and footers in the target disk. Keywords: string matching algorithm, forensic analysis, file carving, Scalpel, data recover
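
    As an illustration of the Shift-Or algorithm mentioned in this abstract, the following minimal sketch searches a byte buffer for a file header. The function name is hypothetical, and the JPEG start-of-image signature is used only as an example header; the code is not the implementation evaluated in the survey.

```python
def shift_or_search(pattern: bytes, data: bytes) -> list[int]:
    """Return all start offsets of `pattern` in `data` using Shift-Or.

    Bit i of `state` is 0 exactly when pattern[:i + 1] matches the bytes
    ending at the current position.
    """
    m = len(pattern)
    if m == 0:
        return []

    all_ones = (1 << m) - 1
    # masks[b] has bit i cleared when pattern[i] == b, set otherwise.
    masks = [all_ones] * 256
    for i, b in enumerate(pattern):
        masks[b] &= ~(1 << i)

    state = all_ones
    hit = 1 << (m - 1)
    matches: list[int] = []
    for pos, b in enumerate(data):
        # Shifting brings in a 0 at bit 0; OR-ing the mask kills mismatches.
        state = ((state << 1) | masks[b]) & all_ones
        if (state & hit) == 0:
            matches.append(pos - m + 1)
    return matches


if __name__ == "__main__":
    jpeg_header = b"\xff\xd8\xff"  # example signature, an assumption
    image = b"\x00\xff\xd8\xff\xe0\x00\xff\xd8\xff"
    print(shift_or_search(jpeg_header, image))
```

    The inner loop does a shift, an OR, and a test per input byte, with no branching on the pattern, which suits short fixed signatures such as file headers and footers.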

    Performance Study of the Running Times of well known Pattern Matching Algorithms for Signature-based Intrusion Detection Systems

    An intrusion detection system (IDS) is a basic component of any network defense scheme. Signature-based intrusion detection techniques are widely used in networks for fast response to detected threats. One of the main challenges faced by a signature-based IDS is that every signature requires an entry in the database, so a complete database might contain hundreds or even thousands of entries. Each packet must be compared with all the entries in the database. This can be highly resource-consuming; it reduces throughput and makes the IDS vulnerable. Since pattern matching computations dominate the overall performance of a signature-based IDS, efficient pattern matching algorithms should be used that require minimal storage and minimize the search response time. In this paper we present a performance study of the running times of different well-known pattern matching algorithms using a multiple sliding windows approach. DOI: 10.17762/ijritcc2321-8169.150613

    Advanced Searching Algorithms and its Behavior on Text Structures

    This research investigates the behavior of the Boyer-Moore-Horspool (BMH) and the Boyer-Moore-Raita (BMR) string-matching algorithms using multilingual texts. The performance is computed based on searching for patterns in master strings. Experiments are conducted using a number of pattern lengths, with many repetitions of each experiment. The experimental results show that, on average, the number of comparisons per character passed is lower for BMR than for the BMH variant. The improvement is due to properties of the text structures. These experiments may lead to more theoretical and practical studies to develop new variants of algorithms. Using multilingual text structures provides more insight into the theory and structure of algorithms, as multilingual text structures have different character sets and dependencies, and the character properties have different types of structure. Since many of today's applications depend on searching algorithms, researchers need to explore every possibility that leads to improving the efficiency of searching and matching mechanisms. The time performance of exact string pattern matching can be greatly improved if an efficient algorithm is used. Considering, for example, the growing amount of text handled in electronic patient records, it is worthwhile and essential, in these cases and others, to search for an efficient algorithm to deal with such huge volumes of information. Keywords: Matching, Boyer-Moore, Raita algorithm, Searching, multilingua
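
    The comparison count that this abstract uses as its metric can be gathered by instrumenting both algorithms. The sketch below assumes the usual description of Raita's variant (compare the last, then the first, then the middle pattern character before the rest, with Horspool's bad-character shift); the function names and the sample text are hypothetical, and no particular outcome is implied for other inputs.

```python
def bad_char_shift(pattern: str) -> dict[str, int]:
    """Horspool bad-character table, shared by both variants."""
    m = len(pattern)
    return {c: m - 1 - i for i, c in enumerate(pattern[:-1])}


def search_count(pattern: str, text: str, order: list[int]) -> tuple[list[int], int]:
    """Window search probing pattern positions in `order`; counts comparisons."""
    m, n = len(pattern), len(text)
    shift = bad_char_shift(pattern)
    matches: list[int] = []
    comparisons = 0
    pos = 0
    while pos <= n - m:
        matched = True
        for j in order:
            comparisons += 1
            if text[pos + j] != pattern[j]:
                matched = False
                break
        if matched:
            matches.append(pos)
        pos += shift.get(text[pos + m - 1], m)
    return matches, comparisons


def bmh_count(pattern: str, text: str) -> tuple[list[int], int]:
    """Boyer-Moore-Horspool: compare right to left."""
    return search_count(pattern, text, list(range(len(pattern) - 1, -1, -1)))


def raita_count(pattern: str, text: str) -> tuple[list[int], int]:
    """Raita variant: last, first, middle, then the remaining positions."""
    m = len(pattern)
    order: list[int] = []
    for j in (m - 1, 0, m // 2):
        if j not in order:
            order.append(j)
    order += [j for j in range(1, m - 1) if j not in order]
    return search_count(pattern, text, order)


if __name__ == "__main__":
    sample = "abracadabra abracadabra"
    print(bmh_count("abra", sample))
    print(raita_count("abra", sample))
```

    Both variants shift identically, so they visit the same windows; any difference in the counts comes only from the order in which characters inside a window are probed, which is why the result depends on character dependencies in the text.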