1,492 research outputs found

    Boyer-Moore strategy to efficient approximate string matching

    We propose a simple but efficient algorithm for searching all occurrences of a pattern or a class of patterns (length m) in a text (length n) with at most k mismatches. This algorithm relies on the Shift-Add algorithm of Baeza-Yates and Gonnet [6], which represents the current state of the search as a bit number and uses the ability of programming languages to handle bit words. The state representation should therefore not exceed the word size w, that is, m(⌈log2(k+1)⌉+1) ≤ w. The algorithm consists of a preprocessing step and a searching step. It is linear and performs 3n operations during the searching step. Notions of shift and character skip found in the Boyer-Moore (BM) [9] approach are introduced in this algorithm. Provided that the considered alphabet is large enough (compared to the pattern length), the average number of operations performed by our algorithm during the searching step becomes n(2+(k+4)/(m-k)).
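The bit-packed state described in this abstract can be illustrated with a minimal sketch of the underlying Shift-Add idea for k mismatches. This is not the authors' Boyer-Moore variant: it has no skip step, and the function name `shift_add_search` and the `word_size` parameter are assumptions made for illustration. Each pattern position gets a field of ⌈log2(k+1)⌉+1 bits that counts mismatches, with the top bit of each field acting as an overflow flag.

```python
def shift_add_search(text, pattern, k, word_size=64):
    """Return start positions where pattern occurs with at most k mismatches."""
    m = len(pattern)
    b = k.bit_length() + 1            # k.bit_length() equals ceil(log2(k + 1))
    if m == 0 or m * b > word_size:
        raise ValueError("state needs m * (ceil(log2(k+1)) + 1) <= word_size bits")

    ones = sum(1 << (b * i) for i in range(m))   # value 1 in every field
    high = ones << (b - 1)                       # overflow bit of every field
    full = (1 << (b * m)) - 1                    # keeps exactly m fields

    # Preprocessing: table[c] has a 1 in field i when pattern[i] != c.
    table = {}
    for i, ch in enumerate(pattern):
        table[ch] = table.get(ch, ones) & ~(1 << (b * i))

    state = overflow = 0
    top = b * (m - 1)                            # bit offset of the last field
    hits = []
    for j, ch in enumerate(text):
        # Shift every counter to the next pattern position, then add mismatches.
        state = ((state << b) + table.get(ch, ones)) & full
        # Remember counters that overflowed, then clear their overflow bits.
        overflow = ((overflow << b) | (state & high)) & full
        state &= ~high
        # The last field holds the mismatch count for the window ending at j.
        if j >= m - 1 and not (overflow >> top) and (state >> top) <= k:
            hits.append(j - m + 1)
    return hits
```

For example, `shift_add_search("abcdefabxdef", "abcd", 1)` reports the exact occurrence at position 0 and the one-mismatch occurrence at position 6. Python integers are unbounded, so the `word_size` check only mirrors the constraint m(⌈log2(k+1)⌉+1) ≤ w stated in the abstract.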

    Overlapped Text Partition Algorithm for Pattern Matching on Hypercube Networked Model

    The web has been growing continuously and taking on an hourglass shape. The indexed web is estimated to contain at least 30 billion pages, so it is no surprise that searching data poses serious challenges in terms of quality and speed. An important subtask of the pattern discovery process is string matching, in which the pattern is already known and we need to determine how often and where it occurs in a given text. Current research challenges point to a new trend: a distributed environment in which the given text file is divided into subparts and distributed to N processors organized as a hypercube network. To improve search speed and reduce time complexity, we run the string matching algorithms in a parallel distributed environment, the hypercube networked model, using the RMI method. We consider both the KV-KMP and KV-Boyer-Moore string matching algorithms for pattern matching in large text databases, using three data sets, and draw graphs for different patterns.
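The partitioning idea named in the title can be sketched as follows. This is an illustration under assumptions, not the paper's scheme: the abstract does not give the partition details, so the overlap of m-1 characters, the helper names, and the sequential loop standing in for the N processors are all hypothetical. The overlap ensures that a match straddling a chunk boundary is still fully contained in one chunk.

```python
def overlapped_partitions(text, m, parts):
    """Split text into about `parts` chunks, each extended by m - 1 characters."""
    n = len(text)
    size = max(1, -(-n // parts))          # ceiling division, at least 1
    chunks = []
    for start in range(0, n, size):
        end = min(n, start + size + m - 1) # overlap into the next chunk
        chunks.append((start, text[start:end]))
    return chunks


def find_all(chunk, pattern):
    """All (possibly overlapping) occurrences of pattern inside one chunk."""
    hits, i = [], chunk.find(pattern)
    while i != -1:
        hits.append(i)
        i = chunk.find(pattern, i + 1)
    return hits


def partitioned_search(text, pattern, parts):
    """Search each chunk independently and merge the global positions."""
    results = []
    for offset, chunk in overlapped_partitions(text, len(pattern), parts):
        # In a distributed setting, each chunk would go to its own processor.
        results.extend(offset + i for i in find_all(chunk, pattern))
    return results
```

With an overlap of exactly m-1 characters, every match is found in the chunk that contains its first character and in no other chunk, so the merged list needs no deduplication.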

    Comparison of search algorithms in Javanese-Indonesian dictionary application

    This study aims to compare the performance of the Boyer-Moore, Knuth-Morris-Pratt, and Horspool algorithms in searching for the meaning of words in a Javanese-Indonesian dictionary search application in terms of accuracy and processing time. Performance testing is used to test the algorithm implementations in the application. The test results show that the Boyer-Moore and Knuth-Morris-Pratt algorithms have an accuracy rate of 100%, and the Horspool algorithm 85.3%. In terms of processing time, the Knuth-Morris-Pratt algorithm is the fastest with an average of 25 ms, followed by Horspool at 39.9 ms and Boyer-Moore at 44.2 ms. In the complexity test, the Boyer-Moore algorithm has an overall count of 26n², and Knuth-Morris-Pratt and Horspool 20n² each.

    Fast Exact String Pattern-matching Algorithms Adapted to the Characteristics of the Medical Language

    Objective: The authors consider the problem of exact string pattern matching using algorithms that do not require any preprocessing. To choose the most appropriate algorithm, distinctive features of the medical language must be taken into account. The characteristics of medical language are emphasized in this regard, the best algorithm of those reviewed is proposed, and detailed evaluations of time complexity for processing medical texts are provided. Design: The authors first illustrate and discuss the techniques of various string pattern-matching algorithms. Next, the source code and the behavior of representative exact string pattern-matching algorithms are presented in a comprehensive manner to promote their implementation. Detailed explanations of the use of various techniques to improve performance are given. Measurements: Real-time measures of time complexity with English medical texts are presented. They lead to results distinct from those found in the computer science literature, which are typically computed with normally distributed texts. Results: The Boyer-Moore-Horspool algorithm achieves the best overall results when used with medical texts. This algorithm usually performs at least twice as fast as the other algorithms tested. Conclusion: The time performance of exact string pattern matching can be greatly improved if an efficient algorithm is used. Considering the growing amount of text handled in the electronic patient record, it is worth implementing this efficient algorithm.
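For readers unfamiliar with the Boyer-Moore-Horspool algorithm named in the results, a minimal sketch follows. It is a textbook-style version written for illustration, not the source code presented in the paper, and the function name `horspool_search` is an assumption. The only bookkeeping is a small shift table built from the pattern: after each window comparison, the window advances by a distance determined by the text character aligned with the last pattern position.

```python
def horspool_search(text, pattern):
    """Return all start positions of exact occurrences of pattern in text."""
    m, n = len(pattern), len(text)
    if m == 0:
        raise ValueError("pattern must be non-empty")

    # Distance from the last occurrence of each character (excluding the
    # final pattern position) to the end of the pattern.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}

    hits = []
    pos = 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:
            hits.append(pos)
        # Characters absent from the table allow a full jump of m positions.
        pos += shift.get(text[pos + m - 1], m)
    return hits
```

For instance, `horspool_search("hello world", "o w")` finds the single occurrence at position 4. Larger alphabets and longer patterns tend to produce longer jumps, which is one plausible reason the algorithm behaves well on natural-language text.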

    String Matching Problems with Parallel Approaches An Evaluation for the Most Recent Studies

    In recent years, string matching has played a functional role in many applications, such as information retrieval, gene analysis, pattern recognition, linguistics, and bioinformatics. To understand the functional requirements of string matching algorithms, we surveyed real-time parallel string matching patterns that handle current trends. In this paper, we focus primarily on recent developments in parallel string matching, the central ideas of the algorithms, and their complexities. We present the performance of the different algorithms and their effectiveness. Finally, this analysis helps researchers develop better techniques.

    A Compact Index for Order-Preserving Pattern Matching

    Order-preserving pattern matching was introduced recently but it has already attracted much attention. Given a reference sequence and a pattern, we want to locate all substrings of the reference sequence whose elements have the same relative order as the pattern elements. For this problem we consider the offline version in which we build an index for the reference sequence so that subsequent searches can be completed very efficiently. We propose a space-efficient index that works well in practice despite its lack of good worst-case time bounds. Our solution is based on the new approach of decomposing the indexed sequence into an order component, containing ordering information, and a delta component, containing information on the absolute values. Experiments show that this approach is viable, faster than the available alternatives, and it is the first one offering simultaneously small space usage and fast retrieval. Comment: 16 pages. A preliminary version appeared in the Proc. IEEE Data Compression Conference, DCC 2017, Snowbird, UT, USA, 201
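The problem definition in this abstract can be made concrete with a naive online matcher. This sketch only illustrates what "same relative order" means; it is not the compact index the paper proposes, and the names `order_signature` and `order_preserving_matches` are assumptions. Each window is reduced to the ranks of its elements and compared with the ranks of the pattern.

```python
def order_signature(values):
    """Replace each element by its rank among the distinct values."""
    ranks = {v: r for r, v in enumerate(sorted(set(values)))}
    return tuple(ranks[v] for v in values)


def order_preserving_matches(sequence, pattern):
    """Start positions of windows with the same relative order as pattern."""
    m = len(pattern)
    target = order_signature(pattern)
    return [
        i
        for i in range(len(sequence) - m + 1)
        if order_signature(sequence[i:i + m]) == target
    ]
```

With the reference sequence `[10, 22, 15, 30, 20, 18, 27]` and the pattern `[1, 3, 2]` (low, high, middle), the windows starting at positions 0 and 2 match. Recomputing ranks for every window costs O(m log m) per position, which is the kind of per-query work that an offline index is meant to avoid.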