471 research outputs found

    String pattern matching algorithms: An empirical analysis


    The Boyer-Moore-Galil String Searching Strategies Revisited


    Fast Exact String Pattern-matching Algorithms Adapted to the Characteristics of the Medical Language

    Objective: The authors consider the problem of exact string pattern matching using algorithms that do not require any preprocessing. To choose the most appropriate algorithm, distinctive features of the medical language must be taken into account. The characteristics of medical language are emphasized in this regard, the best algorithm of those reviewed is proposed, and detailed evaluations of time complexity for processing medical texts are provided. Design: The authors first illustrate and discuss the techniques of various string pattern-matching algorithms. Next, the source code and the behavior of representative exact string pattern-matching algorithms are presented in a comprehensive manner to promote their implementation. Detailed explanations of the use of various techniques to improve performance are given. Measurements: Real-time measures of time complexity with English medical texts are presented. They lead to results distinct from those found in the computer science literature, which are typically computed with normally distributed texts. Results: The Boyer-Moore-Horspool algorithm achieves the best overall results when used with medical texts. This algorithm usually performs at least twice as fast as the other algorithms tested. Conclusion: The time performance of exact string pattern matching can be greatly improved if an efficient algorithm is used. Considering the growing amount of text handled in the electronic patient record, it is worth implementing this efficient algorithm.
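
    The Boyer-Moore-Horspool algorithm named in this abstract can be illustrated with a minimal sketch. The function name `horspool_search` and the dictionary-based shift table are assumptions made for illustration; this is not the source code presented in the paper.

```python
def horspool_search(text: str, pattern: str) -> list[int]:
    """Return the start index of every occurrence of pattern in text."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    # Shift table: for each character of the pattern except the last one,
    # the distance from its rightmost occurrence to the pattern's end.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}
    matches = []
    pos = 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:
            matches.append(pos)
        # Shift according to the text character aligned with the last
        # pattern position; characters not in the table allow a full
        # shift of m positions.
        pos += shift.get(text[pos + m - 1], m)
    return matches


if __name__ == "__main__":
    print(horspool_search("abracadabra", "abra"))
```

    The shift depends only on the single text character under the pattern's last position, which is what keeps the inner loop simple compared with the full Boyer-Moore rules.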

    Measuring the Propagation of Information in Partial Evaluation

    We present the first measurement-based analysis of the information propagated by a partial evaluator. Our analysis is based on measuring implementations of string-matching algorithms, based on the observation that the sequence of character comparisons accurately reflects maintained information. Notably, we can easily prove matchers to be different and we show that they display more variety and finesse than previously believed. As a consequence, we are able to pinpoint differences and inaccuracies in many results previously considered equivalent. Our analysis includes a framework that lets us obtain string matchers - notably the family of Boyer-Moore algorithms - in a systematic formalism-independent way from a few information-propagation primitives. By leveraging the existing research in string matching, we show that the landscape of information propagation is non-trivial in the sense that small changes in information propagation may dramatically change the properties of the resulting string matchers. We thus expect that this work will prove useful as a test and feedback mechanism for information propagation in the development of advanced program transformations, such as GPC or Supercompilation

    Generate fuzzy string-matching to build self attention on Indonesian medical-chatbot

    A chatbot is a form of interactive conversation that requires quick and precise answers. The process of identifying answers to users’ questions involves string matching and handling incorrect spelling. Therefore, a system that can independently predict and correct letters is highly necessary. The approach used to address this issue is to enhance the fuzzy string-matching method by incorporating several features for self-attention. The combinations of fuzzy string-matching methods employed are Jaro-Winkler distance + Damerau-Levenshtein distance and Damerau-Levenshtein + Rabin-Karp. These combinations are used because they can not only match strings but also correct typing errors in words. This research contributes by developing a self-attention mechanism through a modified fuzzy string-matching model with enhanced word feature structures. The goal is to utilize this self-attention mechanism in constructing the Indonesian medical bidirectional encoder representations from transformers (IM-BERT). This will serve as a foundation for additional features to provide accurate answers in the Indonesian medical question and answer system, achieving an exact match of 85.7% and an F1-score of 87.6%.
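
    As a rough illustration of how an edit distance can correct typing errors, the sketch below computes the optimal string alignment variant of the Damerau-Levenshtein distance (adjacent transpositions count as one edit) and picks the closest word from a vocabulary. The names `osa_distance` and `closest_match` and the sample vocabulary are hypothetical; the paper's combined model and its self-attention features are not reproduced here.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance (restricted Damerau-Levenshtein)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Adjacent transposition, e.g. "am" <-> "ma".
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def closest_match(word: str, vocabulary: list[str]) -> str:
    """Return the vocabulary entry with the smallest distance to word."""
    return min(vocabulary, key=lambda candidate: osa_distance(word, candidate))


if __name__ == "__main__":
    # Hypothetical vocabulary of Indonesian symptom words.
    print(closest_match("deamm", ["demam", "batuk", "pusing"]))
```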

    Parallel String Matching with Multi Core Processors-A Comparative Study for Gene Sequences

    The volume of stored data is growing rapidly because more information must be retained. Extracting specific data from such large databases is a very difficult task in areas including text processing, information retrieval, text mining, pattern recognition and DNA sequencing. Concurrent, high-performance computing models are therefore needed to extract the data, which poses a challenge to researchers. One solution is to run parallel string-matching algorithms on these computing models. In this work we implemented parallel string matching using Java multithreading on multicore processors and performed a comparative study of the Knuth-Morris-Pratt, Boyer-Moore and brute-force string-matching algorithms. To test the system we used a gene sequence consisting of lakhs of records. The test results show that multicore processing performs better than the lower versions, and that the proposed parallel string matching with multicore processing performs better than the sequential approaches.
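
    The idea of splitting a sequence across workers can be sketched as follows. This is an illustration under stated assumptions, not the paper's Java implementation: the names `find_all` and `parallel_find` are hypothetical, the per-worker scan is brute force, and in CPython threads mainly show the partitioning scheme rather than a real multicore speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def find_all(text: str, pattern: str, start: int, end: int) -> list[int]:
    """Brute-force scan over the start positions in [start, end)."""
    last_start = len(text) - len(pattern) + 1
    return [i for i in range(start, min(end, last_start))
            if text.startswith(pattern, i)]


def parallel_find(text: str, pattern: str, workers: int = 4) -> list[int]:
    """Split the start positions into contiguous ranges, one per task."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    chunk = -(-n // workers)  # ceiling division
    ranges = [(s, min(s + chunk, n)) for s in range(0, n, chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each task owns a range of start positions but may read past the
        # end of its range, so matches that straddle a boundary are found
        # exactly once.
        parts = pool.map(lambda r: find_all(text, pattern, *r), ranges)
    return [i for part in parts for i in part]


if __name__ == "__main__":
    print(parallel_find("ACGTACGTACGT", "GTAC", workers=3))
```

    Assigning each worker a range of start positions, rather than an isolated substring, avoids having to overlap chunks by the pattern length minus one.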