5 research outputs found

    On the maximum number of distinct frequent factors in a string of symbols

    Strings of symbols, as a source of information, have always been a resource from which knowledge can be extracted, and the number of applications and real-world cases that use them keeps growing, so advances in this area will have an impact on many disciplines. This communication studies the complexity of the problem of discovering frequent factors (substrings) in strings of symbols of length n, with the added feature that the search can be driven by a minimum support (frequency) k that those factors must reach. It analyzes how this result affects known algorithms for the problem and effectively computes the maximum number of k-frequent factors in a string. It is shown that, although the complexity is in general quadratic in the length n of the string, it is linear in n if the support k is at least √n. Such a support is sufficiently interesting. Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech. This work was partially funded by the I Plan Propio de Investigación y Transferencia de la Universidad de Málaga.
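
To make the notion of a k-frequent factor concrete, here is a minimal brute-force sketch in Python. It is only an illustration of the definition, not the method analyzed in the paper. The function name `k_frequent_factors` is hypothetical, and the sketch assumes that occurrences are counted by starting position, so overlapping occurrences each count.

```python
from collections import Counter


def k_frequent_factors(s: str, k: int) -> set[str]:
    """Return the distinct non-empty factors (substrings) of s
    that occur at least k times.

    Assumption: every start position counts as one occurrence,
    so overlapping occurrences are counted separately.
    """
    # Enumerate every (start, end) pair; each pair is one occurrence.
    counts = Counter(
        s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1)
    )
    return {factor for factor, c in counts.items() if c >= k}


if __name__ == "__main__":
    # For "abab" with k = 2, the 2-frequent factors are "a", "b" and "ab".
    print(sorted(k_frequent_factors("abab", 2)))
```

This enumeration visits a quadratic number of (start, end) pairs, so it is only practical for short strings; it serves to check small cases by hand.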

    A Survey of String Matching Algorithms

    String matching algorithms play an important role among string algorithms: they find the places where one or several strings (patterns) occur in a large body of text (e.g., a data stream, a sentence, a paragraph, a book). Their applications cover a wide range, including intrusion detection systems (IDS) in computer networks, bioinformatics, plagiarism detection, information security, pattern recognition, document matching and text mining. In this paper we present a short survey of well-known, recently updated and hybrid string matching algorithms. These algorithms can be divided into two major categories: exact string matching and approximate string matching. The classification criteria were selected to highlight important features of the matching strategies, in order to identify challenges and vulnerabilities.

    Exact string matching algorithms for searching DNA and protein sequences and searching chemical databases

    The enormous quantities of biological and chemical files and databases are likely to grow year on year, giving rise to the need for string matching algorithms capable of minimizing search response time. With this need in mind, this thesis aims to develop string matching algorithms for searching biological sequences and chemical structures by studying exact string matching algorithms in detail. As a result, this research developed a new classification of string matching algorithms, containing eight categories according to the pre-processing function of the algorithms, and proposed five new string matching algorithms: BRBMH, BRQS, the Odd and Even algorithm (OE), the Random String Matching algorithm (RSMA) and the Skip Shift New algorithm (SSN). The main purpose of the proposed algorithms is to reduce the search response time and the total number of comparisons. They are tested by comparing them with four well-known standard algorithms: Boyer Moore Horspool (BMH), Quick Search (QS), TVSBS and BRFS. This research applied all of the algorithms to sample data files using three types of tests. The number of comparisons tests showed a substantial difference between the number of comparisons our algorithms use and the number used by non-hybrid algorithms such as QS and BMH. In addition, the tests showed a considerable difference between our algorithms and other hybrid algorithms such as TVSBS and BRFS. For instance, the average elapsed search time tests showed that our algorithms achieved a better average elapsed search time than the BRFS, TVSBS, QS and BMH algorithms, while the average number of attempts tests showed fewer attempts than the BMH, QS, TVSBS and BRFS algorithms. A further contribution of this research is the use of the fastest proposed algorithm, SSN, to develop a chemical structure searching toolkit for searching chemical structures in our local database.
    The new algorithms were parallelized using the OpenMP and MPI parallel models and tested at the University of Science Malaysia (USM) on a Stealth Cluster with different numbers of threads and processors, to improve the speed of searching for a pattern in the given text, which, we believe, is another contribution.
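
For readers unfamiliar with the baseline algorithms named above, the following is a minimal Python sketch of the standard Boyer Moore Horspool (BMH) search. It is not one of the thesis's proposed algorithms (BRBMH, BRQS, OE, RSMA, SSN), whose details are not given in the abstract. The function name `horspool_search` and the sample DNA string are illustrative assumptions.

```python
def horspool_search(text: str, pattern: str) -> list[int]:
    """Return the start indices of all occurrences of pattern in text,
    using the Boyer Moore Horspool bad-character shift."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []

    # Pre-processing: for each character of the pattern except the last,
    # store the distance from its rightmost occurrence to the pattern end.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}

    matches = []
    pos = 0
    while pos <= n - m:
        # Compare the current window with the pattern.
        if text[pos:pos + m] == pattern:
            matches.append(pos)
        # Shift by the table entry for the window's last character;
        # characters absent from the table allow a full shift of m.
        pos += shift.get(text[pos + m - 1], m)
    return matches


if __name__ == "__main__":
    # Hypothetical DNA-like sample text.
    print(horspool_search("GATTACAGATTACA", "TTA"))
```

The pre-processing step builds only a single shift table, which is the kind of pre-processing function the thesis uses as a basis for its classification of algorithms.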