Exact string matching algorithms for searching DNA and protein sequences and searching chemical databases
The enormous quantities of biological and chemical files and databases are likely to keep growing year on year, creating a need for string matching algorithms that minimize search response time. To address this need, this thesis develops string matching algorithms for searching biological sequences and chemical structures, based on a detailed study of exact string matching algorithms. As a result, this research developed a new classification of string matching algorithms into eight categories according to their pre-processing function, and proposed five new string matching algorithms: BRBMH, BRQS, the Odd and Even algorithm (OE), the Random String Matching Algorithm (RSMA) and the Skip Shift New algorithm (SSN).
The main purpose of the proposed algorithms is to reduce the search response time and the total number of comparisons. They were evaluated against four well-known standard algorithms: Boyer-Moore-Horspool (BMH), Quick Search (QS), TVSBS and BRFS.
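For readers unfamiliar with the two non-hybrid baselines, the sketch below shows textbook versions of Boyer-Moore-Horspool and Quick Search (Sunday's algorithm) in Python. It is not the thesis's implementation, the proposed hybrids (BRBMH, BRQS, OE, RSMA, SSN) are not reproduced, and the function names are hypothetical. The two differ only in which text character drives the shift: BMH uses the last character of the current window, while QS uses the character immediately after it, which allows a maximum shift of m + 1.

```python
from __future__ import annotations


def horspool_search(text: str, pattern: str) -> list[int]:
    """Boyer-Moore-Horspool: shift on the text character under the pattern's last position."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    # Bad-character table built from pattern[0..m-2]; characters not in it shift by m.
    shift = {}
    for j in range(m - 1):
        shift[pattern[j]] = m - 1 - j
    matches, i = [], 0
    while i <= n - m:
        if text[i:i + m] == pattern:          # compare the current window
            matches.append(i)
        i += shift.get(text[i + m - 1], m)
    return matches


def quick_search(text: str, pattern: str) -> list[int]:
    """Sunday's Quick Search: shift on the text character just past the window."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    # Later occurrences overwrite earlier ones, so each character maps to m - (last index).
    shift = {c: m - j for j, c in enumerate(pattern)}
    matches, i = [], 0
    while i <= n - m:
        if text[i:i + m] == pattern:
            matches.append(i)
        if i + m >= n:                        # no character after the window: stop
            break
        i += shift.get(text[i + m], m + 1)    # unseen characters allow a shift of m + 1
    return matches
```

For example, both `horspool_search("GCATCGCAGAGAGTATACAGTACG", "GCAGAGAG")` and `quick_search` on the same arguments return `[5]`.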
This research applied all of the algorithms to sample data files using three types of tests. The number-of-comparisons tests showed a substantial difference between the number of comparisons used by our algorithms and by non-hybrid algorithms such as QS and BMH, and a considerable difference relative to other hybrid algorithms such as TVSBS and BRFS. The average elapsed search time tests showed that our algorithms achieved a better average elapsed search time than the BRFS, TVSBS, QS and BMH algorithms, while the average-number-of-attempts tests showed that they required fewer attempts than BMH, QS, TVSBS and BRFS. A further contribution of this research is a chemical structure searching toolkit, built on the fastest proposed algorithm (SSN), for searching chemical structures in our local database. The new algorithms were also parallelized using the OpenMP and MPI parallel models and tested on a Stealth Cluster at the University of Science Malaysia (USM) with different numbers of threads and processors to speed up pattern searching in the given text, which we believe is another contribution.
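The OpenMP and MPI versions are not described in detail here. A common way to parallelize exact matching, which the sketch below assumes, is to split the text into contiguous chunks, one per thread or process, and let each worker read m - 1 characters past its chunk so that occurrences straddling a boundary are found exactly once. Python's `ThreadPoolExecutor` stands in for an OpenMP parallel loop; `search_chunk` and `parallel_search` are hypothetical names, and `str.find` replaces whichever matcher (for example SSN) each worker would actually run.

```python
from concurrent.futures import ThreadPoolExecutor


def search_chunk(text, pattern, start, end):
    """Return occurrences of pattern whose start index lies in [start, end)."""
    m = len(pattern)
    # Read m - 1 characters past the chunk end so boundary-crossing matches are
    # found; any hit inside this segment necessarily starts before `end`, so no
    # occurrence is reported by two workers.
    segment = text[start:end + m - 1]
    hits = []
    i = segment.find(pattern)
    while i != -1:
        hits.append(start + i)
        i = segment.find(pattern, i + 1)
    return hits


def parallel_search(text, pattern, workers=4):
    """Split the text into `workers` chunks and search them concurrently."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    chunk = -(-n // workers)                  # ceiling division
    bounds = [(s, min(s + chunk, n)) for s in range(0, n, chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_chunk = pool.map(lambda b: search_chunk(text, pattern, *b), bounds)
        # pool.map preserves input order, so concatenation yields sorted positions.
        return [pos for hits in per_chunk for pos in hits]
```

Because of CPython's global interpreter lock, the threads here illustrate the decomposition rather than deliver a CPU speedup; a real implementation would use OpenMP threads in C, MPI ranks, or separate processes.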
Parallel Processing Outcomes of E-Abdulrazzaq Algorithm Using Multi-Core Technique
The string matching problem is one of the substantial problems in computer science fields such as speech and pattern recognition, signal and image processing, and artificial intelligence (AI). Increasing performance speedup is an important factor in keeping pace with the growth rate of databases; consequently, one way to address this issue is to parallelize exact string matching algorithms. In this study, the E-Abdulrazzaq string matching algorithm is executed in a multi-core environment using the OpenMP paradigm in order to decrease the execution time and increase the speedup of the algorithm. The parallelized algorithm achieved positive results in parallel execution time and excellent speedup compared with the sequential result. The Protein database showed optimal results in parallel execution time with both short and long pattern lengths, and the DNA database showed optimal speedup with both short and long pattern lengths, while no particular database consistently obtained the worst results.
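The speedup results above are reported without formulas. The conventional definitions, assumed here rather than taken from the study, compare the sequential execution time T_1 with the parallel execution time T_p on p threads, giving the speedup S_p and the parallel efficiency E_p:

```latex
S_p = \frac{T_1}{T_p}, \qquad E_p = \frac{S_p}{p}
```

For instance, with purely hypothetical timings of T_1 = 8 s and T_4 = 2.5 s on four threads, S_4 = 3.2 and E_4 = 0.8, that is, 80% parallel efficiency.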
Wavefront Longest Common Subsequence Algorithm on Multicore and GPGPU Platform
String comparison is a central operation in numerous applications. It plays a critical role in many tasks such as data mining, spelling error correction and molecular biology (Tan et al., 2007; Michailidis and Margaritis, 2000).
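The abstract does not spell out the wavefront scheme, but the standard idea behind wavefront LCS is that cells of the dynamic-programming table lying on the same anti-diagonal do not depend on one another. The sequential Python sketch below (the function name is hypothetical) fills the table diagonal by diagonal to make that dependency structure explicit; it is not the authors' multicore or GPGPU code.

```python
def lcs_length_wavefront(a, b):
    """Length of the longest common subsequence, filling the DP table by anti-diagonals."""
    n, m = len(a), len(b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    # Cell (i, j) depends on (i-1, j-1), (i-1, j) and (i, j-1), which lie on
    # anti-diagonals d-2 and d-1, where d = i + j. Cells on the same anti-diagonal
    # are therefore independent: the inner loop is the part a multicore or GPGPU
    # implementation would run in parallel, with a barrier between diagonals.
    for d in range(2, n + m + 1):
        for i in range(max(1, d - m), min(n, d - 1) + 1):
            j = d - i
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[n][m]
```

For example, `lcs_length_wavefront("ABCBDAB", "BDCABA")` returns 4, the same value a row-by-row fill produces.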
Revisiting Multiple Pattern Matching
We consider the classical exact multiple string matching problem. The proposed solution is based on a combination of a few ideas: using q-grams instead of single characters, pattern superimposition, bit-parallelism and alphabet size reduction. We discuss the pros and cons of various alternatives for achieving the best possible combination of techniques. The main contribution of this paper is a set of alphabet mapping methods that reduce memory requirements and allow larger q-grams to be used. The experimental results show that the presented algorithm is competitive in most practical cases. One of the tests also shows that tailoring our scheme to search over a byte-encoded text yields speedups compared with searching over plain text.
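Of the ideas listed above, bit-parallelism is the easiest to show in isolation. The sketch below is a plain multiple-pattern Shift-And in Python: all patterns are packed side by side into one bit vector, so a single shift-OR-AND step per text character advances every pattern at once. It does not include the paper's q-grams, pattern superimposition or alphabet mapping, and `multi_shift_and` is a hypothetical name. Python's arbitrary-precision integers stand in for machine words, which in practice cap the total pattern length per vector.

```python
def multi_shift_and(text, patterns):
    """Report (start, pattern) for every occurrence, using one shared Shift-And bit vector."""
    masks = {}        # character -> bits of every pattern position holding that character
    init = 0          # bit at the first position of each pattern
    finals = []       # (bit at the last position of a pattern, that pattern)
    offset = 0
    for p in patterns:
        if not p:
            raise ValueError("patterns must be non-empty")
        for j, c in enumerate(p):
            masks[c] = masks.get(c, 0) | (1 << (offset + j))
        init |= 1 << offset
        finals.append((1 << (offset + len(p) - 1), p))
        offset += len(p)
    final_mask = 0
    for bit, _ in finals:
        final_mask |= bit

    state, matches = 0, []
    for i, c in enumerate(text):
        # Extend every active prefix by one character and (re)start every pattern.
        # A bit shifted from one pattern's end into the next pattern's start is
        # harmless, because `init` sets that start bit on every step anyway.
        state = ((state << 1) | init) & masks.get(c, 0)
        if state & final_mask:
            for bit, p in finals:
                if state & bit:
                    matches.append((i - len(p) + 1, p))
    return matches
```

For example, `multi_shift_and("ACGTACGA", ["ACG", "GTA", "CGA"])` reports `ACG` at 0 and 4, `GTA` at 2 and `CGA` at 5, ordered by the position where each match ends.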
GPU-based odd and even hybrid string matching algorithm
String matching is considered one of the fundamental problems in computer science. Many computer applications provide a string matching utility for their users, and how fast one or more occurrences of a given pattern can be found in a text plays a prominent role in user satisfaction. Although numerous algorithms and methods are available to solve the string matching problem, the remarkable increase in the amount of data produced and stored by modern computational devices requires researchers to find much more efficient ways of dealing with this issue. In this research, the Odd and Even (OE) hybrid string matching algorithm is redesigned to be executed on the Graphics Processing Unit (GPU), which can be used to relieve the Central Processing Unit (CPU) of compute-intensive operations. In fact, the capabilities of the GPU as a massively parallel processor are employed to enhance the performance of existing hybrid string matching algorithms. Different types of data are used to evaluate the impact of parallelizing and implementing both algorithms on the GPU. Experimental results indicate that the performance of the hybrid string matching algorithms has improved, and that the speedup obtained is considerable enough to suggest the GPU as a suitable platform for these hybrid string matching algorithms.
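The abstract does not describe how OE is mapped onto the GPU. One common data-parallel mapping for exact matching, assumed here purely for illustration, assigns each candidate alignment to its own thread: thread `tid` checks whether the pattern occurs at text position `tid` and writes a flag, and a compaction step turns the flags into positions. The Python sketch below emulates that launch sequentially; `match_kernel` and `gpu_style_search` are hypothetical names. Since each thread writes only its own flag, such a kernel needs no synchronization between threads.

```python
def match_kernel(tid, text, pattern, flags):
    """Work done by one hypothetical GPU thread: test the single alignment `tid`."""
    m = len(pattern)
    if tid + m <= len(text) and text[tid:tid + m] == pattern:
        flags[tid] = 1


def gpu_style_search(text, pattern):
    """Emulate a one-thread-per-alignment kernel launch, then compact the flags."""
    if not pattern:
        return []
    flags = [0] * len(text)            # stands in for a device-side output array
    for tid in range(len(text)):       # on a GPU these threads would run concurrently
        match_kernel(tid, text, pattern, flags)
    # Stream compaction: turn the 0/1 flag array into a list of match positions.
    return [i for i, f in enumerate(flags) if f]
```

Hybrid algorithms such as OE aim to skip alignments rather than test them all, so a GPU redesign has to balance that skipping against keeping thousands of threads busy; this sketch shows only the brute-force end of that trade-off.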
Cosmic String Radiation with Adaptive Mesh Refinement
Cosmic strings are a fundamental feature of many physically-motivated field theories which inevitably form in the early Universe, as a result of a symmetry breaking phase transition. An important example is the Peccei-Quinn mechanism, from which strings emerge as a potential source of dark matter axions. They are also a strong source of gravitational wave (GW) emission, with the potential for detection by the Laser Interferometer Gravitational-Wave Observatory (LIGO) and other GW experiments.
The nonlinear evolution of cosmic strings has been extensively studied using large-scale numerical simulations. However, the vast difference in scale between a typical string width and the string curvature poses a significant computational challenge. This is usually addressed by approximating the string to have either zero or fixed comoving width, resulting in inconsistencies in predictions between different methods. One technique that can address this issue is adaptive mesh refinement (AMR), which allows the resolution of the numerical grid to adapt to the scale of the features of interest in the simulation. This thesis uses GRChombo, a sophisticated code originally designed for numerical relativity, to perform the first AMR simulations of global cosmic strings. We also present our numerical contributions to GRChombo as a core developer, including novel diagnostic tools and performance enhancement.
We perform a detailed quantitative investigation of single sinusoidally displaced string configurations, comparing oscillating string trajectories with a backreaction model accounting for radiation energy losses. We conclude that analytic radiation modelling in the thin-string (Nambu-Goto) limit provides the appropriate picture for cosmological evolution. We also investigate the resulting massless (Goldstone boson or axion) and massive (Higgs) radiation signals, using quantitative diagnostic tools to determine their eigenmode decomposition. We find that the massless quadrupole is dominant and massive radiation is strongly suppressed with increasing mass, with a complex wavepacket structure that is sensitive to numerical resolution. String network configurations are also simulated, with advanced visualisation of radiation used to reveal new qualitative phenomena as strings reconnect and small loops decay. The thesis concludes with the cosmological implications of this work, considering dark matter axions radiated by cosmic strings and the outlook for gravitational wave signatures.