Search CORE

85,332 research outputs found

Efficient comparison based string matching

Author: Breslauer D. (Dany)
Galil Z.
Publication venue: Cambridge University Press (CUP)
Publication date: 01/01/1993

A novel method for comparing topological models of protein structures enhanced with ligand information

Author: Altschul
Barton
Berman
Bourne
Bradley
Bray
Brazma
Brenner
Chalk
Chandonia
David Gilbert
Doolittle
Gilbert
Gromiha
Harrison
Higgins
Holm
Koch
Madej
Mallika
Mallika Veeramalai
Michalopoulos
Mizuguchi
Nagano
Nobeli
Orengo
Russell
Sowdhamini
Sternberg
Torrance
Veeramalai
Viksna
von Grotthuss
Westhead
Xue
Ye
Publication venue: Oxford University Press (OUP)
Publication date: 07/10/2008

This article is available open access through the publisher's website. Copyright © 2008 The Authors. We introduce TOPS+ strings, a highly abstract string-based model of protein topology that permits efficient structure comparison and can optionally represent ligand information. In this model, we treat loops as secondary structure elements (SSEs) alongside helices and strands; in addition, we represent ligands as first-class objects. Interactions between SSEs, and between SSEs and ligands, are described by incoming/outgoing arcs and ligand arcs, respectively, and SSEs are annotated with arc interaction direction and type. We are able to abstract away from the ligands themselves to give a model characterized by a regular grammar rather than the context-sensitive grammar of the original TOPS model. Our TOPS+ strings model is sufficiently descriptive to obtain biologically meaningful results, and it has the advantage of permitting fast string-based structure matching and comparison while avoiding the Non-deterministic Polynomial time (NP)-completeness issues associated with graph problems. Our structure comparison method is computationally more efficient in identifying distantly related proteins than BLAST, CLUSTALW, SSAP and TOPS because of the compact and abstract string-based representation of protein structure, which records both topological and biochemical information, including the functionally important loop regions of the protein structures. The accuracy of our comparison method is comparable with that of TOPS. We have also demonstrated that our TOPS+ strings method outperforms the TOPS method for ligand-dependent protein structures and provides biologically meaningful results. Availability: The TOPS+ strings comparison server is available from http://balabio.dcs.gla.ac.uk/mallika/WebTOPS/topsplus.html. University of Glasgow
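The abstract's central idea is that once every SSE is written as a symbol, structure comparison reduces to string comparison. A minimal sketch of that idea follows, assuming a hypothetical token encoding (not the actual TOPS+ alphabet or grammar) and plain unit-cost edit distance in place of the authors' comparison algorithm:

```python
from typing import Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Unit-cost Levenshtein distance between two token sequences."""
    prev = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        curr = [i] + [0] * len(b)
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # drop token a[i-1]
                curr[j - 1] + 1,     # insert token b[j-1]
                prev[j - 1] + cost,  # keep or substitute
            )
        prev = curr
    return prev[len(b)]


def similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Normalise the distance to [0, 1]; 1.0 means the two strings are identical."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


# Hypothetical SSE tokens: type (E = strand, H = helix, L = loop),
# "+"/"-" for incoming/outgoing arcs, "*" for a ligand arc.
protein_a = ["L", "E+", "L", "H-", "L*", "E-"]
protein_b = ["L", "E+", "L", "H-", "L", "E-"]
print(edit_distance(protein_a, protein_b))         # 1
print(round(similarity(protein_a, protein_b), 3))  # 0.833
```

Because the strings are short and linear, each comparison costs O(|a|·|b|) time, which illustrates why a string model sidesteps the NP-complete subgraph problems mentioned in the abstract.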

Crossref

Brunel University Research Archive

Generic Subsequence Matching Framework: Modularity, Flexibility, Efficiency

Author: A. Guttman
C. Faloutsos
C.S. Perng
D. Novak
E. Keogh
F. Korn
H. Sakoe
J. Shieh
M. Batko
Publication date: 01/01/2012

Subsequence matching has emerged as an ideal approach for solving many problems in data mining and similarity retrieval. It has been shown that almost any data class (audio, image, biometrics, signals) is or can be represented by some kind of time series or string of symbols, which can serve as input to various subsequence matching approaches. The variety of data types, specific tasks and their partial or full solutions is so wide that choosing, implementing and parametrizing a suitable solution for a given task can be complicated and time-consuming; a potentially fruitful combination of fragments from different research areas may be neither obvious nor easy to realize. Leading authors in this field also point to the implementation bias that makes a proper comparison of competing approaches difficult. We therefore present a new generic Subsequence Matching Framework (SMF) that tries to overcome these problems by providing a uniform frame that simplifies and speeds up the design, development and evaluation of subsequence matching systems. We identify several relatively separate subtasks that are solved differently across the literature, and SMF makes it straightforward to combine them, achieving new quality and efficiency. The framework can be used in many application domains, and its components can be reused effectively. Its strictly modular architecture and openness also allow efficient solutions from other fields, for instance efficient metric-based indexes, to be incorporated. This is an extended version of a paper published at DEXA 2012.
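To make the "separate subtasks" point concrete, here is a minimal sketch in which windowing, the distance function and the range filter are independent, swappable pieces. All names are hypothetical and this is not SMF's API; it is a naive sequential scan with no index:

```python
import math
from typing import Callable, Iterator, List, Sequence, Tuple

Distance = Callable[[Sequence[float], Sequence[float]], float]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


def sliding_windows(series: Sequence[float], width: int) -> Iterator[Tuple[int, Sequence[float]]]:
    """Segmentation subtask: every contiguous window of the query's length."""
    for start in range(len(series) - width + 1):
        yield start, series[start:start + width]


def subsequence_search(series, query, radius, distance: Distance = euclidean) -> List[Tuple[int, float]]:
    """Range query: (offset, distance) of every window within `radius` of the query."""
    hits = []
    for start, window in sliding_windows(series, len(query)):
        d = distance(window, query)
        if d <= radius:
            hits.append((start, d))
    return hits


series = [0, 1, 2, 3, 2, 1, 0, 1, 2, 3]
query = [1, 2, 3]
print(subsequence_search(series, query, radius=0.5))  # [(1, 0.0), (7, 0.0)]
print(subsequence_search(series, query, radius=0.5, distance=manhattan))
```

Swapping `distance` (or replacing the linear scan with a metric index) changes one component without touching the others, which is the kind of recombination the abstract describes.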

arXiv.org e-Print Archive

Crossref

Univerzitní repozitář Masarykovy univerzity

Comparison of Knuth Morris Pratt and Boyer Moore algorithms for a web-based dictionary of computer terms

Author: Khumaidi Ali
Putro Harjono Padmono
Ronisah Yusuf Aras
Publication venue: Universitas Ahmad Dahlan, Kampus 3
Publication date: 01/01/2020

Computer science students need a dictionary of computer terms to deepen their understanding of lecture material. In developing such a dictionary application, the fastest and most memory-efficient algorithm should be chosen. The algorithms compared are Knuth Morris Pratt (KMP) and Boyer Moore (BM). According to previous research, the KMP algorithm performs better than other string matching algorithms; however, other studies have concluded that the BM algorithm performs better. In addition, the Zhu-Takaoka algorithm is more efficient than the KMP algorithm in dictionary development, and the BM algorithm follows the same search concept as the Zhu-Takaoka algorithm. In this study, the fastest and most efficient algorithm is determined using the Exponential Comparison Method (ECM), with criteria for search time and for memory use during the search. For search time, the BM algorithm scores 37.9% and the KMP algorithm 62.1%. For search memory, the KMP algorithm scores 50.6% and the BM algorithm 49.4%. The total ECM score shows that the BM algorithm is 0.55% better than the KMP algorithm.
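For readers comparing the two approaches, a short sketch of each search strategy follows. KMP scans left to right and never moves backwards in the text, using a failure table built from the pattern. For the Boyer Moore side, the sketch uses the Boyer-Moore-Horspool simplification (bad-character shift only, no good-suffix rule), so it is not the full BM algorithm evaluated in the study:

```python
from typing import List


def kmp_search(text: str, pattern: str) -> List[int]:
    """Knuth-Morris-Pratt: start indices of all occurrences of pattern in text."""
    if not pattern:
        return list(range(len(text) + 1))
    # failure[i] = length of the longest proper border of pattern[:i + 1]
    failure = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = failure[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        failure[i] = k
    matches = []
    k = 0
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = failure[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            k = failure[k - 1]  # continue, allowing overlapping matches
    return matches


def horspool_search(text: str, pattern: str) -> List[int]:
    """Boyer-Moore-Horspool: compare right to left, shift on the window's last character."""
    m, n = len(pattern), len(text)
    if m == 0:
        return list(range(n + 1))
    # Shift = distance from a character's last occurrence (excluding the final
    # pattern position) to the end of the pattern; unseen characters shift by m.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}
    matches = []
    pos = 0
    while pos <= n - m:
        j = m - 1
        while j >= 0 and text[pos + j] == pattern[j]:
            j -= 1
        if j < 0:
            matches.append(pos)
        pos += shift.get(text[pos + m - 1], m)
    return matches


text = "computer compiler compute"
print(kmp_search(text, "comp"))       # [0, 9, 18]
print(horspool_search(text, "comp"))  # [0, 9, 18]
```

KMP guarantees O(n + m) time, while the Horspool-style skip can jump up to m characters at once on typical text, which is one reason BM-family algorithms often win on search time in practice.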

Journal of Education and Learning (EduLearn)

UAD Journal Management System

Dynamic Thresholding Mechanisms for IR-Based Filtering in Efficient Source Code Plagiarism Detection

Author: Karnalim Oscar
Sulistiani Lisan
Publication date: 28/10/2018

To address time inefficiency in string-matching-based source code plagiarism detection, only potential pairs are compared; potentiality is defined through a fast but order-insensitive similarity measurement (adapted from Information Retrieval), and only pairs whose similarity degrees are greater than or equal to a particular threshold are selected. Defining such a threshold is not trivial, since it should yield a high efficiency improvement and a low effectiveness reduction (if some reduction is unavoidable). This paper proposes two thresholding mechanisms, namely a range-based and a pair-count-based mechanism, that dynamically tune the threshold based on the distribution of the resulting similarity degrees. According to our evaluation, both mechanisms are more practical than manual threshold assignment, since their efficiency improvement and effectiveness reduction are more proportional. Comment: The 2018 International Conference on Advanced Computer Science and Information Systems (ICACSIS)
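The filtering pipeline can be sketched as follows: score every pair with an order-insensitive measure (here cosine similarity over token counts, one common IR choice), derive a threshold from the score distribution, and pass only the surviving pairs to the slower string matcher. The two threshold rules below are hypothetical readings of "range-based" and "pair-count-based" for illustration only; the paper's exact formulas may differ:

```python
import math
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple

Score = Tuple[str, str, float]


def cosine(a: Counter, b: Counter) -> float:
    """Order-insensitive similarity between two bags of tokens."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def pair_scores(programs: Dict[str, List[str]]) -> List[Score]:
    bags = {name: Counter(tokens) for name, tokens in programs.items()}
    return [(x, y, cosine(bags[x], bags[y])) for x, y in combinations(sorted(bags), 2)]


def range_threshold(scores: List[Score], fraction: float) -> float:
    """Hypothetical range-based rule: a point `fraction` of the way from min to max."""
    values = [s for _, _, s in scores]
    lo, hi = min(values), max(values)
    return lo + fraction * (hi - lo)


def pair_count_threshold(scores: List[Score], keep: int) -> float:
    """Hypothetical pair-count rule: the score of the keep-th most similar pair."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    values = sorted((s for _, _, s in scores), reverse=True)
    return values[min(keep, len(values)) - 1]


def candidates(scores: List[Score], threshold: float) -> List[Tuple[str, str]]:
    """Only these pairs go on to the slower, order-sensitive string matching."""
    return [(x, y) for x, y, s in scores if s >= threshold]


programs = {
    "a": "int i = 0 ; i ++".split(),
    "b": "int j = 0 ; j ++".split(),
    "c": "print ( x )".split(),
}
scores = pair_scores(programs)
print(candidates(scores, range_threshold(scores, 0.5)))     # [('a', 'b')]
print(candidates(scores, pair_count_threshold(scores, 1)))  # [('a', 'b')]
```

Both rules adapt to the observed scores instead of using a fixed cut-off, which is the property the abstract credits for their practicality.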

arXiv.org e-Print Archive

Crossref

Feedback Generation for Performance Problems in Introductory Programming Assignments

Author: Gulwani Sumit
Radiček Ivan
Zuleger Florian
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2014

Providing feedback on programming assignments manually is a tedious, error-prone and time-consuming task. In this paper, we motivate and address the problem of generating feedback on performance aspects in introductory programming assignments. We studied a large number of functionally correct student solutions to introductory programming assignments and observed: (1) There are different algorithmic strategies, with varying levels of efficiency, for solving a given problem. These different strategies merit different feedback. (2) The same algorithmic strategy can be implemented in countless different ways, which are not relevant for reporting feedback on the student program. We propose a lightweight programming language extension that allows a teacher to define an algorithmic strategy by specifying certain key values that should occur during the execution of an implementation. We describe a dynamic-analysis-based approach to test whether a student's program matches a teacher's specification. Our experimental results illustrate the effectiveness of both our specification language and our dynamic analysis. On one of our benchmarks, consisting of 2316 functionally correct implementations of 3 programming problems, we identified 16 strategies that we were able to describe using our specification language (in 95 minutes, after inspecting 66 implementations, i.e., around 3%). Our dynamic analysis correctly matched each implementation with its corresponding specification, thereby automatically producing the intended feedback. Comment: Tech report/extended version of FSE 2014 paper
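The key-value idea can be illustrated with a toy version: record values during execution and check whether the teacher's key values appear in order in that trace. In this sketch the submissions report values through an explicit, hypothetical `record` hook, whereas the paper's dynamic analysis observes executions without such cooperation; the in-order (subsequence) matching rule is likewise an assumption chosen for illustration:

```python
from typing import Callable, List, Sequence


def trace_values(func: Callable[..., object], *args: object) -> List[int]:
    """Run a submission with a `record` hook and return the values it reported."""
    observed: List[int] = []
    func(*args, record=observed.append)
    return observed


def matches_spec(observed: Sequence[int], spec: Sequence[int]) -> bool:
    """True if the teacher's key values appear in order within the observed trace
    (other values may occur in between)."""
    remaining = iter(observed)
    return all(key in remaining for key in spec)


# Three hypothetical submissions computing 1^2 + 2^2 + ... + n^2.
def student_for(n, record):
    total = 0
    for i in range(1, n + 1):
        total += i * i
        record(total)
    return total


def student_while(n, record):
    acc, k = 0, 1
    while k <= n:
        acc = acc + k ** 2
        record(acc)
        k += 1
    return acc


def student_formula(n, record):
    result = n * (n + 1) * (2 * n + 1) // 6
    record(result)
    return result


# Hypothetical teacher specification of the iterative strategy for n = 3.
ITERATIVE_SPEC = [1, 5, 14]

for student in (student_for, student_while, student_formula):
    print(student.__name__, matches_spec(trace_values(student, 3), ITERATIVE_SPEC))
```

The `for` and `while` versions differ syntactically but produce the same key values, so they match one specification; the closed-form version is a different strategy and does not, mirroring observations (1) and (2) in the abstract.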

arXiv.org e-Print Archive

Crossref

Searching by approximate personal-name matching

Author: Camps Pare Rafael
Daude Ventura Jordi
Publication date: 01/01/2003

We discuss the design, construction and evaluation of a method to access information about a person using their name as a search key, even if the name contains distortions. We present a similarity function, the DEA function, based on the probabilities of edit operations according to the letters involved and their position, and using a variable threshold. The efficacy of DEA, evaluated quantitatively without human relevance judgments, is far superior to that of known methods. A very efficient approximate search technique for the DEA function, based on a compacted trie-tree structure, is also presented. Postprint (published version)
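A standard way to run an edit-distance search over a trie is to carry one dynamic-programming row per trie node, so names sharing a prefix share work, and to prune a branch once every entry in its row exceeds the threshold. The sketch below uses an uncompacted trie and a pluggable letter-dependent substitution cost; the `vowel_cost` weights are hypothetical stand-ins, not DEA's probability-based costs:

```python
from typing import Callable, Dict, List, Optional, Tuple

SubCost = Callable[[str, str], float]


def unit_cost(a: str, b: str) -> float:
    return 0.0 if a == b else 1.0


def vowel_cost(a: str, b: str) -> float:
    """Hypothetical letter-dependent cost: vowel-for-vowel swaps are cheaper."""
    if a == b:
        return 0.0
    if a in "aeiou" and b in "aeiou":
        return 0.5
    return 1.0


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.word: Optional[str] = None  # set on the node that ends a stored name


def build_trie(names: List[str]) -> TrieNode:
    root = TrieNode()
    for name in names:
        node = root
        for ch in name:
            node = node.children.setdefault(ch, TrieNode())
        node.word = name
    return root


def search(root: TrieNode, query: str, max_cost: float,
           sub: SubCost = unit_cost) -> List[Tuple[str, float]]:
    """Return (name, distance) for every stored name within max_cost of the query."""
    results: List[Tuple[str, float]] = []

    def walk(node: TrieNode, ch: str, prev_row: List[float]) -> None:
        row = [prev_row[0] + 1.0]
        for j in range(1, len(query) + 1):
            row.append(min(row[j - 1] + 1.0,                          # gap in the name
                           prev_row[j] + 1.0,                         # gap in the query
                           prev_row[j - 1] + sub(query[j - 1], ch)))  # (mis)match
        if node.word is not None and row[-1] <= max_cost:
            results.append((node.word, row[-1]))
        # Costs are non-negative, so no longer name can beat this row's minimum.
        if min(row) <= max_cost:
            for next_ch, child in node.children.items():
                walk(child, next_ch, row)

    first_row = [float(j) for j in range(len(query) + 1)]
    for ch, child in root.children.items():
        walk(child, ch, first_row)
    return sorted(results, key=lambda r: (r[1], r[0]))


trie = build_trie(["garcia", "garces", "marcia", "gracia"])
print(search(trie, "garcia", max_cost=1))                    # [('garcia', 0.0), ('marcia', 1.0)]
print(search(trie, "garcea", max_cost=0.5, sub=vowel_cost))  # [('garcia', 0.5)]
```

A variable threshold, as in the abstract, would simply mean choosing `max_cost` per query (for example, from the query's length) before calling `search`.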

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Average-Case Optimal Approximate Circular String Matching

Author: CS Iliopoulos
E Ukkonen
F Fernandes
GM Landau
K Fredriksson
P-H Hsu
T Hirvola
T Lee
WI Chang
Publication date: 24/02/2015

Approximate string matching is the problem of finding all factors of a text t of length n that are at a distance at most k from a pattern x of length m. Approximate circular string matching is the problem of finding all factors of t that are at a distance at most k from x or from any of its rotations. In this article, we present a new algorithm for approximate circular string matching under the edit distance model with optimal average-case search time O(n(k + log m)/m). Optimal average-case search time can also be achieved by the algorithms for multiple approximate string matching (Fredriksson and Navarro, 2004), using x and its rotations as the set of multiple patterns. Here we reduce the preprocessing time and space requirements compared to that approach.
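The problem definition is easiest to see through a naive baseline: run Sellers' dynamic-programming search (edit distance where a match may start anywhere in t) once for each distinct rotation of x and merge the reported end positions. This is only a reference baseline with O(m²n) worst-case time, not the article's average-case optimal algorithm:

```python
from typing import List, Set


def approx_end_positions(text: str, pattern: str, k: int) -> Set[int]:
    """Sellers' algorithm: end positions j such that some factor of text
    ending at j is within edit distance k of pattern."""
    m = len(pattern)
    prev = list(range(m + 1))  # column for the empty text prefix
    ends: Set[int] = set()
    for j, ch in enumerate(text):
        curr = [0]  # row 0 is 0: an occurrence may start at any text position
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            curr.append(min(prev[i] + 1, curr[i - 1] + 1, prev[i - 1] + cost))
        if curr[m] <= k:
            ends.add(j)
        prev = curr
    return ends


def circular_approx_match(text: str, pattern: str, k: int) -> List[int]:
    """Naive baseline: search once per distinct rotation of the pattern."""
    rotations = {pattern[r:] + pattern[:r] for r in range(len(pattern))}
    ends: Set[int] = set()
    for rot in rotations:
        ends |= approx_end_positions(text, rot, k)
    return sorted(ends)


print(circular_approx_match("xxcabyy", "abc", 0))  # [4]: rotation "cab" ends at index 4
```

The multiple-pattern approach cited in the abstract treats the same rotation set as its pattern set, but filters the text instead of running a full dynamic program per rotation.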

arXiv.org e-Print Archive

CiteSeerX

Crossref

King's Research Portal

Approximate Two-Party Privacy-Preserving String Matching with Linear Complexity

Author: Beck Martin
Kerschbaum Florian
Publication date: 12/02/2013

Consider two parties who want to compare their strings, e.g., genomes, but do not want to reveal them to each other. We present a system for privacy-preserving string matching that differs from existing systems by providing a deterministic approximation instead of an exact distance. It is efficient (linear complexity), non-interactive, and does not involve a third party, which makes it particularly suitable for cloud computing. We extend our protocol so that it mitigates the iterated differential attacks proposed by Goodrich. Furthermore, an implementation of the system is evaluated and compared against current privacy-preserving string matching algorithms. Comment: 6 pages, 4 figures

arXiv.org e-Print Archive

Crossref