Search CORE

22,517 research outputs found

Efficient Online String Matching through Linked Weak Factors

Authors: Faro Simone
Palmer Matthew N.
Scafiti Stefano
Publication date: 24/10/2023

Online string matching is the computational problem of searching for patterns or substrings in a large text dataset, where the pattern and text are processed sequentially, without prior access to the entire text. Its relevance stems from applications in data compression, data mining, text editing, and bioinformatics, where rapid and efficient pattern matching is crucial. Many solutions, employing diverse techniques, have been proposed over the past few decades. Recently, weak recognition approaches have attracted increasing attention. This paper presents Hash Chain, a new algorithm based on a robust weak factor recognition approach that connects adjacent factors through hashing. Despite its O(nm) worst-case time complexity, where n and m are the lengths of the text and the pattern, the algorithm exhibits sublinear behavior in practice and outperforms the most effective existing algorithms.

arXiv.org e-Print Archive
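
The abstract describes weak factor recognition only at a high level. As a rough illustration of the general idea, and explicitly not the authors' Hash Chain algorithm (which additionally links adjacent factors through hashing), the sketch below hashes every q-gram of the pattern into a small table, scans each text window from right to left q characters at a time, and shifts past any q-gram whose hash is absent. The function names `weak_hash` and `weak_factor_search`, the default q = 3, and the 16-bit table are assumptions made for this example.

```python
from typing import List, Set


def weak_hash(s: str, start: int, q: int, bits: int = 16) -> int:
    """Hash the q-gram s[start:start+q] into a table of 2**bits slots.

    Collisions are allowed on purpose: a hit only says the q-gram *might*
    be a factor of the pattern (weak recognition); a miss is certain.
    """
    mask = (1 << bits) - 1
    h = 0
    for ch in s[start:start + q]:
        h = (h * 31 + ord(ch)) & mask
    return h


def weak_factor_search(pattern: str, text: str, q: int = 3) -> List[int]:
    """Return the starting positions of all occurrences of pattern in text."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    q = min(q, m)
    # Hashes of every q-gram of the pattern: the "weak" factor table.
    table: Set[int] = {weak_hash(pattern, i, q) for i in range(m - q + 1)}

    occurrences: List[int] = []
    s = 0  # left end of the current window text[s:s+m]
    while s <= n - m:
        # Scan the window right to left, q characters at a time.
        j = s + m - q
        while j >= s and weak_hash(text, j, q) in table:
            j -= q
        if j < s:
            # Every checked q-gram passed the filter; verify exactly,
            # because hash collisions can produce false candidates.
            if text[s:s + m] == pattern:
                occurrences.append(s)
            s += 1
        else:
            # text[j:j+q] is certainly not a factor of the pattern, and every
            # occurrence starting in [s, j] would contain it, so skip past j.
            s = j + 1
    return occurrences
```

For example, `weak_factor_search("abc", "xabcabcx")` returns `[1, 4]`. A miss lets the window jump ahead by up to m - q + 1 positions, which is where sublinear behavior in practice comes from; when most windows pass the filter, exact verification makes this sketch O(nm) in the worst case.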

Adapting a relation extraction pipeline for the BioCreAtIvE II task

Authors: Grover Claire
Haddow Barry
Klein Ewan
Matthews Michael
Nielsen Leif Arda
Tobin Richard
Wang Xinglong
Publication date: 01/01/2007

Edinburgh Research Explorer

Probabilistic mathematical formula recognition using a 2D context-free graph grammar

Authors: Çelik Mehmet
Yanıkoğlu Berrin
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2011

We present a probabilistic framework for the mathematical expression recognition problem. The system is flexible in that its grammar can be extended easily: because it is a graph grammar, there is no need to specify rule precedence. It is also optimal in the sense that all possible interpretations of an expression are expanded without making early commitments or hard decisions. In this paper, we give an overview of the whole system, describe in detail the graph grammar and the parsing process it uses, and report some preliminary results on character, structure and expression recognition performance.

CiteSeerX

Crossref

Bilkent University Institutional Repository

Sabanci University Research Database

Rank, select and access in grammar-compressed strings

Author: Belazzougui Djamal
Puglisi Simon J.
Tabei Yasuo
Publication date: 14/08/2014
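
The abstract below defines rank, select and access on a string S generated by a grammar of size n. One simple way to answer these queries, sketched here under our own assumptions, is to store for every rule the length of its expansion and the count of each of the σ symbols in it (n·σ counters of O(log N) bits each), then descend from the root. The class `GrammarString`, the binary rule format, 0-based positions, a 1-based k in `select`, and an inclusive j in `access` are all hypothetical choices; this sketch takes time proportional to the grammar's height, not the O(log N) bound the paper reports.

```python
from typing import Dict, List, Tuple, Union

# Rule i is a terminal character or a pair (left, right) of earlier rule
# indices; the last rule derives the whole string S (a straight-line program).
Rule = Union[str, Tuple[int, int]]


class GrammarString:
    """Hypothetical helper: rank/select/access over a straight-line program."""

    def __init__(self, rules: List[Rule], alphabet: str) -> None:
        self.rules = rules
        self.length: List[int] = []            # length of each rule's expansion
        self.count: List[Dict[str, int]] = []  # per-symbol counts per rule
        for r in rules:
            if isinstance(r, str):
                self.length.append(1)
                self.count.append({c: int(c == r) for c in alphabet})
            else:
                a, b = r
                self.length.append(self.length[a] + self.length[b])
                self.count.append({c: self.count[a][c] + self.count[b][c]
                                   for c in alphabet})
        self.root = len(rules) - 1

    def rank(self, c: str, i: int) -> int:
        """Number of occurrences of c in S[0:i], i.e. before position i."""
        node, result = self.root, 0
        while i > 0:
            if i == self.length[node]:
                return result + self.count[node].get(c, 0)
            a, b = self.rules[node]  # 0 < i < length, so this is a pair rule
            if i <= self.length[a]:
                node = a
            else:
                result += self.count[a].get(c, 0)
                i -= self.length[a]
                node = b
        return result

    def select(self, c: str, k: int) -> int:
        """0-based position of the k-th occurrence of c (k >= 1)."""
        if not 1 <= k <= self.count[self.root].get(c, 0):
            raise ValueError("S has fewer than k occurrences of c")
        node, pos = self.root, 0
        while not isinstance(self.rules[node], str):
            a, b = self.rules[node]
            left = self.count[a].get(c, 0)
            if k <= left:
                node = a
            else:
                k -= left
                pos += self.length[a]
                node = b
        return pos

    def access(self, i: int, j: int) -> str:
        """Substring S[i..j], inclusive, assuming 0 <= i <= j < len(S)."""
        out: List[str] = []

        def expand(node: int, lo: int, hi: int) -> None:
            # Append positions lo..hi of this rule's expansion (local coords).
            r = self.rules[node]
            if isinstance(r, str):
                out.append(r)
                return
            a, b = r
            la = self.length[a]
            if lo < la:
                expand(a, lo, min(hi, la - 1))
            if hi >= la:
                expand(b, max(lo - la, 0), hi - la)

        expand(self.root, i, j)
        return "".join(out)
```

For example, the rules `['a', 'b', (0, 1), (2, 2), (3, 0)]` generate S = "ababa"; then `rank('a', 3)` is 2, `select('b', 2)` is 3, and `access(1, 3)` is "bab".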

Given a string $S$ of length $N$ on a fixed alphabet of $\sigma$ symbols, a grammar compressor produces a context-free grammar $G$ of size $n$ that generates $S$ and only $S$. In this paper we describe data structures to support the following operations on a grammar-compressed string: $\mathrm{rank}_c(S,i)$ (return the number of occurrences of symbol $c$ before position $i$ in $S$); $\mathrm{select}_c(S,i)$ (return the position of the $i$th occurrence of $c$ in $S$); and $\mathrm{access}(S,i,j)$ (return the substring $S[i,j]$). For rank and select we describe data structures of size $O(n\sigma\log N)$ bits that support the two operations in $O(\log N)$ time. We propose another structure that uses $O(n\sigma\log (N/n)(\log N)^{1+\epsilon})$ bits and that supports the two queries in $O(\log N/\log\log N)$ time, where $\epsilon>0$ is an arbitrary constant. To our knowledge, we are the first to study the asymptotic complexity of rank and select in the grammar-compressed setting, and we provide a hardness result showing that significantly improving the bounds we achieve would imply a major breakthrough on a hard graph-theoretical problem. Our main result for access is a method that requires