
4,643 research outputs found

Efficient Online String Matching through Linked Weak Factors

Author: Faro Simone
Palmer Matthew N.
Scafiti Stefano
Publication date: 24/10/2023

Online string matching is the problem of searching for a pattern in a large text, with the text processed sequentially and without prior access to (or preprocessing of) the entire text. Its relevance stems from applications in data compression, data mining, text editing, and bioinformatics, where rapid and efficient pattern matching is crucial. Various solutions have been proposed over the past few decades, employing diverse techniques. Recently, weak recognition approaches have attracted increasing attention. This paper presents Hash Chain, a new algorithm based on a robust weak factor recognition approach that connects adjacent factors through hashing. Despite its O(nm) complexity, the algorithm exhibits sublinear behavior in practice and achieves superior performance compared to the most effective algorithms
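To make the idea of weak recognition through hashing concrete, here is a minimal sketch of a generic q-gram hash filter. It is not the Hash Chain algorithm from the abstract (it does not link adjacent factors); it only illustrates how a small hash table can wrongly accept a q-gram but never wrongly reject one, so every candidate still has to be verified. The names `qgram_hash` and `hash_filter_search`, the 12-bit hash, and the default `q=3` are assumptions made for illustration.

```python
def qgram_hash(s: str) -> int:
    """Small 12-bit hash of a q-gram; collisions are expected (weak recognition)."""
    h = 0
    for ch in s:
        h = (h * 31 + ord(ch)) & 0xFFF
    return h


def hash_filter_search(pattern: str, text: str, q: int = 3) -> list[int]:
    """Return all start positions of pattern in text using a q-gram hash filter."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    q = min(q, m)
    # If the q-gram at the window end matches no pattern q-gram (by hash),
    # no occurrence can contain it, so the window can jump past it.
    default_shift = m - q + 1
    shift: dict[int, int] = {}
    for end in range(q - 1, m):
        h = qgram_hash(pattern[end - q + 1:end + 1])
        # Keep the smallest safe shift among all q-grams sharing this hash.
        shift[h] = min(shift.get(h, default_shift), m - 1 - end)

    matches = []
    pos = m - 1  # index in text of the last character of the current window
    while pos < n:
        s = shift.get(qgram_hash(text[pos - q + 1:pos + 1]), default_shift)
        if s == 0:
            # The hash says "maybe": verify, because the filter can be wrong.
            start = pos - m + 1
            if text[start:pos + 1] == pattern:
                matches.append(start)
            s = 1
        pos += s
    return matches


print(hash_filter_search("abc", "xxabcabcxabc"))  # [2, 5, 9]
```

The worst case of this sketch is still O(nm), since every window may need verification, but on typical text most windows are skipped after a single hash lookup.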

arXiv.org e-Print Archive

On virtual partitioning of large dictionaries for contextual post-processing to improve character recognition

Author: Hoch Rainer
Publication venue: Sonstige Einrichtungen. DFKI Deutsches Forschungszentrum für Künstliche Intelligenz
Publication date: 01/01/1993

This paper presents a new approach to the partitioning of large dictionaries by virtual views. The basic idea is that additional knowledge sources of text recognition and text analysis are employed for fast dictionary look-up in order to prune search space through static or dynamic views. The heart of the system is a redundant hashing technique which involves a set of hash functions dealing with noisy input efficiently. Currently, the system is composed of two main system components: the dictionary generator and the dictionary controller. While the dictionary generator initially builds the system by using profiles and source dictionaries, the controller allows the flexible integration of different search heuristics. Results prove that our system achieves a respectable speed-up of dictionary access time

Universaar


Random Access to Grammar Compressed Strings

Author: Bille Philip
Landau Gad M.
Raman Rajeev
Sadakane Kunihiko
Satti Srinivasa Rao
Weimann Oren
Publication date: 01/01/2011

Grammar based compression, where one replaces a long string by a small context-free grammar that generates the string, is a simple and powerful paradigm that captures many popular compression schemes. In this paper, we present a novel grammar representation that allows efficient random access to any character or substring without decompressing the string. Let $S$ be a string of length $N$ compressed into a context-free grammar $\mathcal{S}$ of size $n$. We present two representations of $\mathcal{S}$ achieving $O(\log N)$ random access time, and either $O(n\cdot \alpha_k(n))$ construction time and space on the pointer machine model, or $O(n)$ construction time and space on the RAM. Here, $\alpha_k(n)$ is the inverse of the $k^{th}$ row of Ackermann's function. Our representations also efficiently support decompression of any substring in $S$: we can decompress any substring of length $m$ in the same complexity as a single random access query and additional $O(m)$ time. Combining these results with fast algorithms for uncompressed approximate string matching leads to several efficient algorithms for approximate string matching on grammar-compressed strings without decompression. For instance, we can find all approximate occurrences of a pattern $P$ with at most $k$ errors in time $O(n(\min\{|P|k, k^4 + |P|\} + \log N) + occ)$, where $occ$ is the number of occurrences of $P$ in $S$. Finally, we generalize our results to navigation and other operations on grammar-compressed ordered trees. All of the above bounds significantly improve the currently best known results. To achieve these bounds, we introduce several new techniques and data structures of independent interest, including a predecessor data structure, two "biased" weighted ancestor data structures, and a compact representation of heavy paths in grammars. Comment: Preliminary version in SODA 201
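As a baseline for what random access without decompression means, the following sketch stores the expansion length of every nonterminal and walks down from the root. It is not the representation from the abstract: a query here costs time proportional to the height of the grammar, which can be far worse than logarithmic in the string length. The grammar encoding (a dict mapping each nonterminal to either a single character or a pair of symbols) and the names `expansion_lengths` and `access` are assumptions made for illustration, and the grammar is assumed to be acyclic.

```python
def expansion_lengths(rules: dict) -> dict:
    """Length of the string generated by each nonterminal (iterative post-order)."""
    lengths = {}
    for start in rules:
        stack = [start]
        while stack:
            sym = stack[-1]
            if sym in lengths:
                stack.pop()
                continue
            rhs = rules[sym]
            if isinstance(rhs, str):  # terminal rule: X -> 'c'
                lengths[sym] = 1
                stack.pop()
                continue
            left, right = rhs  # binary rule: X -> Y Z
            pending = [s for s in (left, right) if s not in lengths]
            if pending:
                stack.extend(pending)
            else:
                lengths[sym] = lengths[left] + lengths[right]
                stack.pop()
    return lengths


def access(rules: dict, lengths: dict, root: str, i: int) -> str:
    """Return character i of the string generated by root, without expanding it."""
    if not 0 <= i < lengths[root]:
        raise IndexError(i)
    sym = root
    while True:
        rhs = rules[sym]
        if isinstance(rhs, str):
            return rhs
        left, right = rhs
        if i < lengths[left]:
            sym = left
        else:
            i -= lengths[left]
            sym = right


# Hypothetical example grammar generating the Fibonacci string "abaababa".
rules = {
    "F1": "b", "F2": "a",
    "F3": ("F2", "F1"), "F4": ("F3", "F2"),
    "F5": ("F4", "F3"), "F6": ("F5", "F4"),
}
lengths = expansion_lengths(rules)
print("".join(access(rules, lengths, "F6", i) for i in range(lengths["F6"])))
```

Extracting a substring of length m by repeating `access` costs m separate root-to-leaf walks, whereas the abstract describes one random access plus O(m) additional time.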

arXiv.org e-Print Archive

Crossref

Online Research Database In Technology

Leicester Research Archive

Adapting a relation extraction pipeline for the BioCreAtIvE II task

Author: Grover Claire
Haddow Barry
Klein Ewan
Matthews Michael
Nielsen Leif Arda
Tobin Richard
Wang Xinglong
Publication date: 01/01/2007

Edinburgh Research Explorer

Learning a Family of Detectors

Author: Yuan Quan
Publication venue: Boston University Computer Science Department
Publication date: 30/06/2009

Object detection and recognition are important problems in computer vision. The challenges of these problems come from the presence of noise, background clutter, large within-class variations of the object class and limited training data. In addition, the computational complexity of the recognition process is also a concern in practice. In this thesis, we propose one approach to handle the problem of detecting an object class that exhibits large within-class variations, and a second approach to speed up the classification processes. In the first approach, we show that foreground-background classification (detection) and within-class classification of the foreground class (pose estimation) can be jointly solved using a multiplicative form of two kernel functions. One kernel measures similarity for foreground-background classification. The other kernel accounts for latent factors that control within-class variation and implicitly enables feature sharing among foreground training samples. For applications where explicit parameterization of the within-class states is unavailable, a nonparametric formulation of the kernel can be constructed with a proper foreground distance/similarity measure. Detector training is accomplished via standard Support Vector Machine learning. The resulting detectors are tuned to specific variations in the foreground class. They also serve to evaluate hypotheses of the foreground state. When the image masks for foreground objects are provided in training, the detectors can also produce object segmentation. Methods for generating a representative sample set of detectors are proposed that can enable efficient detection and tracking. In addition, because individual detectors verify hypotheses of foreground state, they can also be incorporated in a tracking-by-detection framework to recover foreground state in image sequences.
To run the detectors efficiently at the online stage, an input-sensitive speedup strategy is proposed to select the most relevant detectors quickly. The proposed approach is tested on data sets of human hands, vehicles and human faces. On all data sets, the proposed approach achieves improved detection accuracy over the best competing approaches. In the second part of the thesis, we formulate a filter-and-refine scheme to speed up recognition processes. The binary outputs of the weak classifiers in a boosted detector are used to identify a small number of candidate foreground state hypotheses quickly via Hamming distance or weighted Hamming distance. The approach is evaluated in three applications: face recognition on the face recognition grand challenge version 2 data set, hand shape detection and parameter estimation on a hand data set, and vehicle detection and estimation of the view angle on a multi-pose vehicle data set. On all data sets, our approach is at least five times faster than simply evaluating all foreground state hypotheses with virtually no loss in classification accuracy
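The filter-and-refine scheme in the second part of this abstract can be sketched roughly as follows: cheap binary codes shortlist a few hypotheses by Hamming distance, and only the shortlist is scored with the expensive measure. Everything concrete here is hypothetical and chosen for illustration: the labels, the 6-bit codes, the 2-D prototypes, and the use of squared Euclidean distance as the stand-in for the expensive score. The thesis derives its codes from the weak classifiers of a boosted detector, which this sketch does not model.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary codes stored as ints."""
    return bin(a ^ b).count("1")


def filter_and_refine(query_code, query_vec, candidates, keep=2):
    """Return (best_label, number_of_expensive_evaluations).

    candidates maps label -> (binary_code, prototype_vector).
    """
    # Filter: rank every hypothesis by cheap Hamming distance, keep a few.
    shortlist = sorted(
        candidates, key=lambda label: hamming(query_code, candidates[label][0])
    )[:keep]

    # Refine: run the expensive comparison only on the shortlist.
    def expensive_distance(label):
        proto = candidates[label][1]
        return sum((q - p) ** 2 for q, p in zip(query_vec, proto))

    best = min(shortlist, key=expensive_distance)
    return best, len(shortlist)


# Hypothetical hand-shape hypotheses with made-up codes and prototypes.
candidates = {
    "open_hand": (0b101100, (0.9, 0.1)),
    "fist": (0b101101, (0.8, 0.3)),
    "point": (0b010011, (0.1, 0.9)),
    "pinch": (0b011011, (0.2, 0.8)),
}
print(filter_and_refine(0b101110, (0.82, 0.28), candidates))  # ('fist', 2)
```

In this example the Hamming filter ranks "open_hand" first, but the refinement step picks "fist", which is why the shortlist keeps more than one hypothesis.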

Boston University Institutional Repository (OpenBU)

Stochastic accumulation of feature information in perception and memory

Author: Adelman
Adelman
Ashby
Ashby
Ashby
Ashby
Ashby
Barsalou
Bausenhart
Biederman
Bogacz
Bower
Bower
Brockdorff
Brown
Brown
Bundesen
Busey
Carrasco
Carrasco
Carrasco
Carrasco
Cohen
Cowan
Dale
Davis
Diller
Dosher
Dosher
Dosher
Dosher
Dosher
Dunn
Eckstein
Eriksen
Estes
Estes
Estes
Fific
Filoteo
Freeman
Freeman
Freeman
Friedman
Garavan
Giordano
Gold
Grainger
Grainger
Gronlund
Gronlund
Guest
Guest
Guest
Guest
Gureckis
Göthe
Healy
Healy
Heekeren
Heit
Heit
Hintzman
Hockley
Hummel
Inglis
Kanai
Kent
Kent
Kent
Kent
Kent
Kent
Kent
Kruschke
Kwantes
LaBerge
Lamberts
Lamberts
Lamberts
Lamberts
Lamberts
Lamberts
Lamberts
Lamberts
Lamberts
Lamberts
Lamberts
Lindell
Little
Little
Liu
Logan
Logan
Luce
Luce
Luck
Ma
Maddox
Maddox
Maddox
Maddox
Maddox
Maddox
Malmberg
Marslen-Wilson
McClelland
McClelland
McElree
McElree
McGill
Meyer
Miller
Newell
Norman
Norris
Nosofsky
Nosofsky
Nosofsky
Nosofsky
Nosofsky
Nosofsky
Nosofsky
Oberauer
Paap
Pachella
Palmer
Pezzulo
Posner
Purcell
Ratcliff
Ratcliff
Ratcliff
Reed
Reed
Rehder
Rotello
Rotello
Rumelhart
Rumelhart
Salthouse
Schall
Schneider
Shepard
Smith
Song
Spivey
Stewart
Takeda
Townsend
Treisman
Treue
Tversky
Usher
Whitney
Wickelgren
Wickelgren
Wolfe
Wolford
Publication venue: 'Frontiers Media SA'
Publication date: 01/01/2014

It is now well established that the time course of perceptual processing influences the first second or so of performance in a wide variety of cognitive tasks. Over the last 20 years, there has been a shift from modeling the speed at which a display is processed, to modeling the speed at which different features of the display are perceived and formalizing how this perceptual information is used in decision making. The first of these models (Lamberts, 1995) was implemented to fit the time course of performance in a speeded perceptual categorization task and assumed a simple stochastic accumulation of feature information. Subsequently, similar approaches have been used to model performance in a range of cognitive tasks including identification, absolute identification, perceptual matching, recognition, visual search, and word processing, again assuming a simple stochastic accumulation of feature information from both the stimulus and representations held in memory. These models are typically fit to data from signal-to-respond experiments whereby the effects of stimulus exposure duration on performance are examined, but response times (RTs) and RT distributions have also been modeled. In this article, we review this approach and explore the insights it has provided about the interplay between perceptual processing, memory retrieval, and decision making in a variety of tasks. In so doing, we highlight how such approaches can continue to usefully contribute to our understanding of cognition

Crossref

Nottingham Trent Institutional Repository (IRep)

Directory of Open Access Journals

Frontiers - Publisher Connector

PubMed Central

Warwick Research Archives Portal Repository

White Rose Research Online

Explore Bristol Research

LATE Ain'T Earley: A Faster Parallel Earley Parser

Author: Ahrens Peter
Feser John
Hui Robin
Publication date: 15/07/2018

We present the LATE algorithm, an asynchronous variant of the Earley algorithm for parsing context-free grammars. The Earley algorithm is naturally task-based, but is difficult to parallelize because of dependencies between the tasks. LATE uses additional data structures to maintain information about the state of the parse so that work items may be processed in any order. This property allows the LATE algorithm to be sped up using task parallelism. We show that the LATE algorithm can achieve a 120x speedup over the Earley algorithm on a natural language task.
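For context, here is a minimal sketch of the classic sequential Earley recognizer that LATE is a variant of. It is not the LATE algorithm and has none of its extra data structures. The grammar encoding (a dict from nonterminal to a list of right-hand-side tuples), the function name `earley_recognize`, and the example grammar are assumptions for illustration, and the sketch assumes the grammar has no empty productions.

```python
def earley_recognize(grammar: dict, start: str, tokens: list) -> bool:
    """Return True if tokens can be derived from start.

    An item is (lhs, rhs, dot, origin). Assumes no empty right-hand sides.
    """
    n = len(tokens)
    chart = [set() for _ in range(n + 1)]
    for rhs in grammar[start]:
        chart[0].add((start, rhs, 0, 0))

    for k in range(n + 1):
        worklist = list(chart[k])
        while worklist:
            lhs, rhs, dot, origin = worklist.pop()
            if dot < len(rhs):
                sym = rhs[dot]
                if sym in grammar:
                    # Predict: expand the nonterminal after the dot.
                    for prod in grammar[sym]:
                        new = (sym, prod, 0, k)
                        if new not in chart[k]:
                            chart[k].add(new)
                            worklist.append(new)
                elif k < n and tokens[k] == sym:
                    # Scan: the terminal after the dot matches the next token.
                    chart[k + 1].add((lhs, rhs, dot + 1, origin))
            else:
                # Complete: advance every item that was waiting for lhs.
                for plhs, prhs, pdot, porigin in list(chart[origin]):
                    if pdot < len(prhs) and prhs[pdot] == lhs:
                        new = (plhs, prhs, pdot + 1, porigin)
                        if new not in chart[k]:
                            chart[k].add(new)
                            worklist.append(new)

    return any(
        lhs == start and dot == len(rhs) and origin == 0
        for lhs, rhs, dot, origin in chart[n]
    )


# Hypothetical left-recursive grammar: S -> S + T | T, T -> n
grammar = {"S": [("S", "+", "T"), ("T",)], "T": [("n",)]}
print(earley_recognize(grammar, "S", ["n", "+", "n"]))  # True
```

Items within one chart position can be popped in any order here, but position k + 1 cannot start until position k is finished. That ordering constraint between positions is the kind of task dependency the abstract describes as the obstacle to parallelizing Earley.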

arXiv.org e-Print Archive