Search CORE

31,353 research outputs found

Improved Approximate String Matching and Regular Expression Matching on Ziv-Lempel Compressed Texts

Author: A. Amir
E.W. Myers
G. Navarro
G. Navarro
G. Navarro
G.M. Landau
J. Kärkkäinen
J. Ziv
J. Ziv
K. Thompson
M. Dietzfelbinger
M. Farach
P. Sellers
R. Cole
T.A. Welch
V. Mäkinen
Publication venue
Publication date: 01/01/2007
Field of study

We study the approximate string matching and regular expression matching problem for the case when the text to be searched is compressed with the Ziv-Lempel adaptive dictionary compression schemes. We present a time-space trade-off that leads to algorithms improving the previously known complexities for both problems. In particular, we significantly improve the space bounds, which in practical applications are likely to be a bottleneck

arXiv.org e-Print Archive

CiteSeerX

Crossref

University of Southern Denmark Research Output

Online Research Database In Technology

Path, theme and narrative in open plan exhibition settings

Author: Dalton Ruth
Dalton Ruth
Peponis John
Wineman Jean
Publication venue
Publication date: 01/06/2003
Field of study

Three arguments are made based on the analysis of science exhibitions. First,sufficiently refined techniques of spatial analysis allow us to model the impact oflayout upon visitors' paths, even in moderately sized open plans which allow almostrandom patterns of movement and relatively unobstructed visibility. Second, newlydeveloped or adapted techniques of analysis allow us to make a transition frommodeling the mechanics of spatial movement (the way in which movement is affectedby the distribution of obstacles and boundaries), to modeling the manner in whichmovement might register additional aspects of visual information. Third, theadvantages of such purely spatial modes of analysis extend into providing us with asharper understanding of some of the pragmatic constrains within which exhibitioncontent is conceived and designed

Northumbria Research Link

Knowledge-based machine vision systems for space station automation

Author: Chipman Laure J.
Ranganath Heggere S.
Publication venue
Publication date
Field of study

Computer vision techniques which have the potential for use on the space station and related applications are assessed. A knowledge-based vision system (expert vision system) and the development of a demonstration system for it are described. This system implements some of the capabilities that would be necessary in a machine vision system for the robot arm of the laboratory module in the space station. A Perceptics 9200e image processor, on a host VAXstation, was used to develop the demonstration system. In order to use realistic test images, photographs of actual space shuttle simulator panels were used. The system's capabilities of scene identification and scene matching are discussed

NASA Technical Reports Server

Path, theme and narrative in open plan exhibition settings

Author: Conroy-Dalton R
Dalton N
Peponis J
Wineman J
Publication venue
Publication date: 01/06/2003
Field of study

UCL Discovery

Pattern Matching and Discourse Processing in Information Extraction from Japanese Text

Author: Eriguchi Y.
Hara M.
Kitani T.
Publication venue
Publication date: 01/01/1994
Field of study

Information extraction is the task of automatically picking up information of interest from an unconstrained text. Information of interest is usually extracted in two steps. First, sentence level processing locates relevant pieces of information scattered throughout the text; second, discourse processing merges coreferential information to generate the output. In the first step, pieces of information are locally identified without recognizing any relationships among them. A key word search or simple pattern search can achieve this purpose. The second step requires deeper knowledge in order to understand relationships among separately identified pieces of information. Previous information extraction systems focused on the first step, partly because they were not required to link up each piece of information with other pieces. To link the extracted pieces of information and map them onto a structured output format, complex discourse processing is essential. This paper reports on a Japanese information extraction system that merges information using a pattern matcher and discourse processor. Evaluation results show a high level of system performance which approaches human performance.Comment: See http://www.jair.org/ for any accompanying file

arXiv.org e-Print Archive

CiteSeerX

Suffix Tree of Alignment: An Efficient Index for Similar Data

Author: A. Amir
D. Gusfield
E. Ukkonen
E.M. McCreight
G. Navarro
H.H. Do
J. Ziv
K. Sadakane
M. Crochemore
M. Farach-Colton
P. Bille
R. Grossi
R.A. Baeza-Yates
S. Huang
S. Karlin
S. Kuruppu
V. Levenshtein
V. Mäkinen
V. Mäkinen
Publication venue
Publication date: 01/01/2013
Field of study

We consider an index data structure for similar strings. The generalized suffix tree can be a solution for this. The generalized suffix tree of two strings

A

and

B

is a compacted trie representing all suffixes in

A

and

B

. It has

|A|+|B|

leaves and can be constructed in

O(|A|+|B|)

time. However, if the two strings are similar, the generalized suffix tree is not efficient because it does not exploit the similarity which is usually represented as an alignment of

A

and

B

. In this paper we propose a space/time-efficient suffix tree of alignment which wisely exploits the similarity in an alignment. Our suffix tree for an alignment of

A

and

B

has

|A| + l_d + l_1

leaves where

l_d

is the sum of the lengths of all parts of

B

different from

A

and

l_1

is the sum of the lengths of some common parts of

A

and

B

. We did not compromise the pattern search to reduce the space. Our suffix tree can be searched for a pattern

P

O(|P|+occ)

time where

occ

is the number of occurrences of

P

A

and

B

. We also present an efficient algorithm to construct the suffix tree of alignment. When the suffix tree is constructed from scratch, the algorithm requires

O(|A| + l_d + l_1 + l_2)

time where

l_2

is the sum of the lengths of other common substrings of

A

and

B

. When the suffix tree of

A

is already given, it requires

O(l_d + l_1 + l_2)

time.Comment: 12 page

arXiv.org e-Print Archive

CiteSeerX

Crossref

King's Research Portal

An optimized TOPS+ comparison method for enhanced TOPS models

Author: A Brazma
A Harrison
A Harrison
CA Orengo
CA Orengo
CA Orengo
CJ van Rijsbergen
D Gilbert
D Gilbert
D Westhead
David Gilbert
G Valiente
Gabriel Valiente
GJ Barton
GM Torrance
HM Berman
HM Grindley
I Koch
I Michalopoulos
IN Shindyalov
J Handl
J Viksna
K Mizuguchi
L Holm
LP Chew
M Veeramalai
M Veeramalai
M Veeramalai
Mallika Veeramalai
N Krasnogor
RB Russell
S Goldsmith-Fischman
SB Needleman
SS Krishna
T Madej
T Madej
TF Smith
VI Levenshtein
WR Taylor
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2010
Field of study

This article has been made available through the Brunel Open Access Publishing Fund.Background Although methods based on highly abstract descriptions of protein structures, such as VAST and TOPS, can perform very fast protein structure comparison, the results can lack a high degree of biological significance. Previously we have discussed the basic mechanisms of our novel method for structure comparison based on our TOPS+ model (Topological descriptions of Protein Structures Enhanced with Ligand Information). In this paper we show how these results can be significantly improved using parameter optimization, and we call the resulting optimised TOPS+ method as advanced TOPS+ comparison method i.e. advTOPS+. Results We have developed a TOPS+ string model as an improvement to the TOPS [1-3] graph model by considering loops as secondary structure elements (SSEs) in addition to helices and strands, representing ligands as first class objects, and describing interactions between SSEs, and SSEs and ligands, by incoming and outgoing arcs, annotating SSEs with the interaction direction and type. Benchmarking results of an all-against-all pairwise comparison using a large dataset of 2,620 non-redundant structures from the PDB40 dataset [4] demonstrate the biological significance, in terms of SCOP classification at the superfamily level, of our TOPS+ comparison method. Conclusions Our advanced TOPS+ comparison shows better performance on the PDB40 dataset [4] compared to our basic TOPS+ method, giving 90 percent accuracy for SCOP alpha+beta; a 6 percent increase in accuracy compared to the TOPS and basic TOPS+ methods. It also outperforms the TOPS, basic TOPS+ and SSAP comparison methods on the Chew-Kedem dataset [5], achieving 98 percent accuracy. Software Availability: The TOPS+ comparison server is available at http://balabio.dcs.gla.ac.uk/mallika/WebTOPS/.This article is available through the Brunel Open Access Publishing Fun

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Brunel University Research Archive

FlashProfile: A Framework for Synthesizing Data Profiles

Author: Gulwani Sumit
Jain Prateek
Millstein Todd
Padhi Saswat
Perelman Daniel
Polozov Oleksandr
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 24/10/2018
Field of study

We address the problem of learning a syntactic profile for a collection of strings, i.e. a set of regex-like patterns that succinctly describe the syntactic variations in the strings. Real-world datasets, typically curated from multiple sources, often contain data in various syntactic formats. Thus, any data processing task is preceded by the critical step of data format identification. However, manual inspection of data to identify the different formats is infeasible in standard big-data scenarios. Prior techniques are restricted to a small set of pre-defined patterns (e.g. digits, letters, words, etc.), and provide no control over granularity of profiles. We define syntactic profiling as a problem of clustering strings based on syntactic similarity, followed by identifying patterns that succinctly describe each cluster. We present a technique for synthesizing such profiles over a given language of patterns, that also allows for interactive refinement by requesting a desired number of clusters. Using a state-of-the-art inductive synthesis framework, PROSE, we have implemented our technique as FlashProfile. Across

153

tasks over

75

large real datasets, we observe a median profiling time of only

\sim\,0.7\,

s. Furthermore, we show that access to syntactic profiles may allow for more accurate synthesis of programs, i.e. using fewer examples, in programming-by-example (PBE) workflows such as FlashFill.Comment: 28 pages, SPLASH (OOPSLA) 201

arXiv.org e-Print Archive

eScholarship - University of California