1,126 research outputs found

    A comparative evaluation of name-matching algorithms

    Full text link
    Name matching—recognizing when two different strings are likely to denote the same entity—is an important task in many legal information systems, such as case-management systems. The naming conventions peculiar to legal cases limit the effectiveness of generic approximate string-matching algorithms in this task. This paper proposes a three-stage framework for name matching, identifies how each stage in the framework addresses the naming variations that typically arise in legal cases, describes several alternative approaches to each stage, and evaluates the performance of various combinations of the alternatives on a representative collection of names drawn from a United States District Court case management system. The best tradeoff between accuracy and efficiency in this collection was achieved by algorithms that standardize capitalization, spacing, and punctuation; filter redundant terms; index using an abstraction function that is both order-insensitive and tolerant of small numbers of omissions or additions; and compare names in a symmetrical, word-by-word fashion.
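
    As a rough illustration of the kind of three-stage pipeline the abstract describes, the Python sketch below standardizes a name, filters low-information terms, builds an order-insensitive index key, and scores pairs word by word. The stop-term list, the set-based index key, and the Jaccard-style scoring are placeholder assumptions for illustration, not the paper's actual algorithms.

```python
import re
import string

# Illustrative stop terms; the paper's redundant-term filter is not reproduced here.
STOP_TERMS = {"inc", "llc", "co", "corp", "the", "of", "and"}

def standardize(name: str) -> list[str]:
    """Stage 1: normalize capitalization, spacing, and punctuation."""
    name = name.lower()
    name = name.translate(str.maketrans("", "", string.punctuation))
    return [t for t in re.split(r"\s+", name) if t]

def filter_terms(tokens: list[str]) -> list[str]:
    """Stage 2: drop terms that carry little discriminating power."""
    return [t for t in tokens if t not in STOP_TERMS]

def index_key(tokens: list[str]) -> frozenset[str]:
    """Stage 3a: an order-insensitive abstraction used to retrieve candidates.
    A token set ignores word order and, with a partial-overlap lookup (not
    shown), tolerates a few added or missing words."""
    return frozenset(tokens)

def word_by_word_score(a: list[str], b: list[str]) -> float:
    """Stage 3b: symmetric word-by-word comparison (Jaccard overlap here)."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0

x = filter_terms(standardize("Smith & Jones, Inc."))
y = filter_terms(standardize("SMITH and JONES"))
print(word_by_word_score(x, y))  # 1.0: the two names collapse to the same tokens
```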

    IST Austria Technical Report

    Get PDF
    The edit distance between two (untimed) traces is the minimum cost of a sequence of edit operations (insertion, deletion, or substitution) needed to transform one trace to the other. Edit distances have been extensively studied in the untimed setting, and form the basis for approximate matching of sequences in different domains such as coding theory, parsing, and speech recognition. In this paper, we lift the study of edit distances from untimed languages to the timed setting. We define an edit distance between timed words which incorporates both the edit distance between the untimed words and the absolute difference in timestamps. Our edit distance between two timed words is computable in polynomial time. Further, we show that the edit distance between a timed word and a timed language generated by a timed automaton, defined as the edit distance between the word and the closest word in the language, is PSPACE-complete. While computing the edit distance between two timed automata is undecidable, we show that the approximate version, where we decide if the edit distance between two timed automata is either less than a given parameter or more than $\delta$ away from the parameter, for $\delta > 0$, can be solved in exponential space and is EXPSPACE-hard. Our definitions and techniques can be generalized to the setting of hybrid systems, and we show analogous decidability results for rectangular automata.
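
    A dynamic-programming sketch of a distance of this shape is given below. The specific cost model (unit cost for insertions and deletions, and, for aligned positions, the absolute timestamp difference plus 1 if the letters differ) is an assumption chosen for illustration; it is not necessarily the paper's exact definition.

```python
def timed_edit_distance(u, v):
    """DP over prefixes of two timed words u, v, each a list of (letter, timestamp)
    pairs. Assumed cost model: insertions/deletions cost 1; aligning two positions
    costs |t1 - t2| plus 1 if the letters differ."""
    n, m = len(u), len(v)
    # dp[i][j] = distance between u[:i] and v[:j]
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i
    for j in range(1, m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            (a, ta), (b, tb) = u[i - 1], v[j - 1]
            align = dp[i - 1][j - 1] + abs(ta - tb) + (0 if a == b else 1)
            dp[i][j] = min(dp[i - 1][j] + 1,   # delete from u
                           dp[i][j - 1] + 1,   # insert into u
                           align)              # match / substitute
    return dp[n][m]

print(timed_edit_distance([("a", 0.5), ("b", 1.2)], [("a", 0.7), ("c", 1.2)]))
# ≈ 1.2: a 0.2 timestamp shift on 'a' plus 1 for substituting b -> c (under the assumed costs)
```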

    Edit Distance for Pushdown Automata

    Get PDF
    The edit distance between two words $w_1, w_2$ is the minimal number of word operations (letter insertions, deletions, and substitutions) necessary to transform $w_1$ to $w_2$. The edit distance generalizes to languages $\mathcal{L}_1, \mathcal{L}_2$, where the edit distance from $\mathcal{L}_1$ to $\mathcal{L}_2$ is the minimal number $k$ such that for every word from $\mathcal{L}_1$ there exists a word in $\mathcal{L}_2$ with edit distance at most $k$. We study the edit distance computation problem between pushdown automata and their subclasses. The problem of computing the edit distance to a pushdown automaton is undecidable, and in practice the interesting question is to compute the edit distance from a pushdown automaton (the implementation, a standard model for programs with recursion) to a regular language (the specification). In this work, we present a complete picture of decidability and complexity for the following problems: (1) deciding whether, for a given threshold $k$, the edit distance from a pushdown automaton to a finite automaton is at most $k$, and (2) deciding whether the edit distance from a pushdown automaton to a finite automaton is finite. Comment: An extended version of a paper accepted to ICALP 2015 with the same title. The paper has been accepted to the LMCS journal.
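
    For intuition, the sketch below computes the regular-language special case: the edit distance from a single word to the language of a finite automaton, via a Levenshtein-style dynamic program over (prefix of the word, automaton state). The NFA encoding and the unit edit costs are assumptions for illustration; the paper's constructions for pushdown automata are substantially more involved.

```python
def word_to_nfa_edit_distance(word, states, start, accepting, delta):
    """Edit distance from `word` to the language of an NFA. `delta` is assumed to
    be a dict mapping (state, letter) -> set of successor states."""
    INF = float("inf")

    def close_under_insertions(dist):
        # Inserting a letter costs 1 and lets the NFA take one transition.
        changed = True
        while changed:
            changed = False
            for (q, _a), targets in delta.items():
                for r in targets:
                    if dist[q] + 1 < dist[r]:
                        dist[r] = dist[q] + 1
                        changed = True
        return dist

    # dist[q] = cheapest cost of editing the consumed prefix into some word
    # that drives the NFA from `start` to state q.
    dist = close_under_insertions({q: (0 if q == start else INF) for q in states})
    for ch in word:
        nxt = {q: dist[q] + 1 for q in states}          # delete ch
        for (q, a), targets in delta.items():
            for r in targets:
                cost = dist[q] + (0 if a == ch else 1)  # match / substitute
                nxt[r] = min(nxt[r], cost)
        dist = close_under_insertions(nxt)
    return min(dist[q] for q in accepting)

# L(A) = a b*; the distance from "cb" is 1 (substitute c -> a).
delta = {("s", "a"): {"t"}, ("t", "b"): {"t"}}
print(word_to_nfa_edit_distance("cb", {"s", "t"}, "s", {"t"}, delta))  # 1
```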

    Weighted Transducers for Robustness Verification

    Get PDF
    Automata theory provides us with fundamental notions such as languages, membership, emptiness and inclusion that in turn allow us to specify and verify properties of reactive systems in a useful manner. However, these notions all yield "yes"/"no" answers that sometimes fall short of being satisfactory answers when the models being analyzed are imperfect, and the observations made are prone to errors. To address this issue, a common engineering approach is not just to verify that a system satisfies a property, but to check whether it does so robustly. We present notions of robustness that place a metric on words, thus providing a natural notion of distance between words. Such a metric naturally leads to a topological neighborhood of words and languages, leading to quantitative and robust versions of the membership, emptiness and inclusion problems. More generally, we consider weighted transducers to model the cost of errors. Such a transducer models neighborhoods of words by providing the cost of rewriting a word into another. The main contribution of this work is to study robustness verification problems in the context of weighted transducers. We provide algorithms for solving the robust and quantitative versions of the membership and inclusion problems while providing useful motivating case studies, including approximate pattern matching problems to detect clinically relevant events in a large type-1 diabetes dataset.
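
    The approximate-pattern-matching case study can be pictured with a standard Sellers-style sketch: report the positions in a long symbol sequence at which some substring lies within a given edit-cost budget of a pattern. The uniform substitution and indel weights, and the toy event signature, are placeholders; a weighted transducer would supply per-symbol rewrite costs instead.

```python
def approximate_occurrences(pattern, text, max_cost, sub_cost=1, indel_cost=1):
    """Report end positions in `text` where some substring is within weighted
    edit cost `max_cost` of `pattern` (classic Sellers-style matching)."""
    m = len(pattern)
    # col[i] = cheapest cost of matching pattern[:i] against some substring of
    # text ending at the current position.
    col = [i * indel_cost for i in range(m + 1)]
    hits = []
    for j, ch in enumerate(text, start=1):
        prev = col
        col = [0]  # a match may start anywhere, so the empty pattern prefix costs 0
        for i in range(1, m + 1):
            match = prev[i - 1] + (0 if pattern[i - 1] == ch else sub_cost)
            col.append(min(match,
                           prev[i] + indel_cost,      # delete ch from the text window
                           col[i - 1] + indel_cost))  # insert pattern[i-1]
        if col[m] <= max_cost:
            hits.append(j)
    return hits

# Toy discretized trace; the symbols and the event signature "HHLL" are placeholders.
print(approximate_occurrences("HHLL", "nnHHLLnnHHxLLn", max_cost=1))
# prints the end positions of all matches within one edit of the pattern
```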

    Testing Membership for Timed Automata

    Full text link
    Given a timed automaton which admits thick components and a timed word $x$, we present a tester which decides if $x$ is in the language of the automaton or if $x$ is $\epsilon$-far from the language, using finitely many samples taken from the weighted time distribution $\mu$ associated with an input $x$. We introduce a distance between timed words, the timed edit distance, which generalizes the classical edit distance. A timed word $x$ is $\epsilon$-far from a timed language if its relative distance to the language is greater than $\epsilon$. Comment: 26 pages.
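
    The $\epsilon$-far notion itself is easy to state in code, as in the small sketch below. Normalizing by the length of $x$, and enumerating candidate language words at all, are simplifications assumed here for illustration; the point of the paper is precisely to avoid such enumeration by sampling finitely many times from $\mu$.

```python
def is_eps_far(x, candidate_words, timed_edit_distance, eps):
    """x is eps-far from a language if its relative distance to the closest word
    exceeds eps. Enumerating `candidate_words` is only feasible for a finite
    sample of the language, and dividing by len(x) is an assumed choice of
    'relative' distance; both are simplifications for illustration."""
    d = min(timed_edit_distance(x, w) for w in candidate_words)
    return d / max(len(x), 1) > eps
```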

    IST Austria Technical Report

    Get PDF
    The edit distance between two words w1, w2 is the minimal number of word operations (letter insertions, deletions, and substitutions) necessary to transform w1 to w2. The edit distance generalizes to languages L1, L2, where the edit distance is the minimal number k such that for every word from L1 there exists a word in L2 with edit distance at most k. We study the edit distance computation problem between pushdown automata and their subclasses. The problem of computing edit distance to a pushdown automaton is undecidable, and in practice, the interesting question is to compute the edit distance from a pushdown automaton (the implementation, a standard model for programs with recursion) to a regular language (the specification). In this work, we present a complete picture of decidability and complexity for deciding whether, for a given threshold k, the edit distance from a pushdown automaton to a finite automaton is at most k.

    Unsupervised Detection of Cell-Assembly Sequences by Similarity-Based Clustering

    Get PDF
    Neurons that fire in a fixed temporal pattern (i.e., "cell assemblies") are hypothesized to be a fundamental unit of neural information processing. Several methods are available for the detection of cell assemblies without a time structure. However, the systematic detection of cell assemblies with time structure has been challenging, especially in large datasets, due to the lack of efficient methods for handling the time structure. Here, we show a method to detect a variety of cell-assembly activity patterns recurring in noisy neural population activities at multiple timescales. The key innovation is the use of a computer-science method for comparing strings ("edit similarity") to group spikes into assemblies. We validated the method using artificial data and experimental data, which were previously recorded from the hippocampus of male Long-Evans rats and the prefrontal cortex of male Brown Norway/Fisher hybrid rats. From the hippocampus, we could simultaneously extract place-cell sequences occurring on different timescales during navigation and awake replay. From the prefrontal cortex, we could discover multiple spike sequences of neurons encoding different segments of a goal-directed task. Unlike conventional event-driven statistical approaches, our method detects cell assemblies without creating event-locked averages. Thus, the method offers a novel analytical tool for deciphering the neural code during arbitrary behavioral and mental processes.
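
    A toy version of the similarity-based grouping can be sketched as follows: represent each activity window as the sequence of neuron IDs in firing order, score pairs with an edit-style similarity, and group windows greedily. difflib's ratio stands in for the gap-penalized edit similarity used in the paper, and the greedy single-linkage grouping is only a stand-in for its clustering step.

```python
from difflib import SequenceMatcher

def edit_similarity(seq_a, seq_b):
    """Similarity between two spike sequences, each a list of neuron IDs ordered
    by spike time. difflib's subsequence-based ratio is used as a stand-in for
    the paper's edit similarity."""
    return SequenceMatcher(None, seq_a, seq_b).ratio()

def cluster_sequences(sequences, threshold=0.7):
    """Greedy single-linkage grouping: a sequence joins the first cluster that
    contains a member at least `threshold`-similar to it. Illustrative only."""
    clusters = []
    for seq in sequences:
        for cluster in clusters:
            if any(edit_similarity(seq, member) >= threshold for member in cluster):
                cluster.append(seq)
                break
        else:
            clusters.append([seq])
    return clusters

# Toy spike sequences (neuron IDs in firing order); the third is unrelated noise.
windows = [[1, 2, 3, 4], [1, 2, 4], [7, 9, 8, 2]]
print(cluster_sequences(windows))  # the first two group together, the third stays alone
```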

    Fast Filter-and-Refine Algorithms for Subsequence Selection

    Get PDF