
    Adaptive Computation of the Swap-Insert Correction Distance

    The Swap-Insert Correction distance from a string $S$ of length $n$ to another string $L$ of length $m \geq n$ on the alphabet $[1..d]$ is the minimum number of insertions, and of swaps of pairs of adjacent symbols, converting $S$ into $L$. Unlike other correction distances, computing it is NP-hard in the size $d$ of the alphabet. We describe an algorithm computing this distance in time within $O(d^2 n m g^{d-1})$, where $n_\alpha$ is the number of occurrences of $\alpha$ in $S$, $m_\alpha$ is the number of occurrences of $\alpha$ in $L$, and $g = \max_{\alpha \in [1..d]} \min\{n_\alpha, m_\alpha - n_\alpha\}$ measures the difficulty of the instance. The difficulty $g$ is bounded from above by various terms, such as the length of the shorter string $S$ and the maximum number of occurrences of a single character in $S$. These results illustrate how, in many cases, the correction distance between two strings can be easier to compute than in the worst case.

    Comment: 16 pages, no figures, long version of the extended abstract accepted to SPIRE 201
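    As a rough illustration (not code from the paper), the difficulty measure $g$ can be computed directly from symbol counts; the helper name difficulty_g and the example strings below are ours:

```python
from collections import Counter

def difficulty_g(S: str, L: str) -> int:
    """Difficulty measure from the abstract: g = max over symbols a of
    min(n_a, m_a - n_a), with n_a, m_a the counts of a in S and L."""
    n, m = Counter(S), Counter(L)
    return max(min(n[a], m[a] - n[a]) for a in m)

# Example (ours): L extends S by one 'b', so g = min(1, 2 - 1) = 1.
print(difficulty_g("aba", "abba"))  # -> 1
```

    When $g$ is small, for instance when $L$ barely extends $S$, the factor $g^{d-1}$ collapses and the $O(d^2 n m g^{d-1})$ bound approaches $O(d^2 n m)$, which is the sense in which $g$ captures instance difficulty.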

    A Reformulation of Matrix Graph Grammars with Boolean Complexes

    Prior publication in the Electronic Journal of Combinatorics. Graph transformation is concerned with the manipulation of graphs by means of rules. Graph grammars have traditionally been studied using techniques from category theory. In previous work, we introduced Matrix Graph Grammars (MGG) as a purely algebraic approach to the study of graph dynamics, based on representing simple graphs by their adjacency matrices. The observation that, in addition to positive information, a rule implicitly defines negative conditions for its application (edges cannot become dangling and cannot be added twice, as we work with simple digraphs) has led to a representation of graphs as two matrices encoding positive and negative information. Using this representation, we reformulate the main concepts in MGG and introduce new ideas. In particular, we present (i) a new formulation of productions together with an abstraction of them (so-called swaps), (ii) the notion of coherence, which checks whether a production sequence can potentially be applied, (iii) the minimal graph enabling the applicability of a sequence, and (iv) the conditions for compatibility of sequences (lack of dangling edges) and G-congruence (whether two sequences have the same minimal initial graph).

    This work has been partially sponsored by the Spanish Ministry of Science and Innovation, project METEORIC (TIN2008-02081/TIN).
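    The following sketch (ours, not the paper's formalism) conveys the flavor of the matrix-algebraic view: a simple digraph is a boolean adjacency matrix, a production is sketched as a pair of erase/add matrices, and the negative conditions forbid erasing absent edges or adding duplicates:

```python
import numpy as np

# A simple digraph on 3 nodes as a boolean adjacency matrix.
G = np.array([[0, 1, 0],
              [0, 0, 1],
              [0, 0, 0]], dtype=bool)

# A production sketched as edges to erase (E) and edges to add (R).
# This encoding is our illustration; the paper's productions also
# carry a negative (nihilation) matrix and handle node deletion.
E = np.array([[0, 1, 0], [0, 0, 0], [0, 0, 0]], dtype=bool)
R = np.array([[0, 0, 1], [0, 0, 0], [0, 0, 0]], dtype=bool)

# Negative conditions in spirit: every erased edge must exist, and
# no added edge may already exist (we work with simple digraphs).
assert (G & E).sum() == E.sum() and not (G & R).any()

H = (G & ~E) | R  # apply the production purely algebraically
print(H.astype(int))
```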

    Matching Lenses: Alignment and View Update

    Bidirectional programming languages have been proposed as a practical approach to the view update problem. Programs in these languages, often called lenses, can be read in two ways: from left to right as functions mapping sources to views, and from right to left as functions mapping updated views back to updated sources. Lenses address the view update problem by making it possible to define a view and its associated update policy together. One issue that has not received sufficient attention in the design of bidirectional languages is alignment. In general, to correctly propagate an update to a view, a lens needs to match up the pieces of the edited view with the corresponding pieces of the underlying source. Unfortunately, existing bidirectional languages are extremely limited in their treatment of alignment: they only support simple strategies that do not suffice for many examples of practical interest. In this paper, we propose a novel framework of matching lenses that extends basic lenses with new mechanisms for calculating and using alignments. We enrich the types of lenses with “chunks” that identify the locations of data that should be re-aligned after updates, and we formulate refined behavioral laws that capture essential constraints on the handling of chunks. To demonstrate the utility of our approach, we develop a core language of matching lenses for string data, and we extend it with primitives for describing a number of useful alignment heuristics.
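    A minimal sketch (ours, not the combinators of the matching-lenses language) of the two readings of a lens, with key-based alignment standing in for the paper's chunks:

```python
from dataclasses import dataclass
from typing import Callable

# A lens is a get/put pair; all names here are our illustration.
@dataclass
class Lens:
    get: Callable  # source -> view
    put: Callable  # (updated view, old source) -> updated source

def get(src):
    # The view exposes only (key, name); the role stays source-side.
    return [(k, name) for (k, name, _role) in src]

def put(view, src):
    # Align edited view entries with source entries by key, so hidden
    # source data follows its record even if the view was reordered.
    roles = {k: role for (k, _name, role) in src}
    return [(k, name, roles.get(k, "?")) for (k, name) in view]

people = Lens(get, put)
src = [(1, "ana", "editor"), (2, "bob", "author")]
edited = [(2, "bob"), (1, "anna")]  # reorder and rename in the view
print(people.put(edited, src))
# [(2, 'bob', 'author'), (1, 'anna', 'editor')] - roles follow keys
```

    A purely positional put would instead hand ana's role to bob after the reorder; making such alignment strategies programmable is the gap the matching-lenses framework addresses.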

    A Conditional Random Field for Discriminatively-trained Finite-state String Edit Distance

    The need to measure sequence similarity arises in information extraction, object identity, data mining, biological sequence analysis, and other domains. This paper presents discriminative string-edit CRFs, a finite-state conditional random field model for edit sequences between strings. Conditional random fields have advantages over generative approaches to this problem, such as pair HMMs or the work of Ristad and Yianilos, because as conditionally-trained methods they enable the use of complex, arbitrary actions and features of the input strings. As in generative models, the training data does not have to specify the edit sequences between the given string pairs. Unlike generative models, however, our model is trained on both positive and negative instances of string pairs. We present positive experimental results on several data sets.
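    For orientation, the dynamic program underneath such models is an ordinary weighted edit distance over insert/delete/substitute operations; in the CRF, the per-operation scores become log-potentials computed from features of the strings and trained discriminatively. The fixed weights below are purely our illustration:

```python
def weighted_edit_distance(x: str, y: str,
                           w_ins=1.0, w_del=1.0, w_sub=1.0) -> float:
    """Standard edit-distance DP with per-operation weights. In a
    string-edit CRF these weights would be feature-based, learned
    scores rather than constants."""
    n, m = len(x), len(y)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + w_del
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + w_ins
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0.0 if x[i - 1] == y[j - 1] else w_sub
            d[i][j] = min(d[i - 1][j] + w_del,
                          d[i][j - 1] + w_ins,
                          d[i - 1][j - 1] + sub)
    return d[n][m]

print(weighted_edit_distance("ristad", "yianilos"))
```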

    Database Streaming Compression on Memory-Limited Machines

    Dynamic Huffman compression algorithms operate on data streams with a bounded symbol list; the complete list of symbols must be contained in main memory or secondary storage. A horizontal-format transaction database that is streaming can have a very large item list, and a tree with many nodes taxes both the primary memory of the processing hardware and the processing time needed to maintain the tree dynamically. This research investigated Huffman compression of a transaction-streaming database with a very large symbol list, where each item in the transaction database schema’s item list is a symbol to compress. The constraint of a large symbol list is, in this research, equivalent to the constraint of a memory-limited machine: a large symbol set will result if each item in a large database item list is a symbol to compress in a database stream. In addition, database streams may have a temporal component spanning months or years. Finally, the horizontal format is the format best suited to a streaming transaction database because the transaction IDs are not known beforehand.

    This research prototypes an algorithm that compresses a transaction database stream. The memory-limited dynamic Huffman algorithm has several advantages. Dynamic Huffman algorithms are single-pass algorithms, and in many instances a second pass over the data is not possible, such as with streaming databases. Previous dynamic Huffman algorithms are not memory-limited: their memory use is asymptotically O(n), where n is the number of distinct item IDs, since memory must grow to fit the n items. The improvement of the new memory-limited dynamic Huffman algorithm is that it has an O(k) asymptotic memory requirement, where k is the maximum number of nodes in the Huffman tree, k < n, and k is a user-chosen constant. The new memory-limited dynamic Huffman algorithm compresses horizontally encoded transaction databases that do not contain long runs of 0’s or 1’s.
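    The dissertation's algorithm is not reproduced here; the sketch below (ours) only conveys how a hard cap of k tracked symbols keeps memory at O(k): evict the least-frequent entry when an unseen item arrives, then build a Huffman code over the survivors. A real dynamic (FGK/Vitter-style) coder would update the tree incrementally and emit an escape code for unseen or evicted symbols:

```python
from collections import Counter
import heapq

def bounded_code(stream, k):
    """Track at most k distinct items (our eviction policy, for
    illustration only), then return Huffman code lengths."""
    counts = Counter()
    for item in stream:
        if item not in counts and len(counts) >= k:
            victim, _ = min(counts.items(), key=lambda kv: kv[1])
            del counts[victim]  # O(k) memory regardless of |alphabet|
        counts[item] += 1
    # Standard Huffman construction over the k survivors.
    heap = [(c, [s]) for s, c in counts.items()]
    heapq.heapify(heap)
    depth = {s: 0 for s in counts}
    while len(heap) > 1:
        c1, s1 = heapq.heappop(heap)
        c2, s2 = heapq.heappop(heap)
        for s in s1 + s2:
            depth[s] += 1  # one level deeper in the merged subtree
        heapq.heappush(heap, (c1 + c2, s1 + s2))
    return depth  # code length per surviving item

print(bounded_code("abracadabra", k=3))
```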