Search CORE

Regular Languages meet Prefix Sorting

Author: Alanko Jarno
D'Agostino Giovanna
Policriti Alberto
Prezza Nicola
Publication date: 09/07/2019

Indexing strings via prefix (or suffix) sorting is, arguably, one of the most successful algorithmic techniques developed in the last decades. Can indexing be extended to languages? The main contribution of this paper is to initiate the study of the sub-class of regular languages accepted by an automaton whose states can be prefix-sorted. Starting from the recent notion of Wheeler graph [Gagie et al., TCS 2017], which naturally extends the concept of prefix sorting to labeled graphs, we investigate the properties of Wheeler languages, that is, regular languages admitting an accepting Wheeler finite automaton. Interestingly, we characterize this family as the natural extension of regular languages endowed with the co-lexicographic ordering: when sorted, the strings belonging to a Wheeler language are partitioned into a finite number of co-lexicographic intervals, each formed by elements from a single Myhill-Nerode equivalence class. Moreover: (i) We show that every Wheeler NFA (WNFA) with $n$ states admits an equivalent Wheeler DFA (WDFA) with at most $2n-1-|\Sigma|$ states that can be computed in $O(n^3)$ time. This is in sharp contrast with general NFAs. (ii) We describe a quadratic algorithm to prefix-sort a proper superset of the WDFAs, an $O(n\log n)$-time online algorithm to sort acyclic WDFAs, and an optimal linear-time offline algorithm to sort general WDFAs. By contribution (i), our algorithms can also be used to index any WNFA at the moderate price of doubling the automaton's size. (iii) We provide a minimization theorem that characterizes the smallest WDFA recognizing the same language as any input WDFA. The corresponding constructive algorithm runs in optimal linear time in the acyclic case, and in $O(n\log n)$ time in the general case. (iv) We show how to compute the smallest WDFA equivalent to any acyclic DFA in nearly-optimal time.

Comment: added minimization theorems; uploaded submitted version; new version with new results (W-MH theorem, linear determinization), added author: Giovanna D'Agostin

arXiv.org e-Print Archive

Crossref

Archivio della ricerca- LUISS Libera Università Internazionale degli Studi Sociali Guido Carli di Roma
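
To make the Wheeler-graph notion concrete, here is a minimal Python sketch that checks whether a given total order of automaton states satisfies the Wheeler axioms of Gagie et al.: states without incoming edges come first; an edge with a smaller label leads to a strictly smaller target; and for equal labels, a smaller source implies a target that is smaller or equal. The function name `is_wheeler_order` and the (source, target, label) edge triples are assumptions made for illustration. The check is a brute-force quadratic test over all pairs of edges, not one of the sorting algorithms described in the paper.

```python
from typing import Hashable, Iterable, Sequence, Tuple

Edge = Tuple[Hashable, Hashable, str]  # (source state, target state, label)

def is_wheeler_order(states: Sequence[Hashable], edges: Iterable[Edge]) -> bool:
    """Return True if `states`, listed in candidate order, satisfies the
    Wheeler axioms for the labeled graph given by `edges` (hypothetical helper)."""
    rank = {s: i for i, s in enumerate(states)}
    edges = list(edges)
    has_incoming = {v for _, v, _ in edges}

    # Axiom 0: states with no incoming edge precede every state that has one.
    seen_incoming = False
    for s in states:
        if s in has_incoming:
            seen_incoming = True
        elif seen_incoming:
            return False

    # Axioms 1 and 2, checked on every ordered pair of edges (quadratic, for clarity).
    for (u, v, a) in edges:
        for (u2, v2, a2) in edges:
            # (1) a smaller label must lead to a strictly smaller target.
            if a < a2 and not rank[v] < rank[v2]:
                return False
            # (2) equal labels: a smaller source implies a smaller-or-equal target.
            if a == a2 and rank[u] < rank[u2] and not rank[v] <= rank[v2]:
                return False
    return True
```

For example, with edges (0, 1, 'a'), (1, 1, 'a'), (0, 2, 'b'), (1, 2, 'b'), the order [0, 1, 2] is accepted, while [0, 2, 1] is rejected because the 'a'-target 1 would then come after the 'b'-target 2.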

Building Efficient and Compact Data Structures for Simplicial Complexes

Author: Boissonnat Jean-Daniel
S. Karthik C.
Tavenas Sébastien
Publication venue: Springer Science and Business Media LLC
Publication date: 08/09/2016

The Simplex Tree (ST) is a recently introduced data structure that can represent abstract simplicial complexes of any dimension and allows efficient implementation of a large range of basic operations on simplicial complexes. In this paper, we show how to optimally compress the Simplex Tree while retaining its functionalities. In addition, we propose two new data structures called the Maximal Simplex Tree (MxST) and the Simplex Array List (SAL). We analyze the compressed Simplex Tree, the Maximal Simplex Tree, and the Simplex Array List under various settings.

Comment: An extended abstract appeared in the proceedings of SoCG 201

arXiv.org e-Print Archive

HAL-ENS-LYON

INRIA a CCSD electronic archive server

Hal-Diderot
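
As background for the compression results, the following Python sketch implements the plain, uncompressed Simplex Tree idea: a trie in which each simplex is stored as the path spelled by its vertices in increasing order, so inserting a simplex also inserts all of its faces. The class and method names are hypothetical, and the sketch does not attempt the compressed Simplex Tree, the Maximal Simplex Tree, or the Simplex Array List studied in the paper.

```python
from itertools import combinations
from typing import Dict, Iterable, Tuple

class SimplexTreeNode:
    def __init__(self) -> None:
        self.children: Dict[int, "SimplexTreeNode"] = {}

class SimplexTree:
    """Uncompressed Simplex Tree sketch: every root-to-node path is a simplex,
    written as its vertex list sorted increasingly."""

    def __init__(self) -> None:
        self.root = SimplexTreeNode()

    def _insert_path(self, vertices: Tuple[int, ...]) -> None:
        node = self.root
        for v in vertices:
            node = node.children.setdefault(v, SimplexTreeNode())

    def insert(self, simplex: Iterable[int]) -> None:
        """Insert a simplex together with all of its faces (closure under subsets)."""
        vs = tuple(sorted(set(simplex)))
        for k in range(1, len(vs) + 1):
            for face in combinations(vs, k):  # combinations keep the sorted order
                self._insert_path(face)

    def contains(self, simplex: Iterable[int]) -> bool:
        node = self.root
        for v in sorted(set(simplex)):
            if v not in node.children:
                return False
            node = node.children[v]
        return True

    def num_simplices(self) -> int:
        # Each non-root trie node corresponds to exactly one simplex.
        stack, count = [self.root], 0
        while stack:
            node = stack.pop()
            count += len(node.children)
            stack.extend(node.children.values())
        return count
```

Inserting the triangle {1, 2, 3} stores 7 simplices (3 vertices, 3 edges, 1 triangle); adding the edge {3, 4} brings the total to 9. In this sketch every face occupies its own trie node, so the size of the structure equals the number of simplices.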

A Transition-Based Directed Acyclic Graph Parser for UCCA

Author: Abend Omri
Hershcovich Daniel
Rappoport Ari
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/01/2017

We present the first parser for UCCA, a cross-linguistically applicable framework for semantic representation, which builds on extensive typological work and supports rapid annotation. UCCA poses a challenge for existing parsing techniques, as it exhibits reentrancy (resulting in DAG structures), discontinuous structures and non-terminal nodes corresponding to complex semantic units. To our knowledge, the conjunction of these formal properties is not supported by any existing parser. Our transition-based parser, which uses a novel transition set and features based on bidirectional LSTMs, has value not just for UCCA parsing: its ability to handle more general graph structures can inform the development of parsers for other semantic DAG structures, and for languages that frequently use discontinuous structures.

Comment: 16 pages; accepted as a long paper at ACL201

arXiv.org e-Print Archive

Crossref
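
The abstract describes a transition-based parser that builds DAGs with reentrancy, discontinuity, and non-terminal nodes. The Python sketch below shows, under heavy simplification, how a stack-and-buffer transition system can give one node two parents: `node` creates a non-terminal, `left_edge`/`right_edge` connect the top two stack items, a remote edge adds a second parent, and `swap` returns an item to the buffer so that non-adjacent items can later be connected. The action names, their exact semantics, and the labels are assumptions for illustration; this is not the paper's transition set, and the BiLSTM-based classifier that would choose transitions is omitted (the sequence in `demo` is hard-coded).

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

Edge = Tuple[int, int, str, bool]  # (parent, child, label, is_remote)

@dataclass
class Config:
    """Simplified parser state: tokens are ints 0..n-1; new non-terminal
    nodes receive fresh ids starting at next_id."""
    buffer: List[int]
    next_id: int
    stack: List[int] = field(default_factory=list)
    edges: List[Edge] = field(default_factory=list)

    def shift(self) -> None:
        self.stack.append(self.buffer.pop(0))

    def reduce(self) -> None:
        self.stack.pop()

    def node(self, label: str) -> None:
        # New non-terminal becomes the parent of the stack top and goes on the buffer.
        n = self.next_id
        self.next_id += 1
        self.edges.append((n, self.stack[-1], label, False))
        self.buffer.insert(0, n)

    def right_edge(self, label: str, remote: bool = False) -> None:
        # Parent = second stack item, child = stack top.
        self.edges.append((self.stack[-2], self.stack[-1], label, remote))

    def left_edge(self, label: str, remote: bool = False) -> None:
        # Parent = stack top, child = second stack item.
        self.edges.append((self.stack[-1], self.stack[-2], label, remote))

    def swap(self) -> None:
        # Move the second stack item back to the buffer (enables discontinuity).
        self.buffer.insert(0, self.stack.pop(-2))

def parents(edges: List[Edge], child: int) -> Set[int]:
    return {p for p, c, _, _ in edges if c == child}

def demo() -> List[Edge]:
    # Hard-coded transition sequence for a two-token input (labels are illustrative).
    c = Config(buffer=[0, 1], next_id=2)
    c.shift()                      # stack [0]
    c.node("A")                    # node 2 -> token 0; buffer [2, 1]
    c.shift()                      # stack [0, 2]
    c.shift()                      # stack [0, 2, 1]
    c.node("P")                    # node 3 -> token 1; buffer [3]
    c.reduce()                     # stack [0, 2]
    c.reduce()                     # stack [0]
    c.shift()                      # stack [0, 3]
    c.left_edge("A", remote=True)  # remote edge 3 -> 0: token 0 now has two parents
    return c.edges
```

Running `demo()` yields the edges (2, 0, 'A', False), (3, 1, 'P', False) and (3, 0, 'A', True), so token 0 ends up with two parents, a structure that a tree-only parser could not produce.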

Streaming Algorithms for Submodular Function Maximization

Author: A Badanidiyuru Varadaraja
A Chakrabarti
A Goyal
A Gupta
A Kulik
G Calinescu
GL Nemhauser
J Feigenbaum
J Lee
J Vondrák
M Bateni
M Feldman
ML Fisher
N Bansal
Y Filmus
Publication date: 29/04/2015

We consider the problem of maximizing a nonnegative submodular set function $f:2^{\mathcal{N}} \rightarrow \mathbb{R}^+$ subject to a $p$-matchoid constraint in the single-pass streaming setting. Previous work in this context has considered streaming algorithms for modular functions and monotone submodular functions. The main result is for submodular functions that are non-monotone. We describe deterministic and randomized algorithms that obtain a $\Omega(\frac{1}{p})$-approximation using $O(k \log k)$ space, where $k$ is an upper bound on the cardinality of the desired set. The model assumes value oracle access to $f$ and membership oracles for the matroids defining the $p$-matchoid constraint.

Comment: 29 pages, 7 figures, extended abstract to appear in ICALP 201

arXiv.org e-Print Archive

Crossref
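
To illustrate the single-pass, value-oracle model (not the paper's algorithm, which handles non-monotone $f$ and general $p$-matchoid constraints), the Python sketch below applies the basic marginal-gain thresholding rule to a monotone submodular function under a cardinality constraint $k$, the simplest matroid and hence a 1-matchoid. It stores at most $k$ elements and touches $f$ only through oracle calls. The function name `threshold_stream`, the threshold parameter `tau`, and the toy coverage function are assumptions made for illustration.

```python
from typing import Callable, Dict, FrozenSet, Hashable, Iterable, List, Set

def threshold_stream(stream: Iterable[Hashable],
                     f: Callable[[FrozenSet[Hashable]], float],
                     k: int, tau: float) -> List[Hashable]:
    """One pass over `stream`, storing at most k elements: keep an arriving
    element if there is room and its marginal gain f(S + e) - f(S) >= tau.
    f is accessed only through value-oracle calls."""
    chosen: List[Hashable] = []
    current: FrozenSet[Hashable] = frozenset()
    value = f(current)
    for e in stream:
        if len(chosen) >= k:
            break  # cardinality constraint saturated; ignore the rest of the stream
        candidate = current | {e}
        gain = f(candidate) - value
        if gain >= tau:
            chosen.append(e)
            current = candidate
            value += gain
    return chosen

# Toy monotone submodular function: number of ground items covered by the chosen sets.
SETS: Dict[str, Set[int]] = {"a": {1, 2, 3}, "b": {3, 4}, "c": {5}, "d": {1, 2, 3, 4, 5, 6}}

def coverage(S: FrozenSet[Hashable]) -> float:
    return float(len(set().union(*(SETS[x] for x in S))))

if __name__ == "__main__":
    print(threshold_stream(["a", "b", "c", "d"], coverage, k=2, tau=2))
```

With these toy sets and `k=2, tau=2`, the pass keeps 'a' (gain 3), skips 'b' and 'c' (gain 1 each), and keeps 'd' (gain 3), returning ['a', 'd'].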

Conceptual Information Compression and Efficient Pattern Search

Author: Angelova Galia
Mihov Stoyan
Publication venue: Institute of Mathematics and Informatics Bulgarian Academy of Sciences
Publication date: 01/01/2008

This paper introduces an encoding of knowledge representation statements as regular languages and proposes a two-phase approach to the processing of explicitly declared conceptual information. The idea is presented for simple conceptual graphs, where conceptual pattern search is implemented by the so-called projection operation. Projection calculations are organised into off-line preprocessing and run-time computations. This enables fast run-time treatment of NP-complete problems, provided that the intermediate results of the off-line phase are kept in suitable data structures. Experiments with randomly generated, middle-size knowledge bases support the claim that the suggested approach radically improves run-time conceptual pattern search.

Bulgarian Digital Mathematics Library at IMI-BAS
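
Projection is the costly step that the paper splits into off-line and run-time work. For reference, the naive run-time computation for simple conceptual graphs can be sketched in Python as a backtracking search for a mapping from query concept nodes to knowledge-base nodes that specializes types and preserves relations. The tree-shaped type hierarchy (a `parent` map), the requirement that relation labels match exactly, and all names are simplifying assumptions; the paper's encoding of statements as regular languages is not reproduced here.

```python
from typing import Dict, List, Optional, Tuple

Relation = Tuple[str, Tuple[str, ...]]  # (relation label, argument node ids)

def is_subtype(t: str, s: str, parent: Dict[str, str]) -> bool:
    """True if type t equals s or lies below s in a tree-shaped type hierarchy."""
    while True:
        if t == s:
            return True
        if t not in parent:
            return False
        t = parent[t]

def project(q_types: Dict[str, str], q_rels: List[Relation],
            g_types: Dict[str, str], g_rels: List[Relation],
            parent: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Backtracking search for a projection of query graph q into graph g:
    each query node maps to a node of equal or more specific type, and every
    query relation must appear, with the same label, on the image nodes."""
    q_nodes = list(q_types)
    g_rel_set = set(g_rels)

    def consistent(pi: Dict[str, str]) -> bool:
        # Only relations whose arguments are all mapped so far can be checked.
        for label, args in q_rels:
            if all(a in pi for a in args):
                if (label, tuple(pi[a] for a in args)) not in g_rel_set:
                    return False
        return True

    def extend(i: int, pi: Dict[str, str]) -> Optional[Dict[str, str]]:
        if i == len(q_nodes):
            return dict(pi)
        q = q_nodes[i]
        for g, g_type in g_types.items():
            if is_subtype(g_type, q_types[q], parent):
                pi[q] = g
                if consistent(pi):
                    found = extend(i + 1, pi)
                    if found is not None:
                        return found
                del pi[q]
        return None

    return extend(0, {})

# Hypothetical type hierarchy and knowledge base: "a Cat is on a Mat".
PARENT = {"Cat": "Animal", "Dog": "Animal", "Mat": "Object"}
KB_TYPES = {"c1": "Cat", "m1": "Mat"}
KB_RELS: List[Relation] = [("on", ("c1", "m1"))]
```

For example, the query "an Animal on an Object" projects into this knowledge base via {'x': 'c1', 'y': 'm1'}, while the query "a Dog on an Object" has no projection. The search is exponential in the worst case, which is why moving work off-line matters.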

Interaction Grammars

Author: Bruno Guillaume
Guy Perrier
Thème Sym
Équipe-projet Calligramme
Publication date: 01/01/2008

Interaction Grammar (IG) is a grammatical formalism based on the notion of polarity. Polarities express the resource sensitivity of natural languages by modelling the distinction between saturated and unsaturated syntactic structures. Syntactic composition is represented as a chemical reaction guided by the saturation of polarities. It is expressed in a model-theoretic framework where grammars are constraint systems using the notion of tree description, and parsing appears as a process of building tree description models satisfying criteria of saturation and minimality.

arXiv.org e-Print Archive

CiteSeerX

INRIA a CCSD electronic archive server

HAL Descartes
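
Polarity-based resource sensitivity suggests a cheap necessary condition for saturation: every positive feature must eventually be matched by a negative one. The Python sketch below applies that idea as a counting filter over combinations of lexical descriptions. The integer encoding of polarities (+1, -1, 0), the convention that exactly one positive `s` may remain for the sentence root, and the two-word lexicon are assumptions made for illustration; the sketch does not build tree description models, which is how the formalism itself defines parsing.

```python
from collections import Counter
from itertools import product
from typing import Dict, List, Sequence, Tuple

# A polarized feature: (category, polarity). Simplified encoding (assumption):
# +1 = positive (offers a resource), -1 = negative (needs one), 0 = neutral.
PolFeature = Tuple[str, int]
Description = List[PolFeature]

def polarity_balanced(descriptions: Sequence[Description], root: str = "s") -> bool:
    """Necessary condition for saturation: positives and negatives cancel out,
    except for one positive root category left for the whole sentence (assumed)."""
    balance: Counter = Counter()
    for desc in descriptions:
        for category, pol in desc:
            balance[category] += pol
    balance[root] -= 1  # account for the single expected sentence root
    return all(total == 0 for total in balance.values())

def filter_lexical_choices(lexicon: Dict[str, List[Description]],
                           sentence: List[str]) -> List[Tuple[Description, ...]]:
    """Enumerate one description per word and keep the balanced combinations."""
    choices = [lexicon[word] for word in sentence]
    return [combo for combo in product(*choices) if polarity_balanced(combo)]

# Hypothetical two-word lexicon; "dort" is given two competing descriptions.
LEXICON: Dict[str, List[Description]] = {
    "Jean": [[("np", +1)]],
    "dort": [[("s", +1), ("np", -1)], [("np", +1)]],
}

if __name__ == "__main__":
    for combo in filter_lexical_choices(LEXICON, ["Jean", "dort"]):
        print(combo)
```

With these hypothetical entries, only the combination that pairs "Jean" with the description of "dort" offering a positive s and a negative np survives the filter; the other choice leaves two unmatched positive np features.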