Regular Languages meet Prefix Sorting
Indexing strings via prefix (or suffix) sorting is, arguably, one of the most
successful algorithmic techniques developed in the last decades. Can indexing
be extended to languages? The main contribution of this paper is to initiate
the study of the sub-class of regular languages accepted by an automaton whose
states can be prefix-sorted. Starting from the recent notion of Wheeler graph
[Gagie et al., TCS 2017], which naturally extends the concept of prefix sorting
to labeled graphs, we investigate the properties of Wheeler languages, that is,
regular languages admitting an accepting Wheeler finite automaton.
Interestingly, we characterize this family as the natural extension of regular
languages endowed with the co-lexicographic ordering: when sorted, the strings
belonging to a Wheeler language are partitioned into a finite number of
co-lexicographic intervals, each formed by elements from a single Myhill-Nerode
equivalence class. Moreover: (i) We show that every Wheeler NFA (WNFA) with
states admits an equivalent Wheeler DFA (WDFA) with at most
states that can be computed in time. This is in sharp contrast with
general NFAs. (ii) We describe a quadratic algorithm to prefix-sort a proper
superset of the WDFAs, a -time online algorithm to sort acyclic
WDFAs, and an optimal linear-time offline algorithm to sort general WDFAs. By
contribution (i), our algorithms can also be used to index any WNFA at the
moderate price of doubling the automaton's size. (iii) We provide a
minimization theorem that characterizes the smallest WDFA recognizing the same
language as any input WDFA. The corresponding constructive algorithm runs in
optimal linear time in the acyclic case, and in time in the
general case. (iv) We show how to compute the smallest WDFA equivalent to any
acyclic DFA in nearly-optimal time.
Comment: added minimization theorems; uploaded submitted version; New version
with new results (W-MH theorem, linear determinization), added author:
Giovanna D'Agostin
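As a small illustration of the co-lexicographic ordering mentioned in the abstract above, the following sketch sorts a handful of strings by comparing them from their last character backwards. It is not taken from the paper; the function name `colex_sort` and the sample strings are assumptions chosen for the example.

```python
def colex_sort(strings):
    """Sort strings co-lexicographically.

    Co-lexicographic order compares strings right to left, which is the
    same as ordinary lexicographic order on the reversed strings.
    (Hypothetical helper, for illustration only.)
    """
    return sorted(strings, key=lambda s: s[::-1])


if __name__ == "__main__":
    # Reversed keys are "aa", "ab", "b", "ba", so the order below follows.
    print(colex_sort(["ab", "ba", "aa", "b"]))  # ['aa', 'ba', 'b', 'ab']
```

Sorting on the reversed string keeps the sketch short; an index built on this order would group strings sharing a suffix into contiguous ranges.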
Building Efficient and Compact Data Structures for Simplicial Complexes
The Simplex Tree (ST) is a recently introduced data structure that can
represent abstract simplicial complexes of any dimension and allows efficient
implementation of a large range of basic operations on simplicial complexes. In
this paper, we show how to optimally compress the Simplex Tree while retaining
its functionalities. In addition, we propose two new data structures called the
Maximal Simplex Tree (MxST) and the Simplex Array List (SAL). We analyze the
compressed Simplex Tree, the Maximal Simplex Tree, and the Simplex Array List
under various settings.
Comment: An extended abstract appeared in the proceedings of SoCG 201
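To make the basic idea of the Simplex Tree concrete, here is a minimal, uncompressed sketch: a trie in which every simplex is stored as the path spelled by its sorted vertex labels. It does not implement the compression, the Maximal Simplex Tree, or the Simplex Array List discussed in the abstract; the class and method names are assumptions made for this example.

```python
from itertools import combinations


class SimplexTree:
    """Toy trie over sorted vertex labels (illustrative, not the paper's code)."""

    def __init__(self):
        # Each node is a dict mapping a vertex label to a child node.
        self.root = {}

    def insert(self, simplex):
        """Insert a simplex together with all of its non-empty faces."""
        vertices = sorted(set(simplex))
        for size in range(1, len(vertices) + 1):
            for face in combinations(vertices, size):
                node = self.root
                for v in face:
                    node = node.setdefault(v, {})

    def contains(self, simplex):
        """Return True if the simplex is stored in the complex."""
        node = self.root
        for v in sorted(set(simplex)):
            if v not in node:
                return False
            node = node[v]
        return True

    def num_simplices(self):
        """Count stored simplices: one per trie node below the root."""
        total, stack = 0, [self.root]
        while stack:
            node = stack.pop()
            total += len(node)
            stack.extend(node.values())
        return total


if __name__ == "__main__":
    st = SimplexTree()
    st.insert((1, 2, 3))        # a triangle and all its faces
    print(st.num_simplices())   # 7: three vertices, three edges, one triangle
    print(st.contains((3, 1)))  # True
    print(st.contains((1, 4)))  # False
```

Storing every face explicitly is what makes this plain trie large for high-dimensional complexes, which is the kind of redundancy that more compact representations aim to reduce.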
A Transition-Based Directed Acyclic Graph Parser for UCCA
We present the first parser for UCCA, a cross-linguistically applicable
framework for semantic representation, which builds on extensive typological
work and supports rapid annotation. UCCA poses a challenge for existing parsing
techniques, as it exhibits reentrancy (resulting in DAG structures),
discontinuous structures and non-terminal nodes corresponding to complex
semantic units. To our knowledge, the conjunction of these formal properties is
not supported by any existing parser. Our transition-based parser, which uses a
novel transition set and features based on bidirectional LSTMs, has value not
just for UCCA parsing: its ability to handle more general graph structures can
inform the development of parsers for other semantic DAG structures, and in
languages that frequently use discontinuous structures.
Comment: 16 pages; Accepted as long paper at ACL201
Streaming Algorithms for Submodular Function Maximization
We consider the problem of maximizing a nonnegative submodular set function
subject to a -matchoid
constraint in the single-pass streaming setting. Previous work in this context
has considered streaming algorithms for modular functions and monotone
submodular functions. The main result is for submodular functions that are
non-monotone. We describe deterministic and randomized algorithms that obtain
a -approximation using -space, where is
an upper bound on the cardinality of the desired set. The model assumes value
oracle access to and membership oracles for the matroids defining the
-matchoid constraint.
Comment: 29 pages, 7 figures, extended abstract to appear in ICALP 201
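The single-pass, value-oracle model described above can be illustrated with a deliberately simple sketch. This is not the algorithm from the paper: it uses a plain cardinality constraint instead of a matchoid constraint, a monotone coverage function as the oracle, and a fixed acceptance threshold. The names `coverage_oracle`, `threshold_stream`, `k`, and `tau` are assumptions for the example.

```python
def coverage_oracle(sets):
    """Build a value oracle f(S) = size of the union of the chosen sets.

    Coverage functions are a standard example of submodular functions.
    """
    def value(chosen):
        covered = set()
        for item in chosen:
            covered |= sets[item]
        return len(covered)
    return value


def threshold_stream(stream, value, k, tau):
    """Single pass: keep an item if its marginal gain is at least tau.

    Memory holds at most k items; each arriving item is inspected once
    and then either kept or discarded for good.
    """
    solution = []
    for item in stream:
        if len(solution) >= k:
            break
        gain = value(solution + [item]) - value(solution)
        if gain >= tau:
            solution.append(item)
    return solution


if __name__ == "__main__":
    sets = {0: {1, 2, 3}, 1: {3}, 2: {4, 5}, 3: {1, 2}}
    f = coverage_oracle(sets)
    picked = threshold_stream([0, 1, 2, 3], f, k=2, tau=2)
    print(picked, f(picked))  # [0, 2] 5
```

The sketch only shows how oracle queries and bounded memory interact in one pass; handling non-monotone objectives and richer constraints requires more careful rules than a fixed threshold.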
Conceptual Information Compression and Efficient Pattern Search
This paper introduces an encoding of knowledge representation statements
as regular languages and proposes a two-phase approach to processing
explicitly declared conceptual information. The idea is presented for simple
conceptual graphs, where conceptual pattern search is implemented by the
so-called projection operation. Projection calculations are organised into
off-line preprocessing and run-time computations. This enables fast run-time
treatment of NP-complete problems, provided that the intermediate results of
the off-line phase are kept in suitable data structures. Experiments with
randomly generated, medium-sized knowledge bases support the claim that the
suggested approach radically improves run-time conceptual pattern search.
Interaction Grammars
Interaction Grammar (IG) is a grammatical formalism based on the notion of
polarity. Polarities express the resource sensitivity of natural languages by
modelling the distinction between saturated and unsaturated syntactic
structures. Syntactic composition is represented as a chemical reaction guided
by the saturation of polarities. It is expressed in a model-theoretic framework
where grammars are constraint systems using the notion of tree description, and
parsing appears as a process of building tree description models satisfying
criteria of saturation and minimality.