705 research outputs found

Computing Covers under Substring Consistent Equivalence Relations

Author: A Amir
A Apostolico
BS Baker
C Iliopoulos
CS Iliopoulos
D Breslauer
D Moore
DE Knuth
G Gourdel
GS Brodal
J Kim
M Christou
M Kubica
T Ehlers
Y Li
Y Matsuoka
Publication venue
Publication date: 30/07/2020
Field of study

Covers are a kind of quasiperiodicity in strings. A string $C$ is a cover of another string $T$ if every position of $T$ is inside some occurrence of $C$ in $T$. The shortest and longest cover arrays of $T$ store the lengths of the shortest and longest covers of each prefix of $T$, respectively. The literature has proposed linear-time algorithms computing longest and shortest cover arrays taking border arrays as input. An equivalence relation $\approx$ over strings is called a substring consistent equivalence relation (SCER) iff $X \approx Y$ implies (1) $|X| = |Y|$ and (2) $X[i:j] \approx Y[i:j]$ for all $1 \le i \le j \le |X|$. In this paper, we generalize the notion of covers for SCERs and prove that existing algorithms to compute the shortest cover array and the longest cover array of a string $T$ under the identity relation will work for any SCER, taking the accordingly generalized border arrays as input.

Comment: 16 pages
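The definitions in this abstract can be made concrete for the identity relation (plain string equality). The sketch below is an illustration only and is not taken from the paper: `border_array`, `is_cover`, and `shortest_cover_array` are hypothetical names, and the cover-array routine follows the well-known border-array-based online approach from the literature, which may differ in detail from the algorithms the paper analyzes.

```python
def border_array(t):
    """b[i] = length of the longest proper border of t[:i + 1]."""
    b = [0] * len(t)
    k = 0
    for i in range(1, len(t)):
        while k > 0 and t[i] != t[k]:
            k = b[k - 1]
        if t[i] == t[k]:
            k += 1
        b[i] = k
    return b


def is_cover(c, t):
    """Definition check: every position of t lies inside an occurrence of c in t."""
    if not c or len(c) > len(t):
        return False
    covered_up_to = 0  # t[:covered_up_to] is covered by occurrences seen so far
    for start in range(len(t) - len(c) + 1):
        if start > covered_up_to:
            return False  # position covered_up_to can no longer be covered
        if t.startswith(c, start):
            covered_up_to = start + len(c)
    return covered_up_to == len(t)


def shortest_cover_array(t):
    """Entry i - 1 holds the length of the shortest cover of the prefix t[:i]."""
    n = len(t)
    b = border_array(t)
    sc = [0] * (n + 1)     # sc[i]: shortest cover length of t[:i]
    reach = [0] * (n + 1)  # reach[l]: longest prefix known to be covered by t[:l]
    for i in range(1, n + 1):
        sc[i] = i          # a string always covers itself
        reach[i] = i
        k = b[i - 1]       # longest border of t[:i]
        if k > 0:
            l = sc[k]      # only candidate shorter than i
            if reach[l] >= i - l:
                sc[i] = l
                reach[l] = i
    return sc[1:]


print(shortest_cover_array("ababa"))  # [1, 2, 3, 2, 3]
```

Under an SCER other than identity, the character comparisons inside `border_array` would have to be replaced by a test appropriate to that relation; that generalization is the subject of the paper and is not attempted here.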

arXiv.org e-Print Archive

Crossref

Computing NP-Hard Repetitiveness Measures via MAX-SAT

Author: Bannai Hideo
Goto Keisuke
Ishihata Masakazu
Kanda Shunsuke
Nishimoto Takaaki
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 30th Annual European Symposium on Algorithms (ESA 2022)
Publication date: 01/01/2022
Field of study

Repetitiveness measures reveal profound characteristics of datasets, and give rise to compressed data structures and algorithms working in compressed space. Alas, the computation of some of these measures is NP-hard, and straightforward computation is infeasible for datasets of even small sizes. Three such measures are the smallest size of a string attractor, the smallest size of a bidirectional macro scheme, and the smallest size of a straight-line program. While a vast variety of implementations for heuristically computing approximations exist, exact computation of these measures has received little to no attention. In this paper, we present MAX-SAT formulations that provide the first non-trivial implementations for exact computation of smallest string attractors, smallest bidirectional macro schemes, and smallest straight-line programs. Computational experiments show that our implementations work for texts of length up to a few hundred for straight-line programs and bidirectional macro schemes, and texts even over a million for string attractors.
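To illustrate why exact computation is expensive, here is a brute-force sketch for the smallest string attractor. It assumes the usual definition from the string-attractor literature (a set of text positions such that every distinct substring has at least one occurrence containing a chosen position); the abstract itself does not spell this out. The function names are hypothetical, and this exhaustive search is not the paper's MAX-SAT formulation.

```python
from itertools import combinations


def is_attractor(text, positions):
    """True if every distinct substring has an occurrence containing a chosen position."""
    n = len(text)
    chosen = set(positions)
    for length in range(1, n + 1):
        all_subs = set()
        hit_subs = set()
        for start in range(n - length + 1):
            sub = text[start:start + length]
            all_subs.add(sub)
            # Does this occurrence text[start:start + length] contain a chosen position?
            if any(start <= p < start + length for p in chosen):
                hit_subs.add(sub)
        if hit_subs != all_subs:
            return False
    return True


def smallest_attractor(text):
    """Try all position subsets by increasing size (exponential time, tiny inputs only)."""
    n = len(text)
    for size in range(1, n + 1):
        for candidate in combinations(range(n), size):
            if is_attractor(text, candidate):
                return list(candidate)
    return []


print(smallest_attractor("abab"))  # [0, 1]
```

The number of candidate subsets grows exponentially with the text length, which is the kind of blow-up that motivates handing the problem to a solver instead.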

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

A Formal Framework for Linguistic Annotation

Author: Bird Steven
Liberman Mark
Publication venue
Publication date: 01/01/1999
Field of study

'Linguistic annotation' covers any descriptive or analytic notations applied to raw language data. The basic data may be in the form of time functions -- audio, video and/or physiological recordings -- or it may be textual. The added notations may include transcriptions of all sorts (from phonetic features to discourse structures), part-of-speech and sense tagging, syntactic analysis, 'named entity' identification, co-reference annotation, and so on. While there are several ongoing efforts to provide formats and tools for such annotations and to publish annotated linguistic databases, the lack of widely accepted standards is becoming a critical problem. Proposed standards, to the extent they exist, have focussed on file formats. This paper focuses instead on the logical structure of linguistic annotations. We survey a wide variety of existing annotation formats and demonstrate a common conceptual core, the annotation graph. This provides a formal framework for constructing, maintaining and searching linguistic annotations, while remaining consistent with many alternative data structures and file formats.

Comment: 49 pages

arXiv.org e-Print Archive

CiteSeerX

ScholarlyCommons@Penn

Duel-and-sweep algorithm for string pattern matching problems under substring consistent equivalence relations

Author: Jargalsaikhan Davaajav
Publication venue
Publication date: 25/03/2022
Field of study

Tohoku University; 篠原歩 (Ayumi Shinohara); 課

Tohoku University Repository (TOUR) / 東北大学機関リポジトリ

Combinatorial generation via permutation languages. II. Lattice congruences

Author: Hoang Hung Phuc
Mütze Torsten
Publication venue
Publication date: 24/06/2020
Field of study

This paper deals with lattice congruences of the weak order on the symmetric group, and initiates the investigation of the cover graphs of the corresponding lattice quotients. These graphs also arise as the skeleta of the so-called quotientopes, a family of polytopes recently introduced by Pilaud and Santos [Bull. Lond. Math. Soc., 51:406-420, 2019], which generalize permutahedra, associahedra, hypercubes and several other polytopes. We prove that all of these graphs have a Hamilton path, which can be computed by a simple greedy algorithm. This is an application of our framework for exhaustively generating various classes of combinatorial objects by encoding them as permutations. We also characterize which of these graphs are vertex-transitive or regular via their arc diagrams, give corresponding precise and asymptotic counting results, and we determine their minimum and maximum degrees. Moreover, we investigate the relation between lattice congruences of the weak order and pattern-avoiding permutations.

arXiv.org e-Print Archive

New Applications of the Nearest-Neighbor Chain Algorithm

Author: Mamano Grande Nil
Publication venue: eScholarship, University of California
Publication date: 01/01/2019
Field of study

The nearest-neighbor chain algorithm was proposed in the eighties as a way to speed up certain hierarchical clustering algorithms. In the first part of the dissertation, we show that its application is not limited to clustering. We apply it to a variety of geometric and combinatorial problems. In each case, we show that the nearest-neighbor chain algorithm finds the same solution as a preexisting greedy algorithm, but often with an improved runtime. We obtain speedups over greedy algorithms for Euclidean TSP, Steiner TSP in planar graphs, straight skeletons, a geometric coverage problem, and three stable matching models. In the second part, we study the stable-matching Voronoi diagram, a type of plane partition which combines properties of stable matchings and Voronoi diagrams. We propose political redistricting as an application. We also show that it is impossible to compute this diagram in an algebraic model of computation, and give three algorithmic approaches to overcome this obstacle. One of them is based on the nearest-neighbor chain algorithm, linking the two parts together.

eScholarship - University of California