705 research outputs found

    Computing Covers under Substring Consistent Equivalence Relations

    Full text link
    Covers are a kind of quasiperiodicity in strings. A string CC is a cover of another string TT if any position of TT is inside some occurrence of CC in TT. The shortest and longest cover arrays of TT have the lengths of the shortest and longest covers of each prefix of TT, respectively. The literature has proposed linear-time algorithms computing longest and shortest cover arrays taking border arrays as input. An equivalence relation \approx over strings is called a substring consistent equivalence relation (SCER) iff XYX \approx Y implies (1) X=Y|X| = |Y| and (2) X[i:j]Y[i:j]X[i:j] \approx Y[i:j] for all 1ijX1 \le i \le j \le |X|. In this paper, we generalize the notion of covers for SCERs and prove that existing algorithms to compute the shortest cover array and the longest cover array of a string TT under the identity relation will work for any SCERs taking the accordingly generalized border arrays.Comment: 16 page

    Computing NP-Hard Repetitiveness Measures via MAX-SAT

    Get PDF
    Repetitiveness measures reveal profound characteristics of datasets, and give rise to compressed data structures and algorithms working in compressed space. Alas, the computation of some of these measures is NP-hard, and straight-forward computation is infeasible for datasets of even small sizes. Three such measures are the smallest size of a string attractor, the smallest size of a bidirectional macro scheme, and the smallest size of a straight-line program. While a vast variety of implementations for heuristically computing approximations exist, exact computation of these measures has received little to no attention. In this paper, we present MAX-SAT formulations that provide the first non-trivial implementations for exact computation of smallest string attractors, smallest bidirectional macro schemes, and smallest straight-line programs. Computational experiments show that our implementations work for texts of length up to a few hundred for straight-line programs and bidirectional macro schemes, and texts even over a million for string attractors

    A Formal Framework for Linguistic Annotation

    Get PDF
    `Linguistic annotation' covers any descriptive or analytic notations applied to raw language data. The basic data may be in the form of time functions -- audio, video and/or physiological recordings -- or it may be textual. The added notations may include transcriptions of all sorts (from phonetic features to discourse structures), part-of-speech and sense tagging, syntactic analysis, `named entity' identification, co-reference annotation, and so on. While there are several ongoing efforts to provide formats and tools for such annotations and to publish annotated linguistic databases, the lack of widely accepted standards is becoming a critical problem. Proposed standards, to the extent they exist, have focussed on file formats. This paper focuses instead on the logical structure of linguistic annotations. We survey a wide variety of existing annotation formats and demonstrate a common conceptual core, the annotation graph. This provides a formal framework for constructing, maintaining and searching linguistic annotations, while remaining consistent with many alternative data structures and file formats.Comment: 49 page

    Combinatorial generation via permutation languages. II. Lattice congruences

    Full text link
    This paper deals with lattice congruences of the weak order on the symmetric group, and initiates the investigation of the cover graphs of the corresponding lattice quotients. These graphs also arise as the skeleta of the so-called quotientopes, a family of polytopes recently introduced by Pilaud and Santos [Bull. Lond. Math. Soc., 51:406-420, 2019], which generalize permutahedra, associahedra, hypercubes and several other polytopes. We prove that all of these graphs have a Hamilton path, which can be computed by a simple greedy algorithm. This is an application of our framework for exhaustively generating various classes of combinatorial objects by encoding them as permutations. We also characterize which of these graphs are vertex-transitive or regular via their arc diagrams, give corresponding precise and asymptotic counting results, and we determine their minimum and maximum degrees. Moreover, we investigate the relation between lattice congruences of the weak order and pattern-avoiding permutations
    corecore