Computing Covers Using Prefix Tables
An \emph{indeterminate string} x = x[1..n] on an alphabet Σ is a
sequence of nonempty subsets of Σ; x is said to be \emph{regular} if
every subset is of size one. A proper substring u of regular x is said to
be a \emph{cover} of x iff for every i ∈ 1..n, an occurrence of u in x
includes x[i]. The \emph{cover array} γ = γ[1..n] of x is
an integer array such that γ[i] is the length of the longest cover of x[1..i].
Fifteen years ago a complex, though nevertheless linear-time, algorithm was
proposed to compute the cover array of regular x based on prior computation
of the border array of x. In this paper we first describe a linear-time
algorithm to compute the cover array of regular string x based on the prefix
table of x. We then extend this result to indeterminate strings. Comment: 14 pages, 1 figure
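The prefix table central to the abstract above is also known as the Z-array. The paper's cover-array algorithm itself is not reproduced here, but the underlying table can be computed in linear time; a minimal Python sketch (standard Z-algorithm, 0-indexed, not the authors' code):

```python
def prefix_table(x):
    """pi[i] = length of the longest substring of x starting at
    position i that matches a prefix of x (Z-array; pi[0] = len(x))."""
    n = len(x)
    pi = [0] * n
    if n == 0:
        return pi
    pi[0] = n
    l = r = 0  # rightmost window [l, r) already matched against a prefix
    for i in range(1, n):
        if i < r:
            # reuse information from the window instead of rescanning
            pi[i] = min(r - i, pi[i - l])
        while i + pi[i] < n and x[pi[i]] == x[i + pi[i]]:
            pi[i] += 1
        if i + pi[i] > r:
            l, r = i, i + pi[i]
    return pi
```

Each character is compared successfully at most once beyond the window, giving the linear bound.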
String Covering: A Survey
The study of strings is an important combinatorial field that precedes the
digital computer. Strings can be very long, trillions of letters, so it is
important to find compact representations. Here we first survey various forms
of one potential compaction methodology, the cover of a given string x,
initially proposed in a simple form in 1990, but increasingly of interest as
more sophisticated variants have been discovered. We then consider covering by
a seed; that is, a cover of a superstring of x. We conclude with many proposals
for research directions that could make significant contributions to string
processing in the future.
Enhanced covers of regular & indeterminate strings using prefix tables
A \itbf{cover} of a string x=x[1..n] is a proper substring u of x such that x can be constructed from possibly overlapping instances of u. A recent paper \cite{FIKPPST13} relaxes this definition --- an \itbf{enhanced cover} u of x is a border of x (that is, a proper prefix that is also a suffix) that covers a {\it maximum} number of positions in x (not necessarily all) --- and proposes efficient algorithms for the computation of enhanced covers. These algorithms depend on the prior computation of the \itbf{border array} β[1..n], where β[i] is the length of the longest border of x[1..i], 1 ≤ i ≤ n. In this paper, we first show how to compute enhanced covers using instead the \itbf{prefix table}: an array π[1..n] such that π[i] is the length of the longest substring of x beginning at position i that matches a prefix of x. Unlike the border array, the prefix table is robust: its properties hold also for \itbf{indeterminate strings} --- that is, strings defined on {\it subsets} of the alphabet Σ rather than individual elements of Σ. Thus, our algorithms, in addition to being faster in practice and more space-efficient than those of \cite{FIKPPST13}, allow us to easily extend the computation of enhanced covers to indeterminate strings. Both for regular and indeterminate strings, our algorithms execute in expected linear time. Along the way we establish an important theoretical result: that the expected maximum length of any border of any prefix of a regular string x is approximately 1.64 for binary alphabets, less for larger ones.
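The border array β that the cited algorithms precompute is the classical KMP failure function. For regular strings it can be sketched in a few lines of Python (a textbook construction, not the paper's prefix-table method):

```python
def border_array(x):
    """beta[i] = length of the longest border (proper prefix that is
    also a suffix) of x[0..i] -- the KMP failure function, 0-indexed."""
    n = len(x)
    beta = [0] * n
    b = 0  # length of the current border being extended
    for i in range(1, n):
        # fall back to shorter borders until x[i] extends one (or none)
        while b > 0 and x[i] != x[b]:
            b = beta[b - 1]
        if x[i] == x[b]:
            b += 1
        beta[i] = b
    return beta
```

Note that this construction relies on the transitivity of matching, which fails for indeterminate strings; this is exactly why the paper switches to the prefix table.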
Quasi-Periodicity in Streams
In this work, we show two streaming algorithms for computing the length of the shortest cover of a string of length n. We start by showing a two-pass algorithm that uses O(log^2 n) space and then show a one-pass streaming algorithm that uses O(sqrt{n log n}) space. Both algorithms run in near-linear time. The algorithms are randomized and compute the answer incorrectly with probability inverse-polynomial in n. We also show that there is no sublinear-space streaming algorithm for computing the length of the shortest seed of a string.
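As a non-streaming point of reference for the quantity being computed above: every cover of x must be a border of x, and a border u covers x iff consecutive occurrences of u are never more than |u| apart. A brute-force O(n^2) Python sketch of the offline problem (a baseline only, not the paper's streaming algorithms):

```python
def shortest_cover(x):
    """Shortest cover of x by brute force.  A cover must be a border
    of x, so test borders in increasing length order."""
    n = len(x)
    # lengths l such that x[:l] is a border (proper prefix == suffix)
    borders = [l for l in range(1, n) if x[:l] == x[n - l:]]
    for l in borders:
        u = x[:l]
        # all occurrence start positions of u in x (overlaps allowed);
        # positions 0 and n-l are always present since u is a border
        occ = [i for i in range(n - l + 1) if x[i:i + l] == u]
        # u covers x iff no gap between consecutive occurrences exceeds l
        if all(occ[j + 1] - occ[j] <= l for j in range(len(occ) - 1)):
            return u
    return x  # a string with no proper cover is its own shortest cover
```

The streaming algorithms in the paper achieve the same answer (with high probability) without storing the string.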
Automatic Normalization of Temporal Expressions
Dates, periods and timespans are described in archaeological datasets using a number of different textual patterns for which myriad variations exist, rendering direct automated comparison difficult. The issue can occur even within records from the same dataset and is further compounded when attempting to integrate multilingual data, particularly where dates may be expressed in words rather than numbers. The same problem can be found in temporal metadata, whether manually entered or generated via Natural Language Processing (NLP) techniques from reports and grey literature. Resolving and normalizing dates and periods to internationally agreed standard formats enables efficient data integration, interchange, search, comparison and visualization. This paper reports on the design and implementation of a tool to normalize temporal expressions to a numerical time axis and reflects on key issues. Textual patterns for seven categories of temporal expression have been normalized: Ordinal named or numbered centuries; Year spans; Single year (with tolerance); Decades; Century spans; Single year with prefix; Named periods. The following languages are currently supported: Dutch, English, French, German, Italian, Norwegian, Spanish, Swedish, Welsh. Methods are described together with an (open source) normalization tool developed in Python and four applications of the method are discussed, together with limitations and future work. Results are presented from diverse data sets and languages. The input is a temporal text string and a language code (ISO639-1). The output is a tab delimited text file with start/end years (in ISO 8601 format), relative to Common Era (CE). The normalized outputs are provided as additional attributes along with the original text expression for consuming software to employ in end-user applications.
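To illustrate the kind of mapping the tool performs, here is a toy Python sketch for just one of the seven categories (numbered centuries, English only). The function name and the simple negative-year convention for BC are illustrative assumptions; the actual tool covers nine languages and emits ISO 8601 output:

```python
import re

def normalize_century(text):
    """Toy normalizer for expressions like '3rd century BC' ->
    (-300, -201) or '19th century' -> (1801, 1900).
    Uses a plain negative-year convention for BC, not ISO 8601
    astronomical year numbering; English patterns only."""
    m = re.fullmatch(r"(\d+)(?:st|nd|rd|th) century( BC| AD)?",
                     text.strip(), re.IGNORECASE)
    if not m:
        return None  # pattern not recognised by this toy category
    n = int(m.group(1))
    bc = m.group(2) and m.group(2).strip().upper() == "BC"
    if bc:
        return (-(n * 100), -((n - 1) * 100 + 1))
    return ((n - 1) * 100 + 1, n * 100)
```

A production version, as the abstract describes, would dispatch on an ISO 639-1 language code and handle word-form ordinals, spans, decades and named periods as well.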
28th Annual Symposium on Combinatorial Pattern Matching : CPM 2017, July 4-6, 2017, Warsaw, Poland
Peer reviewed
Tracing the Compositional Process. Sound art that rewrites its own past: formation, praxis and a computer framework
The domain of this thesis is electroacoustic computer-based music and sound art. It investigates
a facet of composition which is often neglected or ill-defined: the process of composing itself
and its embedding in time. Previous research mostly focused on instrumental composition or,
when electronic music was included, the computer was treated as a tool which would eventually
be subtracted from the equation. The aim was either to explain a resultant piece of music by
reconstructing the intention of the composer, or to explain human creativity by building a model
of the mind.
Our aim instead is to understand composition as an irreducible unfolding of material traces which
takes place in its own temporality. This understanding is formalised as a software framework
that traces creation time as a version graph of transactions. The instantiation and manipulation
of any musical structure implemented within this framework is thereby automatically stored
in a database. Not only can it be queried ex post by an external researcher, providing a new
quality for the empirical analysis of the activity of composing, but it is an integral part of
the composition environment. Therefore it can recursively become a source for the ongoing
composition and introduce new ways of aesthetic expression. The framework aims to unify
creation and performance time, fixed and generative composition, human and algorithmic
"writing", a writing that includes indeterminate elements which condense as concurrent vertices
in the version graph.
The second major contribution is a critical epistemological discourse on the question of
observability and the function of observation. Our goal is to explore a new direction of artistic
research which is characterised by a mixed methodology of theoretical writing, technological
development and artistic practice. The form of the thesis is an exercise in becoming process-like
itself, wherein the epistemic thing is generated by translating the gaps between these three levels.
This is my idea of the new aesthetics: that through the operation of a re-entry one may establish
a sort of process "form", yielding works which go beyond a categorical either "sound-in-itself"
or "conceptualism".
Exemplary processes are revealed by deconstructing a series of existing pieces, as well as
through the successful application of the new framework in the creation of new pieces.
- âŠ