409 research outputs found

    Online Pattern Matching for String Edit Distance with Moves

    Full text link
    Edit distance with moves (EDM) is a string-to-string distance measure that includes substring moves in addition to ordinal editing operations to turn one string to the other. Although optimizing EDM is intractable, it has many applications especially in error detections. Edit sensitive parsing (ESP) is an efficient parsing algorithm that guarantees an upper bound of parsing discrepancies between different appearances of the same substrings in a string. ESP can be used for computing an approximate EDM as the L1 distance between characteristic vectors built by node labels in parsing trees. However, ESP is not applicable to a streaming text data where a whole text is unknown in advance. We present an online ESP (OESP) that enables an online pattern matching for EDM. OESP builds a parse tree for a streaming text and computes the L1 distance between characteristic vectors in an online manner. For the space-efficient computation of EDM, OESP directly encodes the parse tree into a succinct representation by leveraging the idea behind recent results of a dynamic succinct tree. We experimentally test OESP on the ability to compute EDM in an online manner on benchmark datasets, and we show OESP's efficiency.Comment: This paper has been accepted to the 21st edition of the International Symposium on String Processing and Information Retrieval (SPIRE2014

    Modelling of Crack Growth Processes by Introduction of a Characteristic Length

    Get PDF

    Online Self-Indexed Grammar Compression

    Full text link
    Although several grammar-based self-indexes have been proposed thus far, their applicability is limited to offline settings where whole input texts are prepared, thus requiring to rebuild index structures for given additional inputs, which is often the case in the big data era. In this paper, we present the first online self-indexed grammar compression named OESP-index that can gradually build the index structure by reading input characters one-by-one. Such a property is another advantage which enables saving a working space for construction, because we do not need to store input texts in memory. We experimentally test OESP-index on the ability to build index structures and search query texts, and we show OESP-index's efficiency, especially space-efficiency for building index structures.Comment: To appear in the Proceedings of the 22nd edition of the International Symposium on String Processing and Information Retrieval (SPIRE2015

    Identification of Dynamical Properties from Correlation Function Estimates

    Get PDF

    Danskernes forventninger til selvkørende biler.

    Get PDF
    Vejdirektoratet har sammen med Wilke undersøgt danskernes forventninger til selvkørende biler med vægt på accept, drivere, bekymringer og brugsforventninger. I overensstemmelse med andre undersøgelser forventer danskerne at køre mere i bil i en situation med helt selvkørende biler. Forventningen dækker både over øget kørsel hverdage og weekends/ferie, længere transportafstande, samt over at forskellige grupper der i dag enten ikke kan køre i bil, eller kan lide at køre bil, forventer at kunne køre mere. De fordele som befolkningen forventer at selvkørende biler giver er især øget sikkerhed, aflastningen fra køre-opgaver og muligheden for at slappe af undervejs. Bekymringerne knytter sig til at miste færdigheder og tabe kontrol, samt spørgsmål om ansvar og muligheder for misbrug/hacking.I forhold til accept og hvem der forventninger at bruge selvkørende biler når det bliver muligt, afspejler undersøgelsens resultater både en sammenhæng med behovet for biltransport, oplevelsen af trængsel; en generel betydning af alder og teknologiorientering; og en mulig betydning af befolkningens erfaringer med fører-støtte systemer i biler (delvis automatisering)

    Linear-Time Computation of Cyclic Roots and Cyclic Covers of a String

    Get PDF
    Cyclic versions of covers and roots of a string are considered in this paper. A prefix V of a string S is a cyclic root of S if S is a concatenation of cyclic rotations of V. A prefix V of S is a cyclic cover of S if the occurrences of the cyclic rotations of V cover all positions of S. We present ?(n)-time algorithms computing all cyclic roots (using number-theoretic tools) and all cyclic covers (using tools related to seeds) of a length-n string over an integer alphabet. Our results improve upon ?(n log log n) and ?(n log n) time complexities of recent algorithms of Grossi et al. (WALCOM 2023) for the respective problems and provide novel approaches to the problems. As a by-product, we obtain an optimal data structure for Internal Circular Pattern Matching queries that generalize Internal Pattern Matching and Cyclic Equivalence queries of Kociumaka et al. (SODA 2015)

    Composite repetition-aware data structures

    Get PDF
    In highly repetitive strings, like collections of genomes from the same species, distinct measures of repetition all grow sublinearly in the length of the text, and indexes targeted to such strings typically depend only on one of these measures. We describe two data structures whose size depends on multiple measures of repetition at once, and that provide competitive tradeoffs between the time for counting and reporting all the exact occurrences of a pattern, and the space taken by the structure. The key component of our constructions is the run-length encoded BWT (RLBWT), which takes space proportional to the number of BWT runs: rather than augmenting RLBWT with suffix array samples, we combine it with data structures from LZ77 indexes, which take space proportional to the number of LZ77 factors, and with the compact directed acyclic word graph (CDAWG), which takes space proportional to the number of extensions of maximal repeats. The combination of CDAWG and RLBWT enables also a new representation of the suffix tree, whose size depends again on the number of extensions of maximal repeats, and that is powerful enough to support matching statistics and constant-space traversal.Comment: (the name of the third co-author was inadvertently omitted from previous version

    Compressed Subsequence Matching and Packed Tree Coloring

    Get PDF
    We present a new algorithm for subsequence matching in grammar compressed strings. Given a grammar of size nn compressing a string of size NN and a pattern string of size mm over an alphabet of size σ\sigma, our algorithm uses O(n+nσw)O(n+\frac{n\sigma}{w}) space and O(n+nσw+mlogNlogwocc)O(n+\frac{n\sigma}{w}+m\log N\log w\cdot occ) or O(n+nσwlogw+mlogNocc)O(n+\frac{n\sigma}{w}\log w+m\log N\cdot occ) time. Here ww is the word size and occocc is the number of occurrences of the pattern. Our algorithm uses less space than previous algorithms and is also faster for occ=o(nlogN)occ=o(\frac{n}{\log N}) occurrences. The algorithm uses a new data structure that allows us to efficiently find the next occurrence of a given character after a given position in a compressed string. This data structure in turn is based on a new data structure for the tree color problem, where the node colors are packed in bit strings.Comment: To appear at CPM '1
    corecore