
    Succinct Dictionary Matching With No Slowdown

    The problem of dictionary matching is a classical problem in string matching: given a set S of d strings of total length n characters over a (not necessarily constant) alphabet of size sigma, build a data structure so that we can match in any text T all occurrences of strings belonging to S. The classical solution for this problem is the Aho-Corasick automaton, which finds all occ occurrences in a text T in time O(|T| + occ) using a data structure that occupies O(m log m) bits of space, where m <= n + 1 is the number of states in the automaton. In this paper we show that the Aho-Corasick automaton can be represented in just m(log sigma + O(1)) + O(d log(n/d)) bits of space while still maintaining the ability to answer queries in O(|T| + occ) time. To the best of our knowledge, the currently fastest succinct data structure for the dictionary matching problem uses space O(n log sigma) while answering queries in O(|T| log log n + occ) time. In this paper we also show how the space occupancy can be reduced to m(H0 + O(1)) + O(d log(n/d)), where H0 is the empirical entropy of the characters appearing in the trie representation of the set S, provided that sigma < m^epsilon for any constant 0 < epsilon < 1. The query time remains unchanged. Comment: Corrected typos and other minor errors

    A Computational Interpretation of Context-Free Expressions

    We phrase parsing with context-free expressions as a type inhabitation problem where values are parse trees and types are context-free expressions. We first show how containment among context-free and regular expressions can be reduced to a reachability problem by using a canonical representation of states. The proofs-as-programs principle yields a computational interpretation of the reachability problem in terms of a coercion that transforms the parse tree for a context-free expression into a parse tree for a regular expression. It also yields a partial coercion from regular parse trees to context-free ones. The partial coercion from the trivial language of all words to a context-free expression corresponds to a predictive parser for the expression.

    Correction: Relative abundance of and composition within fungal orders differ between cheatgrass (Bromus tectorum) and sagebrush (Artemisia tridentata)-associated soils

    Nonnative Bromus tectorum (cheatgrass) is decimating sagebrush steppe, one of the largest ecosystems in the Western United States, and is causing regional-scale shifts in the predominant plant-fungal interactions. Sagebrush, a native perennial, hosts arbuscular mycorrhizal fungi (AMF), whereas cheatgrass, a winter annual, is a relatively poor host of AMF. This shift is likely intertwined with decreased carbon (C) sequestration in cheatgrass-invaded soils and alterations in overall soil fungal community composition and structure, but the latter remain unresolved. We examined soil fungal communities using high-throughput amplicon sequencing (ribosomal large subunit gene) in the 0-4 cm and 4-8 cm depth intervals of six cores from cheatgrass-dominated and six cores from sagebrush-dominated soils. Sagebrush core surfaces (0-4 cm) contained higher nitrogen and total C than cheatgrass core surfaces; these differences mirrored the presence of glomalin-related soil proteins (GRSP), which have been associated with AMF activity and increased C sequestration. Fungal richness was not significantly affected by vegetation type, depth or an interaction of the two factors. However, the relative abundance of seven taxonomic orders was significantly affected by vegetation type or the interaction between vegetation type and depth. Teloschistales, Spizellomycetales, Pezizales and Cantharellales were more abundant in sagebrush libraries and contain mycorrhizal, lichenized and basal lineages of fungi. Only two orders (Coniochaetales and Sordariales), which contain numerous economically important pathogens and opportunistic saprotrophs, were more abundant in cheatgrass libraries. Pleosporales, Agaricales, Helotiales and Hypocreales were most abundant across all libraries, but the number of genera detected within these orders was as much as 29 times lower in cheatgrass relative to sagebrush libraries. These compositional differences between fungal communities associated with cheatgrass- and sagebrush-dominated soils warrant future research to examine soil fungal community composition across more sites and time points as well as in association with native grass species that also occupy cheatgrass-invaded ecosystems.

    Linear Parsing Expression Grammars

    PEGs were formalized by Ford in 2004, and have several pragmatic operators (such as ordered choice and unlimited lookahead) for better expressing modern programming language syntax. Since these operators are not explicitly defined in classic formal language theory, it is significant and still challenging to argue PEGs' expressiveness in the context of formal language theory. Since PEGs are relatively new, there are several unsolved problems. One of the problems is revealing a subclass of PEGs that is equivalent to DFAs. This allows application of some techniques from the theory of regular grammars to PEGs. In this paper, we define Linear PEGs (LPEGs), a subclass of PEGs that is equivalent to DFAs. Surprisingly, LPEGs are formalized by only excluding some patterns of recursive nonterminals in PEGs, and include the full set of ordered choice, unlimited lookahead, and greedy repetition, which are characteristic of PEGs. Although the conversion judgement of parsing expressions into DFAs is undecidable in general, the formalism of LPEGs allows for a syntactical judgement of parsing expressions. Comment: Parsing expression grammars, Boolean finite automata, Packrat parsing
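    The three operators the abstract singles out (ordered choice, lookahead, greedy repetition) can be illustrated with a small recognizer sketch. This is not the LPEG formalism or its DFA conversion; the combinator names (`lit`, `seq`, `choice`, `star`, `not_`, `any_char`, `eof`) are hypothetical, and each parser is assumed to be a function from `(text, pos)` to a new position, or `None` on failure.

```python
def lit(s):
    """Match the literal string s at the current position."""
    return lambda text, pos: pos + len(s) if text.startswith(s, pos) else None

def any_char(text, pos):
    """Match any single character."""
    return pos + 1 if pos < len(text) else None

def seq(*parsers):
    """Sequence: every parser must succeed in order."""
    def parse(text, pos):
        for p in parsers:
            pos = p(text, pos)
            if pos is None:
                return None
        return pos
    return parse

def choice(*parsers):
    """Ordered choice: commit to the first alternative that succeeds."""
    def parse(text, pos):
        for p in parsers:
            result = p(text, pos)
            if result is not None:
                return result
        return None
    return parse

def star(p):
    """Greedy repetition: consume as many matches as possible, never give back."""
    def parse(text, pos):
        while True:
            result = p(text, pos)
            if result is None or result == pos:
                return pos
            pos = result
    return parse

def not_(p):
    """Negative lookahead: succeed without consuming input iff p fails."""
    return lambda text, pos: pos if p(text, pos) is None else None

# End of input expressed as "no character follows".
eof = not_(any_char)
```

    With these definitions, `seq(choice(lit("a"), lit("ab")), eof)` rejects the input "ab" because the choice commits to "a", whereas a context-free alternation would accept it. This is the kind of behaviour that makes PEG expressiveness hard to place within classical formal language theory.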

    Constructing multiple unique input/output sequences using metaheuristic optimisation techniques

    Multiple unique input/output sequences (UIOs) are often used to generate robust and compact test sequences in finite state machine (FSM) based testing. However, computing UIOs is NP-hard. Metaheuristic optimisation techniques (MOTs) such as genetic algorithms (GAs) and simulated annealing (SA) are effective in providing good solutions for some NP-hard problems. In the paper, the authors investigate the construction of UIOs by using MOTs. They define a fitness function to guide the search for potential UIOs and use sharing techniques to encourage MOTs to locate UIOs that are calculated as local optima in a search domain. They also compare the performance of GA and SA for UIO construction. Experimental results suggest that, after using a sharing technique, both GA and SA can find a majority of UIOs from the models under test.
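    To make the notion of a UIO concrete, the sketch below checks whether an input sequence is a UIO for a given state of a small Mealy-style FSM, and finds a shortest one by exhaustive enumeration. It does not reproduce the authors' fitness function, GA, or SA; the example machine `FSM`, the dict encoding, and the function names are all assumptions made for illustration. The enumeration is exponential in the sequence length, which is the cost that motivates heuristic search.

```python
from itertools import product

# Hypothetical example machine: (state, input) -> (next_state, output)
FSM = {
    ("s1", "a"): ("s2", 0), ("s1", "b"): ("s1", 1),
    ("s2", "a"): ("s3", 0), ("s2", "b"): ("s2", 1),
    ("s3", "a"): ("s1", 1), ("s3", "b"): ("s3", 1),
}
STATES = ["s1", "s2", "s3"]
INPUTS = ["a", "b"]

def run(fsm, state, inputs):
    """Return the output sequence produced from `state` on `inputs`."""
    outputs = []
    for symbol in inputs:
        state, output = fsm[(state, symbol)]
        outputs.append(output)
    return tuple(outputs)

def is_uio(fsm, states, state, inputs):
    """True if no other state produces the same outputs on `inputs`."""
    expected = run(fsm, state, inputs)
    return all(run(fsm, other, inputs) != expected
               for other in states if other != state)

def shortest_uio(fsm, states, inputs, state, max_len=4):
    """Exhaustively search for a shortest UIO of `state`, up to max_len."""
    for length in range(1, max_len + 1):
        for candidate in product(inputs, repeat=length):
            if is_uio(fsm, states, state, candidate):
                return candidate
    return None  # no UIO found within the length bound
```

    In this example machine, the single input "a" already distinguishes s3 from the other states, while s2 needs a longer sequence because s1 and s2 respond identically to every single input.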