
    A Generalization of the Trie Data Structure

    Tries, a form of string-indexed look-up structure, are generalized to permit indexing by terms built according to an arbitrary signature. The construction is parametric with respect to the type of data to be stored as values; this is essential, because the recursion which defines tries appeals from one value type to others. Trie (for any fixed signature) is then a functor, and the corresponding look-up function is a natural isomorphism. The trie functor is in principle definable by the initial fixed point semantics of Smyth and Plotkin. We simplify the construction, however, by introducing the category-cpo, a class of category within which calculations can retain some domain-theoretic flavor. Our construction of tries extends easily to many-sorted signatures.
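
    As a point of reference, the ordinary string-indexed case that the abstract starts from can be sketched as follows. This is only a minimal illustration, not the paper's generalized construction: the class name `Trie` and the methods `insert`, `lookup`, and `fmap` are hypothetical names chosen here. The `fmap` method hints at the functorial reading, in that mapping a function over the stored values commutes with look-up.

```python
from typing import Callable, Dict, Generic, Optional, TypeVar

V = TypeVar("V")
W = TypeVar("W")


class Trie(Generic[V]):
    """A plain string-indexed trie, parametric in the value type V."""

    def __init__(self) -> None:
        self.has_value = False            # True if some key ends at this node
        self.value: Optional[V] = None
        self.children: Dict[str, "Trie[V]"] = {}

    def insert(self, key: str, value: V) -> None:
        node = self
        for ch in key:
            # Create the child on demand, then descend into it.
            if ch not in node.children:
                node.children[ch] = Trie()
            node = node.children[ch]
        node.has_value = True
        node.value = value

    def lookup(self, key: str) -> Optional[V]:
        node = self
        for ch in key:
            child = node.children.get(ch)
            if child is None:
                return None
            node = child
        return node.value if node.has_value else None

    def fmap(self, f: Callable[[V], W]) -> "Trie[W]":
        """Apply f to every stored value, keeping the key structure."""
        out: "Trie[W]" = Trie()
        if self.has_value:
            out.has_value = True
            out.value = f(self.value)
        out.children = {ch: t.fmap(f) for ch, t in self.children.items()}
        return out


t: Trie[int] = Trie()
t.insert("car", 1)
t.insert("cat", 2)
scaled = t.fmap(lambda x: x * 10)
```

    Here `scaled.lookup(k)` agrees with applying the function to `t.lookup(k)` for every key `k` that is present, which is the string-indexed instance of the naturality property the abstract mentions.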

    The Wavelet Trie: Maintaining an Indexed Sequence of Strings in Compressed Space

    An indexed sequence of strings is a data structure for storing a string sequence that supports random access, searching, range counting and analytics operations, both for exact matches and prefix search. String sequences lie at the core of column-oriented databases, log processing, and other storage and query tasks. In these applications each string can appear several times and the order of the strings in the sequence is relevant. The prefix structure of the strings is relevant as well: common prefixes are sought in strings to extract interesting features from the sequence. Moreover, space-efficiency is highly desirable as it translates directly into higher performance, since more data can fit in fast memory. We introduce and study the problem of compressed indexed sequence of strings, representing indexed sequences of strings in nearly-optimal compressed space, both in the static and dynamic settings, while preserving provably good performance for the supported operations. We present a new data structure for this problem, the Wavelet Trie, which combines the classical Patricia Trie with the Wavelet Tree, a succinct data structure for storing a compressed sequence. The resulting Wavelet Trie smoothly adapts to a sequence of strings that changes over time. It improves on the state-of-the-art compressed data structures by supporting a dynamic alphabet (i.e. the set of distinct strings) and prefix queries, both crucial requirements in the aforementioned applications, and on traditional indexes by reducing space occupancy to close to the entropy of the sequence.
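
    To make the query interface concrete, the operations named in the abstract (random access, range counting for exact matches, and range counting for prefixes) can be written against a plain Python list. This is a naive, uncompressed reference sketch and not the Wavelet Trie itself: every query scans the range linearly, and the class and method names (`NaiveIndexedSequence`, `access`, `count`, `count_prefix`, `append`) are hypothetical.

```python
from typing import Iterable, List


class NaiveIndexedSequence:
    """Uncompressed baseline: an ordered sequence of strings with repeats."""

    def __init__(self, strings: Iterable[str]) -> None:
        self.seq: List[str] = list(strings)

    def access(self, pos: int) -> str:
        """Return the string stored at position pos."""
        return self.seq[pos]

    def count(self, s: str, lo: int, hi: int) -> int:
        """Count exact occurrences of s in positions lo..hi-1."""
        return sum(1 for x in self.seq[lo:hi] if x == s)

    def count_prefix(self, prefix: str, lo: int, hi: int) -> int:
        """Count strings in positions lo..hi-1 that start with prefix."""
        return sum(1 for x in self.seq[lo:hi] if x.startswith(prefix))

    def append(self, s: str) -> None:
        """Dynamic setting: s may be a string never seen before."""
        self.seq.append(s)


log = NaiveIndexedSequence(["/a/x", "/b", "/a/y", "/a/x"])
exact_hits = log.count("/a/x", 0, 4)         # exact-match range count
prefix_hits = log.count_prefix("/a/", 0, 3)  # prefix range count
log.append("/c")                             # grows the set of distinct strings
```

    A structure such as the one described in the abstract aims to answer the same queries while storing the sequence in compressed space rather than as a flat list.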

    String Indexing for Patterns with Wildcards

    We consider the problem of indexing a string $t$ of length $n$ to report the occurrences of a query pattern $p$ containing $m$ characters and $j$ wildcards. Let $occ$ be the number of occurrences of $p$ in $t$, and $\sigma$ the size of the alphabet. We obtain the following results.
    - A linear space index with query time $O(m+\sigma^j \log \log n + occ)$. This significantly improves the previously best known linear space index by Lam et al. [ISAAC 2007], which requires query time $\Theta(jn)$ in the worst case.
    - An index with query time $O(m+j+occ)$ using space $O(\sigma^{k^2} n \log^k \log n)$, where $k$ is the maximum number of wildcards allowed in the pattern. This is the first non-trivial bound with this query time.
    - A time-space trade-off, generalizing the index by Cole et al. [STOC 2004].
    We also show that these indexes can be generalized to allow variable length gaps in the pattern. Our results are obtained using a novel combination of well-known and new techniques, which could be of independent interest.
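
    The query being indexed can be pinned down with a brute-force scan. The sketch below is not one of the indexes from the abstract: it builds no index and simply checks every alignment of the pattern against the text. It assumes that a wildcard matches exactly one arbitrary character and writes it as `*`; both the function name `find_occurrences` and the choice of wildcard symbol are hypothetical.

```python
from typing import List


def find_occurrences(t: str, p: str, wildcard: str = "*") -> List[int]:
    """Return every start position in t where p matches.

    A wildcard position in p matches any single character of t;
    every other position must match exactly.
    """
    hits: List[int] = []
    # Try each alignment of p against t (none if p is longer than t).
    for start in range(len(t) - len(p) + 1):
        if all(pc == wildcard or pc == t[start + k] for k, pc in enumerate(p)):
            hits.append(start)
    return hits


positions = find_occurrences("banana", "a*a")
```

    The scan costs time proportional to the text length times the pattern length per query, which is the kind of cost an index over the text is meant to avoid.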

    Forces from noncommutative geometry

    Einstein derived general relativity from Riemannian geometry. Connes extends this derivation to noncommutative geometry and obtains electromagnetic, weak and strong forces. These are pseudo forces that accompany the gravitational force, just as in Minkowskian geometry the magnetic force accompanies the electric force. The main physical input of Connes' derivation is parity violation. His main output is the Higgs boson, which breaks the gauge symmetry spontaneously and gives masses to gauge and Higgs bosons. Comment: 15 p. LaTeX, talk at the annual meeting of the French Physical Society, Strasbourg, July 0