5,190 research outputs found
A Generalization of the Trie Data Structure
Tries, a form of string-indexed look-up structure, are generalized to permit indexing by terms built according to an arbitrary signature. The construction is parametric with respect to the type of data to be stored as values; this is essential, because the recursion which defines tries appeals from one value type to others. Trie (for any fixed signature) is then a functor, and the corresponding look-up function is a natural isomorphism. The trie functor is in principle definable by the initial fixed point semantics of Smyth and Plotkin. We simplify the construction, however, by introducing the category-cpo , a class of category within which calculations can retain some domain-theoretic flavor. Our construction of tries extends easily to many-sorted signatures
The Wavelet Trie: Maintaining an Indexed Sequence of Strings in Compressed Space
An indexed sequence of strings is a data structure for storing a string
sequence that supports random access, searching, range counting and analytics
operations, both for exact matches and prefix search. String sequences lie at
the core of column-oriented databases, log processing, and other storage and
query tasks. In these applications each string can appear several times and the
order of the strings in the sequence is relevant. The prefix structure of the
strings is relevant as well: common prefixes are sought in strings to extract
interesting features from the sequence. Moreover, space-efficiency is highly
desirable as it translates directly into higher performance, since more data
can fit in fast memory.
We introduce and study the problem of compressed indexed sequence of strings,
representing indexed sequences of strings in nearly-optimal compressed space,
both in the static and dynamic settings, while preserving provably good
performance for the supported operations.
We present a new data structure for this problem, the Wavelet Trie, which
combines the classical Patricia Trie with the Wavelet Tree, a succinct data
structure for storing a compressed sequence. The resulting Wavelet Trie
smoothly adapts to a sequence of strings that changes over time. It improves on
the state-of-the-art compressed data structures by supporting a dynamic
alphabet (i.e. the set of distinct strings) and prefix queries, both crucial
requirements in the aforementioned applications, and on traditional indexes by
reducing space occupancy to close to the entropy of the sequence
String Indexing for Patterns with Wildcards
We consider the problem of indexing a string of length to report the
occurrences of a query pattern containing characters and wildcards.
Let be the number of occurrences of in , and the size of
the alphabet. We obtain the following results.
- A linear space index with query time .
This significantly improves the previously best known linear space index by Lam
et al. [ISAAC 2007], which requires query time in the worst case.
- An index with query time using space , where is the maximum number of wildcards allowed in the pattern.
This is the first non-trivial bound with this query time.
- A time-space trade-off, generalizing the index by Cole et al. [STOC 2004].
We also show that these indexes can be generalized to allow variable length
gaps in the pattern. Our results are obtained using a novel combination of
well-known and new techniques, which could be of independent interest
Forces from noncommutative geometry
Einstein derived general relativity from Riemannian geometry. Connes extends
this derivation to noncommutative geometry and obtains electro-magnetic, weak
and strong forces. These are pseudo forces, that accompany the gravitational
force just as in Minkowskian geometry the magnetic force accompanies the
electric force. The main physical input of Connes' derivation is parity
violation. His main output is the Higgs boson which breaks the gauge symmetry
spontaneously and gives masses to gauge and Higgs bosons.Comment: 15 p. LaTeX, talk at the annuel meeting of the French Physical
Society, Strasbourg, july 0
- …