String Comparison in V-Order: New Lexicographic Properties & On-line Applications
V-order is a global order on strings related to Unique Maximal
Factorization Families (UMFFs), which are themselves generalizations of Lyndon
words. V-order has recently been proposed as an alternative to
lexicographical order in the computation of suffix arrays and in the
suffix-sorting induced by the Burrows-Wheeler transform. Efficient V-ordering
of strings thus becomes a matter of considerable interest. In this paper we
present new and surprising results on V-order in strings, then go on to
explore the algorithmic consequences.
Cosmic structure formation in Hybrid Inflation models
A wide class of inflationary models, known as Hybrid Inflation models, may
produce topological defects during a phase transition at the end of the
inflationary epoch. We point out that, if the energy scale of these defects is
close to that of Grand Unification, then their effect on cosmic structure
formation and the generation of microwave background anisotropies cannot be
ignored. Therefore, it is possible for structure to be seeded by a combination
of the adiabatic perturbations produced during inflation and active
isocurvature perturbations produced by defects. Since the two mechanisms are
uncorrelated, the power spectra can be computed as a weighted average of the
individual contributions. We investigate the possible observational
consequences of this with reference to general Hybrid Inflation models and also
a specific model based on Supergravity. These mixed perturbation scenarios have
some novel observational consequences, which are discussed qualitatively.
Comment: 22 Page
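The weighted average mentioned in this abstract can be written out as a short formula. This is only a sketch of the general form implied by the wording: the mixing weight $\alpha$ and the superscript labels are notation assumed here for illustration, not taken from the paper.

```latex
% Assumed notation: \alpha in [0,1] is a hypothetical mixing weight;
% "inf" labels the adiabatic (inflationary) contribution and
% "def" labels the defect-induced contribution.
\begin{align}
  C_\ell &= \alpha\, C_\ell^{\mathrm{inf}} + (1-\alpha)\, C_\ell^{\mathrm{def}}, \\
  P(k)   &= \alpha\, P^{\mathrm{inf}}(k) + (1-\alpha)\, P^{\mathrm{def}}(k),
  \qquad 0 \le \alpha \le 1 .
\end{align}
```

No cross term appears because the two sources are stated to be uncorrelated, so their power spectra simply add.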
SiGMa: Simple Greedy Matching for Aligning Large Knowledge Bases
The Internet has enabled the creation of a growing number of large-scale
knowledge bases in a variety of domains containing complementary information.
Tools for automatically aligning these knowledge bases would make it possible
to unify many sources of structured knowledge and answer complex queries.
However, the efficient alignment of large-scale knowledge bases still poses a
considerable challenge. Here, we present Simple Greedy Matching (SiGMa), a
simple algorithm for aligning knowledge bases with millions of entities and
facts. SiGMa is an iterative propagation algorithm that leverages both the
structural information from the relationship graph and flexible
similarity measures between entity properties in a greedy local search, thus
making it scalable. Despite its greedy nature, our experiments indicate that
SiGMa can efficiently match some of the world's largest knowledge bases with
high precision. We provide additional experiments on benchmark datasets which
demonstrate that SiGMa can outperform state-of-the-art approaches both in
accuracy and efficiency.
Comment: 10 pages + 2 pages appendix; 5 figures -- initial preprin
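The general idea of a greedy, propagation-based alignment can be illustrated with a toy sketch. This is not the SiGMa implementation. The scoring rule (name similarity plus the number of already-matched neighbours), the `threshold` value, and all function and parameter names are assumptions chosen for illustration.

```python
import heapq
from difflib import SequenceMatcher


def similarity(a, b):
    """String similarity in [0, 1] between two entity names."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def greedy_align(names1, names2, nbrs1, nbrs2, seeds, threshold=1.5):
    """Greedily grow a one-to-one matching outward from seed pairs.

    names1/names2: entity id -> name; nbrs1/nbrs2: entity id -> neighbour ids.
    seeds: list of (id1, id2) pairs assumed to be correct matches.
    threshold: hypothetical cut-off on (name similarity + shared matched neighbours).
    """
    match = dict(seeds)            # entity in KB1 -> entity in KB2
    used2 = set(match.values())    # entities of KB2 that are already taken
    heap = []                      # max-heap emulated with negated scores

    def score(e1, e2):
        # Structural evidence: neighbours of e1 already matched to neighbours of e2.
        shared = sum(1 for n in nbrs1.get(e1, ())
                     if match.get(n) in nbrs2.get(e2, ()))
        return similarity(names1[e1], names2[e2]) + shared

    def push_neighbours(e1, e2):
        # Propagation step: neighbours of a matched pair become candidates.
        for n1 in nbrs1.get(e1, ()):
            for n2 in nbrs2.get(e2, ()):
                if n1 not in match and n2 not in used2:
                    heapq.heappush(heap, (-score(n1, n2), n1, n2))

    for e1, e2 in seeds:
        push_neighbours(e1, e2)

    while heap:
        neg, e1, e2 = heapq.heappop(heap)
        if e1 in match or e2 in used2 or -neg < threshold:
            continue               # stale or too weak a candidate
        match[e1] = e2             # greedy decision, never revisited
        used2.add(e2)
        push_neighbours(e1, e2)
    return match


names1 = {"a1": "Paris", "a2": "France", "a3": "Lyon"}
names2 = {"b1": "Paris", "b2": "France", "b3": "Lyons", "b4": "Berlin"}
nbrs1 = {"a1": ["a2"], "a2": ["a1", "a3"], "a3": ["a2"]}
nbrs2 = {"b1": ["b2"], "b2": ["b1", "b3", "b4"], "b3": ["b2"], "b4": ["b2"]}
alignment = greedy_align(names1, names2, nbrs1, nbrs2, seeds=[("a1", "b1")])
```

Because each decision is final and only neighbours of matched pairs are ever scored, the work stays local to the growing frontier, which is the property the abstract credits for scalability.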
Succinct Dictionary Matching With No Slowdown
The problem of dictionary matching is a classical problem in string matching:
given a set S of d strings of total length n characters over a (not
necessarily constant) alphabet of size sigma, build a data structure so that we
can match in any text T all occurrences of strings belonging to S. The
classical solution for this problem is the Aho-Corasick automaton which finds
all occ occurrences in a text T in time O(|T| + occ) using a data structure
that occupies O(m log m) bits of space where m <= n + 1 is the number of states
in the automaton. In this paper we show that the Aho-Corasick automaton can be
represented in just m(log sigma + O(1)) + O(d log(n/d)) bits of space while
still maintaining the ability to answer queries in O(|T| + occ) time. To the
best of our knowledge, the currently fastest succinct data structure for the
dictionary matching problem uses space O(n log sigma) while answering queries
in O(|T|log log n + occ) time. In this paper we also show how the space
occupancy can be reduced to m(H0 + O(1)) + O(d log(n/d)) where H0 is the
empirical entropy of the characters appearing in the trie representation of the
set S, provided that sigma < m^epsilon for any constant 0 < epsilon < 1. The
query time remains unchanged.
Comment: Corrected typos and other minor error
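For orientation, the classical Aho-Corasick automaton that this abstract takes as its starting point can be sketched as follows. This is the plain pointer-based version built on Python dictionaries, not the succinct representation the paper proposes. The function names `build_automaton` and `find_all` are chosen here for illustration.

```python
from collections import deque


def build_automaton(patterns):
    """Build trie transitions, failure links, and output lists."""
    goto = [{}]    # goto[state][char] -> next state (state 0 is the root)
    fail = [0]     # fail[state] -> state for the longest proper suffix in the trie
    out = [[]]     # out[state] -> patterns that end at this state
    for p in patterns:
        state = 0
        for ch in p:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(p)

    # Breadth-first traversal so that shallower states are finished first.
    queue = deque(goto[0].values())
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            queue.append(t)
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            out[t] = out[t] + out[fail[t]]   # inherit matches ending at suffixes
    return goto, fail, out


def find_all(text, automaton):
    """Return (start_index, pattern) for every occurrence in text."""
    goto, fail, out = automaton
    state = 0
    hits = []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for p in out[state]:
            hits.append((i - len(p) + 1, p))
    return hits


automaton = build_automaton(["he", "she", "his", "hers"])
hits = find_all("ushers", automaton)
num_states = len(automaton[0])
```

In this example the patterns have total length n = 12 and the automaton has 10 states, consistent with the bound m <= n + 1 quoted above. The dictionaries and lists here cost far more than the bit counts discussed in the abstract, which is the overhead a succinct encoding aims to remove.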