String Comparison in $V$-Order: New Lexicographic Properties & On-line Applications

Author: Alatabbi Ali
Daykin Jacqueline W.
Rahman M. Sohel
Smyth W. F.
Publication date: 01/01/2015

$V$-order is a global order on strings related to Unique Maximal Factorization Families (UMFFs), which are themselves generalizations of Lyndon words. $V$-order has recently been proposed as an alternative to lexicographical order in the computation of suffix arrays and in the suffix-sorting induced by the Burrows-Wheeler transform. Efficient $V$-ordering of strings thus becomes a matter of considerable interest. In this paper we present new and surprising results on $V$-order in strings, then go on to explore the algorithmic consequences.

arXiv.org e-Print Archive

Research Repository

Cosmic structure formation in Hybrid Inflation models

A wide class of inflationary models, known as Hybrid Inflation models, may produce topological defects during a phase transition at the end of the inflationary epoch. We point out that, if the energy scale of these defects is close to that of Grand Unification, then their effect on cosmic structure formation and the generation of microwave background anisotropies cannot be ignored. Therefore, it is possible for structure to be seeded by a combination of the adiabatic perturbations produced during inflation and active isocurvature perturbations produced by defects. Since the two mechanisms are uncorrelated, the power spectra can be computed as a weighted average of the individual contributions. We investigate the possible observational consequences of this with reference to general Hybrid Inflation models and also a specific model based on Supergravity. These mixed perturbation scenarios have some novel observational consequences, which are discussed qualitatively. Comment: 22 Page
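
As a rough illustration of the weighted average mentioned in the abstract, the combined power spectrum might be written as follows. The symbols are assumptions introduced here and do not come from the abstract: $\alpha$ is a relative weight between 0 and 1, and $P_{\mathrm{adi}}$, $P_{\mathrm{def}}$ are the adiabatic (inflationary) and defect-induced spectra.

$$P_{\mathrm{total}}(k) = \alpha\, P_{\mathrm{adi}}(k) + (1 - \alpha)\, P_{\mathrm{def}}(k), \qquad 0 \le \alpha \le 1$$

No cross term appears because the two sources are taken to be uncorrelated, so their contributions add at the level of power spectra.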

arXiv.org e-Print Archive

SiGMa: Simple Greedy Matching for Aligning Large Knowledge Bases

Author: Davies A
Ghahramani Z
Graepel T
Kasneci G
Lacoste-Julien S
Palla K
Publication date: 01/01/2012

The Internet has enabled the creation of a growing number of large-scale knowledge bases in a variety of domains containing complementary information. Tools for automatically aligning these knowledge bases would make it possible to unify many sources of structured knowledge and answer complex queries. However, the efficient alignment of large-scale knowledge bases still poses a considerable challenge. Here, we present Simple Greedy Matching (SiGMa), a simple algorithm for aligning knowledge bases with millions of entities and facts. SiGMa is an iterative propagation algorithm which leverages both the structural information from the relationship graph and flexible similarity measures between entity properties in a greedy local search, thus making it scalable. Despite its greedy nature, our experiments indicate that SiGMa can efficiently match some of the world's largest knowledge bases with high precision. We provide additional experiments on benchmark datasets which demonstrate that SiGMa can outperform state-of-the-art approaches both in accuracy and efficiency. Comment: 10 pages + 2 pages appendix; 5 figures -- initial preprin

arXiv.org e-Print Archive

CiteSeerX

INRIA a CCSD electronic archive server

Succinct Dictionary Matching With No Slowdown

Author: A.V. Aho
J.I. Munro
K. Sadakane
P. Elias
R.M. Fano
S. Dori
W.-K. Hon
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2010

The problem of dictionary matching is a classical problem in string matching: given a set S of d strings of total length n characters over a (not necessarily constant) alphabet of size sigma, build a data structure so that we can match in any text T all occurrences of strings belonging to S. The classical solution for this problem is the Aho-Corasick automaton, which finds all occ occurrences in a text T in time O(|T| + occ) using a data structure that occupies O(m log m) bits of space, where m <= n + 1 is the number of states in the automaton. In this paper we show that the Aho-Corasick automaton can be represented in just m(log sigma + O(1)) + O(d log(n/d)) bits of space while still maintaining the ability to answer queries in O(|T| + occ) time. To the best of our knowledge, the currently fastest succinct data structure for the dictionary matching problem uses space O(n log sigma) while answering queries in O(|T| log log n + occ) time. In this paper we also show how the space occupancy can be reduced to m(H0 + O(1)) + O(d log(n/d)), where H0 is the empirical entropy of the characters appearing in the trie representation of the set S, provided that sigma < m^epsilon for any constant 0 < epsilon < 1. The query time remains unchanged. Comment: Corrected typos and other minor error
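
For readers unfamiliar with the classical baseline named in this abstract, here is a minimal Python sketch of a plain (non-succinct) Aho-Corasick automaton. It is not the paper's succinct representation. The function names `build_automaton` and `find_all` are hypothetical, the dict-based transition tables are a simplification, and the patterns are assumed to be non-empty.

```python
from collections import deque


def build_automaton(patterns):
    """Build trie transitions, failure links, and output lists."""
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # failure link of each state
    out = [[]]    # indices of patterns recognised at each state

    # Phase 1: insert every pattern into the trie.
    for idx, pat in enumerate(patterns):
        state = 0
        for ch in pat:
            nxt = goto[state].get(ch)
            if nxt is None:
                goto.append({})
                fail.append(0)
                out.append([])
                nxt = len(goto) - 1
                goto[state][ch] = nxt
            state = nxt
        out[state].append(idx)

    # Phase 2: breadth-first traversal to set failure links.
    queue = deque(goto[0].values())  # depth-1 states fail to the root
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit patterns that end at the failure target.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_all(text, patterns):
    """Return (start_index, pattern) for every occurrence in text."""
    goto, fail, out = build_automaton(patterns)
    state = 0
    hits = []
    for i, ch in enumerate(text):
        # Follow failure links until a transition on ch exists.
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for idx in out[state]:
            hits.append((i - len(patterns[idx]) + 1, patterns[idx]))
    return hits
```

Calling `find_all("ushers", ["he", "she", "his", "hers"])` reports "she" starting at index 1, and "he" and "hers" both starting at index 2. Storing each state's transitions in a hash map is convenient but uses much more space than the bit-level bounds discussed in the abstract.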

arXiv.org e-Print Archive

CiteSeerX