    Optimal Substring-Equality Queries with Applications to Sparse Text Indexing

    We consider the problem of encoding a string of length $n$ from an integer alphabet of size $\sigma$ so that access and substring equality queries (that is, determining the equality of any two substrings) can be answered efficiently. Any uniquely-decodable encoding supporting access must take $n\log\sigma + \Theta(\log(n\log\sigma))$ bits. We describe a new data structure matching this lower bound when $\sigma \leq n^{O(1)}$ while supporting both queries in optimal $O(1)$ time. Furthermore, we show that the string can be overwritten in-place with this structure. The redundancy of $\Theta(\log n)$ bits and the constant query time break exponentially a lower bound that is known to hold in the read-only model. Using our new string representation, we obtain the first in-place subquadratic (indeed, even sublinear in some cases) algorithms for several string-processing problems in the restore model: the input string is rewritable and must be restored before the computation terminates. In particular, we describe the first in-place subquadratic Monte Carlo solutions to the sparse suffix sorting, sparse LCP array construction, and suffix selection problems. With the sole exception of suffix selection, our algorithms are also the first running in sublinear time for small enough sets of input suffixes. Combining these solutions, we obtain the first sublinear-time Monte Carlo algorithm for building the sparse suffix tree in compact space. We also show how to derandomize our algorithms using small space. This leads to the first Las Vegas in-place algorithm computing the full LCP array in $O(n\log n)$ time and to the first Las Vegas in-place algorithms solving the sparse suffix sorting and sparse LCP array construction problems in $O(n^{1.5}\sqrt{\log\sigma})$ time. Running times of these Las Vegas algorithms hold in the worst case with high probability.
    Comment: Refactored according to TALG's reviews. New w.h.p. bounds and Las Vegas algorithm
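
To make the notion of a Monte Carlo substring-equality query concrete, here is a minimal sketch using textbook Karp-Rabin prefix fingerprints. It is not the in-place structure described in the abstract: it stores two auxiliary arrays of $O(n)$ words rather than $\Theta(\log n)$ bits of redundancy, and the names `Fingerprints`, `MOD`, and the fixed `base` are assumptions made for illustration only.

```python
# Illustrative only: classic Karp-Rabin prefix fingerprints, NOT the
# in-place encoding from the abstract. Uses O(n) extra words.
MOD = (1 << 61) - 1  # a Mersenne prime, chosen here as an assumption


class Fingerprints:
    def __init__(self, text: str, base: int = 911382323) -> None:
        # prefix[k] is the fingerprint of text[:k]; power[k] is base**k mod MOD.
        self.prefix = [0]
        self.power = [1]
        for ch in text:
            self.prefix.append((self.prefix[-1] * base + ord(ch) + 1) % MOD)
            self.power.append(self.power[-1] * base % MOD)

    def fingerprint(self, start: int, length: int) -> int:
        # Fingerprint of text[start:start + length], computed in O(1) time.
        end = start + length
        return (self.prefix[end] - self.prefix[start] * self.power[length]) % MOD

    def equal(self, i: int, j: int, length: int) -> bool:
        # Monte Carlo answer: equal substrings always compare equal;
        # unequal substrings may collide with small probability.
        return self.fingerprint(i, length) == self.fingerprint(j, length)


fps = Fingerprints("abracadabra")
same = fps.equal(0, 7, 4)       # "abra" vs "abra"
different = fps.equal(0, 3, 3)  # "abr" vs "aca"
```

In a real Monte Carlo setting the base would be drawn at random, so that a collision on any fixed pair of unequal substrings is unlikely; the fixed base above only keeps the sketch deterministic.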

    Fast Scalable Construction of (Minimal Perfect Hash) Functions

    Recent advances in random linear systems on finite fields have paved the way for the construction of constant-time data structures representing static functions and minimal perfect hash functions using less space than existing techniques. The main obstacle to any practical application of these results is the cubic-time Gaussian elimination required to solve these linear systems: although the systems can be made very small, the computation is still too slow to be feasible. In this paper we describe in detail a number of heuristics and programming techniques that speed up the solution of these systems by several orders of magnitude, making the overall construction competitive with the standard and widely used MWHC technique, which is based on hypergraph peeling. In particular, we introduce broadword programming techniques for fast equation manipulation and a lazy Gaussian elimination algorithm. We also describe a number of technical improvements to the data structure which further reduce space usage and improve lookup speed. Our implementation of these techniques yields a minimal perfect hash function data structure occupying 2.24 bits per element, compared to 2.68 for MWHC-based ones, and a static function data structure which reduces the multiplicative overhead from 1.23 to 1.03.
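
The idea of storing a static function as the solution of a random linear system can be sketched as follows. This is a plain Gaussian elimination over GF(2) with Python integers as bit rows, not the lazy or broadword variant the abstract refers to. The helper names (`positions`, `build_static_function`, `lookup`), the BLAKE2b-based hashing, and the `slack` factor of 1.25 are all assumptions chosen for illustration.

```python
import hashlib


def positions(key: str, seed: int, m: int) -> tuple:
    # Three pseudo-random cells in [0, m) per key (hypothetical hashing scheme).
    digest = hashlib.blake2b(
        key.encode(), digest_size=24, salt=seed.to_bytes(8, "little")
    ).digest()
    return tuple(int.from_bytes(digest[i:i + 8], "little") % m for i in (0, 8, 16))


def build_static_function(table: dict, slack: float = 1.25, max_tries: int = 100):
    # Find w such that w[p0] ^ w[p1] ^ w[p2] == table[key] for every key.
    m = int(slack * len(table)) + 3
    for seed in range(max_tries):
        pivots = {}  # highest set bit -> (row mask, right-hand side)
        solvable = True
        for key, value in table.items():
            mask = 0
            for p in positions(key, seed, m):
                mask ^= 1 << p  # repeated cells cancel, as they do in lookup
            rhs = value
            while mask:
                top = mask.bit_length() - 1
                if top not in pivots:
                    pivots[top] = (mask, rhs)
                    break
                pivot_mask, pivot_rhs = pivots[top]
                mask ^= pivot_mask
                rhs ^= pivot_rhs
            else:
                if rhs != 0:  # row reduced to 0 = nonzero: inconsistent system
                    solvable = False
                    break
        if not solvable:
            continue  # retry with a different seed
        w = [0] * m  # free variables stay 0
        for top in sorted(pivots):  # back-substitute from the lowest pivot up
            mask, rhs = pivots[top]
            rest = mask ^ (1 << top)
            while rest:
                low = rest & -rest
                rhs ^= w[low.bit_length() - 1]
                rest ^= low
            w[top] = rhs
        return seed, w
    raise RuntimeError("no solvable system found")


def lookup(key: str, seed: int, w: list) -> int:
    result = 0
    for p in positions(key, seed, len(w)):
        result ^= w[p]
    return result


table = {f"key{i}": i % 4 for i in range(200)}
seed, w = build_static_function(table)
```

Each lookup reads three cells and XORs them, so query time is constant; the array `w` stores one value per cell, which is where a multiplicative space overhead such as the figures quoted in the abstract comes from.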

    The resolved star-formation relation in nearby active galactic nuclei

    We present an analysis of the relation between star formation rate (SFR) surface density ($\Sigma_{\rm SFR}$) and mass surface density of molecular gas ($\Sigma_{\rm H_2}$), commonly referred to as the Kennicutt-Schmidt (K-S) relation, at its intrinsic spatial scale, i.e. the size of giant molecular clouds (10-150 pc), in the central, high-density regions of four nearby low-luminosity active galactic nuclei (AGN). We used interferometric IRAM CO(1-0) and CO(2-1), and SMA CO(3-2) emission line maps to derive $\Sigma_{\rm H_2}$ and HST H$\alpha$ images to estimate $\Sigma_{\rm SFR}$. Each galaxy is characterized by a distinct molecular SF relation at spatial scales between 20 and 200 pc. The K-S relations can be sub-linear, but also super-linear, with slopes ranging from 0.5 to 1.3. Depletion times range from 1 to 2 Gyr, compatible with results for nearby normal galaxies. These findings are valid independently of which transition, CO(1-0), CO(2-1), or CO(3-2), is used to derive $\Sigma_{\rm H_2}$. Because of star-formation feedback, lifetime of clouds, turbulent cascade, or magnetic fields, the K-S relation might be expected to degrade on small spatial scales (<100 pc). However, we find no clear evidence for this, even on scales as small as 20 pc, and this might be because of the higher density of GMCs in galaxy centers which have to resist higher shear forces. The proportionality between $\Sigma_{\rm H_2}$ and $\Sigma_{\rm SFR}$ found between 10 and 100 $M_\odot\,\mathrm{pc}^{-2}$ is valid even at high densities, $10^3\,M_\odot\,\mathrm{pc}^{-2}$. However, by adopting a common CO-to-H2 conversion factor ($\alpha_{\rm CO}$), the central regions of the galaxies have higher $\Sigma_{\rm SFR}$ for a given gas column than those expected from the models, with a behavior that lies between the mergers/high-redshift starburst systems and the more quiescent star-forming galaxies, assuming that the first ones require a lower value of $\alpha_{\rm CO}$.
    Comment: 22 pages, 8 figures, Accepted for publication in Astronomy and Astrophysic
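
For orientation, the quantities in this abstract are conventionally related as below. The power-law form and the definition of the depletion time are the standard ones; the normalization $A$ and the specific input value in the worked line are assumptions chosen for illustration, not figures from the paper.

```latex
% K-S relation as a power law with slope N (the abstract reports N between 0.5 and 1.3)
\Sigma_{\rm SFR} = A \, \Sigma_{\rm H_2}^{\,N}

% Molecular gas depletion time
\tau_{\rm dep} = \frac{\Sigma_{\rm H_2}}{\Sigma_{\rm SFR}}

% Worked example (illustrative input): \tau_{\rm dep} = 1\,\mathrm{Gyr}
% at \Sigma_{\rm H_2} = 100\,M_\odot\,\mathrm{pc}^{-2} corresponds to
\Sigma_{\rm SFR} = \frac{100\,M_\odot\,\mathrm{pc}^{-2}}{10^{9}\,\mathrm{yr}}
                 = 10^{-7}\,M_\odot\,\mathrm{yr}^{-1}\,\mathrm{pc}^{-2}
                 = 0.1\,M_\odot\,\mathrm{yr}^{-1}\,\mathrm{kpc}^{-2}
```

A slope of $N = 1$ makes $\tau_{\rm dep}$ independent of gas surface density, which is why a linear relation and a constant depletion time are two statements of the same proportionality.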