
    String Matching and 1d Lattice Gases

    We calculate the probability distributions for the number of occurrences $n$ of a given $l$-letter word in a random string of $k$ letters. Analytical expressions for the distribution are known for the asymptotic regimes (i) $k \gg r^l \gg 1$ (Gaussian) and (ii) $k, l \to \infty$ such that $k/r^l$ is finite (compound Poisson). However, it is known that these distributions do not work well in the intermediate regime $k \gtrsim r^l \gtrsim 1$. We show that the problem of calculating the string matching probability can be cast as that of determining the configurational partition function of a 1d lattice gas with interacting particles, so that the matching probability becomes the grand-partition sum of the lattice gas, with the number of particles corresponding to the number of matches. We perform a virial expansion of the effective equation of state and obtain the probability distribution. Our result reproduces the behavior of the distribution in all regimes. We are also able to show analytically how the limiting distributions arise. Our analysis builds on the fact that the effective interactions between the particles consist of a relatively strong core of size $l$, the word length, followed by a weak, exponentially decaying tail. We find that the asymptotic regimes correspond to the case where the tail of the interactions can be neglected, while in the intermediate regime it needs to be kept in the analysis. Our results are readily generalized to the case where the random strings are generated by more complicated stochastic processes, such as a non-uniform letter probability distribution or Markov chains. We show that in these cases the tails of the effective interactions can be made even more dominant, thus rendering the asymptotic approximations less accurate in such a regime. Comment: 44 pages and 8 figures. Major revision of previous version. The lattice gas analogy has been worked out in full, including virial expansion and equation of state. This constitutes the main part of the paper now. Connections with existing work are made and references should be up to date now. To be submitted for publication
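    The quantity studied in this abstract can be made concrete with a small simulation. The following is only a minimal sketch, not the paper's lattice-gas method: it samples uniform random strings and tabulates how often a word occurs, counting overlapping matches. The function names, and the reading of $r$ as the alphabet size, are assumptions made for illustration.

```python
import random
from collections import Counter


def count_occurrences(text, word):
    """Count occurrences of word in text, allowing overlapping matches."""
    l = len(word)
    return sum(1 for i in range(len(text) - l + 1) if text[i:i + l] == word)


def empirical_distribution(word, k, alphabet, trials, seed=0):
    """Estimate P(n) for the number n of matches of word in a random
    k-letter string drawn uniformly from alphabet (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(trials):
        s = "".join(rng.choice(alphabet) for _ in range(k))
        counts[count_occurrences(s, word)] += 1
    return {n: c / trials for n, c in sorted(counts.items())}


if __name__ == "__main__":
    # Assumed example: binary alphabet (r = 2), word length l = 3, k = 20.
    for n, p in empirical_distribution("aba", 20, "ab", 5000).items():
        print(n, round(p, 4))
```

    Overlapping matches are counted because a self-overlapping word such as "aba" can match at positions closer together than the word length, which is where correlations between matches come from.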

    Approximating the Spectrum of a Graph

    The spectrum of a network or graph $G=(V,E)$ with adjacency matrix $A$ consists of the eigenvalues of the normalized Laplacian $L = I - D^{-1/2} A D^{-1/2}$. This set of eigenvalues encapsulates many aspects of the structure of the graph, including the extent to which the graph possesses community structures at multiple scales. We study the problem of approximating the spectrum $\lambda = (\lambda_1,\dots,\lambda_{|V|})$, $0 \le \lambda_1 \le \dots \le \lambda_{|V|} \le 2$, of $G$ in the regime where the graph is too large to explicitly calculate the spectrum. We present a sublinear time algorithm that, given the ability to query a random node in the graph and select a random neighbor of a given node, computes a succinct representation of an approximation $\widetilde \lambda = (\widetilde \lambda_1,\dots,\widetilde \lambda_{|V|})$, $0 \le \widetilde \lambda_1 \le \dots \le \widetilde \lambda_{|V|} \le 2$, such that $\|\widetilde \lambda - \lambda\|_1 \le \epsilon |V|$. Our algorithm has query complexity and running time $\exp(O(1/\epsilon))$, independent of the size of the graph, $|V|$. We demonstrate the practical viability of our algorithm on 15 different real-world graphs from the Stanford Large Network Dataset Collection, including social networks, academic collaboration graphs, and road networks. For the smallest of these graphs, we are able to validate the accuracy of our algorithm by explicitly calculating the true spectrum; for the larger graphs, such a calculation is computationally prohibitive. In addition, we study the implications of our algorithm for property testing in the bounded degree graph model.
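    To illustrate the access model described here (random-node and random-neighbor queries), the sketch below estimates the probability that a short random walk returns to its starting node. Since $D^{-1}A$ has the same eigenvalues as $I - L$, this return probability equals the spectral moment $\frac{1}{|V|}\sum_i (1-\lambda_i)^{\ell}$ for walks of $\ell$ steps. This is an illustration under stated assumptions only, not the algorithm of the paper; the function name and the adjacency-list representation are hypothetical.

```python
import random


def estimate_return_probability(neighbors, steps, samples, seed=0):
    """Estimate the chance that a `steps`-step random walk started at a
    uniformly random node ends where it began.

    `neighbors` maps each node to a non-empty list of adjacent nodes
    (assumption: no isolated nodes). Only two kinds of queries are used:
    pick a random node, and pick a random neighbor of a given node.
    """
    rng = random.Random(seed)
    nodes = list(neighbors)
    returns = 0
    for _ in range(samples):
        start = rng.choice(nodes)                     # random-node query
        current = start
        for _ in range(steps):
            current = rng.choice(neighbors[current])  # random-neighbor query
        if current == start:
            returns += 1
    return returns / samples


if __name__ == "__main__":
    # Assumed toy input: the 4-cycle 0-1-2-3-0.
    cycle4 = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
    print(estimate_return_probability(cycle4, steps=2, samples=20000))
```

    For the 4-cycle the normalized Laplacian eigenvalues are 0, 1, 1, 2, so the exact two-step value is 1/2, and the estimate should be close to it. The cost of the estimate depends on the number of samples and the walk length, not on the number of nodes.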

    The Low Column Density Lyman-alpha Forest

    We develop an analytical method based on the lognormal approximation to compute the column density distribution of the Lyman-alpha forest in the low column density limit. We compute the column density distributions for six different cosmological models and find that the standard, COBE-normalized CDM model cannot fit the observations of the Lyman-alpha forest at z=3. The amplitude of the fluctuations in that model has to be lowered by a factor of almost 3 to match observations. However, the currently viable cosmological models, such as the lightly tilted COBE-normalized CDM+Lambda model, the CHDM model with 20% neutrinos, and the low-amplitude Standard CDM model, are all in agreement with observations, to within the accuracy of our approximation, for a cosmological baryon density at or above the old Standard Big Bang Nucleosynthesis value of 0.0125 and for the currently favored value of the ionizing radiation intensity. With the low value for the baryon density inferred by Hogan & Rugers (1996), the models can only marginally match observations. Comment: three postscript figures included, submitted to ApJ

    The oscillating behavior of the pair correlation function in galaxies

    The pair correlation function (PCF) for galaxies presents typical oscillations in the range 20-200 Mpc/h, which are named baryon acoustic oscillations (BAO). We first review and test the oscillations of the PCF when the 2D/3D vertices of the Poissonian Voronoi Tessellation (PVT) are considered. We then model the behavior of the PCF at small scales in the presence of a self-gravitating medium having a line/plane of symmetry in 2D/3D. The analysis of the PCF in an astrophysical context was split into two parts, adopting a non-Poissonian Voronoi Tessellation (NPVT). We analyzed the case of a 2D cut which covers a few voids and that of a 2D cut which covers approximately 50 voids. The PCF obtained in the case of many voids was then compared with the bootstrap predictions for a PVT process and with the observed PCF for an astronomical catalog. An approximate formula which connects the average radius of the cosmic voids to the first minimum of the PCF is given. Comment: 19 pages, 14 figures