String Matching and 1d Lattice Gases
We calculate the probability distributions for the number of occurrences
of a given letter word in a random string of letters. Analytical
expressions for the distribution are known for the asymptotic regimes (i) (Gaussian) and such that is finite
(Compound Poisson). However, it is known that these distributions do not work
well in the intermediate regime . We show that the
problem of calculating the string matching probability can be cast as
determining the configurational partition function of a 1d lattice gas with
interacting particles so that the matching probability becomes the
grand-partition sum of the lattice gas, with the number of particles
corresponding to the number of matches. We perform a virial expansion of the
effective equation of state and obtain the probability distribution. Our result
reproduces the behavior of the distribution in all regimes. We are also able to
show analytically how the limiting distributions arise. Our analysis builds on
the fact that the effective interactions between the particles consist of a
relatively strong core of size , the word length, followed by a weak,
exponentially decaying tail. We find that the asymptotic regimes correspond to
the case where the tail of the interactions can be neglected, while in the
intermediate regime they need to be kept in the analysis. Our results are
readily generalized to the case where the random strings are generated by more
complicated stochastic processes such as a non-uniform letter probability
distribution or Markov chains. We show that in these cases the tails of the
effective interactions can be made even more dominant, thus rendering the
asymptotic approximations less accurate in such a regime. Comment: 44 pages and 8 figures. Major revision of previous version. The
lattice gas analogy has been worked out in full, including virial expansion
and equation of state. This constitutes the main part of the paper now.
Connections with existing work are made and references should be up to date
now. To be submitted for publication.
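As a rough illustration of the quantity studied in the abstract above, the distribution of the number of matches can be estimated by brute-force simulation. This is a minimal sketch, not the lattice-gas method of the paper: the function names, the uniform letter probabilities, and the choice to count overlapping matches are all assumptions made for the example.

```python
import random
from collections import Counter

def count_occurrences(text, word):
    """Count occurrences of `word` in `text`, allowing overlapping matches."""
    count = 0
    start = text.find(word)
    while start != -1:
        count += 1
        # Advance by one position only, so overlapping matches are counted.
        start = text.find(word, start + 1)
    return count

def empirical_distribution(word, alphabet, length, trials, seed=0):
    """Estimate P(number of matches = n) for uniformly random strings.

    Assumption: letters are independent and uniform over `alphabet`.
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(trials):
        text = "".join(rng.choice(alphabet) for _ in range(length))
        counts[count_occurrences(text, word)] += 1
    return {n: c / trials for n, c in sorted(counts.items())}

if __name__ == "__main__":
    # Hypothetical parameters chosen only for illustration.
    dist = empirical_distribution("aba", "ab", length=50, trials=2000)
    for n, p in dist.items():
        print(n, round(p, 4))
```

A histogram of this kind is what the analytical Gaussian and compound Poisson forms would be compared against in their respective regimes.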
Approximating the Spectrum of a Graph
The spectrum of a network or graph with adjacency matrix ,
consists of the eigenvalues of the normalized Laplacian . This set of eigenvalues encapsulates many aspects of the structure
of the graph, including the extent to which the graph possesses community
structures at multiple scales. We study the problem of approximating the
spectrum , of in the regime where the graph is too
large to explicitly calculate the spectrum. We present a sublinear time
algorithm that, given the ability to query a random node in the graph and
select a random neighbor of a given node, computes a succinct representation of
an approximation , such that . Our algorithm has query complexity and running time ,
independent of the size of the graph, . We demonstrate the practical
viability of our algorithm on 15 different real-world graphs from the Stanford
Large Network Dataset Collection, including social networks, academic
collaboration graphs, and road networks. For the smallest of these graphs, we
are able to validate the accuracy of our algorithm by explicitly calculating
the true spectrum; for the larger graphs, such a calculation is computationally
prohibitive.
In addition, we study the implications of our algorithm for property testing in
the bounded degree graph model.
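The two query types named in the abstract above (sample a random node, step to a random neighbor) are enough to estimate moments of the spectrum. The sketch below rests on a standard fact rather than on details confirmed by this listing: for the random-walk matrix P = D^{-1}A, whose eigenvalues are one minus those of the normalized Laplacian, the average of the ℓ-th powers of the eigenvalues equals the probability that an ℓ-step random walk from a uniformly random start node returns to its start. The function name and the adjacency-list representation are assumptions, and the sketch does not reconstruct a spectrum from the moments.

```python
import random

def estimate_moment(adj, ell, samples, seed=0):
    """Estimate the ell-th spectral moment of the random-walk matrix.

    `adj` maps each node to a list of its neighbors (assumed non-empty,
    i.e. the graph has no isolated nodes). The estimate is the fraction
    of ell-step random walks that end at their starting node.
    """
    rng = random.Random(seed)
    nodes = list(adj)
    returns = 0
    for _ in range(samples):
        start = rng.choice(nodes)          # query: random node
        node = start
        for _ in range(ell):
            node = rng.choice(adj[node])   # query: random neighbor
        if node == start:
            returns += 1
    return returns / samples

if __name__ == "__main__":
    # A 4-cycle, used here only as a toy example.
    cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
    for ell in (1, 2, 3, 4):
        print(ell, estimate_moment(cycle, ell, samples=20000))
```

The number of walks needed for a given accuracy depends only on the target error, not on the number of nodes, which is the sense in which such an estimator can run independently of graph size.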
The Low Column Density Lyman-alpha Forest
We develop an analytical method based on the lognormal approximation to
compute the column density distribution of the Lyman-alpha forest in the low
column density limit. We compute the column density distributions for six
different cosmological models and find that the standard, COBE-normalized CDM
model cannot fit the observations of the Lyman-alpha forest at z=3. The
amplitude of the fluctuations in that model has to be lowered by a factor of
almost 3 to match observations. However, the currently viable cosmological
models like the lightly tilted COBE-normalized CDM+Lambda model, the CHDM model
with 20% neutrinos, and the low-amplitude Standard CDM model are all in
agreement with observations, to within the accuracy of our approximation, for
the value of the cosmological baryon density at or higher than the old Standard
Big Bang Nucleosynthesis value of 0.0125 for the currently favored value of
the ionizing radiation intensity. With the low value for the baryon density
inferred by Hogan & Rugers (1996), the models can only marginally match
observations. Comment: three postscript figures included, submitted to ApJ
The oscillating behavior of the pair correlation function in galaxies
The pair correlation function (PCF) for galaxies presents typical
oscillations in the range 20-200 Mpc/h, which are known as baryon acoustic
oscillations (BAO). We first review and test the oscillations of the PCF when
the 2D/3D vertices of the Poissonian Voronoi Tessellation (PVT) are considered.
We then model the behavior of the PCF at a small scale in the presence of an
self-gravitating medium having a line/plane of symmetry in 2D/3D. The analysis
of the PCF in an astrophysical context was split into two, adopting a
non-Poissonian Voronoi Tessellation (NPVT). We first analyzed the case of a 2D
cut which covers a few voids and a 2D cut which covers approximately 50 voids.
The obtained PCF in the case of many voids was then discussed in comparison to
the bootstrap predictions for a PVT process and the observed PCF for an
astronomical catalog. An approximate formula which connects the averaged
radius of the cosmic voids to the first minimum of the PCF is given. Comment: 19 pages, 14 figures