Benford's Law, Values of L-functions and the 3x+1 Problem
We show the leading digits of a variety of systems satisfying certain
conditions follow Benford's Law. For each system proving this involves two main
ingredients. One is a structure theorem of the limiting distribution, specific
to the system. The other is a general technique of applying Poisson Summation
to the limiting distribution. We show the distribution of values of L-functions
near the central line and (in some sense) the iterates of the 3x+1 Problem are
Benford. Comment: 25 pages, 1 figure; replacement of earlier draft (corrected some
typos, added more exposition, added results for characteristic polynomials of
unitary matrices)
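As a purely empirical illustration of what "leading digits follow Benford's Law" means, the sketch below tabulates leading-digit frequencies along a 3x+1 orbit and sets them beside the Benford probabilities log10(1 + 1/d). It is not the sense of convergence used in the paper, and the function names are hypothetical; it also assumes the starting value is a positive integer whose orbit reaches 1.

```python
import math
from collections import Counter


def collatz_orbit(x0):
    """Return the 3x+1 orbit of x0 up to and including the first 1."""
    orbit = [x0]
    while orbit[-1] != 1:
        x = orbit[-1]
        orbit.append(x // 2 if x % 2 == 0 else 3 * x + 1)
    return orbit


def leading_digit(x):
    """Leading decimal digit of a positive integer."""
    return int(str(x)[0])


def benford_probabilities():
    """Benford's Law: P(leading digit = d) = log10(1 + 1/d)."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit_frequencies(values):
    """Empirical leading-digit frequencies of a list of positive integers."""
    counts = Counter(leading_digit(v) for v in values)
    total = sum(counts.values())
    return {d: counts.get(d, 0) / total for d in range(1, 10)}
```

Calling `leading_digit_frequencies(collatz_orbit(27))` and comparing the result with `benford_probabilities()` gives a rough feel for the claim; a single short orbit is only suggestive.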
Monte Carlo Methods for Top-k Personalized PageRank Lists and Name Disambiguation
We study a problem of quick detection of top-k Personalized PageRank lists.
This problem has a number of important applications such as finding local cuts
in large graphs, estimation of similarity distance and name disambiguation. In
particular, we apply our results to construct efficient algorithms for the
person name disambiguation problem. We argue that when finding top-k
Personalized PageRank lists two observations are important. First, it is
crucial to detect the top-k most important neighbours of a node quickly,
while the exact order within the top-k list and the exact PageRank values
are far less important. Second, a small number of wrong elements in a top-k
list does not really degrade its quality, but tolerating them can lead to
significant computational savings. Based on these two key observations we
propose Monte Carlo methods for fast detection of top-k Personalized PageRank
lists. We provide performance evaluation of the proposed methods and supply
stopping criteria. Then, we apply the methods to the person name disambiguation
problem. The developed algorithm for the person name disambiguation problem
achieved second place in the WePS 2010 competition.
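One common Monte Carlo estimator for Personalized PageRank counts visits along random walks that start at the seed node and stop with probability 1 - c at each step. The sketch below follows that idea under stated assumptions: the graph is an adjacency dictionary, walks also stop at nodes without out-links, and the function name, parameters and fixed walk count are hypothetical. It does not reproduce the paper's stopping criteria.

```python
import random
from collections import Counter


def mc_topk_ppr(graph, seed, k, n_walks=10000, damping=0.85, rng=None):
    """Estimate the top-k Personalized PageRank list of `seed` by random walks.

    graph: dict mapping a node to a list of its out-neighbours (assumed format).
    Returns (top_k_nodes, scores), where scores are normalised visit counts.
    """
    rng = rng or random.Random(0)
    visits = Counter()
    for _ in range(n_walks):
        node = seed
        while True:
            visits[node] += 1  # count every node visited along the walk
            neighbours = graph.get(node, [])
            # Stop at dangling nodes, or with probability 1 - damping.
            if not neighbours or rng.random() > damping:
                break
            node = rng.choice(neighbours)
    total = sum(visits.values())
    scores = {v: c / total for v, c in visits.items()}
    top_k = sorted(scores, key=scores.get, reverse=True)[:k]
    return top_k, scores
```

Because only membership in the top-k list matters, a run like this can be stopped long before the individual scores have converged, which is where the computational saving comes from.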
Combining domain knowledge and statistical models in time series analysis
This paper describes a new approach to time series modeling that combines
subject-matter knowledge of the system dynamics with statistical techniques in
time series analysis and regression. Applications to American option pricing
and the Canadian lynx data are given to illustrate this approach. Comment: Published at http://dx.doi.org/10.1214/074921706000001049 in the IMS
Lecture Notes Monograph Series
(http://www.imstat.org/publications/lecnotes.htm) by the Institute of
Mathematical Statistics (http://www.imstat.org)
On the variance of the number of occupied boxes
We consider the occupancy problem where balls are thrown independently at
infinitely many boxes with fixed positive frequencies. It is well known that
the random number of boxes occupied by the first n balls is asymptotically
normal if its variance V_n tends to infinity. In this work, we mainly focus on
the opposite case where V_n is bounded, and derive a simple necessary and
sufficient condition for convergence of V_n to a finite limit, thus settling a
long-standing question raised by Karlin in the seminal paper of 1967. One
striking consequence of our result is that the possible limit may only be a
positive integer. Some new conditions for other types of behavior of the
variance, like boundedness or convergence to infinity, are also obtained. The
proofs are based on poissonization techniques. Comment: 34 pages
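For readers who want to see V_n numerically, the variance of the number of occupied boxes can be written with indicator variables: box j is empty after n balls with probability (1 - p_j)^n, and boxes i and j are both empty with probability (1 - p_i - p_j)^n. The sketch below evaluates that exact formula on a truncated list of frequencies. The geometric frequencies and the truncation at 60 boxes are illustrative assumptions, the function names are hypothetical, and this direct computation is not the poissonization argument used in the paper.

```python
def occupancy_variance(freqs, n):
    """Exact variance of the number of occupied boxes after n balls.

    freqs: box frequencies p_j (a truncated list, assumed to sum to about 1).
    Uses Var = sum_j e_j (1 - e_j) + sum_{i != j} [(1 - p_i - p_j)^n - e_i e_j],
    where e_j = (1 - p_j)^n is the probability that box j is empty.
    """
    empty = [(1.0 - p) ** n for p in freqs]
    var = sum(e * (1.0 - e) for e in empty)
    for i, p in enumerate(freqs):
        for j, q in enumerate(freqs):
            if i != j:
                var += (1.0 - p - q) ** n - empty[i] * empty[j]
    return var


def geometric_freqs(n_boxes=60, ratio=0.5):
    """Illustrative frequencies p_j = (1 - ratio) * ratio**j, truncated."""
    return [(1.0 - ratio) * ratio ** j for j in range(n_boxes)]
```

Evaluating `occupancy_variance(geometric_freqs(), n)` over a range of n shows how V_n behaves for one particular choice of frequencies; other frequency sequences can be passed in the same way.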
Efficient implementation of Markov chain Monte Carlo when using an unbiased likelihood estimator
When an unbiased estimator of the likelihood is used within a
Metropolis--Hastings chain, it is necessary to trade off the number of Monte
Carlo samples used to construct this estimator against the asymptotic variances
of averages computed under this chain. Many Monte Carlo samples will typically
result in Metropolis--Hastings averages with lower asymptotic variances than
the corresponding Metropolis--Hastings averages using fewer samples. However,
the computing time required to construct the likelihood estimator increases
with the number of Monte Carlo samples. Under the assumption that the
distribution of the additive noise introduced by the log-likelihood estimator
is Gaussian with variance inversely proportional to the number of Monte Carlo
samples and independent of the parameter value at which it is evaluated, we
provide guidelines on the number of samples to select. We demonstrate our
results by considering a stochastic volatility model applied to stock index
returns. Comment: 34 pages, 9 figures, 3 tables
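The trade-off described above can be explored with a toy pseudo-marginal Metropolis-Hastings sampler. The sketch below mimics the stated assumption directly: the log-likelihood estimate is the exact log target plus Gaussian noise with variance gamma / N, and the noise mean is set to minus half that variance so that the likelihood estimate itself is unbiased. The function name, the random-walk proposal, and the parameters `gamma` and `step` are hypothetical; this is not the stochastic volatility model from the paper.

```python
import math
import random


def pseudo_marginal_mh(log_target, n_iters, n_samples,
                       gamma=1.0, step=1.0, seed=0):
    """Random-walk Metropolis-Hastings with a noisy log-target estimate.

    Noise variance is gamma / n_samples (assumed form). The noise mean is
    -variance / 2, so exp(noise) has expectation 1.
    Returns (chain, acceptance_rate).
    """
    rng = random.Random(seed)
    sigma = math.sqrt(gamma / n_samples)

    def noisy_log_target(theta):
        return log_target(theta) + rng.gauss(-0.5 * sigma ** 2, sigma)

    theta = 0.0
    log_est = noisy_log_target(theta)
    chain, accepted = [], 0
    for _ in range(n_iters):
        proposal = theta + step * rng.gauss(0.0, 1.0)
        log_est_prop = noisy_log_target(proposal)
        # Keep the current estimate fixed; only the proposal is re-estimated.
        u = 1.0 - rng.random()  # lies in (0, 1], so log(u) is defined
        if math.log(u) < log_est_prop - log_est:
            theta, log_est = proposal, log_est_prop
            accepted += 1
        chain.append(theta)
    return chain, accepted / n_iters
```

Running it with `log_target=lambda t: -0.5 * t * t` for several values of `n_samples` lets one compare acceptance rates against the cost of larger N, which is the balance the guidelines address.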
Assessing consistency of fish survey data : uncertainties in the estimation of mackerel icefish (Champsocephalus gunnari) abundance at South Georgia
Acknowledgments: The authors wish to thank the crews, fishermen and scientists who conducted the various surveys from which data were obtained, and Mark Belchier and Simeon Hill for their contributions. This work was supported by the Government of South Georgia and South Sandwich Islands. Additional logistical support was provided by The South Atlantic Environmental Research Institute, with thanks to Paul Brickle. Thanks to Stephen Smith of Fisheries and Oceans Canada (DFO) for help in constructing bootstrap confidence limits. Paul Fernandes receives funding from the MASTS pooling initiative (The Marine Alliance for Science and Technology for Scotland), and their support is gratefully acknowledged. MASTS is funded by the Scottish Funding Council (grant reference HR09011) and contributing institutions. We also wish to thank two anonymous referees for their helpful suggestions on earlier versions of this manuscript. Peer reviewed. Postprint.