Statistical thinking: From Tukey to Vardi and beyond
Data miners (minors?) and neural networkers tend to eschew modelling, misled
perhaps by misinterpretation of strongly expressed views of John Tukey. I
discuss Vardi's views of these issues as well as other aspects of Vardi's work
in emission tomography and in sampling bias.
Comment: Published at http://dx.doi.org/10.1214/074921707000000210 in the IMS Lecture Notes Monograph Series (http://www.imstat.org/publications/lecnotes.htm) by the Institute of Mathematical Statistics (http://www.imstat.org).
Permutation graphs, fast forward permutations, and sampling the cycle structure of a permutation
A permutation P on {1,...,N} is a fast forward permutation if for each m the computational complexity of evaluating P^m(x) is small, independently of m and
x. Naor and Reingold constructed fast forward pseudorandom cycluses and
involutions. By studying the evolution of permutation graphs, we prove that the
number of queries needed to distinguish a random cyclus from a random
permutation on {1,...,N} is Theta(N) if one does not use queries of the form
P^m(x), but is only Theta(1) if one is allowed to make such queries.
We construct fast forward permutations which are indistinguishable from
random permutations even when queries of the form P^m(x) are allowed. This is
done by introducing an efficient method to sample the cycle structure of a
random permutation, which in turn solves an open problem of Naor and Reingold.Comment: Corrected a small erro
Distinct counting with a self-learning bitmap
Counting the number of distinct elements (cardinality) in a dataset is a
fundamental problem in database management. In recent years, driven by many modern applications, there has been significant interest in addressing the distinct counting problem in a data stream setting, where each incoming data item can be seen only once and cannot be stored for long periods of time. Many probabilistic approaches based on either sampling or sketching have been proposed in the computer science literature that require only limited computing and memory resources. However, the performance of these methods is
not scale-invariant, in the sense that their relative root mean square
estimation errors (RRMSE) depend on the unknown cardinalities. This is not
desirable in many applications where cardinalities can be very dynamic or
inhomogeneous and many cardinalities need to be estimated. In this paper, we
develop a novel approach, called the self-learning bitmap (S-bitmap), which is scale-invariant for cardinalities in a specified range. The S-bitmap uses a binary
vector whose entries are updated from 0 to 1 by an adaptive sampling process
for inferring the unknown cardinality, where the sampling rates are reduced
sequentially as more and more entries change from 0 to 1. We prove rigorously
that the S-bitmap estimate is not only unbiased but scale-invariant. We
demonstrate that to achieve a given small RRMSE, our approach requires significantly less memory and performs similar or fewer operations than state-of-the-art methods at many cardinality scales common in practice. Both simulation and experimental studies are reported.
Comment: Journal of the American Statistical Association (accepted).
Chain Plot: A Tool for Exploiting Bivariate Temporal Structures
In this paper we present a graphical tool for visualizing the cyclic behaviour of bivariate time series. We investigate its properties and link it to the asymmetry of the two variables concerned. We also suggest adding approximate confidence bounds to the points on the plot and investigate the effect of lagging on the chain plot. We conclude with some standard Fourier analysis, relating and comparing it to the chain plot.
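A minimal sketch of such a plot, assuming (as we read the abstract) that the chain plot joins the bivariate observations (x_t, y_t) in time order, so a shared cycle appears as a loop and lead/lag asymmetry shows in the loop's shape and traversal direction:

    import numpy as np
    import matplotlib.pyplot as plt

    t = np.linspace(0, 6 * np.pi, 300)
    x = np.cos(t) + 0.1 * np.random.randn(300)         # noisy common cycle
    y = np.sin(t - 0.5) + 0.1 * np.random.randn(300)   # y lags x

    # join successive (x_t, y_t) points in time order; a shared cycle
    # traces out loops rather than a shapeless cloud
    plt.plot(x, y, "-o", markersize=2, linewidth=0.5)
    plt.xlabel("x_t")
    plt.ylabel("y_t")
    plt.title("Chain plot of a bivariate series with a common cycle")
    plt.show()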
Drift rate control of a Brownian processing system
A system manager dynamically controls a diffusion process Z that lives in a
finite interval [0,b]. Control takes the form of a negative drift rate \theta
that is chosen from a fixed set A of available values. The controlled process
evolves according to the differential relationship dZ=dX-\theta(Z) dt+dL-dU,
where X is a (0,\sigma) Brownian motion, and L and U are increasing processes
that enforce a lower reflecting barrier at Z=0 and an upper reflecting barrier
at Z=b, respectively. The cumulative cost process increases according to the
differential relationship d\xi =c(\theta(Z)) dt+p dU, where c(\cdot) is a
nondecreasing cost of control and p>0 is a penalty rate associated with
displacement at the upper boundary. The objective is to minimize long-run
average cost. This problem is solved explicitly, which allows one to also solve
the following, essentially equivalent formulation: minimize the long-run
average cost of control subject to an upper bound constraint on the average
rate at which U increases. The two special problem features that allow an
explicit solution are the use of a long-run average cost criterion, as opposed
to a discounted cost criterion, and the lack of state-related costs other than
boundary displacement penalties. The application of this theory to power
control in wireless communication is discussed.
Comment: Published at http://dx.doi.org/10.1214/105051604000000855 in the Annals of Applied Probability (http://www.imstat.org/aap/) by the Institute of Mathematical Statistics (http://www.imstat.org).
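As a rough illustration of the controlled dynamics (not the paper's optimal policy), the Euler scheme below simulates the reflected diffusion under a hypothetical threshold policy: the drift switches from a cheap low value to a costly high value when Z exceeds a trigger level z*, and the boundary penalty p dU accrues on reflection at b. All parameter values are assumptions.

    import numpy as np

    # Euler scheme for dZ = dX - theta(Z) dt + dL - dU on [0, b]
    b, sigma, dt, T = 1.0, 0.5, 1e-3, 100.0
    A = {"low": 0.1, "high": 1.0}          # available drift rates (the set A)
    c = {"low": 0.0, "high": 2.0}          # cost of control c(theta)
    p, z_star = 5.0, 0.7                   # boundary penalty, trigger level

    rng = np.random.default_rng(0)
    z, cost, U = 0.5, 0.0, 0.0
    for _ in range(int(T / dt)):
        mode = "high" if z > z_star else "low"
        cost += c[mode] * dt               # running cost c(theta(Z)) dt
        z += -A[mode] * dt + sigma * np.sqrt(dt) * rng.standard_normal()
        if z < 0.0:                        # lower reflecting barrier (L pushes up)
            z = 0.0
        if z > b:                          # upper reflecting barrier (U pushes down)
            U += z - b
            cost += p * (z - b)            # penalty term p dU
            z = b
    print("average cost:", cost / T, " total U:", U)

Comparing the average cost across trigger levels z* gives a feel for the trade-off the paper resolves exactly: aggressive (costly) drift keeps Z away from the upper boundary, while cheap drift incurs more boundary displacement penalty.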
ROC and the bounds on tail probabilities via theorems of Dubins and F. Riesz
For independent X and Y in the inequality P(X \le Y + \mu), we give sharp
lower bounds for unimodal distributions having finite variance, and sharp upper
bounds assuming symmetric densities bounded by a finite constant. The lower
bounds depend on a result of Dubins about extreme points and the upper bounds
depend on a symmetric rearrangement theorem of F. Riesz. The inequality was
motivated by medical imaging: find bounds on the area under the Receiver
Operating Characteristic curve (ROC).
Comment: Published at http://dx.doi.org/10.1214/08-AAP536 in the Annals of Applied Probability (http://www.imstat.org/aap/) by the Institute of Mathematical Statistics (http://www.imstat.org).
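To state the ROC connection explicitly (a standard fact, added here for context): if X and Y are independent scores from the two populations being discriminated, with continuous distributions, then the area under the ROC curve is

    \[ \mathrm{AUC} \;=\; \Pr(X \le Y) \;=\; \Pr(X - Y \le 0), \]

so sharp lower and upper bounds on tail probabilities of the form P(X \le Y + \mu), under the shape constraints above, translate directly into bounds on the attainable AUC.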
Spatial methods for event reconstruction in CLEAN
In CLEAN (Cryogenic Low Energy Astrophysics with Noble gases), a proposed
neutrino and dark matter detector, background discrimination is possible if one
can determine the location of an ionizing radiation event with high accuracy.
We simulate ionizing radiation events that produce multiple scintillation
photons within a spherical detection volume filled with liquid neon. We
estimate the radial location of a particular ionizing radiation event based on
the observed count data corresponding to that event. The count data are
collected by detectors mounted at the spherical boundary of the detection
volume. We neglect absorption, but account for Rayleigh scattering. To account
for wavelength-shifting of the scintillation light, we assume that photons are
absorbed and re-emitted at the detectors. Here, we develop spatial maximum likelihood methods for event reconstruction, and study their performance in
computer simulation experiments. We also study a method based on the centroid
of the observed count data. We calibrate our estimates based on training data
- …
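A brief Python sketch of the centroid method mentioned above (the function name and the calibration step are our assumptions; the abstract describes calibrating estimates against training data):

    import numpy as np

    def centroid_radius(detector_xyz, counts):
        # detector_xyz: (K, 3) detector positions on the spherical boundary;
        # counts: (K,) photon counts observed for one event
        w = counts / counts.sum()
        centroid = (w[:, None] * detector_xyz).sum(axis=0)
        return np.linalg.norm(centroid)    # raw (biased) radial statistic

    # toy demo with 6 hypothetical detectors on an equatorial ring
    angles = np.linspace(0, 2 * np.pi, 6, endpoint=False)
    dets = np.stack([np.cos(angles), np.sin(angles), np.zeros(6)], axis=1)
    print(centroid_radius(dets, np.array([30.0, 25, 10, 5, 8, 22])))
    # in practice the raw statistic would be mapped through a calibration
    # curve fitted on simulated training events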
