False discovery rate: setting the probability of false claim of detection
When testing multiple hypotheses in a survey (e.g. many different source
locations, template waveforms, and so on), the final result consists of a set
of confidence intervals, each one at a desired confidence level. But the
probability that at least one of these intervals does not cover the true value
increases with the number of trials. With a sufficiently large array of
confidence intervals, one can be sure that at least one is missing the true
value. In particular, the probability of a false claim of detection becomes
non-negligible. In order to compensate for this, one should increase the confidence
level, at the price of a reduced detection power. False discovery rate control
is a relatively new statistical procedure that bounds the number of mistakes
made when performing multiple hypothesis tests. We shall review this method,
discussing exercise applications to the field of gravitational wave surveys.
Comment: 7 pages, 3 tables, 3 figures. Prepared for the Proceedings of GWDAW 9
(http://lappc-in39.in2p3.fr/GWDAW9) A new section was added with a numerical
example, along with two tables and a figure related to the new section. Many
smaller revisions to improve readability
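The trade-off described in this abstract can be illustrated with a minimal Python sketch. It assumes independent tests and uses the Benjamini-Hochberg step-up rule, which is one common FDR-controlling procedure; the abstract does not say which procedure the paper reviews. The function names and the sample numbers are hypothetical, chosen only for illustration.

```python
def family_wise_error(alpha, m):
    """Probability of at least one false alarm among m independent tests,
    each run at per-test significance level alpha."""
    return 1.0 - (1.0 - alpha) ** m


def benjamini_hochberg(p_values, q):
    """Indices of the hypotheses rejected by the Benjamini-Hochberg
    step-up rule at target false discovery rate q.

    Assumption: the p-values come from independent tests; this is the
    textbook setting in which the rule bounds the expected fraction of
    false rejections by q.
    """
    m = len(p_values)
    # Rank hypotheses from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0  # largest rank k such that p_(k) <= k * q / m
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            cutoff = rank
    # Reject every hypothesis ranked at or below the cutoff.
    return sorted(order[:cutoff])
```

For example, `family_wise_error(0.01, 100)` is about 0.63, so a 1% per-test false alarm level gives no meaningful protection over 100 trials. By contrast, `benjamini_hochberg(p, q=0.05)` bounds the expected fraction of false claims among the reported detections, rather than the chance of any single false claim.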
Detection of Anomalous Reactor Activity Using Antineutrino Count Rate Evolution Over the Course of a Reactor Cycle
This paper analyzes the sensitivity of antineutrino count rate measurements
to changes in the fissile content of civil power reactors. Such measurements
may be useful in IAEA reactor safeguards applications. We introduce a
hypothesis testing procedure to identify statistically significant differences
between the antineutrino count rate evolution of a standard 'baseline' fuel
cycle and that of an anomalous cycle, in which plutonium is removed and
replaced with an equivalent fissile worth of uranium. The test would allow an
inspector to detect anomalous reactor activity, or to positively confirm that
the reactor is operating in a manner consistent with its declared fuel
inventory and power level. We show that with a reasonable choice of detector
parameters, the test can detect replacement of 73 kg of plutonium in 90 days
with 95% probability, while controlling the false positive rate at 5%. We show
that some improvement on this level of sensitivity may be expected by various
means, including use of the method in conjunction with existing reactor
safeguards methods. We also identify a necessary and sufficient daily
antineutrino count rate to achieve the quoted sensitivity, and list examples of
detectors in which such rates have been attained.
Comment: 9 pages, 7 figures, submitted to J. Appl. Phy
Detecting the Baryons in Matter Power Spectra
We examine power spectra from the Abell/ACO rich cluster survey and the 2dF
Galaxy Redshift Survey (2dFGRS) for observational evidence of features produced
by the baryons. A non-negligible baryon fraction produces relatively sharp
oscillatory features at specific wavenumbers in the matter power spectrum.
However, the mere existence of baryons will also produce a global suppression
of the power spectrum. We look for both of these features using the false
discovery rate (FDR) statistic. We show that the window effects on the
Abell/ACO power spectrum are minimal, which has allowed for the discovery of
discrete oscillatory features in the power spectrum. On the other hand, there
are no statistically significant oscillatory features in the 2dFGRS power
spectrum, which is expected from the survey's broad window function. After
accounting for window effects, we apply a scale-independent bias to the 2dFGRS
power spectrum, P_{Abell}(k) = b^2 P_{2dF}(k) with b = 3.2. We find that the
overall shapes of the Abell/ACO and the biased 2dFGRS power spectra are
entirely consistent over the range 0.02 <= k <= 0.15 h Mpc^-1. We examine the
range of Omega_{matter} and baryon fraction for which these surveys could
detect significant suppression in power. The reported baryon fractions for both
the Abell/ACO and 2dFGRS surveys are high enough to cause a detectable
suppression in power (after accounting for errors, windows and k-space
sampling). Using the same technique, we also examine, given the best fit baryon
density obtained from BBN, whether it is possible to detect additional
suppression due to dark matter-baryon interaction. We find that the limits on
dark matter cross section/mass derived from these surveys are the same as those
ruled out in a recent study by Chen, Hannestad and Scherrer.
Comment: 11 pages of text, 6 figures. Submitted to Ap
ExplainIt! -- A declarative root-cause analysis engine for time series data (extended version)
We present ExplainIt!, a declarative, unsupervised root-cause analysis engine
that uses time series monitoring data from large complex systems such as data
centres. ExplainIt! empowers operators to succinctly specify a large number of
causal hypotheses to search for causes of interesting events. ExplainIt! then
ranks these hypotheses, reducing the number of causal dependencies from
hundreds of thousands to a handful for human understanding. We show how a
declarative language, such as SQL, can be effective in declaratively
enumerating hypotheses that probe the structure of an unknown probabilistic
graphical causal model of the underlying system. Our thesis is that databases
are in a unique position to enable users to rapidly explore the possible causal
mechanisms in data collected from diverse sources. We empirically demonstrate
how ExplainIt! has helped us resolve over 30 performance issues in a commercial
product since late 2014, of which we discuss a few cases in detail.
Comment: SIGMOD Industry Track 201
Prediction of gene expression in human using rat in vivo gene expression in Japanese Toxicogenomics Project
On a random walk with memory and its relation to Markovian processes
We study a one-dimensional random walk with memory in which the step lengths
to the left and to the right evolve at each step in order to reduce the
wandering of the walker. The feedback is quite efficient and leads to a
non-diffusive walk. The time evolution of the displacement is given by an
equivalent Markovian dynamical process. The probability density for the
position of the walker is the same at any time as for a random walk with
shrinking steps, although the two-time correlation functions are quite
different.
Comment: 10 pages, 4 figures
The Inverse Shapley Value Problem
For a weighted voting scheme used by voters to choose between two
candidates, the \emph{Shapley-Shubik Indices} (or {\em Shapley values}) of the voters
provide a measure of how much control each voter can exert over the overall
outcome of the vote. Shapley-Shubik indices were introduced by Lloyd Shapley
and Martin Shubik in 1954 \cite{SS54} and are widely studied in social choice
theory as a measure of the "influence" of voters. The \emph{Inverse Shapley
Value Problem} is the problem of designing a weighted voting scheme which
(approximately) achieves a desired input vector of values for the
Shapley-Shubik indices. Despite much interest in this problem no provably
correct and efficient algorithm was known prior to our work.
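
As a concrete illustration of the quantity being inverted, here is a minimal Python sketch of the forward computation only: the exact Shapley-Shubik indices of a given weighted voting scheme, found by brute-force enumeration of voter orderings. It is not the paper's algorithm. The function name, the convention that a coalition wins when its weight reaches the quota, and the example weights are assumptions made for illustration. The enumeration is factorial in the number of voters, so it is only practical for very small schemes.

```python
from fractions import Fraction
from itertools import permutations


def shapley_shubik(weights, quota):
    """Exact Shapley-Shubik indices by enumerating all voter orderings.

    A voter is pivotal in an ordering if the running weight total first
    reaches the quota when that voter joins. The index of a voter is the
    fraction of orderings in which that voter is pivotal.

    Assumptions: weights are non-negative and 0 < quota <= sum(weights),
    so that every ordering has exactly one pivotal voter.
    """
    n = len(weights)
    pivots = [0] * n
    total_orderings = 0
    for ordering in permutations(range(n)):
        total_orderings += 1
        running = 0
        for voter in ordering:
            running += weights[voter]
            if running >= quota:
                pivots[voter] += 1
                break
    return [Fraction(count, total_orderings) for count in pivots]
```

With the hypothetical weights `[2, 1, 1]` and quota `3`, the indices come out as 2/3, 1/6 and 1/6: the heavy voter has half of the total weight but two thirds of the influence. The inverse problem runs in the other direction, starting from a target vector such as this one and asking for weights and a threshold that approximately realize it.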
We give the first efficient algorithm with provable performance guarantees
for the Inverse Shapley Value Problem. For any constant \eps > 0 our
algorithm runs in fixed \poly(n) time (the degree of the polynomial is
independent of \eps) and has the following performance guarantee: given as
input a vector of desired Shapley values, if any "reasonable" weighted voting
scheme (roughly, one in which the threshold is not too skewed) approximately
matches the desired vector of values to within some small error, then our
algorithm explicitly outputs a weighted voting scheme that achieves this vector
of Shapley values to within error \eps. If there is a "reasonable" voting
scheme in which all voting weights are integers at most \poly(n) that
approximately achieves the desired Shapley values, then our algorithm runs in
time \poly(n) and outputs a weighted voting scheme that achieves the target
vector of Shapley values to within error $\eps=n^{-1/8}$.
