A Framework for Monte Carlo based Multiple Testing
We are concerned with a situation in which we would like to test multiple
hypotheses with tests whose p-values cannot be computed explicitly but can be
approximated using Monte Carlo simulation. This scenario occurs widely in
practice. We are interested in obtaining the same rejections and non-rejections
as the ones obtained if the p-values for all hypotheses had been available. The
present article introduces a framework for this scenario by providing a generic
algorithm for a general multiple testing procedure. We establish conditions
which guarantee that the rejections and non-rejections obtained through Monte
Carlo simulations are identical to the ones obtained with the p-values. Our
framework is applicable to a general class of step-up and step-down procedures
which includes many established multiple testing corrections such as the ones
of Bonferroni, Holm, Šidák, Hochberg or Benjamini-Hochberg. Moreover, we show
how to use our framework to improve algorithms available in the literature in
such a way as to yield theoretical guarantees on their results. These
modifications can easily be implemented in practice and lead to a particular
way of reporting multiple testing results as three sets together with an error
bound on their correctness, which we demonstrate on a real biological dataset.
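To make the three-set reporting concrete, the following is a minimal sketch of the idea, not the article's actual algorithm: per-hypothesis Clopper-Pearson intervals around the Monte Carlo p-value estimates are fed to the Benjamini-Hochberg step-up procedure, whose monotonicity splits the hypotheses into surely rejected, surely non-rejected, and undecided sets. The names three_set_bh, exceed, draws, and eps are ours, and the per-hypothesis level eps would still need a union bound to yield an overall error bound.

```python
import numpy as np
from scipy.stats import beta

def three_set_bh(exceed, draws, alpha=0.1, eps=0.01):
    """Split hypotheses into rejected / non-rejected / undecided sets.

    exceed[i]: Monte Carlo statistics at least as extreme as the observed
               one for hypothesis i; draws[i]: samples drawn for it.
    """
    exceed, draws = np.asarray(exceed), np.asarray(draws)
    m = len(exceed)
    # Clopper-Pearson bounds on each unknown p-value, at level eps each.
    lo = np.nan_to_num(beta.ppf(eps / 2, exceed, draws - exceed + 1), nan=0.0)
    hi = np.nan_to_num(beta.ppf(1 - eps / 2, exceed + 1, draws - exceed), nan=1.0)

    def bh_mask(pvals):
        # Benjamini-Hochberg step-up: reject the k smallest p-values,
        # where k is the largest rank with p_(k) <= alpha * k / m.
        order = np.argsort(pvals)
        ok = np.nonzero(pvals[order] <= alpha * np.arange(1, m + 1) / m)[0]
        k = ok.max() + 1 if ok.size else 0
        mask = np.zeros(m, dtype=bool)
        mask[order[:k]] = True
        return mask

    # BH is monotone in the p-values: the upper bounds give rejections
    # the true p-values would also yield; the lower bounds give a superset.
    surely_rejected = bh_mask(hi)
    possibly_rejected = bh_mask(lo)
    undecided = possibly_rejected & ~surely_rejected
    return surely_rejected, ~possibly_rejected, undecided
```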
QuickMMCTest - Quick Multiple Monte Carlo Testing
Multiple hypothesis testing is widely used to evaluate scientific studies
involving statistical tests. However, for many of these tests, p-values are not
available and are thus often approximated using Monte Carlo tests such as
permutation tests or bootstrap tests. This article presents a simple algorithm
based on Thompson Sampling to test multiple hypotheses. It works with arbitrary
multiple testing procedures, in particular with step-up and step-down
procedures. Its main feature is to sequentially allocate Monte Carlo effort,
generating more Monte Carlo samples for tests whose decisions are so far less
certain. A simulation study demonstrates that for a low computational effort,
the new approach yields a higher power and a higher degree of reproducibility
of its results than previously suggested methods.
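A minimal sketch of the sequential-allocation idea follows, with Bonferroni standing in for an arbitrary step-up or step-down procedure. This is not the published algorithm; allocate_round and all tuning constants are ours. Each round draws plausible p-values from Beta posteriors, re-runs the procedure on each draw, and directs the next batch of Monte Carlo samples toward the hypotheses whose decisions fluctuate the most.

```python
import numpy as np

rng = np.random.default_rng(0)

def allocate_round(exceed, draws, alpha=0.05, n_posterior=100, batch=1000):
    """One Thompson-Sampling-flavoured allocation round."""
    exceed, draws = np.asarray(exceed), np.asarray(draws)
    m = len(exceed)
    # Beta(k+1, n-k+1) posterior for each unknown p-value (uniform prior).
    samples = rng.beta(exceed + 1, draws - exceed + 1, size=(n_posterior, m))
    decisions = samples <= alpha / m       # Bonferroni on each posterior draw
    reject_rate = decisions.mean(axis=0)   # fraction of draws rejecting
    # Rates near 0 or 1 are stable decisions; rates near 0.5 are uncertain.
    uncertainty = reject_rate * (1 - reject_rate)
    if uncertainty.sum() > 0:
        weights = uncertainty / uncertainty.sum()
    else:
        weights = np.full(m, 1.0 / m)      # all stable: spread evenly
    return rng.multinomial(batch, weights) # new samples per hypothesis
```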
Statistical Methods for Monte-Carlo based Multiple Hypothesis Testing
Statistical hypothesis testing is a key technique to perform statistical inference. The main focus of this work is to investigate multiple testing under the assumption that the analytical p-values underlying the tests for all hypotheses are unknown. Instead, we assume that they can be approximated by drawing Monte Carlo samples under the null.
The first part of this thesis focuses on the computation of test results with a guarantee on their correctness, that is, decisions on multiple hypotheses that are identical to the ones obtained with the unknown p-values. We present MMCTest, an algorithm implementing a multiple testing procedure that yields correct decisions on all hypotheses (up to a pre-specified error probability) based solely on Monte Carlo simulation. MMCTest offers novel ways to evaluate multiple hypotheses, as it allows one to obtain the (previously unknown) correct decision on hypotheses (for instance, genes) in real data studies (again up to an error probability pre-specified by the user).
The ideas behind MMCTest are generalised in a framework for Monte Carlo based multiple testing, demonstrating that existing methods giving no guarantees on their test results can be modified to yield certain theoretical guarantees on the correctness of their outputs.
The second part deals with multiple testing from a practical perspective. In practice, one might prefer to forgo the additional computational effort needed to obtain guaranteed decisions and to invest it instead in computing a more accurate ad-hoc test result. This is the aim of QuickMMCTest, an algorithm which adaptively allocates more samples to hypotheses whose decisions are more prone to random fluctuations, thereby achieving improved accuracy.
This work also derives the optimal allocation of a finite number of samples to finitely many hypotheses under a normal approximation, where the optimal allocation is understood as the one minimising the expected number of erroneously classified hypotheses (with respect to the classification based on the analytical p-values). An empirical comparison of the optimal allocation of samples with the one computed by QuickMMCTest indicates that the behaviour of QuickMMCTest might not be far from optimal.
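The Monte Carlo approximation underlying this whole setting can be sketched generically as follows (a permutation-test example with the standard +1 finite-sample correction; function names are ours):

```python
import numpy as np

rng = np.random.default_rng(1)

def mc_pvalue(stat, null_sampler, n=10_000):
    """Approximate a p-value by Monte Carlo: draw test statistics under
    the null and count how often they are at least as extreme as the
    observed one. The +1 correction keeps the estimate a valid p-value."""
    exceed = sum(null_sampler() >= stat for _ in range(n))
    return (exceed + 1) / (n + 1)

def perm_pvalue(x, y, n=10_000):
    """Permutation p-value for a difference in group means."""
    obs = abs(x.mean() - y.mean())
    pooled = np.concatenate([x, y])
    def draw():
        perm = rng.permutation(pooled)
        return abs(perm[:len(x)].mean() - perm[len(x):].mean())
    return mc_pvalue(obs, draw, n)
```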
Penalized Principal Component Analysis using Nesterov Smoothing
Principal components computed via PCA (principal component analysis) are
traditionally used to reduce dimensionality in genomic data or to correct for
population stratification. In this paper, we explore the penalized eigenvalue
problem (PEP) which reformulates the computation of the first eigenvector as an
optimization problem and adds an L1 penalty constraint. The contribution of our
article is threefold. First, we extend PEP by applying Nesterov smoothing to
the original LASSO-type L1 penalty. This allows one to compute analytical
gradients which enable faster and more efficient minimization of the objective
function associated with the optimization problem. Second, we demonstrate how
higher order eigenvectors can be calculated with PEP using established results
from singular value decomposition (SVD). Third, using data from the 1000 Genome
Project dataset, we empirically demonstrate that our proposed smoothed PEP
allows one to increase numerical stability and obtain meaningful eigenvectors.
We further investigate the utility of the penalized eigenvector approach over
traditional PCA.
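As a rough illustration of the approach (not the paper's exact objective or optimiser), Nesterov smoothing replaces the absolute value |t| by a Huber-like function, t^2/(2*mu) near zero and |t| - mu/2 elsewhere, whose gradient is t/mu clipped to [-1, 1]. This closed-form gradient allows plain projected gradient ascent for the first penalized eigenvector; penalized_first_pc and all constants below are ours.

```python
import numpy as np

def smoothed_l1_grad(x, mu):
    """Gradient of the Nesterov-smoothed absolute value."""
    return np.clip(x / mu, -1.0, 1.0)

def penalized_first_pc(S, lam=0.1, mu=1e-3, lr=1e-2, iters=2000):
    """Sketch: maximise x' S x - lam * smoothed ||x||_1 over the unit
    sphere by projected gradient ascent. S is a (p x p) covariance."""
    rng = np.random.default_rng(0)
    x = rng.standard_normal(S.shape[0])
    x /= np.linalg.norm(x)
    for _ in range(iters):
        grad = 2 * S @ x - lam * smoothed_l1_grad(x, mu)
        x = x + lr * grad
        x /= np.linalg.norm(x)  # project back onto the unit sphere
    return x
```

Higher-order eigenvectors would then follow by deflating S with the recovered component, in line with the SVD-based extension the abstract mentions.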
Initial state encoding via reverse quantum annealing and h-gain features
Quantum annealing is a specialized type of quantum computation that aims to
use quantum fluctuations in order to obtain global minimum solutions of
combinatorial optimization problems. D-Wave Systems, Inc., manufactures quantum
annealers, which are available as cloud computing resources, and allow users to
program the anneal schedules used in the annealing computation. In this paper,
we are interested in improving the quality of the solutions returned by a
quantum annealer by encoding an initial state. We explore two D-Wave features
allowing one to encode such an initial state: the reverse annealing and the
h-gain features. Reverse annealing (RA) aims to refine a known solution
following an anneal path starting with a classical state representing a good
solution, going backwards to a point where a transverse field is present, and
then finishing the annealing process with a forward anneal. The h-gain (HG)
feature allows one to put a time-dependent weighting scheme on the linear (h)
biases of the Hamiltonian, and we demonstrate that this feature likewise can be
used to bias the annealing to start from an initial state. We also consider a
hybrid method consisting of a backward phase resembling RA, and a forward phase
using the HG initial state encoding. Importantly, we investigate the idea of
iteratively applying RA and HG to a problem, with the goal of monotonically
improving on an initial state that is not optimal. The HG encoding technique is
evaluated on a variety of input problems including the weighted Maximum Cut
problem and the weighted Maximum Clique problem, demonstrating that the HG
technique is a viable alternative to RA for some problems. We also investigate
how the iterative procedures perform for both RA and HG initial state encoding
on random spin glasses with the native connectivity of the D-Wave Chimera and
Pegasus chips.
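With D-Wave's Ocean SDK, both features are exposed as solver parameters. The sketch below is illustrative only: the qubit indices, schedule points and gain values are ours and hardware-dependent, and access to a D-Wave Leap solver is assumed.

```python
from dwave.system import DWaveSampler

sampler = DWaveSampler()  # requires D-Wave Leap cloud access

# Toy Ising problem; in practice h and J must live on active qubits
# and couplers of the chip (indices here are illustrative).
h = {0: -1.0, 4: 1.0}
J = {(0, 4): -1.0}
initial = {0: 1, 4: 1}  # known classical state to refine

# Reverse annealing: start fully annealed (s = 1), go back to an
# intermediate s where quantum fluctuations act, pause, anneal forward.
reverse_schedule = [(0.0, 1.0), (5.0, 0.45), (15.0, 0.45), (20.0, 1.0)]
rev = sampler.sample_ising(h, J,
                           anneal_schedule=reverse_schedule,
                           initial_state=initial,
                           reinitialize_state=True,
                           num_reads=100)

# h-gain: a time-dependent multiplier g(t) on the linear (h) biases.
# A large initial gain biases the system toward the state encoded in
# h; relaxing the gain to 1 lets the original problem take over.
h_gain_schedule = [(0.0, 5.0), (10.0, 1.0), (20.0, 1.0)]
hg = sampler.sample_ising(h, J,
                          h_gain_schedule=h_gain_schedule,
                          num_reads=100)
```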
Advanced unembedding techniques for quantum annealers
The D-Wave quantum annealers make it possible to obtain high quality
solutions of NP-hard problems by mapping a problem in a QUBO (quadratic
unconstrained binary optimization) or Ising form to the physical qubit
connectivity structure on the D-Wave chip. However, the latter is restricted in
that only a fraction of all pairwise couplers between physical qubits exists.
Modeling the connectivity structure of a given problem instance thus
necessitates the computation of a minor embedding of the variables in the
problem specification onto the logical qubits, which consist of several
physical qubits "chained" together to act as a logical one. After annealing, it
is however not guaranteed that all chained qubits get the same value (-1 or +1
for an Ising model, and 0 or 1 for a QUBO), and several approaches exist to
assign a final value to each logical qubit (a process called "unembedding"). In
this work, we present tailored unembedding techniques for four important
NP-hard problems: the Maximum Clique, Maximum Cut, Minimum Vertex Cover, and
Graph Partitioning problems. Our techniques are simple and yet make use of
structural properties of the problem being solved. Using Erdős-Rényi
random graphs as inputs, we compare our unembedding techniques to three popular
ones (majority vote, random weighting, and minimize energy). We demonstrate
that our proposed algorithms outperform the currently available ones in that
they yield solutions of better quality, while being computationally equally
efficient.
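For reference, the majority-vote baseline, together with an illustrative problem-aware variant for Minimum Vertex Cover (our own simple rule, not necessarily the paper's), can be sketched as follows:

```python
from collections import Counter

def majority_vote_unembed(sample, embedding):
    """Baseline unembedding: each logical qubit takes the value held by
    the majority of the physical qubits in its chain (ties broken
    arbitrarily here).

    sample:    dict physical_qubit -> spin (-1 or +1)
    embedding: dict logical_var -> iterable of physical qubits
    """
    return {var: Counter(sample[q] for q in chain).most_common(1)[0][0]
            for var, chain in embedding.items()}

def vertex_cover_unembed(sample, embedding):
    """Illustrative problem-aware variant for Minimum Vertex Cover:
    resolve any broken chain by putting the vertex INTO the cover
    (+1 under this encoding), which can never break feasibility."""
    logical = {}
    for var, chain in embedding.items():
        values = {sample[q] for q in chain}
        logical[var] = values.pop() if len(values) == 1 else +1
    return logical
```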
Inferring the Dynamics of the State Evolution During Quantum Annealing
To solve an optimization problem using a commercial quantum annealer, one has
to represent the problem of interest as an Ising or a quadratic unconstrained
binary optimization (QUBO) problem and submit its coefficients to the annealer,
which then returns a user-specified number of low-energy solutions. It would be
useful to know what happens in the quantum processor during the anneal process
so that one could design better algorithms or suggest improvements to the
hardware. However, existing quantum annealers are not able to directly extract
such information from the processor. Hence, in this work we propose to use
advanced features of D-Wave 2000Q to indirectly infer information about the
dynamics of the state evolution during the anneal process. Specifically, D-Wave
2000Q allows the user to customize the anneal schedule, that is, the schedule
with which the anneal fraction is changed from the start to the end of the
anneal. Using this feature, we design a set of modified anneal schedules whose
outputs can be used to generate information about the states of the system at
user-defined time points during a standard anneal. With this process, called
"slicing", we obtain approximate distributions of lowest-energy anneal
solutions as the anneal time evolves. We use our technique to obtain a variety
of insights into the annealer, such as the state evolution during annealing,
when individual bits in an evolving solution flip during the anneal process and
when they stabilize, and we introduce a technique to estimate the freeze-out
point of both the system and of individual qubits.
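A sketch of how such a family of sliced schedules could be constructed (the paper's exact modified schedules may differ, and all timings below are illustrative): each schedule follows the standard linear anneal up to a slice point s*, then quenches quickly to s = 1, so that readout after the quench approximates the state distribution near s*.

```python
def sliced_schedules(total_time=20.0, quench_time=1.0, n_slices=10):
    """Build anneal schedules, one per slice point s* = k / n_slices,
    as lists of (time_us, anneal_fraction) pairs. Hardware constraints
    such as maximum quench slopes are ignored in this sketch."""
    schedules = []
    for k in range(1, n_slices + 1):
        s_star = k / n_slices
        t_star = s_star * total_time
        if s_star < 1.0:
            # Normal anneal to s*, then a rapid quench to s = 1.
            schedules.append([(0.0, 0.0), (t_star, s_star),
                              (t_star + quench_time, 1.0)])
        else:
            # Final slice is just the unmodified standard anneal.
            schedules.append([(0.0, 0.0), (total_time, 1.0)])
    return schedules
```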