Reverse-Safe Data Structures for Text Indexing
We introduce the notion of reverse-safe data structures. These are data structures that prevent the reconstruction of the data they encode (i.e., they cannot be easily reversed). A data structure D is called z-reverse-safe when there exist at least z datasets with the same set of answers as the ones stored by D. The main challenge is to ensure that D stores as many answers to useful queries as possible, is constructed efficiently, and has size close to the size of the original dataset it encodes. Given a text of length n and an integer z, we propose an algorithm which constructs a z-reverse-safe data structure that has size O(n) and answers pattern matching queries of length at most d optimally, where d is maximal for any such z-reverse-safe data structure. The construction algorithm takes O(n^ω log d) time, where ω is the matrix multiplication exponent. We show that, despite the n^ω factor, our engineered implementation takes only a few minutes to finish for million-letter texts. We further show that plugging our method into data analysis applications gives insignificant or no data utility loss. Finally, we show how our technique can be extended to support applications under a realistic adversary model.
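The definition of z-reverse-safety can be illustrated on toy inputs by brute force. The sketch below is not the paper's algorithm. It assumes, for illustration only, that the stored "answers" are the occurrence counts of every pattern of length at most d, and it enumerates all texts of the same length over a given alphabet. The function names `substring_counts` and `consistent_texts` are hypothetical.

```python
from collections import Counter
from itertools import product


def substring_counts(text, d):
    """Occurrence count of every substring of length 1..d (assumed query answers)."""
    counts = Counter()
    for k in range(1, d + 1):
        for i in range(len(text) - k + 1):
            counts[text[i:i + k]] += 1
    return counts


def consistent_texts(text, d, alphabet):
    """All texts of the same length whose answers match those of `text`.

    Exponential in len(text); only meant for tiny examples.
    """
    target = substring_counts(text, d)
    candidates = ("".join(c) for c in product(alphabet, repeat=len(text)))
    return [t for t in candidates if substring_counts(t, d) == target]


# With d = 1 three texts share the same answers, so an index limited to
# length-1 patterns is 3-reverse-safe for "aab" under this toy model.
print(consistent_texts("aab", 1, "ab"))
# With d = 2 only the original text remains.
print(consistent_texts("aab", 2, "ab"))
```

Under these assumptions, the largest d that keeps at least z = 3 consistent texts for "aab" is d = 1, which mirrors the trade-off between query length and safety described in the abstract.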
Exact reconstruction with directional wavelets on the sphere
A new formalism is derived for the analysis and exact reconstruction of
band-limited signals on the sphere with directional wavelets. It represents an
evolution of the wavelet formalism developed by Antoine & Vandergheynst (1999)
and Wiaux et al. (2005). The translations of the wavelets at any point on the
sphere and their proper rotations are still defined through the continuous
three-dimensional rotations. The dilations of the wavelets are directly defined
in harmonic space through a new kernel dilation, which is a modification of an
existing harmonic dilation. A family of factorized steerable functions with
compact harmonic support which are suitable for this kernel dilation is firstly
identified. A scale discretized wavelet formalism is then derived, relying on
this dilation. The discrete nature of the analysis scales allows the exact
reconstruction of band-limited signals. A corresponding exact multi-resolution
algorithm is finally described and an implementation is tested. The formalism
is of interest notably for the denoising or the deconvolution of signals on the
sphere with a sparse expansion in wavelets. In astrophysics, it finds a
particular application for the identification of localized directional features
in the cosmic microwave background (CMB) data, such as the imprint of
topological defects, in particular cosmic strings, and for their reconstruction
after separation from the other signal components.
Comment: 22 pages, 2 figures. Version 2 matches version accepted for
publication in MNRAS. Version 3 (identical to version 2) posted for code
release announcement - "Steerable scale discretised wavelets on the sphere" -
S2DW code available for download at
http://www.mrao.cam.ac.uk/~jdm57/software.htm
Safe and complete contig assembly via omnitigs
Contig assembly is the first stage that most assemblers solve when
reconstructing a genome from a set of reads. Its output consists of contigs --
a set of strings that are promised to appear in any genome that could have
generated the reads. From the introduction of contigs 20 years ago, assemblers
have tried to obtain longer and longer contigs, but the following question was
never solved: given a genome graph (e.g. a de Bruijn, or a string graph),
what are all the strings that can be safely reported from it as contigs? In
this paper we finally answer this question, and also give a polynomial time
algorithm to find them. Our experiments show that these strings, which we call
omnitigs, are 66% to 82% longer on average than the popular unitigs, and 29% of
dbSNP locations have more neighbors in omnitigs than in unitigs.
Comment: Full version of the paper in the proceedings of RECOMB 201
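For context on the baseline the abstract compares against, here is a minimal sketch of unitig extraction from a de Bruijn graph. It does not compute omnitigs. It assumes an edge-centric graph in which nodes are (k-1)-mers and each distinct k-mer is an edge, and it reports maximal non-branching paths. The function names are hypothetical, and cycles made only of non-branching nodes are deliberately left out.

```python
from collections import defaultdict


def de_bruijn(kmers):
    """Build an edge-centric de Bruijn graph from a collection of k-mers."""
    out_edges = defaultdict(list)
    in_deg = defaultdict(int)
    for kmer in sorted(set(kmers)):
        u, v = kmer[:-1], kmer[1:]
        out_edges[u].append(v)
        in_deg[v] += 1
    return dict(out_edges), dict(in_deg)


def unitigs(kmers):
    """Spell every maximal non-branching path that starts at a branching or end node."""
    out_edges, in_deg = de_bruijn(kmers)
    nodes = set(out_edges) | set(in_deg)

    def is_internal(v):
        # Exactly one incoming and one outgoing edge: the path passes straight through.
        return in_deg.get(v, 0) == 1 and len(out_edges.get(v, [])) == 1

    result = []
    for u in sorted(nodes):
        if is_internal(u):
            continue
        for v in out_edges.get(u, []):
            path = u + v[-1]
            while is_internal(v):
                v = out_edges[v][0]
                path += v[-1]
            result.append(path)
    return result


# k = 3; the graph branches at node "CG".
print(unitigs(["TAC", "ACG", "CGA", "CGT"]))
```

In this toy graph the path stops at the branching node "CG", which is the point where unitigs end and where longer safe strings such as omnitigs could potentially extend further.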
Optimal Error Rates for Interactive Coding II: Efficiency and List Decoding
We study coding schemes for error correction in interactive communications.
Such interactive coding schemes simulate any -round interactive protocol
using rounds over an adversarial channel that corrupts up to
transmissions. Important performance measures for a coding scheme are its
maximum tolerable error rate , communication complexity , and
computational complexity.
We give the first coding scheme for the standard setting which performs
optimally in all three measures: Our randomized non-adaptive coding scheme has
a near-linear computational complexity and tolerates any error rate with a linear communication complexity. This improves over
prior results which each performed well in two of these measures.
We also give results for other settings of interest, namely, the first
computationally and communication efficient schemes that tolerate adaptively, if only one party is required to
decode, and if list decoding is allowed. These are the
optimal tolerable error rates for the respective settings. These coding schemes
also have near linear computational and communication complexity.
These results are obtained via two techniques: We give a general black-box
reduction which reduces unique decoding, in various settings, to list decoding.
We also show how to boost the computational and communication efficiency of any
list decoder to become near linear.
Comment: preliminary versio
Randomized compiling for scalable quantum computing on a noisy superconducting quantum processor
The successful implementation of algorithms on quantum processors relies on
the accurate control of quantum bits (qubits) to perform logic gate operations.
In this era of noisy intermediate-scale quantum (NISQ) computing, systematic
miscalibrations, drift, and crosstalk in the control of qubits can lead to a
coherent form of error which has no classical analog. Coherent errors severely
limit the performance of quantum algorithms in an unpredictable manner, and
mitigating their impact is necessary for realizing reliable quantum
computations. Moreover, the average error rates measured by randomized
benchmarking and related protocols are not sensitive to the full impact of
coherent errors, and therefore do not reliably predict the global performance
of quantum algorithms, leaving us unprepared to validate the accuracy of future
large-scale quantum computations. Randomized compiling is a protocol designed
to overcome these performance limitations by converting coherent errors into
stochastic noise, dramatically reducing unpredictable errors in quantum
algorithms and enabling accurate predictions of algorithmic performance from
error rates measured via cycle benchmarking. In this work, we demonstrate
significant performance gains under randomized compiling for the four-qubit
quantum Fourier transform algorithm and for random circuits of variable depth
on a superconducting quantum processor. Additionally, we accurately predict
algorithm performance using experimentally-measured error rates. Our results
demonstrate that randomized compiling can be utilized to maximally-leverage and
predict the capabilities of modern-day noisy quantum processors, paving the way
forward for scalable quantum computing
Optimal Codes Detecting Deletions in Concatenated Binary Strings Applied to Trace Reconstruction
Consider two or more strings that are
concatenated to form . Suppose that up to deletions occur in each of the
concatenated strings. Since deletions alter the lengths of the strings, a
fundamental question to ask is: how much redundancy do we need to introduce in
in order to recover the boundaries of
? This boundary problem is equivalent to the
problem of designing codes that can detect the exact number of deletions in
each concatenated string. In this work, we answer the question above by first
deriving converse results that give lower bounds on the redundancy of
deletion-detecting codes. Then, we present a marker-based code construction
whose redundancy is asymptotically optimal in among all families of
deletion-detecting codes, and exactly optimal among all block-by-block
decodable codes. To exemplify the usefulness of such deletion-detecting codes,
we apply our code to trace reconstruction and design an efficient coded
reconstruction scheme that requires a constant number of traces.
Comment: Accepted for publication in the IEEE Transactions on Information
Theory. arXiv admin note: substantial text overlap with arXiv:2207.05126,
arXiv:2105.0021
Reconstructing Polyatomic Structures from Discrete X-Rays: NP-Completeness Proof for Three Atoms
We address a discrete tomography problem that arises in the study of the
atomic structure of crystal lattices. A polyatomic structure T can be defined
as an integer lattice in dimension D>=2, whose points may be occupied by
distinct types of atoms. To "analyze" T, we conduct ell measurements that we
call discrete X-rays. A discrete X-ray in direction xi determines the number
of atoms of each type on each line parallel to xi. Given ell such non-parallel
X-rays, we wish to reconstruct T.
The complexity of the problem for c=1 (one atom type) has been completely
determined by Gardner, Gritzmann and Prangenberg, who proved that the problem
is NP-complete for any dimension D>=2 and ell>=3 non-parallel X-rays, and that
it can be solved in polynomial time otherwise.
The NP-completeness result above clearly extends to any c>=2, and therefore
when studying the polyatomic case we can assume that ell=2. As shown in another
article by the same authors, this problem is also NP-complete for c>=6 atoms,
even for dimension D=2 and axis-parallel X-rays. They conjecture that the
problem remains NP-complete for c=3,4,5, although, as they point out, the proof
idea does not seem to extend to c<=5.
We resolve the conjecture by proving that the problem is indeed NP-complete
for c>=3 in 2D, even for axis-parallel X-rays. Our construction relies heavily
on some structure results for the realizations of 0-1 matrices with given row
and column sums
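The measurement model for the 2D, axis-parallel case can be sketched as follows. This only computes the X-rays of a given structure and does not attempt reconstruction, which is the NP-complete part. The grid encoding (a list of rows whose cells hold an atom label or None) and the function name `axis_xrays` are assumptions made for illustration.

```python
def axis_xrays(grid, atom_types):
    """Return per-type row counts and column counts of a 2D lattice.

    grid: list of equal-length rows; each cell is an atom label or None.
    """
    width = len(grid[0])
    rows = {a: [sum(cell == a for cell in row) for row in grid] for a in atom_types}
    cols = {a: [sum(row[j] == a for row in grid) for j in range(width)] for a in atom_types}
    return rows, cols


# Two different one-atom-type structures with identical axis-parallel X-rays.
diagonal = [["A", None],
            [None, "A"]]
anti_diagonal = [[None, "A"],
                 ["A", None]]

print(axis_xrays(diagonal, ["A"]))
print(axis_xrays(diagonal, ["A"]) == axis_xrays(anti_diagonal, ["A"]))
```

The two grids produce the same row and column sums, a small reminder that two X-rays need not determine the structure uniquely. For a single atom type this is the classical problem of realizing a 0-1 matrix with given row and column sums.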
High Energy Cosmic Neutrinos Astronomy: The ANTARES Project
Neutrinos may offer a unique opportunity to explore the far Universe at high
energy. The ANTARES collaboration aims at building a large undersea neutrino
detector able to observe astrophysical sources (AGNs, X-ray binary systems,
...) and to study particle physics topics (neutrino oscillation, ...). After
a description of the research opportunities of such a detector, a status report
of the experiment will be made.
Comment: Talk given at the 19th Texas Symposium, Paris, December 199
Synchronization Strings: Explicit Constructions, Local Decoding, and Applications
This paper gives new results for synchronization strings, a powerful
combinatorial object that allows one to deal efficiently with insertions and
deletions in various communication settings:
We give a deterministic, linear time synchronization string
construction, improving over an time randomized construction.
Independently of this work, a deterministic time
construction was just put on arXiv by Cheng, Li, and Wu. We also give a
deterministic linear time construction of an infinite synchronization string,
which was not known to be computable before. Both constructions are highly
explicit, i.e., the symbol can be computed in time.
This paper also introduces a generalized notion we call
long-distance synchronization strings that allow for local and very fast
decoding. In particular, only time and access to logarithmically
many symbols is required to decode any index.
We give several applications for these results:
For any we provide an insdel correcting
code with rate which can correct any fraction
of insdel errors in time. This near linear computational
efficiency is surprising given that we do not even know how to compute the
(edit) distance between the decoding input and output in sub-quadratic time. We
show that such codes can not only efficiently recover from fraction of
insdel errors but, similar to [Schulman, Zuckerman; TransInf'99], also from any
fraction of block transpositions and replications.
We show that high explicitness and local decoding allow for
infinite channel simulations with exponentially smaller memory and decoding
time requirements. These simulations can be used to give the first near linear
time interactive coding scheme for insdel errors