
    Reverse-Safe Data Structures for Text Indexing

    We introduce the notion of reverse-safe data structures. These are data structures that prevent the reconstruction of the data they encode (i.e., they cannot be easily reversed). A data structure D is called z-reverse-safe when there exist at least z datasets with the same set of answers as the ones stored by D. The main challenge is to ensure that D stores as many answers to useful queries as possible, is constructed efficiently, and has size close to the size of the original dataset it encodes. Given a text of length n and an integer z, we propose an algorithm which constructs a z-reverse-safe data structure that has size O(n) and answers pattern matching queries of length at most d optimally, where d is maximal for any such z-reverse-safe data structure. The construction algorithm takes O(n^ω log d) time, where ω is the matrix multiplication exponent. We show that, despite the n^ω factor, our engineered implementation takes only a few minutes to finish for million-letter texts. We further show that plugging our method into data analysis applications gives insignificant or no data utility loss. Finally, we show how our technique can be extended to support applications under a realistic adversary model.
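    To make the z-reverse-safe definition concrete, here is a brute-force sketch (not the paper's construction) that measures z for a toy text by exhaustive enumeration. Treating the stored answers as occurrence counts of every pattern of length at most d is an illustrative assumption, as are the helper names and the two-letter alphabet.

```python
from itertools import product

def answers(text, d):
    # Stored answers, assumed here to be the occurrence count of every
    # pattern of length <= d (the paper's exact query model may differ).
    ans = {}
    for m in range(1, d + 1):
        for i in range(len(text) - m + 1):
            p = text[i:i + m]
            ans[p] = ans.get(p, 0) + 1
    return ans

def reverse_safety(text, d, alphabet="ab"):
    # Count the texts of the same length that are indistinguishable from
    # `text` under all length-<= d queries: the z of the definition.
    target = answers(text, d)
    return sum(1 for cand in map("".join, product(alphabet, repeat=len(text)))
               if answers(cand, d) == target)

print(reverse_safety("aab", 1))  # 3: with d=1 all anagrams of "aab" match
print(reverse_safety("aab", 2))  # 1: length-2 patterns pin the text down
```

    The two calls illustrate the trade-off the paper optimizes: growing d makes the structure answer more queries but shrinks the number z of indistinguishable texts.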

    Exact reconstruction with directional wavelets on the sphere

    A new formalism is derived for the analysis and exact reconstruction of band-limited signals on the sphere with directional wavelets. It represents an evolution of the wavelet formalism developed by Antoine & Vandergheynst (1999) and Wiaux et al. (2005). The translations of the wavelets at any point on the sphere and their proper rotations are still defined through the continuous three-dimensional rotations. The dilations of the wavelets are directly defined in harmonic space through a new kernel dilation, which is a modification of an existing harmonic dilation. A family of factorized steerable functions with compact harmonic support which are suitable for this kernel dilation is first identified. A scale discretized wavelet formalism is then derived, relying on this dilation. The discrete nature of the analysis scales allows the exact reconstruction of band-limited signals. A corresponding exact multi-resolution algorithm is finally described and an implementation is tested. The formalism is of interest notably for the denoising or the deconvolution of signals on the sphere with a sparse expansion in wavelets. In astrophysics, it finds a particular application for the identification of localized directional features in the cosmic microwave background (CMB) data, such as the imprint of topological defects, in particular cosmic strings, and for their reconstruction after separation from the other signal components.
    Comment: 22 pages, 2 figures. Version 2 matches version accepted for publication in MNRAS. Version 3 (identical to version 2) posted for code release announcement - "Steerable scale discretised wavelets on the sphere" - S2DW code available for download at http://www.mrao.cam.ac.uk/~jdm57/software.htm
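    Schematically, and in generic notation rather than necessarily the paper's, a directional wavelet analysis on the sphere and the kind of admissibility condition that enables exact reconstruction of band-limited signals look as follows:

```latex
% Analysis: project f onto the wavelet Psi_j rotated by rho in SO(3).
\[
  W^{\Psi_j}(\rho) \;=\; \int_{S^2}
  \Psi_j^{*}\!\bigl(\rho^{-1}\hat{\omega}\bigr)\, f(\hat{\omega})\,
  \mathrm{d}\Omega(\hat{\omega}), \qquad \rho \in SO(3).
\]
% Exact synthesis of a band-limited f is possible when the discrete
% analysis scales jointly satisfy an admissibility (resolution of the
% identity) condition of the form
\[
  \frac{4\pi}{2\ell+1} \sum_{j} \sum_{m=-\ell}^{\ell}
  \bigl|(\Psi_j)_{\ell m}\bigr|^{2} \;=\; 1
  \qquad \text{for every multipole } \ell \text{ in the band,}
\]
% where (Psi_j)_{lm} are the spherical harmonic coefficients of Psi_j.
```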

    Safe and complete contig assembly via omnitigs

    Contig assembly is the first stage that most assemblers solve when reconstructing a genome from a set of reads. Its output consists of contigs -- a set of strings that are promised to appear in any genome that could have generated the reads. Since the introduction of contigs 20 years ago, assemblers have tried to obtain longer and longer contigs, but the following question was never solved: given a genome graph G (e.g. a de Bruijn or a string graph), what are all the strings that can be safely reported from G as contigs? In this paper we finally answer this question, and also give a polynomial-time algorithm to find them. Our experiments show that these strings, which we call omnitigs, are 66% to 82% longer on average than the popular unitigs, and 29% of dbSNP locations have more neighbors in omnitigs than in unitigs.
    Comment: Full version of the paper in the proceedings of RECOMB 201
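    The omnitig algorithm itself is beyond a short sketch, but the baseline it is compared against is easy to state: unitigs are the maximal non-branching paths of the genome graph. A minimal sketch (the edge-list graph representation and the toy input are illustrative assumptions):

```python
from collections import defaultdict

def unitigs(edges):
    # Maximal non-branching paths of a directed graph: the classical
    # "unitig" contigs that omnitigs generalize. Isolated cycles of
    # 1-in-1-out nodes are ignored in this sketch.
    out_adj = defaultdict(list)
    in_deg = defaultdict(int)
    nodes = set()
    for u, v in edges:
        out_adj[u].append(v)
        in_deg[v] += 1
        nodes.update((u, v))

    def branching(v):  # anything other than a 1-in, 1-out node
        return len(out_adj[v]) != 1 or in_deg[v] != 1

    paths = []
    for v in nodes:
        if branching(v):
            for w in out_adj[v]:
                path = [v, w]
                while not branching(path[-1]):
                    path.append(out_adj[path[-1]][0])
                paths.append(path)
    return paths

# Toy graph: a "bubble" (two parallel paths), the typical mark of a SNP.
edges = [("a", "b"), ("b", "c1"), ("b", "c2"),
         ("c1", "d"), ("c2", "d"), ("d", "e")]
print(unitigs(edges))  # [a,b], [b,c1,d], [b,c2,d], [d,e]
```

    Unitigs must stop at every branching node; omnitigs are precisely the strings that can be safely extended past some of these branchings, which is where the reported length gains come from.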

    Optimal Error Rates for Interactive Coding II: Efficiency and List Decoding

    We study coding schemes for error correction in interactive communications. Such interactive coding schemes simulate any $n$-round interactive protocol using $N$ rounds over an adversarial channel that corrupts up to $\rho N$ transmissions. Important performance measures for a coding scheme are its maximum tolerable error rate $\rho$, communication complexity $N$, and computational complexity. We give the first coding scheme for the standard setting which performs optimally in all three measures: Our randomized non-adaptive coding scheme has a near-linear computational complexity and tolerates any error rate $\delta < 1/4$ with a linear $N = \Theta(n)$ communication complexity. This improves over prior results which each performed well in two of these measures. We also give results for other settings of interest, namely, the first computationally and communication efficient schemes that tolerate $\rho < \frac{2}{7}$ adaptively, $\rho < \frac{1}{3}$ if only one party is required to decode, and $\rho < \frac{1}{2}$ if list decoding is allowed. These are the optimal tolerable error rates for the respective settings. These coding schemes also have near-linear computational and communication complexity. These results are obtained via two techniques: We give a general black-box reduction which reduces unique decoding, in various settings, to list decoding. We also show how to boost the computational and communication efficiency of any list decoder to become near-linear.
    Comment: preliminary versio

    Randomized compiling for scalable quantum computing on a noisy superconducting quantum processor

    The successful implementation of algorithms on quantum processors relies on the accurate control of quantum bits (qubits) to perform logic gate operations. In this era of noisy intermediate-scale quantum (NISQ) computing, systematic miscalibrations, drift, and crosstalk in the control of qubits can lead to a coherent form of error which has no classical analog. Coherent errors severely limit the performance of quantum algorithms in an unpredictable manner, and mitigating their impact is necessary for realizing reliable quantum computations. Moreover, the average error rates measured by randomized benchmarking and related protocols are not sensitive to the full impact of coherent errors, and therefore do not reliably predict the global performance of quantum algorithms, leaving us unprepared to validate the accuracy of future large-scale quantum computations. Randomized compiling is a protocol designed to overcome these performance limitations by converting coherent errors into stochastic noise, dramatically reducing unpredictable errors in quantum algorithms and enabling accurate predictions of algorithmic performance from error rates measured via cycle benchmarking. In this work, we demonstrate significant performance gains under randomized compiling for the four-qubit quantum Fourier transform algorithm and for random circuits of variable depth on a superconducting quantum processor. Additionally, we accurately predict algorithm performance using experimentally measured error rates. Our results demonstrate that randomized compiling can be utilized to maximally leverage and predict the capabilities of modern-day noisy quantum processors, paving the way forward for scalable quantum computing.
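    The conversion of coherent errors into stochastic noise can be illustrated on a single qubit: averaging the error channel over conjugation by uniformly random Paulis (the twirl underlying randomized compiling) makes its Pauli transfer matrix diagonal, i.e., a stochastic Pauli channel. A minimal numpy sketch; the over-rotation error model, the angle, and the reduction to one qubit are simplifying assumptions (the protocol proper randomizes whole cycles of a multi-qubit circuit):

```python
import numpy as np

# Single-qubit Paulis (Hermitian and self-inverse).
I = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
PAULIS = [I, X, Y, Z]

theta = 0.1  # coherent over-rotation angle (illustrative)
U_err = np.cos(theta / 2) * I - 1j * np.sin(theta / 2) * Z  # exp(-i theta Z/2)

def coherent(rho):
    # Coherent error channel: a unitary over-rotation about Z.
    return U_err @ rho @ U_err.conj().T

def twirled(rho):
    # Pauli twirl: average the error channel over random Pauli frames.
    return sum(P @ coherent(P @ rho @ P) @ P for P in PAULIS) / 4

def ptm(channel):
    # Pauli transfer matrix R[a, b] = Tr(P_a channel(P_b)) / 2.
    return np.array([[np.trace(Pa @ channel(Pb)).real / 2
                      for Pb in PAULIS] for Pa in PAULIS])

print(np.round(ptm(coherent), 3))  # off-diagonal +-sin(theta): coherent
print(np.round(ptm(twirled), 3))   # diagonal (1, cos, cos, 1): stochastic
```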

    Optimal Codes Detecting Deletions in Concatenated Binary Strings Applied to Trace Reconstruction

    Consider two or more strings $\mathbf{x}^1, \mathbf{x}^2, \ldots$ that are concatenated to form $\mathbf{x} = \langle \mathbf{x}^1, \mathbf{x}^2, \ldots \rangle$. Suppose that up to $\delta$ deletions occur in each of the concatenated strings. Since deletions alter the lengths of the strings, a fundamental question to ask is: how much redundancy do we need to introduce in $\mathbf{x}$ in order to recover the boundaries of $\mathbf{x}^1, \mathbf{x}^2, \ldots$? This boundary problem is equivalent to the problem of designing codes that can detect the exact number of deletions in each concatenated string. In this work, we answer the question above by first deriving converse results that give lower bounds on the redundancy of deletion-detecting codes. Then, we present a marker-based code construction whose redundancy is asymptotically optimal in $\delta$ among all families of deletion-detecting codes, and exactly optimal among all block-by-block decodable codes. To exemplify the usefulness of such deletion-detecting codes, we apply our code to trace reconstruction and design an efficient coded reconstruction scheme that requires a constant number of traces.
    Comment: Accepted for publication in the IEEE Transactions on Information Theory. arXiv admin note: substantial text overlap with arXiv:2207.05126, arXiv:2105.0021
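    The stated equivalence between recovering boundaries and detecting per-block deletion counts can be seen with one line of arithmetic; the uniform original block length L below is a simplifying assumption for illustration:

```latex
% Schematic: if block j (original length L) suffers delta_j deletions,
% the end of block i in the received string sits at
\[
  \mathrm{end}(i) \;=\; \sum_{j=1}^{i} \bigl( L - \delta_j \bigr),
  \qquad 0 \le \delta_j \le \delta,
\]
% so knowing every delta_j exactly recovers every boundary, and
% conversely the boundaries give back
% delta_j = L - ( end(j) - end(j-1) ).
```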

    Reconstructing Polyatomic Structures from Discrete X-Rays: NP-Completeness Proof for Three Atoms

    We address a discrete tomography problem that arises in the study of the atomic structure of crystal lattices. A polyatomic structure T can be defined as an integer lattice in dimension $D \ge 2$, whose points may be occupied by $c$ distinct types of atoms. To "analyze" T, we conduct $\ell$ measurements that we call discrete X-rays. A discrete X-ray in direction $\xi$ determines the number of atoms of each type on each line parallel to $\xi$. Given $\ell$ such non-parallel X-rays, we wish to reconstruct T. The complexity of the problem for $c = 1$ (one atom type) has been completely determined by Gardner, Gritzmann and Prangenberg, who proved that the problem is NP-complete for any dimension $D \ge 2$ and $\ell \ge 3$ non-parallel X-rays, and that it can be solved in polynomial time otherwise. The NP-completeness result above clearly extends to any $c \ge 2$, and therefore when studying the polyatomic case we can assume that $\ell = 2$. As shown in another article by the same authors, this problem is also NP-complete for $c \ge 6$ atoms, even for dimension $D = 2$ and axis-parallel X-rays. They conjecture that the problem remains NP-complete for $c = 3, 4, 5$, although, as they point out, the proof idea does not seem to extend to $c \le 5$. We resolve the conjecture by proving that the problem is indeed NP-complete for $c \ge 3$ in 2D, even for axis-parallel X-rays. Our construction relies heavily on some structure results for the realizations of 0-1 matrices with given row and column sums.
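    For contrast with the hardness results, the tractable base case ($c = 1$, two axis-parallel X-rays) is exactly the classical problem of realizing a 0-1 matrix from its row and column sums. A minimal Ryser-style greedy sketch (function name and the final feasibility check are illustrative):

```python
def realize_binary_matrix(row_sums, col_sums):
    # Build a 0-1 matrix with the given row and column sums (the two
    # axis-parallel "X-rays" for one atom type), or return None.
    if sum(row_sums) != sum(col_sums):
        return None
    remaining = list(col_sums)
    matrix = []
    for r in row_sums:
        # Put this row's r ones into the columns with the largest
        # remaining column demand.
        cols = sorted(range(len(remaining)), key=lambda j: -remaining[j])[:r]
        if len(cols) < r or any(remaining[j] == 0 for j in cols):
            return None
        row = [0] * len(remaining)
        for j in cols:
            row[j] = 1
            remaining[j] -= 1
        matrix.append(row)
    return matrix if all(v == 0 for v in remaining) else None

print(realize_binary_matrix([2, 1, 1], [2, 1, 1]))
# [[1, 1, 0], [1, 0, 0], [0, 0, 1]]
```

    A standard exchange argument shows that filling each row into the columns of largest remaining demand succeeds whenever a realization exists, so the final check only rejects genuinely infeasible X-ray data.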

    High Energy Cosmic Neutrinos Astronomy: The ANTARES Project

    Neutrinos may offer a unique opportunity to explore the far Universe at high energy. The ANTARES collaboration aims at building a large undersea neutrino detector able to observe astrophysical sources (AGNs, X-ray binary systems, ...) and to study particle physics topics (neutrino oscillation, ...). After a description of the research opportunities of such a detector, a status report of the experiment will be made.
    Comment: Talk given at the 19th Texas Symposium, Paris, December 199

    Synchronization Strings: Explicit Constructions, Local Decoding, and Applications

    This paper gives new results for synchronization strings, a powerful combinatorial object that allows one to efficiently deal with insertions and deletions in various communication settings:
    - We give a deterministic, linear time synchronization string construction, improving over an $O(n^5)$ time randomized construction. Independently of this work, a deterministic $O(n \log^2 \log n)$ time construction was just put on arXiv by Cheng, Li, and Wu. We also give a deterministic linear time construction of an infinite synchronization string, which was not known to be computable before. Both constructions are highly explicit, i.e., the $i^{th}$ symbol can be computed in $O(\log i)$ time.
    - This paper also introduces a generalized notion we call long-distance synchronization strings that allow for local and very fast decoding. In particular, only $O(\log^3 n)$ time and access to logarithmically many symbols is required to decode any index.
    We give several applications for these results:
    - For any $\delta > 0$ we provide an insdel correcting code with rate $1 - \delta - \epsilon$ which can correct any $O(\delta)$ fraction of insdel errors in $O(n \log^3 n)$ time. This near-linear computational efficiency is surprising given that we do not even know how to compute the (edit) distance between the decoding input and output in sub-quadratic time. We show that such codes can not only efficiently recover from a $\delta$ fraction of insdel errors but, similar to [Schulman, Zuckerman; TransInf'99], also from any $O(\delta / \log n)$ fraction of block transpositions and replications.
    - We show that high explicitness and local decoding allow for infinite channel simulations with exponentially smaller memory and decoding time requirements. These simulations can be used to give the first near-linear time interactive coding scheme for insdel errors.
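    To make the object concrete: in the synchronization strings literature (Haeupler & Shahrasbi), a string S is an ε-synchronization string when every pair of adjacent substrings is far apart in edit distance, ED(S[i:j], S[j:k]) > (1 - ε)(k - i) for all i < j < k. A brute-force checker of this property (the cubic enumeration and the toy inputs are only for illustration):

```python
def edit_distance(a, b):
    # Standard Levenshtein DP with a rolling row.
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1,
                                     prev + (ca != cb))
    return dp[-1]

def is_synchronization_string(s, eps):
    # Check the epsilon-synchronization property: for all i < j < k,
    # ED(s[i:j], s[j:k]) > (1 - eps) * (k - i).
    n = len(s)
    return all(edit_distance(s[i:j], s[j:k]) > (1 - eps) * (k - i)
               for i in range(n) for j in range(i + 1, n)
               for k in range(j + 1, n + 1))

print(is_synchronization_string("abcdef", 0.75))  # True: distinct symbols
print(is_synchronization_string("abcabc", 0.75))  # False: periodic string
```

    The failing periodic example shows the intuition: repetition makes adjacent windows look alike under insdel errors, which is exactly what a synchronization string must rule out.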