Search CORE

6,293 research outputs found

Reverse-Safe Data Structures for Text Indexing

Author: Gabriele Fici
Giulia Bernardini
Grigorios Loukides
Huiping Chen
Solon P. Pissis
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'
Publication date: 01/01/2020

We introduce the notion of reverse-safe data structures. These are data structures that prevent the reconstruction of the data they encode (i.e., they cannot be easily reversed). A data structure D is called z-reverse-safe when there exist at least z datasets with the same set of answers as the ones stored by D. The main challenge is to ensure that D stores as many answers to useful queries as possible, is constructed efficiently, and has size close to the size of the original dataset it encodes. Given a text of length n and an integer z, we propose an algorithm which constructs a z-reverse-safe data structure that has size O(n) and answers pattern matching queries of length at most d optimally, where d is maximal for any such z-reverse-safe data structure. The construction algorithm takes O(n^ω log d) time, where ω is the matrix multiplication exponent. We show that, despite the n^ω factor, our engineered implementation takes only a few minutes to finish for million-letter texts. We further show that plugging our method into data analysis applications gives insignificant or no data utility loss. Finally, we show how our technique can be extended to support applications under a realistic adversary model.
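
To make the z-reverse-safe definition concrete, here is a brute-force sketch for tiny texts. It is not the authors' algorithm (which the abstract says runs in O(n^ω log d) time). It assumes that the "answers" are the occurrence counts of every pattern of length at most d, and that candidate texts have the same length and alphabet as the original. The function names `profile`, `count_consistent_texts`, and `max_safe_d` are hypothetical.

```python
from collections import Counter
from itertools import product


def profile(text, d):
    """Occurrence count of every substring of length 1..d (the assumed query answers)."""
    return Counter(
        text[i:i + k]
        for k in range(1, d + 1)
        for i in range(len(text) - k + 1)
    )


def count_consistent_texts(text, d):
    """Count texts of the same length and alphabet that give identical answers.

    Exhaustive enumeration: exponential in len(text), so only for toy inputs.
    """
    alphabet = sorted(set(text))
    target = profile(text, d)
    return sum(
        1
        for cand in product(alphabet, repeat=len(text))
        if profile("".join(cand), d) == target
    )


def max_safe_d(text, z):
    """Largest d for which at least z texts share the answers (0 if none).

    The count can only shrink as d grows, because the answers for d + 1
    include the answers for d, so the scan stops at the first failure.
    """
    best = 0
    for d in range(1, len(text) + 1):
        if count_consistent_texts(text, d) >= z:
            best = d
        else:
            break
    return best


if __name__ == "__main__":
    # For "abab": 6 texts share the letter counts, but only "abab" itself
    # shares the counts of all patterns of length <= 2.
    print(count_consistent_texts("abab", 1), count_consistent_texts("abab", 2))
    print(max_safe_d("abab", 2))
```

In this toy case the structure can safely answer queries of length 1 for z = 2, but not of length 2, since those answers pin the text down uniquely.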

Archivio istituzionale della ricerca - Università di Trieste

Crossref

CWI's Institutional Repository

University of Birmingham Research Portal

Archivio istituzionale della ricerca - Università di Palermo

Exact reconstruction with directional wavelets on the sphere

Author: Y. Wiaux
J. D. McEwen
P. Vandergheynst
O. Blanc
Publication venue: 'Wiley'
Publication date: 20/12/2007

A new formalism is derived for the analysis and exact reconstruction of band-limited signals on the sphere with directional wavelets. It represents an evolution of the wavelet formalism developed by Antoine & Vandergheynst (1999) and Wiaux et al. (2005). The translations of the wavelets at any point on the sphere and their proper rotations are still defined through the continuous three-dimensional rotations. The dilations of the wavelets are directly defined in harmonic space through a new kernel dilation, which is a modification of an existing harmonic dilation. A family of factorized steerable functions with compact harmonic support which are suitable for this kernel dilation is first identified. A scale-discretized wavelet formalism is then derived, relying on this dilation. The discrete nature of the analysis scales allows the exact reconstruction of band-limited signals. A corresponding exact multi-resolution algorithm is finally described and an implementation is tested. The formalism is of interest notably for the denoising or the deconvolution of signals on the sphere with a sparse expansion in wavelets. In astrophysics, it finds a particular application for the identification of localized directional features in the cosmic microwave background (CMB) data, such as the imprint of topological defects, in particular cosmic strings, and for their reconstruction after separation from the other signal components.

Comment: 22 pages, 2 figures. Version 2 matches version accepted for publication in MNRAS. Version 3 (identical to version 2) posted for code release announcement - "Steerable scale discretised wavelets on the sphere" - S2DW code available for download at http://www.mrao.cam.ac.uk/~jdm57/software.htm

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

Crossref

Heriot Watt Pure

Safe and complete contig assembly via omnitigs

Author: AI Tomescu
P Medvedev
Publication date: 16/08/2016

Contig assembly is the first stage that most assemblers solve when reconstructing a genome from a set of reads. Its output consists of contigs -- a set of strings that are promised to appear in any genome that could have generated the reads. From the introduction of contigs 20 years ago, assemblers have tried to obtain longer and longer contigs, but the following question was never solved: given a genome graph $G$ (e.g. a de Bruijn, or a string graph), what are all the strings that can be safely reported from $G$ as contigs? In this paper we finally answer this question, and also give a polynomial time algorithm to find them. Our experiments show that these strings, which we call omnitigs, are 66% to 82% longer on average than the popular unitigs, and 29% of dbSNP locations have more neighbors in omnitigs than in unitigs.

Comment: Full version of the paper in the proceedings of RECOMB 201

arXiv.org e-Print Archive

Crossref

Optimal Error Rates for Interactive Coding II: Efficiency and List Decoding

Author: Ghaffari Mohsen
Haeupler Bernhard
Publication date: 15/04/2014

We study coding schemes for error correction in interactive communications. Such interactive coding schemes simulate any $n$-round interactive protocol using $N$ rounds over an adversarial channel that corrupts up to $\rho N$ transmissions. Important performance measures for a coding scheme are its maximum tolerable error rate $\rho$, communication complexity $N$, and computational complexity. We give the first coding scheme for the standard setting which performs optimally in all three measures: Our randomized non-adaptive coding scheme has a near-linear computational complexity and tolerates any error rate $\delta < 1/4$ with a linear $N = \Theta(n)$ communication complexity. This improves over prior results which each performed well in two of these measures. We also give results for other settings of interest, namely, the first computationally and communication efficient schemes that tolerate $\rho < \frac{2}{7}$ adaptively, $\rho < \frac{1}{3}$ if only one party is required to decode, and $\rho < \frac{1}{2}$ if list decoding is allowed. These are the optimal tolerable error rates for the respective settings. These coding schemes also have near linear computational and communication complexity. These results are obtained via two techniques: We give a general black-box reduction which reduces unique decoding, in various settings, to list decoding. We also show how to boost the computational and communication efficiency of any list decoder to become near linear.

Comment: preliminary versio

arXiv.org e-Print Archive

Crossref

Randomized compiling for scalable quantum computing on a noisy superconducting quantum processor

Author: Davis Marc
Emerson Joseph
Hashim Akel
Hincks Ian
Iancu Costin
Kreikebaum John Mark
Mitchell Bradley
Morvan Alexis
Naik Ravi K.
O'Brien Kevin P.
Siddiqi Irfan
Smith Ethan
Ville Jean-Loup
Wallman Joel J.
Publication date: 01/10/2020

The successful implementation of algorithms on quantum processors relies on the accurate control of quantum bits (qubits) to perform logic gate operations. In this era of noisy intermediate-scale quantum (NISQ) computing, systematic miscalibrations, drift, and crosstalk in the control of qubits can lead to a coherent form of error which has no classical analog. Coherent errors severely limit the performance of quantum algorithms in an unpredictable manner, and mitigating their impact is necessary for realizing reliable quantum computations. Moreover, the average error rates measured by randomized benchmarking and related protocols are not sensitive to the full impact of coherent errors, and therefore do not reliably predict the global performance of quantum algorithms, leaving us unprepared to validate the accuracy of future large-scale quantum computations. Randomized compiling is a protocol designed to overcome these performance limitations by converting coherent errors into stochastic noise, dramatically reducing unpredictable errors in quantum algorithms and enabling accurate predictions of algorithmic performance from error rates measured via cycle benchmarking. In this work, we demonstrate significant performance gains under randomized compiling for the four-qubit quantum Fourier transform algorithm and for random circuits of variable depth on a superconducting quantum processor. Additionally, we accurately predict algorithm performance using experimentally measured error rates. Our results demonstrate that randomized compiling can be utilized to maximally leverage and predict the capabilities of modern-day noisy quantum processors, paving the way forward for scalable quantum computing.
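
As a rough single-qubit illustration of the mechanism the abstract describes (converting a coherent error into stochastic noise), the sketch below Pauli-twirls a small over-rotation about X and inspects the result as a Pauli transfer matrix. This is not the authors' protocol or code: the error model, the function names, and the restriction to one qubit are all assumptions made for illustration.

```python
import math

# Single-qubit Pauli matrices as nested lists.
I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]
PAULIS = [I2, X, Y, Z]


def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def dagger(a):
    """Conjugate transpose of a 2x2 matrix."""
    return [[a[j][i].conjugate() for j in range(2)] for i in range(2)]


def ptm(u):
    """Pauli transfer matrix R[i][j] = Tr(P_i U P_j U^dagger) / 2 of a unitary U."""
    ud = dagger(u)
    rows = []
    for pi in PAULIS:
        row = []
        for pj in PAULIS:
            m = matmul(pi, matmul(u, matmul(pj, ud)))
            row.append(((m[0][0] + m[1][1]) / 2).real)
        rows.append(row)
    return rows


def x_rotation(theta):
    """Coherent error model (assumed): exp(-i * theta * X / 2)."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -1j * s], [-1j * s, c]]


def pauli_twirl(r):
    """Average the channel over conjugation by I, X, Y, Z.

    Each Pauli has a diagonal +/-1 transfer matrix, so the average is
    (1/4) * sum_k S_k R S_k, which cancels every off-diagonal entry of R.
    """
    signs = [[ptm(p)[i][i] for i in range(4)] for p in PAULIS]
    return [[sum(s[i] * r[i][j] * s[j] for s in signs) / 4 for j in range(4)]
            for i in range(4)]


if __name__ == "__main__":
    theta = 0.3
    coherent = ptm(x_rotation(theta))
    twirled = pauli_twirl(coherent)
    for row in twirled:
        print([round(v, 4) for v in row])
```

Under these assumptions the untwirled matrix has off-diagonal Y/Z entries of magnitude sin(θ), while the twirled matrix is diagonal with entries 1, 1, cos(θ), cos(θ), which is a stochastic X-flip channel with probability sin²(θ/2).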

arXiv.org e-Print Archive

DSpace@MIT

Directory of Open Access Journals

eScholarship - University of California

Optimal Codes Detecting Deletions in Concatenated Binary Strings Applied to Trace Reconstruction

Author: Hanna Serge Kas
Publication date: 19/04/2023

Consider two or more strings $\mathbf{x}^1,\mathbf{x}^2,\ldots,$ that are concatenated to form $\mathbf{x}=\langle \mathbf{x}^1,\mathbf{x}^2,\ldots \rangle$. Suppose that up to $\delta$ deletions occur in each of the concatenated strings. Since deletions alter the lengths of the strings, a fundamental question to ask is: how much redundancy do we need to introduce in $\mathbf{x}$ in order to recover the boundaries of $\mathbf{x}^1,\mathbf{x}^2,\ldots$? This boundary problem is equivalent to the problem of designing codes that can detect the exact number of deletions in each concatenated string. In this work, we answer the question above by first deriving converse results that give lower bounds on the redundancy of deletion-detecting codes. Then, we present a marker-based code construction whose redundancy is asymptotically optimal in $\delta$ among all families of deletion-detecting codes, and exactly optimal among all block-by-block decodable codes. To exemplify the usefulness of such deletion-detecting codes, we apply our code to trace reconstruction and design an efficient coded reconstruction scheme that requires a constant number of traces.

Comment: Accepted for publication in the IEEE Transactions on Information Theory. arXiv admin note: substantial text overlap with arXiv:2207.05126, arXiv:2105.0021

arXiv.org e-Print Archive

Reconstructing Polyatomic Structures from Discrete X-Rays: NP-Completeness Proof for Three Atoms

Author: Chrobak Marek
Durr Christoph
Publication date: 01/01/1998

We address a discrete tomography problem that arises in the study of the atomic structure of crystal lattices. A polyatomic structure T can be defined as an integer lattice in dimension D>=2, whose points may be occupied by c distinct types of atoms. To "analyze" T, we conduct ell measurements that we call _discrete X-rays_. A discrete X-ray in direction xi determines the number of atoms of each type on each line parallel to xi. Given ell such non-parallel X-rays, we wish to reconstruct T. The complexity of the problem for c=1 (one atom type) has been completely determined by Gardner, Gritzmann and Prangenberg, who proved that the problem is NP-complete for any dimension D>=2 and ell>=3 non-parallel X-rays, and that it can be solved in polynomial time otherwise. The NP-completeness result above clearly extends to any c>=2, and therefore when studying the polyatomic case we can assume that ell=2. As shown in another article by the same authors, this problem is also NP-complete for c>=6 atoms, even for dimension D=2 and axis-parallel X-rays. They conjecture that the problem remains NP-complete for c=3,4,5, although, as they point out, the proof idea does not seem to extend to c<=5. We resolve the conjecture by proving that the problem is indeed NP-complete for c>=3 in 2D, even for axis-parallel X-rays. Our construction relies heavily on some structure results for the realizations of 0-1 matrices with given row and column sums.
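
The measurement described in this abstract can be illustrated with a small sketch for the 2D, axis-parallel case (ell=2). The grid encoding (a list of equal-length strings, with '.' for an empty lattice point and any other character for an atom type) and the function name `discrete_xrays` are assumptions for illustration, not notation from the paper. The sketch only computes the X-rays; it does not attempt the reconstruction, which the abstract shows is NP-complete for c>=3.

```python
from collections import Counter


def discrete_xrays(grid):
    """Return the two axis-parallel discrete X-rays of a 2D structure.

    The result is (rows, cols): for each horizontal line and each vertical
    line, a Counter mapping atom type -> number of atoms on that line.
    """
    width = len(grid[0])
    rows = [Counter(ch for ch in row if ch != ".") for row in grid]
    cols = [Counter(row[j] for row in grid if row[j] != ".")
            for j in range(width)]
    return rows, cols


if __name__ == "__main__":
    # Two different structures with one atom type ...
    t1 = ["a.",
          ".a"]
    t2 = [".a",
          "a."]
    # ... that the two axis-parallel X-rays cannot tell apart.
    print(discrete_xrays(t1) == discrete_xrays(t2))
```

The two grids in the usage example have identical row and column counts, which shows why X-rays alone do not always determine T uniquely.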

arXiv.org e-Print Archive

CiteSeerX

High Energy Cosmic Neutrinos Astronomy: The ANTARES Project

Author: Basa S.
Publication date: 14/12/1998

Neutrinos may offer a unique opportunity to explore the far Universe at high energy. The ANTARES collaboration aims at building a large undersea neutrino detector able to observe astrophysical sources (AGNs, X-ray binary systems, ...) and to study particle physics topics (neutrino oscillation, ...). After a description of the research opportunities of such a detector, a status report of the experiment will be made.

Comment: Talk given at the 19th Texas Symposium, Paris, December 199

arXiv.org e-Print Archive

HAL-IN2P3

HAL AMU

CERN Document Server

Synchronization Strings: Explicit Constructions, Local Decoding, and Applications

Author: Haeupler Bernhard
Shahrasbi Amirbehshad
Publication date: 09/11/2017

This paper gives new results for synchronization strings, a powerful combinatorial object that allows one to deal efficiently with insertions and deletions in various communication settings:

• We give a deterministic, linear time synchronization string construction, improving over an $O(n^5)$ time randomized construction. Independently of this work, a deterministic $O(n\log^2\log n)$ time construction was just put on arXiv by Cheng, Li, and Wu. We also give a deterministic linear time construction of an infinite synchronization string, which was not known to be computable before. Both constructions are highly explicit, i.e., the $i^{th}$ symbol can be computed in $O(\log i)$ time.

• This paper also introduces a generalized notion we call long-distance synchronization strings that allow for local and very fast decoding. In particular, only $O(\log^3 n)$ time and access to logarithmically many symbols is required to decode any index.

We give several applications for these results:

• For any $\delta < 1$ and $\epsilon > 0$ we provide an insdel correcting code with rate $1-\delta-\epsilon$ which can correct any $O(\delta)$ fraction of insdel errors in $O(n\log^3n)$ time. This near linear computational efficiency is surprising given that we do not even know how to compute the (edit) distance between the decoding input and output in sub-quadratic time. We show that such codes can not only efficiently recover from $\delta$ fraction of insdel errors but, similar to [Schulman, Zuckerman; TransInf'99], also from any $O(\delta/\log n)$ fraction of block transpositions and replications.

• We show that high explicitness and local decoding allow for infinite channel simulations with exponentially smaller memory and decoding time requirements. These simulations can be used to give the first near linear time interactive coding scheme for insdel errors.

arXiv.org e-Print Archive

Crossref