Circular Trace Reconstruction

Author: Narayanan Shyam
Ren Michael
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 12th Innovations in Theoretical Computer Science Conference (ITCS 2021)
Publication date: 13/12/2020

Trace reconstruction is the problem of learning an unknown string $x$ from independent traces of $x$, where traces are generated by independently deleting each bit of $x$ with some deletion probability $q$. In this paper, we initiate the study of Circular trace reconstruction, where the unknown string $x$ is circular and traces are now rotated by a random cyclic shift. Trace reconstruction is related to many computational biology problems studying DNA, which is a primary motivation for this problem as well, as many types of DNA are known to be circular. Our main results are as follows. First, we prove that we can reconstruct arbitrary circular strings of length $n$ using $\exp\big(\tilde{O}(n^{1/3})\big)$ traces for any constant deletion probability $q$, as long as $n$ is prime or the product of two primes. For $n$ of this form, this nearly matches what was the best known bound of $\exp\big(O(n^{1/3})\big)$ for standard trace reconstruction when this paper was initially released. We note, however, that Chase very recently improved the standard trace reconstruction bound to $\exp\big(\tilde{O}(n^{1/5})\big)$. Next, we prove that we can reconstruct random circular strings with high probability using $n^{O(1)}$ traces for any constant deletion probability $q$. Finally, we prove a lower bound of $\tilde{\Omega}(n^3)$ traces for arbitrary circular strings, which is greater than the best known lower bound of $\tilde{\Omega}(n^{3/2})$ in standard trace reconstruction.

Comment: 25 pages, 1 figure. To appear in Innovations in Theoretical Computer Science (ITCS), 202
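
The circular trace model described in this abstract can be illustrated with a small simulation. This is a minimal sketch, not the authors' code: the function name `circular_trace` is hypothetical, and it assumes the random cyclic shift is drawn uniformly and applied to the string before the deletion channel, which the abstract does not spell out.

```python
import random


def circular_trace(x: str, q: float, rng: random.Random) -> str:
    """Return one simulated trace of the circular string x.

    Assumption: the cyclic shift is uniform over all len(x) rotations
    and is applied before the bits are deleted.
    """
    # Pick a uniformly random starting point on the circle.
    shift = rng.randrange(len(x)) if x else 0
    rotated = x[shift:] + x[:shift]
    # Delete each bit independently with probability q.
    return "".join(bit for bit in rotated if rng.random() >= q)


if __name__ == "__main__":
    rng = random.Random(2021)
    for _ in range(3):
        print(circular_trace("0010110", 0.3, rng))
```

With `q = 0` every trace is simply a rotation of `x`, which shows why the circular setting loses the positional anchor that a linear string provides.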

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Trace Reconstruction: Generalized and Parameterized

Author: Krishnamurthy Akshay
Mazumdar Arya
McGregor Andrew
Pal Soumyabrata
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 27th Annual European Symposium on Algorithms (ESA 2019)
Publication date: 01/01/2019

In the beautifully simple-to-state problem of trace reconstruction, the goal is to reconstruct an unknown binary string x given random "traces" of x where each trace is generated by deleting each coordinate of x independently with probability p<1. The problem is well studied both when the unknown string is arbitrary and when it is chosen uniformly at random. For both settings, there is still an exponential gap between upper and lower sample complexity bounds and our understanding of the problem is still surprisingly limited. In this paper, we consider natural parameterizations and generalizations of this problem in an effort to attain a deeper and more comprehensive understanding. Perhaps our most surprising results are:
1) We prove that exp(O(n^(1/4) sqrt{log n})) traces suffice for reconstructing arbitrary matrices. In the matrix version of the problem, each row and column of an unknown sqrt{n} x sqrt{n} matrix is deleted independently with probability p. Our result contrasts with the best known results for sequence reconstruction, where the best known upper bound is exp(O(n^(1/3))).
2) An optimal result for random matrix reconstruction: we show that Theta(log n) traces are necessary and sufficient. This is in contrast to the problem for random sequences, where there is a super-logarithmic lower bound and the best known upper bound is exp(O(log^(1/3) n)).
3) We show that exp(O(k^(1/3) log^(2/3) n)) traces suffice to reconstruct k-sparse strings, providing an improvement over the best known sequence reconstruction results when k = o(n/log^2 n).
4) We show that poly(n) traces suffice if x is k-sparse and we additionally have a "separation" promise, specifically that the indices of 1's in x all differ by Omega(k log n).
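
The matrix deletion channel from result 1 can be sketched in a few lines. This is an illustrative simulation only, assuming the matrix is given as a list of equal-length rows; the name `matrix_trace` is hypothetical and does not come from the paper.

```python
import random


def matrix_trace(matrix, p: float, rng: random.Random):
    """Return one trace of a matrix under independent row/column deletion.

    Each row and each column is deleted independently with probability p;
    the trace is the submatrix formed by the surviving rows and columns.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if matrix else 0
    # Decide survival once per row and once per column.
    kept_rows = [i for i in range(n_rows) if rng.random() >= p]
    kept_cols = [j for j in range(n_cols) if rng.random() >= p]
    return [[matrix[i][j] for j in kept_cols] for i in kept_rows]


if __name__ == "__main__":
    rng = random.Random(2019)
    m = [[0, 1, 1], [1, 0, 0], [1, 1, 0]]
    for row in matrix_trace(m, 0.3, rng):
        print(row)
```

Unlike the sequence case, deleting a column removes the same coordinate from every surviving row, so the rows of a trace stay aligned with one another.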

Dagstuhl Research Online Publication Server

Efficient Average-Case Population Recovery in the Presence of Insertions and Deletions

Author: Ban Frank
Chen Xi
Servedio Rocco A.
Sinha Sandip
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2019)
Publication date: 01/01/2019

A number of recent works have considered the trace reconstruction problem, in which an unknown source string x in {0,1}^n is transmitted through a probabilistic channel which may randomly delete coordinates or insert random bits, resulting in a trace of x. The goal is to reconstruct the original string x from independent traces of x. While the asymptotically best algorithms known for worst-case strings use exp(O(n^{1/3})) traces [De et al., 2017; Fedor Nazarov and Yuval Peres, 2017], several highly efficient algorithms are known [Yuval Peres and Alex Zhai, 2017; Nina Holden et al., 2018] for the average-case version of the problem, in which the source string x is chosen uniformly at random from {0,1}^n. In this paper we consider a generalization of the above-described average-case trace reconstruction problem, which we call average-case population recovery in the presence of insertions and deletions. In this problem, rather than a single unknown source string there is an unknown distribution over s unknown source strings x^1,...,x^s in {0,1}^n, and each sample given to the algorithm is independently generated by drawing some x^i from this distribution and returning an independent trace of x^i. Building on the results of [Yuval Peres and Alex Zhai, 2017] and [Nina Holden et al., 2018], we give an efficient algorithm for the average-case population recovery problem in the presence of insertions and deletions. For any support size 1 <= s <= exp(Theta(n^{1/3})), for a 1-o(1) fraction of all s-element support sets {x^1,...,x^s} subset {0,1}^n, for every distribution D supported on {x^1,...,x^s}, our algorithm can efficiently recover D up to total variation distance at most epsilon with high probability, given access to independent traces of independent draws from D. The running time of our algorithm is poly(n,s,1/epsilon) and its sample complexity is poly(s,1/epsilon,exp(log^{1/3} n)).
This polynomial dependence on the support size s is in sharp contrast with the worst-case version of the problem (when x^1,...,x^s may be any strings in {0,1}^n), in which the sample complexity of the most efficient known algorithm [Frank Ban et al., 2019] is doubly exponential in s.
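
The sampling process behind population recovery can be sketched as follows. This is a simplified illustration, not the paper's algorithm: it models a deletion-only channel (the abstract also allows insertions, whose exact model is not given here), and the names `deletion_trace`, `population_sample`, and `delta` are hypothetical.

```python
import random


def deletion_trace(x: str, delta: float, rng: random.Random) -> str:
    """Delete each bit of x independently with probability delta."""
    return "".join(bit for bit in x if rng.random() >= delta)


def population_sample(support, weights, delta: float, rng: random.Random) -> str:
    """Draw one sample: pick a source string from D, then return its trace.

    support : the s source strings x^1, ..., x^s
    weights : the probabilities that D assigns to them
    Assumption: deletion-only channel; insertions are omitted.
    """
    source = rng.choices(support, weights=weights, k=1)[0]
    return deletion_trace(source, delta, rng)


if __name__ == "__main__":
    rng = random.Random(7)
    support = ["0011010", "1110001", "0101011"]
    weights = [0.5, 0.3, 0.2]
    for _ in range(4):
        print(population_sample(support, weights, 0.2, rng))
```

The algorithm only ever sees the output of `population_sample`; it is never told which source string produced a given trace, which is what distinguishes this setting from ordinary trace reconstruction.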

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server