10,731 research outputs found
String Reconstruction from Substring Compositions
Motivated by mass-spectrometry protein sequencing, we consider a
simply-stated problem of reconstructing a string from the multiset of its
substring compositions. We show that all strings of length 7, one less than a
prime, or one less than twice a prime, can be reconstructed uniquely up to
reversal. For all other lengths we show that reconstruction is not always
possible and provide sometimes-tight bounds on the largest number of strings
with given substring compositions. The lower bounds are derived by
combinatorial arguments and the upper bounds by algebraic considerations that
precisely characterize the set of strings with the same substring compositions
in terms of the factorization of bivariate polynomials. The problem can be
viewed as a combinatorial simplification of the turnpike problem, and its
solution may shed light on this long-standing problem as well. Using well known
results on transience of multi-dimensional random walks, we also provide a
reconstruction algorithm that reconstructs random strings over alphabets of
size in optimal near-quadratic time
- …