Due to its higher data density, longevity, energy efficiency, and ease of
generating copies, DNA is considered a promising storage technology for
satisfying future needs. However, a diverse set of errors including deletions,
insertions, duplications, and substitutions may arise in DNA at different
stages of data storage and retrieval. The current paper constructs
error-correcting codes for simultaneously correcting short (tandem)
duplications and at most p edits, where a short duplication generates a copy
of a substring with length ≤3 and inserts the copy following the original
substring, and an edit is a substitution, deletion, or insertion. Compared to
the state-of-the-art codes for duplications only, the proposed codes correct up
to p edits (in addition to duplications) at the additional cost of roughly
8p(logqn)(1+o(1)) symbols of redundancy, thus achieving the same
asymptotic rate, where q≥4 is the alphabet size and p is a constant.
Furthermore, the time complexities of both the encoding and decoding processes
are polynomial when p is a constant with respect to the code length.Comment: 21 pages. The paper has been submitted to IEEE Transaction on
Information Theory. Furthermore, the paper was presented in part at the
ISIT2021 and ISIT202