22 research outputs found

String Reconstruction from Substring Compositions

Author: Acharya Jayadev
Das Hirakendu
Milenkovic Olgica
Orlitsky Alon
Pan Shengjun
Publication date: 10/03/2014

Motivated by mass-spectrometry protein sequencing, we consider a simply-stated problem of reconstructing a string from the multiset of its substring compositions. We show that all strings of length 7, one less than a prime, or one less than twice a prime, can be reconstructed uniquely up to reversal. For all other lengths we show that reconstruction is not always possible and provide sometimes-tight bounds on the largest number of strings with given substring compositions. The lower bounds are derived by combinatorial arguments and the upper bounds by algebraic considerations that precisely characterize the set of strings with the same substring compositions in terms of the factorization of bivariate polynomials. The problem can be viewed as a combinatorial simplification of the turnpike problem, and its solution may shed light on this long-standing problem as well. Using well known results on transience of multi-dimensional random walks, we also provide a reconstruction algorithm that reconstructs random strings over alphabets of size $\ge 4$ in optimal near-quadratic time.
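To make the object in this abstract concrete, the following is a minimal brute-force sketch of the substring composition multiset. The function name `composition_multiset` and the example strings are illustrative choices, not taken from the paper, and the quadratic enumeration is not the reconstruction algorithm the abstract describes.

```python
from collections import Counter


def composition_multiset(s: str) -> Counter:
    """Multiset of compositions of all nonempty contiguous substrings of s.

    A composition records how many times each symbol occurs in a substring
    and forgets the order; it is stored as a sorted tuple of (symbol, count).
    """
    result = Counter()
    for i in range(len(s)):
        counts = Counter()
        for j in range(i, len(s)):
            counts[s[j]] += 1  # extend the substring s[i..j] by one symbol
            result[tuple(sorted(counts.items()))] += 1
    return result


# Hypothetical example strings of length 8 (8 is not 7, p - 1, or 2p - 1).
s = "01001101"
t = "01101001"

# A string and its reversal always share the same multiset.
print(composition_multiset(s) == composition_multiset(s[::-1]))  # True

# s and t are neither equal nor reversals of each other, yet they have the
# same multiset, so length-8 strings are not always uniquely reconstructable.
print(t not in (s, s[::-1]))                                # True
print(composition_multiset(s) == composition_multiset(t))   # True
```

The pair `s`, `t` was checked by hand for this sketch; it is consistent with the abstract's claim that unique reconstruction can fail for lengths outside the listed families.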

arXiv.org e-Print Archive

CiteSeerX

DSpace@MIT

Crossref

A New Algebraic Approach for String Reconstruction from Substring Compositions

Author: Gupta Utkarsh
Mahdavifar Hessam
Publication date: 01/06/2023

We consider the problem of binary string reconstruction from the multiset of its substring compositions, referred to as the substring composition multiset, first introduced and studied by Acharya et al. We introduce a new algorithm for the problem of string reconstruction from its substring composition multiset which relies on the algebraic properties of the equivalent bivariate polynomial formulation of the problem. We then characterize specific algebraic conditions for the binary string to be reconstructed that guarantee the algorithm does not require any backtracking through the reconstruction, and, consequently, the time complexity is bounded polynomially. More specifically, in the case of no backtracking, our algorithm has a time complexity of $O(n^2)$, compared to the algorithm by Acharya et al., which has a time complexity of $O(n^2\log(n))$, where $n$ is the length of the binary string. Furthermore, it is shown that larger sets of binary strings are uniquely reconstructable by the new algorithm without the need for backtracking, leading to codebooks of reconstruction codes that are larger, by a linear factor in size, than the previously known construction by Pattabiraman et al., while having $O(n^2)$ reconstruction complexity.
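The bivariate polynomial formulation mentioned in this abstract can be illustrated with a small check. The sketch below assumes one common encoding, which may differ in notation from the paper: a binary string $s$ is mapped to $P_s(x,y)=\sum_i x^{a_i}y^{b_i}$, where $a_i$ and $b_i$ count the 1s and 0s in the prefix of length $i$, and $S_s(x,y)$ has one monomial per nonempty substring. Under that assumption, $P_s(x,y)\,P_s(1/x,1/y) = (n+1) + S_s(x,y) + S_s(1/x,1/y)$. All function names are hypothetical, and this is not the reconstruction algorithm itself.

```python
from collections import Counter


def prefix_polynomial(s: str) -> Counter:
    """P_s as {(a, b): coeff}: one monomial x^a y^b per prefix (a = #1s, b = #0s)."""
    poly = Counter({(0, 0): 1})  # the empty prefix
    a = b = 0
    for ch in s:
        if ch == "1":
            a += 1
        else:
            b += 1
        poly[(a, b)] += 1
    return poly


def substring_polynomial(s: str) -> Counter:
    """S_s: one monomial x^(#1s) y^(#0s) per nonempty contiguous substring."""
    poly = Counter()
    for i in range(len(s)):
        a = b = 0
        for j in range(i, len(s)):
            if s[j] == "1":
                a += 1
            else:
                b += 1
            poly[(a, b)] += 1
    return poly


def reciprocal(poly: Counter) -> Counter:
    """Substitute x -> 1/x and y -> 1/y (negate every exponent)."""
    return Counter({(-a, -b): c for (a, b), c in poly.items()})


def multiply(p: Counter, q: Counter) -> Counter:
    """Product of two Laurent polynomials stored as exponent -> coefficient."""
    out = Counter()
    for (a1, b1), c1 in p.items():
        for (a2, b2), c2 in q.items():
            out[(a1 + a2, b1 + b2)] += c1 * c2
    return out


def identity_holds(s: str) -> bool:
    """Check P_s(x,y) * P_s(1/x,1/y) == (n+1) + S_s(x,y) + S_s(1/x,1/y)."""
    p = prefix_polynomial(s)
    sub = substring_polynomial(s)
    lhs = multiply(p, reciprocal(p))
    rhs = Counter({(0, 0): len(s) + 1}) + sub + reciprocal(sub)
    return lhs == rhs


print(identity_holds("01001101"))  # True
```

Because the right-hand side depends only on the composition multiset, two strings with the same multiset give the same product, which is why factorizations of this product are relevant to reconstruction.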

arXiv.org e-Print Archive

On the Parikh-de-Bruijn grid

Author: Burcsi Péter
Lipták Zsuzsanna
Smyth W. F.
Publication date: 01/01/2017

We introduce the Parikh-de-Bruijn grid, a graph whose vertices are fixed-order Parikh vectors, and whose edges are given by a simple shift operation. This graph gives structural insight into the nature of sets of Parikh vectors as well as that of the Parikh set of a given string. We show its utility by proving some results on Parikh-de-Bruijn strings, the abelian analog of de-Bruijn sequences. Comment: 18 pages, 3 figures, 1 table
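As a rough illustration of Parikh vectors of a fixed order, the sketch below lists the Parikh vectors of all length-$k$ windows of a string. It assumes that the "shift operation" corresponds to sliding the window by one position, which is a reading of the abstract rather than the paper's formal definition; the name `parikh_walk` and the example inputs are hypothetical.

```python
def parikh_walk(s: str, k: int, alphabet: str) -> list:
    """Parikh vectors of the consecutive length-k windows of s.

    Each vector counts occurrences of every alphabet symbol in a window.
    Sliding the window by one position removes one symbol and adds one,
    so consecutive vectors are equal or differ by moving a single unit.
    """
    if k <= 0 or len(s) < k:
        return []
    counts = {c: 0 for c in alphabet}
    for ch in s[:k]:
        counts[ch] += 1
    walk = [tuple(counts[c] for c in alphabet)]
    for i in range(k, len(s)):
        counts[s[i - k]] -= 1  # symbol leaving the window
        counts[s[i]] += 1      # symbol entering the window
        walk.append(tuple(counts[c] for c in alphabet))
    return walk


# Windows of "aabab" of length 2: "aa", "ab", "ba", "ab".
print(parikh_walk("aabab", 2, "ab"))  # [(2, 0), (1, 1), (1, 1), (1, 1)]
```

The set of distinct vectors in the returned list is the order-$k$ Parikh set of the string, under the assumptions stated above.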

arXiv.org e-Print Archive

Catalogo dei prodotti della ricerca

Reconstruction of Rooted Directed Trees

Author: Bartha Denes
Publication venue: 'University of Szeged'
Publication date: 01/01/2018

Let T be a rooted directed tree on n vertices, rooted at v. The rooted subtree frequency vector (RSTF-vector) of T with root v, denoted by rstf(T, v), is a vector of length n whose entry at position k is the number of subtrees of T that contain v and have exactly k vertices. In this paper we present an algorithm for reconstructing rooted directed trees from their rooted subtree frequencies (up to isomorphism). We show that there are examples of nonisomorphic pairs of rooted directed trees that are RSTF-equivalent, that is, they share the same rooted subtree frequency vectors. We have found all such pairs (groups) for small sizes by using exhaustive computer search. We show that infinitely many nonisomorphic RSTF-equivalent pairs of trees exist by constructing infinite families of examples.
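The RSTF-vector defined in this abstract can be computed in the forward direction with a short recursion. The sketch below assumes that "subtree containing v" means a connected subgraph that includes the root, and it represents the tree as a hypothetical `children` mapping in which every vertex appears as a key. It computes the vector from a tree; it is not the reconstruction algorithm of the paper.

```python
def poly_mul(p: list, q: list) -> list:
    """Multiply two polynomials given as coefficient lists (index = degree)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def rstf_vector(children: dict, root) -> list:
    """Entry k-1 is the number of root-containing subtrees with k vertices.

    Uses f_v(x) = x * prod over children c of (1 + f_c(x)): each child is
    either left out (the term 1) or contributes one of its own rooted
    subtrees (the term f_c).
    """
    def rooted_poly(v) -> list:
        result = [0, 1]  # the polynomial x: the vertex v alone
        for c in children[v]:
            child = rooted_poly(c)
            result = poly_mul(result, [1] + child[1:])  # 1 + f_c(x)
        return result

    return rooted_poly(root)[1:]  # drop the constant term, which is always 0


# Path 0 -> 1 -> 2 rooted at 0: one rooted subtree of each size.
print(rstf_vector({0: [1], 1: [2], 2: []}, 0))   # [1, 1, 1]
# Star with center 0 and two leaves: x * (1 + x)^2.
print(rstf_vector({0: [1, 2], 1: [], 2: []}, 0))  # [1, 2, 1]
```

The recursion is suitable only for small trees, since it uses Python recursion and quadratic polynomial multiplication.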

University of Szeged

ELTE Digital Institutional Repository (EDIT)

Reconstruction Codes for DNA Sequences with Uniform Tandem-Duplication Errors

Author: Schwartz Moshe
Yehezkeally Yonatan
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 05/09/2019

DNA as a data storage medium has several advantages, including far greater data density compared to electronic media. We propose that schemes for data storage in the DNA of living organisms may benefit from studying the reconstruction problem, which is applicable whenever multiple reads of noisy data are available. This strategy is uniquely suited to the medium, which inherently replicates stored data in multiple distinct ways, caused by mutations. We consider noise introduced solely by uniform tandem-duplication, and utilize the relation to constant-weight integer codes in the Manhattan metric. By bounding the intersection of the cross-polytope with hyperplanes, we prove the existence of reconstruction codes with greater capacity than known error-correcting codes, which we can determine analytically for any set of parameters. Comment: 11 pages, 2 figures, LaTeX; version accepted for publication

arXiv.org e-Print Archive

Crossref

Reconstruction of Trees from Jumbled and Weighted Subtrees

Author: Bartha Dénes
Burcsi Péter
Lipták Zsuzsanna
Publication date: 01/01/2016

Let T be an edge-labeled graph, where the labels are from a finite alphabet Sigma. For a subtree U of T, the Parikh vector of U is a vector of length |Sigma| which specifies the multiplicity of each label in U. We ask when T can be reconstructed from the multiset of Parikh vectors of all its subtrees, or all of its paths, or all of its maximal paths. We consider the analogous problems for weighted trees. We show how several well-known reconstruction problems on labeled strings, weighted strings and point sets on a line can be included in this framework. We present reconstruction algorithms and non-reconstructibility results, and extend the polynomial method, previously applied to jumbled strings [Acharya et al., SIAM J. on Discr. Math, 2015] and weighted strings [Bansal et al., CPM 2004], to deal with general trees and special tree classes.

Catalogo dei prodotti della ricerca