Rates of DNA Sequence Profiles for Practical Values of Read Lengths

Abstract

A recent study by one of the authors has demonstrated the importance of profile vectors in DNA-based data storage. We provide exact values and lower bounds on the number of profile vectors for finite values of alphabet size $q$, read length $\ell$, and word length $n$. Consequently, we demonstrate that for $q\ge 2$ and $n\le q^{\ell/2-1}$, the number of profile vectors is at least $q^{\kappa n}$ with $\kappa$ very close to one. In addition to enumeration results, we provide a set of efficient encoding and decoding algorithms for each of two particular families of profile vectors.
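To make the counted quantity concrete, the following is a minimal brute-force sketch, not the enumeration method of the paper. It assumes the usual definition of a profile vector (the vector recording how many times each length-$\ell$ substring occurs in a word), which the abstract does not restate. The function names `profile_vector`, `count_profile_vectors`, and `rate` are hypothetical, and the parameters in the usage note are tiny illustrative values chosen so that exhaustive search finishes quickly.

```python
from collections import Counter
from itertools import product
from math import log


def profile_vector(word, ell):
    """Return the ell-gram profile of word as a hashable tuple.

    Assumed definition: count the occurrences of every contiguous
    length-ell substring of word.
    """
    counts = Counter(word[i:i + ell] for i in range(len(word) - ell + 1))
    # Sorting the (gram, count) pairs gives a canonical, hashable form.
    return tuple(sorted(counts.items()))


def count_profile_vectors(q, ell, n):
    """Count distinct profiles over all q**n words of length n (exhaustive)."""
    words = product(range(q), repeat=n)
    return len({profile_vector(w, ell) for w in words})


def rate(q, ell, n):
    """Return log_q(number of profile vectors) / n, i.e. kappa in q**(kappa*n)."""
    return log(count_profile_vectors(q, ell, n), q) / n
```

For example, `count_profile_vectors(2, 2, 3)` yields 7 rather than 8, because the words 010 and 101 contain the same two 2-grams and therefore share a profile. The search visits all $q^n$ words, so it is only feasible for very small parameters.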
