On DNA Codes Over the Non-Chain Ring $\mathbb{Z}_4+u\mathbb{Z}_4+u^2\mathbb{Z}_4$ with $u^3=1$

Author: Banerjee Adrish
Benerjee Krishna Gopal
Das Shibsankar
Publication date: 25/11/2022

In this paper, we present a novel design strategy of DNA codes with length $3n$ over the non-chain ring $R=\mathbb{Z}_4+u\mathbb{Z}_4+u^2\mathbb{Z}_4$ with $64$ elements and $u^3=1$, where $n$ denotes the length of a code over $R$. We first study and analyze a distance conserving map defined over the ring $R$ into the length-$3$ DNA sequences. Then, we derive some conditions on the generator matrix of a linear code over $R$, which leads to a DNA code with reversible, reversible-complement, homopolymer $2$-run-length, and $\frac{w}{3n}$-GC-content constraints for integer $w$ ($0\leq w\leq 3n$). Finally, we propose a new construction of DNA codes using Reed-Muller type generator matrices. This allows us to obtain DNA codes with reversible, reversible-complement, homopolymer $2$-run-length, and $\frac{2}{3}$-GC-content constraints.

Comment: This paper has been presented in IEEE Information Theory Workshop (ITW) 2022, Mumbai, INDI

arXiv.org e-Print Archive
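
The constraints named in the abstract above can be checked directly on a set of DNA strings. The following Python sketch is illustrative only: the helper names and the four-word toy code are assumptions, not taken from the paper, and the homopolymer 2-run-length constraint is read here as "no base repeats more than twice in a row".

```python
from itertools import groupby

# Watson-Crick complement of each base.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse(seq):
    """Return the sequence read backwards."""
    return seq[::-1]


def reverse_complement(seq):
    """Return the complement of the reversed sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def gc_weight(seq):
    """Count the G and C bases in the sequence."""
    return sum(base in "GC" for base in seq)


def max_run(seq):
    """Length of the longest run of identical consecutive bases."""
    return max(len(list(group)) for _, group in groupby(seq))


def is_reversible(code):
    """True if the reverse of every codeword is also a codeword."""
    words = set(code)
    return all(reverse(w) in words for w in words)


def is_reversible_complement(code):
    """True if the reverse-complement of every codeword is also a codeword."""
    words = set(code)
    return all(reverse_complement(w) in words for w in words)


def has_gc_weight(code, w):
    """True if every codeword contains exactly w bases from {G, C}."""
    return all(gc_weight(word) == w for word in code)


def has_run_limit(code, limit):
    """True if no codeword contains a run longer than `limit` (assumed reading)."""
    return all(max_run(word) <= limit for word in code)


# Hypothetical toy code of length 3 (not a code from the paper).
toy_code = ["ACG", "GCA", "CGT", "TGC"]
print(is_reversible(toy_code), is_reversible_complement(toy_code))
print(has_gc_weight(toy_code, 2), has_run_limit(toy_code, 2))
```

Each toy codeword has two of its three bases in {G, C}, so the set has GC-content $\frac{2}{3}$ in the sense used in the abstract.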

Achievable Rates of Concatenated Codes in DNA Storage under Substitution Errors

Author: Lenz Andreas
Puchinger Sven
Welter Lorenz
Publication date: 30/04/2020

In this paper, we study achievable rates of concatenated coding schemes over a deoxyribonucleic acid (DNA) storage channel. Our channel model incorporates the main features of DNA-based data storage. First, information is stored on many short DNA strands. Second, the strands are stored in an unordered fashion inside the storage medium and each strand is replicated many times. Third, the data is accessed in an uncontrollable manner, i.e., random strands are drawn from the medium and received, possibly with errors. As one of our results, we show that there is a significant gap between the channel capacity and the achievable rate of a standard concatenated code in which one strand corresponds to an inner block. This is in fact surprising, as for other channels, such as $q$-ary symmetric channels, concatenated codes are known to achieve the capacity. We further propose a modified concatenated coding scheme that combines several strands into one inner block, which allows us to narrow the gap and achieve rates that are close to the capacity.

Comment: Extended version of a paper submitted to International Symposium on Information Theory and Its Applications (ISITA) 202

arXiv.org e-Print Archive

Online Research Database In Technology
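
The three channel features listed in the abstract above (many short strands, unordered storage, random reads with errors) can be mimicked with a small simulation. This is a minimal sketch under stated assumptions: the function names, the uniform sampling with replacement, and the independent per-base substitution probability `p` are illustrative choices, not the paper's exact model.

```python
import random

ALPHABET = "ACGT"


def substitute(strand, p, rng):
    """Replace each base independently, with probability p, by a different base."""
    out = []
    for base in strand:
        if rng.random() < p:
            out.append(rng.choice([b for b in ALPHABET if b != base]))
        else:
            out.append(base)
    return "".join(out)


def read_channel(strands, num_reads, p, rng):
    """Draw `num_reads` strands uniformly at random (order is lost) and corrupt them."""
    return [substitute(rng.choice(strands), p, rng) for _ in range(num_reads)]


# Hypothetical stored pool and read-out; the seed only makes the run repeatable.
rng = random.Random(0)
stored = ["ACGTAC", "TTGACA", "GGCATC"]
reads = read_channel(stored, num_reads=8, p=0.05, rng=rng)
print(reads)
```

Because the receiver sees only the multiset of reads, some stored strands may be drawn several times and others not at all, which is the effect the abstract describes as uncontrollable access.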

On Conflict Free DNA Codes

Author: Benerjee Krishna Gopal
Deb Sourav
Gupta Manish K
Publication date: 08/07/2019

DNA storage has emerged as an important area of research. The reliability of a DNA storage system depends on designing DNA strings (called DNA codes) that are sufficiently dissimilar. In this work, we introduce DNA codes that satisfy a special constraint. Each codeword of the DNA code has the property that no two consecutive sub-strings of the codeword are the same (a generalization of the homopolymer constraint). This is in addition to the usual constraints such as Hamming, reverse, reverse-complement and GC-content. We believe that the new constraint will further help in reducing errors when reading and writing data in synthetic DNA strings. We also present a construction (based on a variant of a stochastic local search algorithm) to calculate the size of the DNA codes with all the above constraints, which improves the lower bounds from the existing literature for some specific cases. Moreover, a recursive isometric map between binary vectors and DNA strings is proposed. Using the map and well-known binary codes, we obtain a few classes of DNA codes with all the constraints, including the property that the constructed DNA codewords are free from hairpin-like secondary structures.

Comment: 12 pages, Draft (Table VI and Table VII are updated

arXiv.org e-Print Archive
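
The constraint described in the abstract above, that no two consecutive sub-strings of a codeword are equal, can be tested with a short scan. The sketch below is an assumption-laden illustration: the function name and the optional `max_len` bound on the sub-string length are hypothetical, and the paper's exact definition may differ.

```python
def is_conflict_free(seq, max_len=None):
    """Return True if seq has no two equal adjacent sub-strings of length <= max_len.

    With max_len=1 this reduces to the homopolymer check (no repeated base);
    with max_len=None every block length up to len(seq) // 2 is tested.
    """
    n = len(seq)
    if max_len is None:
        max_len = n // 2
    for length in range(1, max_len + 1):
        # Compare each block with the block of the same length that follows it.
        for start in range(n - 2 * length + 1):
            if seq[start:start + length] == seq[start + length:start + 2 * length]:
                return False
    return True


print(is_conflict_free("ACGACT"))  # no adjacent repeated block
print(is_conflict_free("ACAC"))    # "AC" is immediately repeated
```

Setting `max_len=1` recovers the ordinary homopolymer test, which is why the abstract calls the new constraint a generalization of it.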

In-Vitro Validated Methods for Encoding Digital Data in Deoxyribonucleic Acid (DNA)

Author: Andersen Tim
Dickinson George D.
Guerrero Jorge
Hughes William L.
Llewellyn Shoshanna
Mortuza Golam Md
Tobiason Michael D.
Zadegan Reza
Publication venue: 'IUScholarWorks'
Publication date: 21/04/2023

Deoxyribonucleic acid (DNA) is emerging as an alternative archival memory technology. Recent advancements in DNA synthesis and sequencing have both increased the capacity and decreased the cost of storing information in de novo synthesized DNA pools. In this survey, we review methods for translating digital data to and/or from DNA molecules. An emphasis is placed on methods which have been validated by storing and retrieving real-world data via in-vitro experiments.

Boise State University - ScholarWorks

Protecting the Future of Information: LOCO Coding With Error Detection for DNA Data Storage

Author: Hareedy Ahmed
İrimağzı Canberk
Uslan Yusuf
Publication date: 14/11/2023

DNA strands serve as a storage medium for $4$-ary data over the alphabet $\{A,T,G,C\}$. DNA data storage promises formidable information density, long-term durability, and ease of replicability. However, information in this intriguing storage technology might be corrupted. Experiments have revealed that DNA sequences with long homopolymers and/or with low GC-content are notably more subject to errors upon storage. This paper investigates the utilization of the recently-introduced method for designing lexicographically-ordered constrained (LOCO) codes in DNA data storage. This paper introduces DNA LOCO (D-LOCO) codes over the alphabet $\{A,T,G,C\}$ with limited runs of identical symbols. These codes come with an encoding-decoding rule we derive, which provides affordable encoding-decoding algorithms. In terms of storage overhead, the proposed encoding-decoding algorithms outperform those in the existing literature. Our algorithms are readily reconfigurable. D-LOCO codes are intrinsically balanced, which allows us to achieve balancing over the entire DNA strand with minimal rate penalty. Moreover, we propose four schemes to bridge consecutive codewords, three of which guarantee single substitution error detection per codeword. We examine the probability of undetected errors. We also show that D-LOCO codes are capacity-achieving and that they offer remarkably high rates at moderate lengths.

Comment: 14 pages (double column), 3 figures, submitted to the IEEE Transactions on Molecular, Biological and Multi-scale Communications (TMBMC

arXiv.org e-Print Archive
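
To give a feel for the cost of limiting runs of identical symbols, the sketch below counts the 4-ary sequences of a given length whose runs never exceed a chosen limit and reports the resulting rate in bits per base. It is a plain counting argument with hypothetical function names, not the D-LOCO encoding-decoding rule from the paper.

```python
import math


def count_run_limited(length, max_run, alphabet_size=4):
    """Count sequences of `length` symbols in which no run exceeds `max_run`."""
    if length == 0:
        return 1
    # counts[r - 1] = number of sequences whose final run has length r.
    counts = [0] * max_run
    counts[0] = alphabet_size
    for _ in range(length - 1):
        new_counts = [0] * max_run
        # Switching to any of the other symbols starts a fresh run of length 1.
        new_counts[0] = (alphabet_size - 1) * sum(counts)
        # Repeating the last symbol extends the run, if the limit allows it.
        for r in range(max_run - 1):
            new_counts[r + 1] = counts[r]
        counts = new_counts
    return sum(counts)


def rate_bits_per_base(length, max_run):
    """Best possible rate, in bits per base, of a run-limited code of this length."""
    return math.log2(count_run_limited(length, max_run)) / length


print(count_run_limited(3, 2))  # 64 sequences minus AAA, TTT, GGG, CCC
print(rate_bits_per_base(20, 3))
```

Comparing the printed rate with the unconstrained 2 bits per base gives an upper limit on what any code with that run limit and length can achieve.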