Search CORE

5 research outputs found

The capacity of some Pólya string models

Author: Bruck Jehoshua
Elishco Ohad
Farnoud Farzad
Schwartz Moshe
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/07/2016
Field of study

We study random string-duplication systems, called Pólya string models, motivated by certain random mutation processes in the genome of living organisms. Unlike previous works that study the combinatorial capacity of string-duplication systems, or peripheral properties such as symbol frequency, this work provides exact capacity or bounds on it, for several probabilistic models. In particular, we give the exact capacity of the random tandem-duplication system, and the end-duplication system, and bound the capacity of the complement tandem-duplication system. Interesting connections are drawn between the former and the beta distribution common to population genetics, as well as between the latter system and signatures of random permutations

Caltech Authors

Evolution of k-mer Frequencies and Entropy in Duplication and Substitution Mutation Systems

Author: Bruck Jehoshua
Farnoud (Hassanzadeh) Farzad
Lou Hao
Schwartz Moshe
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/05/2020
Field of study

Genomic evolution can be viewed as string-editing processes driven by mutations. An understanding of the statistical properties resulting from these mutation processes is of value in a variety of tasks related to biological sequence data, e.g., estimation of model parameters and compression. At the same time, due to the complexity of these processes, designing tractable stochastic models and analyzing them are challenging. In this paper, we study two kinds of systems, each representing a set of mutations. In the first system, tandem duplications and substitution mutations are allowed and in the other, interspersed duplications. We provide stochastic models and, via stochastic approximation, study the evolution of substring frequencies for these two systems separately. Specifically, we show that k-mer frequencies converge almost surely and determine the limit set. Furthermore, we present a method for finding upper bounds on entropy for such systems

Caltech Authors

Reconstruction Codes for DNA Sequences with Uniform Tandem-Duplication Errors

Author: Schwartz Moshe
Yehezkeally Yonatan
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 05/09/2019
Field of study

DNA as a data storage medium has several advantages, including far greater data density compared to electronic media. We propose that schemes for data storage in the DNA of living organisms may benefit from studying the reconstruction problem, which is applicable whenever multiple reads of noisy data are available. This strategy is uniquely suited to the medium, which inherently replicates stored data in multiple distinct ways, caused by mutations. We consider noise introduced solely by uniform tandem-duplication, and utilize the relation to constant-weight integer codes in the Manhattan metric. By bounding the intersection of the cross-polytope with hyperplanes, we prove the existence of reconstruction codes with greater capacity than known error-correcting codes, which we can determine analytically for any set of parameters.Comment: 11 pages, 2 figures, Latex; version accepted for publicatio

arXiv.org e-Print Archive

Crossref

Evolution of k-mer Frequencies and Entropy in Duplication and Substitution Mutation Systems

Author: Bruck Jehoshua
Farnoud (Hassanzadeh) Farzad
Lou Hao
Schwartz Moshe
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/05/2020
Field of study