Search CORE

37,932 research outputs found

Coding over Sets for DNA Storage

Author: Lenz Andreas
Siegel Paul H.
Wachter-Zeh Antonia
Yaakobi Eitan
Publication venue
Publication date: 09/05/2018
Field of study

In this paper, we study error-correcting codes for the storage of data in synthetic deoxyribonucleic acid (DNA). We investigate a storage model where data is represented by an unordered set of

M

sequences, each of length

L

. Errors within that model are losses of whole sequences and point errors inside the sequences, such as substitutions, insertions and deletions. We propose code constructions which can correct these errors with efficient encoders and decoders. By deriving upper bounds on the cardinalities of these codes using sphere packing arguments, we show that many of our codes are close to optimal.Comment: 5 page

arXiv.org e-Print Archive

Crossref

Reconstruction Codes for DNA Sequences with Uniform Tandem-Duplication Errors

Author: Schwartz Moshe
Yehezkeally Yonatan
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 05/09/2019
Field of study

DNA as a data storage medium has several advantages, including far greater data density compared to electronic media. We propose that schemes for data storage in the DNA of living organisms may benefit from studying the reconstruction problem, which is applicable whenever multiple reads of noisy data are available. This strategy is uniquely suited to the medium, which inherently replicates stored data in multiple distinct ways, caused by mutations. We consider noise introduced solely by uniform tandem-duplication, and utilize the relation to constant-weight integer codes in the Manhattan metric. By bounding the intersection of the cross-polytope with hyperplanes, we prove the existence of reconstruction codes with greater capacity than known error-correcting codes, which we can determine analytically for any set of parameters.Comment: 11 pages, 2 figures, Latex; version accepted for publicatio

arXiv.org e-Print Archive

Crossref

The impact of different DNA extraction kits and laboratories upon the assessment of human gut microbiota composition by 16S rRNA gene sequencing

Author: A Salonen
AL McOrist
Alan W. Walker
AW Walker
B Lawley
B Smith
C Manichanh
C Milani
C Quince
C Ramirez-Farias
CA Lozupone
Charlie W. Lees
CL Lauber
EG Zoetendal
EG Zoetendal
EK Costello
Freda M. Farquarson
G Musso
GD Wu
Georgina L. Hold
Harry J. Flint
HJ Flint
I Letunic
I Sekirov
J Maukonen
J Qin
Jack Satsangi
JC Yue
JM Nechvatal
John M. Thomson
Julian Parkhill
M Simrén
MW Ariefdjohan
N Fierer
Nicholas A. Kennedy
PB Eckburg
PD Schloss
Petra Louis
RB Sartor
RE Ley
RI Amann
S Claassen
S Persson
S Yuan
Susan H. Berry
Sylvia H. Duncan
WB Whitman
Yolanda Sanz
Z Yu
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 24/02/2014
Field of study

Peer reviewedPublisher PD

Aberdeen University Research

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Edinburgh Research Explorer

The University of Manchester - Institutional Repository

FigShare

Exploring single-sample SNP and INDEL calling with whole-genome de novo assembly

Author: Depristo
Durbin
Gingeras
H. Li
Homer
Idury
Iqbal
Lam
Levy
Myers
Myers
Myers
Peltola
Pevzner
Staden
Zerbino
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2012
Field of study

Motivation: Eugene Myers in his string graph paper (Myers, 2005) suggested that in a string graph or equivalently a unitig graph, any path spells a valid assembly. As a string/unitig graph also encodes every valid assembly of reads, such a graph, provided that it can be constructed correctly, is in fact a lossless representation of reads. In principle, every analysis based on whole-genome shotgun sequencing (WGS) data, such as SNP and insertion/deletion (INDEL) calling, can also be achieved with unitigs. Results: To explore the feasibility of using de novo assembly in the context of resequencing, we developed a de novo assembler, fermi, that assembles Illumina short reads into unitigs while preserving most of information of the input reads. SNPs and INDELs can be called by mapping the unitigs against a reference genome. By applying the method on 35-fold human resequencing data, we showed that in comparison to the standard pipeline, our approach yields similar accuracy for SNP calling and better results for INDEL calling. It has higher sensitivity than other de novo assembly based methods for variant calling. Our work suggests that variant calling with de novo assembly be a beneficial complement to the standard variant calling pipeline for whole-genome resequencing. In the methodological aspects, we proposed FMD-index for forward-backward extension of DNA sequences, a fast algorithm for finding all super-maximal exact matches and one-pass construction of unitigs from an FMD-index. Availability: http://github.com/lh3/fermi Contact: [email protected]: Rev2: submitted version with minor improvements; 7 page

arXiv.org e-Print Archive

CiteSeerX

Crossref