Coding over Sets for DNA Storage

Author: Lenz Andreas
Siegel Paul H.
Wachter-Zeh Antonia
Yaakobi Eitan
Publication date: 09/05/2018

In this paper, we study error-correcting codes for the storage of data in synthetic deoxyribonucleic acid (DNA). We investigate a storage model where data is represented by an unordered set of M sequences, each of length L. Errors within that model are losses of whole sequences and point errors inside the sequences, such as substitutions, insertions and deletions. We propose code constructions which can correct these errors with efficient encoders and decoders. By deriving upper bounds on the cardinalities of these codes using sphere packing arguments, we show that many of our codes are close to optimal.

arXiv.org e-Print Archive

Crossref

Malleable Coding with Fixed Reuse

Author: Goyal Vivek K
Kusuma Julius
Varshney Lav R.
Publication date: 01/01/2011

In cloud computing, storage area networks, remote backup storage, and similar settings, stored data is modified with updates from new versions. Representing information and modifying the representation are both expensive. Therefore, it is desirable for the data not only to be compressed but also to be easily modified during updates. A malleable coding scheme considers both compression efficiency and ease of alteration, promoting codeword reuse. We examine the trade-off between compression efficiency and malleability cost (the difficulty of synchronizing compressed versions), measured as the length of a reused prefix portion. Through a coding theorem, the region of achievable rates and malleability is expressed as a single-letter optimization. Relationships to common information problems are also described.

arXiv.org e-Print Archive

Boston University Institutional Repository (OpenBU)

Conserved noncoding sequences highlight shared components of regulatory networks in dicotyledonous plants

Author: Barrington Christopher
Baxter Laura
Beynon Jim
Buchanan-Wollaston Vicky
Denby Katherine J.
Dyer Nigel
Hickman R. D. G.
Jironkin Aleksey
Krusche Peter
Moore Jonathan D.
Ott Sascha
Tiskin Alexander
Publication venue: American Society of Plant Biologists (ASPB)
Publication date: 01/10/2012

Conserved noncoding sequences (CNSs) in DNA are reliable pointers to regulatory elements controlling gene expression. Using a comparative genomics approach with four dicotyledonous plant species (Arabidopsis thaliana, papaya [Carica papaya], poplar [Populus trichocarpa], and grape [Vitis vinifera]), we detected hundreds of CNSs upstream of Arabidopsis genes. Distinct positioning, length, and enrichment for transcription factor binding sites suggest these CNSs play a functional role in transcriptional regulation. The enrichment of transcription factors within the set of genes associated with CNSs is consistent with the hypothesis that together they form part of a conserved transcriptional network whose function is to regulate other transcription factors and control development. We identified a set of promoters where regulatory mechanisms are likely to be shared between the model organism Arabidopsis and other dicots, providing areas of focus for further research.

Crossref

PubMed Central

Warwick Research Archives Portal Repository

Investigation of the length distributions of coding and noncoding sequences in relation to gene architecture, function, and expression

Author: Caldwell Rachel Amber
Publication venue: School of Biological Sciences and School of Mathematics and Applied Statistics
Publication date: 01/01/2015

The last 20 years have seen the birth of bioinformatics, defined as the combination of mathematics, biology, and computational approaches. This discipline has led to the era of ontology, extensive databases (including sequences, structures, expression profiles, and genomes), and database cross-referencing (Ouzounis, 2012). Before this discipline, scientists referenced atlas books, such as Margaret Dayhoff’s protein sequence collection (Strasser, 2010), which required long hours of letter counting. Through the development of sequencing technology over the past forty years, a tremendous amount of genomic sequencing data has already been collected. As the volume of such data surges, so do the challenges of data organisation, accessibility, and interpretation, with interpretation being the most challenging (Ouzounis, 2012).

Research Online