20,437 research outputs found
Coding over Sets for DNA Storage
In this paper, we study error-correcting codes for the storage of data in
synthetic deoxyribonucleic acid (DNA). We investigate a storage model where
data is represented by an unordered set of sequences, each of length .
Errors within that model are losses of whole sequences and point errors inside
the sequences, such as substitutions, insertions and deletions. We propose code
constructions which can correct these errors with efficient encoders and
decoders. By deriving upper bounds on the cardinalities of these codes using
sphere packing arguments, we show that many of our codes are close to optimal.Comment: 5 page
Malleable Coding with Fixed Reuse
In cloud computing, storage area networks, remote backup storage, and similar
settings, stored data is modified with updates from new versions. Representing
information and modifying the representation are both expensive. Therefore it
is desirable for the data to not only be compressed but to also be easily
modified during updates. A malleable coding scheme considers both compression
efficiency and ease of alteration, promoting codeword reuse. We examine the
trade-off between compression efficiency and malleability cost-the difficulty
of synchronizing compressed versions-measured as the length of a reused prefix
portion. Through a coding theorem, the region of achievable rates and
malleability is expressed as a single-letter optimization. Relationships to
common information problems are also described
Conserved noncoding sequences highlight shared components of regulatory networks in dicotyledonous plants
Conserved noncoding sequences (CNSs) in DNA are reliable pointers to regulatory elements controlling gene expression. Using a comparative genomics approach with four dicotyledonous plant species (Arabidopsis thaliana, papaya [Carica papaya], poplar [Populus trichocarpa], and grape [Vitis vinifera]), we detected hundreds of CNSs upstream of Arabidopsis genes. Distinct positioning, length, and enrichment for transcription factor binding sites suggest these CNSs play a functional role in transcriptional regulation. The enrichment of transcription factors within the set of genes associated with CNS is consistent with the hypothesis that together they form part of a conserved transcriptional network whose function is to regulate other transcription factors and control development. We identified a set of promoters where regulatory mechanisms are likely to be shared between the model organism Arabidopsis and other dicots, providing areas of focus for further research
Investigation of the length distributions of coding and noncoding sequences in relation to gene architecture, function, and expression
The last 20 years has seen the birth of bioinformatics, and is defined as the combination of mathematics, biology, and computational approaches. This discipline has led to the era of ontology, extensive databases including sequences, structures, expression profiles, and genomes and database cross-referencing, (Ouzounis, 2012). Before this discipline, scientists referenced atlas books, such as Margret Dayhoff’s protein sequence collection (Strasser, 2010) which required long hours of letter counting. Through the development of sequencing technology over the past forty years, a tremendous amount of genomic sequencing data has already been collected. With a surge of such data increasing, so does the challenges of data organisation, accessibility and interpretation, with interpretation being the most challenging (Ouzounis, 2012)
- …