
    Coding over Sets for DNA Storage

    In this paper, we study error-correcting codes for the storage of data in synthetic deoxyribonucleic acid (DNA). We investigate a storage model where data is represented by an unordered set of M sequences, each of length L. Errors within that model are losses of whole sequences and point errors inside the sequences, such as substitutions, insertions and deletions. We propose code constructions that can correct these errors with efficient encoders and decoders. By deriving upper bounds on the cardinalities of these codes using sphere-packing arguments, we show that many of our codes are close to optimal.
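The storage model in this abstract can be illustrated with a toy simulation. The sketch below is not the paper's construction: the function name `channel`, its parameters, and the four-letter alphabet are assumptions made for illustration, and it models only sequence losses and substitutions (insertions and deletions are left out).

```python
import random

ALPHABET = "ACGT"  # assumed alphabet; the abstract does not fix one


def channel(stored, num_losses, num_substitutions, rng):
    """Hypothetical error model over an unordered set of equal-length sequences.

    Drops `num_losses` whole sequences, then applies up to
    `num_substitutions` single-symbol substitutions to the survivors.
    """
    # Losing whole sequences: keep a random subset of the stored set.
    survivors = rng.sample(sorted(stored), len(stored) - num_losses)
    received = [list(seq) for seq in survivors]

    # Point errors: replace one symbol with a different one each round.
    # Two rounds may hit the same position, so the bound is "at most".
    for _ in range(num_substitutions):
        if not received:
            break
        seq = rng.choice(received)
        pos = rng.randrange(len(seq))
        seq[pos] = rng.choice([b for b in ALPHABET if b != seq[pos]])

    # The receiver sees a set: order is lost and duplicates collapse.
    return {"".join(seq) for seq in received}


if __name__ == "__main__":
    stored = {"ACGT", "TTGA", "CCAG"}  # M = 3 sequences of length L = 4
    print(channel(stored, num_losses=1, num_substitutions=1, rng=random.Random(0)))
```

Because the output is a set, a decoder cannot rely on sequence order, which is what distinguishes this model from ordinary block coding.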

    Malleable Coding with Fixed Reuse

    In cloud computing, storage area networks, remote backup storage, and similar settings, stored data is modified with updates from new versions. Representing information and modifying the representation are both expensive. Therefore it is desirable for the data not only to be compressed but also to be easily modified during updates. A malleable coding scheme considers both compression efficiency and ease of alteration, promoting codeword reuse. We examine the trade-off between compression efficiency and malleability cost (the difficulty of synchronizing compressed versions), measured as the length of a reused prefix portion. Through a coding theorem, the region of achievable rates and malleability is expressed as a single-letter optimization. Relationships to common information problems are also described.

    Conserved noncoding sequences highlight shared components of regulatory networks in dicotyledonous plants

    Conserved noncoding sequences (CNSs) in DNA are reliable pointers to regulatory elements controlling gene expression. Using a comparative genomics approach with four dicotyledonous plant species (Arabidopsis thaliana, papaya [Carica papaya], poplar [Populus trichocarpa], and grape [Vitis vinifera]), we detected hundreds of CNSs upstream of Arabidopsis genes. Distinct positioning, length, and enrichment for transcription factor binding sites suggest these CNSs play a functional role in transcriptional regulation. The enrichment of transcription factors within the set of genes associated with CNSs is consistent with the hypothesis that together they form part of a conserved transcriptional network whose function is to regulate other transcription factors and control development. We identified a set of promoters where regulatory mechanisms are likely to be shared between the model organism Arabidopsis and other dicots, providing areas of focus for further research.

    Investigation of the length distributions of coding and noncoding sequences in relation to gene architecture, function, and expression

    The last 20 years have seen the birth of bioinformatics, defined as the combination of mathematics, biology, and computational approaches. This discipline has led to the era of ontology and of extensive databases, including sequences, structures, expression profiles, and genomes, as well as database cross-referencing (Ouzounis, 2012). Before this discipline, scientists referenced atlas books, such as Margaret Dayhoff’s protein sequence collection (Strasser, 2010), which required long hours of letter counting. Through the development of sequencing technology over the past forty years, a tremendous amount of genomic sequencing data has been collected. As the volume of such data surges, so do the challenges of data organisation, accessibility and interpretation, with interpretation being the most challenging (Ouzounis, 2012).