813,558 research outputs found

Encoding DNA sequences by integer chaos game representation

Author: Yin Changchuan
Publication date: 19/12/2017

DNA sequences are fundamental for encoding genetic information. The genetic information may be understood not only from the symbolic sequences but also from the hidden signals inside the sequences. The symbolic sequences need to be transformed into numerical sequences so the hidden signals can be revealed by signal processing techniques. All current transformation methods encode DNA sequences into numerical values of the same length. These representations have limitations in the applications of genomic signal compression, encryption, and steganography. We propose an integer chaos game representation (iCGR) of DNA sequences and a lossless encoding method for DNA sequences based on the iCGR. In the iCGR method, a DNA sequence is represented by the iterated function of the nucleotides and their positions in the sequence. Then the DNA sequence can be uniquely encoded and recovered using three integers from iCGR. One integer is the sequence length and the other two integers represent the accumulated distributions of nucleotides in the sequence. The integer encoding scheme can compress a DNA sequence by 2 bits per nucleotide. The integer representation of DNA sequences provides a prospective tool for sequence compression, encryption, and steganography. The Python programs in this study are freely available to the public at https://github.com/cyinbox/iCG
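The three-integer idea in this abstract can be illustrated with a minimal sketch. The abstract does not give the iterated function, so the corner vectors and the power-of-two weighting below are assumptions made for illustration, and the names `icgr_encode` and `icgr_decode` are hypothetical; the author's own programs may differ.

```python
# Assumed corner vectors for the four nucleotides (not stated in the abstract).
CORNERS = {"A": (1, 1), "T": (-1, 1), "C": (-1, -1), "G": (1, -1)}
BY_SIGN = {v: k for k, v in CORNERS.items()}


def icgr_encode(seq):
    """Return (length, x, y) for a DNA string over the alphabet A, C, G, T."""
    x = y = 0
    for i, base in enumerate(seq):
        ax, ay = CORNERS[base]
        # Position i (0-based) contributes with weight 2**i, so later
        # nucleotides always dominate the sign of the running sums.
        x += ax * (1 << i)
        y += ay * (1 << i)
    return len(seq), x, y


def icgr_decode(n, x, y):
    """Recover the DNA string from the three integers."""
    bases = []
    for i in range(n - 1, -1, -1):
        # The signs of (x, y) identify the nucleotide at position i,
        # because all earlier terms sum to less than 2**i in magnitude.
        corner = (1 if x > 0 else -1, 1 if y > 0 else -1)
        bases.append(BY_SIGN[corner])
        x -= corner[0] * (1 << i)
        y -= corner[1] * (1 << i)
    return "".join(reversed(bases))
```

Under these assumptions, `icgr_decode(*icgr_encode(s))` returns `s` for any non-empty string over A, C, G, T; Python's arbitrary-precision integers keep the round trip exact for long sequences.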

arXiv.org e-Print Archive

University of Illinois at Chicago: UIC INDIGO (INtellectual property in DIGital form available online in an Open environment)

Entropy concepts and DNA investigations

Author: Bell
Eigen
Elton
Gatlin
Gusev
Herzel
Herzel
Ivanitski
Kolmogorov
Li
Lipman
Lobzin
Nussinov
Olga V Kirillova
Rowe
Schmitt
Schmitt
Publication venue: 'Elsevier BV'
Publication date: 17/08/2000

Topological and metric entropies of the DNA sequences from different organisms were calculated. The results were compared with each other and with those of corresponding artificial sequences. For all DNA sequences considered, there is a maximum of heterogeneity, which falls in the block length interval [5,7]. The maximum distinction between natural and artificial sequences is shifted 1-3 positions to the right of the maximum of heterogeneity, for both the metric and the topological entropy. This points to the specificity of real DNA sequences in this interval. Comment: 10 pages 7 figures submitted to PL
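For readers unfamiliar with block entropies, the sketch below shows one common way to compute them for a given block length. The abstract does not state the exact definitions or normalization used, so the base-4 logarithm, the division by block length, and the function names are assumptions.

```python
import math
from collections import Counter


def block_counts(seq, n):
    """Count all overlapping blocks of length n in seq."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def metric_entropy(seq, n):
    """Shannon entropy of the n-block distribution, per symbol, in base 4."""
    counts = block_counts(seq, n)
    total = sum(counts.values())
    h = -sum((c / total) * math.log(c / total, 4) for c in counts.values())
    return h / n


def topological_entropy(seq, n):
    """Log (base 4) of the number of distinct n-blocks, per symbol."""
    return math.log(len(block_counts(seq, n)), 4) / n
```

With this normalization both quantities lie between 0 and 1 for DNA, where 1 corresponds to every possible block occurring (topological) or occurring equally often (metric).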

arXiv.org e-Print Archive

Crossref

Construction of a novel phagemid to produce custom DNA origami scaffolds.

Author: Aksel Tural
Douglas Shawn M
Nafisi Parsa M
Publication venue: eScholarship, University of California
Publication date: 01/01/2018

DNA origami, a method for constructing nanoscale objects, relies on a long single strand of DNA to act as the 'scaffold' to template assembly of numerous short DNA oligonucleotide 'staples'. The ability to generate custom scaffold sequences can greatly benefit DNA origami design processes. Custom scaffold sequences can provide better control of the overall size of the final object and better control of low-level structural details, such as locations of specific base pairs within an object. Filamentous bacteriophages and related phagemids can work well as sources of custom scaffold DNA. However, scaffolds derived from phages require inclusion of multi-kilobase DNA sequences in order to grow in host bacteria, and those sequences cannot be altered or removed. These fixed-sequence regions constrain the design possibilities of DNA origami. Here, we report the construction of a novel phagemid, pScaf, to produce scaffolds that have a custom sequence with a much smaller fixed region of 393 bases. We used pScaf to generate new scaffolds ranging in size from 1512 to 10 080 bases and demonstrated their use in various DNA origami shapes and assemblies. We anticipate our pScaf phagemid will enhance development of the DNA origami method and its future applications

Crossref

eScholarship - University of California

Mapping the Space of Genomic Signatures

Author: Bryans Nathaniel
Dattani Nikesh S.
Davis Katelyn
Hill Kathleen A.
Karamichalis Rallis
Kari Lila
Sayem Abu S.
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 09/10/2014

We propose a computational method to measure and visualize interrelationships among any number of DNA sequences allowing, for example, the examination of hundreds or thousands of complete mitochondrial genomes. An "image distance" is computed for each pair of graphical representations of DNA sequences, and the distances are visualized as a Molecular Distance Map: Each point on the map represents a DNA sequence, and the spatial proximity between any two points reflects the degree of structural similarity between the corresponding sequences. The graphical representation of DNA sequences utilized, Chaos Game Representation (CGR), is genome- and species-specific and can thus act as a genomic signature. Consequently, Molecular Distance Maps could inform species identification, taxonomic classifications and, to a certain extent, evolutionary history. The image distance employed, Structural Dissimilarity Index (DSSIM), implicitly compares the occurrences of oligomers of length up to k (herein k=9) in DNA sequences. We computed DSSIM distances for more than 5 million pairs of complete mitochondrial genomes, and used Multi-Dimensional Scaling (MDS) to obtain Molecular Distance Maps that visually display the sequence relatedness in various subsets, at different taxonomic levels. This general-purpose method does not require DNA sequence homology and can thus be used to compare similar or vastly different DNA sequences, genomic or computer-generated, of the same or different lengths. We illustrate potential uses of this approach by applying it to several taxonomic subsets: phylum Vertebrata, (super)kingdom Protista, classes Amphibia-Insecta-Mammalia, class Amphibia, and order Primates. This analysis of an extensive dataset confirms that the oligomer composition of full mtDNA sequences can be a source of taxonomic information. Comment: 14 pages, 7 figures. arXiv admin note: substantial text overlap with arXiv:1307.375
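As background for the Chaos Game Representation mentioned in this abstract, here is a minimal sketch of the classic construction on the unit square: start at the centre and move halfway toward the corner of each successive nucleotide. The corner assignment and the name `cgr_points` are assumptions of this sketch; the DSSIM distance and the MDS step are not shown.

```python
# Assumed corner assignment on the unit square (a common convention).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq):
    """Return the list of CGR points for a DNA string over A, C, G, T."""
    x, y = 0.5, 0.5  # start at the centre of the square
    points = []
    for base in seq:
        cx, cy = CORNERS[base]
        # Move halfway from the current point toward the nucleotide's corner.
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points
```

After k steps a point lies in a sub-square of side 2^-k determined by the last k nucleotides, which is why a CGR image at that resolution reflects oligomer occurrences up to length k.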

arXiv.org e-Print Archive

Directory of Open Access Journals

Information decomposition of symbolic sequences

Author: Adams
Almirantis
Audi
Benson
Benson
Chaley
Chechetkin
Cole
Conway
Coward
Dodin
Dodin
E.V. Korotkov
Fraser
Glaser
Grosse
Heringa
Herren
Hertz
Herzel
Jackson
Junker
Korotkova
Kullback
Lobzin
Lotman
M.A. Korotkova
Margot
Marple
McLachlan
N.A. Kudryashov
Ng
Pennisi
Presta
Rackovsky
Ramakrishna
Rashid
Silverman
Stoesser
Tiwari
Tomb
Trifonov
Venter
Voss
Wang
Weiss
Yaglom
Zirmunsky
Publication venue: 'Elsevier BV'
Publication date: 17/02/2003

We developed a non-parametric method of Information Decomposition (ID) of the content of any symbolical sequence. The method is based on the calculation of Shannon mutual information between analyzed and artificial symbolical sequences, and allows the revealing of latent periodicity in any symbolical sequence. We show the stability of the ID method in the case of a large number of random letter changes in an analyzed symbolic sequence. We demonstrate the possibilities of the method by analyzing poems as well as DNA and protein sequences. In DNA and protein sequences we show the existence of many DNA and amino acid sequences with different types and lengths of latent periodicity. The possible origin of latent periodicity for different symbolical sequences is discussed. Comment: 18 pages, 8 figure
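The core quantity described here, mutual information between the analyzed sequence and an artificial periodic sequence, can be sketched as follows. This assumes the artificial sequence is simply the position index modulo the trial period; the abstract does not spell out that construction, and the function name is hypothetical.

```python
import math
from collections import Counter


def mutual_information(seq, period):
    """Mutual information (bits) between symbols and their phase i % period."""
    n = len(seq)
    joint = Counter((ch, i % period) for i, ch in enumerate(seq))
    symbols = Counter(seq)
    phases = Counter(i % period for i in range(n))
    mi = 0.0
    for (ch, ph), c in joint.items():
        # p(ch, ph) * log2( p(ch, ph) / (p(ch) * p(ph)) )
        mi += (c / n) * math.log2(c * n / (symbols[ch] * phases[ph]))
    return mi
```

Scanning `period` over a range of values and looking for peaks in the result is one way such a measure could expose latent periodicity; a statistical significance test would still be needed for real data.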

arXiv.org e-Print Archive

Crossref

University of Groningen

Tomato protoplast DNA transformation: physical linkage and recombination of exogenous DNA sequences

Author: A Crossway
A Deshayes
B Reiss
BR Thomas
C-L Hsiao
CP Meredith
DT Stinchcomb
EA Shahin
FA Krens
G Scangos
HC Birnboim
HE Ruley
I Potrykus
J Messing
J Paszkowski
Jacques Hille
JD Rochaix
K Struhl
K Wernars
KC Reed
KR Folger
M Koornneef
M Perucho
Maarten Jongsma
Maarten Koornneef
MG Murray
MW Bevan
O Smithies
P Zabel
Pim Zabel
PW Rigby
R Hain
R Peerbolte
RA Anderson
RD Shillito
S Kato
SN Cohen
T Maniatis
TL Adams
Publication date: 01/01/1987

Tomato protoplasts have been transformed with plasmid DNAs containing a chimeric kanamycin resistance gene and putative tomato origins of replication. A calcium phosphate-DNA mediated transformation procedure was employed in combination with either polyethylene glycol or polyvinyl alcohol. There were no indications that the tomato DNA inserts conferred autonomous replication on the plasmids. Instead, Southern blot hybridization analysis of seven kanamycin resistant calli revealed the presence of at least one kanamycin resistance locus per transformant integrated in the tomato nuclear DNA. Generally one to three truncated plasmid copies were found integrated into the tomato nuclear DNA, often physically linked to each other. For one transformant we have been able to use the bacterial ampicillin resistance marker of the vector plasmid pUC9 to 'rescue' a recombinant plasmid from the tomato genome. Analysis of the foreign sequences included in the rescued plasmid showed that integration had occurred in a non-repetitive DNA region. Calf-thymus DNA, used as a carrier in the transformation procedure, was found to be covalently linked to plasmid DNA sequences in the genomic DNA of one transformant. A model is presented describing the fate of exogenously added DNA during the transformation of a plant cell. The results are discussed in reference to the possibility of isolating DNA sequences responsible for autonomous replication in tomato.

Crossref

University of Groningen

ARTS repository - University of Groningen

Wageningen University & Research Publications

University of Groningen Digital Archive

Investigations into the molecular effects of single nucleotide polymorphism

Author: Lohrer Horst D.
Tangen Uwe
Publication venue: 'S. Karger AG'
Publication date: 01/01/2000

Objectives: DNA sequences are very rich in short repeats and their pattern can be altered by point mutations. We wanted to investigate the effect of single nucleotide polymorphism (SNP) on the pattern of short DNA repeats and its biological consequences. Methods: Analysis of the pattern of short DNA repeats of the Thy-1 sequence with and without SNP. Searching for DNA-binding factors in any region of significance. Results: Comparing the pattern of short repeats in the Thy-1 gene sequences of Turkish patients with ataxia telangiectasia (AT) with the 'wild type' sequence from the DNA database, we identified a missing 8-bp repeat element due to an SNP in position 1271 (intron II) in AT-DNA sequences. Only the mutated sequence had the potential for the formation of a stem loop in DNA or pre-mRNA. In super-shift experiments we found that DNA oligomers covering the area of this SNP formed a complex with proteins amongst which we identified the proliferating cell nuclear antigen (PCNA) protein. Conclusion: SNPs have the potential to alter DNA or pre-mRNA conformation. Although no SNP-dependent formation of the DNA-protein complex was evident, future investigations could reveal differential molecular mechanisms of cellular regulation. Copyright (C) 2001 S. Karger AG, Basel

Fraunhofer-Publica

Open Access LMU ( Ludwig-Maximilians-Univ. München)

Google matrix analysis of DNA sequences

Author: Kandiah Vivek
Shepelyansky Dima L.
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 08/01/2013

For DNA sequences of various species we construct the Google matrix G of Markov transitions between nearby words composed of several letters. The statistical distribution of matrix elements of this matrix is shown to be described by a power law with the exponent being close to those of outgoing links in such scale-free networks as the World Wide Web (WWW). At the same time the sum of ingoing matrix elements is characterized by the exponent being significantly larger than those typical for WWW networks. This results in a slow algebraic decay of the PageRank probability determined by the distribution of ingoing elements. The spectrum of G is characterized by a large gap leading to a rapid relaxation process on the DNA sequence networks. We introduce the PageRank proximity correlator between different species which determines their statistical similarity from the viewpoint of Markov chains. The properties of other eigenstates of the Google matrix are also discussed. Our results establish scale-free features of DNA sequence networks showing their similarities and distinctions with the WWW and linguistic networks. Comment: latex, 11 fig
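A small sketch may help make the Google matrix construction concrete. It assumes the sequence is cut into consecutive non-overlapping words of m letters, that transitions are counted between successive words, and that the usual damping factor of 0.85 is used; none of these details are given in the abstract, and the function names are hypothetical.

```python
from itertools import product


def google_matrix(seq, m=1, alpha=0.85):
    """Build a column-stochastic Google matrix over all 4**m DNA words."""
    words = ["".join(p) for p in product("ACGT", repeat=m)]
    index = {w: i for i, w in enumerate(words)}
    n = len(words)
    # counts[j][i] holds the number of observed transitions word i -> word j.
    counts = [[0.0] * n for _ in range(n)]
    chunks = [seq[k:k + m] for k in range(0, len(seq) - m + 1, m)]
    for a, b in zip(chunks, chunks[1:]):
        counts[index[b]][index[a]] += 1
    G = [[0.0] * n for _ in range(n)]
    for i in range(n):
        col_sum = sum(counts[j][i] for j in range(n))
        for j in range(n):
            # Words never seen as a source get a uniform column.
            s = counts[j][i] / col_sum if col_sum else 1.0 / n
            G[j][i] = alpha * s + (1 - alpha) / n
    return words, G


def pagerank(G, iters=100):
    """Approximate the leading eigenvector of G by power iteration."""
    n = len(G)
    p = [1.0 / n] * n
    for _ in range(iters):
        p = [sum(G[j][i] * p[i] for i in range(n)) for j in range(n)]
    return p
```

The sketch accepts only the letters A, C, G and T, and the dense list-of-lists matrix is practical only for small m; a sparse representation would be needed for longer words.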

arXiv.org e-Print Archive

Crossref

Directory of Open Access Journals

PubMed Central

HAL-INSA Toulouse

HAL: Hyper Article en Ligne

The Francis Crick Institute