Encoding DNA sequences by integer chaos game representation
DNA sequences are fundamental for encoding genetic information. The genetic
information may not only be understood by symbolic sequences but also from the
hidden signals inside the sequences. The symbolic sequences need to be
transformed into numerical sequences so the hidden signals can be revealed by
signal processing techniques. All current transformation methods encode DNA
sequences into numerical values of the same length. These representations have
limitations in the applications of genomic signal compression, encryption, and
steganography. We propose an integer chaos game representation (iCGR) of DNA
sequences and a lossless encoding method for DNA sequences by the iCGR. In the iCGR
method, a DNA sequence is represented by the iterated function of the
nucleotides and their positions in the sequence. Then the DNA sequence can be
uniquely encoded and recovered using three integers from iCGR. One integer is
the sequence length and the other two integers represent the accumulated
distributions of nucleotides in the sequence. The integer encoding scheme can
compress a DNA sequence by 2 bits per nucleotide. The integer representation of
DNA sequences provides a prospective tool for sequence compression, encryption,
and steganography. The Python programs in this study are freely available to
the public at https://github.com/cyinbox/iCG
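
The three-integer encoding described in this abstract can be illustrated with a minimal sketch. The corner assignment in `CORNERS` and the function names are assumptions made for illustration; the abstract does not specify them, and the authors' published programs may differ. Each nucleotide at position i moves the point by 2^i toward its corner, so the sign of each final coordinate reveals the last nucleotide and the sequence can be unwound from the end.

```python
# Hypothetical corner assignment; the abstract does not state which one is used.
CORNERS = {"A": (1, 1), "T": (-1, 1), "C": (-1, -1), "G": (1, -1)}
BASES = {corner: base for base, corner in CORNERS.items()}


def encode(seq):
    """Return (length, x, y) for a DNA string using integer arithmetic only."""
    x = y = 0
    for i, base in enumerate(seq):
        cx, cy = CORNERS[base]
        x += cx * (1 << i)  # step of size 2**i toward the corner of this base
        y += cy * (1 << i)
    return len(seq), x, y


def decode(n, x, y):
    """Recover the DNA string from (length, x, y)."""
    bases = []
    for i in range(n - 1, -1, -1):
        # The largest step dominates all earlier ones, so the sign of each
        # coordinate identifies the corner of the base at position i.
        cx = 1 if x > 0 else -1
        cy = 1 if y > 0 else -1
        bases.append(BASES[(cx, cy)])
        x -= cx * (1 << i)
        y -= cy * (1 << i)
    return "".join(reversed(bases))
```

Because Python integers have arbitrary precision, the sketch works for long sequences without overflow; a practical encoder would likely split very long sequences into segments.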
Entropy concepts and DNA investigations
Topological and metric entropies of the DNA sequences from different
organisms were calculated. The results were compared with each other and with
those of corresponding artificial sequences. For all DNA sequences considered
there is a maximum of heterogeneity. It falls in the block length interval
[5,7].
The maximum distinction between natural and artificial sequences is shifted
1-3 positions to the right of the maximum of heterogeneity, for both metric and
topological entropy. This points to the specificity of real DNA sequences in
this interval. Comment: 10 pages 7 figures submitted to PL
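
As a rough illustration of the two quantities named in this abstract, the sketch below computes block entropies for a given block length. The normalization is an assumption (base-4 logarithm, divided by the block length, overlapping blocks), and the function name is hypothetical; the paper may define the entropies differently.

```python
import math
from collections import Counter


def block_entropies(seq, k):
    """Return (topological, metric) entropy per symbol for blocks of length k."""
    blocks = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(blocks)
    total = len(blocks)
    # Topological entropy: growth rate of the number of distinct blocks.
    topological = math.log(len(counts), 4) / k
    # Metric entropy: Shannon entropy of the observed block frequencies.
    metric = -sum(
        (c / total) * math.log(c / total, 4) for c in counts.values()
    ) / k
    return topological, metric
```

Evaluating this for block lengths around 5 to 7 on a natural sequence and on a shuffled copy is one way to look for the kind of difference the abstract reports.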
Construction of a novel phagemid to produce custom DNA origami scaffolds.
DNA origami, a method for constructing nanoscale objects, relies on a long single strand of DNA to act as the 'scaffold' to template assembly of numerous short DNA oligonucleotide 'staples'. The ability to generate custom scaffold sequences can greatly benefit DNA origami design processes. Custom scaffold sequences can provide better control of the overall size of the final object and better control of low-level structural details, such as locations of specific base pairs within an object. Filamentous bacteriophages and related phagemids can work well as sources of custom scaffold DNA. However, scaffolds derived from phages require inclusion of multi-kilobase DNA sequences in order to grow in host bacteria, and those sequences cannot be altered or removed. These fixed-sequence regions constrain the design possibilities of DNA origami. Here, we report the construction of a novel phagemid, pScaf, to produce scaffolds that have a custom sequence with a much smaller fixed region of 393 bases. We used pScaf to generate new scaffolds ranging in size from 1512 to 10 080 bases and demonstrated their use in various DNA origami shapes and assemblies. We anticipate our pScaf phagemid will enhance development of the DNA origami method and its future applications
Mapping the Space of Genomic Signatures
We propose a computational method to measure and visualize interrelationships
among any number of DNA sequences allowing, for example, the examination of
hundreds or thousands of complete mitochondrial genomes. An "image distance" is
computed for each pair of graphical representations of DNA sequences, and the
distances are visualized as a Molecular Distance Map: Each point on the map
represents a DNA sequence, and the spatial proximity between any two points
reflects the degree of structural similarity between the corresponding
sequences. The graphical representation of DNA sequences utilized, Chaos Game
Representation (CGR), is genome- and species-specific and can thus act as a
genomic signature. Consequently, Molecular Distance Maps could inform species
identification, taxonomic classifications and, to a certain extent,
evolutionary history. The image distance employed, Structural Dissimilarity
Index (DSSIM), implicitly compares the occurrences of oligomers of length up to
(herein ) in DNA sequences. We computed DSSIM distances for more than
5 million pairs of complete mitochondrial genomes, and used Multi-Dimensional
Scaling (MDS) to obtain Molecular Distance Maps that visually display the
sequence relatedness in various subsets, at different taxonomic levels. This
general-purpose method does not require DNA sequence homology and can thus be
used to compare similar or vastly different DNA sequences, genomic or
computer-generated, of the same or different lengths. We illustrate potential
uses of this approach by applying it to several taxonomic subsets: phylum
Vertebrata, (super)kingdom Protista, classes Amphibia-Insecta-Mammalia, class
Amphibia, and order Primates. This analysis of an extensive dataset confirms
that the oligomer composition of full mtDNA sequences can be a source of
taxonomic information. Comment: 14 pages, 7 figures. arXiv admin note: substantial text overlap with
arXiv:1307.375
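
The link between CGR images and oligomer occurrences can be sketched as follows. In a CGR image of resolution 2^k by 2^k, each pixel corresponds to one oligomer of length k, so the image is effectively a table of k-mer counts. The corner layout in `CORNER_BITS` and the function name are assumptions for illustration, and the DSSIM distance used in the paper is not implemented here.

```python
# Hypothetical corner layout as (x bit, y bit); other layouts are equally valid.
CORNER_BITS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def cgr_counts(seq, k):
    """Return a 2**k by 2**k grid of k-mer counts laid out as a CGR image."""
    size = 1 << k
    grid = [[0] * size for _ in range(size)]
    for i in range(len(seq) - k + 1):
        col = row = 0
        for j, base in enumerate(seq[i:i + k]):
            bx, by = CORNER_BITS[base]
            # The most recent nucleotide halves the square last, so it
            # supplies the most significant bit of the pixel index.
            col |= bx << j
            row |= by << j
        grid[row][col] += 1
    return grid
```

Integer bit operations are used instead of repeatedly halving floating-point coordinates, which avoids rounding problems on long sequences. An image distance would then be computed between two such grids.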
Information decomposition of symbolic sequences
We developed a non-parametric method of Information Decomposition (ID) of the
content of any symbolic sequence. The method is based on the calculation of
Shannon mutual information between the analyzed sequence and artificial symbolic
sequences, and allows latent periodicity to be revealed in any symbolic
sequence. We show the stability of the ID method in the case of a large number
of random letter changes in an analyzed symbolic sequence. We demonstrate the
possibilities of the method by analyzing poems as well as DNA and protein
sequences. Among DNA and protein sequences we show the existence of many
sequences with different types and lengths of latent periodicity.
The possible origin of latent periodicity for different symbolic sequences is
discussed. Comment: 18 pages, 8 figure
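
A minimal sketch of the idea, assuming the artificial sequence for a trial period L is simply the repeating position index 0, 1, ..., L-1: the mutual information between the symbols and that index is large when the sequence has a period of L and near zero otherwise. The function name is hypothetical, and the statistical significance testing a real analysis would need is omitted.

```python
import math
from collections import Counter


def periodicity_information(seq, period):
    """Mutual information (bits) between symbols and position modulo period."""
    n = len(seq)
    joint = Counter((i % period, s) for i, s in enumerate(seq))
    positions = Counter(i % period for i in range(n))
    symbols = Counter(seq)
    return sum(
        (c / n) * math.log2(c * n / (positions[p] * symbols[s]))
        for (p, s), c in joint.items()
    )
```

Scanning `period` over a range of values and plotting the result gives a spectrum in which peaks suggest candidate latent periods.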
Tomato protoplast DNA transformation: physical linkage and recombination of exogenous DNA sequences
Tomato protoplasts have been transformed with plasmid DNAs containing a chimeric kanamycin resistance gene and putative tomato origins of replication. A calcium phosphate-DNA mediated transformation procedure was employed in combination with either polyethylene glycol or polyvinyl alcohol. There were no indications that the tomato DNA inserts conferred autonomous replication on the plasmids. Instead, Southern blot hybridization analysis of seven kanamycin resistant calli revealed the presence of at least one kanamycin resistance locus per transformant integrated in the tomato nuclear DNA. Generally one to three truncated plasmid copies were found integrated into the tomato nuclear DNA, often physically linked to each other. For one transformant we have been able to use the bacterial ampicillin resistance marker of the vector plasmid pUC9 to 'rescue' a recombinant plasmid from the tomato genome. Analysis of the foreign sequences included in the rescued plasmid showed that integration had occurred in a non-repetitive DNA region. Calf-thymus DNA, used as a carrier in the transformation procedure, was found to be covalently linked to plasmid DNA sequences in the genomic DNA of one transformant. A model is presented describing the fate of exogenously added DNA during the transformation of a plant cell. The results are discussed in reference to the possibility of isolating DNA sequences responsible for autonomous replication in tomato.
Investigations into the molecular effects of single nucleotide polymorphism
Objectives: DNA sequences are very rich in short repeats and their pattern can be altered by point mutations. We wanted to investigate the effect of single nucleotide polymorphism (SNP) on the pattern of short DNA repeats and its biological consequences. Methods: Analysis of the pattern of short DNA repeats of the Thy-1 sequence with and without SNP. Searching for DNA-binding factors in any region of significance. Results: Comparing the pattern of short repeats in the Thy-1 gene sequences of Turkish patients with ataxia telangiectasia (AT) with the 'wild type' sequence from the DNA database, we identified a missing 8-bp repeat element due to an SNP in position 1271 (intron II) in AT-DNA sequences. Only the mutated sequence had the potential for the formation of a stem loop in DNA or pre-mRNA. In super-shift experiments we found that DNA oligomers covering the area of this SNP formed a complex with proteins amongst which we identified the proliferating cell nuclear antigen (PCNA) protein. Conclusion: SNPs have the potential to alter DNA or pre-mRNA conformation. Although no SNP-dependent formation of the DNA-protein complex was evident, future investigations could reveal differential molecular mechanisms of cellular regulation. Copyright (C) 2001 S. Karger AG, Basel
Google matrix analysis of DNA sequences
For DNA sequences of various species we construct the Google matrix G of
Markov transitions between nearby words composed of several letters. The
statistical distribution of matrix elements of this matrix is shown to be
described by a power law with the exponent being close to those of outgoing
links in such scale-free networks as the World Wide Web (WWW). At the same time
the sum of ingoing matrix elements is characterized by the exponent being
significantly larger than those typical for WWW networks. This results in a
slow algebraic decay of the PageRank probability determined by the distribution
of ingoing elements. The spectrum of G is characterized by a large gap leading
to a rapid relaxation process on the DNA sequence networks. We introduce the
PageRank proximity correlator between different species which determines their
statistical similarity from the view point of Markov chains. The properties of
other eigenstates of the Google matrix are also discussed. Our results
establish scale-free features of DNA sequence networks showing their
similarities and distinctions with the WWW and linguistic networks. Comment: latex, 11 fig
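
The construction described in this abstract can be sketched in plain Python. Several choices here are assumptions rather than details taken from the abstract: words are taken as consecutive non-overlapping blocks of m letters, the damping factor `alpha` defaults to the conventional 0.85, columns with no observed transitions are replaced by a uniform column, and the function names are hypothetical.

```python
from itertools import product


def google_matrix(seq, m, alpha=0.85):
    """Build a column-stochastic Google matrix over all 4**m words."""
    words = ["".join(p) for p in product("ACGT", repeat=m)]
    index = {w: i for i, w in enumerate(words)}
    n = len(words)
    counts = [[0] * n for _ in range(n)]
    # Assumption: transitions run between consecutive non-overlapping words.
    chunks = [seq[i:i + m] for i in range(0, len(seq) - m + 1, m)]
    for a, b in zip(chunks, chunks[1:]):
        if a in index and b in index:  # skip words with ambiguous letters
            counts[index[b]][index[a]] += 1  # column a, row b
    G = [[0.0] * n for _ in range(n)]
    for j in range(n):
        col_total = sum(counts[i][j] for i in range(n))
        for i in range(n):
            # Dangling columns (no outgoing transitions) become uniform.
            s = counts[i][j] / col_total if col_total else 1.0 / n
            G[i][j] = alpha * s + (1 - alpha) / n
    return words, G


def pagerank(G, iterations=100):
    """Approximate the PageRank vector of G by power iteration."""
    n = len(G)
    p = [1.0 / n] * n
    for _ in range(iterations):
        p = [sum(G[i][j] * p[j] for j in range(n)) for i in range(n)]
    return p
```

For realistic word lengths the matrix grows as 4^m by 4^m, so a sparse representation and a numerical library would be preferable to nested lists.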
