5 research outputs found

    Encoding DNA sequences by integer chaos game representation

    DNA sequences are fundamental for encoding genetic information. The genetic information may be understood not only from the symbolic sequences but also from the hidden signals inside them. The symbolic sequences need to be transformed into numerical sequences so that the hidden signals can be revealed by signal processing techniques. All current transformation methods encode DNA sequences into numerical values of the same length. These representations have limitations in applications such as genomic signal compression, encryption, and steganography. We propose an integer chaos game representation (iCGR) of DNA sequences and a lossless encoding method for DNA sequences based on the iCGR. In the iCGR method, a DNA sequence is represented by an iterated function of the nucleotides and their positions in the sequence. The DNA sequence can then be uniquely encoded and recovered using three integers from the iCGR. One integer is the sequence length, and the other two integers represent the accumulated distributions of nucleotides in the sequence. The integer encoding scheme can compress a DNA sequence by 2 bits per nucleotide. The integer representation of DNA sequences provides a prospective tool for sequence compression, encryption, and steganography. The Python programs in this study are freely available to the public at https://github.com/cyinbox/iCG
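
The three-integer idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the corner assignment in `CORNERS`, the recurrence (each nucleotide adds its corner scaled by a power of two of its position), and the function names `encode` and `decode` are assumptions chosen for illustration, since the abstract does not give these details.

```python
# Hypothetical corner assignment; the abstract does not specify one.
CORNERS = {"A": (1, 1), "T": (-1, 1), "C": (-1, -1), "G": (1, -1)}
NUCLEOTIDES = {corner: base for base, corner in CORNERS.items()}


def encode(seq):
    """Return (length, x, y) for a DNA string under the assumed recurrence."""
    x = y = 0
    for i, base in enumerate(seq):
        ax, ay = CORNERS[base]
        # Position i contributes the corner scaled by 2**i.
        x += ax * (1 << i)
        y += ay * (1 << i)
    return len(seq), x, y


def decode(n, x, y):
    """Recover the DNA string from (length, x, y)."""
    bases = []
    for i in range(n - 1, -1, -1):
        # The largest remaining term dominates the sum of all smaller ones,
        # so the sign of each coordinate identifies the corner at position i.
        ax = 1 if x > 0 else -1
        ay = 1 if y > 0 else -1
        bases.append(NUCLEOTIDES[(ax, ay)])
        x -= ax * (1 << i)
        y -= ay * (1 << i)
    return "".join(reversed(bases))


if __name__ == "__main__":
    triple = encode("ACGT")
    print(triple)
    print(decode(*triple))
```

Under these assumptions each coordinate has magnitude below 2^n for a sequence of length n, so the two coordinates together take roughly 2 bits per nucleotide, which is consistent with the figure quoted in the abstract.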

    On the DNA of eleven mammals

    This paper studies the DNA code of eleven mammals from the perspective of fractional dynamics. Applying the Fourier transform and power-law trendlines leads to a categorical representation of species and chromosomes. The DNA information reveals long-range memory characteristics.

    DNA Sequence Representation by Use of Statistical Finite Automata

    This project defines and intends to solve the problem of representing the information carried by DNA sequences in terms of amino acids, through application of the theory of finite automata. Sequences can be compared against each other to find existing patterns, if any, which may include important genetic information. Such a comparison can indicate whether the DNA sequences belong to the same, related, or entirely different species in the ‘Tree of Life’ (phylogeny). This is achieved by using extended and statistical finite automata. To solve this problem, the concepts of automata and their extension, namely the Alergia algorithm, have been used. In this specific case, we have used a chemical property, the polarity of amino acids, to analyze the DNA sequences.

    The use of numerical representations in processing of nucleotide sequences

    Converting DNA sequences into a suitable representation is an important task before analysis and further processing begin. The main goal of this work was to become familiar with the types of numerical and graphical representations and their use in DNA analysis. Given the large number of methods and procedures, only some were selected for this work. Some methods cannot be classified strictly as numerical or graphical, because they allow both kinds of representation; these methods were classified as graphical representations. Phylogenetic trees were constructed for the selected methods to compare their accuracy. The work concludes with an evaluation of the results obtained.

    Human Promoter Prediction Using DNA Numerical Representation

    With the emergence of genomic signal processing, numerical representation techniques for the DNA alphabet {A, G, C, T} play a key role in applying digital signal processing and machine learning techniques to the processing and analysis of DNA sequences. The choice of numerical representation affects how well the biological properties are reflected in the numerical domain for the detection and identification of the characteristics of special regions of interest within the DNA sequence. This dissertation presents a comprehensive study of various DNA numerical and graphical representation methods and their applications in processing and analyzing long DNA sequences. Discussions of the relative merits and demerits of the various methods, experimental results, and possible future developments are also included. Another focus of the research is promoter prediction in human (Homo sapiens) DNA sequences with a neural-network-based multi-classifier system using DNA numerical representation methods. In spite of the recent development of several computational methods for human promoter prediction, there is a need for performance improvement. In particular, the high false positive rate of the feature-based approaches decreases the prediction reliability and leads to erroneous results in gene annotation. To improve the prediction accuracy and reliability, DigiPromPred, a numerical-representation-based promoter prediction system, is proposed to characterize DNA alphabets in different regions of a DNA sequence. The DigiPromPred system is found to predict promoters with a sensitivity of 90.8% while reducing the false prediction rate for non-promoter sequences, with a specificity of 90.4%. The comparative study with state-of-the-art promoter prediction systems for human chromosome 22 shows that the proposed system maintains a good balance between prediction accuracy and reliability.
    To reduce the system architecture and computational complexity compared to the existing system, a simple feed-forward neural network classifier known as SDigiPromPred is proposed. The SDigiPromPred system is found to predict promoters with sensitivities of 87%, 87%, and 99% while reducing the false prediction rate for non-promoter sequences, with specificities of 92%, 94%, and 99% for human, Drosophila, and Arabidopsis sequences, respectively, with reconfigurable capability compared to the existing system.
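
As a concrete illustration of what a numerical representation of the alphabet {A, G, C, T} can look like, the sketch below builds four binary indicator sequences, one per nucleotide. This is only one commonly used mapping, picked here as an assumption; the abstract does not say which representations DigiPromPred or SDigiPromPred use, and the function name `indicator_sequences` is hypothetical.

```python
def indicator_sequences(seq, alphabet="AGCT"):
    """Map a DNA string to one 0/1 sequence per nucleotide.

    Each output list has the same length as the input, with a 1 wherever
    that nucleotide occurs and a 0 elsewhere.
    """
    seq = seq.upper()
    return {base: [1 if s == base else 0 for s in seq] for base in alphabet}


if __name__ == "__main__":
    for base, signal in indicator_sequences("GATTACA").items():
        print(base, signal)
```

Each indicator sequence is an ordinary numeric signal, so it could be passed to standard signal processing or machine learning tools of the kind the abstract refers to.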