4 research outputs found

    A Novel Signal Processing Measure to Identify Exact and Inexact Tandem Repeat Patterns in DNA Sequences

    The identification and analysis of repetitive patterns are active areas of biological and computational research. Tandem repeats in telomeres play a role in cancer, and hypervariable trinucleotide tandem repeats are linked to over a dozen major neurodegenerative genetic disorders. In this paper, we present an algorithm to identify exact and inexact repeat patterns in DNA sequences based on the orthogonal exactly periodic subspace decomposition technique. Using a new measure, our algorithm resolves problems such as whether a repeat pattern has period P or one of its multiples (2P, 3P, etc.), along with several other problems present in previous signal-processing-based algorithms. We present an efficient O(N·Lw·log Lw) algorithm for identifying repeats, where N is the length of the DNA sequence and Lw is the window length. The algorithm operates in two stages. In the first stage, each nucleotide is analyzed separately for periodicity; in the second stage, the periodicity information of the individual nucleotides is combined to identify the tandem repeats. Datasets containing exact and inexact repeats were used in the experiments. The experimental results show the effectiveness of the approach.
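    The two-stage idea can be illustrated with a minimal Python sketch. It is not the paper's orthogonal exactly periodic subspace measure. As a simplified stand-in (an assumption), it projects each nucleotide's binary indicator sequence onto the plain subspace of period-P sequences (stage 1) and pools the non-constant energy captured across A, C, G and T (stage 2). It also analyzes a single window instead of sliding one. All function names and the threshold value are hypothetical. Every P-periodic sequence is also 2P-periodic, so this naive score never decreases from P to its multiples. That is exactly the P-versus-2P ambiguity the paper's measure addresses; the sketch sidesteps it only with a crude rule that keeps the smallest qualifying period.

```python
from typing import List, Optional

NUCLEOTIDES = "ACGT"


def indicator(seq: str, base: str) -> List[float]:
    """Stage-1 input: 1.0 where `seq` has `base`, else 0.0."""
    return [1.0 if ch == base else 0.0 for ch in seq]


def projection_energy(x: List[float], period: int) -> float:
    """Energy of the orthogonal projection of `x` onto the subspace of
    same-length sequences that repeat with period `period`.

    The residue classes (index mod period) form an orthogonal basis, so the
    projection replaces each sample by its class mean and the energy equals
    sum(class_sum**2 / class_count)."""
    sums = [0.0] * period
    counts = [0] * period
    for i, v in enumerate(x):
        sums[i % period] += v
        counts[i % period] += 1
    return sum(s * s / c for s, c in zip(sums, counts) if c > 0)


def periodicity_score(seq: str, period: int) -> float:
    """Stage 2 (simplified): fraction of the non-constant energy, pooled over
    A, C, G and T, that lies in the period-`period` subspace."""
    num = den = 0.0
    for base in NUCLEOTIDES:
        x = indicator(seq, base)
        dc = projection_energy(x, 1)  # energy of the constant (mean) part
        num += projection_energy(x, period) - dc
        den += sum(v * v for v in x) - dc
    return num / den if den > 0 else 0.0


def detect_period(seq: str, max_period: int,
                  threshold: float = 0.9) -> Optional[int]:
    """Hypothetical decision rule: smallest period in 2..max_period whose
    pooled score reaches `threshold`, or None if no period qualifies."""
    for p in range(2, max_period + 1):
        if periodicity_score(seq, p) >= threshold:
            return p
    return None
```

    For example, `detect_period("ACG" * 10, max_period=6)` returns 3, and `detect_period("ACGT" * 8, max_period=10)` returns 4. Keep candidate periods well below the window length. A period close to the window length leaves too few samples per residue class and scores near 1 for almost any sequence.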

    Hierarchical structure of cascade of primary and secondary periodicities in Fourier power spectrum of alphoid higher order repeats

    Abstract

    Background: Identification of approximate tandem repeats is an important task of broad significance and remains a challenging problem in computational genomics. Often there is no single best approach to periodicity detection, and a combination of different methods may improve prediction accuracy. The discrete Fourier transform (DFT) has been used extensively to study primary periodicities in DNA sequences. Here we investigate the application of the DFT method to identify and study alphoid higher order repeats.

    Results: We used a DFT-based method, with a mapping of the symbolic sequence into a numerical sequence, to identify and study alphoid higher order repeats (HORs). For HORs the power spectrum shows an equidistant frequency pattern with a characteristic two-level hierarchical organization, which serves as the signature of a HOR. Our case study was the 16mer HOR tandem in AC017075.8 from human chromosome 7. A very long array of equidistant peaks at multiple frequencies (more than a thousand higher harmonics) is based on the fundamental frequency of the 16mer HOR. A pronounced subset of equidistant peaks is based on multiples of the fundamental HOR frequency (multiplication factor n for an nmer) and their higher harmonics. In general, an nmer HOR pattern contains equidistant secondary periodicity peaks, with a pronounced subset of equidistant primary periodicity peaks. This hierarchical pattern, used as a signature for HOR detection, is robust with respect to monomer insertions and deletions, random sequence insertions, etc. For a monomeric alphoid sequence only primary periodicity peaks are present. The 1/f^β noise and the period-3 pattern are missing from power spectra in alphoid regions, in accordance with expectations.

    Conclusion: DFT provides a robust detection method for higher order periodicity. An easily recognizable HOR power spectrum is characterized by a hierarchical two-level equidistant pattern: higher harmonics of the fundamental HOR frequency (secondary periodicity) and a subset of pronounced peaks corresponding to the constituent monomers (primary periodicity). The number of lower-frequency peaks (secondary periodicity) below the frequency of the first primary periodicity peak reveals the size of the nmer HOR, i.e., the number n of monomers contained in the consensus HOR.
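    The two-level spectral signature can be reproduced on a toy sequence with the Python sketch below. The abstract does not specify the symbolic-to-numerical mapping, so the sketch assumes binary indicator sequences for A, C, G and T with their power spectra summed. It uses a direct O(N²) DFT to stay self-contained, where a real analysis would use an FFT. The toy "monomers" are 4 bp, not the roughly 171 bp alphoid monomers, and all function names are hypothetical. Consider an HOR of R copies of an n-monomer unit of length L, so N = R·L. Its power is confined to multiples of bin N/L = R (secondary periodicity), and the multiples of N/m, with m the monomer length, mark the primary periodicity. A purely monomeric array has power only at multiples of N/m.

```python
import cmath
from typing import List

NUCLEOTIDES = "ACGT"


def build_hor(monomers: List[str], copies: int) -> str:
    """Concatenate monomer variants into one HOR unit and repeat it."""
    return "".join(monomers) * copies


def power_spectrum(seq: str) -> List[float]:
    """Summed power spectrum of the four binary indicator sequences
    (assumed mapping), computed with a direct O(N^2) DFT."""
    n = len(seq)
    spectrum = [0.0] * n
    for base in NUCLEOTIDES:
        x = [1.0 if ch == base else 0.0 for ch in seq]
        for k in range(n):
            coeff = sum(v * cmath.exp(-2j * cmath.pi * k * t / n)
                        for t, v in enumerate(x))
            spectrum[k] += abs(coeff) ** 2
    return spectrum


def nonzero_bins(spectrum: List[float], tol: float = 1e-9) -> List[int]:
    """Frequency bins 1..N/2 whose power exceeds `tol` (DC bin skipped)."""
    half = len(spectrum) // 2
    return [k for k in range(1, half + 1) if spectrum[k] > tol]


if __name__ == "__main__":
    hor = build_hor(["ACGT", "ACGA"], copies=3)  # 2mer HOR: L = 8, N = 24
    mono = build_hor(["ACGT"], copies=6)         # monomeric: m = 4, N = 24
    print("HOR bins:", nonzero_bins(power_spectrum(hor)))
    print("monomeric bins:", nonzero_bins(power_spectrum(mono)))
```

    In this toy case the HOR has non-zero power only at multiples of bin 3, and bin 3 is a secondary peak because the two monomer variants differ. The monomeric array has non-zero power only at multiples of bin 6 (N/m = 24/4).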

    Detecting short adjacent repeats in multiple sequences: a Bayesian approach.

    Li, Qiwei. Thesis (M.Phil.), Chinese University of Hong Kong, 2010. Includes bibliographical references (p. 75-85). Abstracts in English and Chinese.

    Contents:
    Abstract
    Acknowledgement
    Chapter 1. Introduction
      1.1 Repetitive DNA Sequence
        1.1.1 Definition and Categorization of Repetitive DNA Sequence
        1.1.2 Definition and Categorization of Tandem Repeats
        1.1.3 Definition and Categorization of Interspersed Repeats
      1.2 Research Significance
      1.3 Contributions
      1.4 Thesis Organization
    Chapter 2. Literature Review and Overview of Our Method
      2.1 Existing Methods
      2.2 Overview of Our Method
    Chapter 3. Theoretical Background
      3.1 Multinomial Distributions
      3.2 Dirichlet Distribution
      3.3 Metropolis-Hastings Sampling
      3.4 Gibbs Sampling
    Chapter 4. Problem Description
      4.1 Generative Model
        4.1.1 Input Data R
        4.1.2 Parameters A (Repeat Segment Starting Positions)
        4.1.3 Parameters S (Repeat Segment Structures)
        4.1.4 Parameters θ (Motif Matrix)
        4.1.5 Parameters Φ (Background Distribution)
        4.1.6 An Example of the Model Schematic Diagram
      4.2 Parameter Structure
      4.3 Posterior Distribution
        4.3.1 The Full Posterior Distribution
        4.3.2 The Collapsed Posterior Distribution
      4.4 Conclusion
    Chapter 5. Methodology
      5.1 Schematic Procedure
        5.1.1 The Basic Schematic Procedure
        5.1.2 The Improved Schematic Procedure
      5.2 Initialization
      5.3 Predictive Update Step for θₙ and Φₙ
      5.4 Gibbs Sampling Step for aₙ
      5.5 Metropolis-Hastings Sampling Step for sₙ
        5.5.1 Rear Indel Move
        5.5.2 Partial Shift Move
        5.5.3 Front Indel Move
      5.6 Phase Shifts
      5.7 Conclusion
    Chapter 6. Results and Discussion
      6.1 Settings
      6.2 Experiment on Synthetic Data
      6.3 Experiment on Real Data
    Chapter 7. Conclusion and Future Work
      7.1 Conclusion
      7.2 Future Work
    Bibliography