
Hypercomplex cross-correlation of DNA sequences

Author: Li Yajing
Shu Jian-Jun
Publication venue: 'World Scientific Pub Co Pte Lt'
Publication date: 28/02/2014

A hypercomplex representation of DNA is proposed to facilitate comparing DNA sequences with fuzzy composition. With the hypercomplex number representation, conventional sequence analysis methods, such as dot matrix analysis, dynamic programming, and cross-correlation, have been extended and improved to align DNA sequences with fuzzy composition. The hypercomplex dot matrix analysis can provide more control over the degree of alignment desired. A new scoring system has been proposed to accommodate the hypercomplex number representation of DNA and integrated with the dynamic programming alignment method. By using hypercomplex cross-correlation, the match and mismatch alignment information between two aligned DNA sequences is stored separately in the real part and the imaginary parts of the result, respectively. The mismatch alignment information is very useful for refining consensus-sequence-based motif scanning.
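The abstract does not spell out the encoding, so the following is only a minimal Python sketch under an assumed representation, not the authors' method: each base is mapped to one of the unit quaternions 1, i, j, k (the `QUAT` table and the function names are hypothetical). Multiplying a target base by the conjugate of a probe base gives exactly 1 for a match and a pure-imaginary unit for a mismatch, so summing over a window puts the match count in the real part and the mismatch information in the imaginary parts, mirroring the separation the abstract describes.

```python
# Hypothetical encoding for illustration: A, C, G, T -> unit quaternions 1, i, j, k,
# stored as (w, x, y, z). This is NOT claimed to be the representation used by Li and Shu.
QUAT = {
    "A": (1.0, 0.0, 0.0, 0.0),
    "C": (0.0, 1.0, 0.0, 0.0),
    "G": (0.0, 0.0, 1.0, 0.0),
    "T": (0.0, 0.0, 0.0, 1.0),
}


def qmul(p, q):
    """Hamilton product of two quaternions given as (w, x, y, z)."""
    w1, x1, y1, z1 = p
    w2, x2, y2, z2 = q
    return (
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
        w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
        w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2,
    )


def qconj(q):
    """Quaternion conjugate: negate the three imaginary parts."""
    w, x, y, z = q
    return (w, -x, -y, -z)


def hypercomplex_xcorr(target, probe):
    """Direct (non-FFT) cross-correlation of probe against target.

    For each lag returns (matches, i_mismatches, j_mismatches, k_mismatches):
    the real part of sum_n Q(target[n + lag]) * conj(Q(probe[n])) plus the summed
    absolute value of each imaginary component.
    """
    results = []
    for lag in range(len(target) - len(probe) + 1):
        real = 0.0
        imag = [0.0, 0.0, 0.0]
        for n, base in enumerate(probe):
            w, x, y, z = qmul(QUAT[target[n + lag]], qconj(QUAT[base]))
            real += w  # 1 for a match, 0 for any mismatch
            imag[0] += abs(x)
            imag[1] += abs(y)
            imag[2] += abs(z)
        results.append((real, imag[0], imag[1], imag[2]))
    return results


print(hypercomplex_xcorr("GGACGTAA", "ACGT")[2])  # exact match at lag 2 -> (4.0, 0.0, 0.0, 0.0)
```

In this particular encoding the j axis collects transition mismatches (A/G, C/T), while the i and k axes collect transversions (A/C and G/T on i; A/T and C/G on k). Absolute values are accumulated per axis because the signed products can cancel: A against C gives -i, while C against A gives +i.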

arXiv.org e-Print Archive

Hierarchical structure of cascade of primary and secondary periodicities in Fourier power spectrum of alphoid higher order repeats

Author: Ivan Basar
Marija Rosandić
Matko Glunčić
Nenad Pavin
Nils Paar
Vladimir Paar
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: Identification of approximate tandem repeats is an important task of broad significance and remains a challenging problem in computational genomics. Often there is no single best approach to periodicity detection, and a combination of different methods may improve prediction accuracy. The discrete Fourier transform (DFT) has been used extensively to study primary periodicities in DNA sequences. Here we investigate the application of the DFT method to identify and study alphoid higher order repeats.

Results: We used a method based on the DFT, with a mapping of the symbolic sequence into a numerical sequence, to identify and study alphoid higher order repeats (HORs). For HORs the power spectrum shows an equidistant frequency pattern, with a characteristic two-level hierarchical organization as the signature of a HOR. Our case study was the 16-mer HOR tandem in AC017075.8 from human chromosome 7. A very long array of equidistant peaks at multiple frequencies (more than a thousand higher harmonics) is based on the fundamental frequency of the 16-mer HOR. A pronounced subset of equidistant peaks is based on multiples of the fundamental HOR frequency (multiplication factor n for an n-mer) and their higher harmonics. In general, an n-mer HOR pattern contains equidistant secondary periodicity peaks, with a pronounced subset of equidistant primary periodicity peaks. This hierarchical pattern, as a signature for HOR detection, is robust with respect to monomer insertions and deletions, random sequence insertions, etc. For a monomeric alphoid sequence only primary periodicity peaks are present. The 1/f^β noise and the period-three pattern are absent from the power spectra of alphoid regions, in accordance with expectations.

Conclusion: The DFT provides a robust detection method for higher order periodicity. An easily recognizable HOR power spectrum is characterized by a hierarchical two-level equidistant pattern: higher harmonics of the fundamental HOR frequency (secondary periodicity) and a subset of pronounced peaks corresponding to constituent monomers (primary periodicity). The number of lower-frequency peaks (secondary periodicity) below the frequency of the first primary periodicity peak reveals the size of the n-mer HOR, i.e., the number n of monomers contained in the consensus HOR.
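To see why an n-mer HOR produces the two-level pattern described above, one can compute the power spectrum of a small synthetic HOR array. This is a hedged Python sketch, not the authors' pipeline: it assumes the Voss indicator mapping (four 0/1 sequences, one per base, whose squared DFT magnitudes are summed), uses a naive O(N^2) DFT, and builds a hypothetical 10-bp monomer with three variants forming a 3-mer HOR repeated 8 times. Because the array is exactly periodic with the HOR length (30 bp), power can appear only at multiples of N/30 = 8 (secondary periodicity), while the near-identity of the monomers makes the peak at the monomer frequency N/10 = 24 (primary periodicity) much stronger than the HOR harmonics at k = 8 and 16.

```python
import cmath


def voss_power_spectrum(seq):
    """Power spectrum S(k) = sum over bases b of |U_b(k)|^2, where U_b is the DFT of the
    0/1 indicator sequence of base b (Voss mapping, an assumed choice). Naive O(N^2) DFT."""
    n = len(seq)
    power = [0.0] * n
    for base in "ACGT":
        positions = [j for j, ch in enumerate(seq) if ch == base]
        for k in range(n):
            x = sum(cmath.exp(-2j * cmath.pi * k * j / n) for j in positions)
            power[k] += abs(x) ** 2
    return power


# Hypothetical toy data, not from the paper: three variants of a 10-bp monomer form a
# 3-mer HOR unit (30 bp), tandemly repeated 8 times, so N = 240.
monomer = "ACGTTGCAAG"
variants = [monomer, monomer[:3] + "A" + monomer[4:], monomer[:6] + "T" + monomer[7:]]
seq = "".join(variants) * 8
spectrum = voss_power_spectrum(seq)

# Secondary periodicity: harmonics of the HOR fundamental at k = 240/30 = 8, 16, ...
# Primary periodicity: the monomer frequency at k = 240/10 = 24.
for k in (8, 16, 24):
    print(k, round(spectrum[k], 1))
```

In this toy case two secondary peaks (k = 8 and k = 16) sit below the first primary peak at k = 24, for a 3-mer HOR. A real alphoid array is only approximately periodic, so the zeros between harmonics become low background rather than exact zeros.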

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

MPG.PuRe

Fast Fourier Transform-Based Correlation Of Dna-Sequences Using Complex-Plane Encoding

Author: Cheever Erik Allen
Overton G. C.
Searls D. B.
Publication venue: 'Transformative Works and Cultures'
Publication date: 01/04/1991

The detection of similarities between DNA sequences can be accomplished using the signal-processing technique of cross-correlation. An early method used the fast Fourier transform (FFT) to perform correlations on DNA sequences of any length in O(n log n) time. However, this method requires many FFTs (nine), runs no faster if one sequence is much shorter than the other, and measures only global similarity, so that significant short local matches may be missed. We report that, through the use of alternative encodings of the DNA sequence in the complex plane, the number of FFTs performed can be traded off against (i) signal-to-noise ratio and (ii) a certain degree of filtering for local similarity via k-tuple correlation. Also, when comparing probe sequences against much longer targets, the algorithm can be sped up by decomposing the target and performing multiple small FFTs in an overlap-save arrangement. Finally, by decomposing the probe sequence as well, the detection of local similarities can be further enhanced. With current advances in extremely fast hardware implementations of signal-processing operations, this approach may prove more practical than heretofore.
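The two ideas above, one complex number per base and overlap-save blocking of a long target, can be combined in a short sketch. This is a minimal Python illustration under assumptions: the encoding A = 1, T = -1, C = i, G = -i is one illustrative complex-plane encoding, not necessarily one evaluated by Cheever et al., and `fft` and `correlate_overlap_save` are hypothetical helpers. With a single complex value per base the probe needs one FFT, and each target block needs one forward and one inverse FFT; the price is signal-to-noise, since the real part at each lag equals the number of matches minus the number of complementary mismatches (A/T, C/G), while all other mismatches score 0.

```python
import cmath

# Hypothetical complex-plane encoding (an assumption for illustration only):
# complementary bases sit at opposite points of the unit circle.
ENCODING = {"A": 1, "T": -1, "C": 1j, "G": -1j}


def fft(x, invert=False):
    """Recursive radix-2 FFT; len(x) must be a power of two (inverse is unscaled)."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2], invert)
    odd = fft(x[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out


def correlate_overlap_save(target, probe, block=16):
    """Re(sum_n t[n + lag] * conj(p[n])) for every lag 0..len(target)-len(probe),
    computed with fixed-size FFTs over overlapping target blocks (overlap-save)."""
    m = len(probe)
    if block & (block - 1) or block < m:
        raise ValueError("block must be a power of two >= len(probe)")
    P = fft([ENCODING[b] for b in probe] + [0] * (block - m))
    step = block - m + 1  # lags per block that are free of circular wrap-around
    n_lags = len(target) - m + 1
    out = []
    for start in range(0, n_lags, step):
        seg = [ENCODING[b] for b in target[start:start + block]]
        seg += [0] * (block - len(seg))  # zero-pad the final, shorter block
        S = fft(seg)
        r = fft([s * p.conjugate() for s, p in zip(S, P)], invert=True)
        for lag in range(min(step, n_lags - start)):
            out.append(r[lag].real / block)  # scale the inverse FFT by 1/block
    return out
```

For example, `correlate_overlap_save("TTACGTTT", "ACGT", block=8)` peaks at lag 2 with a value of 4 (up to floating-point rounding), where the probe matches the target exactly. Keeping `block` small bounds the FFT size by the probe length rather than the target length, which is the point of the overlap-save decomposition.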

Works

Recommended from our members

Algorithms for string matching with applications in molecular biology

Author: Holloway James Lee
Publication venue: 'Oregon State University'

As the volume of genetic sequence data increases due to improved sequencing techniques and increased interest, the computational tools available to analyze the data are becoming inadequate. This thesis seeks to improve a few of the computational methods available to access and analyze data in the genetic sequence databases. The first two results are parallel algorithms based on previously known sequential algorithms. The third result is a new approach to approximating an NP-complete problem, based on assumptions that we believe make sense in the biological context of the problem. The final result is a fundamentally new approach to approximate string matching that uses the divide-and-conquer paradigm instead of the dynamic programming approach used almost exclusively in the past. Dynamic programming algorithms to measure the distance between sequences have been known since at least 1972. Recently there has been interest in developing parallel algorithms to measure the distance between two sequences. We have developed an optimal parallel algorithm to find the edit distance, a metric frequently used to measure the distance between two sequences. It is often interesting to find the substrings of length k that appear most frequently in a given string. We give a simple sequential algorithm to solve this problem and an efficient parallel version of it; the parallel algorithm uses an efficient, novel parallel bucket sort. When a large segment of DNA is sequenced, the original DNA sequence is reconstructed from the sequenced fragments of many copies of the original DNA, and these fragments may or may not contain errors. New algorithms are given to solve the problem of reconstructing the original DNA sequence with and without errors introduced into the fragments. A program based on this algorithm is used to reconstruct the human beta globin region (HUMHBB) from a set of 300- to 500-mers drawn randomly from the HUMHBB region.
Approximate string matching is used in a biological context to model the steps of evolution. While such evolution may proceed base by base using the change, insert, or delete operators, there is also evidence that whole genes may be moved or inverted. We introduce a new problem, the string-to-string rearrangement problem, that allows movement and inversion of substrings. We give a divide-and-conquer algorithm for finding a rearrangement of one string within another.
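The thesis's parallel algorithms are not described in enough detail here to reproduce, but the two sequential problems the abstract refers to can be illustrated. The following Python is a sketch only: `edit_distance` is the textbook unit-cost dynamic programming recurrence over the change, insert, and delete operators, kept to two rows of memory, and `most_frequent_kmers` counts every length-k substring with a hash table. Neither is claimed to be the thesis's own algorithm, and the parallel bucket sort is not shown.

```python
from collections import Counter


def edit_distance(s, t):
    """Unit-cost edit (Levenshtein) distance via dynamic programming, O(len(s)*len(t)) time."""
    prev = list(range(len(t) + 1))  # distances from the empty prefix of s
    for i, a in enumerate(s, start=1):
        cur = [i] + [0] * len(t)
        for j, b in enumerate(t, start=1):
            cur[j] = min(
                prev[j] + 1,             # delete a
                cur[j - 1] + 1,          # insert b
                prev[j - 1] + (a != b),  # change a to b (free if equal)
            )
        prev = cur
    return prev[-1]


def most_frequent_kmers(s, k):
    """Return (sorted list of most frequent length-k substrings, their count)."""
    counts = Counter(s[i:i + k] for i in range(len(s) - k + 1))
    if not counts:
        return [], 0
    top = max(counts.values())
    return sorted(kmer for kmer, c in counts.items() if c == top), top


print(edit_distance("kitten", "sitting"))  # 3
print(most_frequent_kmers("ACGTTGCATGTCGCATGATGCATGAGAGCT", 4))  # (['CATG', 'GCAT'], 3)
```

The hash-table count runs in time proportional to len(s) times k for substring hashing; a sort-based variant (for example, sorting all k-mers and scanning for the longest run) is the kind of approach a bucket sort would parallelize.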

ScholarsArchive@OSU