Localizing triplet periodicity in DNA and cDNA sequences

Authors: AA Tsonis, AWC Liew, D Anastassiou, DL Black, G Gutierrez, I Daubechies, J Epps, J Sanchez, J Tuqan, JK Pickrell, JP Mena-Chalco, K Okamura, Lincoln D Stein, Liya Wang, M Stanke, M Yan, R Lewis, S Tiwari, TP George, WG Fairbrother, WJ Kent, YT Chan
Publication venue: BioMed Central
Publication date: 01/01/2010

Abstract

Background: The protein-coding regions (coding exons) of a DNA sequence exhibit a triplet periodicity (TP) because coding exons consist of a series of three-nucleotide codons that encode specific amino acid residues. Such periodicity is usually not observed in introns and intergenic regions. If a DNA sequence is divided into small segments and a Fourier Transform is applied to each segment, a strong peak at frequency 1/3 is typically observed in the Fourier spectrum of coding segments, but not in that of non-coding segments. This property has been used to identify the locations of protein-coding genes in unannotated sequence. The method is fast and requires no training. However, the need to compute the Fourier Transform across a segment (window) of arbitrary size limits the accuracy with which TP boundaries can be localized. Here, we report a technique that provides higher-resolution identification of these boundaries, and use the technique to explore the biological correlates of TP regions in the genome of the model organism C. elegans.

Results: Using both simulated TP signals and the real C. elegans sequence F56F11 as an example, we demonstrate that (1) the Modified Wavelet Transform (MWT) can define the boundary of a TP region better than the conventional Short Time Fourier Transform (STFT); (2) the scale parameter (a) of the MWT determines the precision of TP boundary localization: bigger values of a give sharper TP boundaries but result in a lower signal-to-noise ratio; (3) RNA splicing sites have weaker TP signals than coding regions; (4) TP signals in coding regions can be destroyed or recovered by frame-shift mutations; (5) 6 bp periodicities in introns and intergenic regions can generate false positive signals, which can be removed with a 6 bp MWT.

Conclusions: MWT can provide more precise TP boundaries than STFT, and the boundaries can be further refined by a bigger-scale MWT. Subtraction of 6 bp periodicity signals reduces the number of false positives. Experimentally introduced frame-shift mutations help recover TP signals that have been lost through possible ancient frame-shifts. More importantly, the TP signal has the potential to be used to detect the splice junctions in fully spliced mRNA sequence.
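
The conventional windowed Fourier approach described in the Background can be sketched roughly as follows. This is a minimal illustration, not the paper's MWT method: the function names `tp_power` and `sliding_tp`, the window and step sizes, the use of four base-indicator sequences, and the toy sequence generators are all assumptions made for the example.

```python
import cmath
import random


def tp_power(segment):
    """Summed DFT power at frequency 1/3 over the four base-indicator
    sequences of `segment`, divided by the segment length."""
    n = len(segment)
    total = 0.0
    for base in "ACGT":
        # DFT coefficient of the 0/1 indicator sequence for this base at f = 1/3.
        coeff = sum(
            cmath.exp(-2j * cmath.pi * i / 3)
            for i, b in enumerate(segment)
            if b == base
        )
        total += abs(coeff) ** 2
    return total / n


def sliding_tp(seq, window=120, step=30):
    """Return (window start, TP power) pairs for overlapping windows."""
    return [
        (start, tp_power(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


def toy_coding(n_codons):
    """Hypothetical 'coding' sequence: base usage depends on codon position."""
    return "".join(
        random.choice("AG") + random.choice("ACGT") + random.choice("CT")
        for _ in range(n_codons)
    )


def toy_noncoding(n):
    """Hypothetical 'non-coding' sequence: uniformly random bases."""
    return "".join(random.choice("ACGT") for _ in range(n))


random.seed(0)
seq = toy_noncoding(300) + toy_coding(100) + toy_noncoding(300)
profile = sliding_tp(seq)
for start, power in profile:
    print(start, round(power, 2))
```

Windows that fall inside the toy coding stretch should show a clearly higher value than the flanking random sequence. Because each value is computed over a whole window, the transition at the boundaries is smeared across roughly one window length, which is the localization limit the abstract refers to.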

Crossref

Cold Spring Harbor Laboratory Institutional Repository

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Identification of coding regions using DNA spectrogram analysis

Author: Spíchalová Barbora
Publication venue: Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií
Publication date: 01/01/2013

This bachelor's thesis deals with the identification of coding regions using DNA spectrogram analysis. The theoretical part describes numerical representations of genomic data, options for preprocessing DNA sequences, and the characteristics of methods for locating coding regions. The most widely used method for processing DNA is the discrete Fourier transform, which makes it possible to search a sequence for the regions of interest. The thesis then presents the theoretical procedure for constructing a spectrogram and lists the patterns that can be detected in it. This theoretical knowledge is applied in a practical implementation of specific methods in the MATLAB environment. We created a program that detects coding regions in the spectrogram and finds their exact positions in the sequence. The results are compared with the NCBI database at the end of the work.
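
The abstract does not say how the program turns spectrogram values into sequence positions. One simple possibility, shown here purely as an assumption, is to threshold a per-position score track and report the runs that stay above the threshold. The function name `regions_above`, the threshold, and the minimum length are hypothetical, and the sketch is in Python rather than the MATLAB used in the thesis.

```python
def regions_above(scores, threshold, min_len=1):
    """Return half-open (start, end) index ranges in which every score
    is >= threshold and the run is at least `min_len` positions long."""
    regions = []
    start = None
    for i, s in enumerate(scores):
        if s >= threshold and start is None:
            start = i  # a run begins here
        elif s < threshold and start is not None:
            if i - start >= min_len:
                regions.append((start, i))
            start = None  # the run has ended
    # Close a run that extends to the end of the track.
    if start is not None and len(scores) - start >= min_len:
        regions.append((start, len(scores)))
    return regions


# Made-up score track, for illustration only.
scores = [0.1, 0.2, 3.0, 4.1, 3.8, 0.3, 0.2, 2.9, 0.1, 3.5, 3.6]
print(regions_above(scores, threshold=2.5, min_len=2))  # [(2, 5), (9, 11)]
```

The `min_len` filter discards isolated spikes, such as the single high value at index 7 above, which would otherwise be reported as very short candidate regions.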

Digital library of Brno University of Technology

National Repository of Grey Literature

Fourier transformation and spectrogram analysis of DNA sequences

Author: Krejčí Michal
Publication venue: Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií
Publication date: 01/01/2011

The theoretical part of this master's thesis describes methods of preparing DNA sequences for frequency analysis and the basic properties of DNA. Colour spectrograms created with the short-time Fourier transform help to recognize some characteristic patterns in DNA sequences. The practical part describes the program developed to generate the spectrograms and analyse them. The last part presents an analysis of selected sections of the C. elegans genome. The patterns found are compared with data from public databases such as NCBI. The thesis points out the relationship between the spectrograms and protein-coding regions, and interprets various patterns in terms of their biological origin, including chromosome structure. It also shows spectrograms of other easily recognized patterns formed by tandem repeats composed of satellites, microsatellites and minisatellites.
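
As a rough illustration of how a short-time Fourier transform exposes tandem repeats, the sketch below computes one spectrum per window and reads off the dominant repeat period. It is not the thesis program: the function names, the window and step sizes, the summing of power over four base-indicator sequences, and the toy repeat are assumptions, and the colour mapping of the spectrograms is omitted.

```python
import cmath


def window_spectrum(segment):
    """Power at DFT bins k = 1 .. n/2, summed over the four
    base-indicator sequences of `segment`."""
    n = len(segment)
    spectrum = []
    for k in range(1, n // 2 + 1):
        power = 0.0
        for base in "ACGT":
            coeff = sum(
                cmath.exp(-2j * cmath.pi * k * i / n)
                for i, b in enumerate(segment)
                if b == base
            )
            power += abs(coeff) ** 2
        spectrum.append(power)
    return spectrum


def spectrogram(seq, window=60, step=20):
    """One spectrum (spectrogram column) per overlapping window."""
    return [
        window_spectrum(seq[s:s + window])
        for s in range(0, len(seq) - window + 1, step)
    ]


def dominant_period(spectrum, window):
    """Period in bases of the strongest frequency bin."""
    k = max(range(len(spectrum)), key=spectrum.__getitem__) + 1
    return window / k


# Toy tandem repeat with a 5-base unit (hypothetical example sequence).
repeat = "AACGT" * 24
columns = spectrogram(repeat)
print([dominant_period(col, 60) for col in columns])
```

For this toy repeat every window reports a dominant period of 5 bases. In a real spectrogram, such a repeat appears as a horizontal band at the corresponding frequency, while protein-coding regions tend to show a band at frequency 1/3.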

Digital library of Brno University of Technology

National Repository of Grey Literature