Hierarchical structure of cascade of primary and secondary periodicities in Fourier power spectrum of alphoid higher order repeats
Background: Identification of approximate tandem repeats is an important task of broad significance and remains a challenging problem in computational genomics. Often there is no single best approach to periodicity detection, and a combination of different methods may improve prediction accuracy. The discrete Fourier transform (DFT) has been used extensively to study primary periodicities in DNA sequences. Here we investigate the application of the DFT method to identify and study alphoid higher order repeats.
Results: We used a method based on the DFT, with mapping of the symbolic sequence into a numerical sequence, to identify and study alphoid higher order repeats (HORs). For HORs the power spectrum shows an equidistant frequency pattern, with a characteristic two-level hierarchical organization as the signature of a HOR. Our case study was the 16 mer HOR tandem in AC017075.8 from human chromosome 7. A very long array of equidistant peaks at multiple frequencies (more than a thousand higher harmonics) is based on the fundamental frequency of the 16 mer HOR. A pronounced subset of equidistant peaks is based on multiples of the fundamental HOR frequency (multiplication factor n for an nmer) and higher harmonics. In general, an nmer HOR pattern contains equidistant secondary periodicity peaks, with a pronounced subset of equidistant primary periodicity peaks. This hierarchical pattern, as a signature for HOR detection, is robust with respect to monomer insertions and deletions, random sequence insertions, etc. For a monomeric alphoid sequence only primary periodicity peaks are present. The 1/f^β noise and the periodicity-three pattern are missing from power spectra in alphoid regions, in accordance with expectations.
Conclusion: DFT provides a robust detection method for higher order periodicity. The easily recognizable HOR power spectrum is characterized by a hierarchical two-level equidistant pattern: higher harmonics of the fundamental HOR frequency (secondary periodicity) and a subset of pronounced peaks corresponding to constituent monomers (primary periodicity). The number of lower frequency peaks (secondary periodicity) below the frequency of the first primary periodicity peak reveals the size of the nmer HOR, i.e., the number n of monomers contained in the consensus HOR.
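The two-level peak pattern described in this abstract can be illustrated with a minimal Python sketch. It makes several assumptions that do not come from the paper: the symbolic-to-numerical mapping is taken to be four binary indicator sequences (one per base), the function names `power_spectrum` and `peak_indices` are hypothetical, and the input is an idealized toy 2mer HOR (two diverged 4 bp "monomers", `ACGT` and `ACGA`, repeated 50 times) rather than real alphoid DNA.

```python
import numpy as np

def power_spectrum(seq: str) -> np.ndarray:
    """Total DFT power summed over four base-indicator sequences.

    Assumption: indicator mapping (1 where the base occurs, else 0);
    the abstract does not say which mapping the authors used.
    """
    seq = seq.upper()
    total = np.zeros(len(seq) // 2 + 1)
    for base in "ACGT":
        indicator = np.array([c == base for c in seq], dtype=float)
        indicator -= indicator.mean()          # remove the zero-frequency term
        total += np.abs(np.fft.rfft(indicator)) ** 2
    return total

def peak_indices(spectrum: np.ndarray, rel_threshold: float = 1e-6) -> list:
    """Frequency indices whose power exceeds a fraction of the maximum."""
    return [int(k) for k in np.flatnonzero(spectrum > rel_threshold * spectrum.max())]

# Toy 2mer HOR: 8 bp repeat unit built from two diverged 4 bp monomers.
toy_hor = "ACGTACGA" * 50                      # 400 bp in total
spectrum = power_spectrum(toy_hor)
peaks = peak_indices(spectrum)
periods = [len(toy_hor) / k for k in peaks]    # period in bp for each peak
print(peaks, periods)
```

In this toy case the peaks are equidistant in frequency (multiples of the 8 bp HOR frequency), and the peaks at the 4 bp monomer frequency and its harmonic carry more power than the HOR-only peaks, which mirrors the secondary and primary periodicity hierarchy in a very simplified form.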
Computational analysis of nucleosome positioning datasets
Chromatin is a complex of DNA and histone proteins that constitutes the elemental
material of eukaryotic chromosomes. The basic repeating sub-unit of chromatin, the
nucleosome core particle, is comprised of approximately 146 base pairs (bp) of DNA
wrapped around an octamer of core histones. Core particles are joined together by
variable lengths of linker DNA to form chains of nucleosomes that are folded into
higher-order structures. The specific distribution of nucleosomes along the DNA
fibre is known to influence this folding process. Furthermore, on a local level, the
positioning of nucleosomes can control access to DNA sequence motifs, and thus
plays a fundamental role in regulating gene expression. Despite considerable
experimental effort, neither the folding process nor the mechanisms for gene
regulation are currently well understood.
Monomer extension (ME) is an established in vitro experimental technique which
maps the positions adopted by reconstituted core histone octamers on a defined DNA
sequence. It provides quantitative positioning information, at high resolution, over
long continuous stretches of DNA sequence. This technique has been employed to
map several genes: globin genes (8 kbp), the beta-lactoglobulin gene (10 kbp) and
various imprinting genes (4 kbp).
This study explores and analyses this unique dataset, utilising computational and
stochastic techniques, to gain insight into the potential influence of nucleosomal
positioning on the structure and function of chromatin. The first section of this thesis
expands upon prior analyses, explores general features of the dataset using common
bioinformatics tools, and attempts to relate the quantitative positioning information
from ME to data from other commonly used competitive reconstitution protocols.
Finally, evidence of a correlation between the in vitro ME dataset and in vivo
nucleosome positions for the beta-lactoglobulin gene region is presented.
The second section presents the development of a novel method for the analysis of
ME maps using Monte Carlo simulation methods. The goal was to use the ME
datasets to simulate a higher order chromatin fibre, taking advantage of the long-range and quantitative nature of the ME datasets.
The Monte Carlo simulations have allowed new insights to be gleaned from the
datasets. Analysis of the beta-lactoglobulin positioning map indicates the potential
for discrete disruption of nucleosomal organisation, at specific physiological
nucleosome densities, over regions found to have unusual chromatin structure in
vivo. This suggests a correspondence between the quantitative histone octamer
positioning information in vitro and the positioning of nucleosomes in vivo.
Further, the simulations demonstrate that histone density-dependent changes in
nucleosomal organisation, in both the beta-lactoglobulin and globin positioning
maps, often occur in regions involved in gene regulation. This implies that irregular
chromatin structures may form over certain biologically significant regions.
Taken together, these studies lend weight to the hypothesis that nucleosome
positioning information encoded within DNA plays a fundamental role in directing
chromatin structure in vivo
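The abstract does not describe how its Monte Carlo simulations work, so the following Python sketch is only a generic illustration of the idea of populating a DNA sequence with nucleosomes according to a quantitative positioning map. It is not the thesis's method. It assumes weighted random sequential placement with steric exclusion, a 146 bp footprint (the figure given in the abstract), and a hypothetical `affinity` array holding one relative positioning weight per candidate start position.

```python
import numpy as np

FOOTPRINT = 146  # bp of DNA per core particle, as stated in the abstract

def place_nucleosomes(affinity, n_target, rng):
    """Weighted random sequential placement without overlaps.

    affinity : relative weight for a nucleosome starting at each bp
               (hypothetical stand-in for an ME positioning map)
    n_target : number of nucleosomes to attempt to place (sets the density)
    rng      : numpy random Generator
    Returns the sorted start positions that were placed.
    """
    weights = np.asarray(affinity, dtype=float).copy()
    # Starts too close to the end cannot hold a full footprint.
    first_invalid = max(0, len(weights) - FOOTPRINT + 1)
    weights[first_invalid:] = 0.0
    starts = []
    while len(starts) < n_target and weights.sum() > 0:
        start = int(rng.choice(len(weights), p=weights / weights.sum()))
        starts.append(start)
        # Block every start position that would overlap this nucleosome.
        lo = max(0, start - FOOTPRINT + 1)
        weights[lo:start + FOOTPRINT] = 0.0
    return sorted(starts)

rng = np.random.default_rng(0)
uniform_map = np.ones(2000)                    # toy map with no sequence preference
configuration = place_nucleosomes(uniform_map, 8, rng)
gaps = np.diff(configuration)                  # start-to-start distances
```

Repeating the placement many times at different values of `n_target` would give, for each density, a distribution of configurations whose regularity could then be compared between regions of the map.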
The Bioinformatics Tools for Discovery of Genetic Diversity by Means of Elastic Net and Hurst Exponent
The genome era has allowed us to evaluate different aspects of genetic variation with precision, providing valuable guidance for improving knowledge and, ultimately, human life. To scrutinize these valuable resources, some bioinformatics tools permit deep exploration of the data. Among them, we show the importance of the discrete non-decimated wavelet transform (NDWT). Wavelets are well suited to capturing hidden components of biological data and provide an efficient link between biological systems and the mathematical objects used to describe them. Decomposing signals or sequences at different levels of resolution reveals distinct characteristics at each level. Wavelet-based analysis is increasingly used in the study of genomes. One of the great advantages of this method is its computational efficiency; that is, the analyses are processed almost in real time. It is applicable in several areas of science, such as physics, mathematics, engineering, and genetics, among others. In this context, we believe that using R software to apply the NDWT, coupled with elastic net domains and the Hurst exponent, will provide valuable guidance to genetics researchers investigating genetic variability
Mapping Equivalence for Symbolic Sequences: Theory and Applications
Processing of symbolic sequences represented by mapping of symbolic data into
numerical signals is commonly used in various applications. It is a
particularly popular approach in genomic and proteomic sequence analysis.
Numerous mappings of symbolic sequences have been proposed for various
applications. It is unclear however whether the processing of symbolic data
provides an artifact of the numerical mapping or is an inherent property of the
symbolic data. This issue has been long ignored in the engineering and
scientific literature. It is possible that many of the results obtained in
symbolic signal processing could be a byproduct of the mapping and might not
shed any light on the underlying properties embedded in the data. Moreover, in
many applications, conflicting conclusions may arise due to the choice of the
mapping used for numerical representation of symbolic data. In this paper, we
present a novel framework for the analysis of the equivalence of the mappings
used for numerical representation of symbolic data. We present strong and weak
equivalence properties and rely on signal correlation to characterize
equivalent mappings. We derive theoretical results which establish conditions
for consistency among numerical mappings of symbolic data. Furthermore, we
introduce an abstract mapping model for symbolic sequences and extend the
notion of equivalence to an algebraic framework. Finally, we illustrate our
theoretical results by application to DNA sequence analysis
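The concern raised in this abstract, that a result may depend on the numerical mapping rather than on the symbolic data, can be made concrete with a small Python sketch. It is an illustration only and does not reproduce the paper's definitions of strong and weak equivalence. It assumes integer-valued mappings of the four bases, uses the normalized autocorrelation as the correlation measure, and the names `map_sequence` and `normalized_autocorrelation` are hypothetical.

```python
import numpy as np

def map_sequence(seq, mapping):
    """Replace each symbol by the number assigned to it in `mapping`."""
    return np.array([mapping[c] for c in seq], dtype=float)

def normalized_autocorrelation(signal, max_lag):
    """Mean-removed autocorrelation for lags 0..max_lag, scaled so lag 0 is 1."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[:len(x) - k], x[k:]) / denom
                     for k in range(max_lag + 1)])

seq = "ACGT" * 8

mapping_a = {"A": 0, "C": 1, "G": 2, "T": 3}
mapping_b = {s: 2 * v + 5 for s, v in mapping_a.items()}   # affine change of mapping_a
mapping_c = {"A": 0, "C": 3, "G": 1, "T": 2}                # same values, different assignment

ac_a = normalized_autocorrelation(map_sequence(seq, mapping_a), 4)
ac_b = normalized_autocorrelation(map_sequence(seq, mapping_b), 4)
ac_c = normalized_autocorrelation(map_sequence(seq, mapping_c), 4)

same_under_affine = bool(np.allclose(ac_a, ac_b))    # rescaling and shifting change nothing
same_under_relabel = bool(np.allclose(ac_a, ac_c))   # reassigning values changes the result
```

Here the affine variant yields the same correlation structure as the original mapping, while merely reassigning the same four values to different bases does not, so a conclusion drawn from one of these mappings need not carry over to the other.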
DNA-encoded nucleosome occupancy is associated with transcription levels in the human malaria parasite Plasmodium falciparum.
Background: In eukaryotic organisms, packaging of DNA into nucleosomes controls gene expression by regulating access of the promoter to transcription factors. The human malaria parasite Plasmodium falciparum encodes relatively few transcription factors, while extensive nucleosome remodeling occurs during its replicative cycle in red blood cells. These observations point towards an important role of the nucleosome landscape in regulating gene expression. However, the relation between nucleosome positioning and transcriptional activity has thus far not been explored in detail in the parasite.
Results: Here, we analyzed nucleosome positioning in the asexual and sexual stages of the parasite's erythrocytic cycle using chromatin immunoprecipitation of MNase-digested chromatin, followed by next-generation sequencing. We observed a relatively open chromatin structure at the trophozoite and gametocyte stages, consistent with high levels of transcriptional activity in these stages. Nucleosome occupancy of genes and promoter regions was subsequently compared to steady-state mRNA expression levels. Transcript abundance showed a strong inverse correlation with nucleosome occupancy levels in promoter regions. In addition, AT-repeat sequences were strongly unfavorable for nucleosome binding in P. falciparum, and were overrepresented in promoters of highly expressed genes.
Conclusions: The connection between chromatin structure and gene expression in P. falciparum shares similarities with other eukaryotes. However, the remarkable nucleosome dynamics during the erythrocytic stages and the absence of a large variety of transcription factors may indicate that nucleosome binding and remodeling are critical regulators of transcript levels. Moreover, the strong dependency between chromatin structure and DNA sequence suggests that the P. falciparum genome may have been shaped by nucleosome binding preferences. Nucleosome remodeling mechanisms in this deadly parasite could thus provide potent novel anti-malarial targets