231 research outputs found

    Hierarchical structure of cascade of primary and secondary periodicities in Fourier power spectrum of alphoid higher order repeats

    Background: Identification of approximate tandem repeats is an important task of broad significance and remains a challenging problem in computational genomics. Often there is no single best approach to periodicity detection, and a combination of different methods may improve prediction accuracy. The discrete Fourier transform (DFT) has been used extensively to study primary periodicities in DNA sequences. Here we investigate the application of the DFT method to identify and study alphoid higher order repeats.
    Results: We used a method based on the DFT, with mapping of the symbolic sequence into a numerical one, to identify and study alphoid higher order repeats (HORs). For HORs the power spectrum shows an equidistant frequency pattern, with a characteristic two-level hierarchical organization as the signature of a HOR. Our case study was the 16-mer HOR tandem in AC017075.8 from human chromosome 7. A very long array of equidistant peaks at multiple frequencies (more than a thousand higher harmonics) is based on the fundamental frequency of the 16-mer HOR. A pronounced subset of equidistant peaks is based on multiples of the fundamental HOR frequency (multiplication factor n for an n-mer) and higher harmonics. In general, an n-mer HOR pattern contains equidistant secondary periodicity peaks, with a pronounced subset of equidistant primary periodicity peaks. This hierarchical pattern, as a signature for HOR detection, is robust with respect to monomer insertions and deletions, random sequence insertions, etc. For a monomeric alphoid sequence only primary periodicity peaks are present. The 1/f^β noise and the periodicity-three pattern are missing from power spectra in alphoid regions, in accordance with expectations.
    Conclusion: The DFT provides a robust detection method for higher order periodicity. The easily recognizable HOR power spectrum is characterized by a hierarchical two-level equidistant pattern: higher harmonics of the fundamental HOR frequency (secondary periodicity) and a subset of pronounced peaks corresponding to the constituent monomers (primary periodicity). The number of lower frequency peaks (secondary periodicity) below the frequency of the first primary periodicity peak reveals the size of the n-mer HOR, i.e., the number n of monomers contained in the consensus HOR.
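
    The two-level peak pattern described in this abstract can be illustrated on a toy sequence. The sketch below is not the authors' pipeline: the abstract does not say which symbolic-to-numerical mapping was used, so the four binary indicator sequences (one per base) are an assumption, and the 20-base monomer, the two point mutations, and the helper names are all hypothetical.

```python
import numpy as np

def indicator_power_spectrum(seq):
    """Summed DFT power of the four binary indicator sequences (A, C, G, T).

    The indicator mapping is an assumption; the abstract does not name one.
    """
    total = np.zeros(len(seq) // 2 + 1)
    for base in "ACGT":
        u = np.array([1.0 if ch == base else 0.0 for ch in seq])
        u -= u.mean()  # drop the zero-frequency component
        total += np.abs(np.fft.rfft(u)) ** 2
    return total

def peak_indices(power, rel_threshold=1e-9):
    """Frequency indices whose power exceeds a fraction of the maximum."""
    return np.nonzero(power > rel_threshold * power.max())[0]

def mutate(s, pos, base):
    """Return s with the character at pos replaced by base."""
    return s[:pos] + base + s[pos + 1:]

# Hypothetical toy data: a 3-mer HOR built from three slightly diverged
# copies of a 20-base monomer, repeated 10 times in tandem (600 bases).
monomer = "ACGTTGCAAGGCTTAACCGT"
hor = monomer + mutate(monomer, 3, "A") + mutate(monomer, 11, "G")
seq = hor * 10

power = indicator_power_spectrum(seq)
peaks = peak_indices(power)

hor_step = len(seq) // len(hor)          # spacing of secondary peaks
monomer_step = len(seq) // len(monomer)  # spacing of primary peaks
primary = [int(k) for k in peaks if k % monomer_step == 0]
secondary = [int(k) for k in peaks if k % monomer_step != 0]

print("HOR peak spacing:", hor_step, "monomer peak spacing:", monomer_step)
print("primary peak indices:", primary)
print("secondary peak indices:", secondary)
```

    Because the toy sequence is exactly periodic, power appears only at multiples of the HOR spacing, and the ratio of monomer spacing to HOR spacing equals the number of monomers in the HOR by construction. Real alphoid arrays are only approximately periodic, so peaks there would be broadened rather than exact.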

    Computational analysis of nucleosome positioning datasets

    Chromatin is a complex of DNA and histone proteins that constitutes the elemental material of eukaryotic chromosomes. The basic repeating sub-unit of chromatin, the nucleosome core particle, is comprised of approximately 146 base pairs (bp) of DNA wrapped around an octamer of core histones. Core particles are joined together by variable lengths of linker DNA to form chains of nucleosomes that are folded into higher-order structures. The specific distribution of nucleosomes along the DNA fibre is known to influence this folding process. Furthermore, on a local level, the positioning of nucleosomes can control access to DNA sequence motifs, and thus plays a fundamental role in regulating gene expression. Despite considerable experimental effort, neither the folding process nor the mechanisms for gene regulation are currently well understood.
    Monomer extension (ME) is an established in vitro experimental technique which maps the positions adopted by reconstituted core histone octamers on a defined DNA sequence. It provides quantitative positioning information, at high resolution, over long continuous stretches of DNA sequence. This technique has been employed to map several genes: globin genes (8 kbp), the beta-lactoglobulin gene (10 kbp) and various imprinting genes (4 kbp).
    This study explores and analyses this unique dataset, utilising computational and stochastic techniques, to gain insight into the potential influence of nucleosomal positioning on the structure and function of chromatin. The first section of this thesis expands upon prior analyses, explores general features of the dataset using common bioinformatics tools, and attempts to relate the quantitative positioning information from ME to data from other commonly used competitive reconstitution protocols. Finally, evidence of a correlation between the in vitro ME dataset and in vivo nucleosome positions for the beta-lactoglobulin gene region is presented.
    The second section presents the development of a novel method for the analysis of ME maps using Monte Carlo simulation methods. The goal was to use the ME datasets to simulate a higher order chromatin fibre, taking advantage of the long-range and quantitative nature of the ME datasets.
    The Monte Carlo simulations have allowed new insights to be gleaned from the datasets. Analysis of the beta-lactoglobulin positioning map indicates the potential for discrete disruption of nucleosomal organisation, at specific physiological nucleosome densities, over regions found to have unusual chromatin structure in vivo. This suggests a correspondence between the quantitative histone octamer positioning information in vitro and the positioning of nucleosomes in vivo.
    Further, the simulations demonstrate that histone density-dependent changes in nucleosomal organisation, in both the beta-lactoglobulin and globin positioning maps, often occur in regions involved in gene regulation. This implies that irregular chromatin structures may form over certain biologically significant regions.
    Taken together, these studies lend weight to the hypothesis that nucleosome positioning information encoded within DNA plays a fundamental role in directing chromatin structure in vivo.

    The Bioinformatics Tools for Discovery of Genetic Diversity by Means of Elastic Net and Hurst Exponent

    The genome era has allowed us to evaluate different aspects of genetic variation precisely, providing valuable guidance for improving knowledge and, ultimately, human life. To make use of these resources, several bioinformatics tools permit a deep exploration of the data. Among them, we show the importance of the discrete non-decimated wavelet transform (NDWT). Wavelets are well suited to capturing hidden components of biological data and provide an efficient link between biological systems and the mathematical objects used to describe them. Decomposing signals or sequences at different levels of resolution yields distinct characteristics at each level. Wavelet-based analysis is increasingly used in the study of genomes. One of the great advantages of this method is its computational efficiency: the analyses are processed almost in real time. It is applicable in several areas of science, such as physics, mathematics, engineering, and genetics, among others. In this context, we believe that using R software and applying the NDWT, coupled with elastic net domains and the Hurst exponent, will be a valuable guideline for genetics researchers investigating genetic variability.

    Mapping Equivalence for Symbolic Sequences: Theory and Applications

    Processing of symbolic sequences represented by mapping of symbolic data into numerical signals is commonly used in various applications. It is a particularly popular approach in genomic and proteomic sequence analysis. Numerous mappings of symbolic sequences have been proposed for various applications. It is unclear, however, whether the results of processing symbolic data are an artifact of the numerical mapping or an inherent property of the symbolic data. This issue has long been ignored in the engineering and scientific literature. It is possible that many of the results obtained in symbolic signal processing are a byproduct of the mapping and do not shed any light on the underlying properties embedded in the data. Moreover, in many applications, conflicting conclusions may arise from the choice of mapping used for the numerical representation of symbolic data. In this paper, we present a novel framework for the analysis of the equivalence of the mappings used for numerical representation of symbolic data. We present strong and weak equivalence properties and rely on signal correlation to characterize equivalent mappings. We derive theoretical results which establish conditions for consistency among numerical mappings of symbolic data. Furthermore, we introduce an abstract mapping model for symbolic sequences and extend the notion of equivalence to an algebraic framework. Finally, we illustrate our theoretical results by application to DNA sequence analysis.
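
    The concern raised in this abstract, that a result may depend on the numerical mapping rather than on the sequence, can be seen in a small experiment. The sketch below does not implement the paper's strong or weak equivalence definitions, which the abstract does not spell out. It only compares normalized autocorrelation under three hypothetical mappings of one made-up sequence; the mappings, the sequence, and the function names are all assumptions for illustration.

```python
import numpy as np

def encode(seq, mapping):
    """Map each symbol to a number using the given dictionary."""
    return np.array([mapping[ch] for ch in seq], dtype=float)

def normalized_autocorr(x, lag):
    """Autocorrelation at a positive lag, mean-removed, scaled by signal energy."""
    x = x - x.mean()
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

# Hypothetical mappings (assumptions, not taken from the paper):
purine_pyrimidine = {"A": 1, "G": 1, "C": -1, "T": -1}
rescaled = {b: 3 * v + 2 for b, v in purine_pyrimidine.items()}  # affine copy
strong_weak = {"C": 1, "G": 1, "A": -1, "T": -1}

seq = "AGAGCTCTAGAGCTCT"  # made-up sequence

r_ry = normalized_autocorr(encode(seq, purine_pyrimidine), lag=1)
r_affine = normalized_autocorr(encode(seq, rescaled), lag=1)
r_sw = normalized_autocorr(encode(seq, strong_weak), lag=1)

print("purine/pyrimidine:", r_ry)
print("affine rescaling: ", r_affine)
print("strong/weak:      ", r_sw)
```

    The affine rescaling leaves the normalized autocorrelation unchanged, while the strong/weak grouping gives a different value for the same sequence, so a conclusion drawn from lag-1 correlation would depend on which grouping was chosen.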

    DNA-encoded nucleosome occupancy is associated with transcription levels in the human malaria parasite Plasmodium falciparum.

    Background: In eukaryotic organisms, packaging of DNA into nucleosomes controls gene expression by regulating access of the promoter to transcription factors. The human malaria parasite Plasmodium falciparum encodes relatively few transcription factors, while extensive nucleosome remodeling occurs during its replicative cycle in red blood cells. These observations point towards an important role of the nucleosome landscape in regulating gene expression. However, the relation between nucleosome positioning and transcriptional activity has thus far not been explored in detail in the parasite.
    Results: Here, we analyzed nucleosome positioning in the asexual and sexual stages of the parasite's erythrocytic cycle using chromatin immunoprecipitation of MNase-digested chromatin, followed by next-generation sequencing. We observed a relatively open chromatin structure at the trophozoite and gametocyte stages, consistent with high levels of transcriptional activity in these stages. Nucleosome occupancy of genes and promoter regions was subsequently compared to steady-state mRNA expression levels. Transcript abundance showed a strong inverse correlation with nucleosome occupancy levels in promoter regions. In addition, AT-repeat sequences were strongly unfavorable for nucleosome binding in P. falciparum, and were overrepresented in promoters of highly expressed genes.
    Conclusions: The connection between chromatin structure and gene expression in P. falciparum shares similarities with other eukaryotes. However, the remarkable nucleosome dynamics during the erythrocytic stages and the absence of a large variety of transcription factors may indicate that nucleosome binding and remodeling are critical regulators of transcript levels. Moreover, the strong dependency between chromatin structure and DNA sequence suggests that the P. falciparum genome may have been shaped by nucleosome binding preferences. Nucleosome remodeling mechanisms in this deadly parasite could thus provide potent novel anti-malarial targets.

    Order and Fluctuations in DNA Sequences

