
    Isochores Merit the Prefix 'Iso'

    The isochore concept in the human genome sequence was challenged in an analysis by the International Human Genome Sequencing Consortium (IHGSC). We argue here that a statement in the IHGSC analysis concerning the existence of isochores is incorrect, because it applied an inappropriate statistical test. Testing for the existence of isochores should be equivalent to testing the homogeneity of windowed GC%. The statistical test applied in the IHGSC analysis, the binomial test, is instead a test of whether a sequence is random at the base level. To test for the existence of isochores, or homogeneity in GC%, we propose a different statistical test: the analysis of variance (ANOVA). It can be shown that DNA sequences rejected by the binomial test may not be rejected by the ANOVA test. Comment: 14 pages (including 1 figure), submitted
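    The contrast between the two tests can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' procedure: the helper names are hypothetical, the binomial check is written as a chi-square-style dispersion statistic on window GC counts (expected variance n*p*(1-p) for n-bp windows, with p the overall GC fraction), and the ANOVA treats the GC% of smaller sub-windows as observations grouped by window, so the within-window variance is estimated from the data rather than assumed to be binomial.

```python
from statistics import mean


def gc_fraction(seq: str) -> float:
    """Fraction of G or C bases in seq (case-insensitive)."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def split(seq: str, size: int) -> list[str]:
    """Non-overlapping chunks of exactly `size` bases; a shorter tail is dropped."""
    return [seq[i:i + size] for i in range(0, len(seq) - size + 1, size)]


def binomial_dispersion(seq: str, window: int) -> float:
    """Hypothetical helper: squared deviation of each window's GC count from n*p,
    scaled by the binomial variance n*p*(1-p). Large values reject the hypothesis
    that bases are independent draws with a fixed GC probability.
    Undefined when p is exactly 0 or 1."""
    windows = split(seq, window)
    p = gc_fraction("".join(windows))
    variance = window * p * (1 - p)
    return sum((gc_fraction(w) * window - window * p) ** 2 / variance for w in windows)


def anova_f(seq: str, window: int, sub: int) -> float:
    """Hypothetical helper: one-way ANOVA F statistic with the GC% of `sub`-bp
    sub-windows as observations, grouped by `window`-bp windows.
    Assumes `window` is a multiple of `sub`."""
    groups = [[gc_fraction(s) for s in split(w, sub)] for w in split(seq, window)]
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = mean(x for g in groups for x in g)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


if __name__ == "__main__":
    seq = "GGGGGGAA" + "GGAAAAAA"  # toy sequence: two 8-bp windows
    print(binomial_dispersion(seq, 8))  # 4.0
    print(anova_f(seq, 8, 4))           # 2.0
```

    On this toy sequence the dispersion statistic (4.0, with k - 1 = 1 degree of freedom) exceeds the 5% chi-square critical value of about 3.84, whereas F = 2.0 is far below the 5% critical value of F(1, 2), about 18.5. The binomial-style check therefore rejects the sequence while the ANOVA does not, because the ANOVA measures the difference between windows against the fluctuation actually observed inside them.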

    A standalone version of IsoFinder for the computational prediction of isochores in genome sequences

    Isochores are long genome segments that are relatively homogeneous in G+C. Our group has developed a heuristic algorithm based on entropic segmentation, and a web server implementing all the required components is available. However, a researcher may want to batch-process many sequences on their local machine rather than analyze them one by one through the web. This requires standalone versions. We report here the implementation of two standalone programs that predict isochores at the sequence level: 1) a command-line version (IsoFinder) for Windows and Linux systems; and 2) a user-friendly version (IsoFinderWin) running under Windows. Comment: 7 pages, 3 figures
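    The core step of entropic segmentation, splitting a sequence where the compositions on either side differ most, can be sketched as follows. This is a generic Jensen-Shannon divergence segmentation on the binary GC/AT alphabet, offered as an illustration under assumptions. It is not the IsoFinder source code. The stopping rule, a fixed `min_gain` and `min_len` (both hypothetical parameters), stands in for whatever significance criteria IsoFinder actually applies.

```python
import math


def entropy(gc: int, n: int) -> float:
    """Shannon entropy (bits) of a two-symbol (GC vs AT) composition."""
    h = 0.0
    for count in (gc, n - gc):
        if count:
            p = count / n
            h -= p * math.log2(p)
    return h


def best_cut(seq: str) -> tuple[int, float]:
    """Cut point maximising the length-weighted Jensen-Shannon divergence
    between the GC composition of the left and right parts."""
    flags = [1 if b in "GCgc" else 0 for b in seq]
    n, total = len(flags), sum(flags)
    h_all = entropy(total, n)
    best_i, best_d, left = 0, 0.0, 0
    for i in range(1, n):
        left += flags[i - 1]  # GC count in seq[:i]
        d = (h_all
             - (i / n) * entropy(left, i)
             - ((n - i) / n) * entropy(total - left, n - i))
        if d > best_d:
            best_i, best_d = i, d
    return best_i, best_d


def segment(seq: str, min_len: int = 3, min_gain: float = 0.1,
            offset: int = 0) -> list[tuple[int, int]]:
    """Recursive binary segmentation; returns half-open (start, end) intervals.
    min_len and min_gain are illustrative thresholds, not IsoFinder settings."""
    if len(seq) < 2 * min_len:
        return [(offset, offset + len(seq))]
    cut, gain = best_cut(seq)
    if gain < min_gain or cut < min_len or len(seq) - cut < min_len:
        return [(offset, offset + len(seq))]
    return (segment(seq[:cut], min_len, min_gain, offset)
            + segment(seq[cut:], min_len, min_gain, offset + cut))


if __name__ == "__main__":
    print(segment("GGGGGGAAAAAA"))  # [(0, 6), (6, 12)]
```

    Recursion stops when no cut adds enough divergence, so a homogeneous stretch is returned as a single segment. A production tool would replace the fixed threshold with a statistical significance test and handle real genomic lengths far more efficiently.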

    How Not to Search for Isochores: A Reply to Cohen et al

    In a recent paper in these pages, Cohen et al. search for isochores in the human genome, based on a system of attributes that they assign to isochores. The putative isochores that they find and choose for presentation are almost all below 45% GC and cover only about 41% of the genome. Closer inspection reveals that the authors' methodology systematically loses GC-rich isochores because it does not anticipate the considerable fluctuations and corresponding long-range correlations that characterize mammalian DNA and that are highest in GC-rich DNA. Thus, they over-fragment GC-rich isochores (and also many GC-poor isochores) beyond recognition.
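    The point about fluctuations can be made concrete by comparing the observed spread of windowed GC% with the spread expected if bases were independent. The sketch below is an illustration under assumptions: the function names are hypothetical, windows are non-overlapping, and the population standard deviation is used. It is not the procedure of either Cohen et al. or this reply. For an uncorrelated sequence the ratio stays near 1 at every window size; a ratio that grows with window size is one signature of long-range correlation, because the observed spread then shrinks more slowly than 1/sqrt(w).

```python
import math
from statistics import pstdev


def windowed_gc(seq: str, w: int) -> list[float]:
    """GC fraction of consecutive non-overlapping w-bp windows (tail dropped)."""
    seq = seq.upper()
    return [sum(b in "GC" for b in seq[i:i + w]) / w
            for i in range(0, len(seq) - w + 1, w)]


def fluctuation_ratio(seq: str, w: int) -> float:
    """Observed SD of windowed GC divided by the binomial SD sqrt(p(1-p)/w)
    expected for independent bases with GC probability p.
    Undefined when p is exactly 0 or 1."""
    values = windowed_gc(seq, w)
    p = sum(values) / len(values)
    return pstdev(values) / math.sqrt(p * (1 - p) / w)


if __name__ == "__main__":
    toy = "GGGGAAAA" * 4  # strongly blocked toy sequence
    print(fluctuation_ratio(toy, 4))  # 2.0
```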

    Noncoding DNA, isochores and gene expression: nucleosome formation potential

    The nucleosome formation potential of introns, intergenic spacers and exons of human genes is shown here to correlate negatively with the breadth of gene expression across tissues. The nucleosome formation potential is also found to correlate negatively with the GC content of genomic sequences; the slope of the regression line is steeper in exons than in noncoding DNA (introns and intergenic spacers). The correlation with GC content is independent of sequence length; in turn, the nucleosome formation potential of introns and intergenic spacers correlates positively (albeit weakly) with sequence length independently of GC content. These findings help explain the functional significance of isochores (regions differing in GC content) in the human genome as a result of the optimization of genomic structure for epigenetic complexity, and support the notion that noncoding DNA is important for orderly chromatin condensation and chromatin-mediated suppression of tissue-specific genes.

    Universal spectrum for DNA base C+G frequency distribution in human chromosomes 1 to 24

    Power spectra of the human DNA base C+G frequency distribution in all available contiguous sections exhibit the universal inverse power law form of the statistical normal distribution for the 24 chromosomes. An inverse power law form for power spectra of space-time fluctuations is generic to dynamical systems in nature and indicates long-range space-time correlations. A recently developed general systems theory predicts the observed non-local connections as intrinsic to quantumlike chaos governing space-time fluctuations of dynamical systems. The model predicts the following: (1) the quasiperiodic Penrose tiling pattern for the nested coiled structure of the DNA molecule in the chromosome, resulting in maximum packing efficiency; (2) the DNA molecule functions as a unified whole fuzzy logic network with ordered two-way signal transmission between the coding and non-coding regions. Recent studies indicate an influence of non-coding regions on the functions of coding regions in the DNA molecule.

    Reproductive Isolation, in Individuals and During Evolution, as Result of Gross Genomic Rearrangement in Pigs, Birds and Dinosaurs

    Chromosomal (karyotypic) analysis in animals is performed for three primary reasons: to diagnose genetic disease, to map genes to their place in the genome, and to retrace evolutionary events by cross-species comparison. Technology for analysis has progressed from chromosome banding (cytogenetics) to fluorescence in situ hybridisation (FISH; molecular cytogenetics), through to microarrays and ultimately whole genome sequence analysis (cytogenomics or chromonomics). Indeed, the past 10-15 years have seen a revolution in whole genome sequencing, first with the human genome project, followed by those of key model and agricultural species and, more recently, ~60 de novo avian genome assemblies. Whole genome analysis provides detailed insight into the biology of chromosome rearrangements that occur both in individuals (for diagnostic purposes) and at an evolutionary level. It permits the study of gene mapping, trait linkage, phylogenomics, and gross genomic organisation and change. An essential prerequisite, however, is an unbroken length of contiguous DNA sequence along each chromosome. Most recent de novo genome assemblies fall short of this level of resolution, producing lengths of contiguous sequence that are sub-chromosomal in size (scaffolds). Chromosome rearrangements can affect reproductive capability at an individual level (causing reduced fertility) and at a population level, leading to reproductive isolation and subsequent speciation. The purpose of this thesis was to implement a step change in combining FISH technology with genome sequence data to provide greater insight into the nature of chromosomal rearrangement at individual and evolutionary levels. It therefore had four specific aims. The first was to isolate sub-telomeric sequences from the pig, cattle and chicken genome assemblies to develop a tool for the rapid screening of chromosome rearrangements. This tool is now routinely used for porcine translocation screening (and in the future for bovine screening), and its development revealed serious integrity errors in the pig genome. The second aim was to isolate evolutionarily conserved sequences from avian chromosomes to create a means of screening for macro- and microchromosomal rearrangements in birds. Results confirmed the hypothesis that microchromosomal rearrangements were rare in birds, except for previously known whole-chromosome fusions. The third was to use the above tools to complete scaffold-based genome assemblies in two key avian species, the peregrine falcon and the pigeon. Finally, bioinformatic tools were used to infer the overall genome structure of hypothetical saurian and avian ancestors. Retracing the evolutionary changes that occurred up to the emergence of birds allowed an assessment of chromosome evolution along the Saurischia-Maniraptora-Avialae lineage. Analysis of evolutionary breakpoint regions (EBRs) allowed testing of the hypothesis that the ontology of genes within EBRs corresponded to measurable phenotypic change in the lineage under investigation. An enrichment of genes associated with body height corresponded to rapid size change in the dinosaur lineage that led to modern birds. Taken together, these results paint a picture of a genome that, from about 260 million years ago, formed a highly successful 'signature' avian-dinosaur karyotype that has remained largely unchanged interchromosomally to the present day. These results provide significant insight into amniote genomic organisation, with the added benefit of developing tools that are widely applicable and transferable to commercial animal breeding, to constructing de novo genome assemblies, and to reconstructing, by inference, the overall genomic structure and evolution of extinct animals.
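    The thesis does not state which statistic was used to call the body-height enrichment among EBR genes. A one-sided hypergeometric test is a common choice for gene-ontology enrichment, and it is sketched here as an assumption. The parameter names are hypothetical, and no real gene counts are implied.

```python
from math import comb


def enrichment_p_value(population: int, annotated: int, sample: int, hits: int) -> float:
    """Upper-tail hypergeometric probability P(X >= hits): the chance of seeing at
    least `hits` genes with a given annotation among `sample` genes (e.g. genes
    in EBRs) drawn without replacement from `population` genes, of which
    `annotated` carry that annotation."""
    upper = min(annotated, sample)
    # Integer arithmetic keeps the tail sum exact until the final division.
    tail = sum(comb(annotated, i) * comb(population - annotated, sample - i)
               for i in range(hits, upper + 1))
    return tail / comb(population, sample)
```

    With many annotation terms tested at once, the resulting p-values would normally be corrected for multiple testing before any enrichment is reported.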

    Symbolic Signals and Applications in Genomics (Sinais simbólicos e aplicações em genómica)

    PhD thesis in Electrical Engineering (Doutoramento em Engenharia Electrotécnica). This dissertation addresses the processing of symbol sequences, with the specific aim of contributing to the analysis and modeling of DNA sequences. The work was partly motivated by the problem of automatic gene location. Another motivation was the compression of genetic sequences, both to reduce the required storage and to obtain good DNA models. The main methods of spectral analysis of symbolic sequences are compared, the validity of applying spectral analysis methods to symbolic sequences is discussed, and a new method based on the symbolic autocorrelation function is presented. One feature often used in gene identification is the magnitude of the Fourier coefficient that reflects period-three periodicity. A fast algorithm based on symbol counters was developed to compute several spectral coefficients, in particular the period-three coefficient. Some properties associated with the magnitude of certain spectral coefficients and with spectral redundancy are discussed. Finally, a technique based on a three-state model was developed to compress genetic sequences. In protein-coding regions this technique generally gives better results than state-of-the-art DNA compression techniques.
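    The symbol-counter idea for the period-three coefficient can be illustrated as follows. Because exp(-2*pi*i*n/3) takes only three distinct values, the DFT of a nucleotide indicator sequence at frequency 1/3 depends only on how often each symbol occurs at positions with n mod 3 equal to 0, 1 and 2. One counting pass therefore replaces the complex sum. This is a minimal sketch of that observation, not the dissertation's algorithm. The function names and the choice to sum |X_b|^2 over A, C, G and T are assumptions.

```python
import cmath
import math


def period3_power(seq: str) -> float:
    """Spectral content at frequency 1/3, summed over the four nucleotide
    indicator sequences, computed from per-phase symbol counts."""
    omega = cmath.exp(-2j * math.pi / 3)
    counts = {b: [0, 0, 0] for b in "ACGT"}
    for n, base in enumerate(seq.upper()):
        if base in counts:  # non-ACGT symbols are ignored
            counts[base][n % 3] += 1
    return sum(abs(c[0] + c[1] * omega + c[2] * omega ** 2) ** 2
               for c in counts.values())


def period3_power_direct(seq: str) -> float:
    """Reference: the same quantity evaluated term by term, for checking."""
    seq = seq.upper()
    return sum(abs(sum(cmath.exp(-2j * math.pi * n / 3)
                       for n, base in enumerate(seq) if base == b)) ** 2
               for b in "ACGT")


if __name__ == "__main__":
    print(round(period3_power("ATGATGATG"), 6))  # 27.0
```

    Each of A, T and G in the toy sequence sits in a single phase three times, giving |X_b|^2 = 9 for each and 27 in total. An aperiodic sequence gives a much smaller value relative to its length.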