22 research outputs found

    A standalone version of IsoFinder for the computational prediction of isochores in genome sequences

    Isochores are long genome segments that are relatively homogeneous in G+C content. A heuristic algorithm based on entropic segmentation has been developed by our group, and a web server implementing all the required components is available. However, a researcher may want to batch-process many sequences on a local machine instead of analyzing them one by one through the web. To this end, standalone versions are required. We report here the implementation of two standalone programs able to predict isochores at the sequence level: 1) a command-line version (IsoFinder) for Windows and Linux systems; and 2) a user-friendly version (IsoFinderWin) running under Windows. Comment: 7 pages, 3 figures

    An Unusual 500,000 Bases Long Oscillation of Guanine and Cytosine Content in Human Chromosome 21

    An oscillation with a period of around 500 kb in guanine and cytosine content (GC%) is observed in the DNA sequence of human chromosome 21. This oscillation is localized in the rightmost one-eighth of the chromosome, from 43.5 Mb to 46.5 Mb. Five cycles of oscillation are observed in this region, with six GC-rich peaks and five GC-poor valleys. The GC-poor valleys comprise regions with a low density of CpG islands and, alternating between the two DNA strands, regions of low gene density. Consequently, the long-range oscillation of GC% results in spacing patterns of both CpG island density and, to a lesser extent, gene density. Comment: 15 pages (figures included), 5 figures
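    A long-range GC% profile of the kind described in this abstract is usually obtained by computing GC content in windows along the sequence. The following is a minimal sketch of that idea, not the authors' procedure: the function name `gc_percent_windows` and the use of non-overlapping windows are assumptions made for illustration.

```python
def gc_percent_windows(seq, window):
    """Return GC% for consecutive non-overlapping windows of a DNA string.

    Hypothetical helper for illustration; a trailing partial window is dropped.
    """
    seq = seq.upper()
    values = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        values.append(100.0 * gc / window)
    return values


if __name__ == "__main__":
    # Toy sequence: one GC-only window followed by one AT-only window.
    print(gc_percent_windows("GGCCAATT", 4))
```

    On a real chromosome the window would be far larger than in this toy call (tens of kilobases is a plausible choice for resolving a 500 kb period), and the resulting series could then be inspected for periodicity.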

    Evolutionary segmentation of yeast genome

    Segmentation algorithms differ from clustering algorithms in how they treat the physical location of genes along the sequence: segments must preserve the original positions of consecutive genes, which is not a constraint for clustering algorithms. It has been proven that functional relations exist among neighbouring genes, so locating the boundaries between these functionally similar groups of genes has become an important challenge. In this paper, we present an evolutionary algorithm to segment the yeast genome.

    Identifying compositionally homogeneous and nonhomogeneous domains within the human genome using a novel segmentation algorithm

    It has been suggested that the mammalian genome is composed mainly of long compositionally homogeneous domains. Such domains are frequently identified using recursive segmentation algorithms based on the Jensen–Shannon divergence. However, a common difficulty with such methods is deciding when to halt the recursive partitioning and what criteria to use in deciding whether a detected boundary between two segments is real or not. We demonstrate that commonly used halting criteria are intrinsically biased, and propose IsoPlotter, a parameter-free segmentation algorithm that overcomes such biases by using a simple dynamic halting criterion and tests the homogeneity of the inferred domains. IsoPlotter was compared with an alternative segmentation algorithm, DJS, using two sets of simulated genomic sequences. Our results show that IsoPlotter was able to infer both long and short compositionally homogeneous domains with low GC content dispersion, whereas DJS failed to identify short compositionally homogeneous domains and sequences with low compositional dispersion. By segmenting the human genome with IsoPlotter, we found that one-third of the genome is composed of compositionally nonhomogeneous domains and the remainder is a mixture of many short compositionally homogeneous domains and relatively few long ones.
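    The core step of a Jensen–Shannon segmentation is to find the cut point that maximises the divergence between the compositions of the left and right parts. The sketch below shows only that single step, using the length-weighted form D = H(whole) - (i/n) H(left) - ((n-i)/n) H(right). It is not the IsoPlotter, DJS, or IsoFinder implementation; the names `shannon_entropy` and `best_js_split` are hypothetical, and no halting criterion (the issue the abstract is concerned with) is implemented.

```python
import math
from collections import Counter


def shannon_entropy(counts, total):
    """Shannon entropy in bits of a composition given by symbol counts."""
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total)
                for c in counts.values() if c > 0)


def best_js_split(seq):
    """Return (position, divergence) of the cut maximising the JS divergence.

    Illustrative only: a recursive segmenter would apply this to each part
    and stop according to some halting criterion, which is omitted here.
    """
    n = len(seq)
    total = Counter(seq)
    h_total = shannon_entropy(total, n)
    left = Counter()
    best_pos, best_div = None, 0.0
    for i in range(1, n):
        left[seq[i - 1]] += 1
        right = total - left  # Counter subtraction keeps positive counts only
        div = (h_total
               - (i / n) * shannon_entropy(left, i)
               - ((n - i) / n) * shannon_entropy(right, n - i))
        if div > best_div:
            best_pos, best_div = i, div
    return best_pos, best_div


if __name__ == "__main__":
    # Toy sequence: an AT-only half followed by a GC-only half.
    print(best_js_split("ATATATAT" + "GCGCGCGC"))
```

    A recursive segmenter would call such a function on each resulting part in turn; deciding when a candidate cut is significant enough to accept is exactly the halting problem discussed above.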

    Deciphering Heterogeneity in Pig Genome Assembly Sscrofa9 by Isochore and Isochore-Like Region Analyses

    Background: The isochore, a large DNA sequence with relatively small GC variance, is one of the most important structures in eukaryotic genomes. Although the isochore has been widely studied in humans and other species, little is known about its distribution in pigs. Principal Findings: In this paper, we construct a map of long homogeneous genome regions (LHGRs), i.e., isochores and isochore-like regions, in pigs to provide an intuitive version of GC heterogeneity in each chromosome. The LHGR pattern study not only quantifies heterogeneities, but also reveals some primary characteristics of the chromatin organization, including the following: (1) the majority of LHGRs belong to GC-poor families and are long; (2) a high gene density tends to occur with the appearance of GC-rich LHGRs; and (3) the density of LINE repeats decreases with an increase in the GC content of LHGRs. Furthermore, a portion of LHGRs with particular GC ranges (50%–51% and 54%–55%) tend to have abnormally high gene densities, suggesting that biased gene conversion (BGC), as well as time- and energy-saving principles, could be of importance to the formation of genome organization. Conclusion: This study significantly improves our knowledge of chromatin organization in the pig genome. Correlations between the different biological features (e.g., gene density and repeat density) and GC content of LHGRs provide a uniqu

    Characterisation of Inactivation Domains and Evolutionary Strata in Human X Chromosome through Markov Segmentation

    Markov segmentation is a method of identifying compositionally different subsequences in a given symbolic sequence. We have applied this technique to the DNA sequence of the human X chromosome to analyze its compositional structure. The human X chromosome is known to have acquired DNA through distinct evolutionary events and is believed to be composed of five evolutionary strata. In addition, in female mammals all copies of the X chromosome in excess of one are transcriptionally inactivated. The location of a gene is correlated with its ability to undergo inactivation, but correlations between evolutionary strata and inactivation domains are less clear. Our analysis provides an accurate estimate of the location of stratum boundaries and gives a high-resolution map of compositionally different regions on the X chromosome. This leads to the identification of a novel stratum, as well as segments wherein a group of genes either undergo inactivation or escape inactivation in toto. We identify oligomers that appear to be unique to inactivation domains alone.

    Kernel Principle Component Analysis of Microarray Data. Final Report

    Prediction of CpG-island function: CpG clustering vs. sliding-window methods

    Background: Unmethylated stretches of CpG dinucleotides (CpG islands) are an outstanding property of mammalian genomes. Conventionally, these regions are detected by sliding-window approaches using %G + C, CpG observed/expected ratio and length thresholds as main parameters. More recently, clustering methods detect clusters of CpG dinucleotides directly, as a statistical property of the genome sequence. Results: We compare sliding-window to clustering (i.e. CpGcluster) predictions by applying new ways to detect putative functionality of CpG islands. Analyzing the co-localization with several genomic regions as a function of window size vs. statistical significance (p-value), CpGcluster shows a higher overlap with promoter regions and highly conserved elements, at the same time showing less overlap with Alu retrotransposons. The major difference in the prediction was found for short islands (CpG islets), often exclusively predicted by CpGcluster. Many of these islets seem to be functional, as they are unmethylated, highly conserved and/or located within the promoter region. Finally, we show that window-based islands can spuriously overlap several differentially regulated promoters as well as different methylation domains, which might indicate a wrong merge of several CpG islands into a single, very long island. The shorter CpGcluster islands seem to be much more specific with respect to the overlap with alternative transcription start sites or the detection of homogeneous methylation domains. Conclusions: The main difference between sliding-window approaches and clustering methods is the length of the predicted islands. Short islands, often differentially methylated, are almost exclusively predicted by CpGcluster. This suggests that CpGcluster may be the algorithm of choice to explore the function of these short, but putatively functional CpG islands.
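    The sliding-window criteria named in this abstract (%G + C and the CpG observed/expected ratio, evaluated over a window of fixed length) can be sketched as follows. This illustrates the window-based approach only and is not CpGcluster. The observed/expected ratio is computed as (number of CpG × window length) / (number of C × number of G); the function names and the default thresholds (`min_gc=0.5`, `min_oe=0.6`, `size=200`) are assumptions chosen for illustration, not values taken from the paper.

```python
def cpg_stats(window):
    """Return (GC fraction, CpG observed/expected ratio) for one window."""
    w = window.upper()
    n = len(w)
    c = w.count("C")
    g = w.count("G")
    cpg = w.count("CG")
    gc_frac = (c + g) / n if n else 0.0
    # The ratio is undefined without both C and G; report 0.0 in that case.
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_frac, obs_exp


def scan_windows(seq, size=200, step=1, min_gc=0.5, min_oe=0.6):
    """Return start positions of windows passing both thresholds.

    Thresholds are illustrative defaults (assumptions), not the paper's.
    """
    hits = []
    for start in range(0, len(seq) - size + 1, step):
        gc_frac, obs_exp = cpg_stats(seq[start:start + size])
        if gc_frac >= min_gc and obs_exp >= min_oe:
            hits.append(start)
    return hits


if __name__ == "__main__":
    # Toy example with a very small window so the result is easy to inspect.
    print(scan_windows("AAAACGCGAAAA", size=4))
```

    A full window-based predictor would additionally merge overlapping passing windows into islands, which is the step the abstract identifies as a possible source of spuriously long islands.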

    Comparison of MHC class I risk haplotypes in Thai and Caucasian psoriatics shows locus heterogeneity at PSORS1

    Earlier studies have shown that psoriasis in Japan and Thailand is associated with two different major histocompatibility complex (MHC) haplotypes – those bearing HLA-Cw6 and those bearing HLA-Cw1 and HLA-B46. In an independent case-control sample from Thailand, we confirmed the association of psoriasis with both haplotypes. No association was seen in Thai HLA-Cw1 haplotypes lacking HLA-B46, nor was HLA-Cw1 associated with psoriasis in a large Caucasian sample. To assess whether these risk haplotypes share a common origin, we sequenced genomic DNA from a Thai HLA-Cw1-B46 homozygote across the ∼300 kb MHC risk interval, and compared it with sequence of a HLA-Cw6-B57 risk haplotype. Three small regions of homology were found, but these regions share equivalent sequence similarity with one or more clearly non-risk haplotypes, and they contain no polymorphism alleles unique to all risk haplotypes. Differences in psoriasis phenotype were also observed, including lower risk of disease, greater nail involvement, and later age at onset in HLA-Cw1-B46 carriers compared with HLA-Cw6 carriers. These findings suggest locus heterogeneity at PSORS1 (psoriasis susceptibility 1), the major psoriasis susceptibility locus in the MHC, with HLA-Cw6 imparting risk in both Caucasians and Asians, and an allele other than HLA-Cw1 on the HLA-Cw1-B46 haplotype acting as an additional risk variant in East Asians. Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/79072/1/TAN_1526_sm_tables1.pdf http://deepblue.lib.umich.edu/bitstream/2027.42/79072/2/j.1399-0039.2010.01526.x.pd