
    GenomeFingerprinter and universal genome fingerprint analysis for systematic comparative genomics

    Comparing whole genome sequences at large scale has not been achieved with conventional methods based on pairwise base-to-base comparison; moreover, no attention has been paid to handling, in one sitting, a number of genomes that cross genetic categories (chromosome, plasmid, and phage) with greater divergence (much less or no homology) over large size ranges (from Kbp to Mbp). We created a new method, GenomeFingerprinter, to unambiguously produce three-dimensional coordinates from a sequence, followed by one three-dimensional plot and six two-dimensional trajectory projections to illustrate whole genome fingerprints. We further developed a set of concepts and tools and thereby established a new method, universal genome fingerprint analysis. We demonstrated their applications through case studies on over a hundred genome sequences. In particular, we defined the total genetic component configuration (TGCC) (i.e., chromosome, plasmid, and phage) for describing a strain as a system, the universal genome fingerprint map (UGFM) of TGCC for differentiating a strain as a universal system, and systematic comparative genomics (SCG) for comparing, in one sitting, a number of genomes crossing genetic categories in diverse strains. Using UGFM, UGFM-TGCC, and UGFM-TGCC-SCG, we compared a number of genome sequences with greater divergence (chromosome, plasmid, and phage; bacterium, archaeal bacterium, and virus) over large size ranges (6 Kbp to 5 Mbp), giving new insights into critical problematic issues in microbial genomics in the post-genomic era. This paper provides a new method for rapidly computing, geometrically visualizing, and intuitively comparing genome sequences at the fingerprint level, and hence establishes a new method of universal genome fingerprint analysis for systematic comparative genomics. Comment: 63 pages, 15 figures, 5 tables

    Variation block-based genomics method for crop plants

    BACKGROUND: In contrast with wild species, cultivated crop genomes consist of reshuffled recombination blocks, which occurred by crossing and selection processes. Accordingly, recombination block-based genomics analysis can be an effective approach for the screening of target loci for agricultural traits. RESULTS: We propose the variation block method, which is a three-step process for recombination block detection and comparison. The first step is to detect variations by comparing the short-read DNA sequences of the cultivar to the reference genome of the target crop. Next, sequence blocks with variation patterns are examined and defined. The boundaries between the variation-containing sequence blocks are regarded as recombination sites. All the assumed recombination sites in the cultivar set are used to split the genomes, and the resulting sequence regions are termed variation blocks. Finally, the genomes are compared using the variation blocks. The variation block method identified recurring recombination blocks accurately and successfully represented block-level diversities in the publicly available genomes of 31 soybean and 23 rice accessions. The practicality of this approach was demonstrated by the identification of a putative locus determining soybean hilum color. CONCLUSIONS: We suggest that the variation block method is an efficient genomics method for the recombination block-level comparison of crop genomes. We expect that this method will facilitate the development of crop genomics by bringing genomics technologies to the field of crop breeding
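    The splitting step described in this abstract can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function name `variation_blocks`, the input format (half-open intervals of variation-containing sequence per cultivar), and the boolean labelling are all hypothetical, and the detection of variations from short reads is assumed to have been done already.

```python
def variation_blocks(chrom_length, regions_by_cultivar):
    """Split one chromosome into variation blocks (illustrative sketch).

    regions_by_cultivar: hypothetical input, {cultivar: [(start, end), ...]},
    half-open intervals where that cultivar carries variation relative to
    the reference genome.
    """
    # Assumption: every boundary of a variation-containing region, in any
    # cultivar, is treated as a putative recombination site.
    cuts = {0, chrom_length}
    for regions in regions_by_cultivar.values():
        for start, end in regions:
            cuts.update((start, end))
    edges = sorted(c for c in cuts if 0 <= c <= chrom_length)

    # Consecutive cut points delimit the variation blocks.
    blocks = list(zip(edges, edges[1:]))

    # For each cultivar, mark whether each block lies inside one of its
    # variation-containing regions (True) or matches the reference (False).
    table = {
        name: [any(s <= b0 and b1 <= e for s, e in regions) for b0, b1 in blocks]
        for name, regions in regions_by_cultivar.items()
    }
    return blocks, table


blocks, table = variation_blocks(100, {"A": [(10, 40)], "B": [(30, 60)]})
```

    Because all cultivars share the same block boundaries, the resulting table can be compared column by column, which is the block-level comparison the abstract refers to.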

    Microarray-based global mapping of integration sites for the retrotransposon, intracisternal A-particle, in the mouse genome

    Mammalian genomes contain numerous mobile elements acquired during evolution, some of which are still active and may cause genomic instability. Their movement and positional diversity occasionally result in phenotypic changes and variation by altering the expression of, or disrupting, neighboring host genes. Here, we describe a novel microarray-based method by which the dispersed genomic locations of a type of retrotransposon in a mammalian genome can be identified. Using this method, we mapped the DNA elements for a mouse retrotransposon, intracisternal A-particle (IAP), within the genomes of the C3H/He and C57BL/6J inbred mouse strains; consequently, we detected hundreds of probable IAP cDNA-integrated genomic regions, which included a considerable number of strain-specific putative insertions. In addition, by comparing genomic DNA from radiation-induced myeloid leukemia cells and their reference normal tissue, we detected three genomic regions around which an IAP element was integrated. These results demonstrate the first successful genome-wide mapping of a retrotransposon type in a mammalian genome

    Synonymous dinucleotide usage: a codon-aware metric for quantifying dinucleotide representation in viruses

    Distinct patterns of dinucleotide representation, such as CpG and UpA suppression, are characteristic of certain viral genomes. Recent research has uncovered vertebrate immune mechanisms that select against specific dinucleotides in targeted viruses. This evidence highlights the importance of systematically examining the dinucleotide composition of viral genomes. We have developed a novel metric, called synonymous dinucleotide usage (SDU), for quantifying dinucleotide representation in coding sequences. Our method compares the abundance of a given dinucleotide to the null hypothesis of equal synonymous codon usage in the sequence. We present a Python3 package, DinuQ, for calculating SDU and other relevant metrics. We have applied this method on two sets of invertebrate- and vertebrate-specific flaviviruses and rhabdoviruses. The SDU shows that the vertebrate viruses exhibit consistently greater under-representation of CpG dinucleotides in all three codon positions in both datasets. In comparison to existing metrics for dinucleotide quantification, the SDU allows for a statistical interpretation of its values by comparing it to a null expectation based on the codon table. Here we apply the method to viruses, but coding sequences of other living organisms can be analysed in the same way
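    For orientation, a simple codon-unaware way to quantify dinucleotide representation is the observed/expected ratio sketched below. This is not the SDU metric and does not use the DinuQ package or its API; the function name `dinucleotide_obs_exp` is hypothetical, and the expectation assumes independent single-nucleotide frequencies rather than the codon-table null described in the abstract.

```python
def dinucleotide_obs_exp(seq, dinuc="CG"):
    """Observed/expected ratio of a dinucleotide (codon-unaware sketch).

    Expected count assumes the two bases occur independently at their
    overall frequencies; this is NOT the SDU null hypothesis.
    """
    seq = seq.upper()
    n = len(seq)
    if n < 2:
        return None
    # Count overlapping occurrences of the dinucleotide.
    observed = sum(1 for i in range(n - 1) if seq[i:i + 2] == dinuc)
    freq_first = seq.count(dinuc[0]) / n
    freq_second = seq.count(dinuc[1]) / n
    # There are n - 1 dinucleotide positions in a sequence of length n.
    expected = freq_first * freq_second * (n - 1)
    return observed / expected if expected else None


ratio = dinucleotide_obs_exp("ACGT", "CG")
```

    A ratio below 1 indicates under-representation under this simple model. Unlike the SDU, it ignores codon position and amino acid composition, so it cannot separate dinucleotide bias from constraints imposed by the encoded protein.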

    Seven clusters in genomic triplet distributions

    Motivation: Several recent papers proposed new algorithms for detecting coding regions without requiring a learning dataset of already known genes. In this paper we studied the cluster structure of several genomes in the space of codon usage. This allowed us to interpret some of the results obtained in other studies and to propose a simpler method, which is, nevertheless, fully functional. Results: Several complete genomic sequences were analyzed, using visualization of tables of triplet counts in a sliding window. The distribution of 64-dimensional vectors of triplet frequencies displays a well-detectable cluster structure. The structure was found to consist of seven clusters, corresponding to protein-coding information in three possible phases in one of the two complementary strands and in the non-coding regions. Awareness of the existence of this structure allows development of methods for the segmentation of sequences into regions with the same coding phase and non-coding regions. This method may be completely unsupervised or use some external information. Since the method does not need extraction of ORFs, it can be applied even to unassembled genomes. Accuracy calculated at the base-pair level (both sensitivity and specificity) exceeds 90%. This is no worse than methods such as HMM, while having the advantage of being much simpler and clearer
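    The 64-dimensional triplet-frequency vectors mentioned in this abstract can be computed along the following lines. This is a minimal sketch, not the authors' code: the function names, the window length of 300, and the step of 30 are assumptions chosen for illustration, and the clustering step itself is not shown.

```python
from itertools import product

# All 64 triplets in a fixed order, so every vector shares the same axes.
TRIPLETS = ["".join(p) for p in product("ACGT", repeat=3)]
INDEX = {t: i for i, t in enumerate(TRIPLETS)}


def triplet_frequencies(window, phase=0):
    """Frequency vector of non-overlapping triplets read from offset `phase`."""
    counts = [0] * 64
    for i in range(phase, len(window) - 2, 3):
        idx = INDEX.get(window[i:i + 3])
        if idx is not None:  # skip triplets containing N or other symbols
            counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * 64


def sliding_vectors(seq, window=300, step=30):
    """One 64-dimensional vector per window position (sizes are assumptions)."""
    seq = seq.upper()
    return [
        triplet_frequencies(seq[start:start + window])
        for start in range(0, len(seq) - window + 1, step)
    ]


vectors = sliding_vectors("ATG" * 110)
```

    Each vector is a point in 64-dimensional space; feeding the points to any standard clustering or projection method would be the next step toward the seven-cluster picture the abstract describes.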

    A Mutual Information Based Sequence Distance For Vertebrate Phylogeny Using Complete Mitochondrial Genomes

    Traditional sequence distances require alignment. A new mutual information based sequence distance without alignment is defined in this paper. This distance is based on compositional vectors of DNA sequences or protein sequences from complete genomes. First we establish the mathematical foundation of this distance. Then this distance is applied to analyze the phylogenetic relationship of 64 vertebrates using complete mitochondrial genomes. The phylogenetic tree shows that the mitochondrial genomes are separated into three major groups. One group corresponds to mammals; one group corresponds to fish; and the last one is Archosauria (including birds and reptiles). The structure of the tree based on our new distance is roughly in agreement in topology with the current known phylogenies of vertebrates

    Using Ancient Samples in Projection Analysis.

    Projection analysis is a tool that extracts information from the joint allele frequency spectrum to better understand the relationship between two populations. In projection analysis, a test genome is compared to a set of genomes from a reference population. The projection's shape depends on the historical relationship of the test genome's population to the reference population. Here, we explore in greater depth the effects on the projection when ancient samples are included in the analysis. First, we conduct a series of simulations in which the ancient sample is directly ancestral to a present-day population (one-population model), or the ancient sample is ancestral to a sister population that diverged before the time of sampling (two-population model). We find that there are characteristic differences between the projections for the one-population and two-population models, which indicate that the projection can be used to determine whether a test genome is directly ancestral to a present-day population or not. Second, we compute projections for several published ancient genomes. We compare two Neanderthals and three ancient human genomes to European, Han Chinese and Yoruba reference panels. We use a previously constructed demographic model and insert these five ancient genomes to assess how well the observed projections are recovered