GenomeFingerprinter and universal genome fingerprint analysis for systematic comparative genomics

Author: Ai Hannan
Ai Yuncan
Meng Fanmei
Zhao Lei
Publication venue: Public Library of Science (PLoS)
Publication date: 09/03/2013

Comparing whole genome sequences at large scale has not been achieved with conventional methods based on pairwise base-to-base comparison; moreover, no attention has been paid to handling, in one sitting, a number of genomes that cross genetic categories (chromosome, plasmid, and phage) with greater divergence (much less or no homology) over large size ranges (from Kbp to Mbp). We created a new method, GenomeFingerprinter, to unambiguously produce three-dimensional coordinates from a sequence, followed by one three-dimensional plot and six two-dimensional trajectory projections to illustrate whole genome fingerprints. We further developed a set of concepts and tools and thereby established a new method, universal genome fingerprint analysis. We demonstrated their applications through case studies on over a hundred genome sequences. In particular, we defined the total genetic component configuration (TGCC) (i.e., chromosome, plasmid, and phage) for describing a strain as a system, the universal genome fingerprint map (UGFM) of TGCC for differentiating a strain as a universal system, and systematic comparative genomics (SCG) for comparing, in one sitting, a number of genomes that cross genetic categories in diverse strains. By using UGFM, UGFM-TGCC, and UGFM-TGCC-SCG, we compared a number of genome sequences with greater divergence (chromosome, plasmid, and phage; bacterium, archaeal bacterium, and virus) over large size ranges (6 Kbp to 5 Mbp), giving new insights into critical problematic issues in microbial genomics in the post-genomic era. This paper provides a new method for rapidly computing, geometrically visualizing, and intuitively comparing genome sequences at the fingerprint level, and hence establishes a new method of universal genome fingerprint analysis for systematic comparative genomics.

Comment: 63 pages, 15 figures, 5 tables

arXiv.org e-Print Archive

Directory of Open Access Journals

PubMed Central

FigShare

Variation block-based genomics method for crop plants

Author: Byoung-Chul Kim
Chang Hong
Dongwoo Lee
Hak-Min Kim
Ho-Sung Yoon
Hong Yun
Hyang Park
Hye Chun
Hyunmin Kim
Ik-Young Choi
Kwang Jeong
Man Choi
Min Seo
Sang Lim
Seuk Lee
Seungwoo Hwang
Suk-Ha Lee
Sun Kim
Sunghoon Lee
Sungwoong Jho
Tae-Young Hwang
Wook Kim
Young Kim
Young-Ah Shin
Young-Up Kwon
Yul Kim
Yun Cho
Publication venue: Springer Nature
Publication date: 01/01/2014

BACKGROUND: In contrast with wild species, cultivated crop genomes consist of reshuffled recombination blocks, which arose through crossing and selection. Accordingly, recombination block-based genomics analysis can be an effective approach for screening target loci for agricultural traits. RESULTS: We propose the variation block method, which is a three-step process for recombination block detection and comparison. The first step is to detect variations by comparing the short-read DNA sequences of the cultivar to the reference genome of the target crop. Next, sequence blocks with variation patterns are examined and defined. The boundaries between the variation-containing sequence blocks are regarded as recombination sites. All the assumed recombination sites in the cultivar set are used to split the genomes, and the resulting sequence regions are termed variation blocks. Finally, the genomes are compared using the variation blocks. The variation block method identified recurring recombination blocks accurately and successfully represented block-level diversity in the publicly available genomes of 31 soybean and 23 rice accessions. The practicality of this approach was demonstrated by the identification of a putative locus determining soybean hilum color. CONCLUSIONS: We suggest that the variation block method is an efficient genomics method for the recombination block-level comparison of crop genomes. We expect that this method will facilitate the development of crop genomics by bringing genomics technologies to the field of crop breeding.
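
The splitting step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the fixed-size bins, and the dense/sparse labelling of bins are all assumptions made for illustration. The sketch treats each switch between variant-dense and variant-sparse bins as a putative recombination site, then pools the sites from all cultivars to cut the genome into blocks.

```python
def recombination_sites(bin_labels, bin_size):
    """Return positions where adjacent bins switch label.

    bin_labels: list of bools, True = variant-dense bin (hypothetical labelling).
    bin_size:   width of each bin in base pairs (assumed fixed).
    """
    return [
        i * bin_size
        for i in range(1, len(bin_labels))
        if bin_labels[i] != bin_labels[i - 1]
    ]


def variation_blocks(sites_per_cultivar, genome_length):
    """Split [0, genome_length) at the union of all cultivars' sites."""
    cuts = sorted(
        {s for sites in sites_per_cultivar for s in sites if 0 < s < genome_length}
    )
    edges = [0] + cuts + [genome_length]
    # Each block is a half-open interval (start, end).
    return list(zip(edges[:-1], edges[1:]))


# Two hypothetical cultivars on a 400 bp toy genome with 100 bp bins.
sites_a = recombination_sites([True, True, False, False], bin_size=100)
sites_b = recombination_sites([False, True, True, True], bin_size=100)
blocks = variation_blocks([sites_a, sites_b], genome_length=400)
```

Because the cut points are pooled across the whole cultivar set, every cultivar is described on the same block coordinates, which is what makes the block-level comparison in the final step possible.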

Springer - Publisher Connector

PubMed Central

Microarray-based global mapping of integration sites for the retrotransposon, intracisternal A-particle, in the mouse genome

Author: Fujikawa Katsuyoshi
Hirouchi Tokuhisa
Ishihara Hiroshi
Kakinuma Shizuko
Nakamura Masako M.
Oghiso Yoichi
Ohmachi Yasushi
Shimada Yoshiya
Takabatake Takashi
Tanaka Izumi
Tanaka Kimio
Publication venue: Oxford University Press
Publication date: 01/05/2008

Mammalian genomes contain numerous evolutionarily harbored mobile elements, some of which are still active and may cause genomic instability. Their movement and positional diversity occasionally result in phenotypic changes and variation by causing altered expression or disruption of neighboring host genes. Here, we describe a novel microarray-based method by which dispersed genomic locations of a type of retrotransposon in a mammalian genome can be identified. Using this method, we mapped the DNA elements for a mouse retrotransposon, intracisternal A-particle (IAP), within genomes of C3H/He and C57BL/6J inbred mouse strains; consequently we detected hundreds of probable IAP cDNA–integrated genomic regions, which included a considerable number of strain-specific putative insertions. In addition, by comparing genomic DNAs from radiation-induced myeloid leukemia cells and their reference normal tissue, we detected three genomic regions around which an IAP element was integrated. These results demonstrate the first successful genome-wide mapping of a retrotransposon type in a mammalian genome.

PubMed Central

National Institute of Radiological Science: NIRS-Repository / 放射線医学総合研究所学術機関リポジトリ

Synonymous dinucleotide usage: a codon-aware metric for quantifying dinucleotide representation in viruses

Author: Hughes Joseph
Lytras Spyros
Publication venue: MDPI AG
Publication date: 01/04/2020

Distinct patterns of dinucleotide representation, such as CpG and UpA suppression, are characteristic of certain viral genomes. Recent research has uncovered vertebrate immune mechanisms that select against specific dinucleotides in targeted viruses. This evidence highlights the importance of systematically examining the dinucleotide composition of viral genomes. We have developed a novel metric, called synonymous dinucleotide usage (SDU), for quantifying dinucleotide representation in coding sequences. Our method compares the abundance of a given dinucleotide to the null hypothesis of equal synonymous codon usage in the sequence. We present a Python3 package, DinuQ, for calculating SDU and other relevant metrics. We have applied this method to two sets of invertebrate- and vertebrate-specific flaviviruses and rhabdoviruses. The SDU shows that the vertebrate viruses exhibit consistently greater under-representation of CpG dinucleotides in all three codon positions in both datasets. In comparison to existing metrics for dinucleotide quantification, the SDU allows for a statistical interpretation of its values by comparing it to a null expectation based on the codon table. Here we apply the method to viruses, but coding sequences of other living organisms can be analysed in the same way.
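
The comparison against a null of equal synonymous codon usage can be sketched as follows. This is a simplified illustration and not the DinuQ API: the function name `sdu_pos23` is hypothetical, only codon positions 2-3 are handled, the standard genetic code is assumed, and the weighting (an observed/expected ratio per amino acid, averaged by amino acid count) is one plausible reading of the abstract rather than a confirmed definition.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption: table 1).
BASES = "TCAG"
AMINO = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def sdu_pos23(cds, dinuc="CG"):
    """Observed/expected usage of `dinuc` at codon positions 2-3.

    cds: in-frame, upper-case DNA coding sequence.
    Returns NaN when no amino acid in the sequence can carry the dinucleotide.
    """
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]

    # Group synonymous codons by amino acid.
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    weighted, total = 0.0, 0
    for aa, family in families.items():
        if aa == "*":
            continue
        # Null expectation: every synonymous codon is used equally often.
        expected = sum(c[1:] == dinuc for c in family) / len(family)
        used = [c for c in codons if CODON_TABLE.get(c) == aa]
        if expected == 0 or not used:
            continue
        observed = sum(c[1:] == dinuc for c in used) / len(used)
        weighted += len(used) * observed / expected
        total += len(used)
    return weighted / total if total else float("nan")


balanced = sdu_pos23("GCGGCTGCCGCA")  # four alanine codons, each used once
enriched = sdu_pos23("GCGGCG")        # alanine encoded only by GCG
```

Under this reading, a value of 1 means the dinucleotide appears as often as equal synonymous codon usage predicts, values below 1 indicate under-representation, and values above 1 indicate over-representation.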

Multidisciplinary Digital Publishing Institute

Enlighten

Seven clusters in genomic triplet distributions

Author: Gorban Prof. Alexander N.
Popova Dr. Tatyana G.
Zinovyev Dr. Andrei Yu
Publication date: 01/01/2002

Motivation: In several recent papers, new algorithms were proposed for detecting coding regions without requiring a learning dataset of already known genes. In this paper we studied the cluster structure of several genomes in the space of codon usage. This allowed us to interpret some of the results obtained in other studies and to propose a simpler method which is, nevertheless, fully functional. Results: Several complete genomic sequences were analyzed, using visualization of tables of triplet counts in a sliding window. The distribution of 64-dimensional vectors of triplet frequencies displays a well-detectable cluster structure. The structure was found to consist of seven clusters, corresponding to protein-coding information in three possible phases in one of the two complementary strands and in the non-coding regions. Awareness of the existence of this structure allows development of methods for the segmentation of sequences into regions with the same coding phase and non-coding regions. This method may be completely unsupervised or use some external information. Since the method does not need extraction of ORFs, it can be applied even to unassembled genomes. Accuracy calculated at the base-pair level (both sensitivity and specificity) exceeds 90%. This is no worse than methods such as HMM, while having the advantage of being much simpler and clearer.
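
The 64-dimensional triplet frequency vectors mentioned in this abstract can be sketched as below. The window length, step size, and the choice to count non-overlapping triplets from the start of each window are assumptions for illustration; the abstract does not state these parameters, and the function names are hypothetical.

```python
from itertools import product

# All 64 triplets over the DNA alphabet, in a fixed order.
TRIPLETS = ["".join(p) for p in product("ACGT", repeat=3)]
INDEX = {t: i for i, t in enumerate(TRIPLETS)}


def triplet_frequencies(window):
    """Frequencies of non-overlapping triplets read from the window start."""
    counts = [0] * 64
    total = 0
    for i in range(0, len(window) - 2, 3):
        idx = INDEX.get(window[i:i + 3])
        if idx is not None:  # skip triplets with ambiguous bases such as N
            counts[idx] += 1
            total += 1
    if total == 0:
        return [0.0] * 64
    return [c / total for c in counts]


def sliding_triplet_vectors(seq, window=300, step=3):
    """One 64-dimensional frequency vector per window position."""
    return [
        triplet_frequencies(seq[start:start + window])
        for start in range(0, len(seq) - window + 1, step)
    ]


vectors = sliding_triplet_vectors("ATGATGATGATG", window=9, step=3)
```

Each vector is a point in 64-dimensional space, so the resulting list can be passed to any standard clustering or projection routine to look for the cluster structure the abstract describes.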

CogPrints Cognitive Sciences Eprint Archive

A Mutual Information Based Sequence Distance For Vertebrate Phylogeny Using Complete Mitochondrial Genomes

Author: Anh Vo
Mao Z
Yu Zuguo
Zhou Li-Qian
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2007

Traditional sequence distances require alignment. A new mutual information based sequence distance without alignment is defined in this paper. This distance is based on compositional vectors of DNA sequences or protein sequences from complete genomes. First we establish the mathematical foundation of this distance. Then this distance is applied to analyze the phylogenetic relationship of 64 vertebrates using complete mitochondrial genomes. The phylogenetic tree shows that the mitochondrial genomes are separated into three major groups. One group corresponds to mammals; one group corresponds to fish; and the last one is Archosauria (including birds and reptiles). The topology of the tree based on our new distance is roughly in agreement with the currently known phylogenies of vertebrates.

Crossref

Queensland University of Technology ePrints Archive

Using Ancient Samples in Projection Analysis.

Author: Slatkin Montgomery
Yang Melinda A
Publication venue: eScholarship, University of California
Publication date: 01/11/2015

Projection analysis is a tool that extracts information from the joint allele frequency spectrum to better understand the relationship between two populations. In projection analysis, a test genome is compared to a set of genomes from a reference population. The projection's shape depends on the historical relationship of the test genome's population to the reference population. Here, we explore in greater depth the effects on the projection when ancient samples are included in the analysis. First, we conduct a series of simulations in which the ancient sample is directly ancestral to a present-day population (one-population model), or the ancient sample is ancestral to a sister population that diverged before the time of sampling (two-population model). We find that there are characteristic differences between the projections for the one-population and two-population models, which indicate that the projection can be used to determine whether a test genome is directly ancestral to a present-day population or not. Second, we compute projections for several published ancient genomes. We compare two Neanderthals and three ancient human genomes to European, Han Chinese and Yoruba reference panels. We use a previously constructed demographic model and insert these five ancient genomes to assess how well the observed projections are recovered

Directory of Open Access Journals

PubMed Central

eScholarship - University of California