1,073 research outputs found

    Assessing pooled BAC and whole genome shotgun strategies for assembly of complex genomes

    Background: We investigate whether pooling BAC clones and sequencing the pools can provide more accurate assembly of genome sequences than the "whole genome shotgun" (WGS) approach. Furthermore, we quantify this accuracy increase. We compare the pooled BAC and WGS approaches using in silico simulations. Standard measures of assembly quality focus on assembly size and fragmentation, which are desirable for large whole genome assemblies. We propose additional measures enabling easy and visual comparison of assembly quality, such as rearrangements and redundant sequence content, relative to the known target sequence. Results: The best assembly quality scores were obtained using 454 coverage of 15× linear and 5× paired (3 kb insert size) reads (15L-5P) on Arabidopsis. This regime gave similarly good results on four additional plant genomes of very different GC and repeat contents. BAC pooling improved assembly scores over WGS assembly, with coverage and redundancy scores improving the most. Conclusions: BAC pooling works better than WGS; however, both require a physical map to order the scaffolds. Pool sizes up to 12 Mbp work well, suggesting this pooling density to be effective in medium-scale re-sequencing applications such as targeted sequencing of QTL intervals for candidate gene discovery. Assuming the current Roche/454 Titanium sequencing limitations, a 12 Mbp region could be re-sequenced with a full plate of linear reads and a half plate of paired-end reads, yielding 15L-5P coverage after read pre-processing. Our simulation suggests that massively over-sequencing may not improve accuracy. Our scoring measures can be used generally to evaluate and compare results of simulated genome assemblies.
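As a rough illustration of what the 15L-5P regime implies for a 12 Mbp pool, the sketch below multiplies region size by fold coverage to get the amount of usable read data required. The function name `required_bases` is hypothetical and not from the paper, and the calculation deliberately makes no assumption about per-plate yield, which the abstract does not state.

```python
def required_bases(region_bp: int, fold_coverage: float) -> float:
    """Bases of usable read data needed to cover region_bp at fold_coverage."""
    return region_bp * fold_coverage


REGION_BP = 12_000_000  # 12 Mbp pool size mentioned in the abstract

linear_bases = required_bases(REGION_BP, 15)  # 15x linear reads
paired_bases = required_bases(REGION_BP, 5)   # 5x paired reads

print(f"linear: {linear_bases / 1e6:.0f} Mbp, paired: {paired_bases / 1e6:.0f} Mbp")
```

Under these figures, roughly 180 Mbp of linear and 60 Mbp of paired read data would have to survive pre-processing.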

    Unusual patterns of genetic diversity and gene expression in the maize genome

    Maize (Zea mays subsp. mays) was domesticated from teosinte (Z. mays subsp. parviglumis) in southern Mexico between 6,000 and 9,000 years ago (Matsuoka et al., 2002; Sluyter and Dominguez, 2006). Both domestication and crop improvement involved selection of specific alleles at genes, resulting in reduced genetic diversity in the genes controlling key morphological and agronomic traits. This is termed the genetic bottleneck. We coupled the approaches of molecular population genetics with reverse genetics to associate genes with phenotypes. More than 16,000 primer pairs were subjected to gel and temperature gradient capillary electrophoresis (TGCE)-based assays. This screen identified 73 genes that contain zero sequence diversity (ZSD) fragments. They are monomorphic among 59 diverse maize lines, but polymorphic among 9 teosinte lines. Therefore, they are candidate domestication-related genes. Using 3,000 Mutator-insertion lines, a large-scale screen for Mu transposon insertions in domestication candidate genes was performed. Our data support the bottleneck model; based on our test, at least 0.5% of maize genes were under selection during maize domestication. Phenotypic analysis of plants homozygous for the Mu-insertion alleles of the domestication candidate genes is underway. We also detected two other interesting features in the maize genome: nearly identical paralogs (NIPs) and orphan genes. First, our data suggested that at least ~1% of maize genes are members of a NIP family, defined as paralogous genes that exhibit >=98% identity (Emrich et al. 2007). Members of a NIP family are expressed and, in some instances, members of a given NIP family exhibit differential patterns of gene expression. NIPs may have played important roles during the evolution and domestication of maize. NIP expression data support the subfunctionalization model for duplicated genes. NIPs were also detected in other maize inbred lines.
Second, ~400 orphan transcripts were captured via 454 sequencing of cDNA, isolated using laser capture microdissection (LCM) from functionally important shoot apical meristems (SAMs). The expression of 27 randomly picked cDNAs was validated via RT-PCR. Expression of 20 of these SAM-expressed genes (~74%) was not detected in meristem-rich immature ears. 454-sequenced cDNAs, isolated by LCM from B73 and Mo17 SAMs, enabled us to detect gene-associated SNPs that had escaped previous tests

    The ALDH gene superfamily of Arabidopsis


    Physical and Genetic Structure of the Maize Genome Reflects Its Complex Evolutionary History

    Maize (Zea mays L.) is one of the most important cereal crops and a model for the study of genetics, evolution, and domestication. To better understand maize genome organization and to build a framework for genome sequencing, we constructed a sequence-ready fingerprinted contig-based physical map that covers 93.5% of the genome, of which 86.1% is aligned to the genetic map. The fingerprinted contig map contains 25,908 genic markers that enabled us to align nearly 73% of the anchored maize genome to the rice genome. The distribution pattern of expressed sequence tags correlates to that of recombination. In collinear regions, 1 kb in rice corresponds to an average of 3.2 kb in maize, yet maize has a 6-fold genome size expansion. This can be explained by the fact that most rice regions correspond to two regions in maize as a result of its recent polyploid origin. Inversions account for the majority of chromosome structural variations during subsequent maize diploidization. We also find clear evidence of ancient genome duplication predating the divergence of the progenitors of maize and rice. Reconstructing the paleoethnobotany of the maize genome indicates that the progenitors of modern maize contained ten chromosomes

    Assembling genomes using short-read sequencing technology

    Short-read sequencing technology can deliver gigabase genome assemblies for under a million dollars

    Complete Chloroplast Genome Sequence of a Major Allogamous Forage Species, Perennial Ryegrass (Lolium perenne L.)

    Lolium perenne L. (perennial ryegrass) is globally one of the most important forage and grassland crops. We sequenced the chloroplast (cp) genome of Lolium perenne cultivar Cashel. The L. perenne cp genome is 135 282 bp with a typical quadripartite structure. It contains genes for 76 unique proteins, 30 tRNAs and four rRNAs. As in other grasses, the genes accD, ycf1 and ycf2 are absent. The genome is of average size within its subfamily Pooideae and of medium size within the Poaceae. Genome size differences are mainly due to length variations in non-coding regions. However, considerable length differences of 1–27 codons in comparison of L. perenne to other Poaceae and 1–68 codons among all Poaceae were also detected. Within the cp genome of this outcrossing cultivar, 10 insertion/deletion polymorphisms and 40 single nucleotide polymorphisms were detected. Two of the polymorphisms involve tiny inversions within hairpin structures. By comparing the genome sequence with RT–PCR products of transcripts for 33 genes, 31 mRNA editing sites were identified, five of them unique to Lolium. The cp genome sequence of L. perenne is available under Accession number AM777385 at the European Molecular Biology Laboratory, National Center for Biotechnology Information and DNA DataBank of Japan

    SNP discovery via 454 transcriptome sequencing

    A massively parallel pyro-sequencing technology commercialized by 454 Life Sciences Corporation was used to sequence the transcriptomes of shoot apical meristems isolated from two inbred lines of maize using laser capture microdissection (LCM). A computational pipeline that uses the POLYBAYES polymorphism detection system was adapted for 454 ESTs and used to detect SNPs (single nucleotide polymorphisms) between the two inbred lines. Putative SNPs were computationally identified using 260 000 and 280 000 454 ESTs from the B73 and Mo17 inbred lines, respectively. Over 36 000 putative SNPs were detected within 9980 unique B73 genomic anchor sequences (MAGIs). Stringent post-processing reduced this number to > 7000 putative SNPs. Over 85% (94/110) of a sample of these putative SNPs were successfully validated by Sanger sequencing. Based on this validation rate, this pilot experiment conservatively identified > 4900 valid SNPs within > 2400 maize genes. These results demonstrate that 454-based transcriptome sequencing is an excellent method for the high-throughput acquisition of gene-associated SNPs

    Mu Transposon Insertion Sites and Meiotic Recombination Events Co-Localize with Epigenetic Marks for Open Chromatin across the Maize Genome

    The Mu transposon system of maize is highly active, with each of the ∼50–100 copies transposing on average once each generation. The approximately one dozen distinct Mu transposons contain highly similar ∼215 bp terminal inverted repeats (TIRs) and generate 9-bp target site duplications (TSDs) upon insertion. Using a novel genome walking strategy that uses these conserved TIRs as primer binding sites, Mu insertion sites were amplified from Mu stocks and sequenced via 454 technology. 94% of ∼965,000 reads carried Mu TIRs, demonstrating the specificity of this strategy. Among these TIRs, 21 novel Mu TIRs were discovered, revealing additional complexity of the Mu transposon system. The distribution of >40,000 non-redundant Mu insertion sites was strikingly non-uniform, such that rates increased in proportion to distance from the centromere. An identified putative Mu transposase binding consensus site does not explain this non-uniformity. An integrated genetic map containing more than 10,000 genetic markers was constructed and aligned to the sequence of the maize reference genome. Recombination rates (cM/Mb) are also strikingly non-uniform, with rates increasing in proportion to distance from the centromere. Mu insertion site frequencies are strongly correlated with recombination rates. Gene density does not fully explain the chromosomal distribution of Mu insertion and recombination sites, because pronounced preferences for the distal portions of chromosomes are still observed even after accounting for gene density. The similarity of the distributions of Mu insertions and meiotic recombination sites suggests that common features, such as chromatin structure, are involved in site selection for both Mu insertion and meiotic recombination.
The finding that Mu insertions and meiotic recombination sites both concentrate in genomic regions bearing epigenetic marks of open chromatin provides support for the hypothesis that open chromatin enhances rates of both Mu insertion and meiotic recombination
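For readers unfamiliar with the cM/Mb unit used above, the following minimal sketch shows one way a per-interval recombination rate could be computed from markers placed on both a genetic and a physical map. The function `recombination_rates` and the marker coordinates are hypothetical illustrations, not the authors' pipeline or data.

```python
def recombination_rates(markers):
    """Return cM/Mb for each interval between adjacent markers.

    markers: list of (genetic_position_cM, physical_position_bp) tuples,
    sorted by physical position along one chromosome.
    """
    rates = []
    for (cm_a, bp_a), (cm_b, bp_b) in zip(markers, markers[1:]):
        span_mb = (bp_b - bp_a) / 1e6
        if span_mb <= 0:
            raise ValueError("markers must be sorted with distinct physical positions")
        rates.append((cm_b - cm_a) / span_mb)
    return rates


# Made-up markers: 2 cM over 2 Mb, then 1 cM over 5 Mb.
example_markers = [(0.0, 1_000_000), (2.0, 3_000_000), (3.0, 8_000_000)]
example_rates = recombination_rates(example_markers)
print(example_rates)
```

In this toy example the first interval recombines five times as often per megabase as the second, which is the kind of non-uniformity the abstract describes along chromosome arms.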