Recovering complete and draft population genomes from metagenome datasets.
Assembly of metagenomic sequence data into microbial genomes is of fundamental value to improving our understanding of microbial ecology and metabolism by elucidating the functional potential of hard-to-culture microorganisms. Here, we provide a synthesis of available methods to bin metagenomic contigs into species-level groups and highlight how genetic diversity, sequencing depth, and coverage influence binning success. Despite the computational cost of applying it to deeply sequenced complex metagenomes (e.g., soil), using covarying patterns of contig coverage across multiple datasets significantly improves the binning process. We also discuss and compare current genome validation methods and reveal how these methods tackle the problem of chimeric genome bins, i.e., bins containing sequences from multiple species. Finally, we explore how population genome assembly can be used to uncover biogeographic trends and to characterize the effect of in situ functional constraints on the genome-wide evolution.
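The idea of binning contigs by covarying coverage can be illustrated with a minimal sketch. It assumes each contig has a per-sample mean coverage profile and groups contigs whose profiles are strongly correlated. The greedy grouping, the `min_r` threshold, and the toy coverage values are all assumptions made for illustration; the abstract does not specify an algorithm, and practical binners typically combine coverage with other signals.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length coverage profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a flat profile carries no covariation signal
    return cov / (sd_x * sd_y)


def bin_by_coverage(coverage, min_r=0.95):
    """Greedily group contigs whose coverage covaries across samples.

    coverage: dict mapping contig name -> list of per-sample coverages.
    min_r: hypothetical correlation threshold for joining a bin.
    Each contig is compared with the first member of every existing bin.
    """
    bins = []
    for name, profile in coverage.items():
        for members in bins:
            if pearson(profile, coverage[members[0]]) >= min_r:
                members.append(name)
                break
        else:
            bins.append([name])  # no matching bin: start a new one
    return bins


# Toy coverage values across four samples (invented for illustration).
coverage = {
    "contig_1": [10.0, 40.0, 5.0, 22.0],
    "contig_2": [12.0, 43.0, 6.0, 25.0],
    "contig_3": [30.0, 4.0, 18.0, 9.0],
    "contig_4": [33.0, 5.0, 20.0, 11.0],
}
bins = bin_by_coverage(coverage)
print(bins)
```

Comparing against only the first member of each bin keeps the sketch short; it also makes the result depend on input order, which a real clustering method would avoid.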
De novo assembly of transcriptomes from a B73 maize line introgressed with a QTL for resistance to gray leaf spot disease reveals a candidate allele of a lectin receptor-like kinase
Gray leaf spot (GLS) disease in maize, caused by the fungus Cercospora zeina, is a threat to maize production globally. Understanding the molecular basis for quantitative resistance to GLS is therefore important for food security. We developed a de novo assembly pipeline to identify candidate maize resistance genes. Near-isogenic maize lines with and without a QTL for GLS resistance on chromosome 10 from inbred CML444 were produced in the inbred B73 background. The B73-QTL line showed a 20% reduction in GLS disease symptoms compared to B73 in the field (p = 0.01). B73-QTL leaf samples from this field experiment conducted under GLS disease pressure were RNA sequenced. The reads that did not map to the B73 or C. zeina genomes were expected to contain novel defense genes and were de novo assembled. A total of 141 protein-coding sequences with B73-like or plant annotations were identified from the B73-QTL plants exposed to C. zeina. To determine whether candidate gene expression was induced by C. zeina, the RNAseq reads from C. zeina-challenged and control leaves were mapped to a master assembly of all of the B73-QTL reads, and differential gene expression analysis was conducted. Combining results from both bioinformatics approaches led to the identification of a likely candidate gene, which was a novel allele of a lectin receptor-like kinase named L-RLK-CML that (i) was induced by C. zeina, (ii) was positioned in the QTL region, and (iii) had functional domains for pathogen perception and defense signal transduction. The 817AA L-RLK-CML protein had 53 amino acid differences from its 818AA counterpart in B73. A second "B73-like" allele of L-RLK was expressed at a low level in B73-QTL. Gene copy-specific RT-qPCR confirmed that the l-rlk-cml transcript was the major product induced four-fold by C. zeina. 
Several other expressed defense-related candidates were identified, including a wall-associated kinase, two glutathione S-transferases, a chitinase, a glucan beta-glucosidase, a plasmodesmata callose-binding protein, several other receptor-like kinases, and components of calcium signaling, vesicular trafficking, and ethylene biosynthesis. This work presents a bioinformatics protocol for gene discovery from de novo assembled transcriptomes and identifies candidate quantitative resistance genes.
Landscape of standing variation for tandem duplications in Drosophila yakuba and Drosophila simulans
We have used whole genome paired-end Illumina sequence data to identify
tandem duplications in 20 isofemale lines of D. yakuba, and 20 isofemale lines
of D. simulans and performed genome wide validation with PacBio long molecule
sequencing. We identify 1,415 tandem duplications that are segregating in D.
yakuba as well as 975 duplications in D. simulans, indicating greater variation
in D. yakuba. Additionally, we observe high rates of secondary deletions at
duplicated sites, with 8% of duplicated sites in D. simulans and 17% of sites
in D. yakuba modified with deletions. These secondary deletions are consistent
with the action of the large loop mismatch repair system acting to remove
polymorphic tandem duplication, resulting in rapid dynamics of gain and loss in
duplicated alleles and a richer substrate of genetic novelty than has been
previously reported. Most duplications are present in only single strains,
suggesting deleterious impacts are common. D. simulans shows larger numbers of
whole gene duplications in comparison to larger proportions of gene fragments
in D. yakuba. D. simulans displays an excess of high frequency variants on the
X chromosome, consistent with adaptive evolution through duplications on the D.
simulans X or demographic forces driving duplicates to high frequency. We
identify 78 chimeric genes in D. yakuba and 38 chimeric genes in D. simulans,
as well as 143 cases of recruited non-coding sequence in D. yakuba and 96 in D.
simulans, in agreement with rates of chimeric gene origination in D.
melanogaster. Together, these results suggest that tandem duplications often
result in complex variation beyond whole gene duplications that offers a rich
substrate of standing variation that is likely to contribute both to
detrimental phenotypes and disease, as well as to adaptive evolutionary change.
Comment: Revised version; accepted at Molecular Biology and Evolution
Ultraaccurate genome sequencing and haplotyping of single human cells.
Accurate detection of variants and long-range haplotypes in genomes of single human cells remains very challenging. Common approaches require extensive in vitro amplification of genomes of individual cells using DNA polymerases and high-throughput short-read DNA sequencing. These approaches have two notable drawbacks. First, polymerase replication errors could generate tens of thousands of false-positive calls per genome. Second, relatively short sequence reads contain little to no haplotype information. Here we report a method, dubbed SISSOR (single-stranded sequencing using microfluidic reactors), for accurate single-cell genome sequencing and haplotyping. A microfluidic processor is used to separate the Watson and Crick strands of the double-stranded chromosomal DNA in a single cell and to randomly partition megabase-size DNA strands into multiple nanoliter compartments for amplification and construction of barcoded libraries for sequencing. The separation and partitioning of large single-stranded DNA fragments of the homologous chromosome pairs allows for the independent sequencing of each of the complementary and homologous strands. This enables the assembly of long haplotypes and reduction of sequence errors by using the redundant sequence information and haplotype-based error removal. We demonstrated the ability to sequence single-cell genomes with error rates as low as 10^-8 and average 500-kb-long DNA fragments that can be assembled into haplotype contigs with N50 greater than 7 Mb. The performance could be further improved with more uniform amplification and more accurate sequence alignment. The ability to obtain accurate genome sequences and haplotype information from single cells will enable applications of genome sequencing for diverse clinical needs.
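For readers unfamiliar with the N50 statistic quoted above, here is a minimal sketch of how it is conventionally computed: sort contig lengths from longest to shortest and report the length at which the running total first reaches half of the summed length. The function name and the contig lengths are invented for illustration and are not taken from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembled length.
    """
    if not lengths:
        raise ValueError("n50() requires at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical haplotype contig lengths in base pairs (not study data).
contig_lengths = [9_000_000, 7_500_000, 4_000_000, 2_000_000, 500_000]
result = n50(contig_lengths)
print(result)
```

With these made-up lengths the total is 23 Mb, so the running sum passes the 11.5 Mb midpoint at the second contig.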
Distinct expression and methylation patterns for genes with different fates following a single whole-genome duplication in flowering plants
For most sequenced flowering plants, multiple whole-genome duplications (WGDs) are found. Genes duplicated by a WGD often have different fates: they can quickly disappear again, be retained for long(er) periods, or subsequently undergo small-scale duplications. However, how different expression, epigenetic regulation, and functional constraints are associated with these different gene fates following a WGD still requires further investigation, because successive WGDs in angiosperms complicate the gene trajectories. In this study, we investigate lotus (Nelumbo nucifera), an angiosperm with a single WGD at the K–Pg boundary. Based on improved intraspecific-synteny identification by a chromosome-level assembly, transcriptome, and bisulfite sequencing, we explore not only the fundamental distinctions in genomic features, expression, and methylation patterns of genes with different fates after a WGD but also the factors that shape post-WGD expression divergence and expression bias between duplicates. We found that, after a WGD, genes that returned to single copies show the highest levels and breadth of expression, gene body methylation, and intron numbers, whereas the long-retained duplicates exhibit the highest degrees of protein–protein interactions and protein lengths and the lowest methylation in gene flanking regions. For those long-retained duplicate pairs, the degree of expression divergence correlates with their sequence divergence, degree in protein–protein interactions, and expression level, whereas their biases in expression level reflecting subgenome dominance are associated with the bias of subgenome fractionation. Overall, our study on the paleopolyploid nature of lotus highlights the impact of different functional constraints on gene fate and duplicate divergence following a single WGD in plants.