
    Current challenges in de novo plant genome sequencing and assembly

    ABSTRACT: Genome sequencing is now affordable, but assembling plant genomes de novo remains challenging. We assess the state of the art of assembly and review the best practices for the community.

    Sequencing the maize genome

    Sequencing of complex genomes can be accomplished by enriching shotgun libraries for genes. In maize, gene-enrichment by copy-number normalization (high C(0)t) and methylation filtration (MF) have been used to generate up to two-fold coverage of the gene-space with less than 1 million sequencing reads. Simulations using sequenced bacterial artificial chromosome (BAC) clones predict that 5x coverage of gene-rich regions, accompanied by less than 1x coverage of subclones from BAC contigs, will generate high-quality mapped sequence that meets the needs of geneticists while accommodating unusually high levels of structural polymorphism. We propose a strategy of sequencing several inbred strains to capture this polymorphism and to investigate hybrid vigor, or heterosis.

    Oxford Nanopore sequencing, hybrid error correction, and de novo assembly of a eukaryotic genome

    Monitoring the progress of DNA molecules through a membrane pore has been postulated as a method for sequencing DNA for several decades. Recently, a nanopore-based sequencing instrument, the Oxford Nanopore MinION, has become available, and we used this for sequencing the Saccharomyces cerevisiae genome. To make use of these data, we developed a novel open-source hybrid error correction algorithm, Nanocorr, specifically for Oxford Nanopore reads, because existing packages were incapable of assembling the long read lengths (5-50 kbp) at such high error rates (between approximately 5% and 40% error). With this new method, we were able to perform a hybrid error correction of the nanopore reads using complementary MiSeq data and produce a de novo assembly that is highly contiguous and accurate: the contig N50 length is more than ten times greater than that of an Illumina-only assembly (678 kbp versus 59.9 kbp), and the assembly has >99.88% consensus identity when compared to the reference. Furthermore, the assembly with the long nanopore reads presents a much more complete representation of the features of the genome and correctly assembles gene cassettes, rRNAs, transposable elements, and other genomic features that were almost entirely absent in the Illumina-only assembly.
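
The contig N50 quoted in the abstract above is a standard contiguity statistic: the length of the contig at which the running total, taken from longest to shortest, first reaches half of the total assembly length. A minimal sketch of that calculation follows. The function name `contig_n50` and the example lengths are illustrative assumptions; they are not taken from the paper or from Nanocorr.

```python
from typing import Iterable


def contig_n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembled bases.
    """
    ordered = sorted(lengths, reverse=True)  # longest contig first
    if not ordered:
        raise ValueError("at least one contig length is required")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Compare 2 * running with total to stay in integer arithmetic.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable: the running sum always reaches the total")


# Hypothetical contig lengths (in bp), for illustration only.
example_lengths = [80, 70, 50, 40, 30, 20]
print(contig_n50(example_lengths))
```

Because N50 depends only on the length distribution, a higher value indicates a more contiguous assembly but says nothing about its correctness, which is why the abstract reports consensus identity alongside it.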

    Validation and assessment of variant calling pipelines for next-generation sequencing

    Background: The processing and analysis of the large-scale data generated by next-generation sequencing (NGS) experiments is challenging and is a burgeoning area of new methods development. Several new bioinformatics tools have been developed for calling sequence variants from NGS data. Here, we validate the variant calling of these tools and compare their relative accuracy to determine which data processing pipeline is optimal. Results: We developed a unified pipeline for processing NGS data that encompasses four modules: mapping, filtering, realignment and recalibration, and variant calling. We processed 130 subjects from an ongoing whole exome sequencing study through this pipeline. To evaluate the accuracy of each module, we conducted a series of comparisons between the single nucleotide variant (SNV) calls from the NGS data and either gold-standard Sanger sequencing on a total of 700 variants or array genotyping data on a total of 9,935 single-nucleotide polymorphisms. A head-to-head comparison showed that the Genome Analysis Toolkit (GATK) provided more accurate calls than SAMtools (positive predictive value of 92.55% vs. 80.35%, respectively). Realignment of mapped reads and recalibration of base quality scores before SNV calling proved to be crucial to accurate variant calling. The GATK HaplotypeCaller algorithm for variant calling outperformed the UnifiedGenotyper algorithm. We also showed a relationship between mapping quality, read depth and allele balance, and SNV call accuracy. However, if best practices are used in data processing, then additional filtering based on these metrics provides little gain, and accuracies of >99% are achievable. Conclusions: Our findings will help to determine the best approach for processing NGS data to confidently call variants for downstream analyses. To enable others to implement and replicate our results, all of our code is freely available at http://metamoodics.org/wes
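
The positive predictive value (PPV) used in the abstract above to compare GATK and SAMtools is the fraction of called variants that the gold standard confirms: PPV = TP / (TP + FP). A minimal sketch of that arithmetic follows. The function name and the counts are hypothetical, chosen for illustration only; the abstract reports percentages, not the underlying counts.

```python
def positive_predictive_value(true_positives: int, false_positives: int) -> float:
    """Return TP / (TP + FP), the share of calls that are confirmed."""
    if true_positives < 0 or false_positives < 0:
        raise ValueError("counts must be non-negative")
    called = true_positives + false_positives
    if called == 0:
        raise ValueError("PPV is undefined when no variants were called")
    return true_positives / called


# Hypothetical validation counts, NOT taken from the study.
confirmed_calls = 90    # calls confirmed by the gold standard
unconfirmed_calls = 10  # calls the gold standard did not confirm
ppv = positive_predictive_value(confirmed_calls, unconfirmed_calls)
print(f"PPV = {ppv:.2%}")
```

PPV ignores variants that a caller misses entirely, so it is usually read together with sensitivity when comparing pipelines.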

    Income inequality and growth: problems with the orthodox approach

    Abstract. This paper discusses the main issues surrounding increasing inequality: whether it matters, and its impact on economic activity and growth. It starts by briefly considering the empirical evidence on the share of income going to the top one percent since 1945 in the advanced countries. It then considers whether this represents an increase in the productivity of the top one percent or merely an extraction of economic rent. The empirical evidence suggests the latter is generally the case and, as a consequence, there is not likely to be a trade-off between greater income equality and efficiency (the latter being reflected in a lower economic growth rate). This is reinforced by considering the mainstream explanation of the distribution of income and the argument as to whether labor is paid its marginal product, which is found to be problematic. Hence, some reservations about the use of the aggregate production function are raised. The paper turns next to the question of whether a greater degree of inequality causes slower economic growth, both for the advanced and the developing countries. It next considers whether the increasing gap between the top one percent and the rest of the income distribution has been either responsible for, or exacerbated, the Great Recession. It concludes that the degree of inequality is an important factor in determining economic activity and one that has been ignored for too long in macroeconomics.

    The crystal structure and electrical properties of the oxide ion conductor Ba3WNbO8.5

    This research was supported by the Northern Research Partnership and the University of Aberdeen. We also acknowledge the Science and Technology Facilities Council (STFC) for provision of beamtime at ISIS.

    Integrated RNA-seq and sRNA-seq analysis identifies novel nitrate-responsive genes in Arabidopsis thaliana roots

    Background: Nitrate and other nitrogen metabolites can act as signals that regulate global gene expression in plants. Adaptive changes in plant morphology and physiology triggered by changes in nitrate availability are partly explained by these changes in gene expression. Despite several genome-wide efforts to identify nitrate-regulated genes, no comprehensive study of the Arabidopsis root transcriptome under contrasting nitrate conditions has been carried out. Results: In this work, we employed Illumina high-throughput sequencing technology to perform an integrated analysis of the poly-A+ enriched and the small RNA fractions of the Arabidopsis thaliana root transcriptome in response to nitrate treatments. Our sequencing strategy identified new nitrate-regulated genes, including 40 genes not represented in the ATH1 Affymetrix GeneChip, a novel nitrate-responsive antisense transcript, and a new nitrate-responsive miRNA/TARGET module consisting of a novel microRNA, miR5640, and its target, AtPPC3. Conclusions: Sequencing of small RNAs and mRNAs uncovered new genes and enabled us to develop new hypotheses for nitrate regulation and coordination of carbon and nitrogen metabolism.

    Syntenic relationships between Medicago truncatula and Arabidopsis reveal extensive divergence of genome organization

    Arabidopsis and Medicago truncatula represent sister clades within the dicot subclass Rosidae. We used genetic map-based and bacterial artificial chromosome sequence-based approaches to estimate the level of synteny between the genomes of these model plant species. Mapping of 82 tentative orthologous gene pairs reveals a lack of extended macrosynteny between the two genomes, although marker collinearity is frequently observed over small genetic intervals. Divergence estimates based on non-synonymous nucleotide substitutions suggest that a majority of the genes under analysis have experienced duplication in Arabidopsis subsequent to divergence of the two genomes, potentially confounding synteny analysis. Moreover, in cases of localized synteny, genetically linked loci in M. truncatula often share multiple points of synteny with Arabidopsis; this latter observation is consistent with the large number of segmental duplications that compose the Arabidopsis genome. More detailed analysis, based on complete sequencing and annotation of three M. truncatula bacterial artificial chromosome contigs, suggests that the two genomes are related by networks of microsynteny that are often highly degenerate. In some cases, the erosion of microsynteny could be ascribed to selective gene loss from duplicated loci, whereas in other cases it is due to the absence of close homologs of M. truncatula genes in Arabidopsis.