314 research outputs found

    Assembly and analysis of complex plant genomes

    Concurrent advances in high-throughput sequencing and assembly have led to the completion of many complex genomes. Even so, these assemblies require substantial computational resources. In this dissertation, we present a massively parallel approach that scales to thousands of processors without duplicating the biological expertise present in conventional assembly software. Additional bioinformatics techniques were required to accurately assemble the maize genome, including novel repeat detection, and the resulting framework has been strongly supported by maize experimental data. More recently, this framework has been generalized for fruit fly, sorghum, soybean and environmental sequence assemblies. Questions in plant genome analysis were also addressed. For example, we have discovered an estimated 350 orphan maize genes and have shown that approximately 1% of all maize genes were recently duplicated, many of them into at least two functional copies. LCM-454 sequencing is introduced, and analyses indicating that this approach can discover rare, potentially tissue-specific transcripts and thousands of SNPs are presented. This dissertation combines high performance computing, computational biology and high-throughput sequencing for our ongoing work on the maize genome project. We conclude by describing how these contributions can be useful for any species, including non-model organisms that are unlikely to be fully sequenced.

    Haplotype and minimum-chimerism consensus determination using short sequence data


    A statistical approach to finding overlooked genetic associations

    BACKGROUND: Complexity and noise in expression quantitative trait loci (eQTL) studies make it difficult to distinguish potential regulatory relationships among the many interactions. The predominant method of identifying eQTLs finds associations that are significant at a genome-wide level. The vast number of statistical tests carried out on these data makes false negatives very likely. Corrections for multiple testing error render genome-wide eQTL techniques unable to detect modest regulatory effects. We propose an alternative method to identify eQTLs that builds on traditional approaches. In contrast to genome-wide techniques, our method determines the significance of an association between an expression trait and a locus with respect to the set of all associations to the expression trait. The use of this specific information facilitates identification of expression traits that have an expression profile that is characterized by a single exceptional association to a locus. Our approach identifies expression traits that have exceptional associations regardless of the genome-wide significance of those associations. This property facilitates the identification of possible false negatives for genome-wide significance. Further, our approach has the property of prioritizing expression traits that are affected by few strong associations. Expression traits identified by this method may warrant additional study because their expression level may be affected by targeting genes near a single locus. RESULTS: We demonstrate our method by identifying eQTL hotspots in Plasmodium falciparum (malaria) and Saccharomyces cerevisiae (yeast). We demonstrate the prioritization of traits with few strong genetic effects through Gene Ontology (GO) analysis of yeast. Our results are strongly consistent with results gathered using genome-wide methods and identify additional hotspots and eQTLs. CONCLUSIONS: New eQTLs and hotspots found with this method may represent regions of the genome or biological processes that are controlled through few relatively strong genetic interactions. These points of interest warrant experimental investigation.
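
The trait-relative idea in this abstract (judging an association against all other associations to the same expression trait, not against a genome-wide threshold) can be illustrated with a minimal sketch. The abstract does not state which statistic the authors use, so the z-score of the strongest association against the remaining ones is an assumption made here for illustration, as are the function names and the example scores.

```python
from statistics import mean, stdev


def exceptional_association(scores):
    """Return (index of strongest locus, z-score of that locus).

    `scores` holds one association score per locus for a single
    expression trait (hypothetical input, e.g. LOD-like values).
    The z-score compares the strongest score with the remaining
    scores for the same trait only; this statistic is an assumption,
    not the method from the abstract.
    """
    if len(scores) < 3:
        raise ValueError("need at least three loci per trait")
    best = max(range(len(scores)), key=scores.__getitem__)
    rest = scores[:best] + scores[best + 1:]
    spread = stdev(rest)
    if spread == 0:
        # All other loci are identical: any higher score is exceptional.
        return best, float("inf") if scores[best] > rest[0] else 0.0
    return best, (scores[best] - mean(rest)) / spread


def rank_traits(trait_scores):
    """Rank traits by how exceptional their single strongest association is."""
    ranked = [
        (name, *exceptional_association(scores))
        for name, scores in trait_scores.items()
    ]
    return sorted(ranked, key=lambda item: item[2], reverse=True)
```

With this sketch, a trait whose top score is modest in absolute terms can still rank first if that score stands far apart from the trait's other associations, which is the behaviour the abstract describes for possible genome-wide false negatives.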

    VectorBase: improvements to a bioinformatics resource for invertebrate vector genomics.

    VectorBase (http://www.vectorbase.org) is a NIAID-supported bioinformatics resource for invertebrate vectors of human pathogens. It hosts data for nine genomes: mosquitoes (three Anopheles gambiae genomes, Aedes aegypti and Culex quinquefasciatus), tick (Ixodes scapularis), body louse (Pediculus humanus), kissing bug (Rhodnius prolixus) and tsetse fly (Glossina morsitans). Hosted data range from genomic features and expression data to population genetics and ontologies. We describe improvements and integration of new data that expand our taxonomic coverage. Releases are bi-monthly and include the delivery of preliminary data for emerging genomes. Frequent updates of the genome browser provide VectorBase users with increasing options for visualizing their own high-throughput data. One major development is a new population biology resource for storing genomic variations, insecticide resistance data and their associated metadata. It takes advantage of improved ontologies and controlled vocabularies. Combined, these new features ensure timely release of multiple types of data in the public domain while helping overcome the bottlenecks of bioinformatics and annotation by engaging with our user community.

    Examination of the genetic basis for sexual dimorphism in the Aedes aegypti (dengue vector mosquito) pupal brain

    BACKGROUND: Most animal species exhibit sexually dimorphic behaviors, many of which are linked to reproduction. A number of these behaviors, including blood feeding in female mosquitoes, contribute to the global spread of vector-borne illnesses. However, knowledge concerning the genetic basis of sexually dimorphic traits is limited in any organism, including mosquitoes, especially with respect to differences in the developing nervous system. METHODS: Custom microarrays were used to examine global differences in female vs. male gene expression in the developing pupal head of the dengue vector mosquito, Aedes aegypti. The spatial expression patterns of a subset of differentially expressed transcripts were examined in the developing female vs. male pupal brain through in situ hybridization experiments. Small interfering RNA (siRNA)-mediated knockdown studies were used to assess the putative role of Doublesex, a terminal component of the sex determination pathway, in the regulation of sex-specific gene expression observed in the developing pupal brain. RESULTS: Transcripts (2,527), many of which were linked to proteolysis, the proteasome, metabolism, catabolic, and biosynthetic processes, ion transport, cell growth, and proliferation, were found to be differentially expressed in A. aegypti female vs. male pupal heads. Analysis of the spatial expression patterns for a subset of dimorphically expressed genes in the pupal brain validated the data set and also facilitated the identification of brain regions with dimorphic gene expression. In many cases, dimorphic gene expression localized to the optic lobe. Sex-specific differences in gene expression were also detected in the antennal lobe and mushroom body. 
    siRNA-mediated gene targeting experiments demonstrated that Doublesex, a transcription factor with consensus binding sites located adjacent to many dimorphically expressed transcripts that function in neural development, is required for regulation of sex-specific gene expression in the developing A. aegypti brain. CONCLUSIONS: These studies revealed sex-specific gene expression profiles in the developing A. aegypti pupal head and identified Doublesex as a key regulator of sexually dimorphic gene expression during mosquito neural development.

    High-throughput cis-regulatory element discovery in the vector mosquito Aedes aegypti

    BACKGROUND: Despite substantial progress in mosquito genomic and genetic research, few cis-regulatory elements (CREs), DNA sequences that control gene expression, have been identified in mosquitoes or other non-model insects. Formaldehyde-assisted isolation of regulatory elements paired with DNA sequencing, FAIRE-seq, is emerging as a powerful new high-throughput tool for global CRE discovery. FAIRE results in the preferential recovery of open chromatin DNA fragments that are not bound by nucleosomes, an evolutionarily conserved indicator of regulatory activity, which are then sequenced. Despite the power of the approach, FAIRE-seq has not yet been applied to the study of non-model insects. In this investigation, we utilized FAIRE-seq to profile open chromatin and identify likely regulatory elements throughout the genome of the human disease vector mosquito Aedes aegypti. We then assessed genetic variation in the regulatory elements of dengue virus susceptible (Moyo-S) and refractory (Moyo-R) mosquito strains. RESULTS: Analysis of sequence data obtained through next generation sequencing of FAIRE DNA isolated from A. aegypti embryos revealed >121,000 FAIRE peaks (FPs), many of which clustered in the 1 kb 5' upstream flanking regions of genes known to be expressed at this stage. As expected, known transcription factor consensus binding sites were enriched in the FPs, and of these FoxA1, Hunchback, Gfi, Klf4, MYB/ph3 and Sox9 are most predominant. All of the elements tested in vivo were confirmed to drive gene expression in transgenic Drosophila reporter assays. Of the >13,000 single nucleotide polymorphisms (SNPs) recently identified in dengue virus-susceptible and refractory mosquito strains, 3365 were found to map to FPs. CONCLUSION: FAIRE-seq analysis of open chromatin in A. aegypti permitted genome-wide discovery of CREs. 
    The results of this investigation indicate that FAIRE-seq is a powerful tool for identification of regulatory DNA in the genomes of non-model organisms, including human disease vector mosquitoes.

    Population-level transcriptome sequencing of nonmodel organisms Erynnis propertius and Papilio zelicaon

    BACKGROUND: Several recent studies have demonstrated the use of Roche 454 sequencing technology for de novo transcriptome analysis. Low error rates and high coverage also allow for effective SNP discovery and genetic diversity estimates. However, genetically diverse datasets, such as those sourced from natural populations, pose challenges for assembly programs and subsequent analysis. Further, estimating the effectiveness of transcript discovery using Roche 454 transcriptome data is still a difficult task. RESULTS: Using the Roche 454 FLX Titanium platform, we sequenced and assembled larval transcriptomes for two butterfly species: the Propertius duskywing, Erynnis propertius (Lepidoptera: Hesperiidae) and the Anise swallowtail, Papilio zelicaon (Lepidoptera: Papilionidae). The Expressed Sequence Tags (ESTs) generated represent a diverse sample drawn from multiple populations, developmental stages, and stress treatments. Despite this diversity, >95% of the ESTs assembled into long (>714 bp on average) and highly covered (>9.6× on average) contigs. To estimate the effectiveness of transcript discovery, we compared the number of bases in the hit region of unigenes (contigs and singletons) to the length of the best match silkworm (Bombyx mori) protein; this "ortholog hit ratio" gives a close estimate of the amount of the transcript discovered relative to a model lepidopteran genome. For each species, we tested two assembly programs and two parameter sets; although CAP3 is commonly used for such data, the assemblies produced by Celera Assembler with modified parameters were chosen over those produced by CAP3 based on contig and singleton counts as well as ortholog hit ratio analysis. In the final assemblies, 1,413 E. propertius and 1,940 P. zelicaon unigenes had a ratio >0.8; 2,866 E. propertius and 4,015 P. zelicaon unigenes had a ratio >0.5. CONCLUSIONS: Ultimately, these assemblies and SNP data will be used to generate microarrays for ecoinformatics examining climate change tolerance of different natural populations. These studies will benefit from high quality assemblies with few singletons (less than 26% of bases for each assembled transcriptome are present in unassembled singleton ESTs) and effective transcript discovery (over 6,500 of our putative orthologs cover at least 50% of the corresponding model silkworm gene).
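
The "ortholog hit ratio" described in this abstract compares the hit region of a unigene with the length of its best-match protein. A minimal sketch of that calculation is shown below. The abstract does not spell out the unit conversion, so dividing the nucleotide count by three to compare it with a protein length in amino acids is an assumption, and the function and parameter names are hypothetical.

```python
def ortholog_hit_ratio(hit_region_bases, protein_length_aa):
    """Fraction of the best-match protein covered by the unigene's hit region.

    hit_region_bases:  nucleotides in the aligned (hit) region of the unigene.
    protein_length_aa: length of the best-match protein in amino acids.
    The division by 3 (one codon per amino acid) is an assumption made
    for this sketch; the abstract does not state the conversion.
    """
    if hit_region_bases < 0:
        raise ValueError("hit_region_bases must be non-negative")
    if protein_length_aa <= 0:
        raise ValueError("protein_length_aa must be positive")
    return (hit_region_bases / 3) / protein_length_aa


def count_above(ratios, threshold):
    """Count unigenes whose ratio exceeds a threshold such as 0.5 or 0.8."""
    return sum(1 for ratio in ratios if ratio > threshold)
```

For example, under these assumptions a unigene with a 900-base hit region against a 500-residue protein covers 300 of 500 residues. `count_above` mirrors the way the abstract reports unigene counts at the 0.5 and 0.8 thresholds.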