Global repeat discovery and estimation of genomic copy number in a large, complex genome using a high-throughput 454 sequence survey
Background: Extensive computational and database tools are available to mine genomic and genetic databases for model organisms, but little genomic data is available for many species of ecological or agricultural significance, especially those with large genomes. Genome surveys using conventional sequencing techniques are powerful, particularly for detecting sequences present in many copies per genome. However, these methods are time-consuming and have potential drawbacks. High-throughput 454 sequencing provides an alternative method by which much information can be gained quickly and cheaply from high-coverage surveys of genomic DNA.
Results: We sequenced 78 million base pairs of randomly sheared soybean DNA that passed our quality criteria. Computational analysis of the survey sequences provided global information on the abundant repetitive sequences in soybean. The sequence was used to determine the copy number across regions of large genomic clones or contigs and to discover higher-order structures within satellite repeats. We have created an annotated online database of sequences present in multiple copies in the soybean genome. The low bias of pyrosequencing against repeat sequences is demonstrated by the overall composition of the survey data, which matches well with past estimates of repetitive DNA content obtained by DNA re-association kinetics (Cot analysis).
Conclusion: This approach provides a potential aid to conventional or shotgun genome assembly by allowing rapid assessment of copy number in any clone or clone-end sequence. In addition, we show that partial sequencing can provide access to partial protein-coding sequences.
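The copy-number estimate described above rests on a simple ratio: the read depth observed over a region divided by the depth expected for a single-copy sequence, which is the total number of surveyed bases divided by the genome size. The Python sketch below illustrates only that arithmetic. The function names are hypothetical, reads are assumed to be already aligned to the clone or contig, and the genome size must be supplied by the user because it is not stated in the abstract.

```python
def expected_single_copy_depth(survey_bases: float, genome_size: float) -> float:
    """Average depth expected for a sequence present once per haploid genome."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return survey_bases / genome_size


def estimate_copy_number(aligned_bases: float, region_length: float,
                         survey_bases: float, genome_size: float) -> float:
    """Copy number = observed depth over the region / single-copy depth.

    aligned_bases: total survey bases aligned to the region (assumed input).
    region_length: length of the clone, contig, or window in bp.
    """
    if region_length <= 0:
        raise ValueError("region_length must be positive")
    observed_depth = aligned_bases / region_length
    return observed_depth / expected_single_copy_depth(survey_bases, genome_size)
```

For example, with the 78 Mb survey and an assumed genome size of 1.1 Gb, a 5 kb region receiving 35,000 aligned bases has a depth of 7, which corresponds to about 99 copies.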
Sympatric ecological speciation meets pyrosequencing: sampling the transcriptome of the apple maggot Rhagoletis pomonella
Background: The full power of modern genetics has been applied to the study of speciation in only a small handful of genetic model species, all of which speciated allopatrically. Here we report the first large expressed sequence tag (EST) study of a candidate for ecological sympatric speciation, the apple maggot Rhagoletis pomonella, using massively parallel pyrosequencing on the Roche 454-FLX platform. To maximize transcript diversity, we created and sequenced separate libraries from larvae, pupae, adult heads, and headless adult bodies.
Results: We obtained 239,531 sequences, which assembled into 24,373 contigs. A total of 6810 unique protein-coding genes were identified among the contigs and long singletons, corresponding to 48% of all known Drosophila melanogaster protein-coding genes. Their distribution across GO classes suggests that we have obtained a representative sample of the transcriptome. Among these sequences are many candidates for potential R. pomonella speciation genes (or barrier genes), such as those controlling chemosensory and life-history timing processes. Furthermore, we identified important marker loci, including more than 40,000 single nucleotide polymorphisms (SNPs) and over 100 microsatellites. An initial search for SNPs at which the apple and hawthorn host races differ suggested at least 75 loci warranting further work. We also determined that developmental expression differences remained even after normalization: transcripts expected to show different expression levels between larvae and pupae in D. melanogaster also did so in R. pomonella. Preliminary comparative analysis of transcript presences and absences revealed evidence of gene loss in Drosophila and gain in the higher dipteran clade Schizophora.
Conclusions: These data provide a much-needed resource for exploring mechanisms of divergence in this important model for sympatric ecological speciation. Our description of ESTs from a substantial portion of the R. pomonella transcriptome will facilitate future functional studies of candidate genes for olfaction and diapause-related life-history timing, and will enable large-scale expression studies. Similarly, the identification of new SNP and microsatellite markers will facilitate future population and quantitative genetic studies of divergence between the apple- and hawthorn-infesting host races.
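SNP discovery in pooled EST data, as reported above, generally means looking for contig positions where reads disagree more often than sequencing error alone would explain. The sketch below applies a simple count-threshold rule to a per-position pileup. The thresholds (`min_depth`, `min_minor_count`) and the function name are illustrative assumptions, not the criteria used in the study.

```python
from collections import Counter
from typing import Dict, List, Tuple


def call_snps(pileup: Dict[int, str], min_depth: int = 4,
              min_minor_count: int = 2) -> List[Tuple[int, str, str]]:
    """Return (position, major_allele, minor_allele) for candidate SNPs.

    pileup maps a contig position to the bases observed there across all
    reads, one character per read. Non-ACGT characters (e.g. N) are ignored.
    """
    snps = []
    for pos in sorted(pileup):
        counts = Counter(b for b in pileup[pos].upper() if b in "ACGT")
        depth = sum(counts.values())
        # Skip shallow positions and positions with only one allele.
        if depth < min_depth or len(counts) < 2:
            continue
        (major, _), (minor, minor_n) = counts.most_common(2)
        # Require the second allele in several reads to reduce error calls.
        if minor_n >= min_minor_count:
            snps.append((pos, major, minor))
    return snps
```

Requiring the minor allele in more than one read is a crude guard against single-read errors. Pooled libraries such as the four described here mix individuals, so a real pipeline would also model error rates and paralog collapse.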
“Hit-and-Run” transcription: de novo transcription initiated by a transient bZIP1 “hit” persists after the “run”
BACKGROUND: Dynamic transcriptional regulation is critical for an organism’s response to environmental signals and yet remains elusive to capture. Such transcriptional regulation is mediated by master transcription factors (TFs) that control large gene regulatory networks. Recently, we described a dynamic mode of TF regulation named “hit-and-run”. This model proposes that a master TF can interact transiently with a set of targets, but the transcription of these transient targets continues after the TF dissociates from the target promoter. However, experimental evidence validating active transcription of the transient TF-targets is still lacking. RESULTS: Here, we show that active transcription continues after transient TF-target interactions by tracking de novo synthesis of RNAs made in response to TF nuclear import. To do this, we introduced an affinity-labeled 4-thiouracil (4tU) nucleobase to specifically isolate newly synthesized transcripts following conditional TF nuclear import. Thus, we extended the TARGET system (Transient Assay Reporting Genome-wide Effects of Transcription factors) to include 4tU-labeling and named this new technology TARGET-tU. Our proof-of-principle example is the master TF Basic Leucine Zipper 1 (bZIP1), a central integrator of metabolic signaling in plants. Using TARGET-tU, we captured newly synthesized mRNAs made in response to bZIP1 nuclear import at a time when bZIP1 is no longer detectably bound to its targets. Thus, the analysis of de novo transcriptomics demonstrates that bZIP1 may act as a catalyst TF to initiate a transcriptional complex (“hit”), after which active transcription by RNA polymerase continues without the TF being bound to the gene promoter (“run”). CONCLUSION: Our findings provide experimental proof for active transcription of transient TF-targets supporting a “hit-and-run” mode of action.
This dynamic regulatory model allows a master TF to catalytically propagate rapid and broad transcriptional responses to changes in environment. Thus, the functional read-out of de novo transcripts produced by transient TF-target interactions allowed us to capture new models for genome-wide transcriptional control. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-016-2410-2) contains supplementary material, which is available to authorized users
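The logic that separates transient from stable targets in a TARGET-tU experiment can be expressed as set operations over two gene lists measured at the same time point: genes whose newly synthesized (4tU-labeled) transcripts are induced after TF nuclear import, and genes whose promoters are detectably bound by the TF. This is a hypothetical sketch. The category names and inputs are assumptions, and the upstream steps (4tU enrichment, differential-expression calling, binding detection) are not shown.

```python
from typing import Dict, Set


def classify_targets(de_novo_induced: Set[str], bound: Set[str]) -> Dict[str, Set[str]]:
    """Split TF-responsive genes by whether the TF is still detectably bound.

    de_novo_induced: genes with induced newly synthesized (4tU-labeled)
        transcripts after TF nuclear import (assumed input).
    bound: genes whose promoters show detectable TF binding at that time.
    """
    return {
        # Transcribed de novo and still bound: consistent with stable binding.
        "bound_and_transcribed": de_novo_induced & bound,
        # Transcribed de novo but not bound: candidate "hit-and-run" targets.
        "transient_candidates": de_novo_induced - bound,
        # Bound, but no de novo induction detected at this time point.
        "bound_only": bound - de_novo_induced,
    }
```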
Transient genome-wide interactions of the master transcription factor NLP7 initiate a rapid nitrogen-response cascade
Dynamic reprogramming of gene regulatory networks (GRNs) enables organisms to rapidly respond to environmental perturbation. However, the underlying transient interactions between transcription factors (TFs) and genome-wide targets typically elude biochemical detection. Here, we capture both stable and transient TF-target interactions genome-wide within minutes after controlled TF nuclear import using time-series chromatin immunoprecipitation (ChIP-seq) and/or DNA adenine methyltransferase identification (DamID-seq). The transient TF-target interactions captured uncover the early mode-of-action of NIN-LIKE PROTEIN 7 (NLP7), a master regulator of the nitrogen signaling pathway in plants. These transient NLP7 targets captured in root cells using temporal TF perturbation account for 50% of NLP7-regulated genes not detectably bound by NLP7 in planta. Rapid and transient NLP7 binding activates early nitrogen response TFs, which we validate to amplify the NLP7-initiated transcriptional cascade. Our approaches to capture transient TF-target interactions genome-wide can be applied to validate dynamic GRN models for any pathway or organism of interest. Conventional methods cannot reveal transient interactions between transcription factors (TFs) and their targets. Here, Alvarez et al. capture both stable and transient TF-target interactions by time-series ChIP-seq and/or DamID-seq in a cell-based TF perturbation system and show that NLP7 acts as a master TF that initiates a rapid nitrogen-response cascade.
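Classifying stable versus transient binding from a time series, as described for NLP7, can be sketched as follows: a gene bound at every sampled time point is treated as stable, and a gene bound at some but not all time points as transient. The per-time-point binding calls are assumed to come from an upstream ChIP-seq or DamID-seq peak caller, and the function names are hypothetical. The second function computes the kind of proportion reported above, the share of regulated-but-unbound genes that are captured as transient targets.

```python
from typing import Dict, List, Set, Tuple


def classify_binding(calls: Dict[str, List[bool]]) -> Tuple[Set[str], Set[str]]:
    """Return (stable, transient) gene sets from per-time-point binding calls."""
    stable: Set[str] = set()
    transient: Set[str] = set()
    for gene, series in calls.items():
        if series and all(series):
            stable.add(gene)        # bound at every time point
        elif any(series):
            transient.add(gene)     # bound at some, but not all, time points
    return stable, transient


def fraction_explained(regulated_unbound: Set[str], transient: Set[str]) -> float:
    """Share of regulated genes lacking stable binding that are transient targets."""
    if not regulated_unbound:
        return 0.0
    return len(regulated_unbound & transient) / len(regulated_unbound)
```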
Genomic and small RNA sequencing of Miscanthus × giganteus shows the utility of sorghum as a reference genome sequence for Andropogoneae grasses
Genomic data together with sequencing of tissue specific small RNA libraries reveals insights into the genome content, small RNA repertoire and evolutionary origins of the grass Miscanthus × giganteus
A framework genetic map for Miscanthus sinensis from RNAseq-based markers shows recent tetraploidy
Background: Miscanthus (subtribe Saccharinae, tribe Andropogoneae, family Poaceae) is a genus of temperate perennial C4 grasses whose high biomass production makes it, along with its close relatives sugarcane and sorghum, attractive as a biofuel feedstock. The base chromosome number of Miscanthus (x = 19) is different from that of other Saccharinae and approximately twice that of the related Sorghum bicolor (x = 10), suggesting large-scale duplications may have occurred in recent ancestors of Miscanthus. Owing to the complexity of the Miscanthus genome and the complications of self-incompatibility, a complete genetic map with a high density of markers has not yet been developed.
Results: We used deep transcriptome sequencing (RNAseq) from two M. sinensis accessions to define 1536 single nucleotide variants (SNVs) for a GoldenGate™ genotyping array, and found that simple sequence repeat (SSR) markers defined in sugarcane are often informative in M. sinensis. A total of 658 SNP and 210 SSR markers were validated via segregation in a full-sibling F1 mapping population. Using 221 progeny from this mapping population, we constructed a genetic map for M. sinensis that resolves into 19 linkage groups, the haploid chromosome number expected from cytological evidence. Comparative genomic analysis documents a genome-wide duplication in Miscanthus relative to Sorghum bicolor, with subsequent insertional fusion of a pair of chromosomes. The utility of the map is confirmed by the identification of two paralogous C4-pyruvate, phosphate dikinase (C4-PPDK) loci in Miscanthus, at positions syntenic to the single orthologous gene in Sorghum.
Conclusions: The genus Miscanthus experienced an ancestral tetraploidy and chromosome fusion prior to its diversification, but after its divergence from the closely related sugarcane clade. The recent timing of this tetraploidy complicates discovery and mapping of genetic markers for Miscanthus species, since alleles and fixed differences between paralogs are comparable. These difficulties can be overcome by careful analysis of segregation patterns in a mapping population and genotyping of doubled haploids. The genetic map for Miscanthus will be useful in biological discovery and breeding efforts to improve this emerging biofuel crop, and will also provide a valuable resource for understanding genomic responses to tetraploidy and chromosome fusion.
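Validating markers via segregation in a full-sibling F1 mapping population typically involves testing whether observed genotype counts fit the Mendelian ratio expected for the marker type. The sketch below assumes a marker heterozygous in only one parent, for which a 1:1 ratio is expected among progeny, and applies a chi-square goodness-of-fit test. The function names and the 0.05 significance level are assumptions, and other marker configurations (for example 1:2:1) would need different expected ratios.

```python
# Chi-square critical value for df = 1 at alpha = 0.05 (standard table value).
CHI2_CRIT_DF1_P05 = 3.841


def chi_square_1_to_1(n_a: int, n_b: int) -> float:
    """Goodness-of-fit statistic for an expected 1:1 segregation ratio."""
    total = n_a + n_b
    if total == 0:
        raise ValueError("no scored progeny")
    expected = total / 2
    return ((n_a - expected) ** 2 + (n_b - expected) ** 2) / expected


def fits_1_to_1(n_a: int, n_b: int, critical: float = CHI2_CRIT_DF1_P05) -> bool:
    """True if the counts are not significantly distorted from 1:1."""
    return chi_square_1_to_1(n_a, n_b) < critical
```

With 221 scored progeny, counts of 110 and 111 fit 1:1 easily, whereas a locus where every individual shows the same pattern (as a fixed difference between paralogs would, giving 221 and 0) fails the test.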
A unified nomenclature of NITRATE TRANSPORTER 1/PEPTIDE TRANSPORTER family members in plants
Members of the plant NITRATE TRANSPORTER 1/PEPTIDE TRANSPORTER (NRT1/PTR) family display protein sequence homology with the SLC15/PepT/PTR/POT family of peptide transporters in animals. In comparison to their animal and bacterial counterparts, these plant proteins transport a wide variety of substrates: nitrate, peptides, amino acids, dicarboxylates, glucosinolates, IAA, and ABA. The phylogenetic relationship of the members of the NRT1/PTR family in 31 fully sequenced plant genomes allowed the identification of unambiguous clades, defining eight subfamilies. The phylogenetic tree was used to determine a unified nomenclature of this family named NPF, for NRT1/PTR FAMILY. We propose that the members should be named accordingly: NPFX.Y, where X denotes the subfamily and Y the individual member within the species
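The NPFX.Y scheme is regular enough to parse mechanically. The sketch below validates a name against that pattern and checks that the subfamily falls within the eight subfamilies described. The optional leading letters for a species code (for example "At") are an assumption layered on the NPFX.Y pattern stated in the abstract, and `parse_npf` is a hypothetical helper.

```python
import re
from typing import NamedTuple

# Optional leading letters (assumed species code), then NPF<X>.<Y>.
_NPF_PATTERN = re.compile(r"([A-Za-z]*)NPF(\d+)\.(\d+)")


class NPFName(NamedTuple):
    species: str    # assumed species prefix; "" if absent
    subfamily: int  # X: the subfamily
    member: int     # Y: the individual member within the species


def parse_npf(name: str, n_subfamilies: int = 8) -> NPFName:
    """Parse an NPFX.Y name, rejecting subfamilies outside 1..n_subfamilies."""
    match = _NPF_PATTERN.fullmatch(name.strip())
    if match is None:
        raise ValueError(f"not an NPFX.Y name: {name!r}")
    species = match.group(1)
    subfamily, member = int(match.group(2)), int(match.group(3))
    if not 1 <= subfamily <= n_subfamilies:
        raise ValueError(f"subfamily {subfamily} outside 1..{n_subfamilies}")
    return NPFName(species, subfamily, member)
```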
Rapid Genotyping of Soybean Cultivars Using High Throughput Sequencing
Soybean (Glycine max) breeding involves improving commercially grown varieties by introgressing important agronomic traits from poor-yielding accessions and/or wild relatives of soybean while minimizing the associated yield drag. Molecular markers associated with these traits are instrumental in increasing the efficiency of producing such crosses, and single nucleotide polymorphisms (SNPs) are particularly well suited for this task owing to their high density in non-genic regions and the resulting increased likelihood of finding a marker tightly linked to a given trait. A rapid method to develop SNP markers that can differentiate specific loci between any two parents in soybean is thus highly desirable. In this study we investigate such a protocol for developing SNP markers between multiple soybean accessions and the reference Williams 82 genome. To restrict sampling to a reduced fraction of the genome, reduced representation libraries (RRLs) of genomic DNA were generated by restriction digestion followed by library construction. We chose to sequence four accessions, Dowling (PI 548663), Dwight (PI 597386), Komata (PI 200492), and PI 594538A, for their agronomic importance, as well as Williams 82 as a control.
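Generating reduced representation libraries by restriction digestion can be previewed in silico: cut the reference at every recognition site, then keep only fragments in a chosen size window, so that the same subset of the genome is sampled in every accession. This is a minimal sketch. The abstract does not name the enzyme or size range, so EcoRI (site GAATTC, cut after the first base) and the size window are illustrative assumptions, and the later steps (sequencing, alignment to Williams 82, SNP calling) are not shown. The scan checks one strand only, which is sufficient for a palindromic site such as EcoRI.

```python
from typing import List


def digest(seq: str, site: str, cut_offset: int) -> List[str]:
    """Cut seq at every occurrence of site; cut_offset is the cut position within the site."""
    seq, site = seq.upper(), site.upper()
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    # Consecutive boundaries delimit fragments; skip zero-length pieces.
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


def size_select(fragments: List[str], min_len: int, max_len: int) -> List[str]:
    """Keep only fragments inside the size window (the 'reduced representation')."""
    return [f for f in fragments if min_len <= len(f) <= max_len]


# Hypothetical usage with EcoRI (G^AATTC) and an arbitrary toy size window.
frags = digest("AAGAATTCTTTTGAATTCAA", "GAATTC", 1)
selected = size_select(frags, 5, 10)
```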
Genome composition of Glycine max and sequence diversity among cultivated and exotic accessions
Soybean is an economically important crop in large portions of the world. Incorporation of soybean into the food system in many direct and indirect ways has vastly increased the nutritional quality of low-cost, plant-based diets. Therefore, an enormous amount of effort has gone into increasing the yield and nutritional quality of soybeans through plant breeding over hundreds of years. Despite this economic and nutritional importance, the soybean genome was largely uncharacterized until 2004. The research described here applies novel sequencing technologies to elucidate soybean genome composition as an initial step toward understanding the organization of the genome. Three partially independent studies were performed to examine soybean genome content and diversity. The first study applied 454 pyrosequencing to obtain a low-coverage survey that identified the repeat composition of the genome. The second study compiled data from numerous small RNA sequence datasets to follow small-RNA-level regulation of soybean genes and the maintenance of genomic stability by siRNA-mediated heterochromatization. The third study applied a reduced representation sampling strategy to identify SNP markers in the non-repetitive regions of the genome that can distinguish between soybean accessions. The method developed in this study should be generally applicable to other soybean lines, or even to other crop plants that have a fully sequenced genome. These studies, along with others reported simultaneously and those that will be conducted in the near future, together enhance our understanding of soybean and increase our ability to manipulate this important species to our advantage.
- …
