33 research outputs found

    Discovery of cis-elements between sorghum and rice using co-expression and evolutionary conservation

    BACKGROUND: The spatiotemporal regulation of gene expression largely depends on the presence and absence of cis-regulatory sites in the promoter. In the economically highly important grass family, our knowledge of transcription factor binding sites and transcriptional networks is still very limited. With the completion of the sorghum genome and the available rice genome sequence, comparative promoter analyses now allow genome-scale detection of conserved cis-elements. RESULTS: In this study, we identified thousands of phylogenetic footprints conserved between orthologous rice and sorghum upstream regions that are supported by co-expression information derived from three different rice expression data sets. In a complementary approach, cis-motifs were discovered by their highly conserved co-occurrence in syntenic promoter pairs. Sequence conservation and matches to known plant motifs support our findings. Expression similarities of gene pairs positively correlate with the number of motifs that are shared by gene pairs and corroborate the importance of similar promoter architectures for concerted regulation. This strongly suggests that these motifs function in the regulation of transcript levels in rice and, presumably, also in sorghum. CONCLUSION: Our work provides the first large-scale collection of cis-elements for rice and sorghum and can serve as a paradigm for cis-element analysis through comparative genomics in grasses in general.

    Evaluation and classification of RING-finger domains encoded by the Arabidopsis genome

    BACKGROUND: In computational analysis, the RING-finger domain is one of the most frequently detected domains in the Arabidopsis proteome. In fact, it is more abundant in Arabidopsis than in other eukaryotic genomes. However, computational analysis might classify ambiguous domains of the closely related PHD and LIM motifs as RING domains by mistake. Thus, we set out to define an ordered set of Arabidopsis RING domains by evaluating predicted domains on the basis of recent structural data. RESULTS: Inspection of the proteome with a current InterPro release predicts 446 RING domains. We evaluated each detected domain and as a result eliminated 59 false positives. The remaining 387 domains were grouped by cluster analysis and according to their metal-ligand arrangement. We further defined novel patterns for additional computational analyses of the proteome. They were based on recent structural data that enable discrimination between the related RING, PHD and LIM domains. These patterns allow us to predict with different degrees of certainty whether a particular domain is indeed likely to form a RING finger. CONCLUSIONS: In summary, 387 domains have a significant potential to form a RING-type cross-brace structure. Many of these RING domains overlap with predicted PHD domains; however, the RING domain signature mostly prevails. Thus, the abundance of PHD domains in Arabidopsis has been significantly overestimated. Cluster analysis of the RING domains defines groups of proteins, which frequently show significant similarity outside the RING domain. These groups document a common evolutionary origin of their members and potentially represent genes of overlapping functionality

    START lipid/sterol-binding domains are amplified in plants and are predominantly associated with homeodomain transcription factors

    BACKGROUND: In animals, steroid hormones regulate gene expression by binding to nuclear receptors. Plants lack genes for nuclear receptors, yet genetic evidence from Arabidopsis suggests developmental roles for lipids/sterols analogous to those in animals. In contrast to nuclear receptors, the lipid/sterol-binding StAR-related lipid transfer (START) protein domains are conserved, making them candidates for involvement in both animal and plant lipid/sterol signal transduction. RESULTS: We surveyed putative START domains from the genomes of Arabidopsis, rice, animals, protists and bacteria. START domains are more common in plants than in animals and in plants are primarily found within homeodomain (HD) transcription factors. The largest subfamily of HD-START proteins is characterized by an HD amino-terminal to a plant-specific leucine zipper with an internal loop, whereas in a smaller subfamily the HD precedes a classic leucine zipper. The START domains in plant HD-START proteins are not closely related to those of animals, implying collateral evolution to accommodate organism-specific lipids/sterols. Using crystal structures of mammalian START proteins, we show structural conservation of the mammalian phosphatidylcholine transfer protein (PCTP) START domain in plants, consistent with a common role in lipid transport and metabolism. We also describe putative START-domain proteins from bacteria and unicellular protists. CONCLUSIONS: The majority of START domains in plants belong to a novel class of putative lipid/sterol-binding transcription factors, the HD-START family, which is conserved across the plant kingdom. HD-START proteins are confined to plants, suggesting a mechanism by which lipid/sterol ligands can directly modulate transcription in plants

    Molecular characterisation of the STRUBBELIG-RECEPTOR FAMILY of genes encoding putative leucine-rich repeat receptor-like kinases in Arabidopsis thaliana

    BACKGROUND: Receptor-like kinases are a prominent class of surface receptors that regulate many aspects of the plant life cycle. Despite recent advances, the function of most receptor-like kinases remains elusive. Therefore, it is paramount to investigate these receptors. The task is complicated by the fact that receptor-like kinases belong to a large monophyletic family with many sub-clades. In general, functional analysis of gene family members by reverse genetics is often obscured by several issues, such as redundancy, subtle or difficult-to-detect phenotypes in mutants, or by decision problems regarding suitable biological and biochemical assays. Therefore, in many cases additional strategies have to be employed to allow inference of hypotheses regarding gene function. RESULTS: We approached the function of genes encoding the nine-member STRUBBELIG-RECEPTOR FAMILY (SRF) class of putative leucine-rich repeat receptor-like kinases. Sequence comparisons show overall conservation but also divergence in predicted functional domains among SRF proteins. Interestingly, SRF1 undergoes differential splicing. As a result, SRF1 is predicted to exist in a standard receptor configuration and in a membrane-anchored receptor-like version that lacks most of the intracellular domain. Furthermore, SRF1 is characterised by a high degree of polymorphism between the Ler and Col accessions. Two independent T-DNA-based srf4 mutants showed smaller leaves while 35S::SRF4 plants displayed enlarged leaves. This is in addition to the strubbelig phenotype which has been described before. Additional single and several key double mutant combinations did not reveal obvious mutant phenotypes. Ectopic expression of several SRF genes, using the 35S promoter, resulted in male sterility. To gain possible insights into SRF gene function, we employed a computational analysis of publicly available microarray data. We performed global expression profiling, coexpression analysis, and an analysis of the enrichment of gene ontology terms among coexpressed genes. The bioinformatic analyses raise the possibility that some SRF genes affect different aspects of cell wall biology. The results also indicate that redundancy is a minor aspect of the SRF family. CONCLUSION: The results provide evidence that SRF4 is a positive regulator of leaf size. In addition, they suggest that the SRF family is characterised by functional diversity and that some SRF genes may function in cell wall biology. They also indicate that complementing reverse genetics with bioinformatical data mining of genome-wide expression data aids in inferring hypotheses on possible functions for members of a gene family

    From RNA-seq to large-scale genotyping - genomics resources for rye (Secale cereale L.)

    BACKGROUND: The improvement of agricultural crops with regard to yield, resistance and environmental adaptation is a perpetual challenge for both breeding and research. Exploration of the genetic potential and implementation of genome-based breeding strategies for efficient rye (Secale cereale L.) cultivar improvement have been hampered by the lack of genome sequence information. To overcome this limitation we sequenced the transcriptomes of five winter rye inbred lines using Roche/454 GS FLX technology. RESULTS: More than 2.5 million reads were assembled into 115,400 contigs representing a comprehensive rye expressed sequence tag (EST) resource. From sequence comparisons 5,234 single nucleotide polymorphisms (SNPs) were identified to develop the Rye5K high-throughput SNP genotyping array. Performance of the Rye5K SNP array was investigated by genotyping 59 rye inbred lines including the five lines used for sequencing, and five barley, three wheat, and two triticale accessions. A balanced distribution of allele frequencies ranging from 0.1 to 0.9 was observed. Residual heterozygosity of the rye inbred lines varied from 4.0 to 20.4% with higher average heterozygosity in the pollen compared to the seed parent pool. CONCLUSIONS: The established sequence and molecular marker resources will improve and promote genetic and genomic research as well as genome-based breeding in rye.

    Distribution, functional impact, and origin mechanisms of copy number variation in the barley genome

    BACKGROUND: There is growing evidence for the prevalence of copy number variation (CNV) and its role in phenotypic variation in many eukaryotic species. Here we use array comparative genomic hybridization to explore the extent of this type of structural variation in domesticated barley cultivars and wild barleys. RESULTS: A collection of 14 barley genotypes including eight cultivars and six wild barleys was used for comparative genomic hybridization. CNV affects 14.9% of all the sequences that were assessed. Higher levels of CNV diversity are present in the wild accessions relative to cultivated barley. CNVs are enriched near the ends of all chromosomes except 4H, which exhibits the lowest frequency of CNVs. CNV affects 9.5% of the coding sequences represented on the array and the genes affected by CNV are enriched for sequences annotated as disease-resistance proteins and protein kinases. Sequence-based comparisons of CNV between cultivars Barke and Morex provided evidence that DNA repair mechanisms of double-strand breaks via single-stranded annealing and synthesis-dependent strand annealing play an important role in the origin of CNV in barley. CONCLUSIONS: We present the first catalog of CNVs in a diploid Triticeae species, which opens the door for future genome diversity research in a tribe that comprises the economically important cereal species wheat, barley, and rye. Our findings constitute a valuable resource for the identification of CNV affecting genes of agronomic importance. We also identify potential mechanisms that can generate variation in copy number in plant genomes. This work was financially supported by the following grants: project GABI-BARLEX, German Federal Ministry of Education and Research (BMBF), #0314000 to MP, US, KFXM and NS; Triticeae Coordinated Agricultural Project, USDA-NIFA #2011-68002-30029 to GJM; and Agriculture and Food Research Initiative Plant Genome, Genetics and Breeding Program of USDA’s Cooperative State Research and Extension Service, #2009-65300-05645 to GJM

    De novo 454 sequencing of barcoded BAC pools for comprehensive gene survey and genome analysis in the complex genome of barley

    BACKGROUND: De novo sequencing of a large, complex plant genome such as that of barley (Hordeum vulgare L.) is a major challenge in terms of both experimental feasibility and cost. The emergence and breathtaking progress of next generation sequencing technologies have put this goal into focus, and a clone-based strategy combined with the 454/Roche technology is conceivable. RESULTS: To test the feasibility, we sequenced 91 barcoded, pooled, gene-containing barley BACs using the GS FLX platform and assembled the sequences under iterative change of parameters. The BAC assemblies were characterized by N50 of ~50 kb (N80 ~31 kb, N90 ~21 kb) and a Q40 of 94%. For ~80% of the clones, the best assemblies consisted of fewer than 10 contigs at 24-fold mean sequence coverage. Moreover, we show that gene-containing regions seem to assemble completely and uninterrupted, thus making the approach suitable for detecting complete and positionally anchored genes. By comparing the assemblies of four clones to their complete reference sequences generated by the Sanger method, we evaluated the distribution, quality and representativeness of the 454 sequences as well as the consistency and reliability of the assemblies. CONCLUSION: The described multiplex 454 sequencing of barcoded BACs leads to consensus sequences highly representative of the clones. Assemblies are correct for the majority of contigs. Though the resolution of complex repetitive structures requires additional experimental efforts, our approach paves the way for a clone-based strategy of sequencing the barley genome.
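
The N50, N80 and N90 figures quoted in the abstract above are standard contiguity statistics. As a minimal sketch, assuming the usual definition (the contig length at which the length-sorted contigs first cover the given percentage of the total assembly length), they could be computed as follows. The function name `nxx` and the contig lengths are hypothetical and are not taken from the paper.

```python
def nxx(contig_lengths, percent=50):
    """Return the Nxx statistic for a list of contig lengths.

    Nxx is the length of the contig at which the running total of
    contigs, sorted from longest to shortest, first reaches `percent`
    of the total assembly length (percent=50 gives N50).
    """
    if not contig_lengths:
        raise ValueError("contig_lengths must not be empty")
    if not 0 < percent <= 100:
        raise ValueError("percent must be in the range (0, 100]")

    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= percent * total:
            return length


# Made-up contig lengths in kb, for illustration only.
example_contigs_kb = [80, 70, 50, 40, 30, 20, 10]

for p in (50, 80, 90):
    print(f"N{p} = {nxx(example_contigs_kb, p)} kb")
```

Higher percentages reach further into the short contigs, which is why N80 and N90 are never larger than N50 for the same assembly.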

    Sequencing of BAC pools by different next generation sequencing platforms and strategies

    BACKGROUND: Next generation sequencing of BACs is a viable option for deciphering the sequence of even large and highly repetitive genomes. In order to optimize this strategy, we examined the influence of read length on the quality of Roche/454 sequence assemblies, to what extent Illumina/Solexa mate pairs (MPs) improve the assemblies by scaffolding and whether barcoding of BACs is dispensable. RESULTS: Sequencing four BACs with both FLX and Titanium technologies revealed similar sequencing accuracy, but showed that the longer Titanium reads produce considerably fewer misassemblies and gaps. The 454 assemblies of 96 barcoded BACs were improved by scaffolding 79% of the total contig length with MPs from a non-barcoded library. Assembly of the unmasked 454 sequences without separation by barcodes revealed chimeric contig formation to be a major problem, encompassing 47% of the total contig length. Masking the sequences reduced this fraction to 24%. CONCLUSION: Optimal BAC pool sequencing should be based on the longest available reads, with barcoding essential for a comprehensive assessment of both repetitive and non-repetitive sequence information. When interest is restricted to non-repetitive regions and repeats are masked prior to assembly, barcoding is non-essential. In any case, the assemblies can be improved considerably by scaffolding with non-barcoded BAC pool MPs.