
    Evolution of Stress-Regulated Gene Expression in Duplicate Genes of Arabidopsis thaliana

    Due to the selection pressure imposed by highly variable environmental conditions, stress sensing and regulatory response mechanisms in plants are expected to evolve rapidly. One potential source of innovation in plant stress response mechanisms is gene duplication. In this study, we examined the evolution of stress-regulated gene expression among duplicated genes in the model plant Arabidopsis thaliana. Key to this analysis was reconstructing the putative ancestral stress regulation pattern. By comparing the expression patterns of duplicated genes with the patterns of their ancestors, we found that duplicated genes likely lost and gained stress responses at a rapid rate initially, but that the rate is close to zero when the synonymous substitution rate (a proxy for time) is >∼0.8. When considering duplicated gene pairs, we found that partitioning of putative ancestral stress responses occurred more frequently than parallel retention and loss. Furthermore, the pattern of stress response partitioning was extremely asymmetric. An analysis of putative cis-acting DNA regulatory elements in the promoters of the duplicated stress-regulated genes indicated that the asymmetric partitioning of ancestral stress responses is likely due, at least in part, to differential loss of DNA regulatory elements; the duplicated genes losing most of their stress responses were those that had lost more of the putative cis-acting elements. Finally, duplicate genes that lost most or all of the ancestral responses are more likely to have gained responses to other stresses. Therefore, the retention of duplicates that inherit few or no functions seems to be coupled to neofunctionalization. Taken together, our findings provide new insight into the patterns of evolutionary changes in gene stress responses after duplication and lay the foundation for testing the adaptive significance of stress regulatory changes under highly variable biotic and abiotic environments.

    Evolutionary History and Stress Regulation of Plant Receptor-Like Kinase/Pelle Genes

    Receptor-Like Kinase (RLK)/Pelle genes play roles ranging from growth regulation to defense response, and the dramatic expansion of this family has been postulated to be crucial for plant-specific adaptations. Despite this, little is known about the history of, or the factors that contributed to, the dramatic expansion of this gene family. In this study, we show that expansion coincided with the establishment of land plants and that RLK/Pelle subfamilies were established early in land plant evolution. The RLK/Pelle family expanded at a significantly higher rate than other kinases, due in large part to expansion of a few subfamilies by tandem duplication. Interestingly, these subfamilies tend to have members with known roles in defense response, suggesting that their rapid expansion was likely a consequence of adaptation to fast-evolving pathogens. Arabidopsis (Arabidopsis thaliana) expression data support the importance of RLK/Pelles in biotic stress response. We found that hundreds of RLK/Pelles are up-regulated by biotic stress. Furthermore, stress responsiveness is correlated with the degree of tandem duplication in RLK/Pelle subfamilies. Our findings suggest a link between stress response and tandem duplication and provide an explanation for why a large proportion of the RLK/Pelle gene family is found in tandem repeats. In addition, our findings provide a useful framework for potentially predicting RLK/Pelle stress functions based on knowledge of expansion pattern and duplication mechanism. Finally, we propose that the detection of highly variable molecular patterns associated with specific pathogens/parasites is the main reason for the up-regulation of hundreds of RLK/Pelles under biotic stress.

    Importance of Lineage-Specific Expansion of Plant Tandem Duplicates in the Adaptive Response to Environmental Stimuli

    Plants have substantially higher gene duplication rates compared with most other eukaryotes. These plant gene duplicates are mostly derived from whole genome and/or tandem duplications. Earlier studies have shown that a large number of duplicate genes are retained over a long evolutionary time, and there is a clear functional bias in retention. However, the influence of duplication mechanism, particularly tandem duplication, on duplicate retention has not been thoroughly investigated. We have defined orthologous groups (OGs) between Arabidopsis (Arabidopsis thaliana) and three other land plants to examine the functional bias of retained duplicate genes during vascular plant evolution. Based on analysis of Gene Ontology categories, it is clear that genes in OGs that expanded via tandem duplication tend to be involved in responses to environmental stimuli, while those that expanded via nontandem mechanisms tend to have intracellular regulatory roles. Using Arabidopsis stress expression data, we further demonstrated that tandem duplicates in expanded OGs are significantly enriched in genes that are up-regulated by biotic stress conditions. In addition, tandem duplication of genes in an OG tends to be highly asymmetric. That is, expansion of OGs with tandem genes in one organismal lineage tends to be coupled with losses in the other. This is consistent with the notion that these tandem genes have experienced lineage-specific selection. In contrast, OGs with genes duplicated via nontandem mechanisms tend to experience convergent expansion, in which similar numbers of genes are gained in parallel. Our study demonstrates that the expansion of gene families and the retention of duplicates in plants exhibit substantial functional biases that are strongly influenced by the mechanism of duplication. In particular, genes involved in stress responses have an elevated probability of retention in a single-lineage fashion following tandem duplication, suggesting that these tandem duplicates are likely important for adaptive evolution to rapidly changing environments.

    Evolutionary and Expression Signatures of Pseudogenes in Arabidopsis and Rice

    Pseudogenes (Ψ) are nonfunctional genomic sequences resembling functional genes. Knowledge of Ψs can improve genome annotation and our understanding of genome evolution. However, there has been relatively little systematic study of Ψs in plants. In this study, we characterized the evolution and expression patterns of Ψs in Arabidopsis (Arabidopsis thaliana) and rice (Oryza sativa). In contrast to animal Ψs, many plant Ψs experienced much stronger purifying selection. In addition, plant Ψs experiencing stronger selective constraints tend to be derived from relatively ancient duplicates, suggesting that they were functional for a relatively long time but became Ψs recently. Interestingly, the regions 5′ to the first stops in the Ψs have experienced stronger selective constraints compared with 3′ regions, suggesting that the 5′ regions were functional for a longer period of time after the premature stops appeared. We found that few Ψs have expression evidence, and their expression levels tend to be lower compared with annotated genes. Furthermore, Ψs with expressed sequence tags tend to be derived from relatively recent duplication events, indicating that Ψ expression may be due to insufficient time for complete degeneration of regulatory signals. Finally, larger protein domain families have significantly more Ψs in general. However, while families involved in environmental stress responses have a significant excess of Ψs, transcription factors and receptor-like kinases have lower than expected numbers of Ψs, consistent with their elevated retention rate in plant genomes. Our findings illustrate peculiar properties of plant Ψs, providing additional insight into the evolution of duplicate genes and benefiting future genome annotation.

    Utility and Limitations of Using Gene Expression Data to Identify Functional Associations

    Gene co-expression has been widely used to hypothesize gene function through guilt-by-association. However, it is not clear to what degree co-expression is informative, whether it can be applied to genes involved in different biological processes, and how the type of dataset impacts inferences about gene functions. Here our goal is to assess the utility and limitations of using co-expression as a criterion to recover functional associations between genes. By determining the percentage of gene pairs in a metabolic pathway with significant expression correlation, using Arabidopsis thaliana as an example, we found that many genes in the same pathway do not have similar transcript profiles and that the choice of dataset, annotation quality, gene function, expression similarity measure, and clustering approach significantly impact the ability to recover functional associations between genes. Some datasets are more informative in capturing coordinated expression profiles, and larger datasets are not always better. In addition, to recover the maximum number of known pathways and identify candidate genes with similar functions, it is important to explore rather exhaustively multiple dataset combinations, similarity measures, clustering algorithms, and parameters. Finally, we validated the biological relevance of co-expression cluster memberships with an independent phenomics dataset and found that genes that consistently cluster with leucine degradation genes tend to have similar leucine levels in mutants. This study provides a framework for obtaining gene functional associations by maximizing the information that can be obtained from gene expression datasets.
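The statistic this abstract describes, the percentage of gene pairs in a pathway with significant expression correlation, can be illustrated with a minimal sketch. The function names, the toy expression profiles, and the fixed cutoff of 0.8 are all hypothetical; the abstract does not state how significance was determined, so the threshold here is only a stand-in.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # flat profile: treated as uncorrelated (an assumption)
    return cov / sqrt(vx * vy)


def fraction_correlated_pairs(profiles, threshold):
    """Fraction of gene pairs whose PCC is at or above `threshold`."""
    pairs = list(combinations(profiles.values(), 2))
    if not pairs:
        return 0.0
    hits = sum(1 for x, y in pairs if pearson(x, y) >= threshold)
    return hits / len(pairs)


# Hypothetical pathway: three genes measured in four samples.
pathway = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.5],
    "geneC": [4.0, 3.0, 2.0, 1.0],
}
ec = fraction_correlated_pairs(pathway, threshold=0.8)
```

In this toy pathway only the geneA/geneB pair is positively correlated above the cutoff, so one of the three pairs counts.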

    Impact of datasets on pathway EC percentile.

    (A) Relationship between pathway EC percentiles calculated using the combined stress gene expression dataset and those calculated based on one of the individual stress datasets, abiotic/shoot. (B) Relationship between pathway EC percentiles calculated using the combined light, development, and stress dataset and those calculated based on an individual dataset, stress. In (A) and (B), the dashed line represents y = x, and each dot represents a pathway. (C) Individual datasets and dataset combinations used to determine pathway EC percentiles. *: NASCArray, consisting of all the datasets listed here as well as additional datasets (~700 samples). The columns in (C) correspond to those in (D) and (E). (D) Bar plot of the percentage of high-EC pathways using different expression datasets. (E) Heat map of pathway EC percentiles from 13 gene expression datasets. Dark red: EC percentile ≥ 95. Orange: 75 ≤ EC percentile < 95. Yellow: 50 ≤ EC percentile < 75. Blue: 0 ≤ EC percentile < 50. (F) Histogram of the numbers of datasets leading to high EC values for each pathway. Example pathways are labeled with an arrow.

    Performance of clusters in predicting pathways.

    (A) Histogram of the maximum scores (-log(q)) for over-representation of pathways within clusters. (B) Histogram of the maximum F measures for prediction of pathway membership based on cluster membership. (C) Relationship between precision and recall for clusters. In (A-C), clusters were generated using k-means with k = 100. (D) Heat map of over-representation scores obtained from different individual and combined clustering algorithms (top) and cluster numbers (bottom). Color represents over-representation scores (-log(q)) from 0 to 12. Scores less than 1.3 are indicated by dark blue. Scores more than 1.3 are represented by a spectrum of light blue to red. Pathways in the heat map are sorted based on the number of times that they are over-represented in the clusters, high to low. (E) Bar plot showing, for each pathway, the difference between the overall maximum over-representation score (the highest score from any single cluster) and the over-representation score from clusters generated using k-means, k = 100. (F) Bar plot showing, for each pathway, the difference between the overall maximum F measure (the highest score from any single cluster) and the F measure from clusters generated using k-means, k = 100. (G) Bar plot showing, for each pathway, the difference between the maximum precision (the highest score from any single cluster) and the precision from clusters generated using k-means, k = 100. Arrow: performance values for the leucine degradation pathway.
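For readers unfamiliar with the precision, recall, and F measure terms in this caption, the following sketch shows how they can be computed when cluster membership is used to predict pathway membership. It assumes the balanced F1 form of the F measure, which the caption does not specify; the function name and gene identifiers are hypothetical.

```python
def cluster_scores(cluster_genes, pathway_genes):
    """Precision, recall, and F1 when a cluster is used to predict a pathway."""
    cluster, pathway = set(cluster_genes), set(pathway_genes)
    true_pos = len(cluster & pathway)  # genes in both the cluster and the pathway
    precision = true_pos / len(cluster) if cluster else 0.0
    recall = true_pos / len(pathway) if pathway else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall (assumed form).
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical cluster of four genes and pathway of three genes.
precision, recall, f1 = cluster_scores(
    ["g1", "g2", "g3", "g4"],
    ["g1", "g2", "g5"],
)
```

Taking the maximum of such scores over all clusters, per pathway, would give values comparable in spirit to the "maximum F measure" in panel (B), under the assumptions above.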

    Impact of pathway size and other factors on EC.

    (A) Relationship between ECexp of a pathway and pathway size (the number of genes assigned to a pathway). (B) ECexp value distribution for pathway genes with products that have subcellular location annotations. PM: Plasma membrane. ER: Endoplasmic reticulum. (C) ECexp value distribution for different pathway classes (general pathway categories). (D) Datasets used to determine pathway ECs. A “+” indicates that the dataset in question was used (either individually or in combination) for the analyses depicted by bar graphs in (E) and (F). The columns in (D) correspond to those in (E) and (F). (E) The 95th percentile PCC values (PCC95) in the null distributions for each dataset or combination of datasets. PCC95 of combined datasets (stress fold change and light (L)+stress (S)+development (D) absolute intensity) are labeled in the bar plot. (F) Number of pathways with high EC for each dataset and/or combination of datasets. Green: fold change values were used to calculate ECs. Orange: absolute intensity values were used for calculating ECs.
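The PCC95 value in panel (E) is the 95th percentile of a null distribution of Pearson correlation coefficients. The sketch below shows one way such a cutoff could be estimated, assuming the null is built from randomly drawn gene pairs; the caption does not say how the null distribution was constructed, so that choice, the nearest-rank percentile rule, the function names, and the toy data are all assumptions.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # flat profile: treated as uncorrelated (an assumption)
    return cov / math.sqrt(vx * vy)


def null_pcc_percentile(profiles, n_pairs=2000, percentile=95, seed=0):
    """Nearest-rank percentile of PCC over randomly drawn gene pairs."""
    rng = random.Random(seed)
    genes = sorted(profiles)
    values = sorted(
        pearson(*(profiles[g] for g in rng.sample(genes, 2)))
        for _ in range(n_pairs)
    )
    rank = max(1, math.ceil(percentile / 100 * len(values)))
    return values[rank - 1]


# Hypothetical toy data: 40 genes x 12 samples of Gaussian noise.
data_rng = random.Random(1)
toy = {f"gene{i}": [data_rng.gauss(0, 1) for _ in range(12)] for i in range(40)}
pcc95 = null_pcc_percentile(toy)
pcc50 = null_pcc_percentile(toy, percentile=50)
```

Under this sketch, a gene pair would be called significantly co-expressed when its PCC exceeds `pcc95` for the dataset in question.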

    Relationship between pathway ECs, annotation quality and similarity measures.

    (A) Relationship between the EC calculated for pathway genes that are annotated based on experimental evidence (ECexp) and EC calculated for pathway genes that are annotated only based on computational evidence (ECcomp). The genes used to calculate ECexp and ECcomp do not overlap. Each dot represents one pathway. Dashed line: y = x line. (B) Heatmap of correlations between pathway EC percentiles calculated with: partial correlations estimated with the corpcor method, Spearman’s rank correlation coefficient (Spearman), Pearson Correlation Coefficient (PCC), adjusted and normalized Mutual Information (MI), partial correlation calculated with the partialcorr method, and transformed p-values of Bayesian Network (BN). (C) Percent pathways that have high EC using different similarity measures. (D) Heatmap of pathway EC percentiles calculated using different similarity measures. Color represents EC percentiles. White dotted rectangles: high EC pathways that are specific to one measure.
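Two of the similarity measures compared in panel (B), PCC and Spearman's rank correlation, differ only in that Spearman's coefficient is the Pearson coefficient computed on ranks. The sketch below illustrates that relationship on hypothetical profiles; it is a generic illustration and not the procedure used to produce the figure.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # flat profile: treated as uncorrelated (an assumption)
    return cov / sqrt(vx * vy)


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed profiles."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical profiles with a monotonic but nonlinear relationship.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 4.0, 9.0, 16.0, 25.0]
pcc = pearson(x, y)
rho = spearman(x, y)
```

Because `y` increases monotonically with `x`, the rank-based measure reports a perfect association while PCC is slightly lower, which is one reason the two measures can rank pathways differently.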