Increased Expression and Protein Divergence in Duplicate Genes Is Associated with Morphological Diversification
The differentiation of both gene expression and protein function is thought to be an important mechanism for the functionalization of duplicate genes. However, it has not been addressed whether expression or protein divergence of duplicate genes is greater in those genes that have undergone functionalization than in those that have not. We examined a total of 492 paralogous gene pairs associated with morphological diversification in a plant model organism (Arabidopsis thaliana). Classifying these paralogous gene pairs into high, low, and no morphological diversification groups on the basis of knock-out data, we found that the divergence rates of both gene expression and protein sequences were significantly higher in both the high and low morphological diversification groups than in the no morphological diversification group. These results strongly suggest that the divergence of both expression and protein sequence is an important source of morphological diversification of duplicate genes. Although the two mechanisms are not mutually exclusive, our analysis suggested that changes in expression pattern play the minor role (33%–41%) and that changes in protein sequence play the major role (59%–67%) in morphological diversification. Finally, we examined to what extent duplicate genes are associated with expression or protein divergence exerting morphological diversification at the whole-genome level. Interestingly, duplicate genes randomly chosen from A. thaliana had not experienced expression or protein divergence that resulted in morphological diversification. These results indicate that most duplicate genes have experienced minor functionalization.
Efficient anchoring of alien chromosome segments introgressed into bread wheat by new Leymus racemosus genome-based markers
Background: The tertiary gene pool of bread wheat, to which Leymus racemosus belongs, has remained underutilized because of the limited genomic resources currently available for its constituent species. Continuous enrichment of public databases with useful information on these species is therefore needed to provide insights into their genome structures and to aid successful utilization of their genes in developing improved wheat cultivars for effective management of environmental stresses. Results: We generated de novo DNA and mRNA sequence information for L. racemosus and developed 110 polymorphic PCR-based markers from the data. To complement the PCR markers, DArT-seq genotyping was applied to develop an additional 9990 SNP markers. Approximately 52% of all the markers enabled us to clearly genotype 22 wheat-L. racemosus chromosome introgression lines, and L. racemosus chromosome-specific markers were highly efficient in detailed characterization of the translocation and recombination lines analyzed. Further analysis revealed remarkable transferability of the PCR markers to three other important Triticeae perennial species: L. mollis, Psathyrostachys huashanica and Elymus ciliaris, indicating their suitability for characterizing wheat-alien chromosome introgressions carrying chromosomes of these genomes. Conclusion: The efficiency of the markers in characterizing wheat-L. racemosus chromosome introgression lines demonstrates their reliability, and their high transferability further broadens their scope of application. This is the first report on sequencing and development of markers from the L. racemosus genome and on the application of DArT-seq to develop markers from a perennial wild relative of wheat, marking a paradigm shift from the seeming concentration of the technology on cultivated species. Integration of these markers with appropriate cytogenetic methods would accelerate development and characterization of wheat-alien chromosome introgression lines.
Identification of endogenous small peptides involved in rice immunity through transcriptomics- and proteomics-based screening
Small signalling peptides, generated from larger protein precursors, are important components that orchestrate various plant processes such as development and immune responses. However, small signalling peptides involved in plant immunity remain largely unknown. Here, we developed a pipeline using transcriptomics- and proteomics-based screening to identify putative precursors of small signalling peptides: small secreted proteins (SSPs) in rice that are induced by the rice blast fungus Magnaporthe oryzae and its elicitor, chitin. We identified 236 SSPs, including members of two known small signalling peptide families, namely rapid alkalinization factors and phytosulfokines, as well as many other protein families that are known to be involved in immunity, such as proteinase inhibitors and pathogenesis-related protein families. We also isolated 52 unannotated SSPs, and among them we found one gene, which we named immune response peptide (IRP), that appeared to encode the precursor of a small signalling peptide regulating rice immunity. In rice suspension cells, the expression of IRP was induced by bacterial peptidoglycan and fungal chitin. Overexpression of IRP enhanced the expression of a defence gene, PAL1, and induced the activation of MAPKs in rice suspension cells. Moreover, the IRP protein level increased in suspension cell medium after chitin treatment. Collectively, we established a simple and efficient pipeline to discover SSP candidates that probably play important roles in rice immunity and identified 52 unannotated SSPs that may be useful for further elucidation of rice immunity. Our method can be applied to identify SSPs that are involved not only in immunity but also in other plant functions.
Evolutionary Persistence of Functional Compensation by Duplicate Genes in Arabidopsis
Knocking out a gene from a genome often causes no phenotypic effect. This phenomenon has been explained in part by the existence of duplicate genes. However, mouse knockout data showed that duplicate genes are as essential as singleton genes. Here, we study whether this is also true for knockout data in Arabidopsis. From the knockout data in Arabidopsis thaliana obtained in our study and in the literature, we find that duplicate genes show a significantly lower proportion of knockout effects than singleton genes. Because the persistence of duplicate genes in evolution tends to depend on their phenotypic effect, we compared the ages of duplicate genes whose knockout mutants showed less severe phenotypic effects with those of duplicate genes whose mutants showed more severe effects. Interestingly, the latter group of genes tends to be more anciently duplicated than the former. Moreover, using multiple-gene knockout data, we find that functional compensation by duplicate genes for a more severe phenotypic effect tends to be preserved by natural selection for a longer time than that for a less severe effect. Taken together, we conclude that duplicate genes contribute to genetic robustness mainly by preserving compensation for severe phenotypic effects in A. thaliana.
ARTADE2DB: Improved Statistical Inferences for Arabidopsis Gene Functions and Structure Predictions by Dynamic Structure-Based Dynamic Expression (DSDE) Analyses
Recent advances in technologies for observing high-resolution genomic activities, such as whole-genome tiling arrays and high-throughput sequencers, provide detailed information for understanding genome functions. However, the functions of 50% of known Arabidopsis thaliana genes remain unknown or are annotated only on the basis of static analyses such as protein motifs or similarities. In this paper, we describe dynamic structure-based dynamic expression (DSDE) analysis, which sequentially predicts both structural and functional features of transcripts. We show that DSDE analysis inferred gene functions 12% more precisely than static structure-based dynamic expression (SSDE) analysis or conventional co-expression analysis based on previously determined gene structures of A. thaliana. This result suggests that more precise structural information than the fixed conventional annotated structures is crucial for co-expression analysis in systems biology of transcriptional regulation and dynamics. Our DSDE method, ARabidopsis Tiling-Array-based Detection of Exons version 2 and over-representation analysis (ARTADE2-ORA), precisely predicts each gene structure by combining two statistical analyses: a probe-wise co-expression analysis of multiple transcriptome measurements and a Markov model analysis of genome sequences. ARTADE2-ORA successfully identified the true functions of about 90% of functionally annotated genes, inferred the functions of 98% of functionally unknown genes and predicted 1,489 new gene structures and functions. We developed a database ARTADE2DB that integrates not only the information predicted by ARTADE2-ORA but also annotations and other functional information, such as phenotypes and literature citations, and is expected to contribute to the study of the functional genomics of A. thaliana. URL: http://artade.org
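As a rough illustration of what an over-representation analysis computes, the sketch below scores one functional category with a one-sided hypergeometric test. This is a generic textbook formulation and an assumption on our part: the abstract does not state which statistic ARTADE2-ORA uses, and the function name and toy numbers are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(population, annotated, sample, hits):
    """P(X >= hits) when `sample` genes are drawn without replacement from
    `population` genes, of which `annotated` carry the category of interest."""
    max_hits = min(annotated, sample)
    total = comb(population, sample)
    favourable = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, max_hits + 1)
    )
    return favourable / total


# Toy numbers (hypothetical): 20 genes in total, 5 annotated with the category,
# 4 genes co-expressed with the query gene, 3 of which carry the annotation.
p_value = hypergeom_upper_tail(population=20, annotated=5, sample=4, hits=3)
print(p_value)
```

A small p-value means the category appears among the co-expressed genes more often than chance would suggest, which is the kind of evidence a co-expression-based function inference can draw on.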
Integrative Annotation of 21,037 Human Genes Validated by Full-Length cDNA Clones
The human genome sequence defines our inherent biological potential; the realization of the biology encoded therein requires knowledge of the function of each gene. Currently, our knowledge in this area is still limited. Several lines of investigation have been used to elucidate the structure and function of the genes in the human genome. Even so, gene prediction remains a difficult task, as the varieties of transcripts of a gene may vary to a great extent. We thus performed an exhaustive integrative characterization of 41,118 full-length cDNAs that capture the gene transcripts as complete functional cassettes, providing an unequivocal report of structural and functional diversity at the gene level. Our international collaboration has validated 21,037 human gene candidates by analysis of high-quality full-length cDNA clones through curation using unified criteria. This led to the identification of 5,155 new gene candidates. It also manifested the most reliable way to control the quality of the cDNA clones. We have developed a human gene database, called the H-Invitational Database (H-InvDB; http://www.h-invitational.jp/). It provides the following: integrative annotation of human genes, description of gene structures, details of novel alternative splicing isoforms, non-protein-coding RNAs, functional domains, subcellular localizations, metabolic pathways, predictions of protein three-dimensional structure, mapping of known single nucleotide polymorphisms (SNPs), identification of polymorphic microsatellite repeats within human genes, and comparative results with mouse full-length cDNAs. The H-InvDB analysis has shown that up to 4% of the human genome sequence (National Center for Biotechnology Information build 34 assembly) may contain misassembled or missing regions. We found that 6.5% of the human gene candidates (1,377 loci) did not have a good protein-coding open reading frame, of which 296 loci are strong candidates for non-protein-coding RNA genes. 
In addition, among 72,027 uniquely mapped SNPs and insertions/deletions localized within human genes, 13,215 nonsynonymous SNPs, 315 nonsense SNPs, and 452 indels occurred in coding regions. Together with 25 polymorphic microsatellite repeats present in coding regions, they may alter protein structure, causing phenotypic effects or resulting in disease. The H-InvDB platform represents a substantial contribution to resources needed for the exploration of human biology and pathology.
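The headline figures in this abstract can be cross-checked with a few lines of arithmetic. The sketch below uses only counts quoted in the abstract; the variable names are ours, and the summed count of coding-region variants is derived here rather than stated by the authors.

```python
# Counts quoted in the abstract.
validated_loci = 21_037
loci_without_good_orf = 1_377

# Share of validated gene candidates lacking a good protein-coding ORF.
share_without_orf = loci_without_good_orf / validated_loci
print(f"{share_without_orf:.1%}")  # prints 6.5%

# Coding-region variants listed in the abstract, summed (derived figure).
nonsynonymous_snps = 13_215
nonsense_snps = 315
coding_indels = 452
listed_coding_variants = nonsynonymous_snps + nonsense_snps + coding_indels
print(listed_coding_variants)  # prints 13982
```

The first result agrees with the 6.5% reported for the 1,377 loci.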
Evolutionary features of RNA viruses with special reference to mutation rates and transmission modes
It is known that many kinds of diseases are caused by viruses that have RNA as their genetic material. In general, RNA viruses evolve through evolutionary factors including mutation and selection. Selection on RNA viruses arises mainly from their interaction with the host species, because RNA viruses can survive only as parasites of the host species. Therefore, it is of particular importance to investigate the interactions between RNA viruses and their hosts when studying the evolution of RNA viruses. In this thesis, I focused on the following three features of interaction with the host: 1) modes of viral infection of the host, 2) viral adaptation to a single host, and 3) exchange of genomic regions between RNA viruses and the host. In chapter 1, I first defined a virus as an organism that can survive and grow only in a living cell and that contains a protein coat surrounding a nucleic acid core but has no semipermeable membrane. In addition to this definition, I outlined the taxonomy and evolutionary mechanisms of RNA viruses. In chapter 2, I estimated the rates of synonymous substitution for 46 species of RNA viruses and found a large amount of variation in the rates (differences of three orders of magnitude). On the other hand, because the rate of replication error was constant among the RNA viruses examined, I concluded that the main factor behind the variation in substitution rates was the difference in replication frequency. This is because we can assume that the rate of synonymous substitution is determined by the rate of replication error and the replication frequency. Moreover, I examined relationships between the rates of synonymous substitution and several modes of viral infection of the host, including the transmission modes. The results indicate that the rate of synonymous substitution was strongly related to differences in the mode of viral infection of the host. 
I speculate that this is because the mode of viral infection alters the replication frequency. In chapter 3, using porcine reproductive and respiratory syndrome virus (PRRSV), whose synonymous substitution rate was the highest among the 46 species of RNA viruses, I conducted evolutionary analyses in order to understand the evolutionary process of PRRSV. The virus is a recently emerged pathogen of domesticated swine. Epidemiological data suggest that PRRSV diverged about 15 years ago. To confirm the rapid synonymous substitution rate in PRRSV, I first estimated the divergence time of PRRSV by molecular evolutionary analysis and compared it with that inferred from the epidemiological data. As a result, the divergence time estimated by the evolutionary analysis corresponded well to that estimated from the epidemiological data. This correspondence confirmed the rapid rate in PRRSV. Second, I studied the envelope regions as an important element of viral adaptation to the host. In particular, positively selected sites were detected in the envelope gene by my computer analysis. Interestingly, the sites were located not only in the regions attacked by the host immune system but also in the transmembrane regions, including a signal peptide. The positively selected sites in the transmembrane regions were considered to be irrelevant to escaping the immune system, because no amino acid substitutions were observed in the transmembrane regions of the sequences isolated from piglets that were experimentally infected with PRRSV. In other words, the transmembrane regions and the signal peptide are thought to be specific to a given membrane. Therefore, I think that the positively selected sites in the membrane regions are important not for viral adaptation to the host immune system but for viral attachment to the membrane of the new host cell, because PRRSV emerged recently, as mentioned above. 
In chapter 4, I searched for eukaryotic genomic regions homologous to RNA viruses to find how often the exchange of genomic sequences has occurred between RNA viruses, including retroviruses and non-retroviruses, and 6 eukaryotic genomes: Homo sapiens, Mus musculus, Drosophila melanogaster, Caenorhabditis elegans, Arabidopsis thaliana, and Saccharomyces cerevisiae. The evolutionary origin of the homologous regions was studied by phylogenetic analysis. For the non-retrovirus RNA viruses, I obtained two major results. First, a part of the Borna virus genome (nucleocapsid protein gene) was shown for the first time to be derived from mammalian genomes. Second, the 6 eukaryotic genomes did not have any part of the virus genome. In the case of the retroviruses and the two mammalian species, Homo sapiens and Mus musculus, I obtained four results. First, retrovirus-like regions occupied about 0.1% of each of the whole genomes of the two species. Second, physical maps indicating the locations of the retrovirus-like regions were constructed for both genomes. Third, the retrovirus-like regions were non-randomly distributed in both complete genomes at a significant level (P […] between the GC content of retrovirus-like regions and that of the flanking regions for both species. From these results, I have concluded that retroviruses have been integrated into regions of the host genome whose GC content is similar to their own. The present study will give an insight not only into the evolutionary origin and process of RNA viruses but also into the interacting features between RNA viruses and their hosts.
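The chapter 2 argument in the abstract above rests on a simple product: the synonymous substitution rate is taken to be the replication error rate multiplied by the replication frequency. The sketch below illustrates that relationship only; the function name and every numeric value are hypothetical and are not taken from the thesis.

```python
def synonymous_substitution_rate(error_rate_per_replication, replications_per_year):
    """Substitutions per site per year, assuming
    rate = replication error rate x replication frequency."""
    return error_rate_per_replication * replications_per_year


# Hypothetical values: the same error rate for both viruses,
# but replication frequencies that differ 1000-fold.
ERROR_RATE = 1e-5
slow_virus = synonymous_substitution_rate(ERROR_RATE, 10)
fast_virus = synonymous_substitution_rate(ERROR_RATE, 10_000)

fold_difference = fast_virus / slow_virus
print(fold_difference)  # roughly 1000, i.e. three orders of magnitude
```

Under this assumption, a constant error rate means that any spread in substitution rates has to come from the replication frequency, which is the inference the abstract draws.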
2-19 Rapid growth rate of sulfur-turf microbial mat developed in hot spring water in Nakafusa, Japan