375 research outputs found

C-terminal motif prediction in eukaryotic proteomes using comparative genomics and statistical over-representation across protein families

Author: Austin Ryan S
Cutler Sean R
Provart Nicholas J
Publication venue: BioMed Central
Publication date: 01/01/2007

Abstract
Background: The carboxy termini of proteins are a frequent site of activity for a variety of biologically important functions, ranging from post-translational modification to protein targeting. Several short peptide motifs involved in protein sorting roles and dependent upon their proximity to the C-terminus for proper function have already been characterized. As a limited number of such motifs have been identified, the potential exists for genome-wide statistical analysis and comparative genomics to reveal novel peptide signatures functioning in a C-terminal dependent manner. We have applied a novel methodology to the prediction of C-terminal-anchored peptide motifs involving a simple z-statistic and several techniques for improving the signal-to-noise ratio.
Results: We examined the statistical over-representation of position-specific C-terminal tripeptides in 7 eukaryotic proteomes. Sequence randomization models and simple-sequence masking were applied to the successful reduction of background noise. Similarly, as C-terminal homology among members of large protein families may artificially inflate tripeptide counts in an irrelevant and obfuscating manner, gene-family clustering was performed prior to the analysis in order to assess tripeptide over-representation across protein families as opposed to across all proteins. Finally, comparative genomics was used to identify tripeptides significantly occurring in multiple species. This approach has been able to predict, to our knowledge, all C-terminally anchored targeting motifs present in the literature. These include the PTS1 peroxisomal targeting signal (SKL*), the ER-retention signal (K/HDEL*), the ER-retrieval signal for membrane bound proteins (KKxx*), the prenylation signal (CC*) and the CaaX box prenylation motif. In addition to a high statistical over-representation of these known motifs, a collection of significant tripeptides with a high propensity for biological function exists between species, among kingdoms and across eukaryotes. Motifs of note include a serine-acidic peptide (DSD*) as well as several lysine enriched motifs found in nearly all eukaryotic genomes examined.
Conclusion: We have successfully generated a high confidence representation of eukaryotic motifs anchored at the C-terminus. A high incidence of true-positives in our results suggests that several previously unidentified tripeptide patterns are strong candidates for representing novel peptide motifs of a widely employed nature in the C-terminal biology of eukaryotes. Our application of comparative genomics, statistical over-representation and the adjustment for protein family homology has generated several hypotheses concerning the C-terminal topology as it pertains to sorting and potential protein interaction signals. This approach to background reduction could be expanded for application to protein motif prediction in the protein interior. A parallel N-terminal analysis is presented as supplementary data.
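The over-representation idea in this abstract can be illustrated with a minimal sketch. The abstract only says that a "simple z-statistic" was used; the binomial null model below (residues drawn independently from the pooled background composition) and the function name `cterm_tripeptide_z` are assumptions for illustration, and the sketch does not reproduce the paper's sequence randomization, simple-sequence masking, or gene-family clustering.

```python
import math
from collections import Counter


def cterm_tripeptide_z(sequences, tripeptide):
    """z-score for how often `tripeptide` ends a sequence.

    Assumed null model (not taken from the paper): each C-terminal
    residue is drawn independently from the pooled background
    composition, so the count of matches is binomial(n, p).
    """
    seqs = [s for s in sequences if len(s) >= 3]
    n = len(seqs)
    # Background residue frequencies pooled over all sequences.
    counts = Counter("".join(seqs))
    total = sum(counts.values())
    # Probability of the tripeptide under the independence assumption.
    p = 1.0
    for residue in tripeptide:
        p *= counts[residue] / total
    observed = sum(1 for s in seqs if s.endswith(tripeptide))
    expected = n * p
    sd = math.sqrt(n * p * (1.0 - p))
    return (observed - expected) / sd if sd > 0 else float("nan")


# Toy sequences (invented for illustration, not real proteins).
toy = ["MASKL", "MGGSKL", "MAAAA", "MKTSKL"]
z_skl = cterm_tripeptide_z(toy, "SKL")  # ends three of four sequences
z_ggg = cterm_tripeptide_z(toy, "GGG")  # never occurs at the C-terminus
```

With real proteomes, scores computed this way would still be inflated by homologous family members, which is the problem the abstract's family-level counting is meant to address.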

University of Toronto Research Repository

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Combining classifiers to predict gene function in Arabidopsis thaliana using large-scale gene expression measurements

Author: Bonner Anthony J
Carson Rachel
Lan Hui
Provart Nicholas J
Publication venue: BioMed Central
Publication date: 01/01/2007

Abstract
Background: Arabidopsis thaliana is the model species of current plant genomic research with a genome size of 125 Mb and approximately 28,000 genes. The function of half of these genes is currently unknown. The purpose of this study is to infer gene function in Arabidopsis using machine-learning algorithms applied to large-scale gene expression data sets, with the goal of identifying genes that are potentially involved in plant response to abiotic stress.
Results: Using in house and publicly available data, we assembled a large set of gene expression measurements for A. thaliana. Using those genes of known function, we first evaluated and compared the ability of basic machine-learning algorithms to predict which genes respond to stress. Predictive accuracy was measured using ROC50 and precision curves derived through cross validation. To improve accuracy, we developed a method for combining these classifiers using a weighted-voting scheme. The combined classifier was then trained on genes of known function and applied to genes of unknown function, identifying genes that potentially respond to stress. Visual evidence corroborating the predictions was obtained using electronic Northern analysis. Three of the predicted genes were chosen for biological validation. Gene knockout experiments confirmed that all three are involved in a variety of stress responses. The biological analysis of one of these genes (At1g16850) is presented here, where it is shown to be necessary for the normal response to temperature and NaCl.
Conclusion: Supervised learning methods applied to large-scale gene expression measurements can be used to predict gene function. However, the ability of basic learning methods to predict stress response varies widely and depends heavily on how much dimensionality reduction is used. Our method of combining classifiers can improve the accuracy of such predictions – in this case, predictions of genes involved in stress response in plants – and it effectively chooses the appropriate amount of dimensionality reduction automatically. The method provides a useful means of identifying genes in A. thaliana that potentially respond to stress, and we expect it would be useful in other organisms and for other gene functions.
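The weighted-voting idea mentioned in this abstract can be sketched generically as a weighted average of per-classifier scores. The abstract does not say how the weights were chosen; using each classifier's cross-validated accuracy as its weight, and the function name `weighted_vote`, are assumptions made only for this illustration.

```python
def weighted_vote(scores, weights):
    """Combine per-classifier scores into one weighted average per gene.

    scores  -- list of score lists, one list per classifier, each value
               in [0, 1] (higher means "more likely stress-responsive")
    weights -- one non-negative weight per classifier (hypothetically,
               its cross-validated accuracy)
    """
    if len(scores) != len(weights):
        raise ValueError("need exactly one weight per classifier")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_genes = len(scores[0])
    return [
        sum(w * s[i] for w, s in zip(weights, scores)) / total
        for i in range(n_genes)
    ]


# Two hypothetical classifiers scoring two genes; the first classifier
# is trusted three times as much as the second.
combined = weighted_vote([[0.9, 0.2], [0.5, 0.4]], [3.0, 1.0])
```

Genes would then be ranked by the combined score, so a classifier that performs poorly in cross validation has proportionally less influence on the final ranking.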

University of Toronto Research Repository

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

The role of the Arabidopsis FUSCA3 transcription factor during inhibition of seed germination at high temperature

Author: Chiu Rex S
Gazzarrini Sonia
Nahal Hardeep
Provart Nicholas J
Publication venue: BioMed Central
Publication date: 01/01/2012

Abstract
Background: Imbibed seeds integrate environmental and endogenous signals to break dormancy and initiate growth under optimal conditions. Seed maturation plays an important role in determining the survival of germinating seeds; for example, one of the roles of dormancy is to stagger germination to prevent mass growth under suboptimal conditions. The B3-domain transcription factor FUSCA3 (FUS3) is a master regulator of seed development and an important node in hormonal interaction networks in Arabidopsis thaliana. Its function has been mainly characterized during embryonic development, where FUS3 is highly expressed to promote seed maturation and dormancy by regulating ABA/GA levels.
Results: In this study, we present evidence for a role of FUS3 in delaying seed germination at supraoptimal temperatures that would be lethal for the developing seedlings. During seed imbibition at supraoptimal temperature, the FUS3 promoter is reactivated and induces de novo synthesis of FUS3 mRNA, followed by FUS3 protein accumulation. Genetic analysis shows that FUS3 contributes to the delay of seed germination at high temperature. Unlike WT, seeds overexpressing FUS3 (ML1:FUS3-GFP) during imbibition are hypersensitive to high temperature and do not germinate; however, they can fully germinate after recovery at control temperature, reaching 90% seedling survival. ML1:FUS3-GFP hypersensitivity to high temperature can be partly recovered in the presence of fluridone, an inhibitor of ABA biosynthesis, suggesting this hypersensitivity is due in part to higher ABA level in this mutant. Transcriptomic analysis shows that WT seeds imbibed at supraoptimal temperature activate seed-specific genes and ABA biosynthetic and signaling genes, while inhibiting genes that promote germination and growth, such as GA biosynthetic and signaling genes.
Conclusion: In this study, we have uncovered a novel function for the master regulator of seed maturation, FUS3, in delaying germination at supraoptimal temperature. Physiologically, this is important since delaying germination has a protective role at high temperature. Transcriptomic analysis of seeds imbibed at supraoptimal temperature reveals that a complex program is in place, which involves not only the regulation of heat and dehydration response genes to adjust cellular functions, but also the activation of seed-specific programs and the inhibition of germination-promoting programs to delay germination.

University of Toronto Research Repository

Crossref

Springer - Publisher Connector

PubMed Central

Current status of the multinational Arabidopsis community

Author: Parry Geraint
Provart Nicholas J.
The Multinational Arabidopsis Steering Committee
Wrzaczek Michael
Publication venue
Publication date: 01/07/2020

Publisher Copyright: © 2020 The Authors. Plant Direct published by American Society of Plant Biologists and the Society for Experimental Biology and John Wiley & Sons Ltd
The multinational Arabidopsis research community is highly collaborative and over the past thirty years these activities have been documented by the Multinational Arabidopsis Steering Committee (MASC). Here, we (a) highlight recent research advances made with the reference plant Arabidopsis thaliana; (b) provide summaries from recent reports submitted by MASC subcommittees, projects and resources associated with MASC and from MASC country representatives; and (c) initiate a call for ideas and foci for the “fourth decadal roadmap,” which will advise and coordinate the global activities of the Arabidopsis research community.
Peer reviewed

Helsingin yliopiston digitaalinen arkisto

Current status of the multinational Arabidopsis community

Author: Brady Siobhan
Parry Geraint
Provart Nicholas
Romanowski Andrew
The Multinational Arabidopsis Steering Committee
Uzilday Baris
Publication venue: Wiley
Publication date: 02/08/2020

Edinburgh Research Explorer

NLStradamus: a simple Hidden Markov Model for nuclear localization signal prediction

Author: Alan M Moses
Alex N Nguyen Ba
Anastassia Pogoutse
Nicholas Provart
Publication venue: BioMed Central
Publication date: 01/01/2009

Abstract
Background: Nuclear localization signals (NLSs) are stretches of residues within a protein that are important for the regulated nuclear import of the protein. Of the many import pathways that exist in yeast, the best characterized is termed the 'classical' NLS pathway. The classical NLS contains specific patterns of basic residues and computational methods have been designed to predict the location of these motifs on proteins. The consensus sequences, or patterns, for the other import pathways are less well-understood.
Results: In this paper, we present an analysis of characterized NLSs in yeast, and find, despite the large number of nuclear import pathways, that NLSs seem to show similar patterns of amino acid residues. We test current prediction methods and observe a low true positive rate. We therefore suggest an approach using hidden Markov models (HMMs) to predict novel NLSs in proteins. We show that our method is able to consistently find 37% of the NLSs with a low false positive rate and that our method retains its true positive rate outside of the yeast data set used for the training parameters.
Conclusion: Our implementation of this model, NLStradamus, is made available at: http://www.moseslab.csb.utoronto.ca/NLStradamus/

University of Toronto Research Repository

Springer

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Population Structure and Genetic Diversity in a Rice Core Collection (Oryza sativa L.) Investigated with SSR Markers

Author: Jinquan Li
Nicholas James Provart
Peng Zhang
Xiangdong Liu
Xiaoling Li
Xingjuan Zhao
Yonggen Lu
Publication venue: Public Library of Science
Publication date: 02/12/2011

The assessment of genetic diversity and population structure of a core collection would benefit the use of this germplasm and its application in association mapping. The objectives of this study were to (1) examine the population structure of a rice core collection; (2) investigate the genetic diversity within and among subgroups of the rice core collection; and (3) identify the extent of linkage disequilibrium (LD) of the rice core collection. A rice core collection consisting of 150 varieties, established from 2260 varieties of Ting's collection of rice germplasm, was genotyped with 274 SSR markers and used in this study. Two distinct subgroups (i.e. SG 1 and SG 2) were detected within the entire population by different statistical methods, which is in accordance with the differentiation of indica and japonica rice. MCLUST analysis might be an alternative method to STRUCTURE for population structure analysis. A subset of 26% of the total markers could detect the population structure with precision similar to that of the whole SSR marker set. Gene diversity and MRD between the two subspecies varied considerably across the genome, which might be used to identify candidate genes for the traits under domestication and artificial selection of indica and japonica rice. The percentage of SSR loci pairs in significant (P<0.05) LD is 46.8% in the entire population and the ratio of linked to unlinked loci pairs in LD is 1.06. Across the entire population as well as the subgroups and sub-subgroups, LD decays with genetic distance, indicating that linkage is one main cause of LD. The results of this study would provide valuable information for association mapping using the rice core collection in future.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

An extensive (co-)expression analysis tool for the cytochrome P450 superfamily in Arabidopsis thaliana

Author: Ehlting Jürgen
Ginglinger Jean-François
Olry Alexandre
Provart Nicholas J
Sauveplane Vincent
Werck-Reichhart Danièle
Publication venue: BioMed Central
Publication date: 01/01/2008

Abstract
Background: Sequencing of the first plant genomes has revealed that cytochromes P450 have evolved to become the largest family of enzymes in secondary metabolism. The proportion of P450 enzymes with characterized biochemical function(s) is however very small. If P450 diversification mirrors evolution of chemical diversity, this points to an unexpectedly poor understanding of plant metabolism. We assumed that extensive analysis of gene expression might guide towards the function of P450 enzymes, and highlight overlooked aspects of plant metabolism.
Results: We have created a comprehensive database, 'CYPedia', describing P450 gene expression in four data sets: organs and tissues, stress response, hormone response, and mutants of Arabidopsis thaliana, based on public Affymetrix ATH1 microarray expression data. P450 expression was then combined with the expression of 4,130 re-annotated genes, predicted to act in plant metabolism, for co-expression analyses. Based on the annotation of co-expressed genes from diverse pathway annotation databases, co-expressed pathways were identified. Predictions were validated for most P450s with known functions. As examples, co-expression results for P450s related to plastidial functions/photosynthesis, and to phenylpropanoid, triterpenoid and jasmonate metabolism are highlighted here.
Conclusion: The large scale hypothesis generation tools presented here provide leads to new pathways, unexpected functions, and regulatory networks for many P450s in plant metabolism. These can now be exploited by the community to validate the proposed functions experimentally using reverse genetics, biochemistry, and metabolic profiling.
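As a rough illustration of what a co-expression analysis involves, the sketch below ranks genes by how closely their expression profiles follow that of a target gene. The abstract does not state which similarity measure CYPedia uses; Pearson correlation, the function names, and the toy gene labels are assumptions made for this example only.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # a flat profile has no defined correlation
    return cov / math.sqrt(var_x * var_y)


def rank_coexpressed(target, profiles):
    """Return (gene, r) pairs sorted from most to least correlated."""
    scored = [(gene, pearson(target, values)) for gene, values in profiles.items()]
    scored = [(gene, r) for gene, r in scored if not math.isnan(r)]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical expression values across three conditions.
profiles = {"geneA": [2, 4, 6], "geneB": [3, 2, 1], "geneC": [5, 5, 5]}
ranking = rank_coexpressed([1, 2, 3], profiles)
```

Genes at the top of such a ranking are candidates for acting in the same pathway as the target, which is the kind of hypothesis the abstract describes generating for P450s.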

University of Toronto Research Repository

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Complexity and specificity of the maize (Zea mays L.) root hair transcriptome

Author: Baldauf Jutta
Hey Stefan
Hochholdinger Frank
Lithio Andrew
Nettleton Dan
Opitz Nina
Pasha Asher
Provart Nicholas
Publication venue: Iowa State University Digital Repository
Publication date: 01/04/2017

Root hairs are tubular extensions of epidermis cells. Transcriptome profiling demonstrated that the single cell-type root hair transcriptome was less complex than the transcriptome of multiple cell-type primary roots without root hairs. In total, 831 genes were exclusively and 5585 genes were preferentially expressed in root hairs [false discovery rate (FDR) ≤1%]. Among those, the most significantly enriched Gene Ontology (GO) functional terms were related to energy metabolism, highlighting the high energy demand for the development and function of root hairs. Subsequently, the maize homologs for 138 Arabidopsis genes known to be involved in root hair development were identified and their phylogenetic relationship and expression in root hairs were determined. This study indicated that the genetic regulation of root hair development in Arabidopsis and maize is controlled by common genes, but also shows differences which need to be dissected in future genetic experiments. Finally, a maize root view of the eFP browser was implemented including the root hair transcriptome of the present study and several previously published maize root transcriptome data sets. The eFP browser provides color-coded expression levels for these root types and tissues for any gene of interest, thus providing a novel resource to study gene expression and function in maize roots.

Digital Repository @ Iowa State University (ISU)

Light-responsive expression atlas reveals the effects of light quality and intensity in Kalanchoë fedtschenkoi, a plant with crassulacean acid metabolism.

Author: Borland Anne M
Chen Jin-Gui
Cushman John C
Garcia Travis M
Hu Rongbin
Lipzen Anna
Liu Degao
Muchero Wellington
Ng Vivian
Pasha Asher
Provart Nicholas J
Schmutz Jeremy
Sreedasyam Avinash
Tuskan Gerald A
Wang Mei
Yang Xiaohan
Yerramsetty Pradeep
Zhang Jin
Publication venue: eScholarship, University of California
Publication date: 01/03/2020

Background: Crassulacean acid metabolism (CAM), a specialized mode of photosynthesis, enables plant adaptation to water-limited environments and improves photosynthetic efficiency via an inorganic carbon-concentrating mechanism. Kalanchoë fedtschenkoi is an obligate CAM model featuring a relatively small genome and easy stable transformation. However, the molecular responses to light quality and intensity in CAM plants remain understudied.
Results: Here we present a genome-wide expression atlas of K. fedtschenkoi plants grown under 12 h/12 h photoperiod with different light quality (blue, red, far-red, white light) and intensity (0, 150, 440, and 1,000 μmol m⁻² s⁻¹) based on RNA sequencing performed for mature leaf samples collected at dawn (2 h before the light period) and dusk (2 h before the dark period). An eFP web browser was created for easy access of the gene expression data. Based on the expression atlas, we constructed a light-responsive co-expression network to reveal the potential regulatory relationships in K. fedtschenkoi. Measurements of leaf titratable acidity, soluble sugar, and starch turnover provided metabolic indicators of the magnitude of CAM under the different light treatments and were used to provide biological context for the expression dataset. Furthermore, CAM-related subnetworks were highlighted to showcase genes relevant to CAM pathway, circadian clock, and stomatal movement. In comparison with white light, monochrome blue/red/far-red light treatments repressed the expression of several CAM-related genes at dusk, along with a major reduction in acid accumulation. Increasing light intensity from an intermediate level (440 μmol m⁻² s⁻¹) of white light to a high light treatment (1,000 μmol m⁻² s⁻¹) increased expression of several genes involved in dark CO2 fixation and malate transport at dawn, along with an increase in organic acid accumulation.
Conclusions: This study provides a useful genomics resource for investigating the molecular mechanism underlying the light regulation of physiology and metabolism in CAM plants. Our results support the hypothesis that both light intensity and light quality can modulate the CAM pathway through regulation of CAM-related genes in K. fedtschenkoi.

eScholarship - University of California