
    NLStradamus: a simple Hidden Markov Model for nuclear localization signal prediction

    Background: Nuclear localization signals (NLSs) are stretches of residues within a protein that are important for the regulated nuclear import of the protein. Of the many import pathways that exist in yeast, the best characterized is termed the 'classical' NLS pathway. The classical NLS contains specific patterns of basic residues and computational methods have been designed to predict the location of these motifs on proteins. The consensus sequences, or patterns, for the other import pathways are less well-understood. Results: In this paper, we present an analysis of characterized NLSs in yeast, and find, despite the large number of nuclear import pathways, that NLSs seem to show similar patterns of amino acid residues. We test current prediction methods and observe a low true positive rate. We therefore suggest an approach using hidden Markov models (HMMs) to predict novel NLSs in proteins. We show that our method is able to consistently find 37% of the NLSs with a low false positive rate and that our method retains its true positive rate outside of the yeast data set used for the training parameters. Conclusion: Our implementation of this model, NLStradamus, is made available at: http://www.moseslab.csb.utoronto.ca/NLStradamus/
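
As a rough illustration of the HMM approach described in this abstract, the sketch below decodes a protein sequence with a two-state model (background versus NLS) using the Viterbi algorithm. The state names, the transition and start probabilities, and the emission model (which only distinguishes the basic residues K and R from everything else) are all invented for illustration; they are not the trained NLStradamus parameters, and the published model may be structured differently.

```python
import math

# Hypothetical parameters, chosen only for illustration (not from NLStradamus).
STATES = ("bg", "nls")
START = {"bg": 0.95, "nls": 0.05}
TRANS = {
    "bg": {"bg": 0.95, "nls": 0.05},
    "nls": {"bg": 0.10, "nls": 0.90},
}
BASIC = set("KR")


def emission(state, residue):
    """Toy emission probability: basic residues are enriched in the NLS state."""
    if state == "nls":
        return 0.6 / 2 if residue in BASIC else 0.4 / 18
    return 0.1 / 2 if residue in BASIC else 0.9 / 18


def viterbi(seq):
    """Return the most likely state path for `seq` as a list of state names."""
    if not seq:
        return []
    # Log-space scores for the first residue.
    score = {s: math.log(START[s]) + math.log(emission(s, seq[0])) for s in STATES}
    back = []  # one dict of back-pointers per residue after the first
    for residue in seq[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            # Best predecessor state for landing in state s at this residue.
            prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            new_score[s] = (score[prev] + math.log(TRANS[prev][s])
                            + math.log(emission(s, residue)))
            pointers[s] = prev
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]
```

Calling `viterbi(sequence)` returns one label per residue; contiguous runs of `"nls"` would be the predicted signal regions under these toy parameters.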

    Intuitive Visualization and Analysis of Multi-Omics Data and Application to Escherichia coli Carbon Metabolism

    Combinations of ‘omics’ investigations (i.e., transcriptomic, proteomic, metabolomic and/or fluxomic) are increasingly applied to gain a comprehensive understanding of biological systems. Because the latter are organized as complex networks of molecular and functional interactions, the intuitive interpretation of multi-omics datasets is difficult. Here we describe a simple strategy to visualize and analyze multi-omics data. Graphical representations of complex biological networks can be generated using Cytoscape, where all molecular and functional components can be explicitly represented using a set of dedicated symbols. This representation can be used i) to compile all biologically-relevant information regarding the network through web link association, and ii) to map the network components with multi-omics data. A Cytoscape plugin was developed to increase the possibilities of both multi-omic data representation and interpretation. This plugin allowed different adjustable colour scales to be applied to the various omics data and performed the automatic extraction and visualization of the most significant changes in the datasets. For illustration purposes, the approach was applied to the central carbon metabolism of Escherichia coli. The obtained network contained 774 components and 1232 interactions, highlighting the complexity of bacterial multi-level regulations. The structured representation of this network represents a valuable resource for systemic studies of E. coli, as illustrated from the application to multi-omics data. Some current issues in network representation are discussed on the basis of this work.

    Identifying Cis-Regulatory Sequences by Word Profile Similarity

    Recognizing regulatory sequences in genomes is a continuing challenge, despite a wealth of available genomic data and a growing number of experimentally validated examples. We discuss here a simple approach to search for regulatory sequences based on the compositional similarity of genomic regions and known cis-regulatory sequences. This method, which is not limited to searching for predefined motifs, recovers sequences known to be under similar regulatory control. The words shared by the recovered sequences often correspond to known binding sites. Furthermore, we show that although local word profile clustering is predictive for the regulatory sequences involved in blastoderm segmentation, local dissimilarity is a more universal feature of known regulatory sequences in Drosophila. Our method leverages sequence motifs within a known regulatory sequence to identify co-regulated sequences without explicitly defining binding sites. We also show that regulatory sequences can be distinguished from surrounding sequences by local sequence dissimilarity, a novel feature in identifying regulatory sequences across a genome. Source code for WPH-finder is available for download at http://rana.lbl.gov/downloads/wph.tar.gz
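
To make the idea of compositional (word profile) similarity concrete, here is a minimal sketch that counts overlapping words (k-mers) in two sequences and compares the resulting profiles. The word length `k=3` and the use of cosine similarity are assumptions made for this example; the abstract does not state which word length or similarity measure WPH-finder uses, so this should not be read as that tool's method.

```python
from collections import Counter
import math


def word_profile(seq, k=3):
    """Count every overlapping word of length k in `seq` (case-insensitive)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_similarity(p, q):
    """Cosine similarity between two word-count profiles (0.0 if either is empty)."""
    dot = sum(p[w] * q[w] for w in p.keys() & q.keys())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)
```

A genomic scan along these lines would slide a window across the sequence, build a profile per window, and score each window against the profile of a known regulatory sequence.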

    ePlant: Visualizing and Exploring Multiple Levels of Data for Hypothesis Generation in Plant Biology

    A big challenge in current systems biology research arises when different types of data must be accessed from separate sources and visualized using separate tools. The high cognitive load required to navigate such a workflow is detrimental to hypothesis generation. Accordingly, there is a need for a robust research platform that incorporates all data and provides integrated search, analysis, and visualization features through a single portal. Here, we present ePlant (http://bar.utoronto.ca/eplant), a visual analytic tool for exploring multiple levels of Arabidopsis thaliana data through a zoomable user interface. ePlant connects to several publicly available web services to download genome, proteome, interactome, transcriptome, and 3D molecular structure data for one or more genes or gene products of interest. Data are displayed with a set of visualization tools that are presented using a conceptual hierarchy from big to small, and many of the tools combine information from more than one data type. We describe the development of ePlant in this article and present several examples illustrating its integrative features for hypothesis generation. We also describe the process of deploying ePlant as an “app” on Araport. Building on readily available web services, the code for ePlant is freely available for any other biological species research

    Reading the Second Code: Mapping Epigenomes to Understand Plant Growth, Development, and Adaptation to the Environment

    We have entered a new era in agricultural and biomedical science made possible by remarkable advances in DNA sequencing technologies. The complete sequence of an individual's set of chromosomes (collectively, its genome) provides a primary genetic code for what makes that individual unique, just as the contents of every personal computer reflect the unique attributes of its owner. But a second code, composed of "epigenetic" layers of information, affects the accessibility of the stored information and the execution of specific tasks. Nature's second code is enigmatic and must be deciphered if we are to fully understand and optimize the genetic potential of crop plants. The goal of the Epigenomics of Plants International Consortium is to crack this second code, and ultimately master its control, to help catalyze a new green revolution

    Vision, challenges and opportunities for a Plant Cell Atlas

    With growing populations and pressing environmental problems, future economies will be increasingly plant-based. Now is the time to reimagine plant science as a critical component of fundamental science, agriculture, environmental stewardship, energy, technology and healthcare. This effort requires a conceptual and technological framework to identify and map all cell types, and to comprehensively annotate the localization and organization of molecules at cellular and tissue levels. This framework, called the Plant Cell Atlas (PCA), will be critical for understanding and engineering plant development, physiology and environmental responses. A workshop was convened to discuss the purpose and utility of such an initiative, resulting in a roadmap that acknowledges the current knowledge gaps and technical challenges, and underscores how the PCA initiative can help to overcome them. National Science Foundation 1916797 David W Ehrhardt, Kenneth D Birnbaum, Seung Yon Rhee; National Science Foundation 2052590 Seung Yon Rhe

    The Re-Establishment of Desiccation Tolerance in Germinated Arabidopsis thaliana Seeds and Its Associated Transcriptome

    The combination of robust physiological models with “omics” studies holds promise for the discovery of genes and pathways linked to how organisms deal with drying. Here we used a transcriptomics approach in combination with an in vivo physiological model of re-establishment of desiccation tolerance (DT) in Arabidopsis thaliana seeds. We show that the incubation of desiccation sensitive (DS) germinated Arabidopsis seeds in a polyethylene glycol (PEG) solution re-induces the mechanisms necessary for expression of DT. Based on a SNP-tile array gene expression profile, our data indicate that the re-establishment of DT, in this system, is related to a programmed reversion from a metabolically active state to a quiescent state similar to that prior to germination. Our findings show that transcripts of germinated seeds after the PEG-treatment are dominated by those encoding LEA, seed storage and dormancy related proteins. On the other hand, a massive repression of genes belonging to many other classes such as photosynthesis, cell wall modification and energy metabolism occurs in parallel. Furthermore, comparison with a similar system for Medicago truncatula reveals a significant overlap between the two transcriptomes. Such overlap may highlight core mechanisms and key regulators of the trait DT. Taking into account the availability of the many genetic and molecular resources for Arabidopsis, the described system may prove useful for unraveling DT in higher plants.

    ePlant and the 3D Data Display Initiative: Integrative Systems Biology on the World Wide Web

    Visualization tools for biological data are often limited in their ability to interactively integrate data at multiple scales. These computational tools are also typically limited by two-dimensional displays and programmatic implementations that require separate configurations for each of the user's computing devices and recompilation for functional expansion. Towards overcoming these limitations we have developed “ePlant” (http://bar.utoronto.ca/eplant) – a suite of open-source world wide web-based tools for the visualization of large-scale data sets from the model organism Arabidopsis thaliana. These tools display data spanning multiple biological scales on interactive three-dimensional models. Currently, ePlant consists of the following modules: a sequence conservation explorer that includes homology relationships and single nucleotide polymorphism data, a protein structure model explorer, a molecular interaction network explorer, a gene product subcellular localization explorer, and a gene expression pattern explorer. The ePlant's protein structure explorer module represents experimentally determined and theoretical structures covering >70% of the Arabidopsis proteome. The ePlant framework is accessed entirely through a web browser, and is therefore platform-independent. It can be applied to any model organism. To facilitate the development of three-dimensional displays of biological data on the world wide web we have established the “3D Data Display Initiative” (http://3ddi.org)

    Current status of the multinational Arabidopsis community

    The multinational Arabidopsis research community is highly collaborative and over the past thirty years these activities have been documented by the Multinational Arabidopsis Steering Committee (MASC). Here, we (a) highlight recent research advances made with the reference plant Arabidopsis thaliana; (b) provide summaries from recent reports submitted by MASC subcommittees, projects and resources associated with MASC and from MASC country representatives; and (c) initiate a call for ideas and foci for the “fourth decadal roadmap,” which will advise and coordinate the global activities of the Arabidopsis research community

    Dissection of the Complex Phenotype in Cuticular Mutants of Arabidopsis Reveals a Role of SERRATE as a Mediator

    Mutations in LACERATA (LCR), FIDDLEHEAD (FDH), and BODYGUARD (BDG) cause a complex developmental syndrome that is consistent with an important role for these Arabidopsis genes in cuticle biogenesis. The genesis of their pleiotropic phenotypes is, however, poorly understood. We provide evidence that neither distorted depositions of cutin, nor deficiencies in the chemical composition of cuticular lipids, account for these features, instead suggesting that the mutants alleviate the functional disorder of the cuticle by reinforcing their defenses. To better understand how plants adapt to these mutations, we performed a genome-wide gene expression analysis. We found that apparent compensatory transcriptional responses in these mutants involve the induction of wax, cutin, cell wall, and defense genes. To gain greater insight into the mechanism by which cuticular mutations trigger this response in the plants, we performed an overlap meta-analysis, which is termed MASTA (MicroArray overlap Search Tool and Analysis), of differentially expressed genes. This suggested that different cell integrity pathways are recruited in cesA cellulose synthase and cuticular mutants. Using MASTA for an in silico suppressor/enhancer screen, we identified SERRATE (SE), which encodes a protein of RNA–processing multi-protein complexes, as a likely enhancer. In confirmation of this notion, the se lcr and se bdg double mutants eradicate severe leaf deformations as well as the organ fusions that are typical of lcr and bdg and other cuticular mutants. Also, lcr does not confer resistance to Botrytis cinerea in a se mutant background. We propose that there is a role for SERRATE-mediated RNA signaling in the cuticle integrity pathway
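
The overlap meta-analysis mentioned in this abstract can be pictured, very loosely, as comparing a query list of differentially expressed genes against a collection of reference lists and ranking the references by how many genes they share with the query. The sketch below does only that. The function name, the input layout, and the plain overlap count used as the score are assumptions for illustration; the abstract does not describe how MASTA actually scores or ranks overlaps.

```python
def rank_by_overlap(query, references):
    """Rank reference gene lists by the number of genes shared with `query`.

    `query` is an iterable of gene identifiers; `references` maps a
    (hypothetical) experiment name to its list of differentially expressed
    genes. Returns (name, overlap_size) pairs, largest overlap first.
    """
    query = set(query)
    scored = [(name, len(query & set(genes))) for name, genes in references.items()]
    # sorted() is stable, so ties keep the input order of `references`.
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

In practice one would likely want a statistical measure of enrichment rather than a raw count, since long reference lists overlap more by chance.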