    Christine Vogel.

    <p>Photo courtesy of Christine Vogel.</p>

    Motivation and Outline of the Analysis

    <div><p>(A) The number of genes and eukaryotic complexity are poorly correlated. For 38 eukaryotic genomes, the figure displays the estimated number of different cell types [<a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.0020048#pcbi-0020048-b028" target="_blank">28</a>,<a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.0020048#pcbi-0020048-b029" target="_blank">29</a>] relative to the predicted total number of genes. The tree indicates, in simplified form, the phylogenetic relationships between the organisms, taken from the National Center for Biotechnology Information (NCBI) taxonomy server (<a href="http://www.ncbi.nlm.nih.gov/Taxonomy" target="_blank">http://www.ncbi.nlm.nih.gov/Taxonomy</a>). The order of the organisms is the same in all figures and tables; their major groups are plants (green), protozoa (blue), fungi (black), and animals (red and brown). The correlation between the number of different cell types and the number of genes is poor (<i>R<sup>2</sup></i> = 0.29, <i>R</i> = 0.54).</p><p>Within the plants, we distinguish green algae <i>(Cre, Chlamydomonas reinhardtii)</i> and flowering plants <i>(Osa, O. sativa; Ath, Arabidopsis thaliana).</i> We include eight protozoa <i>(Ddi, Dictyostelium discoideum; Tbr, Trypanosoma brucei; Lma, Leishmania major; Pra, Phytophthora ramorum; Tps, Thalassiosira pseudonana; Ehi, Entamoeba histolytica; Tan, Theileria annulata; Pfa, Plasmodium falciparum)</i> and ten fungi <i>(Ncr, Neurospora crassa; Eni, Emericella nidulans; Spo, Schizosaccharomyces pombe; Sce, S. cerevisiae; Kla, Kluyveromyces lactis; Cal, Candida albicans; Yli, Yarrowia lipolytica; Ecu, Encephalitozoon cuniculi; Pch, Phanerochaete chrysosporium; Uma, Ustilago maydis).</i> Protostomia include two nematodes <i>(Cbr, Caenorhabditis briggsae; Cel, C. elegans)</i> and three insects <i>(Ame, Apis mellifera; Aga, Anopheles gambiae; Dme, D. melanogaster).</i> Deuterostomia include one urochordate <i>(Cin, Ciona intestinalis)</i> and 11 vertebrates: five non-mammalian vertebrates <i>(Dre, Danio rerio; Tni, Tetraodon nigroviridis; Tru, Takifugu rubripes; Xtr, Xenopus tropicalis; Gga, Gallus gallus)</i> and six mammals <i>(Cfa, Canis familiaris; Bta, Bos taurus; Rno, Rattus norvegicus; Mmu, Mus musculus; Ptr, Pan troglodytes; Hsa, H. sapiens).</i></p><p>(B) Outline of our analysis. For each of the 38 genomes (three are shown, symbolised by circles), we collected information on the number of proteins (lines with boxes) that contain domains of particular superfamilies (boxes of particular colour). The resulting abundance profiles were normalised and compared both to the estimated number of different cell types in each organism and to each other. Analysing the functions of particular groups of domain superfamilies indicates how their expansion in some organisms may have supported an increase in organismal complexity.</p></div>
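The R and R² values quoted in panel (A) are related by R² = R × R. A minimal sketch of the computation, assuming a plain Pearson correlation on untransformed values (the caption does not say whether counts were log-transformed). The gene and cell-type numbers below are hypothetical placeholders, not the 38 genomes in the figure.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient R between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squares; the 1/n factors cancel in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical placeholder values, one pair per genome (not data from the figure).
gene_counts = [5000, 6000, 14000, 20000, 25000, 30000]
cell_types = [3, 1, 20, 30, 120, 200]

r = pearson_r(gene_counts, cell_types)
print(f"R = {r:.2f}, R^2 = {r * r:.2f}")

# Consistency check on the caption: R = 0.54 corresponds to R^2 of about 0.29.
print(round(0.54 ** 2, 2))  # 0.29
```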

    Some Family Expansions Correlate Well with the Number of Different Cell Types in Each Organism

    <p>For each of the 1,219 domain superfamilies and its abundance profile across the 38 genomes, we calculated the correlation coefficient <i>R</i> between the profile and the number of different cell types per organism. The distribution of <i>R</i> values is plotted in black. For the subset of largest superfamilies (i.e., those with at least 25 proteins in at least one genome), the distribution of <i>R</i> values is shown in red. Few superfamilies have a high correlation (<i>R</i> ≥ 0.80), and many have a poor correlation or slight anticorrelation (<i>R</i> ≤ 0.20); this distribution is similar for both sets of superfamilies.</p>
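The per-superfamily screen described above can be sketched as follows, assuming each superfamily is represented by a list of per-genome protein counts in the same genome order as the cell-type vector. The function name `summarise_superfamilies` and the toy data are hypothetical. The 25-protein cutoff and the 0.80/0.20 bounds are the values quoted in the caption.

```python
import math


def pearson_r(xs, ys):
    """Pearson R; returns NaN when either profile is constant (R is undefined)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return math.nan
    return cov / math.sqrt(vx * vy)


def summarise_superfamilies(profiles, cell_types, min_proteins=25, high=0.80, low=0.20):
    """Correlate each superfamily profile with cell-type numbers and tally both tails.

    profiles: dict mapping superfamily name -> per-genome protein counts.
    Returns (tally for all superfamilies, tally for the 'largest' subset), where the
    largest subset has at least `min_proteins` proteins in at least one genome.
    """
    r_all = {}
    for name, counts in profiles.items():
        r = pearson_r(counts, cell_types)
        if not math.isnan(r):  # skip profiles for which R is undefined
            r_all[name] = r
    r_largest = {name: r for name, r in r_all.items() if max(profiles[name]) >= min_proteins}

    def tally(rs):
        return {
            "n": len(rs),
            "high": sum(1 for r in rs.values() if r >= high),
            "low": sum(1 for r in rs.values() if r <= low),
        }

    return tally(r_all), tally(r_largest)


# Hypothetical toy input: three superfamilies across three genomes.
toy_profiles = {"SF_A": [1, 2, 30], "SF_B": [30, 2, 1], "SF_C": [5, 5, 6]}
all_sf, largest_sf = summarise_superfamilies(toy_profiles, cell_types=[1, 2, 3])
print(all_sf, largest_sf)
```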

    Domain Superfamilies Show Different Expansion Patterns

    <p>The matrix shows the 299 largest domain superfamilies (those occurring in ≥25 proteins in at least one genome), hierarchically clustered. Each row represents one superfamily. Colour-coded profiles show the normalised abundance of each domain superfamily across the eukaryotic genomes: white, low relative abundance; blue, high relative abundance. Each column represents one genome. All genomes are abbreviated and ordered as in <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.0020048#pcbi-0020048-g001" target="_blank">Figure 1</a>A. Grouping superfamily pairs with <i>R</i> ≥ 0.90 yields 26 clusters; the three largest are indicated by red boxes: expansions in vertebrates (52 superfamilies), expansions in plants (33 superfamilies), and expansions in both vertebrates and plants (26 superfamilies). Further descriptions can be found in <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.0020048#pcbi-0020048-t004" target="_blank">Table 4</a> and at <a href="http://polaris.icmb.utexas.edu/people/cvogel/HV" target="_blank">http://polaris.icmb.utexas.edu/people/cvogel/HV</a>.</p>
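The caption does not spell out how correlated pairs were merged into the 26 clusters. One plausible reading, assumed here, is single-linkage grouping: any two superfamilies whose profiles correlate with R ≥ 0.90 end up in the same cluster, so the clusters are the connected components of the "R ≥ 0.90" graph. A union-find sketch with hypothetical names and toy profiles:

```python
import math
from itertools import combinations


def pearson_r(xs, ys):
    """Pearson R; returns NaN for a constant profile."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return math.nan
    return cov / math.sqrt(vx * vy)


def group_by_correlation(profiles, threshold=0.90):
    """Single-linkage clusters: connected components of pairs with R >= threshold."""
    parent = {name: name for name in profiles}

    def find(x):
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(profiles, 2):
        r = pearson_r(profiles[a], profiles[b])
        if not math.isnan(r) and r >= threshold:
            parent[find(a)] = find(b)  # merge the two components

    clusters = {}
    for name in profiles:
        clusters.setdefault(find(name), []).append(name)
    return sorted(clusters.values(), key=len, reverse=True)


# Hypothetical toy profiles (normalised abundance across four genomes).
toy = {
    "a": [1, 2, 3, 4],
    "b": [2, 4, 6, 8],
    "c": [4, 3, 2, 1],
    "d": [8, 6, 4, 2],
    "e": [1, 3, 1, 3],
}
clusters = group_by_correlation(toy)
print(clusters)
```

Single linkage is the simplest rule that turns a pairwise threshold into clusters; an average- or complete-linkage cut of the hierarchical tree could give different groups.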

    PECA: A Novel Statistical Tool for Deconvoluting Time-Dependent Gene Expression Regulation

    Protein expression varies as a result of intricate regulation of the synthesis and degradation of messenger RNAs (mRNAs) and proteins. Studies of dynamic regulation typically rely on time-course data sets of mRNA and protein expression, yet there are no statistical methods that integrate these multiomics data and deconvolute the individual regulatory processes of gene expression control underlying the observed concentration changes. To address this challenge, we developed Protein Expression Control Analysis (PECA), a method to quantitatively dissect protein expression variation into the contributions of mRNA synthesis/degradation and protein synthesis/degradation, termed RNA-level and protein-level regulation, respectively. PECA computes the rate ratio of synthesis versus degradation as the statistical summary of expression control during a given time interval at each molecular level, and computes the probability that the rate ratio changed between adjacent time intervals, indicating a change in regulation at that time point. Together with the associated false-discovery rates, PECA gives a complete description of dynamic expression control, that is, which proteins were up- or down-regulated at each molecular level and each time point. Using PECA, we analyzed two yeast data sets monitoring the cellular response to hyperosmotic and oxidative stress. The rate ratio profiles reported by PECA revealed large-magnitude RNA-level up-regulation of stress response genes in the early response and concordant protein-level regulation with a time delay. However, the contributions of RNA- and protein-level regulation and their temporal patterns differed between the two data sets. We also observed several cases where protein-level regulation counterbalanced transcriptomic changes in the early stress response to keep protein concentrations stable, suggesting that proteostasis is a proteome-wide phenomenon mediated by post-transcriptional regulation.
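The abstract refers to rate ratios of synthesis to degradation without stating the underlying model. A common choice for such analyses, assumed here and not taken from the PECA implementation, is mass-action kinetics, dY/dt = k_s·X(t) - k_d·Y(t), where X is the mRNA level, Y the protein level, and the rate ratio is k_s/k_d; at steady state Y/X equals that ratio. The sketch also shows one standard way to turn posterior probabilities of change into an estimated false-discovery rate (the mean posterior probability of no change among the called proteins). PECA's actual estimators may differ, and all function names and parameter values are hypothetical.

```python
def simulate_protein(mrna, k_syn, k_deg, y0, dt, steps):
    """Forward-Euler integration of dY/dt = k_syn * X - k_deg * Y for a constant mRNA level X."""
    y = y0
    for _ in range(steps):
        y += dt * (k_syn * mrna - k_deg * y)
    return y


def posterior_fdr(posteriors, threshold):
    """Call a change wherever the posterior probability is >= threshold.

    Returns (number of calls, estimated FDR), with the FDR estimated as the mean
    posterior probability of *no* change among the called items.
    """
    called = [p for p in posteriors if p >= threshold]
    if not called:
        return 0, 0.0
    return len(called), sum(1.0 - p for p in called) / len(called)


# Hypothetical parameters: the protein level relaxes towards (k_syn / k_deg) * X.
mrna_level, k_syn, k_deg = 10.0, 2.0, 0.5
protein = simulate_protein(mrna_level, k_syn, k_deg, y0=0.0, dt=0.1, steps=2000)
print(protein / mrna_level)  # approaches the rate ratio k_syn / k_deg = 4.0

# Hypothetical posterior probabilities of a rate-ratio change at one time point.
n_calls, fdr = posterior_fdr([0.99, 0.95, 0.50, 0.20], threshold=0.90)
print(n_calls, round(fdr, 3))  # 2 0.03
```

Only the ratio k_s/k_d is pinned down by the steady-state level, which is one reason a rate ratio is a convenient summary when absolute rates are not measured.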

    The Overlap between AS and GD Insertions/Deletions

    <p>The overlap between AS and GD indels is very small. For the frequency distribution of the overlap between AS and GD indels, AS indels were taken as the reference. GD data at 80% seq.id. are shown in light violet; GD data at 40% seq.id. are shown in dark blue (all indels) and light blue (short indels only, ≤30 aa). Given the small overlap, AS and GD indels are likely to affect different regions of the protein structure.</p>
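The overlap above is measured with AS indels as the reference. A minimal sketch, assuming each indel is a half-open residue interval [start, end) in a shared protein coordinate system, and that "overlap" means the fraction of an AS indel's residues covered by any GD indel. The caption does not give the exact definition, and the function names and coordinates are hypothetical.

```python
def overlap_fraction(ref, others):
    """Fraction of residues in the reference interval [start, end) covered by any interval in `others`."""
    start, end = ref
    if end <= start:
        raise ValueError("reference interval must be non-empty")
    covered = set()
    for s, e in others:
        # Clip each interval to the reference; range() is empty when there is no overlap.
        covered.update(range(max(start, s), min(end, e)))
    return len(covered) / (end - start)


def overlap_histogram(as_indels, gd_indels, bins=10):
    """Frequency distribution of overlap fractions, one value per AS (reference) indel."""
    counts = [0] * bins
    for ref in as_indels:
        frac = overlap_fraction(ref, gd_indels)
        counts[min(int(frac * bins), bins - 1)] += 1  # put frac == 1.0 in the last bin
    return counts


# Hypothetical coordinates for one protein.
as_toy = [(10, 20), (100, 180), (300, 302)]
gd_toy = [(15, 25), (301, 303)]
print(overlap_histogram(as_toy, gd_toy, bins=2))  # [1, 2]
```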

    The Size Distribution of Insertions/Deletions in AS and GD

    <div><p>All indel analyses were performed on gene families with both AS and GD (i.e., AS+/GD+).</p><p>(A) AS indels are longer than GD indels. Indels for GD were obtained from the alignments of GD families at 40% (dark red) and 80% (light violet) seq.id. Information on AS indels (green) was obtained from the SwissProt record of the corresponding protein. The indel size distributions for GD40 and GD80 are very similar, with most indels shorter than five residues. In contrast, many AS indels are longer than 100 residues.</p><p>(B,C) Size distribution of external and internal indels in AS and GD. External indels (B) lie at the N- or C-terminal ends of the protein; internal indels (C) lie in the middle. AS and GD40 indel sizes differ depending on the position of the indel in the sequence. While AS indels are generally larger than GD indels (also see <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.0030033#pcbi-0030033-g006" target="_blank">Figure 6</a>A), external indels (B) are larger than internal ones (C) for both AS and GD. This shift in indel sizes implies that large indels (as often introduced by AS) are better tolerated at the N- and C-termini of proteins, where they are less likely to induce important structural changes.</p></div>
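The external/internal split in panels (B) and (C) can be expressed directly from indel coordinates. A minimal sketch, assuming each indel is given as a half-open interval [start, end) on the longer sequence of length `protein_length`, and that an indel counts as external when it touches either terminus. The caption does not state the exact boundary rule, and the names and coordinates are hypothetical.

```python
from statistics import median


def classify_indel(start, end, protein_length):
    """'external' if the indel touches the N-terminus (start == 0) or the C-terminus (end == protein_length)."""
    if not 0 <= start < end <= protein_length:
        raise ValueError("expected 0 <= start < end <= protein_length")
    return "external" if start == 0 or end == protein_length else "internal"


def median_sizes(indels):
    """Median indel size (residues) per class; `indels` holds (start, end, protein_length) tuples."""
    sizes = {"external": [], "internal": []}
    for start, end, length in indels:
        sizes[classify_indel(start, end, length)].append(end - start)
    return {cls: (median(vals) if vals else None) for cls, vals in sizes.items()}


# Hypothetical indels on a 300-residue protein.
toy = [(0, 120, 300), (200, 300, 300), (50, 53, 300), (100, 104, 300)]
print(median_sizes(toy))  # {'external': 110.0, 'internal': 3.5}
```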