77 research outputs found

    SHEPHARD: A modular and extensible software architecture for analyzing and annotating large protein datasets

    MOTIVATION: The emergence of high-throughput experiments and high-resolution computational predictions has led to an explosion in the quality and volume of protein sequence annotations at proteomic scales. Unfortunately, sanity checking, integrating, and analyzing complex sequence annotations remains logistically challenging and introduces a major barrier to entry for even superficial integrative bioinformatics. RESULTS: To address this technical burden, we have developed SHEPHARD, a Python framework that trivializes large-scale integrative protein bioinformatics. SHEPHARD combines an object-oriented hierarchical data structure with database-like features, enabling programmatic annotation, integration, and analysis of complex datatypes. Importantly, SHEPHARD is easy to use and enables a Pythonic interrogation of large-scale protein datasets with millions of unique annotations. We use SHEPHARD to examine three orthogonal proteome-wide questions relating protein sequence to molecular function, illustrating its ability to uncover novel biology. AVAILABILITY AND IMPLEMENTATION: We provide SHEPHARD as both a stand-alone software package (https://github.com/holehouse-lab/shephard) and as a Google Colab notebook with a collection of precomputed proteome-wide annotations (https://github.com/holehouse-lab/shephard-colab)

    Clustering of aromatic residues in prion-like domains can tune the formation, state, and organization of biomolecular condensates

    In immature oocytes, Balbiani bodies are conserved membraneless condensates implicated in oocyte polarization, the organization of mitochondria, and long-term organelle and RNA storage.

    Intrinsically disordered regions are poised to act as sensors of cellular chemistry

    Intrinsically disordered proteins and protein regions (IDRs) are abundant in eukaryotic proteomes and play a wide variety of essential roles. Instead of folding into a stable structure, IDRs exist in an ensemble of interconverting conformations whose structure is biased by sequence-dependent interactions. The absence of a stable 3D structure, combined with high solvent accessibility, means that IDR conformational biases are inherently sensitive to changes in their environment. Here, we argue that IDRs are ideally poised to act as sensors and actuators of cellular physicochemistry. We review the physical principles that underlie IDR sensitivity, the molecular mechanisms that translate this sensitivity to function, and recent studies where environmental sensing by IDRs may play a key role in their downstream function

    Direct prediction of intrinsically disordered protein conformational properties from sequence

    Intrinsically disordered regions (IDRs) are ubiquitous across all domains of life and play a range of functional roles. While folded domains are generally well described by a stable three-dimensional structure, IDRs exist in a collection of interconverting states known as an ensemble. This structural heterogeneity means that IDRs are largely absent from the Protein Data Bank, contributing to a lack of computational approaches to predict ensemble conformational properties from sequence. Here we combine rational sequence design, large-scale molecular simulations and deep learning to develop ALBATROSS, a deep-learning model for predicting ensemble dimensions of IDRs, including the radius of gyration, end-to-end distance, polymer-scaling exponent and ensemble asphericity, directly from sequences at a proteome-wide scale. ALBATROSS is lightweight, easy to use and accessible as both a locally installable software package and a point-and-click-style interface via Google Colab notebooks. We first demonstrate the applicability of our predictors by examining the generalizability of sequence-ensemble relationships in IDRs. Then, we leverage the high-throughput nature of ALBATROSS to characterize the sequence-specific biophysical behavior of IDRs within and between proteomes

    Disordered clock protein interactions and charge blocks turn an hourglass into a persistent circadian oscillator

    Organismal physiology is widely regulated by the molecular circadian clock, a feedback loop composed of protein complexes whose members are enriched in intrinsically disordered regions. These regions can mediate protein-protein interactions via SLiMs, but the contribution of these disordered regions to clock protein interactions had not been elucidated. To determine the functionality of these disordered regions, we applied a synthetic peptide microarray approach to the disordered clock protein FRQ in Neurospora crassa. We identified residues required for FRQ's interaction with its partner protein FRH, the mutation of which demonstrated that FRH is necessary for persistent clock oscillations but not for repression of transcriptional activity. Additionally, the microarray demonstrated an enrichment of FRH binding to FRQ peptides with a net positive charge. We found that positively charged residues occurred in significant blocks within the amino acid sequence of FRQ and that ablation of one of these blocks affected both core clock timing and physiological clock output. Finally, we found that positive charge clusters were a commonly shared molecular feature in repressive circadian clock proteins. Overall, our study suggests a mechanistic purpose for positive charge blocks and yields insights into repressive arm protein roles in clock function
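
    The idea of a "positive charge block" can be illustrated with a simple sliding-window scan. The sketch below is not the method used in the study, which the abstract does not describe; it assumes a crude charge model (K and R count as +1, D and E as -1, histidine ignored), and the function names, window size, and threshold are hypothetical choices made only for illustration.

```python
# Crude per-residue charges; an assumption, not taken from the abstract.
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def net_charge(peptide):
    """Sum the crude per-residue charges over a peptide string."""
    return sum(CHARGE.get(residue, 0) for residue in peptide.upper())


def positive_blocks(sequence, window=10, min_charge=4):
    """Return merged half-open (start, end) intervals covered by windows
    whose net charge is at least `min_charge`.

    `window` and `min_charge` are hypothetical defaults.
    """
    blocks = []
    for start in range(len(sequence) - window + 1):
        if net_charge(sequence[start:start + window]) >= min_charge:
            end = start + window
            if blocks and start <= blocks[-1][1]:
                # Overlaps or touches the previous block: extend it.
                blocks[-1] = (blocks[-1][0], end)
            else:
                blocks.append((start, end))
    return blocks


if __name__ == "__main__":
    toy = "GSGSGKRKRKGSGSGDEDEDGSGS"  # made-up sequence, not FRQ
    for start, end in positive_blocks(toy, window=5, min_charge=4):
        print(start, end, toy[start:end])
```

    Merging overlapping windows reports each block once as a single interval, which is easier to compare against a designed deletion than a list of individual windows.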

    Adaptable P body physical states differentially regulate bicoid mRNA storage during early Drosophila development.

    Ribonucleoprotein condensates can exhibit diverse physical states in vitro and in vivo. Despite considerable progress, our understanding of the relevance of condensate physical states for in vivo biological function remains limited. Here, we investigated the physical properties of processing bodies (P bodies) and their impact on mRNA storage in mature Drosophila oocytes. We show that the conserved DEAD-box RNA helicase Me31B forms viscous P body condensates, which adopt an arrested physical state. We demonstrate that structurally distinct proteins and protein-protein interactions, together with RNA, regulate the physical properties of P bodies. Using live imaging and in situ hybridization, we show that the arrested state and integrity of P bodies support the storage of bicoid (bcd) mRNA and that egg activation modulates P body properties, leading to the release of bcd for translation in the early embryo. Together, this work provides an example of how physical states of condensates regulate cellular function in development

    A survey-based analysis of the academic job market

    Many postdoctoral researchers apply for faculty positions knowing relatively little about the hiring process or what is needed to secure a job offer. To address this lack of knowledge about the hiring process we conducted a survey of applicants for faculty positions: the survey ran between May 2018 and May 2019, and received 317 responses. We analyzed the responses to explore the interplay between various scholarly metrics and hiring outcomes. We concluded that, above a certain threshold, the benchmarks traditionally used to measure research success - including funding, number of publications or journals published in - were unable to completely differentiate applicants with and without job offers. Respondents also reported that the hiring process was unnecessarily stressful, time-consuming, and lacking in feedback, irrespective of outcome. Our findings suggest that there is considerable scope to improve the transparency of the hiring process

    ProteomeScout: A repository and analysis resource for post-translational modifications and proteins

    ProteomeScout (https://proteomescout.wustl.edu) is a resource for the study of proteins and their post-translational modifications (PTMs) consisting of a database of PTMs, a repository for experimental data, an analysis suite for PTM experiments, and a tool for visualizing the relationships between complex protein annotations. The PTM database is a compendium of public PTM data, coupled with user-uploaded experimental data. ProteomeScout provides analysis tools for experimental datasets, including summary views and subset selection, which can identify relationships within subsets of data by testing for statistically significant enrichment of protein annotations. Protein annotations are incorporated in the ProteomeScout database from external resources and include terms such as Gene Ontology annotations, domains, secondary structure and non-synonymous polymorphisms. These annotations are available in the database download, in the analysis tools and in the protein viewer. The protein viewer allows for the simultaneous visualization of annotations in an interactive web graphic, which can be exported in Scalable Vector Graphics (SVG) format. Finally, quantitative data measurements associated with public experiments are also easily viewable within protein records, allowing researchers to see how PTMs change across different contexts. ProteomeScout should prove useful for protein researchers and should benefit the proteomics community by providing a stable repository for PTM experiments

    Reproducible Analysis of Post-Translational Modifications in Proteomes—Application to Human Mutations

    BACKGROUND: Protein post-translational modifications (PTMs) are an important aspect of protein regulation. The number of PTMs discovered within the human proteome, and other proteomes, has been rapidly expanding in recent years. As a consequence of the rate at which new PTMs are identified, an analysis done in one year may lead to different conclusions when repeated in subsequent years. Among the various functional questions pertaining to PTMs, one important relationship to address is the interplay between modifications and mutations. Specifically, because the linear sequence surrounding a modification site often determines molecular recognition, it is hypothesized that mutations near sites of PTMs may be more likely to have a detrimental effect on protein function, resulting in the development of disease. METHODS AND RESULTS: We wrote an application programming interface (API) to make analysis of ProteomeScout, a comprehensive database of PTMs and protein information, easy and reproducible. We used this API to analyze the relationship between PTMs and human mutations associated with disease (based on the ‘Clinical Significance’ annotation from dbSNP). Proteins containing pathogenic mutations demonstrated a significant study bias, which was controlled for by analyzing only well-studied proteins, based on their having at least one pathogenic mutation. We found that pathogenic mutations are significantly more likely to lie within eight amino acids of a phosphoserine, phosphotyrosine or ubiquitination site when compared to mutations in general, based on a Fisher’s Exact test. Despite the skew of pathogenic mutations occurring on positively charged arginines, we could not account for this relationship based only on residue type. Finally, we hypothesize a potential mechanism for a pathogenic mutation on RAF1, based on its proximity to a phosphorylation site, which represents a subtle regulation difference that may explain why its biochemical effect has failed to be uncovered previously. The combination of the API and a dynamically expanding PTM database will make the reanalysis of this question and other systems-level questions easier in the future.
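
    The proximity analysis described in this abstract can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the authors' API or code: the function names are hypothetical, the window is assumed to include mutations exactly eight residues away, the comparison is simplified to pathogenic versus non-pathogenic mutations (the abstract compares against mutations in general), and the one-sided Fisher's exact test is computed directly from the hypergeometric distribution.

```python
from math import comb


def near_ptm(mutation_pos, ptm_positions, window=8):
    """True if the mutation lies within `window` residues of any PTM site.

    Treating the boundary as inclusive is an assumption.
    """
    return any(abs(mutation_pos - site) <= window for site in ptm_positions)


def contingency(mutations, ptm_positions, window=8):
    """Build a 2x2 table from (position, is_pathogenic) pairs.

    Returns (a, b, c, d): pathogenic near, pathogenic far,
    non-pathogenic near, non-pathogenic far.
    """
    a = b = c = d = 0
    for position, pathogenic in mutations:
        near = near_ptm(position, ptm_positions, window)
        if pathogenic and near:
            a += 1
        elif pathogenic:
            b += 1
        elif near:
            c += 1
        else:
            d += 1
    return a, b, c, d


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact p-value: probability of observing at least
    `a` in the top-left cell when all row and column totals are fixed."""
    row1, col1, total = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    numerator = sum(
        comb(row1, k) * comb(total - row1, col1 - k)
        for k in range(a, upper + 1)
    )
    return numerator / comb(total, col1)


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    ptm_sites = [10, 50]
    mutations = [(12, True), (3, True), (30, True),
                 (45, False), (70, False), (90, False)]
    table = contingency(mutations, ptm_sites)
    print(table, fisher_exact_greater(*table))
```

    Computing the p-value from `math.comb` keeps the sketch dependency-free; a real analysis would more likely call an established statistics library.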

    Correlation between protein annotations.

    Scatter plots for all comparisons of the number of annotations per protein or the length of the protein. Points in blue represent the number of annotations on a protein that does not contain a pathogenic mutation. Red represents a protein with at least one pathogenic mutation, which becomes the set of proteins studied in subsequent analyses. Correlations between numbers of labels on a per protein basis are given as well as the correlation between annotations on the pathogenic set. All correlations were significant with a p-value less than 1E-08. These plots and correlation values, broken down by PTM type, are available in S1 Fig (http://www.plosone.org/article/info:doi/10.1371/journal.pone.0144692#pone.0144692.s004).