47 research outputs found

    PARROT is a flexible recurrent neural network framework for analysis of large protein datasets

    The rise of high-throughput experiments has transformed how scientists approach biological questions. The ubiquity of large-scale assays that can test thousands of samples in a day has necessitated the development of new computational approaches to interpret this data. Among these tools, machine learning approaches are increasingly being utilized due to their ability to infer complex nonlinear patterns from high-dimensional data. Despite their effectiveness, machine learning (and in particular deep learning) approaches are not always accessible or easy to implement for those with limited computational expertise. Here we present PARROT, a general framework for training and applying deep learning-based predictors on large protein datasets. Using an internal recurrent neural network architecture, PARROT is capable of tackling both classification and regression tasks while only requiring raw protein sequences as input. We showcase the potential uses of PARROT on three diverse machine learning tasks: predicting phosphorylation sites, predicting transcriptional activation function of peptides generated by high-throughput reporter assays, and predicting the fibrillization propensity of amyloid beta with data generated by deep mutational scanning. Through these examples, we demonstrate that PARROT is easy to use, performs comparably to state-of-the-art computational tools, and is applicable for a wide array of biological problems

    SARS-CoV-2 requires cholesterol for viral entry and pathological syncytia formation

    Many enveloped viruses induce multinucleated cells (syncytia), reflective of membrane fusion events caused by the same machinery that underlies viral entry. These syncytia are thought to facilitate replication and evasion of the host immune response. Here, we report that co-culture of human cells expressing the receptor ACE2 with cells expressing SARS-CoV-2 spike results in synapse-like intercellular contacts that initiate cell-cell fusion, producing syncytia resembling those we identify in lungs of COVID-19 patients. To assess the mechanism of spike/ACE2-driven membrane fusion, we developed a microscopy-based, cell-cell fusion assay to screen ~6000 drugs and >30 spike variants. Together with quantitative cell biology approaches, the screen reveals an essential role for biophysical aspects of the membrane, particularly cholesterol-rich regions, in spike-mediated fusion, which extends to replication-competent SARS-CoV-2 isolates. Our findings potentially provide a molecular basis for positive outcomes reported in COVID-19 patients taking statins and suggest new strategies for therapeutics targeting the membrane of SARS-CoV-2 and other fusogenic viruses

    Sequence determinants of in cell condensate morphology, dynamics, and oligomerization as measured by number and brightness analysis

    BACKGROUND: Biomolecular condensates are non-stoichiometric assemblies that are characterized by their capacity to spatially concentrate biomolecules and play a key role in cellular organization. Proteins that drive the formation of biomolecular condensates frequently contain oligomerization domains and intrinsically disordered regions (IDRs), both of which can contribute multivalent interactions that drive higher-order assembly. Our understanding of the relative and temporal contribution of oligomerization domains and IDRs to the material properties of in vivo biomolecular condensates is limited. Similarly, the spatial and temporal dependence of protein oligomeric state inside condensates has been largely unexplored in vivo. METHODS: In this study, we combined quantitative microscopy with number and brightness analysis to investigate the aging, material properties, and protein oligomeric state of biomolecular condensates in vivo. Our work is focused on condensates formed by AUXIN RESPONSE FACTOR 19 (ARF19), a transcription factor integral to the auxin signaling pathway in plants. ARF19 contains a large central glutamine-rich IDR and a C-terminal Phox Bem1 (PB1) oligomerization domain and forms cytoplasmic condensates. RESULTS: Our results reveal that the IDR amino acid composition can influence the morphology and material properties of ARF19 condensates. In contrast, the distribution of oligomeric species within condensates appears insensitive to the IDR composition. In addition, we identified a relationship between the abundance of higher- and lower-order oligomers within individual condensates and their apparent fluidity. CONCLUSIONS: IDR amino acid composition affects condensate morphology and material properties. In ARF condensates, altering the amino acid composition of the IDR did not greatly affect the oligomeric state of proteins within the condensate

    SHEPHARD: A modular and extensible software architecture for analyzing and annotating large protein datasets

    MOTIVATION: The emergence of high-throughput experiments and high-resolution computational predictions has led to an explosion in the quality and volume of protein sequence annotations at proteomic scales. Unfortunately, sanity checking, integrating, and analyzing complex sequence annotations remains logistically challenging and introduces a major barrier to entry for even superficial integrative bioinformatics. RESULTS: To address this technical burden, we have developed SHEPHARD, a Python framework that trivializes large-scale integrative protein bioinformatics. SHEPHARD combines an object-oriented hierarchical data structure with database-like features, enabling programmatic annotation, integration, and analysis of complex datatypes. Importantly, SHEPHARD is easy to use and enables a Pythonic interrogation of large-scale protein datasets with millions of unique annotations. We use SHEPHARD to examine three orthogonal proteome-wide questions relating protein sequence to molecular function, illustrating its ability to uncover novel biology. AVAILABILITY AND IMPLEMENTATION: We provided SHEPHARD as both a stand-alone software package (https://github.com/holehouse-lab/shephard), and as a Google Colab notebook with a collection of precomputed proteome-wide annotations (https://github.com/holehouse-lab/shephard-colab)

    Clustering of aromatic residues in prion-like domains can tune the formation, state, and organization of biomolecular condensates

    In immature oocytes, Balbiani bodies are conserved membraneless condensates implicated in oocyte polarization, the organization of mitochondria, and long-term organelle and RNA storage. I

    RNA-induced conformational switching and clustering of G3BP drive stress granule assembly by condensation

    Stressed cells shut down translation, release mRNA molecules from polysomes, and form stress granules (SGs) via a network of interactions that involve G3BP. Here we focus on the mechanistic underpinnings of SG assembly. We show that, under non-stress conditions, G3BP adopts a compact auto-inhibited state stabilized by electrostatic intramolecular interactions between the intrinsically disordered acidic tracts and the positively charged arginine-rich region. Upon release from polysomes, unfolded mRNAs outcompete G3BP auto-inhibitory interactions, engendering a conformational transition that facilitates clustering of G3BP through protein-RNA interactions. Subsequent physical crosslinking of G3BP clusters drives RNA molecules into networked RNA/protein condensates. We show that G3BP condensates impede RNA entanglement and recruit additional client proteins that promote SG maturation or induce a liquid-to-solid transition that may underlie disease. We propose that condensation coupled to conformational rearrangements and heterotypic multivalent interactions may be a general principle underlying RNP granule assembly

    Unfolded states under folding conditions accommodate sequence-specific conformational preferences with random coil-like dimensions

    Proteins are marginally stable molecules that fluctuate between folded and unfolded states. Here, we provide a high-resolution description of unfolded states under refolding conditions for the N-terminal domain of the L9 protein (NTL9). We use a combination of time-resolved Förster resonance energy transfer (FRET) based on multiple pairs of minimally perturbing labels, time-resolved small-angle X-ray scattering (SAXS), all-atom simulations, and polymer theory. Upon dilution from high denaturant, the unfolded state undergoes rapid contraction. Although this contraction occurs before the folding transition, the unfolded state remains considerably more expanded than the folded state and accommodates a range of local and nonlocal contacts, including secondary structures and native and nonnative interactions. Paradoxically, despite discernible sequence-specific conformational preferences, the ensemble-averaged properties of unfolded states are consistent with those of canonical random coils, namely polymers in indifferent (theta) solvents. These findings are concordant with theoretical predictions based on coarse-grained models and inferences drawn from single-molecule experiments regarding the sequence-specific scaling behavior of unfolded proteins under folding conditions

    Directed mutational scanning reveals a balance between acidic and hydrophobic residues in strong human activation domains

    Acidic activation domains are intrinsically disordered regions of the transcription factors that bind coactivators. The intrinsic disorder and low evolutionary conservation of activation domains have made it difficult to identify the sequence features that control activity. To address this problem, we designed thousands of variants in seven acidic activation domains and measured their activities with a high-throughput assay in human cell culture. We found that strong activation domain activity requires a balance between the number of acidic residues and aromatic and leucine residues. These findings motivated a predictor of acidic activation domains that scans the human proteome for clusters of aromatic and leucine residues embedded in regions of high acidity. This predictor identifies known activation domains and accurately predicts previously unidentified ones. Our results support a flexible acidic exposure model of activation domains in which the acidic residues solubilize hydrophobic motifs so that they can interact with coactivators. A record of this paper's transparent peer review process is included in the supplemental information
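
    The scanning idea in the abstract above (clusters of aromatic and leucine residues embedded in regions of high acidity) can be illustrated with a sliding-window count. This is only a minimal sketch and not the authors' predictor: the function name `scan_for_candidates`, the window length, and both thresholds are hypothetical values chosen for illustration, since the abstract gives no parameters.

```python
from typing import List, Tuple

ACIDIC = set("DE")          # aspartate, glutamate
HYDROPHOBIC = set("WFYL")   # aromatic residues plus leucine


def scan_for_candidates(
    sequence: str,
    window: int = 39,            # hypothetical window length
    min_acidic: int = 8,         # hypothetical acidity threshold
    min_hydrophobic: int = 6,    # hypothetical W/F/Y/L threshold
) -> List[Tuple[int, int]]:
    """Return half-open (start, end) spans whose windows pass both thresholds.

    Overlapping or adjacent passing windows are merged into one span.
    """
    seq = sequence.upper()
    spans: List[Tuple[int, int]] = []
    for start in range(len(seq) - window + 1):
        chunk = seq[start:start + window]
        n_acidic = sum(res in ACIDIC for res in chunk)
        n_hydrophobic = sum(res in HYDROPHOBIC for res in chunk)
        if n_acidic >= min_acidic and n_hydrophobic >= min_hydrophobic:
            end = start + window
            if spans and start <= spans[-1][1]:
                # Extend the previous span instead of adding a new one.
                spans[-1] = (spans[-1][0], end)
            else:
                spans.append((start, end))
    return spans
```

    In practice a function like this would be applied to each protein sequence in a proteome, and the thresholds would need to be calibrated against measured activation domain activities rather than set by hand.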

    Evolutionary fine-tuning of conformational ensembles in FimH during host-pathogen interactions

    Positive selection in the two-domain type 1 pilus adhesin FimH enhances Escherichia coli fitness in urinary tract infection (UTI). We report a comprehensive atomic-level view of FimH in two-state conformational ensembles in solution, composed of one low-affinity tense (T) and multiple high-affinity relaxed (R) conformations. Positively selected residues allosterically modulate the equilibrium between these two conformational states, each of which engages mannose through distinct binding orientations. A FimH variant that only adopts the R state is severely attenuated early in a mouse model of uncomplicated UTI but is proficient at colonizing catheterized bladders in vivo or bladder transitional-like epithelial cells in vitro. Thus, the bladder habitat has barrier(s) to R state–mediated colonization possibly conferred by the terminally differentiated bladder epithelium and/or decoy receptors in urine. Together, our studies reveal the conformational landscape in solution, binding mechanisms, and adhesive strength of an allosteric two-domain adhesin that evolved “moderate” affinity to optimize persistence in the bladder during UTI

    The disordered N-terminal tail of SARS-CoV-2 Nucleocapsid protein forms a dynamic complex with RNA

    The SARS-CoV-2 Nucleocapsid (N) protein is responsible for condensation of the viral genome. Characterizing the mechanisms controlling nucleic acid binding is a key step in understanding how condensation is realized. Here, we focus on the role of the RNA binding domain (RBD) and its flanking disordered N-terminal domain (NTD) tail, using single-molecule Förster Resonance Energy Transfer and coarse-grained simulations. We quantified contact site size and binding affinity for nucleic acids and concomitant conformational changes occurring in the disordered region. We found that the disordered NTD increases the affinity of the RBD for RNA by about 50-fold. Binding of both nonspecific and specific RNA results in a modulation of the tail configurations, which respond in an RNA length-dependent manner. Not only does the disordered NTD increase affinity for RNA, but mutations that occur in the Omicron variant modulate the interactions, indicating a functional role of the disordered tail. Finally, we found that the NTD-RBD preferentially interacts with single-stranded RNA and that the resulting protein:RNA complexes are flexible and dynamic. We speculate that this mechanism of interaction enables the Nucleocapsid protein to search the viral genome for and bind to high-affinity motifs