A Real-Time All-Atom Structural Search Engine for Proteins
<div><p>Protein designers use a wide variety of software tools for <i>de novo</i> design, yet their repertoire still lacks a fast and interactive all-atom search engine. To solve this, we have built the Suns program: a real-time, atomic search engine integrated into the PyMOL molecular visualization system. Users build atomic-level structural search queries within PyMOL and receive a stream of search results aligned to their query within a few seconds. This instant feedback cycle enables a new “designability”-inspired approach to protein design where the designer searches for and interactively incorporates native-like fragments from proven protein structures. We demonstrate the use of Suns to interactively build protein motifs and tertiary interactions and to identify scaffolds compatible with hot-spot residues. The official web site and installer are located at <a href="http://www.degradolab.org/suns/" target="_blank">http://www.degradolab.org/suns/</a> and the source code is hosted at <a href="https://github.com/godotgildor/Suns" target="_blank">https://github.com/godotgildor/Suns</a> (PyMOL plugin, BSD license), <a href="https://github.com/Gabriel439/suns-cmd" target="_blank">https://github.com/Gabriel439/suns-cmd</a> (command line client, BSD license), and <a href="https://github.com/Gabriel439/suns-search" target="_blank">https://github.com/Gabriel439/suns-search</a> (search engine server, GPLv2 license).</p></div>
Throughput benchmarks.
<p>We compare throughput of search queries for both Suns and Erebus, defined as query time divided by number of models in the data set. Suns throughput is measured against a locally hosted server and the Erebus throughput data is taken from <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1003750#pcbi.1003750-Shirvanyants1" target="_blank">[12]</a>. Detailed query information, including the query size in atoms and the number of matches, is provided in <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1003750#pcbi.1003750.s003" target="_blank">Table S3</a> and the specific query PDB files are included in the benchmark suite of suns-cmd (<a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1003750#pcbi.1003750.s006" target="_blank">Software S2</a>).</p>
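<p>The throughput figure defined above is a simple ratio, and a minimal sketch of it may help when reading the benchmark table. The function name and the numbers below are hypothetical and chosen only for illustration; they are not measurements from the Suns or Erebus benchmarks.</p>

```python
def time_per_model(query_seconds: float, n_models: int) -> float:
    """Return query time divided by the number of models in the data set.

    This follows the definition given in the caption, so the unit is
    seconds per model and a smaller value means a faster search.
    """
    if n_models <= 0:
        raise ValueError("n_models must be positive")
    return query_seconds / n_models


# Hypothetical inputs for illustration only (not benchmark data).
example = time_per_model(query_seconds=2.0, n_models=100_000)
print(f"{example:.2e} seconds per model")
```

<p>Normalizing by data set size in this way is what lets two engines that index different numbers of models be compared on one axis.</p>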
Searching for calcium binding sites.
<p>(A) Two side chains of the EF-hand of calmodulin suffice to find matching motifs. The search query (black dashes) consists exclusively of two aspartate side chains (D20 and D24) and does not include the calcium ligand. (B) Searching for these two side chains at 0.7 Å resolution returns seven results, all of which are EF-hand motifs. Six of these motifs coordinate a matching calcium ion (green sphere), and the seventh motif coordinates a sodium ion (purple sphere).</p>
Overview of Suns algorithm and architecture.
<p>(Inputs) The search index is built from two inputs: a set of words to recognize and a set of protein structures to search, subdivided into pages. (Index) The two underlying data structures are a forward index that translates words to matching pages and a database of every page, which translates matched words to atoms within each page. (Server) Each request to the server is broken into three steps: consult the forward index to find potentially matching pages, filter matching pages by RMSD to the query, and align successful matches to the query. (Queue) A message queue forwards requests from clients to servers, and forwards responses from servers to clients. (Clients) Suns provides two client interfaces: a PyMOL search plugin and the suns-cmd command line interface.</p>
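<p>The three server steps described above can be illustrated with a toy sketch. This is not the Suns implementation: the page contents, word names, and function names are all hypothetical, and the superposition is translation-only (centroids are matched, no rotation is fitted), whereas a real structural search would also optimize rotation, for example with the Kabsch algorithm. The message queue and clients are left out.</p>

```python
import math
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Coord = Tuple[float, float, float]
Page = Dict[str, List[Coord]]  # word -> coordinates of the atoms it matched

# Hypothetical toy pages (assumption: two atoms per word, made-up coordinates).
PAGES: Dict[str, Page] = {
    "page_a": {"carboxylate": [(0.0, 0.0, 0.0), (1.2, 0.0, 0.0)]},
    "page_b": {"carboxylate": [(5.0, 5.0, 5.0), (6.2, 5.0, 5.1)]},
    "page_c": {"guanidinium": [(0.0, 0.0, 0.0), (1.3, 0.0, 0.0)]},
}


def build_forward_index(pages: Dict[str, Page]) -> Dict[str, Set[str]]:
    """Map each word to the set of pages that contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for page_id, words in pages.items():
        for word in words:
            index[word].add(page_id)
    return index


def centroid(coords: List[Coord]) -> Coord:
    n = len(coords)
    return tuple(sum(c[i] for c in coords) / n for i in range(3))


def translate(coords: List[Coord], shift: Coord) -> List[Coord]:
    return [tuple(c[i] + shift[i] for i in range(3)) for c in coords]


def rmsd(a: List[Coord], b: List[Coord]) -> float:
    total = sum(math.dist(p, q) ** 2 for p, q in zip(a, b))
    return math.sqrt(total / len(a))


def atoms_for(page: Page, words: List[str]) -> List[Coord]:
    return [atom for word in words for atom in page[word]]


def search(query: Page, index, pages, cutoff: float):
    """Return (page_id, rmsd, aligned_atoms) for pages within the cutoff."""
    if not query:
        return []
    words = sorted(query)
    query_atoms = atoms_for(query, words)
    query_center = centroid(query_atoms)
    # Step 1: the forward index yields pages containing every query word.
    candidates = set.intersection(*(set(index.get(w, ())) for w in words))
    hits = []
    for page_id in sorted(candidates):
        page_atoms = atoms_for(pages[page_id], words)
        if len(page_atoms) != len(query_atoms):
            continue
        # Steps 2 and 3 share one superposition in this toy: move the page
        # atoms onto the query centroid, then score and keep close matches.
        page_center = centroid(page_atoms)
        shift = tuple(query_center[i] - page_center[i] for i in range(3))
        aligned = translate(page_atoms, shift)
        score = rmsd(aligned, query_atoms)
        if score <= cutoff:
            hits.append((page_id, score, aligned))
    return sorted(hits, key=lambda hit: hit[1])


INDEX = build_forward_index(PAGES)
QUERY: Page = {"carboxylate": [(10.0, 10.0, 10.0), (11.2, 10.0, 10.0)]}
results = search(QUERY, INDEX, PAGES, cutoff=0.7)
for page_id, score, _ in results:
    print(page_id, round(score, 3))
```

<p>The point of the split is that the index lookup is cheap and discards most pages before any geometry is computed, so the RMSD filter only runs on pages that already contain every word in the query.</p>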
Building a tertiary interaction.
<p>(A) Three strands are seeded by searching on a valine, which reveals two nearby clusters of valine and tyrosine. (B) Strands are extended one residue in each direction by searching for pairs of residues (colored yellow) in the context of an insertion site, yielding clusters of potential inserts (colored green). (C) The final backbone fragment (green) is fed to MadCaT, which identifies multiple compatible scaffolds. One such scaffold (PDB ID = 1E54, colored light grey) possesses many exact residue/rotamer matches to the assembled fragment (blue highlights) and many close matches (yellow highlights) that differ by a related residue (threonine to serine or valine to isoleucine) or by varying the rotamer.</p>
Incremental assembly of a motif.
<p>(A) An initial search for a guanidinium fragment reveals a cluster of nearby carboxylates. (B) Refining the search with one carboxylate from the results reveals a specific linker preference for both the aspartate and arginine involved in the salt bridge. (C) Adding the most common linker for arginine and repeating the search reveals that this salt bridge is part of an N-terminal capping motif. Search queries are represented as thick sticks and search results are shown as thin lines. Grey dashed lines highlight search queries and black dashed lines highlight clusters in the search results, which are filtered to show the specific residue fragments of interest, with neighboring water molecules within 3.0 Å shown as red spheres. Search parameters and fragments are listed in <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1003750#pcbi.1003750.s002" target="_blank">Table S2</a>.</p>