Throughput benchmarks.
<p>We compare throughput of search queries for both Suns and Erebus, defined as query time divided by number of models in the data set. Suns throughput is measured against a locally hosted server and the Erebus throughput data is taken from <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1003750#pcbi.1003750-Shirvanyants1" target="_blank">[12]</a>. Detailed query information, including the query size in atoms and the number of matches, is provided in <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1003750#pcbi.1003750.s003" target="_blank">Table S3</a> and the specific query PDB files are included in the benchmark suite of suns-cmd (<a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1003750#pcbi.1003750.s006" target="_blank">Software S2</a>).</p>
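As a minimal sketch of the metric as the text defines it (query time divided by number of models), the arithmetic can be written as follows. The function name and the example numbers are hypothetical and are not taken from the benchmark suite or Table S3.

```python
def time_per_model(query_seconds: float, n_models: int) -> float:
    """Return query time divided by the number of models in the data set.

    This follows the definition given in the text; smaller values mean
    each model is searched faster.
    """
    if n_models <= 0:
        raise ValueError("n_models must be positive")
    return query_seconds / n_models


# Hypothetical numbers, for illustration only.
example = time_per_model(query_seconds=2.0, n_models=1000)
```

Because the value is normalized by data set size, it lets two engines that index different numbers of models be compared on the same scale.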
A Real-Time All-Atom Structural Search Engine for Proteins
<div><p>Protein designers use a wide variety of software tools for <i>de novo</i> design, yet their repertoire still lacks a fast and interactive all-atom search engine. To solve this, we have built the Suns program: a real-time, atomic search engine integrated into the PyMOL molecular visualization system. Users build atomic-level structural search queries within PyMOL and receive a stream of search results aligned to their query within a few seconds. This instant feedback cycle enables a new “designability”-inspired approach to protein design where the designer searches for and interactively incorporates native-like fragments from proven protein structures. We demonstrate the use of Suns to interactively build protein motifs, tertiary interactions, and to identify scaffolds compatible with hot-spot residues. The official web site and installer are located at <a href="http://www.degradolab.org/suns/" target="_blank">http://www.degradolab.org/suns/</a> and the source code is hosted at <a href="https://github.com/godotgildor/Suns" target="_blank">https://github.com/godotgildor/Suns</a> (PyMOL plugin, BSD license), <a href="https://github.com/Gabriel439/suns-cmd" target="_blank">https://github.com/Gabriel439/suns-cmd</a> (command line client, BSD license), and <a href="https://github.com/Gabriel439/suns-search" target="_blank">https://github.com/Gabriel439/suns-search</a> (search engine server, GPLv2 license).</p></div>
Searching for calcium binding sites.
<p>(A) Two side chains of the EF-hand of calmodulin suffice to find matching motifs. The search query (black dashes) consists exclusively of two aspartate side chains (D20 and D24) and does not include the calcium ligand. (B) Searching for these two side chains at 0.7 Å resolution returns seven results, all of which are EF-hand motifs. Six of these motifs coordinate a matching calcium ion (green sphere), and the seventh motif coordinates a sodium ion (purple sphere).</p>
Overview of Suns algorithm and architecture.
<p>(Inputs) The search index is built from two inputs: a set of words to recognize and a set of protein structures to search subdivided into pages. (Index) The two underlying data structures are a forward index that translates words to matching pages and a database of every page which translates matched words to atoms within each page. (Server) Each request to the server is broken into three steps: consult the forward index to find potentially matching pages, filter matching pages by RMSD to the query, and align successful matches to the query. (Queue) A message queue forwards requests from clients to servers, and forwards responses from servers to clients. (Clients) Suns provides two client interfaces: a PyMOL search plugin and the suns-cmd command line interface.</p>
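The first two server steps described above (index lookup, then RMSD filtering) can be sketched roughly as follows. This is an illustrative toy, not the Suns implementation: the page layout, the function names, and the cutoff handling are assumptions, and the RMSD is computed on coordinates as given, without the superposition and final alignment step that the real server performs.

```python
import math
from collections import defaultdict


def build_index(pages):
    """Map each word to the set of page ids that contain it.

    `pages` is assumed to look like {page_id: {word: [(x, y, z), ...]}},
    a hypothetical layout chosen for this sketch.
    """
    index = defaultdict(set)
    for page_id, words in pages.items():
        for word in words:
            index[word].add(page_id)
    return index


def candidate_pages(index, query_words):
    """Step 1: pages that contain every word in the query."""
    sets = [index.get(word, set()) for word in query_words]
    return set.intersection(*sets) if sets else set()


def rmsd(a, b):
    """Root-mean-square deviation between two equal-length point lists.

    No superposition is applied; a real search would first find the
    optimal rigid-body fit.
    """
    if len(a) != len(b) or not a:
        raise ValueError("point lists must be non-empty and equal in length")
    total = sum((p - q) ** 2 for pa, pb in zip(a, b) for p, q in zip(pa, pb))
    return math.sqrt(total / len(a))


def search(pages, index, query, cutoff):
    """Step 2: keep candidate pages whose atoms lie within `cutoff` RMSD."""
    hits = []
    words = sorted(query)
    query_atoms = [xyz for w in words for xyz in query[w]]
    for page_id in candidate_pages(index, words):
        page_atoms = [xyz for w in words for xyz in pages[page_id][w]]
        if len(page_atoms) != len(query_atoms):
            continue  # atom counts differ, so no one-to-one comparison
        deviation = rmsd(query_atoms, page_atoms)
        if deviation <= cutoff:
            hits.append((page_id, deviation))
    return sorted(hits, key=lambda hit: hit[1])
```

The index lookup is cheap and discards most pages before any geometry is computed, which is the reason for ordering the steps this way.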
Building a tertiary interaction.
<p>(A) Three strands are seeded by searching on a valine, which reveals two nearby clusters of valine and tyrosine. (B) Strands are extended one residue in each direction by searching for pairs of residues (colored yellow) in the context of an insertion site, yielding clusters of potential inserts (colored green). (C) The final backbone fragment (green) is fed to MadCaT, which identifies multiple compatible scaffolds. One such scaffold (PDB ID = 1E54, colored light grey) possesses many exact residue/rotamer matches to the assembled fragment (blue highlights) and many close matches (yellow highlights) that differ by a related residue (threonine to serine or valine to isoleucine) or by varying the rotamer.</p>
Incremental assembly of a motif.
<p>(A) An initial search for a guanidinium fragment reveals a cluster of nearby carboxylates. (B) Refining the search with one carboxylate from the results reveals a specific linker preference for both the aspartate and arginine involved in the salt bridge. (C) Adding the most common linker for arginine and repeating the search reveals that this salt bridge is part of an N-terminal capping motif. Search queries are represented as thick sticks and search results are shown as thin lines. Grey dashed lines highlight search queries and black dashed lines highlight clusters in the search results, which are filtered to show the specific residue fragments of interest and neighboring water molecules within 3.0 Å as red spheres. Search parameters and fragments are listed in <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1003750#pcbi.1003750.s002" target="_blank">Table S2</a>.</p>
Identification of fungi in shotgun metagenomics datasets
<div><p>Metagenomics uses nucleic acid sequencing to characterize species diversity in different niches such as environmental biomes or the human microbiome. Most studies have used 16S rRNA amplicon sequencing to identify bacteria. However, the decreasing cost of sequencing has resulted in a gradual shift away from amplicon analyses and towards shotgun metagenomic sequencing. Shotgun metagenomic data can be used to identify a wide range of species, but have rarely been applied to fungal identification. Here, we develop a sequence classification pipeline, FindFungi, and use it to identify fungal sequences in public metagenome datasets. We focus primarily on animal metagenomes, especially those from pig and mouse microbiomes. We identified fungi in 39 of 70 datasets comprising 71 fungal species. At least 11 pathogenic species with zoonotic potential were identified, including <i>Candida tropicalis</i>. We identified <i>Pseudogymnoascus</i> species from 13 Antarctic soil samples initially analyzed for the presence of bacteria capable of degrading diesel oil. We also show that <i>Candida tropicalis</i> and <i>Candida loboi</i> are likely the same species. In addition, we identify several examples where contaminating DNA was erroneously included in fungal genome assemblies.</p></div>
Species used to generate three simulated read datasets.
<i>Candida loboi</i> and <i>Candida tropicalis</i> are isolates of the same species.
<p>Maximum likelihood tree of a concatenated five-protein alignment from species from the <i>Candida</i> Gene Order Browser (CGOB; [<a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0192898#pone.0192898.ref046" target="_blank">46</a>]) and <i>C</i>. <i>loboi</i>. Five genes (<i>ERG1</i>, <i>MEF1</i>, <i>CEF3</i>, <i>DEG1</i>, <i>GCD14</i>) that are conserved in all CGOB species were chosen at random. All <i>C</i>. <i>loboi</i> orthologs were identified with best BLAST matches using <i>C</i>. <i>tropicalis</i> gene homologs. Protein sequences were aligned using Muscle (v3.8.31, [<a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0192898#pone.0192898.ref047" target="_blank">47</a>]) and concatenated. The tree was generated in SeaView [<a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0192898#pone.0192898.ref048" target="_blank">48</a>] using PhyML with the LG evolution model using Gblocks [<a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0192898#pone.0192898.ref049" target="_blank">49</a>] and 100 bootstraps (shown at nodes). Species abbreviations are displayed at branch leaves.</p>
FindFungi v0.23 pipeline overview.
<p>Reads are downloaded in FASTQ format. Low quality reads are removed with Skewer [<a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0192898#pone.0192898.ref037" target="_blank">37</a>]. The remaining reads are converted into FASTA format, which are analyzed by 32 implementations of Kraken, each using a different database [<a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0192898#pone.0192898.ref026" target="_blank">26</a>]. The 32 Kraken predictions for each fungal read are consolidated, and a consensus prediction is assigned. Reads not predicted as fungal are removed. The best hit for each read is mapped to a pseudo-assembly of the relevant genome using BLAST [<a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0192898#pone.0192898.ref021" target="_blank">21</a>]. Species where BLAST displays hits on more than 30% of pseudo-chromosomes are retained. Pearson’s coefficient of skewness is calculated to identify non-randomly distributed reads. Species with a skewness score between -0.2 and 0.2 (minimal skew) are retained. Fungal predictions, statistics and summary plots are written to a PDF file, and fungal prediction statistics are also written to a CSV file.</p>
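The consensus and retention steps described above might look something like the sketch below. Several choices here are assumptions rather than details from the pipeline: the consensus is taken as a simple majority vote, the skewness is Pearson's second coefficient, 3 × (mean − median) / standard deviation, the skewness is computed over hit positions, and all function and parameter names are hypothetical. Only the 30% and ±0.2 thresholds come from the text.

```python
import statistics
from collections import Counter


def consensus(predictions):
    """Pick the most frequent label among per-database predictions.

    Assumption: a plain majority vote stands in for whatever
    consolidation rule the pipeline actually applies.
    """
    return Counter(predictions).most_common(1)[0][0]


def pearson_skewness(values):
    """Pearson's second skewness coefficient: 3 * (mean - median) / stdev.

    Assumption: the text does not say which Pearson coefficient is used.
    """
    spread = statistics.pstdev(values)
    if spread == 0:
        return 0.0  # all values identical, so no skew to measure
    return 3 * (statistics.mean(values) - statistics.median(values)) / spread


def keep_species(hit_positions, chromosomes_hit, chromosomes_total,
                 min_fraction=0.30, max_abs_skew=0.2):
    """Apply the two retention filters described in the text.

    A species is kept when hits fall on more than `min_fraction` of its
    pseudo-chromosomes and the skewness lies within +/- `max_abs_skew`.
    """
    if chromosomes_total <= 0:
        return False
    if chromosomes_hit / chromosomes_total <= min_fraction:
        return False
    return abs(pearson_skewness(hit_positions)) <= max_abs_skew
```

Evenly spread hits give a skewness near zero, whereas hits piled onto one region push the mean away from the median and cause the species to be dropped.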