The Structure-Function Linkage Database
The Structure–Function Linkage Database (SFLD, http://sfld.rbvi.ucsf.edu/) is a manually curated classification resource describing structure–function relationships for functionally diverse enzyme superfamilies. Members of such superfamilies are diverse in their overall reactions yet share a common ancestor and some conserved active site features associated with conserved functional attributes such as a partial reaction. Thus, despite their different functions, members of these superfamilies ‘look alike’, making them easy to misannotate. To address this complexity and enable rational transfer of functional features to unknowns only for those members for which we have sufficient functional information, we subdivide superfamily members into subgroups using sequence information, and lastly into families, sets of enzymes known to catalyze the same reaction using the same mechanistic strategy. Browsing and searching options in the SFLD provide access to all of these levels. The SFLD offers manually curated as well as automatically classified superfamily sets, both accompanied by search and download options for all hierarchical levels. Additional information includes multiple sequence alignments, tab-separated files of functional and other attributes, and sequence similarity networks. The latter provide a new and intuitively powerful way to visualize functional trends mapped to the context of sequence similarity
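The superfamily, subgroup, and family levels described in this abstract can be pictured as a small nested data structure. The sketch below is only an illustration, assuming that a sequence's reaction is transferred solely when it has been classified down to the family level; all class names, field names, and the placeholder entries are hypothetical and do not reflect the SFLD's actual schema or data.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Family:
    """Enzymes known to catalyze the same reaction by the same mechanism."""
    name: str
    reaction: str


@dataclass
class Subgroup:
    """Superfamily members grouped using sequence information."""
    name: str
    families: Dict[str, Family] = field(default_factory=dict)


@dataclass
class Superfamily:
    """Divergent enzymes sharing an ancestor and some active site features."""
    name: str
    subgroups: Dict[str, Subgroup] = field(default_factory=dict)


@dataclass
class Classification:
    """Deepest level to which a sequence has been assigned (hypothetical)."""
    subgroup: Optional[str] = None
    family: Optional[str] = None


def transferable_reaction(sf: Superfamily, c: Classification) -> Optional[str]:
    """Return a reaction only for sequences assigned to a family.

    Sequences placed only at the superfamily or subgroup level get None,
    mirroring the idea that superfamily membership alone does not justify
    transferring an overall reaction.
    """
    if c.subgroup is None or c.family is None:
        return None
    return sf.subgroups[c.subgroup].families[c.family].reaction


# Placeholder hierarchy; the names are invented for illustration only.
example_sf = Superfamily(
    name="superfamily_X",
    subgroups={
        "subgroup_1": Subgroup(
            name="subgroup_1",
            families={"family_a": Family("family_a", "reaction_a")},
        )
    },
)

full = Classification(subgroup="subgroup_1", family="family_a")
partial = Classification(subgroup="subgroup_1")
print(transferable_reaction(example_sf, full))
print(transferable_reaction(example_sf, partial))
```

Returning `None` for partially classified sequences keeps the conservative behaviour explicit instead of falling back to a guess based on the nearest family.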
The Sorcerer II Global Ocean Sampling Expedition: Expanding the Universe of Protein Families
Metagenomics projects based on shotgun sequencing of populations of micro-organisms yield insight into protein families. We used sequence similarity clustering to explore proteins with a comprehensive dataset consisting of sequences from available databases together with 6.12 million proteins predicted from an assembly of 7.7 million Global Ocean Sampling (GOS) sequences. The GOS dataset covers nearly all known prokaryotic protein families. A total of 3,995 medium- and large-sized clusters consisting of only GOS sequences are identified, out of which 1,700 have no detectable homology to known families. The GOS-only clusters contain a higher than expected proportion of sequences of viral origin, thus reflecting a poor sampling of viral diversity until now. Protein domain distributions in the GOS dataset and current protein databases show distinct biases. Several protein domains that were previously categorized as kingdom specific are shown to have GOS examples in other kingdoms. About 6,000 sequences (ORFans) from the literature that heretofore lacked similarity to known proteins have matches in the GOS data. The GOS dataset is also used to improve remote homology detection. Overall, besides nearly doubling the number of current proteins, the predicted GOS proteins also add a great deal of diversity to known protein families and shed light on their evolution. These observations are illustrated using several protein families, including phosphatases, proteases, ultraviolet-irradiation DNA damage repair enzymes, glutamine synthetase, and RuBisCO. The diversity added by GOS data has implications for choosing targets for experimental structure characterization as part of structural genomics efforts. Our analysis indicates that new families are being discovered at a rate that is linear or almost linear with the addition of new sequences, implying that we are still far from discovering all protein families in nature
SmCL3, a Gastrodermal Cysteine Protease of the Human Blood Fluke Schistosoma mansoni
Parasitic infection caused by blood flukes of the genus Schistosoma is a major global health problem. More than 200 million people are infected. Identifying and characterizing the constituent enzymes of the parasite's biochemical pathways should reveal opportunities for developing new therapies (e.g., vaccines, drugs). Schistosomes feed on host blood, and a number of proteolytic enzymes (proteases) contribute to this process. We have identified and characterized a new protease, SmCL3 (for Schistosoma mansoni cathepsin L3), that is found within the gut tissue of the parasite. We have employed various biochemical and molecular biological methods and sequence similarity analyses to characterize SmCL3 and obtain insights into its possible functions in the parasite, as well as its evolutionary position among cathepsin L proteases in general. SmCL3 hydrolyzes major host blood proteins (serum albumin and hemoglobin) and is expressed in parasite life stages infecting the mammalian host. Enzyme substrate specificity detected by positional scanning-synthetic combinatorial library was confirmed by molecular modeling. A sequence analysis placed SmCL3 in the cluster of other cathepsin L proteases, in accordance with previous phylogenetic analyses.
A Global Comparison of the Human and T. brucei Degradomes Gives Insights about Possible Parasite Drug Targets
We performed a genome-level computational study of sequence and structure similarity, the latter using crystal structures and models, of the proteases of Homo sapiens and the human parasite Trypanosoma brucei. Using sequence and structure similarity networks to summarize the results, we constructed global views that show visually the relative abundance and variety of proteases in the degradome landscapes of these two species, and provide insights into evolutionary relationships between proteases. The results also indicate how broadly these sequence sets are covered by three-dimensional structures. These views facilitate cross-species comparisons and offer clues for drug design from knowledge about the sequences and structures of potential drug targets and their homologs. Two protease groups (“M32” and “C51”) that are very different in sequence from human proteases are examined in structural detail, illustrating the application of this global approach in mining new pathogen genomes for potential drug targets. Based on our analyses, a human ACE2 inhibitor was selected for experimental testing on one of these parasite proteases, TbM32, and was shown to inhibit it. These sequence and structure data, along with interactive versions of the protein similarity networks generated in this study, are available at http://babbittlab.ucsf.edu/resources.html.
Structure similarity network of human and T. brucei proteases using crystal structures and models.
Nodes represent experimentally characterized (crystal structure) or modeled structures, and edges represent pairwise structural similarity at or above the structural similarity threshold (FAST SN score ≥4.5). Nodes for 342 human and 71 T. brucei structures are shown in the network (total of 413 nodes and 7,234 edges). The two T. brucei-specific families (TbM32 and C51) highlighted in the sequence similarity network shown in Figure 1 (http://www.plosntds.org/article/info:doi/10.1371/journal.pntd.0001942#pntd-0001942-g001) are circled in red. (A) Nodes are colored by MEROPS-associated family, revealing cross-family structural relationships. Human structures are represented as circles and T. brucei structures as triangles. (B) The same structure similarity network as in panel A, colored by species and structure representation. Nodes are color-coded by species, and node shape corresponds to the type of structure representation for that sequence: square = crystal structure; triangle = ModBase model; diamond = ModWeb model. In contrast to T. brucei, there are a large number of experimentally characterized (crystal) structures for human proteases, but many T. brucei structures can be modeled.
The T. brucei M32 protease model shows active site similarity to a human protease, ACE2.
The model of the T. brucei M32 protease (TbM32m, purple) is shown structurally aligned with the crystal structure of ACE2 (PDB code 1R4L, yellow). The metal-binding residues and the catalytic glutamate are depicted in ball-and-stick representation near the zinc ion. The ACE2 inhibitor MLN4760 is shown in green, and the ACE inhibitor lisinopril is shown in orange stick format (its position is taken from a structural alignment of ACE (1O86) with ACE2). The predicted steric clash of R273 in the ACE2 S1 pocket with lisinopril is marked with an arrow. The R273 CZ of ACE2 is predicted to be 1.5 Å from the lisinopril C9, so that a terminal nitrogen of R273 is in position to overlap with an oxygen of lisinopril. The arginine (R348) from TbM32m that is predicted to be close to the ACE2 R273 is also shown in ball-and-stick representation. The inset shows the overall structural similarity of the two proteins.
Structure alignment of T. brucei C51 model (TbC51m) with a distant structure homolog, human Cathepsin F (CatF).
The superposition shows these two proteins have some general, overall structural similarities, but also large differences near the active site. The TbC51 model is colored in light orange, and the human CatF is in light green. While the catalytic Cys-His dyads are closely superimposed (depicted in ball-and-stick), a striking difference is marked by an arrow indicating the predicted steric clash between the CatF vinyl sulfone inhibitor (red) and the helix of TbC51 that partially obstructs the active site.
Global view of predicted active proteases of human and T. brucei showing sequence similarity relationships.
Protease sequences are represented as nodes, and similarity relationships between sequences better than the threshold (BLAST E-value ≤1e-5) are depicted as “edges” or lines between nodes. The network represents 594 human and 127 T. brucei sequences (total of 721 nodes and 10,188 edges). (A) Distribution of proteases by family. Nodes for human sequences are represented as circles and nodes for T. brucei sequences as triangles; both are colored by MEROPS-associated family (see Methods, http://www.plosntds.org/article/info:doi/10.1371/journal.pntd.0001942#s2). Families of some of the larger clusters are labeled, and the parasite-specific C51 and M32 clusters are circled in red. (B) Structure coverage of sequence space is broad in human and T. brucei. The same sequence similarity network as in panel A is shown, except that it is color-coded by species and nodes are enlarged and given different shapes to denote whether a crystal structure or model exists for that sequence. Node shapes: square = crystal structure; triangle = ModBase model; diamond = ModWeb model; small circle = no structure.
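The thresholding rule in this caption (an edge wherever the BLAST E-value is at or below 1e-5) can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: it assumes pairwise hits are already available as (sequence, sequence, E-value) tuples, and the function names and toy identifiers are hypothetical.

```python
from collections import defaultdict


def build_similarity_network(hits, evalue_threshold=1e-5):
    """Return (nodes, edges) from pairwise hits.

    hits: iterable of (seq_a, seq_b, evalue) tuples (assumed input format).
    Every sequence becomes a node; an undirected edge is kept only when
    the E-value is at or below the threshold. Self-hits are ignored.
    """
    nodes, edges = set(), set()
    for a, b, evalue in hits:
        nodes.update((a, b))
        if a != b and evalue <= evalue_threshold:
            edges.add(tuple(sorted((a, b))))
    return nodes, edges


def connected_components(nodes, edges):
    """Group nodes into clusters by following edges (iterative DFS)."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, clusters = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(adjacency[node] - seen)
        clusters.append(component)
    return clusters


# Toy data with invented identifiers, for illustration only.
toy_hits = [
    ("hs_1", "hs_2", 1e-30),
    ("hs_2", "tb_1", 1e-6),
    ("tb_2", "hs_1", 0.5),    # too weak: no edge
    ("tb_2", "tb_3", 1e-5),   # exactly at threshold: edge kept
]
toy_nodes, toy_edges = build_similarity_network(toy_hits)
toy_clusters = connected_components(toy_nodes, toy_edges)
print(len(toy_nodes), len(toy_edges), toy_clusters)
```

Tightening `evalue_threshold` removes weaker edges and splits large clusters into smaller ones, which is why the choice of threshold shapes what such a network appears to show.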
Distribution by catalytic type of peptidases predicted to be active in humans and T. brucei.
In humans, proteases of catalytic type S (where the catalytic moiety is serine) are dominant, but metallo (type M) and cysteine (type C) peptidases are also abundant. In contrast, in T. brucei, serine peptidases are less abundant, and cysteine and metallo proteases are equally prominent. Other main catalytic types in each organism include the threonine (type T) and aspartic (type A) proteases. Each sequence was assigned the catalytic type designated for the family of its closest BLAST hit among MEROPS sequences.
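The assignment rule in this caption (take the catalytic type of the family of the closest BLAST hit) can be sketched as follows. This is an illustration under stated assumptions: hits are given as (query, MEROPS family, E-value) tuples, "closest" is read as the lowest E-value, and the catalytic type is taken from the leading letter of the family identifier (for example, M in M32 or C in C51). The function name and toy hits are hypothetical.

```python
from collections import Counter


def assign_catalytic_types(hits):
    """Map each query to the catalytic type of its lowest-E-value hit.

    hits: iterable of (query_id, merops_family, evalue) tuples
    (assumed input format). The type is the first letter of the
    family identifier of the best hit.
    """
    best = {}
    for query, family, evalue in hits:
        if query not in best or evalue < best[query][1]:
            best[query] = (family, evalue)
    return {query: family[0] for query, (family, _) in best.items()}


# Toy hits with invented query names, for illustration only.
toy_type_hits = [
    ("q1", "S1", 1e-50),
    ("q1", "C1", 1e-3),
    ("q2", "M32", 1e-40),
    ("q3", "C1", 1e-8),
    ("q3", "C51", 1e-20),
]
toy_types = assign_catalytic_types(toy_type_hits)
toy_distribution = Counter(toy_types.values())
print(toy_types, toy_distribution)
```

Counting the assigned letters per organism yields a distribution of the kind summarized in the caption; ties between equally good hits are resolved here by keeping the first one seen, which is an arbitrary choice.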