67 research outputs found

    Gene3D: Multi-domain annotations for protein sequence and comparative genome analysis

    Get PDF
    Gene3D (http://gene3d.biochem.ucl.ac.uk) is a database of protein domain structure annotations for protein sequences. Domains are predicted using a library of profile HMMs from 2738 CATH superfamilies. Gene3D assigns domain annotations to Ensembl and UniProt sequence sets including >6000 cellular genomes and >20 million unique protein sequences. This represents an increase of 45% in the number of protein sequences since our last publication. Thanks to improvements in the underlying data and pipeline, we see large increases in the domain coverage of sequences. We have expanded this coverage by integrating Pfam and SUPERFAMILY domain annotations, and we now resolve domain overlaps to provide highly comprehensive composite multi-domain architectures. To make these data more accessible for comparative genome analyses, we have developed novel search algorithms for searching genomes to identify related multi-domain architectures. In addition to providing domain family annotations, we have now developed a pipeline for 3D homology modelling of domains in Gene3D. This has been applied to the human genome and will be rolled out to other major organisms over the next year

    An integrated approach to the interpretation of Single Amino Acid Polymorphisms within the framework of CATH and Gene3D

    Get PDF
    Background The phenotypic effects of sequence variations in protein-coding regions come about primarily via their effects on the resulting structures, for example by disrupting active sites or affecting structural stability. In order better to understand the mechanisms behind known mutant phenotypes, and predict the effects of novel variations, biologists need tools to gauge the impacts of DNA mutations in terms of their structural manifestation. Although many mutations occur within domains whose structure has been solved, many more occur within genes whose protein products have not been structurally characterized.<p></p> Results Here we present 3DSim (3D Structural Implication of Mutations), a database and web application facilitating the localization and visualization of single amino acid polymorphisms (SAAPs) mapped to protein structures even where the structure of the protein of interest is unknown. The server displays information on 6514 point mutations, 4865 of them known to be associated with disease. These polymorphisms are drawn from SAAPdb, which aggregates data from various sources including dbSNP and several pathogenic mutation databases. While the SAAPdb interface displays mutations on known structures, 3DSim projects mutations onto known sequence domains in Gene3D. This resource contains sequences annotated with domains predicted to belong to structural families in the CATH database. Mappings between domain sequences in Gene3D and known structures in CATH are obtained using a MUSCLE alignment. 1210 three-dimensional structures corresponding to CATH structural domains are currently included in 3DSim; these domains are distributed across 396 CATH superfamilies, and provide a comprehensive overview of the distribution of mutations in structural space.<p></p> Conclusion The server is publicly available at http://3DSim.bioinfo.cnio.es/ webcite. In addition, the database containing the mapping between SAAPdb, Gene3D and CATH is available on request and most of the functionality is available through programmatic web service access.<p></p&gt

    Estimation of the solubility parameters of model plant surfaces and agrochemicals: a valuable tool for understanding plant surface interactions

    Get PDF
    Background Most aerial plant parts are covered with a hydrophobic lipid-rich cuticle, which is the interface between the plant organs and the surrounding environment. Plant surfaces may have a high degree of hydrophobicity because of the combined effects of surface chemistry and roughness. The physical and chemical complexity of the plant cuticle limits the development of models that explain its internal structure and interactions with surface-applied agrochemicals. In this article we introduce a thermodynamic method for estimating the solubilities of model plant surface constituents and relating them to the effects of agrochemicals. Results Following the van Krevelen and Hoftyzer method, we calculated the solubility parameters of three model plant species and eight compounds that differ in hydrophobicity and polarity. In addition, intact tissues were examined by scanning electron microscopy and the surface free energy, polarity, solubility parameter and work of adhesion of each were calculated from contact angle measurements of three liquids with different polarities. By comparing the affinities between plant surface constituents and agrochemicals derived from (a) theoretical calculations and (b) contact angle measurements we were able to distinguish the physical effect of surface roughness from the effect of the chemical nature of the epicuticular waxes. A solubility parameter model for plant surfaces is proposed on the basis of an increasing gradient from the cuticular surface towards the underlying cell wall. Conclusions The procedure enabled us to predict the interactions among agrochemicals, plant surfaces, and cuticular and cell wall components, and promises to be a useful tool for improving our understanding of biological surface interactions

    Comparing genomic variant identification protocols for Candida auris.

    Get PDF
    Genomic analyses are widely applied to epidemiological, population genetic and experimental studies of pathogenic fungi. A wide range of methods are employed to carry out these analyses, typically without including controls that gauge the accuracy of variant prediction. The importance of tracking outbreaks at a global scale has raised the urgency of establishing high-accuracy pipelines that generate consistent results between research groups. To evaluate currently employed methods for whole-genome variant detection and elaborate best practices for fungal pathogens, we compared how 14 independent variant calling pipelines performed across 35 Candida auris isolates from 4 distinct clades and evaluated the performance of variant calling, single-nucleotide polymorphism (SNP) counts and phylogenetic inference results. Although these pipelines used different variant callers and filtering criteria, we found high overall agreement of SNPs from each pipeline. This concordance correlated with site quality, as SNPs discovered by a few pipelines tended to show lower mapping quality scores and depth of coverage than those recovered by all pipelines. We observed that the major differences between pipelines were due to variation in read trimming strategies, SNP calling methods and parameters, and downstream filtration criteria. We calculated specificity and sensitivity for each pipeline by aligning three isolates with chromosomal level assemblies and found that the GATK-based pipelines were well balanced between these metrics. Selection of trimming methods had a greater impact on SAMtools-based pipelines than those using GATK. Phylogenetic trees inferred by each pipeline showed high consistency at the clade level, but there was more variability between isolates from a single outbreak, with pipelines that used more stringent cutoffs having lower resolution. This project generated two truth datasets useful for routine benchmarking of C. auris variant calling, a consensus VCF of genotypes discovered by 10 or more pipelines across these 35 diverse isolates and variants for 2 samples identified from whole-genome alignments. This study provides a foundation for evaluating SNP calling pipelines and developing best practices for future fungal genomic studies

    Chloroquine and Its Derivatives Exacerbate B19V-Associated Anemia by Promoting Viral Replication

    Get PDF
    Human parvovirus B19 (B19V) is typically associated with a childhood febrile illness known as erythema infectiosum. The infection usually resolves without consequence in healthy individuals. However, in patients with immunologic and/or hematologic disorders, B19V can cause a significant pathology. The virus infects and kills red cell precursors but anemia rarely supervenes unless there is pre-existing anemia such as in children living in malaria-endemic regions. The link between B19V infection and severe anemia has, however, only been confirmed in certain malaria-endemic countries in parallel with chloroquine (CQ) usage. This raises the possibility that CQ may increase the risk of severe anemia by promoting B19V infection. To test this hypothesis, we examined the direct effect of CQ and other commonly used antimalarial drugs on B19V infection in cultured cell lines. Additionally, we examined the correlation between B19V infection, hemoglobin levels and use of CQ in children from Papua New Guinea hospitalized with severe anemia. The results suggest strongly that CQ and its derivatives aggravate B19V-associated anemia by promoting B19V replication. Hence, careful consideration should be given in choosing the drug partnering artemisinin compounds in combination antimalarial therapy in order to minimize contribution of B19V to severe anemia

    Visualizing variation within Global Pneumococcal Sequence Clusters (GPSCs) and country population snapshots to contextualize pneumococcal isolates

    Get PDF
    Knowledge of pneumococcal lineages, their geographic distribution and antibiotic resistance patterns, can give insights into global pneumococcal disease. We provide interactive bioinformatic outputs to explore such topics, aiming to increase dissemination of genomic insights to the wider community, without the need for specialist training. We prepared 12 country-specific phylogenetic snapshots, and international phylogenetic snapshots of 73 common Global Pneumococcal Sequence Clusters (GPSCs) previously defined using PopPUNK, and present them in Microreact. Gene presence and absence defined using Roary, and recombination profiles derived from Gubbins are presented in Phandango for each GPSC. Temporal phylogenetic signal was assessed for each GPSC using BactDating. We provide examples of how such resources can be used. In our example use of a country-specific phylogenetic snapshot we determined that serotype 14 was observed in nine unrelated genetic backgrounds in South Africa. The international phylogenetic snapshot of GPSC9, in which most serotype 14 isolates from South Africa were observed, highlights that there were three independent sub-clusters represented by South African serotype 14 isolates. We estimated from the GPSC9-dated tree that the sub-clusters were each established in South Africa during the 1980s. We show how recombination plots allowed the identification of a 20kb recombination spanning the capsular polysaccharide locus within GPSC97. This was consistent with a switch from serotype 6A to 19A estimated to have occured in the 1990s from the GPSC97-dated tree. Plots of gene presence/absence of resistance genes (tet, erm, cat) across the GPSC23 phylogeny were consistent with acquisition of a composite transposon. We estimated from the GPSC23-dated tree that the acquisition occurred between 1953 and 1975. Finally, we demonstrate the assignment of GPSC31 to 17 externally generated pneumococcal serotype 1 assemblies from Utah via Pathogenwatch. Most of the Utah isolates clustered within GPSC31 in a USA-specific clade with the most recent common ancestor estimated between 1958 and 1981. The resources we have provided can be used to explore to data, test hypothesis and generate new hypotheses. The accessible assignment of GPSCs allows others to contextualize their own collections beyond the data presented here

    Local Function Conservation in Sequence and Structure Space

    Get PDF
    We assess the variability of protein function in protein sequence and structure space. Various regions in this space exhibit considerable difference in the local conservation of molecular function. We analyze and capture local function conservation by means of logistic curves. Based on this analysis, we propose a method for predicting molecular function of a query protein with known structure but unknown function. The prediction method is rigorously assessed and compared with a previously published function predictor. Furthermore, we apply the method to 500 functionally unannotated PDB structures and discuss selected examples. The proposed approach provides a simple yet consistent statistical model for the complex relations between protein sequence, structure, and function. The GOdot method is available online (http://godot.bioinf.mpi-inf.mpg.de)

    The Structural Biology Knowledgebase: a portal to protein structures, sequences, functions, and methods

    Get PDF
    The Protein Structure Initiative’s Structural Biology Knowledgebase (SBKB, URL: http://sbkb.org) is an open web resource designed to turn the products of the structural genomics and structural biology efforts into knowledge that can be used by the biological community to understand living systems and disease. Here we will present examples on how to use the SBKB to enable biological research. For example, a protein sequence or Protein Data Bank (PDB) structure ID search will provide a list of related protein structures in the PDB, associated biological descriptions (annotations), homology models, structural genomics protein target status, experimental protocols, and the ability to order available DNA clones from the PSI:Biology-Materials Repository. A text search will find publication and technology reports resulting from the PSI’s high-throughput research efforts. Web tools that aid in research, including a system that accepts protein structure requests from the community, will also be described. Created in collaboration with the Nature Publishing Group, the Structural Biology Knowledgebase monthly update also provides a research library, editorials about new research advances, news, and an events calendar to present a broader view of structural genomics and structural biology

    Identification and in vitro Analysis of the GatD/MurT Enzyme-Complex Catalyzing Lipid II Amidation in Staphylococcus aureus

    Get PDF
    The peptidoglycan of Staphylococcus aureus is characterized by a high degree of crosslinking and almost completely lacks free carboxyl groups, due to amidation of the D-glutamic acid in the stem peptide. Amidation of peptidoglycan has been proposed to play a decisive role in polymerization of cell wall building blocks, correlating with the crosslinking of neighboring peptidoglycan stem peptides. Mutants with a reduced degree of amidation are less viable and show increased susceptibility to methicillin. We identified the enzymes catalyzing the formation of D-glutamine in position 2 of the stem peptide. We provide biochemical evidence that the reaction is catalyzed by a glutamine amidotransferase-like protein and a Mur ligase homologue, encoded by SA1707 and SA1708, respectively. Both proteins, for which we propose the designation GatD and MurT, are required for amidation and appear to form a physically stable bi-enzyme complex. To investigate the reaction in vitro we purified recombinant GatD and MurT His-tag fusion proteins and their potential substrates, i.e. UDP-MurNAc-pentapeptide, as well as the membrane-bound cell wall precursors lipid I, lipid II and lipid II-Gly5. In vitro amidation occurred with all bactoprenol-bound intermediates, suggesting that in vivo lipid II and/or lipid II-Gly5 may be substrates for GatD/MurT. Inactivation of the GatD active site abolished lipid II amidation. Both, murT and gatD are organized in an operon and are essential genes of S. aureus. BLAST analysis revealed the presence of homologous transcriptional units in a number of gram-positive pathogens, e.g. Mycobacterium tuberculosis, Streptococcus pneumonia and Clostridium perfringens, all known to have a D-iso-glutamine containing PG. A less negatively charged PG reduces susceptibility towards defensins and may play a general role in innate immune signaling
    corecore