49 research outputs found

    Functional classification of protein domain superfamilies for protein function annotation

    Proteins are made up of domains that are generally considered to be independent evolutionary and structural units having distinct functional properties. It is now well established that analysis of domains in proteins provides an effective approach to understand protein function using a 'domain grammar'. Towards this end, evolutionarily-related protein domains have been classified into homologous superfamilies in CATH and SCOP databases. An ideal functional sub-classification of the domain superfamilies into 'functional families' can not only help in function annotation of uncharacterised sequences but also provide a useful framework for understanding the diversity and evolution of function at the domain level. This work describes the development of a new protocol (FunFHMMer) for identifying functional families in CATH superfamilies that makes use of sequence patterns only and hence is unaffected by the incompleteness of function annotations, annotation biases or misannotations existing in the databases. The resulting family classification was validated using known functional information and was found to generate more functionally coherent families than other domain-based protein resources. A protein function prediction pipeline was developed that exploits the functional annotations provided by the domain families; it was validated using a database rollback benchmark set of proteins and an independent assessment by CAFA 2. The functional classification was found to capture the functional diversity of superfamilies well in terms of sequence, structure and the protein-context. This aided studies on evolution of protein domain function both at the superfamily level and in specific proteins of interest. The conserved positions in the functional family alignments were found to be enriched in catalytic site residues and ligand-binding site residues, which led to the development of a functional site prediction tool.
    Lastly, the function prediction tools were assessed for annotation of moonlighting functions of proteins and a classification of moonlighting proteins was proposed based on their structure-function relationships.

    Computational approaches to predict protein functional families and functional sites

    Understanding the mechanisms of protein function is indispensable for many biological applications, such as protein engineering and drug design. However, experimental annotations are sparse, and therefore, theoretical strategies are needed to fill the gap. Here, we present the latest developments in building functional subclassifications of protein superfamilies and using evolutionary conservation to detect functional determinants, for example, catalytic-, binding- and specificity-determining residues important for delineating the functional families. We also briefly review other features exploited for functional site detection and new machine learning strategies for combining multiple features.

    CATH functional families predict functional sites in proteins

    MOTIVATION: Identification of functional sites in proteins is essential for functional characterization, variant interpretation and drug design. Several methods are available for predicting either a generic functional site, or specific types of functional site. Here, we present FunSite, a machine learning predictor that identifies catalytic, ligand-binding and protein-protein interaction functional sites using features derived from protein sequence and structure, and evolutionary data from CATH functional families (FunFams). RESULTS: FunSite's prediction performance was rigorously benchmarked using cross-validation and a holdout dataset. FunSite outperformed other publicly-available functional site prediction methods. We show that conserved residues in FunFams are enriched in functional sites. We found FunSite's performance depends greatly on the quality of functional site annotations and the information content of FunFams in the training data. Finally, we analyse which structural and evolutionary features are most predictive for functional sites. AVAILABILITY: https://github.com/UCL/cath-funsite-predictor. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

    Affimer Inhibition of Verona Integron-encoded Metallo-β-Lactamase 1

    Metallo-β-lactamases (MBLs) are a class of enzyme that hydrolyse the β-lactam ring in β-lactam antibiotics through their active site Zn²⁺, rendering the antibiotics inactive. Verona Integron-encoded metallo-β-lactamase 1 (VIM-1) is a class B1, group 3, carbapenemase which is easily disseminated through the plasmid blaVIM, and offers bacteria broad-spectrum resistance to almost all known β-lactam antibiotics. A combination of β-lactamase inhibitors with β-lactam antibiotics is currently the most reliable method of dealing with resistant pathogens, but there are currently no clinically available inhibitors of this class of MBL. Affimers, a class of non-antibody binding proteins, could potentially offer an alternative source of novel MBL inhibitors. An Affimer that binds and inhibits New Delhi metallo-β-lactamase (NDM-1), a structurally similar MBL to VIM-1, has previously been identified. It was proposed that, utilising similar screening methods, Affimer reagents could also be raised against VIM-1 to bind and modulate its activity. Though further study is needed, as a result of this work, an Affimer (Affimer61) was identified that was capable of reducing VIM-1’s hydrolysis of nitrocefin substrate by 47%.

    The identification and characterisation of novel antimicrobial resistance genes from human and animal metagenomes

    Antimicrobial resistance genes are harboured by bacteria in the human oral cavity and ruminant faeces, and they are shed in particularly high abundances in calf faeces. Furthermore, bacteriocin (antimicrobial peptide) producing bacteria have been isolated from these environments. In recent times bacteriocins have received much attention as potential alternatives to antibiotics. Human saliva and calf faeces harbour ‘yet-to-be cultured bacteria’ that can only be studied by analysing their DNA. To this end, two metagenomic libraries were created from human saliva and calf faeces metagenomic DNA with the aim of identifying novel antimicrobial resistance and bacteriocin genes. Screening these libraries for tetracycline resistance identified two tetracycline-resistant clones. Clone PS9 was also tigecycline resistant and contained a 7,765 bp insert that encoded two half-ABC transporter genes; subcloning of these genes showed that they were responsible for the observed resistance phenotype. As the ABC transporter conferred resistance only to tetracyclines and its putative amino acid sequence showed <80 % identity to known tetracycline resistance proteins, it was named TetAB(60). Clone TT31 contained a 14,226 bp insert, 7,216 bp of which had 97 % nucleotide identity to Tn916 and contained part of tet(M) and a full length tet(L) gene. This gene organisation has not been described in Tn916-like elements and it may represent a novel Tn916-like element. The human saliva library was also screened for antiseptic resistance, revealing a CTAB-resistant clone. Random transposon mutagenesis of the 19.1 kb insert and subcloning of a UDP-glucose 4-epimerase revealed it to be solely required for the observed resistance.
    This study identified novel tetracycline, tigecycline and CTAB resistance genes from the human saliva metagenome, demonstrating the importance of this environment as a source of resistance genes that may compromise the effectiveness of these antibiotics and antimicrobials. Additionally, this work highlights the relevance of house-keeping genes to the development of antimicrobial resistance.

    Fusing synthetic biology with nanotechnology: Integrating proteins into carbon nanotube field-effect transistors

    Proteins are nature’s own nanomachines. Crafted through years of evolution, they are optimised to perform a range of cellular functions. To translate this into a useful nanotechnological application, proteins can be integrated into fundamental electronic devices known as carbon nanotube field-effect transistors (NT-FETs). I do this by engineering in non-natural amino acid p-azido-L-phenylalanine (AzF), which can be activated by UV light to covalently bind the carbon nanotube channel of an NT-FET. This creates an intimate environment for signal transduction, whereby an external biochemical signal (e.g., a chemical reaction, or incoming charge density from a protein-protein interaction) is transduced into an electrical signal. Potential applications for this will be dependent on the protein interfaced, but this thesis will consider two key themes: biosensing and optoelectronic gating. Chapter 3 builds on previous research by the Jones and Palma collaboration to develop a biosensor for antibiotic resistance (ABR). I do this by covalently integrating BLIP-II (Beta-Lactamase Inhibitory Protein II) to an NT-FET, transducing binding events with ABR biomarkers, the class A β-lactamases. NT-FETs were functionalised with defined BLIP-IIAzF variants to sample different orientations of analytes TEM-1 and KPC-2 β-lactamase. The distinct electrical signals generated correlated to the unique electrostatic surface being sampled, providing evidence for electrostatic gating. Chapter 4 builds on the experimental results from Chapter 3 to consider whether the BLIP-IIAzF—NT-FET interface can be effectively modelled to predict AzF mutation site success in mediating proximal analyte sensing. Using molecular dynamics, data on AzF side chain rotamer propensity was extracted, and in silico modelling was performed to assess the possible binding orientations of BLIP-IIAzF variants at the NT-FET interface. 
    Distance and electrostatic potential of incoming β-lactamases were measured and showed correlation to the electrostatic gating observed in Chapter 3. Chapter 5 was devised in collaboration with the Bobrinetskiy lab, as I looked to exploit nature’s own light-responsive elements by covalently integrating sfGFP (superfolder Green Fluorescent Protein) into an NT-FET platform. By defining sfGFP orientation through two distinct AzF anchor sites, light was shown to induce optoelectronic memory and optoelectronic gating. Further novelty was discovered as water regenerated the optoelectronic gating response after six months of protein dehydration.

    Using TraDIS to probe the model organism Escherichia coli

    Transposon-directed insertion-site sequencing (TraDIS) is a high-throughput method that couples transposon mutagenesis with short-fragment DNA sequencing. It is commonly used to identify the essential or conditionally essential genome of an organism, linking phenotype with genotype. A mini Tn5 transposon library was constructed in the model organism Escherichia coli K-12 strain BW25113. With a median distance between insertions of 3 bp, this is one of the densest libraries published and provides a valuable tool for whole genome screening. Analysis of this library revealed subtle differences between the TraDIS method and the gold-standard gene-deletion method to determine the essential genome of an organism. This included, but was not limited to, the identification of transposon insertion bias, the boundaries of essential domains within a gene, and short essential domains within intragenic regions. Insertion bias was subsequently found to reveal the position of essential promoters and novel regulatory elements. This library was further used to probe the genomic requirement for survival in the presence of the clinically relevant antibiotic polymyxin B at sub-inhibitory concentrations. Among these data a gene of unknown function, yhcB, was identified and found to have a fundamental role in cell envelope biogenesis. The density of this library enabled identification of previously unreported genomic features. The results presented highlight the potential applications of TraDIS as a tool for a wide range of biological questions.