9 research outputs found

    fDETECT webserver: fast predictor of propensity for protein production, purification, and crystallization

    Background: Development of predictors of the propensity of protein sequences for successful crystallization has been actively pursued for over a decade. A few novel methods that expanded the scope of these predictions to address additional steps of protein production and structure determination pipelines were released in recent years. The predictive performance of the current methods is modest, both because the only input they use is the protein sequence and because the experimental annotations of these data might be inconsistent, given that they were collected across many laboratories and centers. However, even these modest levels of predictive quality are still practical compared to the reported low success rates of crystallization, which are below 10%. We focus on another important aspect: the high computational cost of running the predictors that offer the expanded scope. Results: We introduce a novel fDETECT webserver that provides very fast and modestly accurate predictions of the success of protein production, purification, crystallization, and structure determination. Empirical tests on two datasets demonstrate that fDETECT is more accurate than the only other similarly fast method, and similarly accurate and three orders of magnitude faster than the currently most accurate predictors. Our method predicts a single protein in about 120 milliseconds and needs less than an hour to generate the four predictions for an entire human proteome. Moreover, we empirically show that fDETECT secures similar levels of predictive performance when compared with four representative methods that only predict success of crystallization, while it also provides the other three predictions. A webserver that implements fDETECT is available at http://biomine.cs.vcu.edu/servers/fDETECT/. Conclusions: fDETECT is a computational tool that supports target selection for protein production and X-ray crystallography-based structure determination. It offers predictive quality that matches or exceeds other state-of-the-art tools and is especially suitable for the analysis of large protein sets.

    Disorder prediction methods, their applicability to different protein targets and their usefulness for guiding experimental studies

    The role and function of a given protein depend on its structure. In recent years, however, numerous studies have highlighted the importance of unstructured, or disordered, regions in governing a protein’s function. Disordered proteins have been found to play important roles in pivotal cellular functions, such as DNA binding and signalling cascades. Studying proteins with extended disordered regions is often problematic as they can be challenging to express, purify and crystallise. This means that interpretable experimental data on protein disorder are hard to generate. As a result, computational tools have been developed to predict the level and location of disorder within a protein. Currently, over 60 prediction servers exist, utilizing different methods for classifying disorder and different training sets. Here we review several well-performing, publicly available prediction methods, comparing their application and discussing how disorder prediction servers can be used to aid the experimental solution of protein structure. Disorder prediction methods allow a more targeted approach to experimental studies: by accurately identifying the boundaries of ordered protein domains, they let those domains be investigated separately, thereby increasing the likelihood of their successful experimental solution.

    Prediction of Protein Solubility

    Protein solubility is closely related to the usability of proteins in industry and research. The successful prediction of solubility would therefore lead to significant savings of financial resources. This work presents a new machine-learning-based solubility predictor, Solpex, which achieved better performance on an independent test set than any comparable solubility prediction tool. The implementation of the predictor was preceded by a study of the biological nature of solubility, an evaluation of existing solubility prediction approaches, the building of datasets, many experiments with novel features, and the selection of the best features for the predictor. Because dataset building is probably the most important of these steps, this work benefits mainly from its own rigorous processing of the main source of solubility data, the TargetTrack database, which is described in detail.

    Polymer mediated protein crystallisation

    Structure elucidation of a macromolecule can lead to the determination of its function. In the case of proteins, knowledge of their three-dimensional structure can be utilised in the identification of active site(s) and consequently in rational drug design. Commonly, X-ray crystallography is performed on a high-quality single crystal of the target macromolecule in order to elucidate its structure. Moreover, crystallised protein molecules may remain active, so the crystals can be used in controlled drug delivery. Unfortunately, the successful crystallisation of a macromolecule can be seen as the most challenging aspect in this endeavour, given that predicting, screening and directing crystallisation remains an elusive goal. A possible solution to this problem is the use of heterogeneous nucleation, where a foreign surface is employed to lower the energy barrier for nucleation to occur. Heteronucleation has been utilised in the crystallisation of small organic molecules, inorganic complexes, extended networks and proteins. Polymeric surfaces, as heteronucleants in protein crystallisation, have been known to increase nucleation density rates and to selectively crystallise particular forms of proteins. Moreover, imprinted polymeric surfaces have been successfully used to selectively crystallise inorganic molecules as well as a small range of well-studied proteins. This thesis presents the effects of polymers on the crystallisation of two proteins, mutant human thioredoxin and wild-type hen egg-white lysozyme (HEWL). Polymers were used in solution, as physically adsorbed films, and as plasma polymers. The shape and size of the protein crystals were altered, and polymorphism was also achieved, in the presence of polymers with various functionalities. This work is a step towards the use of polymers as heteronucleants in directing protein crystallisation.

    Sequence-based prediction of protein crystallization, purification and production propensity

    Motivation: X-ray crystallography-based protein structure determination, which accounts for the majority of solved structures, is characterized by relatively low success rates. One solution is to build tools that support the selection of targets that are more likely to crystallize. Several in silico methods that predict the propensity of diffraction-quality crystallization from protein chains have been developed. We show that the quality of their predictions drops when they are applied to more recent crystallization trials, which calls for new solutions. We propose a novel approach that alleviates drawbacks of the existing methods by using a recent dataset and an improved protocol to annotate progress along the crystallization process, by predicting the success of the entire process and of the steps that result in failed attempts, and by utilizing a compact and comprehensive set of sequence-derived inputs to generate accurate predictions.

    Development of crystallographic techniques and their application to several protein targets

    Since its first use to solve the structure of sodium chloride in 1915, X-ray crystallography has developed significantly to become the premier technique for obtaining 3D structural information of small molecules and macromolecules alike. As the technique continues to develop and focus its attention on weak diffraction from the likes of micro-crystals and poorly packed crystals of membrane proteins and large protein complexes, as well as on ultra-high resolution data and weak anomalous signal from native atoms, data quality is becoming more and more important. Data quality is particularly important in the wake of long wavelength macromolecular crystallography (MX) for phasing using anomalous signal from native sulphur and phosphorus atoms in proteins and DNA. This thesis first investigated the use of a new sample handling technique using a humidity-controlled stream to preserve macromolecular crystals while excess surrounding solvent is removed (Chapter 2). Following the successful development of this technique, the effects of excess surrounding solvent on data quality were assessed when collecting at standard MX X-ray wavelengths (~ 1 Å) and longer X-ray wavelengths (~ 2 Å). Datasets were collected from large populations of control and test crystals at standard and longer wavelengths to allow robust statistical methods to be applied, a practice not widely adopted in method development studies in X-ray crystallography. This made it possible to assess the small differences in data quality in the presence and absence of excess surrounding solvent. The effects of surrounding solvent at longer wavelengths appear to be protein dependent, with some proteins tested showing no significant difference and others a significant decrease in data quality at longer wavelengths (Chapter 3). Originally this project aimed to use the new long wavelength in-vacuum MX beamline, I23, at Diamond Light Source UK to carry out phasing experiments using native sulphurs for structure solution. However, the considerable complexity involved in developing in-vacuum MX meant these experiments could not be carried out during the time frame of this thesis. Chapters 4 and 5 outline the production of a novel cancer protein (cancerous inhibitor of protein phosphatase 2A) and two protein targets from the Achromobacter xylosoxidans (Ax) genome intended for sulphur single wavelength anomalous dispersion phasing experiments on I23. Of these proteins, the structure of Ax-α/β hydrolase was solved by conventional methods and is discussed in Chapter 5. Of the protein crystals used in the long wavelength data quality experiments in Chapter 3, the molecular biology of PA3825-EAL, a biofilm regulating protein essential to the swarming ability of Pseudomonas aeruginosa, was investigated further. The crystal structure of PA3825-EAL was solved in the resting, substrate bound and product bound states to high resolution. Comparison of the crystal structures of monomeric and dimeric PA3825-EAL with the inactive dimeric structure of MucR-EAL suggests dimerisation via helix 8 plays a role in inhibition of EAL domains. Prior to this, dimerisation was thought to be an activating factor in EAL domains. The product bound state of PA3825-EAL showed the presence of a previously unreported third metal binding site which may form an essential component of the reaction mechanism of EAL domains. Inability of MucR-EAL to incorporate this third metal due to dimerisation may explain its lack of activity despite possessing the necessary conserved catalytic residues. The fast detector technology and improvements in automated data processing software that allowed diffraction data for large populations of crystals to be collected in Chapters 2 and 3 have also been applied to the development of a serial data collection technique. Of 159 datasets collected from 8 crystals of a copper nitrite reductase from Achromobacter cycloclastes, 45 datasets from a single crystal were analysed to observe the reaction mechanism using high resolution crystal structures. X-ray radiolysis initiated the reaction and high resolution data allowed the conversion of nitrite (NO2) to nitric oxide (NO) to be observed in the crystal. Other aspects of the reaction were investigated from the data series, including a conserved water chain connecting the copper sites which may act as a proton wire to donate a proton and produce NO. This technique may have wide applications to the study of the reaction mechanisms of other metallo-proteins.