6 research outputs found

    Analysis of unstructured regions of human cytoplasmic tyrosyl-tRNA synthetase by bioinformatics methods

    Unstructured regions of mammalian cytoplasmic tyrosyl-tRNA synthetase were predicted by bioinformatics methods using 15 web servers. A high probability of an unstructured state was shown for the flexible "KMSKS" loop of the catalytic centre (residues Pro216–Lys231), which adopts a defined conformation during the catalytic act. The intermodule linker region (residues Asp343–Glu359) showed the highest probability of being unstructured. Comparison of these data with the B-factor values for Cα atoms in the crystallographic structures of the N- and C-terminal modules revealed a clear correlation between the bioinformatics and X-ray analysis data. The presence of a flexible intermodule linker is a characteristic feature of proteins that contain an EMAP II-like C-terminal module. It is hypothesized that conformational rearrangements in the linker region may play an essential role in the formation of complexes between these proteins and tRNA.

    Large-scale prediction of long disordered regions in proteins using random forests

    Background: Many proteins contain disordered regions that lack fixed three-dimensional (3D) structure under physiological conditions but have important biological functions. Prediction of disordered regions in protein sequences is important for understanding protein function and in high-throughput determination of protein structures. Machine learning techniques, including neural networks and support vector machines, have been widely used in such predictions. Predictors designed for long disordered regions are usually less successful in predicting short disordered regions. Combining prediction of short and long disordered regions will dramatically increase the complexity of the prediction algorithm and make the predictor unsuitable for large-scale applications. Efficient batch prediction of long disordered regions alone is of greater interest in large-scale proteome studies. Results: A new algorithm, IUPforest-L, for predicting long disordered regions using the random forest learning model is proposed in this paper. IUPforest-L is based on the Moreau-Broto auto-correlation function of amino acid indices (AAIs) and other physicochemical features of the primary sequences. In 10-fold cross validation tests, IUPforest-L can achieve an area of 89.5% under the receiver operating characteristic (ROC) curve. Compared with existing disorder predictors, IUPforest-L has high prediction accuracy and is efficient for predicting long disordered regions in large-scale proteomes. Conclusion: The random forest model based on the auto-correlation functions of the AAIs within a protein fragment and other physicochemical features could effectively detect long disordered regions in proteins. A new predictor, IUPforest-L, was developed to batch predict long disordered regions in proteins, and the server can be accessed from http://dmg.cs.rmit.edu.au/IUPforest/IUPforest-L.php
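    The Moreau-Broto auto-correlation named in this abstract can be illustrated with a minimal sketch. The abstract does not say which amino acid indices or lags IUPforest-L uses, so the Kyte-Doolittle hydropathy scale, the function name moreau_broto and the normalisation by the number of residue pairs are assumptions made only for this illustration; this is not the authors' implementation.

```python
# Illustrative amino acid index: Kyte-Doolittle hydropathy values.
# Assumption: the abstract does not state which indices IUPforest-L uses.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def moreau_broto(sequence, index, max_lag):
    """Return normalised Moreau-Broto auto-correlation values for lags 1..max_lag.

    For lag d the value is the mean of P(i) * P(i + d) over all residue
    pairs that are d positions apart, where P is the chosen index.
    Residues missing from `index` raise KeyError.
    """
    values = [index[residue] for residue in sequence]
    n = len(values)
    features = []
    for lag in range(1, max_lag + 1):
        if lag >= n:
            # No residue pairs exist at this separation.
            features.append(0.0)
            continue
        total = sum(values[i] * values[i + lag] for i in range(n - lag))
        features.append(total / (n - lag))
    return features


# Hypothetical fragment; one feature is produced per lag.
features = moreau_broto("MKSEEPAKQL", KYTE_DOOLITTLE, max_lag=3)
print(len(features))
```

    Each lag contributes one number per index, so a fragment is mapped to a fixed-length feature vector regardless of its length, which is the form of input a random forest expects.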

    Prediction of natively disordered regions in proteins using a bio-basis function neural network

    Recent studies have found that many proteins contain regions that do not form well defined three-dimensional structures in their native states. The study and detection of such disordered regions is very important both for facilitating structural analysis and to aid understanding of protein function. A newly developed pattern recognition algorithm termed a "Bio-basis Function Neural Network" has been applied to the detection of disordered regions in proteins. Different models were trained to study the effect of changing the size of the window used for residue classification. Ten-fold cross validation showed that the estimated prediction accuracy was 95.2% for a window size of 21 residues and an overlap threshold of 30%. Blind tests using the trained models on a data set unrelated to the training set gave a regional prediction accuracy of 81.4% (±0.9%).
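    The window-based residue classification mentioned in this abstract can be sketched as follows. Only the window extraction step is shown; the bio-basis function network itself is not reproduced. The function name residue_windows and the use of "X" to pad the sequence ends are assumptions for illustration, since the abstract does not describe how terminal residues are handled.

```python
def residue_windows(sequence, window_size=21, pad="X"):
    """Return one fixed-length window per residue, centred on that residue.

    The sequence ends are padded with `pad` so that terminal residues
    also receive a full window (an assumed convention for this sketch).
    """
    if window_size < 1 or window_size % 2 == 0:
        raise ValueError("window_size must be a positive odd number")
    half = window_size // 2
    padded = pad * half + sequence + pad * half
    return [padded[i:i + window_size] for i in range(len(sequence))]


# Small window so the result is easy to read.
print(residue_windows("ACDEF", window_size=3))
# ['XAC', 'ACD', 'CDE', 'DEF', 'EFX']
```

    Each window would then be passed to a classifier that labels the central residue as ordered or disordered; changing window_size corresponds to the experiment the abstract describes.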

    Structural studies of putative general stress and related proteins from Deinococcus radiodurans

    This study describes the cloning, expression, purification, biophysical characterisation and crystallisation of DR_1146, a putative general stress protein from the extremophilic bacterium Deinococcus radiodurans (R1). The extraordinary ability of D. radiodurans to resist mutation or apoptosis on exposure to high doses of ionising radiation has formed the basis of a structural genomics project underway at the European Synchrotron Radiation Facility (ESRF), Grenoble, France. The work presented in this study forms part of the ESRF’s D. radiodurans initiative, and was funded by the Biotechnology and Biological Sciences Research Council (BBSRC) and the ESRF as an Industrial Cooperative Award in Science and Engineering (CASE) PhD studentship. A period of one year was spent on secondment at the ESRF, working within the Macromolecular Crystallography Group. Several constructs of the dr_1146 gene have been successfully overexpressed in E. coli cells to give high yields of target protein. Purification by immobilised metal affinity chromatography (IMAC) was facilitated by the incorporation of a 6xHis tag and supplemented by a final gel filtration step. Although high purity levels were achieved, imaging by SDS-PAGE analysis identified that DR_1146 was susceptible to stringent proteolysis. It is thought that initial crystallisation trials were unsuccessful due to inhomogeneity of the sample caused by reported degradation of the target protein. Biophysical characterisation of DR_1146 by isothermal titration calorimetry (ITC) and fluorescence spectroscopy (FS) identified a moderate affinity of 4-11 μM for the flavin molecules riboflavin, flavin mononucleotide (FMN) and flavin adenine dinucleotide (FAD). Differential scanning calorimetry (DSC) and circular dichroism (CD) experiments demonstrated an increase in chemical and thermal stability of the protein on binding to the flavin molecule FMN. 
Analytical ultracentrifugation (AUC) and nuclear magnetic resonance (NMR) spectroscopy were employed to investigate the solution behaviour of DR_1146 in the presence of FMN. AUC results uncovered a monomer-dimer equilibrium, with DR_1146 self-associating to form a dimer at a concentration of 7.67 μM. NMR spectroscopy showed that global changes occur within the structure of DR_1146 on binding to FMN. The high quality of spectra obtained showed potential for 3-D structure determination by NMR if ordered crystals could not be obtained for X-ray diffraction. Interestingly, analysis of NMR spectra proved to be integral to identifying a homogeneous sample for successful crystallisation of DR_1146. By monitoring chemical shifts it was possible to determine the time needed for degradation of DR_1146 to cease, and the amount of FMN needed to ensure saturation of binding sites. From this particular sample, a stable 28 kDa fragment was isolated by gel filtration. Automated sitting-drop vapour-diffusion experiments resulted in the growth of yellow DR_1146-FMN crystals for which, although poor in quality, X-ray diffraction was obtained. Overall this study reflects the importance and advantage of incorporating information gained from biophysical characterisation into the strategies employed for successful protein crystallisation. The characterisation of DR_1146 as a flavoprotein points towards a possible role in electron transfer due to the extensive redox capacity of flavin. This could implicate the protein in the production of damaging reactive oxygen species (ROS) as a result of irradiation, contributing to oxidative stress levels. Alternatively, if DR_1146 is identified as a FMN-binding pyridoxine 5'-phosphate oxidase (PNPOx) enzyme, as sequence homology suggests, it could play a role in detoxification and stress response through production of pyridoxal 5'-phosphate (PLP), a known scavenger of ROS. 
Only further characterisation and elucidation of a 3-D structure would confirm or dispel these functional hypotheses and ultimately provide a greater understanding of how D. radiodurans is able to deal with such oxidising conditions. Simultaneously, experiments were carried out on other soluble and membrane protein targets from D. radiodurans and their corresponding homologues from Streptococcus pneumoniae (TIGR4). The aim of comparable studies was to identify key structural or functional differences between the two Gram-positive bacterial strains. Identification of features unique to D. radiodurans, but unconserved in S. pneumoniae, could contribute to further understanding of bacterial radioresistance. SP_1651 is a thiol peroxidase which forms part of the Mn-ABC transport system in S. pneumoniae. Its homologue from D. radiodurans, DR_2242, is a putative thiol-specific antioxidant protein, the structure of which has been solved by Dr. Dave Hall as part of the ESRF’s structural genomics project (unpublished). The aim of this part of the project was to elucidate the structure of SP_1651 so that a comparison with DR_2242 could be made. The sp_1651 gene (psaD) was successfully expressed and purified to homogeneity by IMAC and gel filtration. After the proteolytic removal of a 6xHis tag, the purified protein was crystallised by sitting-drop vapour-diffusion. Preliminary diffraction with a resolution limit of 3.2 Å was obtained; however, the data showed high mosaic spread. Unfortunately, attempts to reproduce the initial crystals failed, and hence structural comparisons with DR_2242 could not be made. DR_0463 is a 108 kDa maltooligosyltrehalose synthase (MTSase) which has been shown to catalyse the breakdown of maltooligosaccharide (or starch) into the disaccharide trehalose. The full length gene was expressed in BL21(DE3)pLysS cells, producing large yields of insoluble target protein. 
DR_0463 was solubilised with 8 M urea and then purified by IMAC in the presence of the denaturant. The low affinity of DR_0463 for the Ni2+ matrix of the HisTrap column proved to be problematic when trying to obtain homogeneity. However, by sequentially repeating IMAC purification up to three times with the same protein sample, a large proportion of impurities were removed. SP_1648 (PsaB) is an ATP-binding protein that forms part of the Mn-ATP transport system in S. pneumoniae, and its homologue from D. radiodurans, DR_2284, is predicted to share a similar function. Purification of soluble SP_1648, expressed in B834(DE3) cells, was complicated by an inability to bind the protein to the column matrix for IMAC. In the case of DR_2284, expression trials yielded only a minute amount of insoluble protein in BL21-AI competent cells. The bottlenecks in early expression and purification stages provided valuable experience in dealing with problematic proteins. As an introduction to molecular cloning, two genes predicted to encode integral membrane proteins from D. radiodurans were cloned for preliminary expression trials. This work was carried out at the ESRF and contributed to an extension of the structural genomics project, to incorporate membrane protein targets from D. radiodurans. Full length forms of the genes thought to encode an undecaprenyl diphosphatase (UDP) and a diacylglycerol kinase (DGKA) were successfully cloned into pET-28b, with incorporation of separate N- and C-terminal 6xHis tags.

    The accurate prediction of disordered regions in protein sequences using machine learning approaches

    A major challenge in the post-genome era is to determine the function of proteins. The traditional structure-function paradigm assumes that the function of a protein is contingent on it folding into a stable three-dimensional structure. However, many proteins contain intrinsically unstructured or disordered regions (DRs) under physiological conditions, and yet they still carry out important functions. Determination of the disordered regions in proteins is therefore an important step towards the determination of their functions. Traditional experimental approaches are generally time-consuming and expensive. The efficient and cost-effective computer-aided automatic prediction of DRs is thus an attractive alternative. To this end, we propose the novel application of machine learning models and physicochemical features extracted from protein sequences for predicting long, short and global disorder in proteins. To improve the understandability of disorder prediction, rule-based predictors are proposed, which are not only able to predict DRs, but can also quantify previously unknown associations between order/disorder status and sequences. The prediction process is transparent and simple to explain. As DRs of different lengths possess different properties, to achieve a high accuracy of prediction, we propose predictors specific to long, short and global disorder prediction. These predictors are distinct from each other in terms of their features, the machine learning models used, and the methods of prediction. We thoroughly investigate the database of physicochemical properties of amino acid indices and select the indices most correlated with disorder. Based on these properties, novel feature transforms including autocorrelation and wavelet transforms (WTs) are applied to DR prediction. 
According to the results of cross-validation tests, our long DR predictor based on autocorrelation achieves the highest accuracy of prediction among long DR predictors at an AUC (Area Under ROC Curve) value of 89.5%. A short DR predictor based on WTs achieves an AUC value of 88.7%, which is comparable to the most accurate short DR predictors. The global DR predictor achieves an AUC value of 96.1%, close to the optimal value. A major bottleneck of large scale DR prediction is the time efficiency constraint that is attributed to slow feature generation stages and complicated prediction methods. Both our long and short DR predictors are built from simple methods of prediction and feature space. Our web service for long DR prediction can process an uploaded file of multiple sequences
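    The AUC values quoted in this abstract measure how well a predictor's scores rank disordered residues above ordered ones. A minimal sketch of that calculation, using the pairwise-comparison definition of the area under the ROC curve, is given below. The function name roc_auc and the toy scores and labels are made up for illustration and are not data from the thesis.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    The AUC equals the probability that a randomly chosen positive
    example (label 1, e.g. a disordered residue) receives a higher
    score than a randomly chosen negative example (label 0).
    Tied scores count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy values only: three of the four positive/negative pairs are ranked correctly.
print(roc_auc([0.9, 0.7, 0.4, 0.2], [1, 0, 1, 0]))  # 0.75
```

    The pairwise loop is quadratic in the number of examples, so for proteome-scale evaluations a rank-based implementation from an established library would be the more practical choice.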

    A bioinformatical approach for a reliable determination of short motifs for SUMO and Atg8 interaction in Saccharomyces cerevisiae

    Regulatory processes are initiated by posttranslational modifications of proteins which alter their activity, stability, localization or their interaction with other proteins. Among many other processes, decoding the SUMOylation signal or the recognition of lipidated Atg8 represent starting points for downstream signalling pathways, initiated by SUMO interacting motifs (SIMs) and Atg8 interacting motifs (AIMs) in protein sequences. The low information content of SIMs and AIMs makes them hard to distinguish from spurious sequence matches. This thesis presents a bioinformatical approach for detecting previously unknown SIMs and AIMs. The first part of this thesis describes the bioinformatical SIM detection method. The overall method is common to all SIM types, whereas the individual SIM detection screens are adapted to the characteristics of SIMa, SIMb and SIMr. Two sets of phylogenetic distances from budding yeast, in combination with information theoretical approaches and sliding averages, serve as a conservation measure. A combination of bioinformatical tools is used to estimate whether a protein segment is unstructured, non-globular or globular. The bioinformatical approach in this thesis uses these conservation and structural features for the evolution of a functionality scoring measure for unknown SIM instances. Experimental interaction studies show previously unknown SUMO interactions for Dbp10, Drs1, Rfc1, Rad18 and Tdp1 from the bioinformatical SIM detection screens. Dbp10 and Drs1 are involved in ribosomal biogenesis in the nucleolus. A SIM in this biological context has not yet been reported, whereas SUMOylation is involved in the release of pre-ribosomal particles into the nucleoplasm. Tdp1, Rfc1 and Rad18 are involved in DNA replication and damage repair, where SUMOylation is a crucial activity factor for other proteins. The motif in Rfc1 identified from the bioinformatical detection screen is shown in mutation studies to be responsible for SUMO interaction. 
This motif also causes an observable growth phenotype under chemically induced DNA damage stress. The motif in Rad18 has since been identified by Parker and Ulrich. The second part of this study describes the application of analogous methods for a bioinformatical AIM detection approach. The conservation characteristics of established AIMs are found to be similar to those of SIMs, whereas the structural context is harder to represent with bioinformatical methods.