1,307 research outputs found

    Clathrin Adaptor Complex-interacting Protein Irc6 Functions through the Conserved C-Terminal Domain.

    Clathrin coats drive transport vesicle formation from the plasma membrane and in pathways between the trans-Golgi network (TGN) and endosomes. Clathrin adaptors play central roles orchestrating assembly of clathrin coats. The yeast clathrin adaptor-interacting protein Irc6 is an orthologue of human p34, which is mutated in the inherited skin disorder punctate palmoplantar keratoderma type I. Irc6 and p34 bind to clathrin adaptor complexes AP-1 and AP-2 and are members of a conserved family characterized by a two-domain architecture. Irc6 is required for AP-1-dependent transport between the TGN and endosomes in yeast. Here we present evidence that the C-terminal two amino acids of Irc6 are required for AP-1 binding and transport function. Additionally, like the C-terminal domain, the N-terminal domain when overexpressed partially restores AP-1-mediated transport in cells lacking full-length Irc6. These findings support a functional role for Irc6 binding to AP-1. Negative genetic interactions with irc6∆ are enriched for genes related to membrane traffic and nuclear processes, consistent with diverse cellular roles for Irc6

    Identification of RNA Binding Proteins and RNA Binding Residues Using Effective Machine Learning Techniques

    Identification and annotation of RNA Binding Proteins (RBPs) and RNA Binding residues from sequence information alone is one of the most challenging problems in computational biology. RBPs play crucial roles in several fundamental biological functions, including transcriptional regulation of RNAs, RNA metabolism and splicing. Existing experimental techniques are time-consuming and costly. Thus, efficient computational identification of RBPs directly from the sequence can be useful to annotate RBPs and assist experimental design. Here, we introduce AIRBP, a computational sequence-based method, which utilizes features extracted from evolutionary information, physicochemical properties, and disordered properties to train a machine learning method designed using stacking, an advanced machine learning technique, for effective prediction of RBPs. Furthermore, it makes use of efficient machine learning algorithms like Support Vector Machine, Logistic Regression, K-Nearest Neighbor and XGBoost (Extreme Gradient Boosting Algorithm). In this research work, we also propose another predictor for efficient annotation of RNA Binding residues. This residue predictor also uses stacking and evolutionary algorithms, and it utilizes various evolutionary, physicochemical and disordered properties to train a robust model. This thesis presents a possible solution to the RBP and RNA binding residue prediction problem through two independent predictors, both of which outperform existing state-of-the-art approaches

    HseSUMO: Sumoylation site prediction using half - sphere exposures of amino acids residues

    Background Post-translational modifications are viewed as an important mechanism for controlling protein function and are believed to be involved in multiple important diseases. However, their profiling using laboratory-based techniques remains challenging. Therefore, the development of accurate computational methods to predict post-translational modifications is particularly important for making progress in this area of research. Results This work explores the use of four half-sphere exposure-based features for computational prediction of sumoylation sites. Unlike most of the previously proposed approaches, which focused on patterns of amino acid co-occurrence, we were able to demonstrate that protein structure-based features could be sufficiently informative to achieve good predictive performance. The evaluation of our method has demonstrated high sensitivity (0.9), accuracy (0.89) and Matthews correlation coefficient (0.78–0.79). We have compared these results to the recently released pSumo-CD method and were able to demonstrate better performance of our method on the same evaluation dataset. Conclusions The proposed predictor HseSUMO uses half-sphere exposures of amino acids to predict sumoylation sites. It has shown promising results on a benchmark dataset when compared with the state-of-the-art method
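
    The three metrics quoted in the abstract above (sensitivity, accuracy and the Matthews correlation coefficient) all derive from a binary confusion matrix. The sketch below shows the standard definitions only; the function name and the counts in the usage line are hypothetical illustrations and are not data from the HseSUMO study.

```python
import math


def confusion_metrics(tp, fp, tn, fn):
    """Return (sensitivity, accuracy, MCC) for a binary confusion matrix."""
    sensitivity = tp / (tp + fn)                 # fraction of true sites recovered
    accuracy = (tp + tn) / (tp + fp + tn + fn)   # fraction of all calls that are correct
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when any marginal is zero; returning 0.0 is a common convention.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, accuracy, mcc


# Hypothetical counts, chosen only to illustrate the arithmetic.
sens, acc, mcc = confusion_metrics(tp=90, fp=12, tn=88, fn=10)
print(f"sensitivity={sens:.2f} accuracy={acc:.2f} mcc={mcc:.2f}")
```

    Unlike accuracy, the Matthews correlation coefficient uses all four cells of the matrix, which is why it is often reported alongside accuracy for imbalanced site-prediction datasets.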

    Protein fold recognition using HMM–HMM alignment and dynamic programming

    Detecting three-dimensional structures of protein sequences is a challenging task in biological sciences. For this purpose, protein fold recognition has been utilized as an intermediate step which helps in classifying a novel protein sequence into one of its folds. The process of protein fold recognition encompasses feature extraction of protein sequences and feature identification through suitable classifiers. Several feature extractors have been developed to retrieve useful information from protein sequences. These features are generally extracted from a protein's sequential, physicochemical and evolutionary properties. The performance in terms of recognition accuracy has also gradually improved over the last decade. However, it has yet to reach a reasonable and widely accepted level. In this work, we first applied HMM–HMM alignment of protein sequences from HHblits to extract the profile HMM (PHMM) matrix. Then we computed the distance between respective PHMM matrices using kernelized dynamic programming. We have recorded significant improvement in fold recognition over the state-of-the-art feature extractors. The improvement in recognition accuracy is in the range of 2.7–11.6% when evaluated on three benchmark datasets from Structural Classification of Proteins
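
    As a rough illustration of computing a distance between two profile matrices by dynamic programming, the sketch below uses plain dynamic time warping with a Euclidean cost between rows. This is a swapped-in, generic technique: the abstract does not specify the kernelized variant the authors used, and the function names and toy matrices here are assumptions.

```python
import math


def row_distance(a, b):
    """Euclidean distance between two profile rows of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(p, q):
    """Dynamic time warping distance between two profile matrices.

    p and q are lists of rows (one row per sequence position); the two
    matrices may differ in length but must share the same row width.
    """
    n, m = len(p), len(q)
    inf = float("inf")
    # cost[i][j] = best cumulative cost aligning p[:i] with q[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = row_distance(p[i - 1], q[j - 1])
            # Extend the cheapest of: skip in p, skip in q, or match both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Toy two-column "profiles" (real PHMM matrices have one column per amino acid).
a = [[1.0, 0.0], [0.0, 1.0]]
b = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
print(dtw_distance(a, b))
```

    The pairwise distances produced this way could then be fed to a distance-based classifier; how the original work uses them is not stated in the abstract.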

    MESSM: a framework for protein threading by neural networks and support vector machines

    Protein threading, which is also referred to as fold recognition, aligns a probe amino acid sequence onto a library of representative folds of known structure to identify a structural similarity. Following the threading technique of the structural profile approach, this research focused on developing and evaluating a new framework, Mixed Environment Specific Substitution Mapping (MESSM), for protein threading by artificial neural networks (ANNs) and support vector machines (SVMs). The MESSM presents a new process to develop an efficient tool for protein fold recognition. It achieved better efficiency while retaining effectiveness in protein prediction. The MESSM has three key components, each of which is a step in the protein threading framework. First, building the fold profile library: given a protein structure with a residue-level environmental description, neural networks are used to generate an environment-specific amino acid substitution (3D-1D) mapping. Second, mixed substitution mapping: a mixed environment-specific substitution mapping is developed by combining the structure-derived substitution score with the sequence profile from well-developed amino acid substitution matrices. Third, confidence evaluation: a support vector machine is employed to measure the significance of the sequence-structure alignment. Four computational experiments were carried out to verify the performance of the MESSM, using the Fischer, ProSup, Lindahl and Wallner benchmarks. Tested on the Fischer, Lindahl and Wallner benchmarks, MESSM achieved fold recognition performance comparable to that of energy potential based threading models. For the Fischer benchmark, MESSM correctly recognises 56 out of 68 pairs, the same performance as COBLATH and SPARKS. The computational experiments show that MESSM is a fast program. It could make an alignment between a probe sequence (150 amino acids) and a profile of 4775 template proteins in 30 seconds on a Pentium IV PC with 1 GB of memory. Also, tested on the ProSup benchmark, the MESSM achieved an alignment accuracy of 59.7%, which is better than current models. The research work was extended to develop a threading score following the threading technique of the contact potential approach. A TES (Threading with Environment-specific Score) model is constructed by neural networks

    Proteome-wide structural analysis identifies warhead- and coverage-specific biases in cysteine-focused chemoproteomics

    Covalent drug discovery has undergone a resurgence over the past two decades and reactive cysteine profiling has emerged in parallel as a platform for ligand discovery through on- and off-target profiling; however, the scope of this approach has not been fully explored at the whole-proteome level. We combined AlphaFold2-predicted side-chain accessibilities for >95% of the human proteome with a meta-analysis of eighteen public cysteine profiling datasets, totaling 44,187 unique cysteine residues, revealing accessibility biases in sampled cysteines primarily dictated by warhead chemistry. Analysis of >3.5 million cysteine-fragment interactions further showed that hit elaboration and optimization drives increased bias against buried cysteine residues. Based on these data, we suggest that current profiling approaches cover a small proportion of potential ligandable cysteine residues and propose future directions for increasing coverage, focusing on high-priority residues and depth. All analysis and produced resources are freely available and extendable to other reactive amino acids

    The Advancement of Mass Spectrometry-based Hydroxyl Radical Protein Footprinting: Application of Novel Analysis Methods to Model Proteins and Apolipoprotein E

    Fast photochemical oxidation of proteins (FPOP) has shown great promise in the elucidation of the regions of a protein's structure that are changed upon interaction with other macromolecules, ligands, or by folding. The advantage of this protein footprinting method is that it utilizes the reactivity of hydroxyl radicals to stably modify solvent accessible residues non-specifically in a microsecond. The extent of *OH labeling at sites assays their solvent accessibility. We have corroborated the predicted profoundly short timescale of labeling empirically, by FPOP-labeling three oxidation-sensitive proteins and examining their global FPOP product outcomes. The novel test developed to validate conformational invariance during labeling can be applied generally to any footprinting methodology where perturbation to protein structure by the footprint labeling is suspected. The stable modifications can be detected and quantified by the same proteolysis, chromatography, and mass spectrometry techniques employed in proteomics studies; however, proteomics software does not automatically report the residue-resolved full-sequence-coverage footprint information found in proteomics-like FPOP data. Here we report the development of software tools to facilitate a comprehensive and efficient analysis of FPOP data, and demonstrate their use in a study of barstar in its unfolded and native states. We next show that SO4-* can serve as an alternative non-specific labeling agent that can be generated by the FPOP apparatus on the same fast timescale as *OH. This demonstrates the tunable nature of FPOP. We have used FPOP to characterize the oligomeric structures of three human apolipoprotein E (ApoE) isoforms and a monomeric mutant in their lipid-free states. Only one isoform of ApoE is strongly associated with Alzheimer's disease; unfortunately, the structural reason for this association is not known, in part because no high resolution structure exists of any isoform. We find that the three common isoforms of ApoE are very similar in their solvent accessible footprint, that their oligomeric interactions involve several regions in the C-terminal domain, and that the N-terminal domain of each resembles the monomeric mutant's N-terminal domain, the truncated form of which has been characterized as a four-helix bundle. Finally, we find by FPOP that ApoE interacts with beta-amyloid peptide 1-42 at a specific site in its N-terminal domain