
    RSLpred: an integrative system for predicting subcellular localization of rice proteins combining compositional and evolutionary information

    The attainment of a complete map-based sequence for rice (Oryza sativa) is a major milestone for the research community. Identifying the localization of encoded proteins is key to understanding their functional characteristics and facilitating their purification. Our proposed method, RSLpred, is an effort in this direction for genome-scale subcellular localization prediction of encoded rice proteins. First, support vector machine (SVM)-based modules were developed using traditional amino acid composition, dipeptide (i+1) composition and four-parts amino acid composition, achieving overall accuracies of 81.43, 80.88 and 81.10%, respectively. Second, a similarity search-based module was developed using position-specific iterated BLAST (PSI-BLAST) and achieved 68.35% accuracy. Another module, developed using evolutionary information extracted from the position-specific scoring matrix (PSSM) of a protein sequence, achieved an accuracy of 87.10%. In this study, a large number of modules were developed using various encoding schemes, such as higher-order dipeptide composition, N- and C-terminal composition, split amino acid composition and hybrid information. To benchmark RSLpred, it was tested on an independent set of rice proteins, where it outperformed widely used prediction methods such as TargetP, Wolf-PSORT, PA-SUB, Plant-Ploc and ESLpred. To assist the plant research community, an online web tool, 'RSLpred', has been developed for subcellular localization prediction of query rice proteins; it is freely accessible at http://www.imtech.res.in/raghava/rslpred
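The composition-based encodings named above can be illustrated with a short Python sketch. This is not RSLpred's code: it assumes the 20 standard residues, skips any other letters, and normalizes counts to fractions; the function names `aac` and `dpc` are hypothetical. The `gap` parameter shows how higher-order dipeptides (residue i paired with residue i+1+gap) could be counted.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def aac(seq: str) -> list[float]:
    """Amino acid composition: fraction of each standard residue (20 features)."""
    residues = [r for r in seq.upper() if r in AMINO_ACIDS]
    n = len(residues)
    if n == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [residues.count(a) / n for a in AMINO_ACIDS]


def dpc(seq: str, gap: int = 0) -> list[float]:
    """Dipeptide composition over pairs (i, i+1+gap); gap=0 gives (i, i+1) pairs (400 features)."""
    s = seq.upper()
    step = gap + 1
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(s) - step):
        pair = s[i] + s[i + step]
        if pair in counts:  # pairs containing non-standard letters are skipped
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]
```

Concatenating `aac(seq) + dpc(seq)` gives a 420-dimensional vector that could be passed to any SVM implementation; the kernel and parameters used by RSLpred are not stated in the abstract.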

    Cheminformatics Tools to Explore the Chemical Space of Peptides and Natural Products

    Cheminformatics facilitates the analysis, storage and collection of large quantities of chemical data, such as molecular structures, molecular properties and biological activities, and it has revolutionized medicinal chemistry for small molecules. However, its application to larger molecules is still underrepresented. This thesis attempts to fill this gap and extend the cheminformatics approach to large molecules and peptides. The thesis is divided into two parts. The first part presents the implementation and application of two new molecular descriptors: the macromolecule extended atom pair fingerprint (MXFP) and the MinHashed atom pair fingerprint of radius 2 (MAP4). MXFP is an atom pair fingerprint suitable for large molecules, and here it is used to explore the chemical space of non-Lipinski molecules within the widely used PubChem and ChEMBL databases. MAP4 is a MinHashed hybrid of substructure and atom pair fingerprints suitable for encoding both small and large molecules. MAP4 is first benchmarked against commonly used atom pair and substructure fingerprints, and then used to investigate the chemical space of microbial and plant natural products with the aid of machine learning and chemical space mapping. The second part of the thesis focuses on peptides and opens with a review chapter that describes approaches to discovering novel peptide structures and the known peptide chemical space. Then, a genetic algorithm that uses MXFP in its fitness function is described and challenged to generate peptide analogs of peptidic or non-peptidic queries. Finally, supervised and unsupervised machine learning are used to generate novel antimicrobial and non-hemolytic peptide sequences.
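MinHashing, the step that gives MAP4 its name, can be sketched generically: each molecule is reduced to a set of string "shingles", and the fraction of matching minimum hash values across many random hash functions estimates the Jaccard similarity of two sets. The sketch below is a plain MinHash over arbitrary strings, not the MAP4 implementation; the shingle strings in the usage note are hypothetical stand-ins for MAP4's atom-pair shingles, and the hash family (a*h + b modulo a Mersenne prime) is an assumed, commonly used choice.

```python
import hashlib
import random

PRIME = (1 << 61) - 1   # Mersenne prime used as the hash modulus
MASK32 = (1 << 32) - 1  # keep 32-bit signature values


def base_hash(token: str) -> int:
    """Deterministic 32-bit hash (Python's built-in hash() of str is salted per process)."""
    return int.from_bytes(hashlib.sha1(token.encode("utf-8")).digest()[:4], "little")


def make_hash_params(num_perm: int, seed: int = 1) -> list[tuple[int, int]]:
    """Random (a, b) pairs, each defining a hash function h'(x) = (a*x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(num_perm)]


def minhash(shingles: set[str], params: list[tuple[int, int]]) -> list[int]:
    """Signature: the minimum hashed value of the set under each hash function."""
    if not shingles:
        raise ValueError("cannot MinHash an empty set")
    hashes = [base_hash(s) for s in shingles]
    return [min(((a * h + b) % PRIME) & MASK32 for h in hashes) for a, b in params]


def estimated_jaccard(sig1: list[int], sig2: list[int]) -> float:
    """Fraction of signature positions where the two sets share the same minimum."""
    return sum(x == y for x, y in zip(sig1, sig2)) / len(sig1)
```

For example, the toy sets `{"C|1|C", "C|2|O", "N|3|C", "O|1|C"}` and `{"C|1|C", "C|2|O", "N|3|C", "S|2|C"}` share 3 of 5 distinct shingles, so with 256 hash functions the estimate should land near the true Jaccard value of 0.6. Comparing fixed-length signatures instead of full shingle sets is what makes this kind of fingerprint cheap to store and search.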

    Post-Translational Protein Modifications involved in Exo- and Endocytosis of Synaptic Vesicles

    Neurotransmitter release is a key step that enables information flow between the pre- and post-synapse. However, regulation of neurotransmitter release remains an intricate and largely unexplored matter, despite recent advances in understanding the neurotransmitter release machinery and in analyzing the synaptic proteome and its protein modifications. Indeed, post-translational protein modifications such as phosphorylation are well suited to quickly fine-tune neurotransmitter release locally ("in place"), by affecting tertiary protein structures and protein-protein interactions, and globally, by modulating signaling pathways. Here, the investigations focused on how protein phosphorylation in synaptosomes depends on synaptic vesicle (SV) cycling, on determining kinase-substrate interactions, and on the modulatory effects of selected sites on exo- and endocytosis. The synaptic phosphoproteome was analyzed using TiO2-based enrichment of phosphorylated peptides, followed by chemical labeling with isobaric tandem mass tags (TMT) and mass spectrometry-based quantification. Synaptosomes were employed as a functional model of a synapse, as they contain the required neurotransmitter release machinery and respond to stimulation. First, the applicability of electrical stimulation was tested. Field stimulation evoked reproducible glutamate release that was significantly suppressed in the absence of Ca2+, though it remained uncertain to what degree the release is governed by exocytosis. Therefore, another approach, combining KCl-induced depolarization with botulinum neurotoxin (BoNT) treatment, was used to identify phosphorylation events that depend on SV cycling. BoNTs specifically cleave SNARE proteins and thus block exocytosis and SV cycling, but do not impede the Ca2+ influx evoked by plasma membrane depolarization.
Comparison of phosphorylation events in synaptosomes stimulated in the presence of Ca2+ or EGTA (0 net Ca2+), or pre-treated with BoNTs, identified two groups of sites: sites that were differentially phosphorylated following BoNT treatment, i.e., SV-cycling-dependent sites, and sites that were differentially phosphorylated between the Ca2+ and EGTA conditions but did not change under BoNT treatment, i.e., primarily Ca2+-dependent sites. Further differential expression analysis revealed that BoNT treatment mostly caused dephosphorylation of synaptic proteins. A kinase-substrate analysis showed that >25% of BoNT-responsive sites are predicted MAPK substrates and 20% of primarily Ca2+-dependent sites are presumably regulated by CaMKII, which corroborates the Ca2+ dependence of these phosphorylation events. SV-cycling-dependent phosphorylation sites on syntaxin-1 (T21/T23-Stx1), synaptobrevin (S75-Vamp2) and cannabinoid receptor-1 (S314/T322-Cnr1) were further investigated for their impact on exo- and endocytosis. In collaboration with Dr. Eugenio Fornasiero and Prof. Dr. Silvio O. Rizzoli, corresponding phosphomimetic and non-phosphorylatable variants of these proteins were expressed in cultured hippocampal neurons. Imaging of the pH sensor pHluorin coupled to synaptobrevin-2 revealed that expressing the phosphomimetic and non-phosphorylatable variants affected exo- and endocytosis in neurons. This work is the first to investigate electrical stimulation in relation to Ca2+-dependent neurotransmitter release and exocytosis in synaptosomes. It further provides a comprehensive draft of the synaptosomal phosphoproteome and is the first to demonstrate its global dependence on active SV cycling. The analysis of cultured hippocampal neurons expressing non-phosphorylatable and phosphomimetic mutants of the presynaptic proteins syntaxin-1, synaptobrevin-2 and cannabinoid receptor-1 further demonstrates that the identified SV-cycling-dependent sites affect exo- and endocytosis.
2021-11-0
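The two site categories described above follow a simple decision rule, sketched below under heavy simplification: one intensity value per condition, a hypothetical |log2 fold change| cut-off of 1, and no replicate statistics (a real analysis would apply a statistical test across TMT channels and replicates). The class, field and function names are illustrative only.

```python
import math
from dataclasses import dataclass


@dataclass
class SiteIntensities:
    """Hypothetical intensities for one phosphosite under the three stimulation conditions."""
    site: str
    ca: float    # KCl depolarization in the presence of Ca2+
    egta: float  # KCl depolarization with EGTA (0 net Ca2+)
    bont: float  # KCl depolarization with Ca2+ after BoNT pre-treatment


def log2fc(treated: float, reference: float) -> float:
    """log2 fold change of a condition relative to the Ca2+ reference."""
    return math.log2(treated / reference)


def classify(s: SiteIntensities, cutoff: float = 1.0) -> str:
    """Check the BoNT response first, then the Ca2+ vs EGTA contrast."""
    if abs(log2fc(s.bont, s.ca)) >= cutoff:
        return "SV-cycling-dependent"
    if abs(log2fc(s.egta, s.ca)) >= cutoff:
        return "primarily Ca2+-dependent"
    return "unchanged"
```

Checking the BoNT contrast first mirrors the definition in the text: a site only counts as primarily Ca2+-dependent if it does not respond to BoNT treatment.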

    VARIATIONS IN MICROARRAY BASED GENE EXPRESSION PROFILING: IDENTIFYING SOURCES AND IMPROVING RESULTS

    Two major issues hinder the application of microarray-based gene expression profiling as a diagnostic or prognostic tool in clinical laboratories. The first issue is the sheer volume and high dimensionality of gene expression data from microarray experiments, which require advanced algorithms to extract meaningful gene expression patterns that correlate with biological impact. The second issue is the substantial variation in microarray gene expression data, which impairs the performance of analysis methods and makes sharing or integrating microarray data very difficult. Variation can be introduced by many sources, including the DNA microarray technology itself and the experimental procedures. Many of these variations have not been characterized, measured or linked to their sources. In the first part of this dissertation, a decision tree learning method was shown to perform as well as more widely accepted classification methods in partitioning cancer samples using microarray data. More importantly, the results demonstrate that variation introduced into microarray data by tissue sampling and tissue handling compromised the performance of classification methods. In the second part of this dissertation, variation introduced by T7-based in vitro transcription labeling methods was investigated in detail. The results demonstrated that individual amplification methods significantly biased gene expression data, even though the methods compared in this study were all derivatives of the T7 RNA polymerase-based in vitro transcription labeling approach. The observed variation can be partially explained by the number of biotinylated nucleotides used for labeling and the incubation time of the in vitro transcription reactions. This variation can generate discordant gene expression results even from the same RNA samples and cannot be corrected by post-experiment analysis, including advanced normalization techniques.
Studies in this dissertation stress that experimental and analytical methods must work together. This dissertation also emphasizes the importance of standardizing DNA microarray technology and experimental procedures in order to optimize gene expression analysis and to create quality standards compatible with the clinical application of this technology. These findings should be taken into account especially when comparing data from different platforms and when standardizing protocols for clinical applications in pathology.
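A decision tree chooses, at each node, the gene and expression threshold that best separate the sample classes. The sketch below shows that step for a single gene using information gain, as in ID3/C4.5-style learners; it is a generic illustration, not the specific decision tree method evaluated in the dissertation, and the expression values in the usage note are made up.

```python
import math
from collections import Counter
from typing import Optional


def entropy(labels: list[str]) -> float:
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_split(values: list[float], labels: list[str]) -> tuple[float, Optional[float]]:
    """Return (information gain, threshold) of the best binary split on one gene."""
    base = entropy(labels)
    pairs = sorted(zip(values, labels))
    best_gain, best_threshold = 0.0, None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # equal expression values cannot be separated by a threshold
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        gain = base - weighted
        if gain > best_gain:
            best_gain = gain
            best_threshold = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint between neighbours
    return best_gain, best_threshold
```

With made-up values `[2.1, 7.8, 2.5, 8.3]` labelled normal, tumor, normal, tumor, the best threshold falls midway between 2.5 and 7.8 and separates the classes perfectly (gain of 1 bit). Because the split depends directly on the measured values, systematic shifts from sample handling or labeling would move such thresholds, which is one way variation can degrade classifier performance.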

    Protein Domain Linker Prediction: A Direction for Detecting Protein-Protein Interactions

    Protein chains are generally long and consist of multiple domains. Domains are the basic elements of protein structure that can exist, evolve and function independently. The accurate and reliable identification of protein domains and their interactions has important implications for several areas of protein research. Accurate prediction of protein domains is a fundamental stage in both experimental and computational proteomics. Domain knowledge is an initial step in protein tertiary structure prediction, which can give insight into how a protein works. Knowledge of domains is also useful for classifying proteins, understanding their structures, functions and evolution, and predicting protein-protein interactions (PPI). However, predicting structural domains within proteins is a challenging task in computational biology. A promising direction for domain prediction is to detect inter-domain linkers and then predict the regions of the protein sequence in which the structural domains are located. Protein-protein interactions occur at almost every level of cell function. Identifying the interactions among proteins and their associated domains provides a global picture of cellular functions and biological processes. It is also an essential step in constructing PPI networks for humans and other organisms. PPI prediction has been considered a promising alternative to traditional drug design techniques. Identifying possible virus-host protein interactions can lead to a better understanding of infection mechanisms and, in turn, to the development of new drugs and treatment optimization. In this work, a compact and accurate approach to inter-domain linker prediction is developed based solely on protein primary structure information. Inter-domain linker knowledge is then used to predict structural domains and detect PPI.
The research work in this dissertation can be summarized in three main contributions. The first contribution is predicting protein inter-domain linker regions by introducing the concept of an amino acid compositional index and refining the prediction with the simulated annealing optimization technique. The second contribution is identifying structural domains based on inter-domain linker knowledge. The inter-domain linker knowledge, represented by the compositional index, is enhanced by incorporating biological knowledge, represented by amino acid physicochemical properties, to develop a well-optimized random forest classifier for predicting novel domains and inter-domain linkers. In the third contribution, domain knowledge is used to predict protein-protein interactions. This is achieved by characterizing structural domains within protein sequences, analyzing their interactions, and predicting protein interactions based on their interacting domains. The experimental studies and the higher accuracy achieved support the proposed framework.
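The abstract does not define the compositional index, so the sketch below assumes one plausible form: the log ratio of a residue's frequency in known linker segments to its frequency in domain segments (with a small pseudocount), averaged over a sliding window so that high values flag candidate linker regions. The training segments, window size and function names are hypothetical, and the simulated annealing refinement is not shown.

```python
import math
from collections import Counter


def residue_frequencies(segments: list[str]) -> dict[str, float]:
    """Relative frequency of each residue across a list of sequence segments."""
    counts = Counter("".join(segments))
    total = sum(counts.values())
    return {aa: c / total for aa, c in counts.items()}


def compositional_index(linkers: list[str], domains: list[str], pseudo: float = 1e-3) -> dict[str, float]:
    """Assumed index: log of linker frequency over domain frequency, per residue."""
    f_link = residue_frequencies(linkers)
    f_dom = residue_frequencies(domains)
    alphabet = set(f_link) | set(f_dom)
    return {aa: math.log((f_link.get(aa, 0.0) + pseudo) / (f_dom.get(aa, 0.0) + pseudo)) for aa in alphabet}


def smoothed_profile(seq: str, index: dict[str, float], window: int = 5) -> list[float]:
    """Average the index over a centred window; high values suggest linker regions."""
    half = window // 2
    scores = [index.get(aa, 0.0) for aa in seq]  # unseen residues contribute 0
    profile = []
    for i in range(len(seq)):
        lo, hi = max(0, i - half), min(len(seq), i + half + 1)
        profile.append(sum(scores[lo:hi]) / (hi - lo))
    return profile
```

Smoothing matters because a single linker-favoured residue inside a domain should not be called a linker; only a run of such residues raises the windowed average. A threshold on this profile is the kind of parameter an optimizer such as simulated annealing could then tune.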

    Relation Prediction over Biomedical Knowledge Bases for Drug Repositioning

    Identifying new potential treatment options for medical conditions that cause human disease burden is a central task of biomedical research. Since not every candidate drug can be tested in animal studies and clinical trials, in vitro approaches are attempted first to identify promising candidates. Likewise, identifying other essential relations (e.g., causation, prevention) between biomedical entities is also critical to understanding biomedical processes. Hence, it is crucial to develop automated relation prediction systems that can yield plausible biomedical relations to expedite the discovery process. In this dissertation, we demonstrate three approaches to predicting treatment relations between biomedical entities for the drug repositioning task using existing biomedical knowledge bases. In the computer science literature, our approaches can broadly be labeled link prediction or knowledge base completion. First, we investigate the predictive power of graph paths connecting entities in the publicly available biomedical knowledge base SemMedDB, whose entities and relations together constitute a large knowledge graph. To that end, we build logistic regression models using semantic graph pattern features extracted from SemMedDB to predict treatment and causative relations in the Unified Medical Language System (UMLS) Metathesaurus. Second, we study matrix and tensor factorization algorithms for predicting drug repositioning pairs in repoDB, a general-purpose gold standard database of approved and failed drug–disease indications. The idea is to predict repoDB pairs by approximating the given input matrix/tensor, where the value of each cell represents the existence of a relation drawn from the SemMedDB and UMLS knowledge bases. The essential goal is to predict test pairs that have a blank cell in the input matrix/tensor based on the shared biomedical context among existing non-blank cells.
Our final approach involves graph convolutional neural networks, in which entities and relation types are embedded in a vector space that incorporates neighborhood information. Specifically, we minimize an objective function that guides the model toward concept/relation embeddings in which distance scores for positive relation pairs are lower than those for negative ones. Overall, our results demonstrate that recent link prediction methods applied to automatically curated, and hence imprecise, knowledge bases can nevertheless yield highly accurate drug candidate predictions with an appropriate configuration of both the methods and the datasets used.
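The objective described above, in which positive pairs must receive a lower distance score than negative ones, is commonly implemented as a margin ranking loss. The sketch below uses a TransE-style distance ||h + r - t|| on fixed toy embeddings as a stand-in; it assumes this scoring function and omits the graph convolutional encoder that would produce the embeddings, so it illustrates only the loss, not the dissertation's model. All entity and relation names are hypothetical.

```python
import math
import random

Vector = list[float]


def distance(h: Vector, r: Vector, t: Vector) -> float:
    """TransE-style score ||h + r - t||_2; lower means a more plausible triple."""
    return math.sqrt(sum((a + b - c) ** 2 for a, b, c in zip(h, r, t)))


def margin_loss(d_pos: float, d_neg: float, margin: float = 1.0) -> float:
    """Hinge loss: zero once the negative triple is at least `margin` farther than the positive."""
    return max(0.0, margin + d_pos - d_neg)


def sample_negative(triple, entities, rng):
    """Corrupt the tail entity to build a (probably) false triple."""
    head, rel, tail = triple
    return head, rel, rng.choice([e for e in entities if e != tail])


def triple_loss(triple, ent_emb, rel_emb, rng, margin=1.0):
    """Loss for one positive triple against one randomly corrupted negative."""
    neg = sample_negative(triple, list(ent_emb), rng)
    d_pos = distance(ent_emb[triple[0]], rel_emb[triple[1]], ent_emb[triple[2]])
    d_neg = distance(ent_emb[neg[0]], rel_emb[neg[1]], ent_emb[neg[2]])
    return margin_loss(d_pos, d_neg, margin)


# Hypothetical 2-D embeddings; a trained model would learn these.
ENT = {"drugX": [0.0, 0.0], "diseaseY": [1.0, 0.0], "diseaseZ": [0.0, 3.0]}
REL = {"TREATS": [1.0, 0.0]}

if __name__ == "__main__":
    print(triple_loss(("drugX", "TREATS", "diseaseY"), ENT, REL, random.Random(0)))
```

Here the positive triple already sits at distance 0 and every corrupted tail is at least 1 away, so the loss is zero; during training, gradients of this loss would push embeddings apart until that margin holds across the knowledge graph.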

    Bioinformatic analysis of bacterial and eukaryotic amino-terminal signal peptides

    Ph.D., Doctor of Philosophy

    Functional prediction of bioactive toxins in scorpion venom through bioinformatics

    Ph.D., Doctor of Philosophy