689 research outputs found

    Peptide classification using optimal and information theoretic syntactic modeling

    We consider the problem of classifying peptides using the information residing in their syntactic representations. This problem, which has been studied for more than a decade, has typically been investigated using distance-based metrics that involve the edit operations required in the peptide comparisons. In this paper, we demonstrate that the Optimal and Information Theoretic (OIT) model of Oommen and Kashyap [22], which is applicable to syntactic pattern recognition, can be used to tackle the peptide classification problem. We advocate modeling the differences between compared strings as a mutation model consisting of random substitutions, insertions and deletions that obey the OIT model. We then show that the probability measure obtained from the OIT model can be regarded as a sequence similarity metric, with which a support vector machine (SVM)-based peptide classifier can be devised. The classifier we have built has been tested with eight different substitution matrices and on two different data sets, namely the HIV-1 protease cleavage sites and the T-cell epitopes. The results show that the OIT model performs significantly better than the one that uses a Needleman-Wunsch sequence alignment score, that it is less sensitive to the substitution matrix than the other methods compared, and that, when combined with an SVM, it is among the best peptide classification methods available.
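
For orientation, the Needleman-Wunsch baseline named in this abstract can be sketched as a global alignment score computed by dynamic programming. This is a minimal illustration, not the authors' implementation: the function name and the scoring values (`match=1`, `mismatch=-1`, `gap=-2`) are assumptions, and the paper's experiments use amino acid substitution matrices rather than a flat match/mismatch score.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment score of strings a and b with a linear gap penalty.

    Scoring values are illustrative assumptions, not taken from the paper.
    """
    # prev[j] is the best score for aligning the processed prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(prev[j - 1] + sub,   # align a[i-1] with b[j-1]
                            prev[j] + gap,       # gap in b
                            curr[j - 1] + gap))  # gap in a
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    # Two illustrative peptide strings of different length.
    print(needleman_wunsch_score("SQNYPIVQ", "SQNYPIV"))
```

A score of this kind could be fed to a classifier as a pairwise similarity; whether the resulting similarity matrix is a valid SVM kernel depends on the scoring scheme and would need to be checked separately.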

    The importance of physicochemical characteristics and nonlinear classifiers in determining HIV-1 protease specificity

    This paper reviews recent research relating to the application of bioinformatics approaches to determining HIV-1 protease specificity, outlines outstanding issues, and presents a new approach to addressing these issues. Leading machine learning theory for the problem currently suggests that the direct encoding of the physicochemical properties of the amino acid substrates is not required for optimal performance. A number of amino acid encoding approaches that incorporate potentially relevant physicochemical properties of the substrate are identified and evaluated using a nonlinear, task-decomposition-based neuroevolution algorithm. The results are compared against a recent benchmark set on a nonlinear classifier using only amino acid sequence and identity information. Ensembles of these nonlinear classifiers using the physicochemical properties of the substrate are demonstrated to consistently outperform the recently published state-of-the-art linear support vector machine based approach in out-of-sample evaluations.

    Unify Markov model for Rational Design and Synthesis of More Safe Drugs. Predicting Multiple Drugs Side Effects

    The 9th International Electronic Conference on Synthetic Organic Chemistry, session Computational Chemistry. Most present mathematical models for the rational design and synthesis of new drugs consider only the molecular structure. In the present article we intend to extend the use of Markov chain models to define novel molecular descriptors that also consider other parameters, such as target site or biological effect. Specifically, this model takes into consideration not only the molecular structure but also the specific biological system the drug affects. Here, a general Markov model is developed that describes 19 different drug side effects grouped into 8 affected biological systems for 178 drugs, giving 270 cases in total. The data were processed by Linear Discriminant Analysis (LDA), classifying drugs according to their specific side effects; forward stepwise selection was the strategy fixed for variable selection. The average percentages of good classification in the training/prediction sets, with the corresponding numbers of compounds, were 100/95.8% for endocrine manifestations (18 out of 18)/(13 out of 14); 90.5/92.3% for gastrointestinal manifestations (38 out of 42)/(30 out of 32); 88.5/86.5% for systemic phenomena (23 out of 26)/(17 out of 20); 81.8/77.3% for neurological manifestations (27 out of 33)/(19 out of 25); 81.6/86.2% for dermal manifestations (31 out of 38)/(25 out of 29); 78.4/85.1% for cardiovascular manifestations (29 out of 37)/(24 out of 28); 77.1/75.7% for breathing manifestations (27 out of 35)/(20 out of 26) and 75.6/75% for psychiatric manifestations (31 out of 41)/(23 out of 31). Additionally, a Back-Projection Analysis (BPA) was carried out for two ulcerogenic drugs to demonstrate, in structural terms, the physical interpretation of the models obtained. This article develops, for the first time, a model that encompasses a large number of drug side effects grouped into specific biological systems using stochastic absolute probabilities of interaction (Apk (j)).

    Structural investigation of HIV-1 GP160 and gag-pol polyprotein recognition site with HIV-1 protease

    HIV (human immunodeficiency virus) is an infectious virus that, if left untreated, can progress to AIDS (acquired immunodeficiency syndrome), a devastating and lethal disease with millions of related fatalities across the globe since its discovery. The viral particle is made up of multiple mature proteins, which need to be processed for it to form. The two major precursor proteins are the polyprotein gag-pol and gp160. Gag-pol is cleaved by the viral enzyme HIV-1 protease into the major structural proteins and all viral enzymes used by HIV, and gp160 is cleaved by the host enzyme furin, creating the transmembrane/cell surface protein complex gp41/gp120 responsible for binding to host cells. Currently, a partial crystal structure is available for gp160, but it is missing the furin cleavage site, and no full structure of gag-pol is available, with only partial structures of its constituent proteins being accessible. This study focuses on the areas of these proteins that are cleaved to create the key proteins used by HIV. Homology models were generated for the entirety of gag-pol and gp160 using the I-TASSER web server. The area of interest in gp160, the loop cleaved by furin and the surrounding area, was refined via loop modelling. The gag-pol structure and the loop model were then further refined using molecular dynamics (MD) simulation. The area of particular interest on gag-pol, the cleavage sites, was further investigated by binding HIV-1 protease to the 4 most conserved sites. The sites were analysed, and the best binding interaction was taken into further MD to observe it over time. Overall, this study has produced a structure of the gp160 furin cleavage site, the first structure of gag-pol in its entirety, simulated for 100 ns, as well as information on the docking of HIV-1 protease with gag-pol and observations of the binding interactions over time. These data create a basis for further study into the processing of HIV proteins, as well as a starting point for the design of possible targeted therapies aimed at preventing the cleavage of the precursor proteins.

    Estimating evolutionary dynamics of cleavage site peptides among H5HA avian influenza employing mathematical information theory approaches

    Estimating the evolutionary conservation of cleavage site peptides in the HA protein across all strains facilitates vaccine development against pandemic influenza. Conserved epitopes may be useful for diagnosing animals infected with the influenza virus and for preventing its spread to other regions [1]. In the preliminary stage of this study, in silico analysis of hemagglutinin was applied to predict potential cleavage sites of each strain, employing SigCleave [2] and the SignalP 3.0 server [3]. The second stage of the study focused on analyzing the structure of the connecting peptides of hemagglutinin cleavage sites, based on the available experimental data. Our results reveal a higher frequency of basic amino acids, essential for processing by the cellular protease, among pathogenic strains compared with non/low pathogenic strains. In addition, two complementary methods for identifying conserved amino acids were applied: a statistical entropy based method, possibly the most sensitive tool for estimating the diversity of peptides [5], and relative entropy estimation. Analysis with both methods demonstrates that the connecting peptide of the HA cleavage site of AIV in the United States was highly conserved over long periods of time. Entropy values help to select those sequences that have the highest potential for mutation in a broad spectrum of the avian population. Position 340 among our group of strains with the entropy value of 0.877928 has the highest bit of information value where highly conserved positions are those with
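
As a rough illustration of the statistical entropy method mentioned in this abstract, the sketch below computes the Shannon entropy, in bits, of each column of a set of aligned, equal-length peptide sequences. The function names and the toy sequences are assumptions made for illustration; the abstract does not specify the estimator, logarithm base, or gap handling the authors used.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of the residues observed at one alignment position."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def per_position_entropy(sequences):
    """Entropy of every column in a list of equal-length aligned sequences."""
    # zip(*sequences) yields one tuple of residues per alignment column.
    return [column_entropy(col) for col in zip(*sequences)]


if __name__ == "__main__":
    # Hypothetical aligned peptides, not taken from the study.
    toy_alignment = ["RKKR", "RKKR", "RETR", "RKTR"]
    for position, h in enumerate(per_position_entropy(toy_alignment), start=1):
        print(position, round(h, 3))
```

Under this definition a fully conserved position has entropy 0, and larger values indicate more residue diversity at that position.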

    An overview of bioinformatics tools for epitope prediction: Implications on vaccine development

    Exploitation of recombinant DNA and sequencing technologies has led to a new concept in vaccination in which isolated epitopes, capable of stimulating a specific immune response, have been identified and used to achieve advanced vaccine formulations, replacing those constituted by whole-pathogen formulations. In this context, bioinformatics approaches play a critical role in analyzing multiple genomes to select the protective epitopes in silico. It is conceived that cocktails of defined epitopes or chimeric protein arrangements, including the target epitopes, may provide a rational design capable of eliciting convenient humoral or cellular immune responses. This review presents a comprehensive compilation of the most advantageous online immunological software and searchable, in order to facilitate the design and development of vaccines. An outlook on how these tools are supporting vaccine development is presented. HIV and influenza have been taken as examples of promising developments in vaccination against hypervariable viruses. Perspectives in this field are also envisioned.

    Studying protein-ligand interactions using a Monte Carlo procedure

    [eng] Biomolecular simulations have been widely used in the study of protein-ligand interactions; comprehending the mechanisms involved in the prediction of binding affinities would have a significant repercussion in the pharmaceutical industry. Notwithstanding the intrinsic difficulty of sampling the phase space, hardware and methodological developments make computer simulations a promising candidate in the resolution of biophysically relevant problems. In this context, the objective of the thesis is the development of a protocol that permits studying protein-ligand interactions, with a view to applying it in drug discovery pipelines. The author contributed to rewriting PELE, our Monte Carlo sampling procedure, using good practices of software development. These involved testing and improving the readability, modularity, encapsulation, maintenance and version control, to name just a few. Importantly, the recoding resulted in competitive, cutting-edge software that is able to integrate new algorithms and platforms, such as new force fields or a graphical user interface, while being reliable and efficient. The rest of the thesis is built upon this development. At this point, we established a protocol of unbiased all-atom simulations using PELE, often combined with Markov (state) Models (MSM) to characterize the energy landscape exploration. In the thesis, we have shown that PELE is a suitable tool to map complex mechanisms in an accurate and efficient manner. For example, we successfully conducted studies of ligand migration in prolyl oligopeptidases and nuclear hormone receptors (NHRs). Using PELE, we could map the ligand migration and binding pathway in such complex systems in less than 48 hours. On the other hand, with this technique we often run batches of hundreds of simulations to reduce the wall-clock time.
MSM is a useful technique to join these independent simulations in a single statistical model, as individual trajectories only need to characterize the energy landscape locally, and the global characterization can be extracted from the model. We successfully applied the combination of these two methodologies to quantify binding mechanisms and estimate the binding free energy in systems involving NHRs and tyrosinases. However, this technique represents a significant computational effort. To reduce the computational load, we developed a new methodology to overcome the sampling limitations caused by the ruggedness of the energy landscape. In particular, we used a procedure of iterative simulations with adaptive spawning points based on reinforcement learning ideas. This permits sampling binding mechanisms at a fraction of the cost, and represents a speedup of an order of magnitude in complex systems. Importantly, we show in a proof-of-concept that it can be used to estimate absolute binding free energies. Overall, we hope that the methodologies presented herein help streamline the drug design process.
[spa] Biomolecular simulations have been widely used in the study of protein-ligand interactions. Understanding the mechanisms involved in predicting binding affinities has a major impact on the pharmaceutical industry. Despite the intrinsic difficulties of sampling phase space, hardware and methodological improvements make computer simulations a promising candidate for solving highly relevant biophysical problems. In this context, the objective of the thesis is to develop a protocol that enables a more efficient study of protein-ligand interactions, with a view to disseminating PELE, a Monte Carlo sampling procedure, in drug design. Our main focus has been to overcome the sampling limitations caused by the ruggedness of the energy landscape, applying our protocol to carry out detailed atomistic analyses of nuclear hormone receptors, G protein-coupled receptors, tyrosinases and prolyl oligopeptidases, in collaboration with a pharmaceutical company and several experimental laboratories. All in all, we hope that the methodologies presented in this thesis help improve drug design.
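
The idea in this abstract of joining many independent simulations into one Markov state model can be sketched with a simple count-based estimator. This is a minimal illustration under stated assumptions, not the thesis's pipeline or PELE's API: the function names are hypothetical, the trajectories are assumed to be already discretized into integer state labels, and a production analysis would normally use a dedicated MSM package with clustering, lag-time validation and a reversible estimator.

```python
def estimate_transition_matrix(trajectories, n_states, lag=1):
    """Row-normalized transition matrix pooled from several discrete trajectories.

    Assumes lag >= 1 and state labels in range(n_states).
    """
    counts = [[0.0] * n_states for _ in range(n_states)]
    for traj in trajectories:
        # Each trajectory contributes only its own (local) transitions.
        for i, j in zip(traj[:-lag], traj[lag:]):
            counts[i][j] += 1.0
    matrix = []
    for i, row in enumerate(counts):
        total = sum(row)
        if total == 0:
            # State with no observed outgoing transitions: make it absorbing
            # so that every row still sums to 1 (an illustrative choice).
            row = [1.0 if j == i else 0.0 for j in range(n_states)]
            total = 1.0
        matrix.append([c / total for c in row])
    return matrix


def stationary_distribution(matrix, n_iter=1000):
    """Approximate the stationary distribution by repeated propagation."""
    n = len(matrix)
    pi = [1.0 / n] * n
    for _ in range(n_iter):
        pi = [sum(pi[i] * matrix[i][j] for i in range(n)) for j in range(n)]
    return pi


if __name__ == "__main__":
    # Two short, hypothetical discretized trajectories over states 0 and 1.
    trajs = [[0, 0, 1, 1, 0], [1, 0, 0, 1]]
    T = estimate_transition_matrix(trajs, n_states=2)
    print(T)
    print(stationary_distribution(T))
```

The stationary probabilities give the global picture that no single short trajectory provides, which is the property the abstract relies on when it combines batches of independent runs.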

    MYRbase: analysis of genome-wide glycine myristoylation enlarges the functional spectrum of eukaryotic myristoylated proteins

    We evaluated the evolutionary conservation of glycine myristoylation within eukaryotic sequences. Our large-scale cross-genome analyses, available as MYRbase, show that the functional spectrum of myristoylated proteins is currently largely underestimated. We give experimental evidence for in vitro myristoylation of selected predictions. Furthermore, we classify five membrane-attachment factors that occur most frequently in combination with, or even replacing, myristoyl anchors, as some protein family examples show