
    Prediction of peptides binding to MHC class I alleles by partial periodic pattern mining

    MHC (Major Histocompatibility Complex) is a key player in the immune response of an organism. It is important to be able to predict which antigenic peptides will bind to a specific MHC allele and which will not, creating possibilities for controlling the immune response and for applications in immunotherapy. However, a problem encountered in computational binding prediction methods for MHC class I is the presence of bulges and loops in the peptides, which change the total length. Most machine learning methods in use today require the sequences to be of the same length to successfully mine the binding motifs. We propose the use of time-based data mining methods in motif mining so that motifs can be mined position-independently. In addition, information from both binding and non-binding peptides is used, in contrast to other methods, which rely only on binding peptides. The prediction results are between 70% and 80% for the tested alleles.

    Discovering Valuable Items from Massive Data

    Suppose there is a large collection of items, each with an associated cost and an inherent utility that is revealed only once we commit to selecting it. Given a budget on the cumulative cost of the selected items, how can we pick a subset of maximal value? This task generalizes several important problems such as multi-arm bandits, active search and the knapsack problem. We present an algorithm, GP-Select, which utilizes prior knowledge about similarity between items, expressed as a kernel function. GP-Select uses Gaussian process prediction to balance exploration (estimating the unknown value of items) and exploitation (selecting items of high value). We extend GP-Select to be able to discover sets that simultaneously have high utility and are diverse. Our preference for diversity can be specified as an arbitrary monotone submodular function that quantifies the diminishing returns obtained when selecting similar items. Furthermore, we exploit the structure of the model updates to achieve an order of magnitude (up to 40X) speedup in our experiments without resorting to approximations. We provide strong guarantees on the performance of GP-Select and apply it to three real-world case studies of industrial relevance: (1) Refreshing a repository of prices in a Global Distribution System for the travel industry, (2) Identifying diverse, binding-affine peptides in a vaccine design task and (3) Maximizing clicks in a web-scale recommender system by recommending items to users.

    Vaccine Development

    Vaccination is the most effective and scientifically based means of protection against infectious diseases, especially in this era of the COVID-19 pandemic. This book examines several issues related to the development of vaccines against viral, bacterial, and parasitic infections

    Discovering discriminative and class-specific sequence and structural motifs in proteins

    Finding recurring motifs is an important problem in bioinformatics. Such motifs can be used for any number of problems including sequence classification, label prediction, knowledge discovery and biological engineering of proteins fit for a specific purpose. Our motivation is to create a better foundation for the research and development of novel motif mining and machine learning methods that can extract class-specific and discriminative motifs using both sequence and structural features. We propose the building blocks of a general machine learning framework to act on a biological input. This thesis presents a combination of elements that are intended to be applicable to a variety of biological problems. Ideally, the learner should only require as input a number of biological data instances that are classified into a number of different classes as defined by the researchers. The output should be the factors and motifs that discriminate between those classes (for reasonable, non-random class definitions). This ideal workflow requires two main steps. The first step is the representation of the biological input with features that contain the significant information the researcher is looking for. Due to the complexity of the macromolecules, abstract representations are required to convert the real-world representation into quantifiable descriptors that are suitable for motif mining and machine learning. The second step of the proposed workflow is the motif mining and knowledge discovery step. Using these informative representations, an algorithm should be able to find discriminative, class-specific motifs that are over-represented in one class and under-represented in the other. This thesis presents novel procedures for representation of the proteins to be used in a variety of machine learning algorithms, and two separate motif mining algorithms, one based on temporal motif mining and the other on deep learning, that can work with the given biological data. The descriptors and the learners are applied to a wide range of computational problems encountered in life sciences.

    Frequent associations between CTL and T-Helper epitopes in HIV-1 genomes and implications for multi-epitope vaccine designs

    Background: Epitope vaccines have been suggested as a strategy to counteract viral escape and development of drug resistance. Multiple studies have shown that Cytotoxic T-Lymphocyte (CTL) and T-Helper (Th) epitopes can generate strong immune responses in Human Immunodeficiency Virus (HIV-1). However, not much is known about the relationship among different types of HIV epitopes, particularly those epitopes that can be considered potential candidates for inclusion in multi-epitope vaccines. Results: In this study we used association rule mining to examine the relationships between different types of epitopes (CTL, Th and antibody epitopes) from nine protein-coding HIV-1 genes to identify strong associations as potent multi-epitope vaccine candidates. Our results revealed 137 association rules that were consistently present in the majority of reference and non-reference HIV-1 genomes and included epitopes of two different types (CTL and Th) from three different genes (Gag, Pol and Nef). These rules involved 14 non-overlapping epitope regions that frequently co-occurred despite high mutation and recombination rates, including in genomes of circulating recombinant forms. These epitope regions were also highly conserved at both the amino acid and nucleotide levels, indicating strong purifying selection driven by functional and/or structural constraints and hence a diminished likelihood of successful escape mutations. Conclusions: Our results provide a comprehensive systematic survey of CTL, Th and Ab epitopes that are both highly conserved and co-occur among all subtypes of HIV-1, including circulating recombinant forms. Several co-occurring epitope combinations were identified as potent candidates for inclusion in multi-epitope vaccines, including epitopes that are immuno-responsive to different arms of the host immune machinery and can enable stronger and more efficient immune responses, similar to responses achieved with adjuvant therapies. The signature of strong purifying selection acting at the nucleotide level of the associated epitopes indicates that these regions are functionally critical, although the exact reasons behind such sequence conservation remain to be elucidated.

    APPLICATION OF MACHINE LEARNING APPROACHES TO EMPOWER DRUG DEVELOPMENT

    Human health, one of the major topics in life science, is facing intensified challenges, including cancer, pandemic outbreaks, and antimicrobial resistance. Thus, new medicines with unique advantages, including peptide-based vaccines and permeable small-molecule antimicrobials, are urgently needed. However, the drug development process is long, complex, and risky, with no guarantee of success. Also, improvements in the techniques applied in genomics, proteomics, computational biology, and clinical trials significantly increase data complexity and volume, which imposes higher requirements on the drug development pipeline. In recent years, machine learning (ML) methods have been employed to support drug development in various aspects and have been shown to be highly effective. Here, we explored the application of advanced ML approaches to empower the development of peptide-based vaccines and permeable antimicrobials. First, peptide-based vaccines targeting pancreatic cancer and COVID-19 were predicted and screened via multiple approaches. Next, novel structure-based methods to improve the performance of peptide:MHC binding affinity prediction were developed, including an HLA modeling pipeline that provides structures for docking-based peptide binder validation, and hierarchical clustering of HLA I into supertypes and subtypes that have similar peptide binding specificity. Finally, the physicochemical properties governing the permeability of small molecules into multidrug-resistant Pseudomonas aeruginosa cells were selected using a random forest model. In conclusion, the use of machine learning methods could accelerate the drug development process at a lower cost and promote data-based decision-making if used properly.

    A Novel Empirical Free Energy Function That Explains And Predicts Protein–Protein Binding Affinities

    A free energy function can be defined as a mathematical expression that relates macroscopic free energy changes to microscopic or molecular properties. Free energy functions can be used to explain and predict the affinity of a ligand for a protein and to score and discriminate between native and non-native binding modes. However, there is a natural tension between developing a function fast enough to solve the scoring problem but rigorous enough to explain and predict binding affinities. Here, we present a novel, physics-based free energy function that is computationally inexpensive, yet explanatory and predictive. The function results from a derivation that assumes the cost of polar desolvation can be ignored and that includes a unique and implicit treatment of interfacial water-bridged interactions. The function was parameterized on an internally consistent, high quality training set giving R² = 0.97 and Q² = 0.91. We used the function to blindly and successfully predict binding affinities for a diverse test set of 31 wild-type protein–protein and protein–peptide complexes (R² = 0.79, rmsd = 1.2 kcal mol⁻¹). The function performed very well in direct comparison with a recently described knowledge-based potential and the function appears to be transferable. Our results indicate that our function is well suited for solving a wide range of protein/peptide design and discovery problems.

    Induction of T-cell responses against mutation-specific peptides from malignant pediatric brain tumor samples

    Medulloblastoma is the most common malignant brain tumor in childhood and adolescence and constitutes an important cause of cancer-related death in pediatric patients. Although standard therapy including surgery, chemotherapy and radiation can cure up to 80% of average-risk patients, it entails severe long-term cognitive adverse effects and is unsatisfactory in advanced tumors. Therefore, alternative treatment strategies need to be established. Immunotherapeutic approaches like peptide vaccination and adoptive T-cell transfer (ATT) aim at enhancing self-protection through detection and elimination of malignant cells. Tumor-specific neoepitopes are promising targets for ATT as they are expressed exclusively by cancer tissue. Moreover, administration of mutation-derived peptide vaccines allows augmenting the endogenous immune response through abundant presentation of tumor antigen. In this proof-of-concept study we demonstrate a highly individualized approach where patient-specific neoepitopes are determined and tested for immunogenicity. Primary tumor samples from two pediatric medulloblastoma patients were analyzed in this project. Tumor-specific mutations were identified by next generation sequencing of tumor tissue and whole blood. Variants were confirmed by deep sequencing. In order to identify neoepitope peptides presented by the patients’ human leucocyte antigen (HLA) molecules, HLA binding affinity was predicted in silico using the netMHC database. The respective peptides were synthesized, and blood cells from healthy donors matching the patients’ HLA types were used to provide T lymphocytes and dendritic cells for antigen presentation. After seven restimulations in vitro, CD8+ cytotoxic T-cell reactivity against neoepitopes was assessed via flow-cytometric analysis of Interferon gamma and Tumor Necrosis Factor alpha release. A successful de novo T-cell response was induced for 9 of 19 tested peptides. In this proof-of-principle study we show that induction of a T-cell response against medulloblastoma-derived neoantigens is feasible despite low mutational burden and low immunogenicity. In the future, this strategy can be used to synthesize individualized peptide cocktails for peptide vaccination or to identify medulloblastoma-specific T-cell receptors for ATT. Long-term aims of this study are the identification of medulloblastoma/T-cell interaction and improvement of current treatment options for pediatric patients with advanced medulloblastoma.