16 research outputs found

    Predicting receptor-ligand pairs through kernel learning

    Background: Regulation of cellular events is often initiated via extracellular signaling. Extracellular signaling occurs when a circulating ligand interacts with one or more membrane-bound receptors. Identification of receptor-ligand pairs is thus an important and specific form of protein-protein interaction (PPI) prediction. Results: Given a set of disparate data sources (expression data, domain content, and phylogenetic profile), we seek to predict new receptor-ligand pairs. We create a combined kernel classifier and assess its performance against the Database of Ligand-Receptor Partners (DLRP) gold standard as well as the method proposed by Gertz et al. Among our findings, our predictions for the TGF-β family reconstruct over 76% of the supported edges (0.76 recall and 0.67 precision) of the receptor-ligand bipartite graph defined by the DLRP gold standard. In addition, for the TGF-β family, the combined kernel classifier improves on the Gertz et al. method by a factor of approximately 1.5: our method has an F-measure of 0.71, while that of Gertz et al. has an F-measure of 0.48. Conclusions: The prediction of receptor-ligand pairings is a difficult and complex task. We have demonstrated that kernel learning on multiple data sources provides a stronger alternative to the existing method for this task.
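The F-measure of 0.71 quoted in this abstract is consistent with the reported precision (0.67) and recall (0.76) if it is read as the usual harmonic mean of the two. The abstract does not state which F-measure variant was used, so the balanced F1 form below is an assumption, and the function name `f_measure` is purely illustrative.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall.

    Assumption: the abstract's "F-measure" is the balanced F1 score.
    """
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both scores are zero
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract for the TGF-beta family.
combined_kernel_f = f_measure(precision=0.67, recall=0.76)
print(round(combined_kernel_f, 2))         # 0.71

# Relative improvement over the F-measure reported for Gertz et al. (0.48).
print(round(combined_kernel_f / 0.48, 1))  # 1.5
```

Under that reading, the "factor of approximately 1.5" is simply the ratio of the two F-measures.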

    ReLiance: a machine learning and literature-based prioritization of receptor-ligand pairings.

    Motivation: The prediction of receptor-ligand pairings is an important area of research, as intercellular communications are mediated by the successful interaction of these key proteins. As the exhaustive assaying of receptor-ligand pairs is impractical, a computational approach to predict pairings is necessary. We propose a workflow to carry out this interaction prediction task, using a text mining approach in conjunction with a state-of-the-art prediction method, as well as a widely accessible and comprehensive dataset. Among several modern classifiers, random forests were found to perform best at this prediction task. The classifier was trained on an experimentally validated dataset of receptor-ligand pairs from the Database of Ligand-Receptor Partners (DLRP). New examples, co-cited with the training receptors and ligands, are then classified using the trained classifier. After applying our method, we find that we are able to predict receptor-ligand pairs within the GPCR family with a balanced accuracy of 0.96. Upon further inspection, we find several supported interactions that were not present in the Database of Interacting Proteins (DIP database). The resulting high-quality predictions, with their measured balanced accuracy, are stored in the publicly available database ReLiance. Availability: http://homes.esat.kuleuven.be/~bioiuser/ReLianceDB/index.php Contact: [email protected]; ernesto.iacucci@gmail.com Supplementary information: Supplementary data are available at Bioinformatics online.

    Which clustering algorithm is better for predicting protein complexes?

    Background: Protein-protein interactions (PPI) play a key role in determining the outcome of most cellular processes. The correct identification and characterization of protein interactions and the networks they comprise is critical for understanding the molecular mechanisms within the cell. Large-scale techniques such as pull-down assays and tandem affinity purification are used to detect protein interactions in an organism. Today, relatively new high-throughput methods like yeast two-hybrid, mass spectrometry, microarrays, and phage display are also used to reveal protein interaction networks. Results: In this paper we evaluated four different clustering algorithms using six different interaction datasets. We parameterized the MCL, Spectral, RNSC and Affinity Propagation algorithms and applied them to six PPI datasets produced experimentally by Yeast 2 Hybrid (Y2H) and Tandem Affinity Purification (TAP) methods. The predicted clusters, or so-called protein complexes, were then compared and benchmarked against known complexes stored in published databases. Conclusions: While results may differ with parameterization, the MCL and RNSC algorithms seem to be more promising and more accurate at predicting PPI complexes. Moreover, they predict more complexes than the other reviewed algorithms in absolute numbers. On the other hand, the spectral clustering algorithm achieves the highest valid prediction rate in our experiments. However, it is nearly always outperformed by both RNSC and MCL in terms of geometrical accuracy, and it generates fewer valid clusters than any other reviewed algorithm. This article demonstrates various metrics for evaluating the accuracy of such predictions, as presented in the text below. Supplementary material can be found at: http://www.bioacademy.gr/bioinformatics/projects/ppireview.htm

    Computational Prediction and Prioritization of Receptor-Ligand Pairs (Computationele voorspelling en prioritisatie van receptor-ligand paren)

    We have worked on the receptor-ligand pairing problem in three main studies. In our first study, using an LS-SVM classifier, we show that we are able to match members of the chemokine and TGF-β families more accurately than a previously published method. Notably, we achieve a recall of 0.76, compared with 0.44, for the matching of receptor-ligand pairs in the TGF-β family. In our subsequent study, we benchmarked several machine learning techniques, and tested several parameters, on the receptor-ligand interaction prediction task. We found that we could reach a balanced accuracy of 0.84. In our final work, we produce a publicly available database of our results from a text-based in silico prediction workflow. The resulting database contains several key findings, particularly predictions in the GPCR family with a balanced accuracy of 0.96. The receptor-ligand prediction task is an essential one, as predicting such pairs is an important issue for wet labs, biotech, and pharmaceutical companies. Through several studies, we have determined the most appropriate methodology to predict receptor-ligand pairs and have made high-quality predictions available at our ReLianceDB website (http://homes.esat.kuleuven.be/~bioiuser/ReLianceDB), a tool to aid in performing effective and targeted research. Status: published

    Ontological characterization of high through-put biological data

    A result of high-throughput experimentation is the demand to summarize and profile results in a meaningful and comparative form. Such experimentation often includes the production of a set of distinguished genes. For example, this distinguished set may correspond to a cluster of co-expressed genes over many conditions or a set of genes from a large scale yeast two-hybrid study. Understanding the biological relevance of this set will encompass annotation of the genes followed by investigation of shared properties found among these annotations. While the set of distinguished genes might have hundreds of annotations associated with them, only a portion of these annotations will represent meaningful aspects associated with the experiment. Identification of the meaningful aspects can be focused by application of a statistic to an annotation resource. One such annotation resource is Gene Ontology (GO), a controlled vocabulary which hierarchically structures annotation terms (classifications) onto which genes can be mapped. Given a distinguished set of genes and a classification, we wish to determine if the number of distinguished genes mapped to that classification is significantly greater or less than would be expected by chance. In estimating these probabilities, researchers have employed the hypergeometric model under differing frameworks. Assumptions made in these frameworks have ignored key issues regarding the mapping of genes to GO and have resulted in inaccurate p-values. Here we show how dynamic programming can be used to compute exact p-values for enrichment or depletion of a particular GO classification. This removes the necessity of approximating the statistics or p-values, as has been the common practice. We apply our methods to a dataset describing labour and compare p-values based on exact and approximate computations of several different statistics for measuring enrichment. We find significant disagreement between commonly emplo
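For orientation, the plain hypergeometric tail that this abstract describes as the common starting point can be sketched as follows. This is not the authors' dynamic programming method, and it makes exactly the simplifying assumption the abstract criticizes (each gene treated as an independent draw, ignoring how genes map onto the GO hierarchy). The function name and the example numbers are hypothetical.

```python
from fractions import Fraction
from math import comb


def hypergeom_upper_tail(population: int, annotated: int,
                         drawn: int, observed: int) -> Fraction:
    """P(X >= observed) for X ~ Hypergeometric(population, annotated, drawn).

    population: total number of genes considered
    annotated:  genes in the population mapped to the GO classification
    drawn:      size of the distinguished gene set
    observed:   distinguished genes mapped to the classification
    """
    if not (0 <= annotated <= population and 0 <= drawn <= population):
        raise ValueError("annotated and drawn must lie in [0, population]")
    # Smallest and largest overlap that is combinatorially possible.
    lowest = max(observed, 0, drawn - (population - annotated))
    highest = min(annotated, drawn)
    ways = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(lowest, highest + 1)
    )
    return Fraction(ways, comb(population, drawn))


# Hypothetical numbers: 10 genes, 5 annotated, 5 distinguished, all 5 overlap.
p_enrichment = hypergeom_upper_tail(10, 5, 5, 5)
print(p_enrichment)  # 1/252
```

The depletion p-value under the same model is the complementary lower tail, `1 - hypergeom_upper_tail(population, annotated, drawn, observed + 1)`.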

    Mobilization of pre-existing polyclonal T cells specific to neoantigens but not self-antigens during treatment of a patient with melanoma with bempegaldesleukin and nivolumab

    T cells that recognize self-antigens and mutated neoantigens are thought to mediate antitumor activity of immune checkpoint blockade (ICB) in melanoma. Few studies have analyzed self and neoantigen-specific T cell responses in patients responding to ICB. Here, we report a patient with metastatic melanoma who had a durable clinical response after treatment with the programmed cell death protein 1 inhibitor, nivolumab, combined with the first-in-class CD122-preferential interleukin-2 pathway agonist, bempegaldesleukin (BEMPEG, NKTR-214). We used a combination of antigen-specific T cell expansion and measurement of interferon-γ secretion to identify multiple CD4+ and CD8+ T cell clones specific for neoantigens, lineage-specific antigens and cancer testis antigens in blood and tumor from this patient prior to and after therapy. Polyclonal CD4+ and CD8+ T cells specific to multiple neoantigens but not self-antigens were highly enriched in pretreatment tumor compared with peripheral blood. Neoantigen, but not self-antigen-specific T cell clones expanded in frequency in the blood during successful treatment. There was evidence of dramatic immune infiltration into the tumor on treatment, and a modest increase in the relative frequency of intratumoral neoantigen-specific T cells. These observations suggest that diverse CD8+ and CD4+ T cell clones specific for neoantigens present in tumor before treatment had a greater role in immune tumor rejection as compared with self-antigen-specific T cells in this patient. Trial registration number: NCT02983045

    Unraveling genomic variation from next generation sequencing data

    Elucidating the content of a DNA sequence is critical to understanding and decoding the genetic information of any biological system. As next generation sequencing (NGS) techniques have become cheaper and more advanced in throughput over time, great innovations and breakthrough conclusions have been generated in various biological areas. Some of the areas shaped by these technological advances are evolution of species, microbial mapping, population genetics, genome-wide association studies (GWAS), comparative genomics, variant analysis, gene expression, gene regulation, epigenetics and personalized medicine. While NGS techniques stand as key players in modern biological research, the analysis and interpretation of the vast amount of data produced is not an easy or trivial task and remains a great challenge in the field of bioinformatics. Therefore, efficient tools that cope with information overload, tackle the high complexity, and provide meaningful visualizations to make knowledge extraction easier are essential. In this article, we briefly describe the sequencing methodologies and the equipment available for these analyses, along with the data formats of the files they produce. We conclude with a thorough review of tools developed to efficiently store, analyze and visualize such data, with emphasis on structural variation analysis and comparative genomics. Finally, we comment on their functionality, strengths and weaknesses, and we discuss how future applications could further develop in this field.