38 research outputs found

    Ad hoc learning of peptide fragmentation from mass spectra enables an interpretable detection of phosphorylated and cross-linked peptides

    Mass spectrometry-based proteomics provides a holistic snapshot of the entire protein set of living cells on a molecular level. Currently, only a few deep learning approaches exist that involve peptide fragmentation spectra, which represent partial sequence information of proteins. Commonly, these approaches lack the ability to characterize less studied or even unknown patterns in spectra because of their use of explicit domain knowledge. Here, to elevate unrestricted learning from spectra, we introduce ‘ad hoc learning of fragmentation’ (AHLF), a deep learning model that is end-to-end trained on 19.2 million spectra from several phosphoproteomic datasets. AHLF is interpretable, and we show that peak-level feature importance values and pairwise interactions between peaks are in line with corresponding peptide fragments. We demonstrate our approach by detecting post-translational modifications, specifically protein phosphorylation based on only the fragmentation spectrum without a database search. AHLF increases the area under the receiver operating characteristic curve (AUC) by an average of 9.4% on recent phosphoproteomic data compared with the current state of the art on this task. Furthermore, use of AHLF in rescoring search results increases the number of phosphopeptide identifications by a margin of up to 15.1% at a constant false discovery rate. To show the broad applicability of AHLF, we use transfer learning to also detect cross-linked peptides, as used in protein structure analysis, with an AUC of up to 94%

    Understanding the limits of animal models as predictors of human biology: lessons learned from the sbv IMPROVER Species Translation Challenge

    Motivation: Inferring how humans respond to external cues such as drugs, chemicals, viruses or hormones is an essential question in biomedicine. Very often, however, this question cannot be addressed because it is not possible to perform experiments in humans. A reasonable alternative consists of generating responses in animal models and ‘translating’ those results to humans. The limitations of such translation, however, are far from clear, and systematic assessments of its actual potential are urgently needed. sbv IMPROVER (systems biology verification for Industrial Methodology for PROcess VErification in Research) was designed as a series of challenges to address translatability between humans and rodents. This collaborative crowd-sourcing initiative invited scientists from around the world to apply their own computational methodologies to a multilayer systems biology dataset composed of phosphoproteomics, transcriptomics and cytokine data derived from normal human and rat bronchial epithelial cells exposed in parallel to 52 different stimuli under identical conditions. Our aim was to understand the limits of species-to-species translatability at different levels of biological organization: signaling, transcriptional and release of secreted factors (such as cytokines). Participating teams submitted 49 different solutions across the sub-challenges, two-thirds of which were statistically significantly better than random. Additionally, similar computational methods were found to range widely in their performance within the same challenge, and no single method emerged as a clear winner across all sub-challenges. Finally, computational methods were able to effectively translate some specific stimuli and biological processes in the lung epithelial system, such as DNA synthesis, cytoskeleton and extracellular matrix, translation, immune/inflammation and growth factor/proliferation pathways, better than the expected response similarity between species.
Supplementary information: Supplementary data are available at Bioinformatics online

    Setting the basis of best practices and standards for curation and annotation of logical models in biology

    The fast accumulation of biological data calls for their integration, analysis and exploitation through more systematic approaches. The generation of novel, relevant hypotheses from this enormous quantity of data remains challenging. Logical models have long been used to answer a variety of questions regarding the dynamical behaviours of regulatory networks. As the number of published logical models increases, there is a pressing need for systematic model annotation, referencing and curation in community-supported and standardised formats. This article summarises the key topics and future directions of a meeting entitled ‘Annotation and curation of computational models in biology’, organised as part of the 2019 [BC]2 conference. The purpose of the meeting was to develop and drive forward a plan towards the standardised annotation of logical models, review and connect various ongoing projects of experts from different communities involved in the modelling and annotation of molecular biological entities, interactions, pathways and models. This article defines a roadmap towards the annotation and curation of logical models, including milestones for best practices and minimum standard requirements

    Optimization of logical networks for the modelling of cancer signalling pathways

    Cancer is one of the main causes of death throughout the world. The survival of patients diagnosed with various cancer types remains low despite the considerable progress of recent decades. Some of the reasons for this unmet clinical need are the high heterogeneity between patients, the differentiation of cancer cells within a single tumor, the persistence of cancer stem cells, and the high number of possible clinical phenotypes arising from the combination of the genetic and epigenetic insults that confer on cells the functional characteristics enabling them to proliferate, evade the immune system and programmed cell death, and give rise to neoplasms. To identify new therapeutic options, a better understanding of the mechanisms that generate and maintain these functional characteristics is needed. As many of the alterations that characterize cancerous lesions relate to the signaling pathways that ensure the adequacy of cellular behavior in a specific micro-environment and in response to molecular cues, it is likely that increased knowledge about these signaling pathways will result in the identification of new pharmacological targets towards which new drugs can be designed. As such, the modeling of the cellular regulatory networks can play a prominent role in this understanding, as computational modeling allows the integration of large quantities of data and the simulation of large systems. Logical modeling is well adapted to the large-scale modeling of regulatory networks. Different types of logical network modeling have been used successfully to study cancer signaling pathways and investigate specific hypotheses. In this work we propose a Dynamic Bayesian Network framework to contextualize network models of signaling pathways. We implemented FALCON, a MATLAB toolbox to formulate the parametrization of a prior-knowledge interaction network given a set of biological measurements under different experimental conditions.
The FALCON toolbox allows a systems-level analysis of the model with the aim of identifying the most sensitive nodes and interactions of the inferred regulatory network and pointing to possible ways to modify its functional properties. The resulting hypotheses can be tested in the form of virtual knock-out experiments. We also propose a series of regularization schemes, materializing biological assumptions, to incorporate relevant research questions in the optimization procedure. These questions include the detection of the active signaling pathways in a specific context, the identification of the most important differences within a group of cell lines, or the time-frame of network rewiring. We used the toolbox and its extensions on a series of toy models and biological examples. We showed that our pipeline is able to identify cell type-specific parameters that are predictive of drug sensitivity, using a regularization scheme based on local parameter densities in the parameter space. We applied FALCON to the analysis of the resistance mechanism in A375 melanoma cells adapted to low doses of a TNFR agonist, and we accurately predict the re-sensitization and successful induction of apoptosis in the adapted cells via the silencing of XIAP and the down-regulation of NFkB. We further point to specific drug combinations that could be applied in the clinic. Overall, we demonstrate that our approach is able to identify the most relevant changes between sensitive and resistant cancer clones

    Evaluating Symbolic AI as a Tool to Understand Cell Signalling

    The diverse and highly complex nature of modern phosphoproteomics research produces a high volume of data. Chemical phosphoproteomics in particular is amenable to a variety of analytical approaches. In this thesis we evaluate novel Symbolic AI-based algorithms as potential tools in the analysis of cell signalling. Initially we developed a first-order, deductive, logic-based model. This allowed us to identify previously unreported inhibitor-kinase relationships which could offer novel therapeutic targets for further investigation. Following this, we made use of the probabilistic reasoning of ProbLog to augment the aforementioned Prolog-based model with an intuitively calculated degree of belief. This allowed us to rank previous associations while also further increasing our confidence in already established predictions. Finally, we applied our methodology to a Saccharomyces cerevisiae gene perturbation phosphoproteomics dataset. In this context we were able to confirm the majority of ground truths, i.e. gene deletions, as having taken place as intended. For the remaining deletions, again using a purely symbolic approach, we were able to provide predictions on the rewiring of kinase-based signalling networks following kinase-encoding gene deletions. The explainable, human-readable and white-box nature of this approach was highlighted; however, its brittleness due to missing, inconsistent or conflicting background knowledge was also examined

    Computational Modeling and Reverse Engineering to Reveal Dominant Regulatory Interactions Controlling Osteochondral Differentiation: Potential for Regenerative Medicine

    The specialization of cartilage cells, or chondrogenic differentiation, is an intricate and meticulously regulated process that plays a vital role in both bone formation and cartilage regeneration. Understanding the molecular regulation of this process might help to identify key regulatory factors that can serve as potential therapeutic targets, or that might improve the development of qualitative and robust skeletal tissue engineering approaches. However, each gene involved in this process is influenced by a myriad of feedback mechanisms that keep its expression in a desirable range, making the prediction of what will happen if one of these genes defaults or is targeted with drugs, challenging. Computer modeling provides a tool to simulate this intricate interplay from a network perspective. This paper aims to give an overview of the current methodologies employed to analyze cell differentiation in the context of skeletal tissue engineering in general and osteochondral differentiation in particular. In network modeling, a network can either be derived from mechanisms and pathways that have been reported in the literature (knowledge-based approach) or it can be inferred directly from the data (data-driven approach). Combinatory approaches allow further optimization of the network. Once a network is established, several modeling technologies are available to interpret dynamically the relationships that have been put forward in the network graph (implication of the activation or inhibition of certain pathways on the evolution of the system over time) and to simulate the possible outcomes of the established network such as a given cell state. This review provides for each of the aforementioned steps (building, optimizing, and modeling the network) a brief theoretical perspective, followed by a concise overview of published works, focusing solely on applications related to cell fate decisions, cartilage differentiation and growth plate biology. 
Particular attention is paid to an in-house developed example of gene regulatory network modeling of growth plate chondrocyte differentiation as all the aforementioned steps can be illustrated. In summary, this paper discusses and explores a series of tools that form a first step toward a rigorous and systems-level modeling of osteochondral differentiation in the context of regenerative medicine

    Sparse graphical models for cancer signalling

    Protein signalling networks play a key role in cellular function, and their dysregulation is central to many diseases, including cancer. Recent advances in biochemical technology have begun to allow high-throughput, data-driven studies of signalling. In this thesis, we investigate multivariate statistical methods, rooted in sparse graphical models, aimed at probing questions in cancer signalling. First, we propose a Bayesian variable selection method for identifying subsets of proteins that jointly influence an output of interest, such as drug response. Ancillary biological information is incorporated into inference using informative prior distributions. Prior information is selected and weighted in an automated manner using an empirical Bayes formulation. We present examples of informative pathway and network-based priors, and illustrate the proposed method on both synthetic and drug response data. Second, we use dynamic Bayesian networks to perform structure learning of context-specific signalling network topology from proteomic time-course data. We exploit a connection between variable selection and network structure learning to efficiently carry out exact inference. Existing biology is incorporated using informative network priors, weighted automatically by an empirical Bayes approach. The overall approach is computationally efficient and essentially free of user-set parameters. We show results from an empirical investigation, comparing the approach to several existing methods, and from an application to breast cancer cell line data. Hypotheses are generated regarding novel signalling links, some of which are validated by independent experiments. Third, we describe a network-based clustering approach for the discovery of cancer subtypes that differ in terms of subtype-specific signalling network structure.
Model-based clustering is combined with penalised likelihood estimation of undirected graphical models to allow simultaneous learning of cluster assignments and cluster-specific network structure. Results are shown from an empirical investigation comparing several penalisation regimes, and an application to breast cancer proteomic data