
    Public (Q)SAR Services, Integrated Modeling Environments, and Model Repositories on the Web: State of the Art and Perspectives for Future Development

    © 2017 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim. Thousands of (Quantitative) Structure-Activity Relationship ((Q)SAR) models have been described in peer-reviewed publications; however, this way of sharing seldom makes models available for use by the research community outside of the developer's laboratory. Conversely, on-line models allow broad dissemination and application, representing the most effective way of sharing scientific knowledge. Approaches for sharing and providing on-line access to models range from web services created by individual users and laboratories to integrated modeling environments and model repositories. This emerging transition from the descriptive and informative, but “static” and, for the most part, non-executable print format to interactive, transparent and functional delivery of “living” models is expected to have a transformative effect on modern experimental research in areas of scientific and regulatory use of (Q)SAR models.

    Identifying disease-related expressions in reviews using conditional random fields

    As the volume of user-generated content in social media expands, so do the potential benefits of mining social media to learn about patient conditions, drug indications, and beneficial or adverse drug reactions. In this paper, we apply a Conditional Random Fields (CRF) model for extracting expressions related to diseases from patient comments. Our method utilizes hand-crafted features including contextual features, dictionaries, and cluster-based and distributed word representations generated from unlabeled user posts in social media. We compare our CRF-based approach with deep recurrent neural networks and a dictionary-based approach. We examine different word embeddings generated from unlabeled user posts in social media and scientific literature. We show that CRF outperformed other methods and achieved F1-measures of 69.1% and 79.4% on recognition of disease-related expressions in the exact and partial matching exercises, respectively. Qualitative evaluation of disease-related expressions recognized by our feature-rich CRF-based approach demonstrates the variability of reactions from patients with different health conditions.
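
To make the difference between the exact and partial matching scores concrete, here is a minimal sketch. It assumes entities are character spans and that "partial" means any overlap between a predicted and a gold span; the abstract does not state the matching rule, and the function names and example spans are hypothetical.

```python
def overlaps(a, b):
    """True if half-open character spans a and b share at least one position."""
    return a[0] < b[1] and b[0] < a[1]


def span_f1(gold, predicted, partial=False):
    """F1 over entity spans, with exact or (assumed) overlap-based matching."""
    match = overlaps if partial else (lambda a, b: a == b)
    # Precision counts predicted spans that match some gold span;
    # recall counts gold spans that are matched by some predicted span.
    matched_pred = sum(any(match(p, g) for g in gold) for p in predicted)
    matched_gold = sum(any(match(g, p) for p in predicted) for g in gold)
    precision = matched_pred / len(predicted) if predicted else 0.0
    recall = matched_gold / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical spans: one exact hit, one boundary error, one spurious span.
gold_spans = [(0, 5), (10, 18)]
predicted_spans = [(0, 5), (12, 18), (30, 35)]

exact_f1 = span_f1(gold_spans, predicted_spans)
partial_f1 = span_f1(gold_spans, predicted_spans, partial=True)
print(f"exact F1 = {exact_f1:.2f}, partial F1 = {partial_f1:.2f}")
```

The boundary error on the second span is penalized under exact matching but credited under partial matching, which is why the partial score is higher.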

    Analyzing the Systems Biology Effects of COVID-19 mRNA Vaccines to Assess Their Safety and Putative Side Effects

    COVID-19 vaccines have been instrumental tools in reducing the impact of SARS-CoV-2 infections around the world by preventing 80% to 90% of hospitalizations and deaths from reinfection, in addition to preventing 40% to 65% of symptomatic illnesses. However, the simultaneous large-scale vaccination of the global population will indubitably unveil heterogeneity in immune responses as well as in the propensity to developing post-vaccine adverse events, especially in vulnerable individuals. Herein, we applied a systems biology workflow, integrating vaccine transcriptional signatures with chemogenomics, to study the pharmacological effects of mRNA vaccines. First, we derived transcriptional signatures and predicted their biological effects using pathway enrichment and network approaches. Second, we queried the Connectivity Map (CMap) to prioritize adverse events hypotheses. Finally, we accepted higher-confidence hypotheses that have been predicted by independent approaches. Our results reveal that the mRNA-based BNT162b2 vaccine affects immune response pathways related to interferon and cytokine signaling, which should lead to vaccine success, but may also result in some adverse events. Our results emphasize the effects of BNT162b2 on calcium homeostasis, which could be contributing to some frequently encountered adverse events related to mRNA vaccines. Notably, cardiac side effects were signaled in the CMap query results. In summary, our approach has identified mechanisms underlying both the expected protective effects of vaccination as well as possible post-vaccine adverse effects. Our study illustrates the power of systems biology approaches in improving our understanding of the comprehensive biological response to vaccination against COVID-19

    Critical assessment of QSAR models of environmental toxicity against Tetrahymena pyriformis: focusing on applicability domain and overfitting by variable selection

    The estimation of the accuracy of predictions is a critical problem in QSAR modeling. The "distance to model" can be defined as a metric that defines the similarity between the training set molecules and the test set compound for the given property in the context of a specific model. It could be expressed in many different ways, e.g., using Tanimoto coefficient, leverage, correlation in space of models, etc. In this paper we have used mixtures of Gaussian distributions as well as statistical tests to evaluate six types of distances to models with respect to their ability to discriminate compounds with small and large prediction errors. The analysis was performed for twelve QSAR models of aqueous toxicity against T. pyriformis obtained with different machine-learning methods and various types of descriptors. The distances to model based on standard deviation of predicted toxicity calculated from the ensemble of models afforded the best results. This distance also successfully discriminated molecules with low and large prediction errors for a mechanism-based model developed using log P and the Maximum Acceptor Superdelocalizability descriptors. Thus, the distance to model metric could also be used to augment mechanistic QSAR models by estimating their prediction errors. Moreover, the accuracy of prediction is mainly determined by the training set data distribution in the chemistry and activity spaces but not by QSAR approaches used to develop the models. We have shown that incorrect validation of a model may result in the wrong estimation of its performance and suggested how this problem could be circumvented. The toxicity of 3182 and 48774 molecules from the EPA High Production Volume (HPV) Challenge Program and EINECS (European chemical Substances Information System), respectively, was predicted, and the accuracy of prediction was estimated. The developed models are available online at http://www.qspr.org site
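
As an illustration of the ensemble-based distance to model mentioned above, the following sketch computes the standard deviation of the predictions that an ensemble of models gives for each compound. The compound names and predicted values are invented for illustration, and the use of the sample standard deviation is an assumption; the abstract does not give the exact formula.

```python
from statistics import stdev


def ensemble_std(predictions_by_compound):
    """Map each compound to the sample standard deviation of its ensemble predictions."""
    return {name: stdev(preds) for name, preds in predictions_by_compound.items()}


# Hypothetical predicted toxicities from a three-model ensemble.
ensemble_predictions = {
    "cmpd_A": [1.0, 1.1, 0.9],  # models agree: small distance to model
    "cmpd_B": [0.2, 1.5, 2.8],  # models disagree: large distance to model
}

distances = ensemble_std(ensemble_predictions)
# Compounds ordered from most to least reliable under this measure.
ranked = sorted(distances, key=distances.get)
for name in ranked:
    print(f"{name}: distance to model = {distances[name]:.2f}")
```

Under this measure, a compound on which the ensemble members disagree is treated as far from the model, and its prediction would be expected to carry a larger error.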

    Beware of R2: Simple, Unambiguous Assessment of the Prediction Accuracy of QSAR and QSPR Models

    The statistical metrics used to characterize the external predictivity of a model, i.e., how well it predicts the properties of an independent test set, have proliferated over the past decade. This paper clarifies some apparent confusion over the use of the coefficient of determination, R2, as a measure of model fit and predictive power in QSAR and QSPR modelling
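
One way such confusion can arise, assumed here purely for illustration since the abstract does not name the specific issue, is treating the squared Pearson correlation between observed and predicted values as if it were the coefficient of determination. The sketch below uses the standard definition R2 = 1 - SS_res / SS_tot with invented data; the function names are hypothetical.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot on the given test set."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    return 1.0 - ss_res / ss_tot


def pearson_r_squared(observed, predicted):
    """Squared Pearson correlation between observed and predicted values."""
    mean_obs = sum(observed) / len(observed)
    mean_pred = sum(predicted) / len(predicted)
    cov = sum((y - mean_obs) * (p - mean_pred) for y, p in zip(observed, predicted))
    var_obs = sum((y - mean_obs) ** 2 for y in observed)
    var_pred = sum((p - mean_pred) ** 2 for p in predicted)
    return cov ** 2 / (var_obs * var_pred)


# Invented test set: predictions are perfectly correlated but twice too large.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 4.0, 6.0, 8.0]

r2 = r_squared(observed, predicted)
r2_corr = pearson_r_squared(observed, predicted)
print(f"R2 = {r2:.1f}, squared correlation = {r2_corr:.1f}")
```

Here the squared correlation is 1 even though every prediction is wrong by a factor of two, whereas the coefficient of determination is strongly negative.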

    Interpreting random forest classification models using a feature contribution method

    Model interpretation is one of the key aspects of the model evaluation process. The explanation of the relationship between model variables and outputs is relatively easy for statistical models, such as linear regressions, thanks to the availability of model parameters and their statistical significance. For “black box” models, such as random forest, this information is hidden inside the model structure. This work presents an approach for computing feature contributions for random forest classification models. It allows for the determination of the influence of each variable on the model prediction for an individual instance. By analysing feature contributions for a training dataset, the most significant variables can be determined and their typical contributions towards predictions made for individual classes, i.e., class-specific feature contribution “patterns”, are discovered. These patterns represent a standard behaviour of the model and allow for an additional assessment of the model reliability for new data. Interpretation of feature contributions for two UCI benchmark datasets shows the potential of the proposed methodology. The robustness of results is demonstrated through an extensive analysis of feature contributions calculated for a large number of generated random forest models.
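
The idea of per-instance feature contributions can be sketched on a toy example. The code below assumes one common formulation: along the path an instance takes through a tree, each split credits its feature with the change in the positive-class fraction between parent and child, and a forest averages these values over its trees. The abstract does not spell out the formula, so this is an assumption, and the tree layout, field names and numbers are all hypothetical.

```python
def tree_contributions(tree, instance):
    """Return (root class fraction, per-feature contributions) for one tree."""
    contributions = {}
    node = tree
    while "feature" in node:  # internal node; leaves only carry "p"
        feature = node["feature"]
        child = node["left"] if instance[feature] <= node["threshold"] else node["right"]
        # Credit the split feature with the change in positive-class fraction.
        contributions[feature] = contributions.get(feature, 0.0) + child["p"] - node["p"]
        node = child
    return tree["p"], contributions


def forest_contributions(trees, instance):
    """Average the root fraction and feature contributions over all trees."""
    bias, totals = 0.0, {}
    for tree in trees:
        root_p, contribs = tree_contributions(tree, instance)
        bias += root_p / len(trees)
        for feature, value in contribs.items():
            totals[feature] = totals.get(feature, 0.0) + value / len(trees)
    return bias, totals


# Hypothetical tree; "p" is the positive-class fraction of training data at a node.
toy_tree = {
    "feature": "x1", "threshold": 0.5, "p": 0.5,
    "left": {"p": 0.2},
    "right": {
        "feature": "x2", "threshold": 1.0, "p": 0.8,
        "left": {"p": 0.6},
        "right": {"p": 1.0},
    },
}

bias, contribs = forest_contributions([toy_tree], {"x1": 0.7, "x2": 2.0})
prediction = bias + sum(contribs.values())
print(bias, contribs, prediction)
```

In this formulation the root fraction plus the sum of the contributions reproduces the predicted class probability, so each prediction decomposes into per-variable terms.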

    Identifying a causal link between prolactin signaling pathways and COVID-19 vaccine-induced menstrual changes

    COVID-19 vaccines have been instrumental tools in the fight against SARS-CoV-2 helping to reduce disease severity and mortality. At the same time, just like any other therapeutic, COVID-19 vaccines were associated with adverse events. Women have reported menstrual cycle irregularity after receiving COVID-19 vaccines, and this led to renewed fears concerning COVID-19 vaccines and their effects on fertility. Herein we devised an informatics workflow to explore the causal drivers of menstrual cycle irregularity in response to vaccination with mRNA COVID-19 vaccine BNT162b2. Our methods relied on gene expression analysis in response to vaccination, followed by network biology analysis to derive testable hypotheses regarding the causal links between BNT162b2 and menstrual cycle irregularity. Five high-confidence transcription factors were identified as causal drivers of BNT162b2-induced menstrual irregularity, namely: IRF1, STAT1, RelA (p65 NF-kB subunit), STAT2 and IRF3. Furthermore, some biomarkers of menstrual irregularity, including TNF, IL6R, IL6ST, LIF, BIRC3, FGF2, ARHGDIB, RPS3, RHOU, MIF, were identified as topological genes and predicted as causal drivers of menstrual irregularity. Our network-based mechanism reconstruction results indicated that BNT162b2 exerted biological effects similar to those resulting from prolactin signaling. However, these effects were short-lived and didn’t raise concerns about long-term infertility issues. This approach can be applied to interrogate the functional links between drugs/vaccines and other side effects

    Discrete Molecular Dynamics Distinguishes Nativelike Binding Poses from Decoys in Difficult Targets

    Virtual screening is one of the major tools used in computer-aided drug discovery. In structure-based virtual screening, the scoring function is critical to identifying the correct docking pose and accurately predicting the binding affinities of compounds. However, the performance of existing scoring functions has been shown to be uneven for different targets, and some important drug targets have proven especially challenging. In these targets, scoring functions cannot accurately identify the native or near-native binding pose of the ligand from among decoy poses, which affects both the accuracy of the binding affinity prediction and the ability of virtual screening to identify true binders in chemical libraries. Here, we present an approach to discriminating native poses from decoys in difficult targets for which several scoring functions failed to correctly identify the native pose. Our approach employs Discrete Molecular Dynamics simulations to incorporate protein-ligand dynamics and the entropic effects of binding. We analyze a collection of poses generated by docking and find that the residence time of the ligand in the native and nativelike binding poses is distinctly longer than that in decoy poses. This finding suggests that molecular simulations offer a unique approach to distinguishing the native (or nativelike) binding pose from decoy poses that cannot be distinguished using scoring functions that evaluate static structures. The success of our method emphasizes the importance of protein-ligand dynamics in the accurate determination of the binding pose, an aspect that is not addressed in typical docking and scoring protocols