FAIR data management: what does it mean for drug discovery?
The drug discovery community faces high costs in bringing safe and effective medicines to market, in part due to the rising volume and complexity of data which must be generated during the research and development process. Fully utilising these expensively created experimental and computational data resources has become a key aim of scientists due to the clear imperative to leverage the power of artificial intelligence (AI) and machine learning-based analyses to solve the complex problems inherent in drug discovery. In turn, AI methods heavily rely on the quantity, quality, consistency, and scope of underlying training data. While pre-existing preclinical and clinical data cannot fully replace the need for de novo data generation in a project, having access to relevant historical data represents a valuable asset, as its reuse can reduce the need to perform similar experiments, therefore avoiding a “reinventing the wheel” scenario. Unfortunately, most suitable data resources are often archived within institutes, companies, or individual research groups and hence unavailable to the wider community. Hence, enabling the data to be Findable, Accessible, Interoperable, and Reusable (FAIR) is crucial for the wider community of drug discovery and development scientists to learn from the work performed and utilise the findings to enhance comprehension of their own research outcomes. In this mini-review, we elucidate the utility of FAIR data management across the drug discovery pipeline and assess the impact such FAIR data has made on the drug development process
Causal reasoning over knowledge graphs leveraging drug-perturbed and disease-specific transcriptomic signatures for drug discovery
Network-based approaches are becoming increasingly popular for drug discovery as they provide a systems-level overview of the mechanisms underlying disease pathophysiology. They have demonstrated significant early promise over other methods of biological data representation, such as in target discovery, side effect prediction and drug repurposing. In parallel, an explosion of -omics data for the deep characterization of biological systems routinely uncovers molecular signatures of disease for similar applications. Here, we present RPath, a novel algorithm that prioritizes drugs for a given disease by reasoning over causal paths in a knowledge graph (KG), guided by both drug-perturbed and disease-specific transcriptomic signatures. First, our approach identifies the causal paths that connect a drug to a particular disease. Next, it reasons over these paths to identify those that correlate with the transcriptional signatures observed in a drug-perturbation experiment, and anti-correlate with signatures observed in the disease of interest. The paths which match this signature profile are then proposed to represent the mechanism of action of the drug. We demonstrate how RPath consistently prioritizes clinically investigated drug-disease pairs on multiple datasets and KGs, achieving better performance than other similar methodologies. Furthermore, we present two case studies showing how one can deconvolute the predictions made by RPath as well as predict novel targets.
DDF, YG, AP, CWD, BBM, DH, JR, and VC have been funded by Enveda Biosciences. This work has been funded by Enveda Biosciences (https://www.envedabio.com/). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. SM and DRB received no specific funding for this work. Peer Reviewed. Postprint (author's final draft)
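The path-reasoning idea in the RPath abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the toy graph, the node names (`drugA`, `GENE1`, `diseaseX`), the +1/-1 encoding of edges and signatures, and the strict rule that every intermediate node must appear in both signatures are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical toy KG: (source, target) -> sign, +1 = activates, -1 = inhibits.
Edges = Dict[Tuple[str, str], int]


def simple_paths(edges: Edges, source: str, target: str, max_len: int) -> List[List[str]]:
    """Enumerate simple (cycle-free) paths with at most max_len edges."""
    adjacency: Dict[str, List[str]] = {}
    for u, v in edges:
        adjacency.setdefault(u, []).append(v)
    paths, stack = [], [[source]]
    while stack:
        path = stack.pop()
        node = path[-1]
        if node == target:
            paths.append(path)
            continue
        if len(path) - 1 >= max_len:
            continue
        for nxt in adjacency.get(node, []):
            if nxt not in path:
                stack.append(path + [nxt])
    return paths


def is_concordant(path: List[str], edges: Edges,
                  drug_sig: Dict[str, int], disease_sig: Dict[str, int]) -> bool:
    """Check intermediate nodes: the direction propagated from the drug must
    match the drug-perturbed signature and oppose the disease signature.
    Nodes missing from either signature fail the check (an assumption)."""
    sign = 1
    for u, v in zip(path, path[1:-1]):  # edges leading into intermediate nodes
        sign *= edges[(u, v)]
        if drug_sig.get(v) != sign or disease_sig.get(v) != -sign:
            return False
    return True


edges: Edges = {
    ("drugA", "GENE1"): -1, ("GENE1", "GENE2"): 1, ("GENE2", "diseaseX"): 1,
    ("drugA", "GENE3"): 1, ("GENE3", "diseaseX"): 1,
}
drug_sig = {"GENE1": -1, "GENE2": -1, "GENE3": -1}   # drug down-regulates all three
disease_sig = {"GENE1": 1, "GENE2": 1, "GENE3": 1}   # disease up-regulates all three

prioritized = [p for p in simple_paths(edges, "drugA", "diseaseX", max_len=4)
               if is_concordant(p, edges, drug_sig, disease_sig)]
print(prioritized)
```

In this toy example only the path through `GENE1` and `GENE2` survives, because the path through `GENE3` predicts up-regulation while the drug signature reports down-regulation. A drug with at least one surviving path would be a candidate for the disease under this simplified reading.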
Comprehensive Fragment Screening of the SARS-CoV-2 Proteome Explores Novel Chemical Space for Drug Development
12 pags., 4 figs., 3 tabs. SARS-CoV-2 (SCoV2) and its variants of concern pose serious challenges to public health. The variants have increased the challenges to vaccines, necessitating the development of new intervention strategies, including antivirals. Within the international Covid19-NMR consortium, we have identified binders targeting the RNA genome of SCoV2. We established protocols for the production and NMR characterization of more than 80 % of all SCoV2 proteins. Here, we performed an NMR screening using a fragment library for binding to 25 SCoV2 proteins and identified hits also against previously unexplored SCoV2 proteins. Computational mapping was used to predict binding sites and identify functional moieties (chemotypes) of the ligands occupying these pockets. Striking consensus was observed between NMR-detected binding sites of the main protease and the computational procedure. Our investigation provides novel structural and chemical space for structure-based drug design against the SCoV2 proteome.
Work at BMRZ is supported by the state of Hesse. Work in Covid19-NMR was supported by the Goethe Corona Funds, by the IWBEFRE-program 20007375 of the state of Hesse, the DFG through CRC902: “Molecular Principles of RNA-based regulation.” and through infrastructure funds (project numbers: 277478796, 277479031, 392682309, 452632086, 70653611) and by the European Union’s Horizon 2020 research and innovation program iNEXT-discovery under grant agreement No 871037. BY-COVID receives funding from the European Union’s Horizon Europe Research and Innovation Programme under grant agreement number 101046203. “INSPIRED” (MIS 5002550) project, implemented under the Action “Reinforcement of the Research and Innovation Infrastructure,” funded by the Operational Program “Competitiveness, Entrepreneurship and Innovation” (NSRF 2014–2020) and co-financed by Greece and the EU (European Regional Development Fund) and the FP7 REGPOT CT-2011-285950 “SEE-DRUG” project (purchase of UPAT’s 700 MHz NMR equipment). The support of the CERM/CIRMMP center of Instruct-ERIC is gratefully acknowledged. This work has been funded in part by a grant of the Italian Ministry of University and Research (FISR2020IP_02112, ID-COVID) and by Fondazione CR Firenze. A.S. is supported by the Deutsche Forschungsgemeinschaft [SFB902/B16, SCHL2062/2-1] and the Johanna Quandt Young Academy at Goethe [2019/AS01]. M.H. and C.F. thank SFB902 and the Stiftung Polytechnische Gesellschaft for the Scholarship. L.L.'s work was supported by the French National Research Agency (ANR, NMR-SCoV2-ORF8), the Fondation de la Recherche Médicale (FRM, NMR-SCoV2-ORF8), FINOVI and the IR-RMN-THC Fr3050 CNRS. Work at UConn Health was supported by grants from the US National Institutes of Health (R01 GM135592 to B.H., P41 GM111135 and R01 GM123249 to J.C.H.) and the US National Science Foundation (DBI 2030601 to J.C.H.). Latvian Council of Science Grant No. VPP-COVID-2020/1-0014. National Science Foundation EAGER MCB-2031269. This work was supported by the grant Krebsliga KFS-4903-08-2019 and SNF-311030_192646 to J.O. P.G. (ITMP) The EOSC Future project is co-funded by the European Union Horizon Programme call INFRAEOSC-03-2020, Grant Agreement Number 101017536. Open Access funding enabled and organized by Projekt DEAL. Peer reviewed
cthoyt/chembl-downloader: v0.4.3
What's Changed
Refresh datasets from Cortés-Ciriano by @cthoyt in https://github.com/cthoyt/chembl-downloader/pull/9
Update README.md by @YojanaGadiya in https://github.com/cthoyt/chembl-downloader/pull/5
Update cortes-ciriano-refresh.ipynb by @rflameiro in https://github.com/cthoyt/chembl-downloader/pull/12
New Contributors
@YojanaGadiya made their first contribution in https://github.com/cthoyt/chembl-downloader/pull/5
@rflameiro made their first contribution in https://github.com/cthoyt/chembl-downloader/pull/12
Full Changelog: https://github.com/cthoyt/chembl-downloader/compare/v0.4.2...v0.4.
cthoyt/chembl-downloader: v0.4.2
Full Changelog: https://github.com/cthoyt/chembl-downloader/compare/v0.4.1...v0.4.
Pharmacophore-based ML model to predict ligand selectivity for E3 ligase binders
E3 ligases are enzymes that play a critical role in ubiquitin-mediated protein degradation and are involved in various cellular processes. Pharmacophore analysis is a useful approach for predicting E3 ligase binding selectivity, which involves identifying key chemical features necessary for a ligand to interact with a specific protein target cavity. While pharmacophore analysis is not always sufficient to accurately predict ligand binding affinity, it can be a valuable tool for filtering and/or designing focused libraries for screening campaigns. In this study, we present a fast and inexpensive approach using a pharmacophore fingerprinting scheme known as ErG, which is used in a multiclass machine learning classification model. This model can assign the correct E3 ligase binder to its known E3 ligase and predict the probability of each molecule to bind to different E3 ligases. Practical applications of this approach are demonstrated on commercial libraries for rational design of E3 ligase binders
Pharmaceutical patent landscaping: A novel approach to understand patents from the drug discovery perspective
Patents play a crucial role in the drug discovery process by providing legal protection for discoveries and incentivising investments in research and development. By identifying patterns within patent data resources, researchers can gain insight into the market trends and priorities of the pharmaceutical and biotechnology industries, as well as provide additional perspectives on more fundamental aspects such as the emergence of potential new drug targets. In this paper, we used the patent enrichment tool, PEMT, to extract, integrate, and analyse patent literature for rare diseases (RD) and Alzheimer's disease (AD). This is followed by a systematic review of the underlying patent landscape to decipher trends and applications in patents for these diseases. To do so, we discuss prominent organisations involved in drug discovery research in AD and RD. This allows us to gain an understanding of the importance of AD and RD from specific organisational (pharmaceutical or university) perspectives. Next, we analyse the historical focus of patents in relation to individual therapeutic targets and correlate them with market scenarios allowing the identification of prominent targets for a disease. Lastly, we identified drug repurposing activities within the two diseases with the help of patents. This resulted in identifying existing repurposed drugs and novel potential therapeutic approaches applicable to the indication areas. The study demonstrates the expanded applicability of patent documents from legal to drug discovery, design, and research, thus, providing a valuable resource for future drug discovery efforts. Moreover, this study is an attempt towards understanding the importance of data underlying patent documents and raising the need for preparing the data for machine learning-based applications
BioDataFuse: Enhancing Data Interoperability through Modular Queries and Knowledge Graph Construction
In biological research, integrating experimental data with publicly available resources is pivotal for understanding complex biological mechanisms. However, this process is often intricate and time-consuming due to the complexity and diversity of data. Furthermore, the lack of consistent harmonization across different data types complicates the management of disparate data formats and sources. Addressing this, we introduce BioDataFuse, a query-based Python tool for seamless integration of biomedical data resources. BioDataFuse establishes a modular framework for efficient data wrangling, enabling context-specific knowledge graph creation and supporting graph-based analyses. With a user-friendly interface, it enables users to dynamically create knowledge graphs from their input experimental data. Supported by a robust Python package, pyBiodatafuse, this tool excels in data harmonization, aggregating diverse sources through modular queries. Moreover, BioDataFuse provides plugin capabilities for Cytoscape and Neo4j, allowing local graph hosting. Ongoing refinements enhance the graph utility through tasks like link prediction, making BioDataFuse a versatile solution for efficient and effective biological data integration
From spreadsheet lab data templates to knowledge graphs: A FAIR data journey in the domain of AMR research
While awareness of FAIR (Findable, Accessible, Interoperable, and Reusable) principles has expanded across diverse domains, there remains a notable absence of impactful narratives regarding the practical application of FAIR data. This gap is particularly evident in the context of in-vitro and in-vivo experimental studies associated with the drug discovery and development process. Despite the structured nature of these data, reliance on classic methods such as spreadsheet-based visualization and analysis has limited the long-term reuse opportunities for such datasets. In response to this challenge, our work presents a representative journey towards FAIR data, characterized by structured, conventional spreadsheet-based lab data templates and the adoption of a knowledge graph framework for breaking data silos in the field of early antimicrobial resistance research. Here, we illustrate a tailored application of a "FAIRification framework" facilitating the practical implementation of FAIR principles. By showcasing the feasibility and benefits of transitioning to FAIR data practices, our work aims to encourage broader adoption and integration of FAIR principles within a research lab setting
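As a rough illustration of the spreadsheet-to-knowledge-graph step described above, the sketch below turns rows of a lab data template into subject-predicate-object triples. The column names (`compound_id`, `organism`, `mic_ug_per_ml`), the identifier prefixes, and the predicate names are hypothetical and do not come from the paper; a real FAIRification workflow would map them to community ontologies and persistent identifiers.

```python
import csv
import io
from typing import List, Tuple, Union

# Hypothetical lab template exported as CSV (columns are assumptions).
TEMPLATE = """compound_id,organism,mic_ug_per_ml
CMPD-001,Escherichia coli,4
CMPD-001,Staphylococcus aureus,16
CMPD-002,Escherichia coli,0.5
"""

Triple = Tuple[str, str, Union[str, float]]


def rows_to_triples(csv_text: str) -> List[Triple]:
    """Turn each spreadsheet row into one assay node with three outgoing edges."""
    triples: List[Triple] = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for i, row in enumerate(reader, start=1):
        assay = f"assay:{i}"  # one node per measurement row
        triples.append((assay, "tested_compound", f"compound:{row['compound_id']}"))
        triples.append((assay, "tested_organism", f"organism:{row['organism']}"))
        triples.append((assay, "has_mic_ug_per_ml", float(row["mic_ug_per_ml"])))
    return triples


triples = rows_to_triples(TEMPLATE)
for triple in triples:
    print(triple)
```

Modelling each row as its own assay node keeps the measurement attached to both the compound and the organism, so rows from different spreadsheets that mention the same compound connect through a shared node once loaded into a graph store.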
Modern drug discovery using ethnobotany: A large-scale cross-cultural analysis of traditional medicine reveals common therapeutic uses
Summary: For millennia, numerous cultures and civilizations have relied on traditional remedies derived from plants to treat a wide range of conditions and ailments. Here, we systematically analyzed ethnobotanical patterns across taxonomically related plants, demonstrating that congeneric medicinal plants are more likely to be used for treating similar indications. Next, we reconstructed the phytochemical space covered by medicinal plants to reveal that (i) taxonomically related medicinal plants cover a similar phytochemical space, and (ii) chemical similarity correlates with similar therapeutic usage. Lastly, we present several case scenarios illustrating how mining this information can be used for drug discovery applications, including: (i) investigating taxonomic hotspots around particular indications, (ii) exploring shared patterns of congeneric plants located in different geographic areas, but which have been used to treat the same indications, and (iii) showing the concordance between ethnobotanical patterns among non-taxonomically related plants and the presence of shared bioactive phytochemicals.