Search CORE

1,044 research outputs found

The development of models to predict melting and pyrolysis point data associated with several hundred thousand compounds mined from PATENTS

Author: Antony J. Williams
Daniel M. Lowe
Igor V. Tetko
Publication venue: Springer Nature
Publication date: 01/01/2016

BACKGROUND: Melting point (MP) is an important property with regard to the solubility of chemical compounds. Its prediction from chemical structure remains a highly challenging task for quantitative structure-activity relationship studies. Success in this area of research critically depends on the availability of high quality MP data as well as accurate chemical structure representations in order to develop models. Currently, available datasets for MP predictions have been limited to around 50k molecules while lots more data are routinely generated following the synthesis of novel materials. Significant amounts of MP data are freely available within the patent literature and, if it were available in the appropriate form, could potentially be used to develop predictive models. RESULTS: We have developed a pipeline for the automated extraction and annotation of chemical data from published PATENTS. Almost 300,000 data points have been collected and used to develop models to predict melting and pyrolysis (decomposition) points using tools available on the OCHEM modeling platform (http://ochem.eu). A number of technical challenges were simultaneously solved to develop models based on these data. These included the handling of sparse data matrices with >200,000,000,000 entries and parallel calculations using 32 × 6 cores per task using 13 descriptor sets totaling more than 700,000 descriptors. We showed that models developed using data collected from PATENTS had similar or better prediction accuracy compared to the highly curated data used in previous publications. The separation of data for chemicals that decomposed rather than melted, from compounds that did undergo a normal melting transition, was performed and models for both pyrolysis and MPs were developed. The accuracy of the consensus MP models for molecules from the drug-like region of chemical space was similar to their estimated experimental accuracy, 32 °C. Last but not least, important structural features related to the pyrolysis of chemicals were identified, and a model to predict whether a compound will decompose instead of melting was developed. CONCLUSIONS: We have shown that automated tools for the analysis of chemical information have reached a mature stage allowing for the extraction and collection of high quality data to enable the development of structure-activity relationship models. The developed models and data are publicly available at http://ochem.eu/article/99826
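
As a rough consistency check on the matrix size quoted in the abstract above, the sketch below multiplies the two rounded counts it gives (about 300,000 data points and about 700,000 descriptors). The one-row-per-data-point, one-column-per-descriptor layout and the 8-byte entry size are assumptions made for illustration; the abstract does not describe the actual storage format, and the function names are hypothetical.

```python
def dense_matrix_entries(n_rows: int, n_cols: int) -> int:
    """Number of cells in a dense n_rows x n_cols matrix."""
    return n_rows * n_cols


def dense_size_terabytes(n_entries: int, bytes_per_entry: int = 8) -> float:
    """Memory needed to store every cell densely, in terabytes (1e12 bytes)."""
    return n_entries * bytes_per_entry / 1e12


# Rounded counts taken from the abstract; the row/column layout is assumed.
N_DATA_POINTS = 300_000
N_DESCRIPTORS = 700_000

entries = dense_matrix_entries(N_DATA_POINTS, N_DESCRIPTORS)
size_tb = dense_size_terabytes(entries)

print(f"entries: {entries:,}")
print(f"dense storage at 8 bytes per entry: {size_tb:.2f} TB")
```

Under these assumptions the product comes to roughly 2.1 × 10^11 cells, in line with the ">200,000,000,000 entries" figure, and dense storage would need on the order of terabytes, which is one plausible reason a sparse representation was used.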

Springer - Publisher Connector

PubMed Central

PuSH

FigShare

Impact Fracture of Composite and Homogeneous Nanoagglomerates

Author: J. Musadaidzwa
R. A. Williams
R. Moreno-Atanasio
S. J. Antony
Publication venue: Hindawi Limited
Publication date: 01/01/2008

It is not yet clear whether the fracture characteristics of structured composite capsules and homogeneous nanoagglomerates differ significantly under impact loading conditions. Experimental measurement of impact fracture properties of such small agglomerates is difficult, due to the length and time scales associated with this problem. Using computer simulations, here we show that nanoagglomerates subjected to normal impact loading fracture within a few nanoseconds in a brittle manner. The restitution coefficient of the nanoagglomerates varies nonlinearly with initial kinetic energy. The fracture of nanoagglomerates does not always happen at the moment when they experience the maximum wall force, but occurs after a time lag of a few nanoseconds as characterised by impact survival time (IST) and IST index. IST is dependent on the initial kinetic energy and on the mechanical and geometric properties of the nanoagglomerates. For identical geometries of the capsules, IST index is higher for capsules with a soft shell than for those with a hard shell, an indication of the enhanced ability of the soft nanocapsules to dissipate impact energy. The DEM simulations reported here, based on theories of contact mechanics, provide fundamental insights on the fracture behaviour of agglomerates: at the nanoscale, the structure of the agglomerates significantly influences their breakage behaviour.

Crossref

Heriot Watt Pure

Directory of Open Access Journals

Small-molecule Bioactivity Databases

Author: Alex M. Clark
Antony J. Williams
Barry A. Bunin
Christopher Southan
Sean Ekins
Publication venue: Royal Society of Chemistry (RSC)
Publication date: 01/01/2016

Crossref

Edinburgh Research Explorer

Supporting non-target identification by adding hydrogen deuterium exchange MS/MS capabilities to MetFrag

Author: Hollender Juliane
Krauss Martin
Neumann Steffen
Ruttkies Christoph
Schymanski Emma
Strehmel Nadine
Williams Antony J.
Publication date: 01/01/2019

Liquid chromatography coupled with high-resolution mass spectrometry (LC-HRMS) is increasingly popular for the non-targeted exploration of complex samples, where tandem mass spectrometry (MS/MS) is used to characterize the structure of unknown compounds. However, mass spectra do not always contain sufficient information to unequivocally identify the correct structure. This study investigated how much additional information can be gained using hydrogen deuterium exchange (HDX) experiments. The exchange of “easily exchangeable” hydrogen atoms (connected to heteroatoms), with predominantly [M+D]+ ions in positive mode and [M-D]− in negative mode was observed. To enable high-throughput processing, new scoring terms were incorporated into the in silico fragmenter MetFrag. These were initially developed on small datasets and then tested on 762 compounds of environmental interest. Pairs of spectra (normal and deuterated) were found for 593 of these substances (506 positive mode, 155 negative mode spectra). The new scoring terms resulted in 29 additional correct identifications (78 vs 49) for positive mode and an increase in top 10 rankings from 80 to 106 in negative mode. Compounds with dual functionality (polar head group, long apolar tail) exhibited dramatic retention time (RT) shifts of up to several minutes, compared with an average 0.04 min RT shift. For a smaller dataset of 80 metabolites, top 10 rankings improved from 13 to 24 (positive mode, 57 spectra) and from 14 to 31 (negative mode, 63 spectra) when including HDX information. The results of standard measurements were confirmed using targets and tentatively identified surfactant species in an environmental sample collected from the river Danube near Novi Sad (Serbia). The changes to MetFrag have been integrated into the command line version available at http://c-ruttkies.github.io/MetFrag and all resulting spectra and compounds are available in online resources and in the Electronic Supplementary Material (ESM)
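
The mass shifts implied by the exchange described in the abstract above can be sketched as follows. This is only an illustrative calculation, not MetFrag's scoring terms: it assumes every exchangeable hydrogen is replaced by deuterium, that ionisation adds or removes a single deuteron, and it uses standard monoisotopic masses. The function names and the example molecule (neutral mass 200.0 u with two exchangeable hydrogens) are hypothetical.

```python
# Monoisotopic masses in unified atomic mass units (standard reference values).
MASS_H = 1.00782503
MASS_D = 2.01410178
MASS_ELECTRON = 0.00054858


def deuterated_neutral_mass(neutral_mass: float, n_exchangeable: int) -> float:
    """Neutral mass after swapping n exchangeable H atoms for D (assumes full exchange)."""
    return neutral_mass + n_exchangeable * (MASS_D - MASS_H)


def mz_m_plus_d(neutral_mass: float, n_exchangeable: int) -> float:
    """m/z of the singly charged [M+D]+ ion: deuterated molecule plus a deuteron (D minus one electron)."""
    return deuterated_neutral_mass(neutral_mass, n_exchangeable) + MASS_D - MASS_ELECTRON


def mz_m_minus_d(neutral_mass: float, n_exchangeable: int) -> float:
    """m/z of the singly charged [M-D]- ion: deuterated molecule minus a deuteron."""
    return deuterated_neutral_mass(neutral_mass, n_exchangeable) - MASS_D + MASS_ELECTRON


# Hypothetical molecule, for illustration only.
example_mass = 200.0
example_exchangeable = 2
print(f"[M+D]+ m/z: {mz_m_plus_d(example_mass, example_exchangeable):.4f}")
print(f"[M-D]- m/z: {mz_m_minus_d(example_mass, example_exchangeable):.4f}")
```

Comparing such predicted values with the observed shift between the normal and deuterated spectra gives an estimate of how many exchangeable hydrogens a candidate structure must have, which is the kind of extra constraint the abstract describes.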

Repository for Publications and Research Data

Open Repository and Bibliography - Luxembourg

Automatic vs. manual curation of a multi-source chemical dictionary: the impact on text mining

Author: Antony J Williams
Erik M van Mulligen
Jan A Kors
Jos Kleinjans
KM Hettne
Kristina M Hettne
Valery Tkachenko
Publication venue: BioMed Central
Publication date: 01/01/2010

Background. Previously, we developed a combined dictionary dubbed Chemlist for the identification of small molecules and drugs in text based on a number of publicly available databases and tested it on an annotated corpus. To achieve an acceptable recall and precision we used a number of automatic and semi-automatic processing steps together with disambiguation rules. However, it remained to be investigated what impact extensive manual curation of a multi-source chemical dictionary would have on chemical term identification in text. ChemSpider is a chemical database that has undergone extensive manual curation aimed at establishing valid chemical name-to-structure relationships. Results. We acquired the component of ChemSpider containing only manually curated names and synonyms. Rule-based term filtering, semi-automatic manual curation, and disambiguation rules were applied. We tested the dictionary from ChemSpider on an annotated corpus and compared the results with those for the Chemlist dictionary. The ChemSpider dictionary of ca. 80 k names was only one-third to one-quarter the size of Chemlist at around 300 k. The ChemSpider dictionary had a precision of 0.43 and a recall of 0.19 before the application of filtering and disambiguation and a precision of 0.87 and a recall of 0.19 after filtering and disambiguation. The Chemlist dictionary had a precision of 0.20 and a recall of 0.47 before the application of filtering and disambiguation and a precision of 0.67 and a recall of 0.40 after filtering and disambiguation. Conclusions. We conclude the following: (1) The ChemSpider dictionary achieved the best precision but the Chemlist dictionary had a higher recall and the best F-score; (2) Rule-based filtering and disambiguation is necessary to achieve a high precision for both the automatically generated and the manually curated dictionary. ChemSpider is available as a web service at http://www.chemspider.com/ and the Chemlist dictionary is freely available as an XML file in Simple Knowledge Organization System format on the web at http://www.biosemantics.org/chemlist
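
The F-score comparison in the conclusions can be reproduced from the precision and recall values reported in the abstract above. The sketch below assumes the balanced F1 score (the harmonic mean of precision and recall); the abstract says only "F-score", so the weighting is an assumption, and the function name is hypothetical.

```python
def f1_score(precision: float, recall: float) -> float:
    """Balanced F-score: harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs as reported in the abstract.
reported = {
    "ChemSpider, before filtering": (0.43, 0.19),
    "ChemSpider, after filtering": (0.87, 0.19),
    "Chemlist, before filtering": (0.20, 0.47),
    "Chemlist, after filtering": (0.67, 0.40),
}

f_scores = {name: f1_score(p, r) for name, (p, r) in reported.items()}

for name, score in f_scores.items():
    print(f"{name}: F1 = {score:.2f}")
```

Under this assumption the Chemlist dictionary after filtering scores about 0.50 against about 0.31 for ChemSpider after filtering, consistent with the statement that Chemlist had the best F-score despite its lower precision.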

Maastricht University Research Portal

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

EUR Research Repository

Leiden University Scholary Publications

Erasmus University Digital Repository

Broad external validation of a multivariable risk prediction model for gastrointestinal malignancy in iron deficiency anaemia.

Author: Almilaji Orouba
Chapman Thomas P
Docherty Sharon
Ellis Antony J
Hebden John
Maynard Alec
Shine Brian SF
Snook Jonathon
Webb Gwilym
Williams Elizabeth J
Publication venue: Diagn Progn Res
Publication date: 15/12/2021

BACKGROUND: Using two large datasets from Dorset, we previously reported an internally validated multivariable risk model for predicting the risk of GI malignancy in IDA: the IDIOM score. The aim of this retrospective observational study was to validate the IDIOM model using two independent external datasets. METHODS: The external validation datasets were collected, in a secondary care setting, by different investigators from cohorts in Oxford and Sheffield derived under different circumstances, comprising 1117 and 474 patients with confirmed IDA respectively. The data were anonymised prior to analysis. The predictive performance of the original model was evaluated by estimating measures of calibration, discrimination and clinical utility using the validation datasets. RESULTS: The discrimination of the original model using the external validation data was 70% (95% CI 65, 75) for the Oxford dataset and 70% (95% CI 61, 79) for the Sheffield dataset. Analysis of mean, weak and flexible calibration, and of calibration across the risk groups, showed no tendency for under- or over-estimated risks in the combined validation data. Decision curve analysis demonstrated the clinical value of the IDIOM model, with a net benefit higher than the 'investigate all' and 'investigate no-one' strategies up to a threshold of 18% in the combined validation data. Using a risk cut-off of around 1.2% to categorise patients into the very low risk group, none of the patients stratified into this group proved to have GI cancer on investigation in the validation datasets. CONCLUSION: This external validation exercise has shown promising results for the IDIOM model in predicting the risk of underlying GI malignancy in independent IDA datasets collected in different clinical settings.
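
For readers unfamiliar with decision curve analysis, the quantity compared in the abstract above is usually the net benefit at a chosen threshold probability. The sketch below uses the standard textbook definition (true-positive rate per patient minus false-positive rate per patient weighted by the odds of the threshold). The function names and all counts are hypothetical and are not taken from the study.

```python
def net_benefit(true_pos: int, false_pos: int, n_patients: int, threshold: float) -> float:
    """Net benefit of a strategy that investigates true_pos + false_pos of n_patients."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    odds = threshold / (1 - threshold)
    return true_pos / n_patients - (false_pos / n_patients) * odds


def net_benefit_investigate_all(prevalence: float, threshold: float) -> float:
    """Net benefit when every patient is investigated, given the disease prevalence."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# 'Investigate no-one' has a net benefit of zero by definition.
NET_BENEFIT_INVESTIGATE_NONE = 0.0

# Hypothetical cohort: 1000 patients, 80 cancers, a model flagging 400 patients
# of whom 80 are true positives and 320 are false positives.
model_nb = net_benefit(true_pos=80, false_pos=320, n_patients=1000, threshold=0.10)
all_nb = net_benefit_investigate_all(prevalence=0.08, threshold=0.10)

print(f"model: {model_nb:.4f}")
print(f"investigate all: {all_nb:.4f}")
print(f"investigate no-one: {NET_BENEFIT_INVESTIGATE_NONE:.4f}")
```

A model is considered clinically useful at a given threshold when its net benefit exceeds both reference strategies, which is the comparison the abstract reports for thresholds up to 18%.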

LSHTM Research Online

PubMed Central

Bournemouth University Research Online

Apollo (Cambridge)

Broad external validation of a multivariable risk prediction model for gastrointestinal malignancy in iron deficiency anaemia.

Author: Almilaji Orouba
Chapman Thomas P
Docherty Sharon
Ellis Antony J
Hebden John
Maynard Alec
Shine Brian SF
Snook Jonathon
Webb Gwilym
Williams Elizabeth J
Publication venue: Diagn Progn Res
Publication date: 28/01/2022

Apollo (Cambridge)

Connecting environmental exposure and neurodegeneration using cheminformatics and high resolution mass spectrometry: potential and challenges

Author: Baker Nancy C.
Balling Rudi
Kolber Pierre Luc
Krüger Rejko
Linster Carole
Paczia Nicole
Schymanski Emma
Singh Randolph
Trezzi Jean-Pierre
Williams Antony J
Wilmes Paul
Publication date: 01/09/2019

Connecting chemical exposures over a lifetime to complex chronic diseases with multifactorial causes such as neurodegenerative diseases is an immense challenge requiring a long-term, interdisciplinary approach. Rapid developments in analytical and data technologies, such as non-target high resolution mass spectrometry (NT-HR-MS), have opened up new possibilities to accomplish this, inconceivable 20 years ago. While NT-HR-MS is being applied to increasingly complex research questions, there are still many unidentified chemicals and uncertainties in linking exposures to human health outcomes and environmental impacts. In this perspective, we explore the possibilities and challenges involved in using cheminformatics and NT-HR-MS to answer complex questions that cross many scientific disciplines, taking the identification of potential (small molecule) neurotoxicants in environmental or biological matrices as a case study. We explore capturing literature knowledge and patient exposure information in a form amenable to high-throughput data mining, and the related cheminformatic challenges. We then briefly cover which sample matrices are available, which method(s) could potentially be used to detect these chemicals in various matrices and what remains beyond the reach of NT-HR-MS. We touch on the potential for biological validation systems to contribute to mechanistic understanding of observations and explore which sampling and data archiving strategies may be required to form an accurate, sustained picture of small molecule signatures on extensive cohorts of patients with chronic neurodegenerative disorders. Finally, we reflect on how NT-HR-MS can support unravelling the contribution of the environment to complex diseases

Open Repository and Bibliography - Luxembourg

"MS-Ready" structures for non-targeted high-resolution mass spectrometry screening studies.

Author: Grulke Chris
Mansouri Kamel
McEachran Andrew D.
Ruttkies Christoph
Schymanski Emma
Williams Antony J.
Publication date: 01/01/2018

Chemical database searching has become a fixture in many non-targeted identification workflows based on high-resolution mass spectrometry (HRMS). However, the form of a chemical structure observed in HRMS does not always match the form stored in a database (e.g., the neutral form versus a salt; one component of a mixture rather than the mixture form used in a consumer product). Linking the form of a structure observed via HRMS to its related form(s) within a database will enable the return of all relevant variants of a structure, as well as the related metadata, in a single query. A Konstanz Information Miner (KNIME) workflow has been developed to produce structural representations observed using HRMS ("MS-Ready structures") and link them to those stored in a database. These MS-Ready structures, and associated mappings to the full chemical representations, are surfaced via the US EPA's Chemistry Dashboard (https://comptox.epa.gov/dashboard/). This article describes the workflow for the generation and linking of ~700,000 MS-Ready structures (derived from ~760,000 original structures) as well as download, search and export capabilities to serve structure identification using HRMS. The importance of this form of structural representation for HRMS is demonstrated with several examples, including integration with the in silico fragmentation software application MetFrag. The structures, search, download and export functionality are all available through the CompTox Chemistry Dashboard, while the MetFrag implementation can be viewed at https://msbi.ipb-halle.de/MetFragBeta/

Directory of Open Access Journals

Open Repository and Bibliography - Luxembourg

Rate-dependency of action potential duration and refractoriness in isolated myocytes from the rabbit AV node and atrium

Author: Andrew C. Rankin
Antony J. Workman
Kathleen A. Kane
Publication venue: Elsevier BV
Publication date: 01/08/2000

During atrial fibrillation, ventricular rate is determined by atrioventricular nodal (AVN) conduction, which in part is dependent upon the refractoriness of single AVN cells. The aims of this study were to investigate the rate-dependency of the action potential duration (APD) and effective refractory period (ERP) in single myocytes isolated from the AV node and atrium of rabbit hearts, using whole cell patch clamping, and to determine the contribution of the 4-aminopyridine (4-AP)-sensitive current, ITO1, to these relationships in the two cell types. AVN cells had a more positive maximum diastolic potential (-60±1 vs -71±2 mV), lower Vmax (8±2 vs 144±17 V/s) and higher input resistance [420±46 vs 65±7 MΩ (mean±s.e.; P<0.05, n=9–33)], respectively, than atrial myocytes. Stepwise increases in rate from 75 beats/min caused activation failure and Wenckebach periodicity in AVN cells (at around 400 beats/min), but 1:1 activation in atrial cells (at up to 600 beats/min). Rate reduction from 300 to 75 beats/min shortened the ERP in both cell types (from 155±7 to 135±11 ms in AVN cells [P<0.05, n=6] and from 130±8 to 106±7 ms in atrial cells [P<0.05, n=10]). Rate increase from 300 to 480 and 600 beats/min shortened ERP in atrial cells, by 12±4% (n=8) and 26±7% (n=7), respectively (P<0.05). By contrast, AVN ERP did not shorten at rates >300 beats/min. In atrial cells, rate reduction to 75 beats/min caused marked shortening of APD50 (from 51±6 to 29±6 ms, P<0.05). 4-AP (1 mM) significantly prolonged atrial APD50 at 75 beats/min (P<0.05, n=7), but not at 300 or 400 beats/min. In AVN cells, in contrast, there was less effect of rate change on APD, and 4-AP did not alter APD50 at any rate. 4-AP also did not affect APD90 or ERP in either cell type. In conclusion, a lack of ERP-shortening at high rates in rabbit single AVN cells may contribute to ventricular rate control. ITO1 contributed to the APD50 rate relation in atrial, but not AVN cells and did not contribute to the ERP rate relation in either cell type.

Crossref

Enlighten