The development of models to predict melting and pyrolysis point data associated with several hundred thousand compounds mined from PATENTS
BACKGROUND: Melting point (MP) is an important property with regard to the solubility of chemical compounds. Its prediction from chemical structure remains a highly challenging task for quantitative structure-activity relationship studies. Success in this area of research critically depends on the availability of high-quality MP data as well as accurate chemical structure representations in order to develop models. Currently available datasets for MP prediction have been limited to around 50k molecules, while far more data are routinely generated following the synthesis of novel materials. Significant amounts of MP data are freely available within the patent literature and, if available in the appropriate form, could potentially be used to develop predictive models. RESULTS: We have developed a pipeline for the automated extraction and annotation of chemical data from published PATENTS. Almost 300,000 data points have been collected and used to develop models to predict melting and pyrolysis (decomposition) points using tools available on the OCHEM modeling platform (http://ochem.eu). A number of technical challenges were simultaneously solved to develop models based on these data. These included the handling of sparse data matrices with >200,000,000,000 entries and parallel calculations using 32 × 6 cores per task, with 13 descriptor sets totaling more than 700,000 descriptors. We showed that models developed using data collected from PATENTS had similar or better prediction accuracy compared to the highly curated data used in previous publications. Data for chemicals that decomposed rather than melted were separated from those for compounds that underwent a normal melting transition, and models for both pyrolysis and MPs were developed. The accuracy of the consensus MP models for molecules from the drug-like region of chemical space was similar to their estimated experimental accuracy, 32 °C. Last but not least, important structural features related to the pyrolysis of chemicals were identified, and a model to predict whether a compound will decompose instead of melting was developed. CONCLUSIONS: We have shown that automated tools for the analysis of chemical information have reached a mature stage, allowing the extraction and collection of high-quality data to enable the development of structure-activity relationship models. The developed models and data are publicly available at http://ochem.eu/article/99826
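As a rough illustration of the two techniques this abstract names, sparse descriptor matrices and consensus models, here is a minimal sketch. It is not the OCHEM implementation; all sizes, descriptor values and model choices are invented for illustration.

```python
# Minimal sketch, NOT the OCHEM pipeline: a toy sparse descriptor matrix
# plus a two-model consensus. All sizes and values are invented.
import numpy as np
from scipy import sparse
from sklearn.linear_model import Ridge, SGDRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_mols, n_desc = 5_000, 20_000          # toy scale; the paper's is far larger
# Most descriptor values are zero, so store them in compressed sparse rows.
X = sparse.random(n_mols, n_desc, density=0.001, format="csr", random_state=0)
y = rng.normal(150.0, 50.0, n_mols)     # fake melting points in degC

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
models = [Ridge(alpha=1.0).fit(X_tr, y_tr),
          SGDRegressor(max_iter=1000, random_state=0).fit(X_tr, y_tr)]
# "Consensus" prediction = average of the individual models' predictions.
consensus = np.mean([m.predict(X_te) for m in models], axis=0)
print(f"toy RMSE: {np.sqrt(np.mean((consensus - y_te) ** 2)):.1f} degC")
```

Both estimators accept sparse input directly, which is the point: a dense matrix of this shape would not fit in memory at the scale the paper reports.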
Impact Fracture of Composite and Homogeneous Nanoagglomerates
It is not yet clear whether the fracture characteristics of structured composite capsules and homogeneous nanoagglomerates differ significantly under impact loading conditions. Experimental measurement of the impact fracture properties of such small agglomerates is difficult because of the length and time scales involved. Using computer simulations, we show here that nanoagglomerates subjected to normal impact loading fracture within a few nanoseconds in a brittle manner. The restitution coefficient of the nanoagglomerates varies nonlinearly with initial kinetic energy. Fracture of the nanoagglomerates does not always happen at the moment they experience the maximum wall force, but occurs after a time lag of a few nanoseconds, characterised here by the impact survival time (IST) and the IST index. The IST depends on the initial kinetic energy and on the mechanical and geometric properties of the nanoagglomerates. For identical capsule geometries, the IST index is higher for capsules with a soft shell than for those with a hard shell, indicating an enhanced ability of soft nanocapsules to dissipate impact energy. The DEM simulations reported here, based on theories of contact mechanics, provide fundamental insights into the fracture behaviour of agglomerates: at the nanoscale, the structure of the agglomerates significantly influences their breakage behaviour
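The two observables the abstract reports, the restitution coefficient and the IST, are simple to extract from simulation traces. A hedged sketch on synthetic data (the traces, velocities and fracture onset below are invented, not taken from the DEM study):

```python
# Sketch on synthetic data: illustrates the two observables only.
import numpy as np

v_in, v_out = 5.0, 1.8                   # m/s before impact / after rebound
e = abs(v_out) / abs(v_in)               # coefficient of restitution
print(f"restitution coefficient e = {e:.2f}")

t = np.linspace(0.0, 10.0, 1001)         # time, ns (synthetic)
wall_force = np.exp(-((t - 2.0) ** 2))   # synthetic wall-force pulse
t_peak_force = t[np.argmax(wall_force)]
t_fracture = 4.5                         # ns, assumed fracture onset
# Per the abstract, fracture lags the peak wall force; that lag is the IST.
ist = t_fracture - t_peak_force
print(f"impact survival time = {ist:.1f} ns")
```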
Supporting non-target identification by adding hydrogen deuterium exchange MS/MS capabilities to MetFrag
Liquid chromatography coupled with high-resolution mass spectrometry (LC-HRMS) is increasingly popular for the non-targeted exploration of complex samples, where tandem mass spectrometry (MS/MS) is used to characterize the structure of unknown compounds. However, mass spectra do not always contain sufficient information to unequivocally identify the correct structure. This study investigated how much additional information can be gained using hydrogen deuterium exchange (HDX) experiments. The exchange of "easily exchangeable" hydrogen atoms (those bound to heteroatoms) was observed, yielding predominantly [M+D]+ ions in positive mode and [M−D]− ions in negative mode. To enable high-throughput processing, new scoring terms were incorporated into the in silico fragmenter MetFrag. These were initially developed on small datasets and then tested on 762 compounds of environmental interest. Pairs of spectra (normal and deuterated) were found for 593 of these substances (506 positive mode, 155 negative mode spectra). The new scoring terms resulted in 29 additional correct identifications (78 vs 49) in positive mode and an increase in top-10 rankings from 80 to 106 in negative mode. Compounds with dual functionality (polar head group, long apolar tail) exhibited dramatic retention time (RT) shifts of up to several minutes, compared with an average RT shift of 0.04 min. For a smaller dataset of 80 metabolites, top-10 rankings improved from 13 to 24 (positive mode, 57 spectra) and from 14 to 31 (negative mode, 63 spectra) when HDX information was included. The results of standard measurements were confirmed using targets and tentatively identified surfactant species in an environmental sample collected from the river Danube near Novi Sad (Serbia). The changes to MetFrag have been integrated into the command line version available at http://c-ruttkies.github.io/MetFrag, and all resulting spectra and compounds are available in online resources and in the Electronic Supplementary Material (ESM)
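The core idea behind the HDX scoring can be sketched independently of MetFrag: count a candidate's heteroatom-bound hydrogens and check that the observed mass shift between the normal and deuterated spectra matches. The sketch below is an assumption-laden approximation, not MetFrag's actual scoring code; the function names and tolerance are invented.

```python
# Hedged sketch (NOT MetFrag's implementation) of an HDX consistency check.
from rdkit import Chem

def exchangeable_h(smiles: str) -> int:
    """Count hydrogens bound to heteroatoms (assumed exchangeable)."""
    mol = Chem.MolFromSmiles(smiles)
    return sum(a.GetTotalNumHs() for a in mol.GetAtoms()
               if a.GetSymbol() in ("N", "O", "S"))

MASS_D_MINUS_H = 1.00628  # Da, mass difference between D and H

def hdx_consistent(smiles, mz_normal, mz_deuterated, tol=0.01):
    # In positive mode the charging proton is also a deuteron ([M+D]+),
    # hence the +1 on top of the exchangeable-hydrogen count.
    expected = (exchangeable_h(smiles) + 1) * MASS_D_MINUS_H
    return abs((mz_deuterated - mz_normal) - expected) < tol
```

A candidate that fails this check can be penalised in the ranking; the paper's actual scoring terms are more nuanced (partial exchange, negative mode) than this binary test.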
Automatic vs. manual curation of a multi-source chemical dictionary: the impact on text mining
Background. Previously, we developed a combined dictionary dubbed Chemlist for the identification of small molecules and drugs in text, based on a number of publicly available databases, and tested it on an annotated corpus. To achieve acceptable recall and precision we used a number of automatic and semi-automatic processing steps together with disambiguation rules. However, it remained to be investigated what impact an extensive manual curation of a multi-source chemical dictionary would have on chemical term identification in text. ChemSpider is a chemical database that has undergone extensive manual curation aimed at establishing valid chemical name-to-structure relationships. Results. We acquired the component of ChemSpider containing only manually curated names and synonyms. Rule-based term filtering, semi-automatic manual curation, and disambiguation rules were applied. We tested the dictionary from ChemSpider on an annotated corpus and compared the results with those for the Chemlist dictionary. The ChemSpider dictionary of ca. 80 k names was only a quarter to a third the size of Chemlist, at around 300 k names. The ChemSpider dictionary had a precision of 0.43 and a recall of 0.19 before the application of filtering and disambiguation, and a precision of 0.87 and a recall of 0.19 afterwards. The Chemlist dictionary had a precision of 0.20 and a recall of 0.47 before filtering and disambiguation, and a precision of 0.67 and a recall of 0.40 afterwards. Conclusions. We conclude the following: (1) the ChemSpider dictionary achieved the best precision, but the Chemlist dictionary had a higher recall and the best F-score; (2) rule-based filtering and disambiguation are necessary to achieve high precision for both the automatically generated and the manually curated dictionary. ChemSpider is available as a web service at http://www.chemspider.com/ and the Chemlist dictionary is freely available as an XML file in Simple Knowledge Organization System format at http://www.biosemantics.org/chemlist
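For readers comparing the precision/recall figures above, the F-score combines the two as their harmonic mean. A minimal sketch (the entity counts are invented; only the metric definitions come from standard usage):

```python
# Standard precision/recall/F-score from true/false positives and
# false negatives; counts below are illustrative only.
def prf(tp: int, fp: int, fn: int):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score

# Invented counts producing roughly Chemlist's post-filtering figures:
p, r, f = prf(tp=400, fp=197, fn=600)
print(f"precision={p:.2f} recall={r:.2f} F={f:.2f}")
```

This also shows why Chemlist wins on F-score despite lower precision: the harmonic mean punishes ChemSpider's much lower recall (0.19) more than Chemlist's lower precision.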
Broad external validation of a multivariable risk prediction model for gastrointestinal malignancy in iron deficiency anaemia.
BACKGROUND: Using two large datasets from Dorset, we previously reported an internally validated multivariable risk model, the IDIOM score, for predicting the risk of GI malignancy in IDA. The aim of this retrospective observational study was to validate the IDIOM model using two independent external datasets. METHODS: The external validation datasets were collected, in a secondary care setting, by different investigators from cohorts in Oxford and Sheffield derived under different circumstances, comprising 1117 and 474 patients with confirmed IDA respectively. The data were anonymised prior to analysis. The predictive performance of the original model was evaluated by estimating measures of calibration, discrimination and clinical utility using the validation datasets. RESULTS: The discrimination of the original model using the external validation data was 70% (95% CI 65, 75) for the Oxford dataset and 70% (95% CI 61, 79) for the Sheffield dataset. Analysis of mean, weak and flexible calibration, and of calibration across risk groups, showed no tendency towards under- or over-estimated risks in the combined validation data. Decision curve analysis demonstrated the clinical value of the IDIOM model, with a net benefit higher than the 'investigate all' and 'investigate no-one' strategies up to a threshold of 18% in the combined validation data. Using a risk cut-off of around 1.2% to categorise patients into the very low risk group, none of the patients stratified into this group proved to have GI cancer on investigation in the validation datasets. CONCLUSION: This external validation exercise has shown promising results for the IDIOM model in predicting the risk of underlying GI malignancy in independent IDA datasets collected in different clinical settings
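The "net benefit" quantity underlying the decision curve analysis has a standard definition (Vickers and Elkin). A sketch with invented counts, sized only to match the combined validation cohort of 1117 + 474 = 1591 patients:

```python
# Decision-curve net benefit at risk threshold pt; counts are illustrative,
# not taken from the IDIOM validation data.
def net_benefit(tp: int, fp: int, n: int, pt: float) -> float:
    # True positives credited in full; false positives debited at the
    # odds of the threshold, reflecting the harm of needless investigation.
    return tp / n - (fp / n) * (pt / (1.0 - pt))

n = 1591  # combined Oxford + Sheffield cohorts
print(net_benefit(tp=50, fp=300, n=n, pt=0.05))    # model-guided strategy
print(net_benefit(tp=60, fp=1531, n=n, pt=0.05))   # 'investigate all'
```

The study's finding is that the model's curve sits above both default strategies for thresholds up to 18%.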
Connecting environmental exposure and neurodegeneration using cheminformatics and high resolution mass spectrometry: potential and challenges
Connecting chemical exposures over a lifetime to complex chronic diseases with multifactorial causes, such as neurodegenerative diseases, is an immense challenge requiring a long-term, interdisciplinary approach. Rapid developments in analytical and data technologies, such as non-target high resolution mass spectrometry (NT-HR-MS), have opened up possibilities to accomplish this that were inconceivable 20 years ago. While NT-HR-MS is being applied to increasingly complex research questions, there are still many unidentified chemicals and uncertainties in linking exposures to human health outcomes and environmental impacts. In this perspective, we explore the possibilities and challenges involved in using cheminformatics and NT-HR-MS to answer complex questions that cross many scientific disciplines, taking the identification of potential (small molecule) neurotoxicants in environmental or biological matrices as a case study. We explore capturing literature knowledge and patient exposure information in a form amenable to high-throughput data mining, and the related cheminformatic challenges. We then briefly cover which sample matrices are available, which method(s) could potentially be used to detect these chemicals in various matrices, and what remains beyond the reach of NT-HR-MS. We touch on the potential for biological validation systems to contribute to mechanistic understanding of observations, and explore which sampling and data archiving strategies may be required to form an accurate, sustained picture of small-molecule signatures in extensive cohorts of patients with chronic neurodegenerative disorders. Finally, we reflect on how NT-HR-MS can support unravelling the contribution of the environment to complex diseases
"MS-Ready" structures for non-targeted high-resolution mass spectrometry screening studies.
Chemical database searching has become a fixture in many non-targeted identification workflows based on high-resolution mass spectrometry (HRMS). However, the form of a chemical structure observed in HRMS does not always match the form stored in a database (e.g., the neutral form versus a salt; one component of a mixture rather than the mixture form used in a consumer product). Linking the form of a structure observed via HRMS to its related form(s) within a database will enable the return of all relevant variants of a structure, as well as the related metadata, in a single query. A Konstanz Information Miner (KNIME) workflow has been developed to produce structural representations observed using HRMS ("MS-Ready structures") and link them to those stored in a database. These MS-Ready structures, and the associated mappings to the full chemical representations, are surfaced via the US EPA's CompTox Chemistry Dashboard (https://comptox.epa.gov/dashboard/). This article describes the workflow for the generation and linking of ~700,000 MS-Ready structures (derived from ~760,000 original structures) as well as the download, search and export capabilities that serve structure identification using HRMS. The importance of this form of structural representation for HRMS is demonstrated with several examples, including integration with the in silico fragmentation software application MetFrag. The structures, search, download and export functionality are all available through the CompTox Chemistry Dashboard, while the MetFrag implementation can be viewed at https://msbi.ipb-halle.de/MetFragBeta/
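The two transformations the abstract names (taking one component of a mixture, neutralising a salt form) can be approximated outside KNIME. The RDKit sketch below is an assumption, not the paper's actual workflow, which handles many more cases:

```python
# Hedged RDKit approximation of an "MS-Ready" transformation: keep the
# largest component and neutralise charges. NOT the paper's KNIME workflow.
from rdkit import Chem
from rdkit.Chem.MolStandardize import rdMolStandardize

def ms_ready(smiles: str) -> str:
    mol = Chem.MolFromSmiles(smiles)
    mol = rdMolStandardize.FragmentParent(mol)        # largest component
    mol = rdMolStandardize.Uncharger().uncharge(mol)  # neutralise charges
    return Chem.MolToSmiles(mol)

# A sodium salt stored in a database maps to the neutral acid that is
# actually observed in HRMS:
print(ms_ready("CC(=O)[O-].[Na+]"))  # -> CC(=O)O
```

Keeping the mapping from each MS-Ready form back to its original database records, rather than discarding the salt/mixture forms, is what lets a single query return all related variants and their metadata.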
Rate-dependency of action potential duration and refractoriness in isolated myocytes from the rabbit AV node and atrium
During atrial fibrillation, ventricular rate is determined by atrioventricular nodal (AVN) conduction, which depends in part upon the refractoriness of single AVN cells. The aims of this study were to investigate the rate-dependency of the action potential duration (APD) and effective refractory period (ERP) in single myocytes isolated from the AV node and atrium of rabbit hearts, using whole-cell patch clamping, and to determine the contribution of the 4-aminopyridine (4-AP)-sensitive current, ITO1, to these relationships in the two cell types. AVN cells had a more positive maximum diastolic potential (-60±1 vs -71±2 mV), lower Vmax (8±2 vs 144±17 V/s) and higher input resistance (420±46 vs 65±7 MΩ; mean±s.e., P<0.05, n=9–33) than atrial myocytes. Stepwise increases in rate from 75 beats/min caused activation failure and Wenckebach periodicity in AVN cells (at around 400 beats/min), but 1:1 activation in atrial cells (at up to 600 beats/min). Rate reduction from 300 to 75 beats/min shortened the ERP in both cell types (from 155±7 to 135±11 ms in AVN cells [P<0.05, n=6] and from 130±8 to 106±7 ms in atrial cells [P<0.05, n=10]). Rate increases from 300 to 480 and 600 beats/min shortened the ERP in atrial cells by 12±4% (n=8) and 26±7% (n=7), respectively (P<0.05). By contrast, AVN ERP did not shorten at rates >300 beats/min. In atrial cells, rate reduction to 75 beats/min caused marked shortening of APD50 (from 51±6 to 29±6 ms, P<0.05). 4-AP (1 mM) significantly prolonged atrial APD50 at 75 beats/min (P<0.05, n=7), but not at 300 or 400 beats/min. In AVN cells, in contrast, rate changes had less effect on APD, and 4-AP did not alter APD50 at any rate. 4-AP also did not affect APD90 or ERP in either cell type. In conclusion, a lack of ERP shortening at high rates in single rabbit AVN cells may contribute to ventricular rate control. ITO1 contributed to the APD50-rate relation in atrial but not AVN cells, and did not contribute to the ERP-rate relation in either cell type
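For readers unfamiliar with the APD50/APD90 notation used above, these are the action potential durations at 50% and 90% repolarisation. A sketch of the usual computation on a synthetic trace (the waveform is invented, and conventions differ on whether duration is measured from the upstroke or the peak; the peak is used here):

```python
# Hedged sketch: APD at a given repolarisation fraction, measured from
# the AP peak on a synthetic voltage trace. Not this study's analysis code.
import numpy as np

def apd(t_ms, v_mv, fraction):
    v_rest, v_peak = v_mv[0], v_mv.max()
    i_peak = v_mv.argmax()
    threshold = v_peak - fraction * (v_peak - v_rest)
    # First sample after the peak at which V has repolarised past threshold.
    below = np.where(v_mv[i_peak:] <= threshold)[0]
    return t_ms[i_peak + below[0]] - t_ms[i_peak]

t = np.linspace(0, 300, 3001)                 # ms
v = -70 + 110 * np.exp(-t / 60) * (t > 0)     # toy AP decaying from ~+40 mV
print(f"APD50 ~ {apd(t, v, 0.50):.0f} ms, APD90 ~ {apd(t, v, 0.90):.0f} ms")
```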