12 research outputs found
Investigation of a Data Split Strategy Involving the Time Axis in Adverse Event Prediction Using Machine Learning
Adverse events are a serious issue in drug development and many prediction
methods using machine learning have been developed. The random split
cross-validation is the de facto standard for model building and evaluation in
machine learning, but care should be taken in adverse event prediction because
this approach tends to be overoptimistic compared with the real-world
situation. The time split, which uses the time axis, is considered suitable for
real-world prediction. However, the differences in model performance obtained
using the time and random splits are not fully understood. To understand the
differences, we compared the model performance between the time and random
splits using eight types of compound information as input, eight adverse events
as targets, and six machine learning algorithms. The random split showed higher
area under the curve values than did the time split for six of eight targets.
The chemical spaces of the training and test datasets of the time split were
similar, suggesting that the concept of applicability domain is insufficient to
explain the differences derived from the splitting. The area under the curve
differences were smaller for the protein interaction than for the other
datasets. Subsequent detailed analyses suggested the danger of confounding in
the use of knowledge-based information in the time split. These findings
indicate the importance of understanding the differences between the time and
random splits in adverse event prediction and suggest that appropriate use of
the splitting strategies and interpretation of results are necessary for the
real-world prediction of adverse events.
Comment: 20 pages, 4 figures
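The contrast between the two splitting strategies in the abstract above can be sketched in a few lines. This is only a minimal illustration, not the authors' pipeline: the record layout, the `random_split` and `time_split` helpers, the cutoff date, and the toy data are all hypothetical.

```python
import random
from datetime import date


def random_split(records, test_fraction=0.25, seed=0):
    """Shuffle the records and hold out a fraction, ignoring dates."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def time_split(records, cutoff):
    """Train on records dated before the cutoff, test on the rest."""
    train = [r for r in records if r["date"] < cutoff]
    test = [r for r in records if r["date"] >= cutoff]
    return train, test


# Hypothetical toy records: one compound per year, alternating labels.
records = [
    {"compound": f"cpd_{i}", "date": date(2000 + i, 1, 1), "label": i % 2}
    for i in range(8)
]

train_r, test_r = random_split(records)
train_t, test_t = time_split(records, cutoff=date(2006, 1, 1))

# In the time split, every training record predates every test record,
# which mimics predicting for compounds that appear later.
latest_train = max(r["date"] for r in train_t)
earliest_test = min(r["date"] for r in test_t)
```

With a random split, later compounds can end up in the training set while earlier ones are tested, which is one way the evaluation can look more optimistic than a prospective setting.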
Difficulty in learning chirality for Transformer fed with SMILES
Recent years have seen development of descriptor generation based on
representation learning of extremely diverse molecules, especially those that
apply natural language processing (NLP) models to SMILES, a literal
representation of molecular structure. However, little research has been done
on how these models understand chemical structure. To address this, we
investigated the relationship between the learning progress of SMILES and
chemical structure using a representative NLP model, the Transformer. The
results suggest that while the Transformer learns partial structures of
molecules quickly, it requires extended training to understand overall
structures. Consistently, the accuracy of molecular property predictions using
descriptors generated from models at different learning steps was similar from
the beginning to the end of training. Furthermore, we found that the
Transformer requires particularly long training to learn chirality and
sometimes stagnates with low translation accuracy due to misunderstanding of
enantiomers. These findings are expected to deepen understanding of NLP models
in chemistry.
Comment: 20 pages, 6 figures
Difficulty in chirality recognition for Transformer architectures learning chemical structures from string representations
Recent years have seen rapid development of descriptor generation based on representation learning of extremely diverse molecules, especially those that apply natural language processing (NLP) models to SMILES, a literal representation of molecular structure. However, little research has been done on how these models understand chemical structure. To address this black box, we investigated the relationship between the learning progress of SMILES and chemical structure using a representative NLP model, the Transformer. We show that while the Transformer learns partial structures of molecules quickly, it requires extended training to understand overall structures. Consistently, the accuracy of molecular property predictions using descriptors generated from models at different learning steps was similar from the beginning to the end of training. Furthermore, we found that the Transformer requires particularly long training to learn chirality and sometimes stagnates with low performance due to misunderstanding of enantiomers. These findings are expected to deepen the understanding of NLP models in chemistry.
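To make concrete what chirality looks like in a SMILES string, the sketch below tokenizes the two enantiomers of alanine and lists where the token sequences differ. The regex tokenizer and the helper names are simplified assumptions for illustration and are not the tokenization used in the paper; the example shows only how the strings differ, not why the Transformer finds the distinction hard.

```python
import re

# Simplified, hypothetical tokenizer: bracket atoms such as [C@H] are kept
# whole, Br and Cl are two-letter atoms, everything else is one character.
TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|.")


def tokenize(smiles):
    """Split a SMILES string into tokens."""
    return TOKEN.findall(smiles)


def differing_positions(a, b):
    """Return (index, token_a, token_b) wherever two token lists differ."""
    ta, tb = tokenize(a), tokenize(b)
    if len(ta) != len(tb):
        raise ValueError("token sequences have different lengths")
    return [(i, x, y) for i, (x, y) in enumerate(zip(ta, tb)) if x != y]


# The two enantiomers of alanine, written with the same atom order.
# '@' and '@@' mark opposite configurations at the stereocenter.
ala_a = "C[C@H](N)C(=O)O"
ala_b = "C[C@@H](N)C(=O)O"

diff = differing_positions(ala_a, ala_b)
print(diff)  # [(1, '[C@H]', '[C@@H]')]
```

Under this tokenization the two mirror-image molecules share every token except one.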
Development of a Novel Platform of Proteome Profiling Based on an Easy-to-Handle and Informative 2D-DIGE System
Differential Roles of Ubiquitination in the Degradation Mechanism of Cell Surface–Resident Bile Salt Export Pump and Multidrug Resistance–Associated Protein 2
Evaluation of Organic Anion Transporter 1A2-knock-in Mice as a Model of Human Blood-brain Barrier
Intestinal Atp8b1 dysfunction causes hepatic choline deficiency and steatohepatitis
Choline is an essential nutrient, and its deficiency causes steatohepatitis. Dietary phosphatidylcholine (PC) is digested into lysoPC (LPC), glycerophosphocholine, and choline in the intestinal lumen and is the primary source of systemic choline. However, the major PC metabolites absorbed in the intestinal tract remain unidentified. ATP8B1 is a P4-ATPase phospholipid flippase expressed in the apical membrane of the epithelium. Here, we use intestinal epithelial cell (IEC)-specific Atp8b1-knockout (Atp8b1IEC-KO) mice. These mice progress to steatohepatitis by 4 weeks. Metabolomic analysis and cell-based assays show that loss of Atp8b1 in IEC causes LPC malabsorption and thereby hepatic choline deficiency. Feeding choline-supplemented diets to lactating mice achieves complete recovery from steatohepatitis in Atp8b1IEC-KO mice. Analysis of samples from pediatric patients with ATP8B1 deficiency suggests its translational potential. This study indicates that Atp8b1 regulates hepatic choline levels through intestinal LPC absorption, encouraging the evaluation of choline supplementation therapy for steatohepatitis caused by ATP8B1 dysfunction.