12 research outputs found

    Investigation of a Data Split Strategy Involving the Time Axis in Adverse Event Prediction Using Machine Learning

    Full text available
    Adverse events are a serious issue in drug development, and many machine learning methods have been developed to predict them. Random-split cross-validation is the de facto standard for model building and evaluation in machine learning, but care should be taken in adverse event prediction because this approach tends to be overoptimistic compared with the real-world situation. The time split, which partitions data along the time axis, is considered suitable for real-world prediction. However, the differences in model performance obtained using the time and random splits are not fully understood. To understand these differences, we compared model performance between the time and random splits using eight types of compound information as input, eight adverse events as targets, and six machine learning algorithms. The random split yielded higher area under the curve values than the time split for six of the eight targets. The chemical spaces of the training and test datasets of the time split were similar, suggesting that the concept of applicability domain is insufficient to explain the differences derived from the splitting. The area under the curve differences were smaller for the protein-interaction dataset than for the other datasets. Subsequent detailed analyses suggested the danger of confounding when knowledge-based information is used with the time split. These findings indicate the importance of understanding the differences between the time and random splits in adverse event prediction and suggest that appropriate use of the splitting strategies and interpretation of results are necessary for the real-world prediction of adverse events. Comment: 20 pages, 4 figures
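    The two splitting strategies contrasted in this abstract can be sketched in a few lines of Python. This is a minimal illustration, not the paper's code; the record layout (a registration date, a feature vector, and a label per compound) and the cutoff date are assumptions for the example:

    ```python
    from datetime import date
    import random

    # Hypothetical compound records: (registration_date, features, label)
    records = [
        (date(2015, 1, 1), [0.1, 0.2], 0),
        (date(2016, 3, 5), [0.4, 0.1], 1),
        (date(2017, 7, 9), [0.3, 0.9], 0),
        (date(2018, 2, 2), [0.8, 0.5], 1),
        (date(2019, 6, 4), [0.6, 0.7], 0),
        (date(2020, 8, 8), [0.2, 0.6], 1),
    ]

    def time_split(records, cutoff):
        """Train on compounds registered before the cutoff, test on later ones,
        mimicking real-world prospective prediction."""
        train = [r for r in records if r[0] < cutoff]
        test = [r for r in records if r[0] >= cutoff]
        return train, test

    def random_split(records, test_fraction=0.33, seed=0):
        """Shuffle and hold out a fraction regardless of date; test compounds
        may predate training compounds, which can be overoptimistic."""
        rng = random.Random(seed)
        shuffled = records[:]
        rng.shuffle(shuffled)
        n_test = int(len(shuffled) * test_fraction)
        return shuffled[n_test:], shuffled[:n_test]

    train_t, test_t = time_split(records, date(2019, 1, 1))
    train_r, test_r = random_split(records)
    ```

    The key difference is that the time split never lets the model see compounds newer than the test set, whereas the random split mixes eras freely.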

    Difficulty in learning chirality for Transformer fed with SMILES

    Full text available
    Recent years have seen the development of descriptor generation based on representation learning of extremely diverse molecules, especially methods that apply natural language processing (NLP) models to SMILES, a literal representation of molecular structure. However, little research has been done on how these models understand chemical structure. To address this, we investigated the relationship between the learning progress of SMILES and chemical structure using a representative NLP model, the Transformer. The results suggest that while the Transformer learns partial structures of molecules quickly, it requires extended training to understand overall structures. Consistently, the accuracy of molecular property predictions using descriptors generated from models at different learning steps was similar from the beginning to the end of training. Furthermore, we found that the Transformer requires particularly long training to learn chirality and sometimes stagnates with low translation accuracy due to misunderstanding of enantiomers. These findings are expected to deepen understanding of NLP models in chemistry. Comment: 20 pages, 6 figures
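    Why chirality is hard at the token level can be illustrated with a small sketch (not from the paper): the SMILES strings of two enantiomers are typically identical except for a single chirality tag ('@' vs. '@@'), so a string-fed model must learn that one tiny token flip inverts the 3D structure. The tokenizer below is deliberately naive and assumed for illustration:

    ```python
    # Illustrative SMILES for a pair of enantiomers of alanine; which string
    # is the L- or D-form depends on atom ordering, so they are left unnamed.
    enantiomer_a = "N[C@@H](C)C(=O)O"
    enantiomer_b = "N[C@H](C)C(=O)O"

    def simple_tokens(smiles):
        """Naive SMILES tokenizer: '@@' is one token, everything else is a
        single character. (Real tokenizers also group atoms like 'Cl'.)"""
        tokens, i = [], 0
        while i < len(smiles):
            if smiles[i:i + 2] == "@@":
                tokens.append("@@")
                i += 2
            else:
                tokens.append(smiles[i])
                i += 1
        return tokens

    t_a = simple_tokens(enantiomer_a)
    t_b = simple_tokens(enantiomer_b)
    # The two token sequences differ in exactly one position.
    diff = [(a, b) for a, b in zip(t_a, t_b) if a != b]
    ```

    A model that treats the sequences as nearly identical strings will map the two enantiomers to nearly identical representations, which is one plausible reading of the stagnation the abstract describes.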

    Difficulty in chirality recognition for Transformer architectures learning chemical structures from string representations

    No full text
    Abstract: Recent years have seen rapid development of descriptor generation based on representation learning of extremely diverse molecules, especially methods that apply natural language processing (NLP) models to SMILES, a literal representation of molecular structure. However, little research has been done on how these models understand chemical structure. To address this black box, we investigated the relationship between the learning progress of SMILES and chemical structure using a representative NLP model, the Transformer. We show that while the Transformer learns partial structures of molecules quickly, it requires extended training to understand overall structures. Consistently, the accuracy of molecular property predictions using descriptors generated from models at different learning steps was similar from the beginning to the end of training. Furthermore, we found that the Transformer requires particularly long training to learn chirality and sometimes stagnates with low performance due to misunderstanding of enantiomers. These findings are expected to deepen the understanding of NLP models in chemistry.

    Intestinal Atp8b1 dysfunction causes hepatic choline deficiency and steatohepatitis

    No full text
    Abstract: Choline is an essential nutrient, and its deficiency causes steatohepatitis. Dietary phosphatidylcholine (PC) is digested into lysoPC (LPC), glycerophosphocholine, and choline in the intestinal lumen and is the primary source of systemic choline. However, the major PC metabolites absorbed in the intestinal tract remain unidentified. ATP8B1 is a P4-ATPase phospholipid flippase expressed in the apical membrane of the epithelium. Here, we use intestinal epithelial cell (IEC)-specific Atp8b1-knockout (Atp8b1^IEC-KO) mice. These mice progress to steatohepatitis by 4 weeks. Metabolomic analysis and cell-based assays show that loss of Atp8b1 in IECs causes LPC malabsorption and thereby hepatic choline deficiency. Feeding choline-supplemented diets to lactating mice achieves complete recovery from steatohepatitis in Atp8b1^IEC-KO mice. Analysis of samples from pediatric patients with ATP8B1 deficiency suggests its translational potential. This study indicates that Atp8b1 regulates hepatic choline levels through intestinal LPC absorption, encouraging the evaluation of choline supplementation therapy for steatohepatitis caused by ATP8B1 dysfunction.