56 research outputs found

    Developing Quantitative Structure–Activity Relationship (QSAR) Models for Water Contaminants’ Activities/Properties by Fine-Tuning GPT‑3 Models

    No full text
    In this study, we developed quantitative structure–activity relationship (QSAR) models for water contaminants’ activities/properties by fine-tuning GPT-3 models. We also proposed a novel masked atom importance (MAI) approach for model interpretation and an OpenAIEmbedding similarity-based method for determining the applicability domain. We utilized the Simplified Molecular-Input Line-Entry System (SMILES) of contaminants and their corresponding activities/properties from three data sets: pKd, Koc, and Solubility. These were used as input prompts and completions, respectively, to fine-tune four GPT-3 models (Davinci, Curie, Babbage, and Ada) obtained from OpenAI. The Babbage model demonstrated superior performance for the pKd data set, while the Davinci model excelled with the Koc and Solubility data sets, even outperforming molecular fingerprint (MF) CatBoost-based QSAR models. The MAI interpretation results were qualitatively consistent with the SHapley Additive exPlanations (SHAP) interpretation but exhibited less sensitivity in quantitative analysis. The OpenAIEmbedding similarity-based applicability domain determination approach showed efficacy comparable to that of the MF-based similarity approach but with added robustness. This study underscores the potential of large language models in developing QSAR models, paving the way for further advancements in QSAR modeling using state-of-the-art language models.
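    The abstract describes SMILES strings used as prompts and activity values used as completions. A minimal sketch of how such pairs might be serialized into prompt/completion JSONL records follows. The separator `" ->"`, the newline stop marker, the two-decimal formatting, and the example molecules and values are all assumptions made for illustration; none of them come from the study.

```python
import json

def to_finetune_records(pairs, prompt_suffix=" ->", stop="\n"):
    """Turn (SMILES, value) pairs into prompt/completion dicts.

    prompt_suffix and stop are illustrative choices, not the study's settings.
    """
    records = []
    for smiles, value in pairs:
        records.append({
            "prompt": f"{smiles}{prompt_suffix}",
            "completion": f" {value:.2f}{stop}",
        })
    return records

def to_jsonl(records):
    """Serialize records as one JSON object per line."""
    return "\n".join(json.dumps(r) for r in records)

# Placeholder molecules and values, not data from the study
example_pairs = [("CCO", 1.10), ("c1ccccc1", -1.64)]
records = to_finetune_records(example_pairs)
jsonl_text = to_jsonl(records)
print(jsonl_text)
```

    Keeping the property value as text in the completion is what lets a language model be trained on a regression target; the predicted string would then be parsed back into a number.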

    Count-Based Morgan Fingerprint: A More Efficient and Interpretable Molecular Representation in Developing Machine Learning-Based Predictive Regression Models for Water Contaminants’ Activities and Properties

    No full text
    In this study, we introduce the count-based Morgan fingerprint (C-MF) to represent chemical structures of contaminants and develop machine learning (ML)-based predictive models for their activities and properties. Compared with the binary Morgan fingerprint (B-MF), C-MF not only qualifies the presence or absence of an atom group but also quantifies its counts in a molecule. We employ six different ML algorithms (ridge regression, SVM, KNN, RF, XGBoost, and CatBoost) to develop models on 10 contaminant-related data sets based on C-MF and B-MF to compare them in terms of the model’s predictive performance, interpretation, and applicability domain (AD). Our results show that C-MF outperforms B-MF in nine of 10 data sets in terms of model predictive performance. The advantage of C-MF over B-MF is dependent on the ML algorithm, and the performance enhancements are proportional to the difference in the chemical diversity of data sets calculated by B-MF and C-MF. Model interpretation results show that the C-MF-based model can elucidate the effect of atom group counts on the target and has a wider range of SHAP values. AD analysis shows that C-MF-based models have an AD similar to that of B-MF-based ones. Finally, we developed a “ContaminaNET” platform to deploy these C-MF-based models for free use.
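    The difference between a binary and a count-based fingerprint can be illustrated with a toy sketch. It does not implement the Morgan algorithm itself: the integer atom-group identifiers are hypothetical stand-ins for what a cheminformatics toolkit would generate, and the vector length of 8 is chosen only to keep the output readable.

```python
from collections import Counter

def fold_fingerprints(atom_group_ids, n_bits=8):
    """Fold atom-group identifiers into binary and count vectors of length n_bits."""
    # Each identifier is mapped to a slot by modulo, as in a hashed fingerprint
    counts = Counter(gid % n_bits for gid in atom_group_ids)
    count_fp = [counts.get(i, 0) for i in range(n_bits)]
    # The binary variant only records whether a slot was hit at all
    binary_fp = [1 if c > 0 else 0 for c in count_fp]
    return binary_fp, count_fp

# Hypothetical identifiers: group 3 occurs three times, group 5 once
ids = [3, 3, 3, 5]
binary_fp, count_fp = fold_fingerprints(ids)
print(binary_fp)  # [0, 0, 0, 1, 0, 1, 0, 0]
print(count_fp)   # [0, 0, 0, 3, 0, 1, 0, 0]
```

    The binary vector cannot distinguish one occurrence of a group from three, whereas the count vector retains that information, which is the property the abstract attributes to C-MF.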

    Common and distinct quantitative characteristics of Chinese and Western music in terms of modes, scales, degrees and melody variations

    No full text
    Music is in general mutually understood among different cultures worldwide but can sometimes vary by different styles. The Western music system (heptatonic system) and the Chinese music system (pentatonic system) have common and different characteristics in terms of modes, scales, degrees, and melody variations. In this paper, we show these characteristics through quantitative analysis and comparison and have identified two common characteristics of Chinese pentatonic and Western heptatonic music with respect to modes, scales, and degrees. Although the structures and profiles of the modes, scales, and degrees of both kinds of music are different, they both show a preference for bright modes and scales, and their preferences for tonal centres and degrees are consistent with the generating orders via their primal tuning temperaments. Based on the structural characteristics of the musical scales of Chinese pentatonic and Western heptatonic music, we present three interval-dividing metrics to examine the melody variations quantitatively. We find that the melody variations of Chinese pentatonic and Western heptatonic music measured by the interval-dividing metrics both follow the power law, a physical-mathematical law existing in many natural and engineering systems. This study fills the gap in the quantitative analysis of Chinese pentatonic music and provides an approach to studying the common characteristics of different types of music.
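    A power law of the form y = c·x^α appears as a straight line on log-log axes, so its exponent can be estimated by linear least squares on the logarithms. The sketch below shows that generic procedure only; the abstract does not define the interval-dividing metrics or state how the fit was done, and the data here are synthetic, generated from a known exponent.

```python
import math

def fit_power_law(xs, ys):
    """Least-squares fit of log(y) = log(c) + alpha * log(x); returns (c, alpha)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx = sum(lx) / n
    my = sum(ly) / n
    sxx = sum((a - mx) ** 2 for a in lx)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    alpha = sxy / sxx
    c = math.exp(my - alpha * mx)
    return c, alpha

# Synthetic data generated from y = 2 * x^(-1.5), not measurements from the paper
xs = [1, 2, 3, 4, 5]
ys = [2.0 * x ** -1.5 for x in xs]
c, alpha = fit_power_law(xs, ys)
print(c, alpha)
```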

    Role of Ferrate(IV) and Ferrate(V) in Activating Ferrate(VI) by Calcium Sulfite for Enhanced Oxidation of Organic Contaminants

    No full text
    Although the Fe(VI)–sulfite process has shown great potential for the rapid removal of organic contaminants, the major active oxidants (Fe(IV)/Fe(V) versus SO<sub>4</sub><sup>•–</sup>/<sup>•</sup>OH) involved in this process are still under debate. By employing sparingly soluble CaSO<sub>3</sub> as a slow-releasing source of SO<sub>3</sub><sup>2–</sup>, this study evaluated the oxidation performance of the Fe(VI)–CaSO<sub>3</sub> process and identified the active oxidants involved in this process. The process exhibited efficient oxidation of a variety of compounds, including antibiotics, pharmaceuticals, and pesticides, at rates that were 6.1–173.7-fold faster than those measured for Fe(VI) alone, depending on pH, CaSO<sub>3</sub> dosage, and the properties of organic contaminants. Many lines of evidence verified that neither SO<sub>4</sub><sup>•–</sup> nor <sup>•</sup>OH was the active species in the Fe(VI)–CaSO<sub>3</sub> process. The accelerating effect of CaSO<sub>3</sub> was ascribed to the direct generation of Fe(IV)/Fe(V) species from the reaction of Fe(VI) with soluble SO<sub>3</sub><sup>2–</sup> via one-electron steps as well as the indirect generation of Fe(IV)/Fe(V) species from the self-decay of Fe(VI) and Fe(VI) reaction with H<sub>2</sub>O<sub>2</sub>, which could be catalyzed by uncomplexed Fe(III). Besides, the Fe(VI)–CaSO<sub>3</sub> process exhibited satisfactory removal of organic contaminants in real water, and inorganic anions showed negligible effects on organic contaminant decomposition in this process. Thus, the Fe(VI)–CaSO<sub>3</sub> process with Fe(IV)/Fe(V) as reactive oxidants may be a promising method for abating various micropollutants in water treatment.

    Premagnetization for Enhancing the Reactivity of Multiple Zerovalent Iron Samples toward Various Contaminants

    No full text
    Premagnetization was applied to enhance the removal of various oxidative contaminants (including amaranth (AR27), lead ion (Pb<sup>2+</sup>), cupric ion (Cu<sup>2+</sup>), selenite (Se<sup>4+</sup>), silver ion (Ag<sup>+</sup>), and chromate (Cr<sup>6+</sup>)) by zerovalent iron (ZVI) from different origins under well-controlled experimental conditions. The removal rate constants of contaminants by premagnetized ZVI (Mag-ZVI) samples were 1.2–12.2-fold greater than those by pristine ZVI (Pri-ZVI) samples. Generally, there was a linear correlation between the specific reaction rate constants (<i>k</i><sub>SA</sub>) of one particular contaminant removal by various Pri-ZVI or Mag-ZVI samples and those of the other contaminant, which could be successfully employed to predict the <i>k</i><sub>SA</sub> of one contaminant by one ZVI sample if <i>k</i><sub>SA</sub> of the other contaminant by this ZVI sample was available. The specific rate constant of Fe(II) release at pH 4.0 was proposed in this study to stand for the intrinsic reactivity of a ZVI sample. All Mag-ZVI samples had higher intrinsic reactivity than their counterparts without premagnetization. There were strong correlations between the intrinsic reactivity of various Pri-ZVI/Mag-ZVI samples and the removal rate constants of a specific contaminant by these ZVI samples not only at pH 4.0 when the intrinsic reactivity was determined but also at other pH levels. This correlation could be employed to predict the removal rate constant of this contaminant by a ZVI sample that was not included in the original data set once the intrinsic reactivity of the ZVI sample was known.

    Activation of Manganese Oxidants with Bisulfite for Enhanced Oxidation of Organic Contaminants: The Involvement of Mn(III)

    No full text
    MnO<sub>4</sub><sup>–</sup> was activated by HSO<sub>3</sub><sup>–</sup>, resulting in a process that oxidizes organic contaminants at extraordinarily high rates. The permanganate/bisulfite (PM/BS) process oxidized phenol, ciprofloxacin, and methyl blue at pH<sub>ini</sub> 5.0 with rates (<i>k</i><sub>obs</sub> ≈ 60–150 s<sup>–1</sup>) that were 5–6 orders of magnitude faster than those measured for permanganate alone, and ∼5 to 7 orders of magnitude faster than conventional advanced oxidation processes for water treatment. Oxidation of phenol was fastest at pH 4.0, but still effective at pH 7.0, and only slightly slower when performed in tap water. A smaller, but still considerable (∼3 orders of magnitude) increase in oxidation rates of methyl blue was observed with MnO<sub>2</sub> activated by HSO<sub>3</sub><sup>–</sup> (MO/BS). The above results, time-resolved spectroscopy of manganese species under various conditions, stoichiometric analysis of pH changes, and the effect of pyrophosphate on UV absorbance spectra suggest that the reactive intermediate(s) responsible for the extremely rapid oxidation of organic contaminants in the PM/BS process involve manganese(III) species with minimal stabilization by complexation. The PM/BS process may lead to a new category of advanced oxidation technologies based on contaminant oxidation by reactive manganese(III) species, rather than hydroxyl and sulfate radicals.
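    To give a sense of scale for the reported rate constants, the corresponding half-lives can be worked out under the assumption (suggested by the units of s<sup>–1</sup>, but not stated in the abstract) that <i>k</i><sub>obs</sub> is a pseudo-first-order rate constant:

```latex
t_{1/2} = \frac{\ln 2}{k_{\mathrm{obs}}}, \qquad
k_{\mathrm{obs}} = 60~\mathrm{s^{-1}} \;\Rightarrow\; t_{1/2} \approx 11.6~\mathrm{ms}, \qquad
k_{\mathrm{obs}} = 150~\mathrm{s^{-1}} \;\Rightarrow\; t_{1/2} \approx 4.6~\mathrm{ms}
```

    Under that assumption, half of the contaminant would be oxidized within roughly 5–12 milliseconds.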

    Influence of Different Nominal Molecular Weight Fractions of Humic Acids on Phenol Oxidation by Permanganate

    No full text
    The effects of humic acid (HA) and its different nominal molecular weight (NMW) fractions on the phenol oxidation by permanganate were studied. Phenol oxidation by permanganate was enhanced by the presence of HA at pH 4−8 but was slightly inhibited at pH 9−10. The effects of HA on phenol oxidation by permanganate were dependent on HA concentration and permanganate/phenol molar ratios. The high NMW fractions of HA enhanced phenol oxidation by permanganate at pH 7 more significantly than the low fractions of HA. The apparent second-order rate constants of phenol oxidation by permanganate in the presence of HA correlated well with their specific ultraviolet absorption (SUVA) at 254 nm and specific violet absorption (SVA) at 465 or 665 nm. High positive correlation coefficients (R<sup>2</sup> > 0.72) implied that π-electrons of HA strongly influenced the reactivity of phenol towards permanganate oxidation, which agreed well with the information provided by fluorescence spectroscopy. The FTIR analysis indicated that the HA fractions rich in aliphatic character, polysaccharide-like substances, and the amount of carboxylate groups had less effect on phenol oxidation by permanganate. The negative correlation between the rate constants of phenol oxidation by permanganate and O/C ratios suggested that the oxidation of phenol increased with a decrease in the content of oxygen-containing functional groups.