56 research outputs found
Developing Quantitative Structure–Activity Relationship (QSAR) Models for Water Contaminants’ Activities/Properties by Fine-Tuning GPT‑3 Models
In this study, we developed quantitative structure–activity relationship (QSAR) models for water contaminants’ activities/properties by fine-tuning GPT-3 models. We also proposed a novel masked atom importance (MAI) approach for model interpretation and an OpenAIEmbedding similarity-based method for determining the applicability domain. We utilized the Simplified Molecular-Input Line-Entry System (SMILES) of contaminants and their corresponding activities/properties from three data sets: pKd, Koc, and Solubility. These were used as input prompts and completions, respectively, to fine-tune four GPT-3 models (Davinci, Curie, Babbage, and Ada) obtained from OpenAI. The Babbage model demonstrated superior performance for the pKd data set, while the Davinci model excelled with the Koc and Solubility data sets, even outperforming molecular fingerprint (MF) CatBoost-based QSAR models. The MAI interpretation results were qualitatively consistent with the SHapley Additive exPlanations (SHAP) interpretation but exhibited less sensitivity in quantitative analysis. The OpenAIEmbedding similarity-based applicability domain determination approach showed efficacy comparable to that of the MF-based similarity approach but with added robustness. This study underscores the potential of large language models in developing QSAR models, paving the way for further advancements in QSAR modeling using state-of-the-art language models.
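The prompt/completion setup and the similarity-based applicability domain described in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the separator, stop sequence, rounding, similarity threshold, and the property value are illustrative assumptions, and the embedding vectors are plain lists standing in for real model embeddings.

```python
import json
import math

def to_finetune_record(smiles, value, sep=" ->", stop="\n"):
    """Format one (SMILES, property) pair as a prompt/completion record.

    The separator, stop sequence, and 3-decimal rounding are assumptions
    made for illustration; the abstract does not specify them.
    """
    return {"prompt": f"{smiles}{sep}", "completion": f" {value:.3f}{stop}"}

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def in_applicability_domain(query_vec, train_vecs, threshold=0.8):
    """Treat a query as in-domain if its most similar training vector
    reaches the (hypothetical) similarity threshold."""
    return max(cosine_similarity(query_vec, t) for t in train_vecs) >= threshold

# Placeholder value, not a measured property.
record = to_finetune_record("c1ccccc1O", 1.46)
jsonl_line = json.dumps(record)  # one line of a JSONL training file
```

In practice the vectors would come from an embedding model, and the threshold would be tuned on held-out data rather than fixed in advance.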
Count-Based Morgan Fingerprint: A More Efficient and Interpretable Molecular Representation in Developing Machine Learning-Based Predictive Regression Models for Water Contaminants’ Activities and Properties
In this study, we introduce the count-based Morgan fingerprint (C-MF) to represent chemical structures of contaminants and develop machine learning (ML)-based predictive models for their activities and properties. Compared with the binary Morgan fingerprint (B-MF), C-MF not only qualifies the presence or absence of an atom group but also quantifies its counts in a molecule. We employ six different ML algorithms (ridge regression, SVM, KNN, RF, XGBoost, and CatBoost) to develop models on 10 contaminant-related data sets based on C-MF and B-MF to compare them in terms of the model’s predictive performance, interpretation, and applicability domain (AD). Our results show that C-MF outperforms B-MF in nine of 10 data sets in terms of model predictive performance. The advantage of C-MF over B-MF is dependent on the ML algorithm, and the performance enhancements are proportional to the difference in the chemical diversity of data sets calculated by B-MF and C-MF. Model interpretation results show that the C-MF-based model can elucidate the effect of atom group counts on the target and has a wider range of SHAP values. AD analysis shows that C-MF-based models have an AD similar to that of B-MF-based ones. Finally, we developed a “ContaminaNET” platform to deploy these C-MF-based models for free use.
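The difference between the binary and count-based representations can be shown with a toy sketch. It does not implement the Morgan algorithm itself: the integer atom-group identifiers below are hypothetical stand-ins for the substructure identifiers a cheminformatics toolkit would produce, and the 8-position vector length is chosen only to keep the example readable.

```python
from collections import Counter

def fold(identifier, n_bits):
    """Map an atom-group identifier onto a fixed-length vector position."""
    return identifier % n_bits

def binary_fingerprint(atom_group_ids, n_bits=8):
    """Record only whether each atom group is present (1) or absent (0)."""
    vec = [0] * n_bits
    for group_id in atom_group_ids:
        vec[fold(group_id, n_bits)] = 1
    return vec

def count_fingerprint(atom_group_ids, n_bits=8):
    """Record how many times each atom group occurs in the molecule."""
    vec = [0] * n_bits
    for group_id, n in Counter(atom_group_ids).items():
        vec[fold(group_id, n_bits)] += n
    return vec

# Hypothetical identifiers: group 3 occurs three times in this molecule.
ids = [3, 3, 3, 5, 10]
b_mf = binary_fingerprint(ids)  # [0, 0, 1, 1, 0, 1, 0, 0]
c_mf = count_fingerprint(ids)   # [0, 0, 1, 3, 0, 1, 0, 0]
```

The two vectors differ only at position 3, where the count-based form keeps the information that the group appears three times.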
Common and distinct quantitative characteristics of Chinese and Western music in terms of modes, scales, degrees and melody variations
Music is in general mutually understood among different cultures worldwide but can sometimes vary by different styles. The Western music system (heptatonic system) and the Chinese music system (pentatonic system) have common and different characteristics in terms of modes, scales, degrees, and melody variations. In this paper, we show these characteristics through quantitative analysis and comparison and have identified two common characteristics of Chinese pentatonic and Western heptatonic music with respect to modes, scales, and degrees. Although the structures and profiles of the modes, scales, and degrees of both kinds of music are different, they both show a preference for bright modes and scales, and their preferences for tonal centres and degrees are consistent with the generating orders via their primal tuning temperaments. Based on the structural characteristics of the musical scales of Chinese pentatonic and Western heptatonic music, we present three interval-dividing metrics to examine the melody variations quantitatively. We find that the melody variations of Chinese pentatonic and Western heptatonic music measured by the interval-dividing metrics both follow the power law, a physical-mathematical law existing in many natural and engineering systems. This study fills the gap in the quantitative analysis of Chinese pentatonic music and provides an approach to studying the common characteristics of different types of music.
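One common way to check whether a measured quantity follows a power law is to fit a straight line in log-log space. The sketch below shows that idea on synthetic data; it is an illustration only, since the abstract does not say how the fit was performed, and the function name and data are hypothetical.

```python
import math

def fit_power_law(xs, ys):
    """Fit y = c * x**(-alpha) by least squares on (log x, log y).

    Returns (alpha, c). Requires strictly positive xs and ys.
    """
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(lx, ly))
             / sum((a - mx) ** 2 for a in lx))
    intercept = my - slope * mx
    return -slope, math.exp(intercept)

# Synthetic data generated from y = 100 * x**-2 (not measurements).
xs = [1, 2, 4, 8]
ys = [100 * x ** -2 for x in xs]
alpha, c = fit_power_law(xs, ys)
```

A log-log fit is simple but can be biased on noisy empirical data, so it serves better as a first check than as a final estimate.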
Role of Ferrate(IV) and Ferrate(V) in Activating Ferrate(VI) by Calcium Sulfite for Enhanced Oxidation of Organic Contaminants
Although the Fe(VI)–sulfite process has shown great potential for the rapid removal of organic contaminants, the major active oxidants (Fe(IV)/Fe(V) versus SO<sub>4</sub><sup>•–</sup>/<sup>•</sup>OH) involved in this process are still under debate. By employing sparingly soluble CaSO<sub>3</sub> as a slow-releasing source of SO<sub>3</sub><sup>2–</sup>, this study evaluated the oxidation performance of the Fe(VI)–CaSO<sub>3</sub> process and identified the active oxidants involved in this process. The process exhibited efficient oxidation of a variety of compounds, including antibiotics, pharmaceuticals, and pesticides, at rates that were 6.1–173.7-fold faster than those measured for Fe(VI) alone, depending on pH, CaSO<sub>3</sub> dosage, and the properties of organic contaminants. Many lines of evidence verified that neither SO<sub>4</sub><sup>•–</sup> nor <sup>•</sup>OH was the active species in the Fe(VI)–CaSO<sub>3</sub> process. The accelerating effect of CaSO<sub>3</sub> was ascribed to the direct generation of Fe(IV)/Fe(V) species from the reaction of Fe(VI) with soluble SO<sub>3</sub><sup>2–</sup> via one-electron steps as well as the indirect generation of Fe(IV)/Fe(V) species from the self-decay of Fe(VI) and Fe(VI) reaction with H<sub>2</sub>O<sub>2</sub>, which could be catalyzed by uncomplexed Fe(III). Besides, the Fe(VI)–CaSO<sub>3</sub> process exhibited satisfactory removal of organic contaminants in real water, and inorganic anions showed negligible effects on organic contaminant decomposition in this process. Thus, the Fe(VI)–CaSO<sub>3</sub> process with Fe(IV)/Fe(V) as reactive oxidants may be a promising method for abating various micropollutants in water treatment.
Premagnetization for Enhancing the Reactivity of Multiple Zerovalent Iron Samples toward Various Contaminants
Premagnetization was applied to enhance the removal of various oxidative contaminants (including amaranth (AR27), lead ion (Pb<sup>2+</sup>), cupric ion (Cu<sup>2+</sup>), selenite (Se<sup>4+</sup>), silver ion (Ag<sup>+</sup>), and chromate (Cr<sup>6+</sup>)) by zerovalent iron (ZVI) from different origins under well-controlled experimental conditions. The rate constants of contaminants by premagnetized ZVI (Mag-ZVI) samples were 1.2–12.2-fold greater than those by pristine ZVI (Pri-ZVI) samples. Generally, there was a linear correlation between the specific reaction rate constants (<i>k</i><sub>SA</sub>) of one particular contaminant removal by various Pri-ZVI or Mag-ZVI samples and those of the other contaminant, which could be successfully employed to predict the <i>k</i><sub>SA</sub> of one contaminant by one ZVI sample if <i>k</i><sub>SA</sub> of the other contaminant by this ZVI sample was available. The specific rate constant of Fe(II) release at pH 4.0 was proposed in this study to stand for the intrinsic reactivity of a ZVI sample. All Mag-ZVI samples had higher intrinsic reactivity than their counterparts without premagnetization. There were strong correlations between the intrinsic reactivity of various Pri-ZVI/Mag-ZVI samples and the removal rate constants of a specific contaminant by these ZVI samples not only at pH 4.0 when the intrinsic reactivity was determined but also at other pH levels. This correlation could be employed to predict the removal rate constant of this contaminant by a ZVI sample that was not included in the original data set once the intrinsic reactivity of the ZVI sample was known.
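The prediction step described in this abstract amounts to fitting a line through known pairs of rate constants and reading off a value for a new sample. The sketch below shows that calculation with made-up numbers; the variable names and values are placeholders, not data from the study.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def predict(slope, intercept, x_new):
    """Predict y for a new x from the fitted line."""
    return slope * x_new + intercept

# Made-up k_SA values for two contaminants across four ZVI samples.
k_sa_a = [0.1, 0.2, 0.4, 0.8]
k_sa_b = [0.25, 0.45, 0.85, 1.65]  # constructed as 2 * k_sa_a + 0.05

slope, intercept = fit_line(k_sa_a, k_sa_b)

# A fifth sample where only contaminant A's k_SA is known.
k_b_new = predict(slope, intercept, 0.5)
```

The same fit applies when the predictor is the intrinsic reactivity (the specific rate constant of Fe(II) release) instead of another contaminant's rate constant.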
Activation of Manganese Oxidants with Bisulfite for Enhanced Oxidation of Organic Contaminants: The Involvement of Mn(III)
MnO<sub>4</sub><sup>–</sup> was activated by HSO<sub>3</sub><sup>–</sup>, resulting in a process that oxidizes organic contaminants at extraordinarily high rates. The permanganate/bisulfite (PM/BS) process oxidized phenol, ciprofloxacin, and methyl blue at pH<sub>ini</sub> 5.0 with rates (<i>k</i><sub>obs</sub> ≈ 60–150 s<sup>–1</sup>) that were 5–6 orders of magnitude faster than those measured for permanganate alone, and ∼5 to 7 orders of magnitude faster than conventional advanced oxidation processes for water treatment. Oxidation of phenol was fastest at pH 4.0, but still effective at pH 7.0, and only slightly slower when performed in tap water. A smaller, but still considerable (∼3 orders of magnitude) increase in oxidation rates of methyl blue was observed with MnO<sub>2</sub> activated by HSO<sub>3</sub><sup>–</sup> (MO/BS). The above results, time-resolved spectroscopy of manganese species under various conditions, stoichiometric analysis of pH changes, and the effect of pyrophosphate on UV absorbance spectra suggest that the reactive intermediate(s) responsible for the extremely rapid oxidation of organic contaminants in the PM/BS process involve manganese(III) species with minimal stabilization by complexation. The PM/BS process may lead to a new category of advanced oxidation technologies based on contaminant oxidation by reactive manganese(III) species, rather than hydroxyl and sulfate radicals.
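To give a sense of scale for the reported rates, the worked equation below converts the stated range of <i>k</i><sub>obs</sub> into half-lives. It assumes pseudo-first-order decay, which the s<sup>–1</sup> units suggest but the abstract does not state explicitly.

```latex
t_{1/2} = \frac{\ln 2}{k_{\mathrm{obs}}}
\qquad\Rightarrow\qquad
\frac{0.693}{60\ \mathrm{s^{-1}}} \approx 12\ \mathrm{ms},
\qquad
\frac{0.693}{150\ \mathrm{s^{-1}}} \approx 4.6\ \mathrm{ms}
```

Under that assumption, half of the contaminant would be oxidized within roughly 5 to 12 milliseconds.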
Influence of Different Nominal Molecular Weight Fractions of Humic Acids on Phenol Oxidation by Permanganate
The effects of humic acid (HA) and its different nominal molecular weight (NMW) fractions on the phenol oxidation by permanganate were studied. Phenol oxidation by permanganate was enhanced by the presence of HA at pH 4−8, while slightly inhibited at pH 9−10. The effects of HA on phenol oxidation by permanganate were dependent on HA concentration and permanganate/phenol molar ratios. The high NMW fractions of HA enhanced phenol oxidation by permanganate at pH 7 more significantly than the low fractions of HA. The apparent second-order rate constants of phenol oxidation by permanganate in the presence of HA correlated well with their specific ultraviolet absorption (SUVA) at 254 nm and specific violet absorption (SVA) at 465 or 665 nm. High positive correlation coefficients (R<sup>2</sup> > 0.72) implied that π-electrons of HA strongly influenced the reactivity of phenol towards permanganate oxidation which agreed well with the information provided by fluorescence spectroscopy. The FTIR analysis indicated that the HA fractions rich in aliphatic character, polysaccharide-like substances, and the amount of carboxylate groups had less effect on phenol oxidation by permanganate. The negative correlation between the rate constants of phenol oxidation by permanganate and O/C ratios suggested that the oxidation of phenol increased with a decrease in the content of oxygen-containing functional groups.
- …
