We implement an equivariant transformer that embeds molecular net charge and
spin state without additional neural network parameters. The model trained on a
singlet/triplet non-correlated \ce{CH2} dataset can identify different spin
states and shows state-of-the-art extrapolation capability. We found that the
Softmax activation function utilised in the self-attention mechanism of graph
networks outperformed ReLU-like functions in prediction accuracy. Additionally,
increasing the attention temperature from $\tau = \sqrt{d}$ to $\sqrt{2d}$
further improved the extrapolation capability. We also proposed a weight
initialisation method that noticeably accelerated the training process.