In an era where symbolic mathematical equations are indispensable for
modeling complex natural phenomena, scientific inquiry often involves
collecting observations and translating them into mathematical expressions.
Recently, deep learning has emerged as a powerful tool for extracting insights
from data. However, existing models typically specialize in either numeric or
symbolic domains, and are usually trained in a supervised manner tailored to
specific tasks. This approach neglects the substantial benefits that could
arise from a task-agnostic unified understanding between symbolic equations and
their numeric counterparts. To bridge this gap, we introduce SNIP, a
Symbolic-Numeric Integrated Pre-training framework that employs joint
contrastive learning between symbolic and numeric domains, enhancing their
mutual
similarities in the pre-trained embeddings. By performing latent space
analysis, we observe that SNIP provides cross-domain insights into the
representations, revealing that symbolic supervision enhances the embeddings of
numeric data and vice versa. We evaluate SNIP across diverse tasks, including
symbolic-to-numeric mathematical property prediction and numeric-to-symbolic
equation discovery, commonly known as symbolic regression. Results show that
SNIP effectively transfers to various tasks, consistently outperforming fully
supervised baselines and competing strongly with established task-specific
methods, especially in few-shot learning scenarios where available data is
limited.
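The joint contrastive objective described above can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the function name `info_nce_loss`, the batch construction, and the temperature value are illustrative assumptions. It shows the CLIP-style symmetric InfoNCE idea — embeddings of a symbolic expression and of its paired numeric data are pulled together, while mismatched pairs within the batch are pushed apart.

```python
import numpy as np

def info_nce_loss(sym_emb, num_emb, temperature=0.1):
    """Symmetric contrastive (InfoNCE) loss between paired embeddings.

    sym_emb, num_emb: (batch, dim) arrays, where row i of each array is
    assumed to come from the same underlying equation (a positive pair);
    all other rows in the batch act as negatives.
    NOTE: illustrative sketch, not the SNIP authors' implementation.
    """
    # L2-normalize so the dot product below is cosine similarity.
    sym = sym_emb / np.linalg.norm(sym_emb, axis=1, keepdims=True)
    num = num_emb / np.linalg.norm(num_emb, axis=1, keepdims=True)

    # (batch, batch) similarity matrix; matching pairs sit on the diagonal.
    logits = sym @ num.T / temperature
    labels = np.arange(logits.shape[0])

    def cross_entropy(lg, lb):
        lg = lg - lg.max(axis=1, keepdims=True)  # numerical stability
        log_probs = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(len(lb)), lb].mean()

    # Symmetric: symbolic-to-numeric and numeric-to-symbolic directions.
    return 0.5 * (cross_entropy(logits, labels)
                  + cross_entropy(logits.T, labels))

# Sanity check on random data: perfectly aligned pairs should score a
# lower loss than deliberately mismatched pairs.
rng = np.random.default_rng(0)
a = rng.normal(size=(8, 16))
aligned = info_nce_loss(a, a)        # positives identical -> low loss
mismatched = info_nce_loss(a, a[::-1])  # pairs shuffled -> higher loss
```

In a full pre-training setup, `sym_emb` and `num_emb` would come from separate symbolic and numeric encoders trained jointly under this loss, which is what produces the cross-domain embedding alignment the abstract refers to.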