Symbolic regression (SR) is the task of discovering hidden relationships in data in the form of mathematical expressions, and it is widely regarded as an effective route to interpretable machine learning (ML). Genetic programming (GP) has long been the dominant approach to solving SR problems. However, as the scale of SR problems increases, GP often performs poorly and cannot effectively address real-world high-dimensional problems. This limitation stems mainly from the stochastic evolutionary manner in which traditional GP constructs trees. In this paper, we propose DGP, a differentiable approach that, for the first time, constructs GP trees for high-dimensional SR.
Specifically, a new data structure called the differentiable symbolic tree is proposed to relax the discrete tree structure into a continuous one, so that a gradient-based optimizer can be applied for efficient optimization. In addition, a sampling method is proposed to eliminate the discrepancy introduced by this relaxation and to guarantee valid symbolic expressions.
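To make the idea concrete, the sketch below illustrates one way such a relaxation and sampling step could look. It is a minimal, hypothetical PyTorch example, not the paper's implementation: the operator set, the softmax relaxation over learnable logits, and the categorical sampling are all assumptions made for illustration.

```python
import torch

# Hypothetical candidate operators at one internal node of a
# differentiable symbolic tree (an assumption, not taken from the paper).
OPS = [torch.add, torch.mul, lambda a, b: torch.sin(a)]  # 'sin' ignores b

class RelaxedNode(torch.nn.Module):
    """One tree node whose discrete operator choice is relaxed
    into a softmax mixture over learnable logits."""
    def __init__(self, n_ops: int):
        super().__init__()
        self.logits = torch.nn.Parameter(torch.zeros(n_ops))

    def forward(self, a, b):
        # Continuous relaxation: the node outputs the softmax-weighted
        # sum of all candidate operators, so the loss is differentiable
        # with respect to the logits.
        w = torch.softmax(self.logits, dim=0)
        return sum(wi * op(a, b) for wi, op in zip(w, OPS))

    def sample_op(self):
        # Discretization: sample one operator from the learned
        # distribution to recover a valid symbolic expression.
        idx = torch.distributions.Categorical(logits=self.logits).sample()
        return OPS[idx.item()]

# Usage sketch: recover y = x1 * x2 at a single node by gradient descent.
node = RelaxedNode(len(OPS))
opt = torch.optim.Adam(node.parameters(), lr=0.1)
x1, x2 = torch.randn(256), torch.randn(256)
y = x1 * x2
for _ in range(200):
    loss = ((node(x1, x2) - y) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
print(node.sample_op() is torch.mul)  # likely True after training
```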
Furthermore, a diversification mechanism is introduced to help the optimizer escape from local optima and reach globally better solutions.
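The abstract does not specify how the diversification mechanism works; the sketch below shows one generic way to realize the idea, purely as an assumed illustration: restart the gradient-based fit from randomly perturbed logits and keep the best node found. The helpers `make_node` and `fit` are hypothetical.

```python
import torch

def diversified_search(make_node, fit, n_restarts=5, noise=1.0):
    """Assumed sketch of a diversification loop: restart the
    gradient-based fit from perturbed logits so the search can
    escape local optima, keeping the best node found so far."""
    best_node, best_loss = None, float("inf")
    for _ in range(n_restarts):
        node = make_node()  # e.g. lambda: RelaxedNode(len(OPS))
        with torch.no_grad():
            # Perturb the relaxed parameters to diversify the start point.
            node.logits.add_(noise * torch.randn_like(node.logits))
        loss = fit(node)  # run gradient descent, return the final loss
        if loss < best_loss:
            best_node, best_loss = node, loss
    return best_node
```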
With these designs, the proposed DGP method can efficiently search for higher-performing GP trees and is thus capable of handling high-dimensional SR.
To demonstrate the effectiveness of DGP, we conducted extensive experiments against state-of-the-art methods based on both GP and deep neural networks. The results show that DGP outperforms these peer competitors on high-dimensional regression benchmarks whose dimensionality ranges from tens to thousands. Moreover, on synthetic SR problems, DGP also achieves the best recovery rate under different noise levels. We believe this work can help establish SR as a powerful approach to interpretable ML for a broader range of real-world problems.