High-order parametric models that include terms for feature interactions are
applied to various data mining tasks, where ground truth depends on
interactions of features. However, with sparse data, the high-dimensional
parameters for feature interactions often face three issues: expensive
computation, difficulty in parameter estimation and lack of structure. Previous
work has proposed approaches that partially resolve these three issues. In
particular, models with factorized parameters (e.g. Factorization Machines) and
sparse learning algorithms (e.g. FTRL-Proximal) can tackle the first two issues
but fail to address the third. To impose structure on the parameters,
other methods apply constraints or complicated regularization terms that
enforce hierarchical structure. However, these methods make the
optimization problem more challenging. In this work, we propose Strongly
Hierarchical Factorization Machines and ANOVA kernel regression, which address
all three issues without making the optimization problem more
difficult. Experimental results show that the proposed models significantly
outperform the state-of-the-art in two data mining tasks: cold-start user
response time prediction and stock volatility prediction.

Comment: 9 pages, to appear in SDM'1