We investigate a growing body of work that seeks to improve recommender
systems through the use of review text. Generally, these papers argue that
since reviews 'explain' users' opinions, they ought to be useful for inferring the
underlying dimensions that predict ratings or purchases. Schemes to incorporate
reviews range from simple regularizers to neural network approaches. Our
initial findings reveal several discrepancies in reported results, arising in
part from results being copied across papers despite changes in experimental
settings or data pre-processing. We first attempt a comprehensive analysis to
resolve these ambiguities. Further investigation raises a much broader
question about the "importance" of user reviews for recommendation.
Through a wide range of experiments, we observe several cases where
state-of-the-art methods fail to outperform existing baselines, especially as
we deviate from a few narrowly-defined settings where reviews are useful. We
conclude by providing hypotheses for our observations that seek to
characterize the conditions under which reviews are likely to be helpful. Through
this work, we aim to evaluate the direction in which the field is progressing
and encourage robust empirical evaluation.Comment: 4 pages, 3 figures. Accepted for publication at SIGIR '2