In this paper, we address the problem of using sentiment analysis tools
'off-the-shelf,' that is, when a gold standard is not available for retraining.
We evaluate the performance of four SE-specific tools in a cross-platform
setting, i.e., on a test set collected from data sources different from those
used for training. We find that (i) the lexicon-based tools outperform the
supervised approaches retrained in a cross-platform setting, and (ii) retraining
can be beneficial in within-platform settings when robust gold standard
datasets are available, even with a minimal training set. Based on our empirical
findings, we derive guidelines for reliable use of sentiment analysis tools in
software engineering.