Neural network models have shown great success at natural language inference
(NLI), the task of determining whether a premise entails a hypothesis. However,
recent studies suggest that these models may rely on fallible heuristics rather
than deep language understanding. We introduce a challenge set to test whether
NLI systems adopt one such heuristic: assuming that a sentence entails all of
its subsequences, such as assuming that "Alice believes Mary is lying" entails
"Alice believes Mary." We evaluate several competitive NLI models on this
challenge set and find strong evidence that they do rely on the subsequence
heuristic.Comment: Accepted as an abstract for SCiL 2019; added acknowledgment