Although deep language representations have become the dominant form of
language featurization in recent years, in many settings it is important to
understand a model's decision-making process. This necessitates not only an
interpretable model but also interpretable features. In particular, language
must be featurized in a way that is interpretable while still characterizing
the original text well. We present SenteCon, a method for introducing human
interpretability into deep language representations. Given a passage of text,
SenteCon encodes the text as a layer of interpretable categories in which each
dimension corresponds to the relevance of a specific category. Our empirical
evaluations indicate that encoding language with SenteCon provides high-level
interpretability at little to no cost to predictive performance on downstream
tasks. Moreover, we find that SenteCon outperforms existing interpretable
language representations in terms of both downstream performance and
agreement with human characterizations of the text.

Comment: Accepted to Findings of ACL 2023
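
To make the idea of a category-relevance encoding concrete, here is a minimal, hypothetical sketch of how a passage might be mapped to one score per interpretable category. The category names, seed words, embedding model, and centroid-similarity scoring below are illustrative assumptions for exposition; they are not the paper's actual SenteCon procedure, which is specified in the full text.

```python
# Illustrative sketch only: encode a passage as a vector of category
# relevance scores, one dimension per interpretable category.
import numpy as np
from sentence_transformers import SentenceTransformer

# Hypothetical interpretable categories, each defined by a few seed words
# (placeholders; SenteCon itself builds on established lexicon categories).
CATEGORIES = {
    "positive_emotion": ["happy", "joy", "love"],
    "negative_emotion": ["sad", "angry", "fear"],
    "social": ["friend", "family", "talk"],
}

model = SentenceTransformer("all-MiniLM-L6-v2")

def encode_categories(passage: str) -> np.ndarray:
    """Return one relevance score per category for the passage."""
    passage_emb = model.encode([passage])[0]
    scores = []
    for seeds in CATEGORIES.values():
        # Represent each category by the centroid of its seed-word embeddings.
        centroid = model.encode(seeds).mean(axis=0)
        # Cosine similarity between the passage and the category centroid.
        sim = np.dot(passage_emb, centroid) / (
            np.linalg.norm(passage_emb) * np.linalg.norm(centroid)
        )
        scores.append(sim)
    return np.array(scores)

print(dict(zip(CATEGORIES, encode_categories("I had a great time with my friends."))))
```

Each output dimension is directly nameable (e.g., "social"), which is what makes such a representation human-interpretable while still being derived from a deep language model.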