Automatically disentangling an author's style from the content of their
writing is a longstanding and possibly insurmountable problem in computational
linguistics. At the same time, the availability of large text corpora furnished
with author labels has recently enabled learning authorship representations in
a purely data-driven manner for authorship attribution, a task that ostensibly
depends to a greater extent on encoding writing style than on encoding content.
However, success on this surrogate task does not ensure that such
representations capture writing style, since authorship could also be correlated
with other latent variables, such as topic. In an effort to better understand
the nature of the information these representations convey, and specifically to
validate the hypothesis that they chiefly encode writing style, we
systematically probe these representations through a series of targeted
experiments. The results of these experiments suggest that representations
learned for the surrogate authorship prediction task are indeed sensitive to
writing style. As a consequence, authorship representations may be expected to
be robust to certain kinds of data shift, such as topic drift over time.
Additionally, our findings may open the door to downstream applications that
require stylistic representations, such as style transfer.