We apply modern statistical NLP techniques to study language transfer, a major issue in the theory of Second Language Acquisition (SLA). Using an SVM for native language classification, we show that a careful analysis of the effects of various features can lead to substantial scientific insights. In particular, we demonstrate that character bigrams alone yield a classification accuracy of about 66% on a 5-class task, even when content and function word differences are accounted for. We hypothesize that the phonology of a native language has a strong effect on the word choice of people writing in a second language.
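To make the character bigram feature concrete, the following is a minimal sketch of how such features might be extracted from a learner's text. The function name `char_bigram_features`, the lowercasing and whitespace normalization, and the use of relative frequencies are illustrative assumptions; the abstract does not specify these details.

```python
from collections import Counter


def char_bigram_features(text: str) -> dict:
    """Return relative frequencies of character bigrams in `text`.

    Assumption: text is lowercased and runs of whitespace are collapsed
    to a single space; the original study may have normalized differently.
    """
    normalized = " ".join(text.lower().split())
    # Slide a window of width 2 over the normalized string.
    bigrams = [normalized[i:i + 2] for i in range(len(normalized) - 1)]
    counts = Counter(bigrams)
    total = sum(counts.values())
    if total == 0:
        # Texts shorter than two characters yield no bigrams.
        return {}
    return {bigram: n / total for bigram, n in counts.items()}


# Hypothetical usage on a short sample string.
features = char_bigram_features("the cat")
```

Feature dictionaries of this kind, one per essay, could then be vectorized and passed to an SVM classifier with the writer's native language as the class label.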