Motivated by the amount of code that goes unidentified on the web, we
introduce a practical method for algorithmically identifying the programming
language of source code. Our work is based on supervised learning and
intelligent statistical features. We also explored, but abandoned, a
grammatical approach. In testing, our implementation greatly outperforms that
of an existing tool that relies on a Bayesian classifier. Code is written in
Python and available under an MIT license.Comment: 11 pages. Code:
https://github.com/simon-weber/Programming-Language-Identificatio