Source code segment authorship identification is the task of identifying the
author of a source code segment through supervised learning. It is important
in plagiarism detection, digital forensics, and several other law enforcement
issues. However, when a source code segment is written by multiple authors,
typical author identification methods no longer work. Here, we propose an
author identification technique that uses a stacking ensemble classifier to
predict the authorship of source code segments, even in the case of multiple
authors. The proposed technique is built upon several deep neural networks,
random forests, and support vector machine classifiers. We show that a single
classification technique is no longer sufficient for identifying the author
group, and that a deep neural network-based
stacking ensemble method can enhance the accuracy significantly. We compare
the performance of the proposed technique with existing methods that deal only
with source code segments written by a single author. Despite the harder task
of identifying the authors of segments written by multiple authors, the
proposed technique achieves promising identification accuracy relative to
these single-author methods.

Comment: 2019 22nd International Conference on Computer and Information
Technology (ICCIT
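
As a rough illustration of the stacking idea summarized in the abstract, the sketch below combines a neural network, a random forest, and a support vector machine as base learners and feeds their outputs to a small neural network meta-learner. It is not the authors' implementation: the use of scikit-learn, the layer sizes, the synthetic feature vectors, and the function names `build_stacking_model` and `evaluate_on_synthetic_data` are all assumptions made for illustration, and each class label merely stands in for an author group.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_stacking_model(random_state=0):
    """Build a stacking ensemble (hypothetical configuration, not the paper's)."""
    base_learners = [
        # Small feed-forward network; inputs are standardized first.
        ("mlp", make_pipeline(
            StandardScaler(),
            MLPClassifier(hidden_layer_sizes=(64,), max_iter=500,
                          random_state=random_state))),
        ("rf", RandomForestClassifier(n_estimators=100,
                                      random_state=random_state)),
        ("svm", make_pipeline(StandardScaler(), SVC())),
    ]
    # The meta-learner is trained on cross-validated base-learner outputs.
    meta_learner = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500,
                                 random_state=random_state)
    return StackingClassifier(estimators=base_learners,
                              final_estimator=meta_learner, cv=3)


def evaluate_on_synthetic_data(random_state=0):
    """Fit the ensemble on synthetic data; each class stands in for an author group."""
    # Synthetic features replace real source-code features (an assumption).
    X, y = make_classification(n_samples=300, n_features=20, n_informative=10,
                               n_classes=3, random_state=random_state)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=random_state)
    model = build_stacking_model(random_state)
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    return accuracy_score(y_test, predictions), predictions, y_test


if __name__ == "__main__":
    accuracy, _, _ = evaluate_on_synthetic_data()
    print(f"Held-out accuracy on synthetic data: {accuracy:.3f}")
```

In a real setting the synthetic matrix would be replaced by features extracted from source code segments, and the printed accuracy says nothing about the results reported in the paper.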