
    Identification of the Source Code Author Using Machine Learning Methods

    The article analyzes the problem of identifying the author of source code, which is of interest to researchers in information security, computer forensics, assessment of the quality of the educational process, and intellectual property protection. A detailed analysis of current solutions to the problem is presented. Two new identification techniques based on machine learning algorithms are proposed: one based on a support vector machine, a fast correlation filter, and informative features; the other based on a hybrid convolutional-recurrent neural network. The experiments were conducted on a collection of source code written in the most popular programming languages. The experimental collection includes source code samples written in Java, C++, Python, PHP, JavaScript, C, C#, and Ruby. The data were obtained from GitHub, a web service for hosting IT projects. The total number of source code samples exceeds 150 thousand, with an average length of 850 characters each. The corpus comprises 542 authors. The accuracy of the developed techniques for different numbers of authors was assessed using 10-fold cross-validation. For Java, the most popular programming language, an additional series of experiments was conducted with the number of authors ranging from 2 to 50, and plots of identification accuracy against corpus size are provided. Analysis of the results showed that the technique based on the hybrid neural network can reach 97% accuracy, which is the best result to date. The technique based on the support vector machine achieved 96% accuracy. On average, the hybrid neural network was 5% more accurate than the support vector machine.

    Identification of the Source Code Author Using Machine Learning Methods

    The paper is devoted to the problem of determining the author of source code, which is of interest to researchers in information security, computer forensics, assessment of the quality of the educational process, and protection of intellectual property. The paper presents a detailed analysis of modern solutions to the problem. The authors propose two new identification techniques based on machine learning algorithms: one based on a support vector machine, a fast correlation filter, and informative features; the other based on a hybrid convolutional recurrent neural network. The experimental database includes samples of source code written in Java, C++, Python, PHP, JavaScript, C, C#, and Ruby. The data were obtained from GitHub, a web service for hosting IT projects. The total number of source code samples exceeds 150 thousand. The average length of each is 850 characters. The corpus comprises 542 authors. The experiments were conducted with source code written in the most popular programming languages. The accuracy of the developed techniques for different numbers of authors was assessed using 10-fold cross-validation. An additional series of experiments was conducted with the number of authors ranging from 2 to 50 for Java, the most popular programming language. The relationship between identification accuracy and corpus size is plotted. Analysis of the results showed that the method based on the hybrid neural network gives 97% accuracy, which is currently the best known result. The technique based on the support vector machine achieved 96% accuracy. The difference between the results of the hybrid neural network and the support vector machine was approximately 5%.

    Git Blame Who?: Stylistic Authorship Attribution of Small, Incomplete Source Code Fragments

    Program authorship attribution has implications for the privacy of programmers who wish to contribute code anonymously. While previous work has shown that individually authored complete files can be attributed, these efforts have focused on such ideal data sets as contest submissions and student assignments. We explore the problem of authorship attribution "in the wild," examining source code obtained from open-source version control systems, and investigate how contributions can be attributed to their authors, either on an individual or a per-account basis. In this work, we present a study of attribution of code collected from collaborative environments and identify factors which make attribution of code fragments more or less successful. For individual contributions, we show that previous methods (adapted to be applied to short code fragments) yield an accuracy of approximately 50% or 60%, depending on whether we average by sample or by author, at identifying the correct author out of a set of 104 programmers. By ensembling the classification probabilities of a sufficiently large set of samples belonging to the same author, we achieve much higher accuracy for assigning the set of samples to the correct author from a known suspect set. Additionally, we propose the use of calibration curves to identify which samples are by unknown and previously unencountered authors.
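
    The ensembling step mentioned in this abstract can be illustrated with a minimal sketch. It assumes the combination rule is a plain average of per-fragment class probabilities, which the abstract does not specify; the function name `ensemble_author`, the author names, and the probability values are all hypothetical.

```python
from collections import defaultdict


def ensemble_author(sample_probs):
    """Average per-sample class probabilities and return the top author.

    sample_probs: non-empty list of dicts mapping author -> probability,
    one dict per code fragment believed to come from the same author.
    Plain averaging is an assumption; the abstract does not state the rule.
    """
    if not sample_probs:
        raise ValueError("need at least one sample")
    totals = defaultdict(float)
    for probs in sample_probs:
        for author, p in probs.items():
            totals[author] += p
    n = len(sample_probs)
    averaged = {author: total / n for author, total in totals.items()}
    best = max(averaged, key=averaged.get)
    return best, averaged


# Hypothetical classifier outputs for three fragments from one account.
fragments = [
    {"alice": 0.40, "bob": 0.35, "carol": 0.25},
    {"alice": 0.30, "bob": 0.45, "carol": 0.25},
    {"alice": 0.55, "bob": 0.20, "carol": 0.25},
]
best, averaged = ensemble_author(fragments)
print(best, round(averaged[best], 3))
```

    In this made-up input the second fragment taken alone would point to a different author, while the averaged distribution does not, which is the intuition behind pooling many fragments per author.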

    Identifying Authorship Style in Malicious Binaries: Techniques, Challenges & Datasets

    Attributing a piece of malware to its creator typically requires threat intelligence. Binary attribution increases the level of difficulty, as it mostly relies upon the ability to disassemble binaries to identify authorship style. Our survey explores the style of malicious authors and the adversarial techniques they use to remain anonymous. We examine the adversarial impact on the state-of-the-art methods. We identify key findings and explore the open research challenges. To mitigate the lack of ground truth datasets in this domain, we publish alongside this survey the largest and most diverse meta-information dataset of 15,660 malware samples labeled to 164 threat actor groups.

    Source Code Stylometry and Authorship Attribution for Open Source

    Public software repositories such as GitHub make transparent the development history of an open source software system. Source code commits, discussions about new features and bugs, and code reviews are stored and carefully attributed to the appropriate developers. However, sometimes governments may seek to analyze these repositories to identify citizens who contribute to projects they disapprove of, such as those involving cryptography or social media. While developers who seek anonymity may contribute under assumed identities, their body of public work may be characteristic enough to betray who they really are. The ability to contribute anonymously to public bodies of knowledge is extremely important to the future of technological and intellectual freedoms. Just as in security hacking, the only way to protect vulnerable individuals is by demonstrating the means and strength of available attacks so that those concerned may know of the need and develop the means to protect themselves. In this work, we present a method to de-anonymize source code contributors based on the authors' intrinsic programming style. First, we present a partial replication study wherein we attempt to de-anonymize a large number of entries into the Google Code Jam competition. We base our approach on Caliskan-Islam et al. 2015, but with modifications to the feature set and modelling strategy for scalability and feature-selection robustness. We did not reach the 0.98 F1 achieved in this prior work, but managed a still reasonable 0.71 F1 under identical experimental conditions, and a 0.88 F1 given more data from the same set. Second, we present an exploratory study focused on de-anonymizing programmers who have contributed to a repository, using other commits from the same repository as training data. We train random-forest classifiers using programmer data collected from 37 medium to large open-source repositories. Given a choice between active developers in a project, we were able to correctly determine authorship of a given function about 75% of the time, without the use of identifying meta-data or comments. We were also able to correctly validate a contributor as the author of a questioned function with 80% recall and 65% precision. This exploratory study provides empirical support for our approach. Finally, we present the results of a similar but more difficult study wherein we attempt to de-anonymize a repository in the same manner, but without using the target repository as training data. To do this, we gather as much training data as possible from the repository's contributors through the GitHub API. We evaluate our technique over 3 repositories: Bitcoin, Ethereum (crypto-currencies) and TrinityCore (a game engine). Our results in this experiment starkly contrast with our results in the intra-repository study, showing accuracies of 35% for Bitcoin, 22% for Ethereum, and 21% for TrinityCore, which had candidate set sizes of 6, 5, and 7 respectively. Our results indicate that we can do somewhat better than random guessing, even under difficult experimental conditions, but they also indicate some fundamental issues with the state of the art of Code Stylometry. In this work we present our methodology, results, and some comments on past empirical studies, the difficulties we faced, and likely hurdles for future work in the area.
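
    The "somewhat better than random guessing" claim in this abstract can be checked with a short worked calculation. The accuracies and candidate set sizes below are taken from the abstract; the uniform-guess baseline of 1/k for k candidates is an assumption (it ignores any class imbalance), and the helper name `chance_baseline` is hypothetical.

```python
# Accuracy and candidate set size per repository, as reported in the abstract.
results = {
    "Bitcoin": (0.35, 6),
    "Ethereum": (0.22, 5),
    "TrinityCore": (0.21, 7),
}


def chance_baseline(num_candidates):
    """Expected accuracy of a uniform random guess (an assumed baseline)."""
    return 1.0 / num_candidates


lift = {}
for repo, (accuracy, candidates) in results.items():
    baseline = chance_baseline(candidates)
    # Ratio of reported accuracy to the uniform-guess baseline.
    lift[repo] = accuracy / baseline
    print(f"{repo}: accuracy {accuracy:.2f}, chance {baseline:.3f}, "
          f"lift {lift[repo]:.2f}x")
```

    Under this assumed baseline every repository comes out above chance, with Ethereum only marginally so, which is consistent with the abstract's cautious wording.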

    The Effect of Code Obfuscation on Authorship Attribution of Binary Computer Files

    In many forensic investigations, questions linger regarding the identity of the authors of the software specimen. Research has identified methods for the attribution of binary files that have not been obfuscated, but a significant percentage of malicious software has been obfuscated in an effort to hide both the details of its origin and its true intent. Little research has been done on analyzing obfuscated code for attribution. In part, the reason for this gap in the research is that deobfuscation of an unknown program is a challenging task. Further, the additional transformation of the executable file introduced by the obfuscator modifies or removes features from the original executable that would have been used in the author attribution process. Existing research has demonstrated good success in attributing the authorship of an executable file of unknown provenance using methods based on static analysis of the specimen file. With the addition of file obfuscation, static analysis of files becomes difficult and time-consuming, and in some cases may lead to inaccurate findings. This paper presents a novel process for authorship attribution using dynamic analysis methods. A software-emulated system was fully instrumented to become a test harness for a specimen of unknown provenance, allowing for supervised control, monitoring, and trace data collection during execution. This trace data was used as input into a supervised machine learning algorithm trained to identify stylometric differences in the specimen under test and provide predictions on who wrote the specimen. The specimen files were also analyzed for authorship using static analysis methods to compare prediction accuracies with those obtained from this new, dynamic-analysis-based method. Experiments indicate that this new method can provide better accuracy of author attribution for files of unknown provenance, especially in the case where the specimen file has been obfuscated.