On the Feasibility of Malware Authorship Attribution
There are many occasions in which the security community is interested in
discovering the authorship of malware binaries, either for digital forensics
analysis of malware corpora or for thwarting live threats of malware invasion.
Such a discovery of authorship might be possible due to stylistic features
inherent to software code written by human programmers. Existing studies of
authorship attribution for general-purpose software mainly focus on source code
and typically rely on features of programming style and environment. However,
those features critically depend on the availability of the program source
code, which is usually not the case when dealing with malware binaries. Such
program binaries often do not retain many semantic or stylistic features due to
the compilation process. Therefore, authorship attribution in the domain of
malware binaries based on features and styles that will survive the compilation
process is challenging. This paper surveys the state of the art in this
literature. Further, we analyze the features involved in those techniques. By
using a case study, we identify features that can survive the compilation
process. Finally, we analyze existing works on binary authorship attribution
and study their applicability to real malware binaries. Comment: FPS 201
SHIELD: Thwarting Code Authorship Attribution
Authorship attribution has become increasingly accurate, posing a serious
privacy risk for programmers who wish to remain anonymous. In this paper, we
introduce SHIELD to examine the robustness of different code authorship
attribution approaches against adversarial code examples. We define four
attacks on attribution techniques, which include targeted and non-targeted
attacks, and realize them using adversarial code perturbation. We experiment
with a dataset of 200 programmers from the Google Code Jam competition to
validate our methods targeting six state-of-the-art authorship attribution
methods that adopt a variety of techniques for extracting authorship traits
from source code, including RNN, CNN, and code stylometry. Our experiments
demonstrate the vulnerability of current authorship attribution methods to
adversarial attacks. For the non-targeted attack, the attack success rate
exceeds 98.5%, accompanied by a degradation of the identification confidence
that exceeds 13%. For the targeted attacks, we show
the possibility of impersonating a programmer using targeted adversarial
perturbations with a success rate ranging from 66% to 88% for different
authorship attribution techniques under several adversarial scenarios. Comment: 12 pages, 13 figure
Authorship Attribution Through Words Surrounding Named Entities
In text analysis, authorship attribution occurs in a variety of ways. The field of computational linguistics becomes more important as the need for authorship attribution and text analysis becomes more widespread. For this research, pre-existing authorship attribution software, the Java Graphical Authorship Attribution Program (JGAAP), implements a named entity recognizer, specifically the Stanford Named Entity Recognizer, to probe similar-genre texts and to aid in identifying the correct author. This research specifically examines the words authors use around named entities in order to test the ability of these words at attributing authorshi
- …