2,332 research outputs found
Identifying Authorship Style in Malicious Binaries: Techniques, Challenges & Datasets
Attributing a piece of malware to its creator typically requires threat intelligence. Binary attribution increases the level of difficulty, as it mostly relies upon the ability to disassemble binaries to identify authorship style. Our survey explores malicious author style and the adversarial techniques malware authors use to remain anonymous. We examine the impact of these adversarial techniques on state-of-the-art attribution methods. We identify key findings and explore the open research challenges. To mitigate the lack of ground-truth datasets in this domain, we publish alongside this survey the largest and most diverse meta-information dataset of 15,660 malware samples, labeled with 164 threat actor groups.
Designing for Irrelevance
My job title is ‘designer’ but I’m reluctant to describe myself as a designer for a number of reasons: first, because the practice has a lot to answer for; and second, because I don’t do a whole lot of design. I help groups of people to collaborate and converse their way through problems towards solutions—activating a latent capability for design in people as they think and work differently, together. The sense of agency that accompanies this is intoxicating. This work can produce strategies, systems, and services, as well as spaces, objects, and graphics. The awareness that design can shape both our (intangible) experiences and our (tangible) environments—and that, as a mode of thinking, it can be accessible, inclusive, and participatory—shifts it from a practice to a stance. In this sense, is design a choice that we make to perceive and move through the world in a contextual and intentional way? What does this mean for the practice of design? I respond to these questions by reflecting on my experience of participating in the Indonesia Australia Design Futures project.
Adversarial Attacks on Code Models with Discriminative Graph Patterns
Pre-trained language models of code are now widely used in various software
engineering tasks such as code generation, code completion, vulnerability
detection, etc. Their widespread adoption, in turn, raises security and
reliability concerns. One important threat is adversarial attacks, which can
lead to erroneous predictions and largely affect model performance on
downstream tasks. Current adversarial attacks on code models usually adopt
fixed sets of program transformations, such as variable renaming and dead code
insertion, leading to limited attack effectiveness. To address the
aforementioned challenges, we propose a novel adversarial attack framework,
GraphCodeAttack, to better evaluate the robustness of code models. Given a
target code model, GraphCodeAttack automatically mines important code patterns,
which can influence the model's decisions, to perturb the structure of input
code to the model. To do so, GraphCodeAttack uses a set of input source codes
to probe the model's outputs and identifies the discriminative AST patterns
that can influence the model's decisions. GraphCodeAttack then selects
appropriate AST patterns, concretizes the selected patterns as attacks, and
inserts them as dead code into the model's input program. To effectively
synthesize attacks from AST patterns, GraphCodeAttack uses a separate
pre-trained code model to fill in the ASTs with concrete code snippets. We
evaluate the robustness of two popular code models, CodeBERT and
GraphCodeBERT, against our proposed approach on three tasks: Authorship
Attribution, Vulnerability Prediction, and Clone Detection. The experimental
results suggest that our proposed approach significantly outperforms
state-of-the-art attack approaches, such as CARROT and ALERT, in attacking
code models.
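To make the core perturbation concrete, the following is a minimal sketch of dead-code insertion, the kind of semantics-preserving transformation the abstract describes; the pattern, filler, and helper names are illustrative assumptions rather than the GraphCodeAttack implementation.

    # Minimal sketch: concretize an abstract pattern and splice it into a program
    # as dead code, leaving run-time behaviour unchanged.
    def concretize_pattern(pattern: str, filler: str) -> str:
        """Turn an abstract pattern (here a string template) into concrete code."""
        return pattern.format(filler=filler)

    def insert_dead_code(source: str, snippet: str, at_line: int) -> str:
        """Insert a never-executed snippet at a given line of the program."""
        lines = source.splitlines()
        lines[at_line:at_line] = snippet.splitlines()
        return "\n".join(lines)

    victim = "def add(a, b):\n    return a + b\n"
    # An 'if False:' guard keeps the inserted statement from ever running, so the
    # program behaves identically while its token and AST structure changes.
    dead = concretize_pattern("if False:\n    {filler}", filler="print('unused')")
    print(insert_dead_code(victim, dead, at_line=2))

A real attack would search over mined patterns and insertion points, querying the target model to pick the variant that flips its prediction.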
SHIELD: Thwarting Code Authorship Attribution
Authorship attribution has become increasingly accurate, posing a serious
privacy risk for programmers who wish to remain anonymous. In this paper, we
introduce SHIELD to examine the robustness of different code authorship
attribution approaches against adversarial code examples. We define four
attacks on attribution techniques, which include targeted and non-targeted
attacks, and realize them using adversarial code perturbation. We experiment
with a dataset of 200 programmers from the Google Code Jam competition to
validate our methods targeting six state-of-the-art authorship attribution
methods that adopt a variety of techniques for extracting authorship traits
from source code, including RNNs, CNNs, and code stylometry. Our experiments
demonstrate the vulnerability of current authorship attribution methods to
adversarial attacks. For the non-targeted attack, the attack success rate
exceeds 98.5%, accompanied by a degradation of the identification confidence
that exceeds 13%. For the targeted attacks, we show the possibility of
impersonating a programmer using targeted adversarial perturbations, with a
success rate ranging from 66% to 88% across the different authorship
attribution techniques under several adversarial scenarios.
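As an illustration of the kind of adversarial code perturbation such attacks rely on, the sketch below applies a semantics-preserving variable renaming and keeps the variant that most lowers the classifier's confidence in the true author; the classifier stub, candidate names, and scoring are illustrative assumptions, not SHIELD's perturbation engine.

    import re

    def rename_identifier(source: str, old: str, new: str) -> str:
        """Rename a variable wherever it appears as a whole word (semantics-preserving)."""
        return re.sub(rf"\b{re.escape(old)}\b", new, source)

    def author_confidence(source: str, author: str) -> float:
        """Stand-in for an attribution model's confidence; a real attack would
        query the trained classifier here."""
        return 0.9 if "total" in source else 0.4  # toy behaviour for the demo

    victim = "def solve(values):\n    total = 0\n    for v in values:\n        total += v\n    return total"
    candidates = ["acc", "s", "running_sum"]

    # Non-targeted attack: keep the rename that most lowers confidence in the true author.
    best = min((rename_identifier(victim, "total", c) for c in candidates),
               key=lambda src: author_confidence(src, "alice"))
    print(best)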
Robin: A Novel Method to Produce Robust Interpreters for Deep Learning-Based Code Classifiers
Deep learning has been widely used in source code classification tasks, such
as code classification according to their functionalities, code authorship
attribution, and vulnerability detection. Unfortunately, the black-box nature
of deep learning makes it hard to interpret and understand why a classifier
(i.e., classification model) makes a particular prediction on a given example.
This lack of interpretability (or explainability) might have hindered their
adoption by practitioners because it is not clear when they should or should
not trust a classifier's prediction. This concern has motivated
a number of studies in recent years. However, existing methods are neither
robust nor able to cope with out-of-distribution examples. In this paper, we
propose a novel method to produce robust interpreters
for a given deep learning-based code classifier; the method is dubbed Robin.
The key idea behind Robin is a novel hybrid structure combining an interpreter
and two approximators, while leveraging the ideas of adversarial training and
data augmentation. Experimental results show that on average the interpreter
produced by Robin achieves 6.11% higher fidelity (evaluated on the
classifier), 67.22% higher fidelity (evaluated on the approximator), and
15.87x higher robustness than the three existing interpreters we evaluated.
Moreover, the interpreter is 47.31% less affected by out-of-distribution
examples than LEMNA.
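Fidelity, one of the metrics quoted above, is commonly computed as the fraction of examples on which the classifier's prediction is unchanged when only the features the interpreter marks as important are kept; the sketch below illustrates that common notion and may differ in detail from the paper's exact metric.

    import numpy as np

    def fidelity(model_predict, inputs, keep_important):
        """Fraction of examples whose prediction is unchanged when only the
        interpreter-selected features are kept (one common fidelity notion)."""
        preds_full = model_predict(inputs)
        preds_reduced = model_predict(np.array([keep_important(x) for x in inputs]))
        return float(np.mean(preds_full == preds_reduced))

    # Toy demo: a "model" that only looks at feature 0, and an "interpreter"
    # that keeps feature 0 and zeroes out the rest.
    model = lambda X: (X[:, 0] > 0.5).astype(int)
    keep = lambda x: np.where(np.arange(len(x)) == 0, x, 0.0)

    X = np.random.rand(100, 5)
    print(fidelity(model, X, keep))  # 1.0: the interpreter captured the decisive feature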
Generative adversarial copy machines
This essay explores the redistribution of expressive agency across human artists and non-human entities that inevitably occurs when artificial intelligence (AI) becomes involved in creative processes. In doing so, my focus is not on a ‘becoming-creative’ of AI in an anthropocentric sense of the term. Rather, my central argument is as follows: if AI systems are (or will be) capable of generating outputs that can satisfy requirements by which creativity is currently being evaluated, validated, and valorised, then AI inevitably disturbs prevailing aesthetic and ontological assumptions concerning anthropocentrically framed ideals of the artist figure, the work of art, and the idea of creativity as such. I will elaborate this argument by way of a close reading of Generative Adversarial Network (GAN) technology and its uses in AI art, alongside examples of ownership claims and disputes involving GAN-style AI art. Overall, the discussion links to cultural theories of AI, relevant legal theory, and posthumanist thought. It is across these contexts that I will reframe GAN systems, even when their ‘artistic’ outputs can be interpreted with reference to the concept of the singular author figure, as ‘Generative Adversarial Copy Machines.’ Ultimately, I want to propose that the disturbances effected by AI in artistic practices can pose a critical challenge to the integrity of cultural ownership models – specifically: intellectual property (IP) enclosures – which rely on an anthropocentric conceptualisation of authorship
OpenML: networked science in machine learning
Many sciences have made significant breakthroughs by adopting online tools
that help organize, structure and mine information that is too detailed to be
printed in journals. In this paper, we introduce OpenML, a place for machine
learning researchers to share and organize data in fine detail, so that they
can work more effectively, be more visible, and collaborate with others to
tackle harder problems. We discuss how OpenML relates to other examples of
networked science and what benefits it brings for machine learning research,
individual scientists, as well as students and practitioners.
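For orientation, the sketch below fetches a shared dataset through the openml Python client; the dataset ID is the classic iris set, and the exact call signatures may vary slightly across client versions, so treat this as a hedged example rather than definitive usage.

    import openml

    # Download a dataset (61 is the classic "iris" set) and load it into memory.
    dataset = openml.datasets.get_dataset(61)
    X, y, categorical, attribute_names = dataset.get_data(
        target=dataset.default_target_attribute
    )
    print(dataset.name, X.shape)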
Artificial intelligence and UK national security: Policy considerations
RUSI was commissioned by GCHQ to conduct an independent research study into the use of artificial intelligence (AI) for national security purposes. The aim of this project is to establish an independent evidence base to inform future policy development regarding national security uses of AI. The findings are based on in-depth consultation with stakeholders from across the UK national security community, law enforcement agencies, private sector companies, academic and legal experts, and civil society representatives. This was complemented by a targeted review of existing literature on the topic of AI and national security.
The research has found that AI offers numerous opportunities for the UK national security community to improve the efficiency and effectiveness of existing processes. AI methods can rapidly derive insights from large, disparate datasets and identify connections that would otherwise go unnoticed by human operators. However, in the context of national security and the powers given to UK intelligence agencies, use of AI could give rise to additional privacy and human rights considerations, which would need to be assessed within the existing legal and regulatory framework. For this reason, enhanced policy and guidance are needed to ensure the privacy and human rights implications of national security uses of AI are reviewed on an ongoing basis as new analysis methods are applied to data.
Dos and Don'ts of Machine Learning in Computer Security
With the growing processing power of computing systems and the increasing
availability of massive datasets, machine learning algorithms have led to major
breakthroughs in many different areas. This development has influenced computer
security, spawning a series of work on learning-based security systems, such as
for malware detection, vulnerability discovery, and binary code analysis.
Despite great potential, machine learning in security is prone to subtle
pitfalls that undermine its performance and render learning-based systems
potentially unsuitable for security tasks and practical deployment. In this
paper, we look at this problem with critical eyes. First, we identify common
pitfalls in the design, implementation, and evaluation of learning-based
security systems. We conduct a study of 30 papers from top-tier security
conferences within the past 10 years, confirming that these pitfalls are
widespread in the current security literature. In an empirical analysis, we
further demonstrate how individual pitfalls can lead to unrealistic performance
and interpretations, obstructing the understanding of the security problem at
hand. As a remedy, we propose actionable recommendations to support researchers
in avoiding or mitigating the pitfalls where possible. Furthermore, we identify
open problems when applying machine learning in security and provide directions
for further research.
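One pitfall frequently discussed in this line of work is data snooping, where information from the test set leaks into training through preprocessing; the sketch below shows the remedy of splitting before fitting any preprocessing, using synthetic data and library choices that are illustrative assumptions rather than the paper's artifacts.

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler
    from sklearn.linear_model import LogisticRegression

    # Synthetic stand-in for a security dataset (e.g., feature vectors from binaries).
    X = np.random.randn(1000, 20)
    y = (X[:, 0] + 0.5 * np.random.randn(1000) > 0).astype(int)

    # Pitfall (data snooping): calling StandardScaler().fit_transform(X) before the
    # split lets test-set statistics leak into training.
    # Remedy: split first, then fit preprocessing on the training split only.
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
    scaler = StandardScaler().fit(X_train)
    clf = LogisticRegression().fit(scaler.transform(X_train), y_train)
    print("test accuracy:", clf.score(scaler.transform(X_test), y_test))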