258 research outputs found

    Intriguing Properties of Adversarial ML Attacks in the Problem Space

    Get PDF
    Recent research efforts on adversarial ML have investigated problem-space attacks, focusing on the generation of real evasive objects in domains where, unlike images, there is no clear inverse mapping to the feature space (e.g., software). However, the design, comparison, and real-world implications of problem-space attacks remain underexplored. This paper makes two major contributions. First, we propose a novel formalization for adversarial ML evasion attacks in the problem-space, which includes the definition of a comprehensive set of constraints on available transformations, preserved semantics, robustness to preprocessing, and plausibility. We shed light on the relationship between feature space and problem space, and we introduce the concept of side-effect features as the byproduct of the inverse feature-mapping problem. This enables us to define and prove necessary and sufficient conditions for the existence of problem-space attacks. We further demonstrate the expressive power of our formalization by using it to describe several attacks from related literature across different domains. Second, building on our formalization, we propose a novel problem-space attack on Android malware that overcomes past limitations. Experiments on a dataset with 170K Android apps from 2017 and 2018 show the practical feasibility of evading a state-of-the-art malware classifier along with its hardened version. Our results demonstrate that "adversarial-malware as a service" is a realistic threat, as we automatically generate thousands of realistic and inconspicuous adversarial applications at scale, where on average it takes only a few minutes to generate an adversarial app. Our formalization of problem-space attacks paves the way to more principled research in this domain.Comment: This arXiv version (v2) corresponds to the one published at IEEE Symposium on Security & Privacy (Oakland), 202

    Information Forensics and Security: A quarter-century-long journey

    Get PDF
    Information forensics and security (IFS) is an active R&D area whose goal is to ensure that people use devices, data, and intellectual properties for authorized purposes and to facilitate the gathering of solid evidence to hold perpetrators accountable. For over a quarter century, since the 1990s, the IFS research area has grown tremendously to address the societal needs of the digital information era. The IEEE Signal Processing Society (SPS) has emerged as an important hub and leader in this area, and this article celebrates some landmark technical contributions. In particular, we highlight the major technological advances by the research community in some selected focus areas in the field during the past 25 years and present future trends

    AI alignment and generalization in deep learning

    Full text link
    This thesis covers a number of works in deep learning aimed at understanding and improving generalization abilities of deep neural networks (DNNs). DNNs achieve unrivaled performance in a growing range of tasks and domains, yet their behavior during learning and deployment remains poorly understood. They can also be surprisingly brittle: in-distribution generalization can be a poor predictor of behavior or performance under distributional shifts, which typically cannot be avoided in practice. While these limitations are not unique to DNNs -- and indeed are likely to be challenges facing any AI systems of sufficient complexity -- the prevalence and power of DNNs makes them particularly worthy of study. I frame these challenges within the broader context of "AI Alignment": a nascent field focused on ensuring that AI systems behave in accordance with their user's intentions. While making AI systems more intelligent or capable can help make them more aligned, it is neither necessary nor sufficient for alignment. However, being able to align state-of-the-art AI systems (e.g. DNNs) is of great social importance in order to avoid undesirable and unsafe behavior from advanced AI systems. Without progress in AI Alignment, advanced AI systems might pursue objectives at odds with human survival, posing an existential risk (``x-risk'') to humanity. A core tenet of this thesis is that the achieving high performance on machine learning benchmarks if often a good indicator of AI systems' capabilities, but not their alignment. This is because AI systems often achieve high performance in unexpected ways that reveal the limitations of our performance metrics, and more generally, our techniques for specifying our intentions. Learning about human intentions using DNNs shows some promise, but DNNs are still prone to learning to solve tasks using concepts of "features" very different from those which are salient to humans. Indeed, this is a major source of their poor generalization on out-of-distribution data. By better understanding the successes and failures of DNN generalization and current methods of specifying our intentions, we aim to make progress towards deep-learning based AI systems that are able to understand users' intentions and act accordingly.Cette thèse discute quelques travaux en apprentissage profond visant à comprendre et à améliorer les capacités de généralisation des réseaux de neurones profonds (DNN). Les DNNs atteignent des performances inégalées dans un éventail croissant de tâches et de domaines, mais leur comportement pendant l'apprentissage et le déploiement reste mal compris. Ils peuvent également être étonnamment fragiles: la généralisation dans la distribution peut être un mauvais prédicteur du comportement ou de la performance lors de changements de distribution, ce qui ne peut généralement pas être évité dans la pratique. Bien que ces limitations ne soient pas propres aux DNN - et sont en effet susceptibles de constituer des défis pour tout système d'IA suffisamment complexe - la prévalence et la puissance des DNN les rendent particulièrement dignes d'étude. J'encadre ces défis dans le contexte plus large de «l'alignement de l'IA»: un domaine naissant axé sur la garantie que les systèmes d'IA se comportent conformément aux intentions de leurs utilisateurs. Bien que rendre les systèmes d'IA plus intelligents ou capables puisse aider à les rendre plus alignés, cela n'est ni nécessaire ni suffisant pour l'alignement. Cependant, être capable d'aligner les systèmes d'IA de pointe (par exemple les DNN) est d'une grande importance sociale afin d'éviter les comportements indésirables et dangereux des systèmes d'IA avancés. Sans progrès dans l'alignement de l'IA, les systèmes d'IA avancés pourraient poursuivre des objectifs contraires à la survie humaine, posant un risque existentiel («x-risque») pour l'humanité. L'un des principes fondamentaux de cette thèse est que l'obtention de hautes performances sur les repères d'apprentissage automatique est souvent un bon indicateur des capacités des systèmes d'IA, mais pas de leur alignement. En effet, les systèmes d'IA atteignent souvent des performances élevées de manière inattendue, ce qui révèle les limites de nos mesures de performance et, plus généralement, de nos techniques pour spécifier nos intentions. L'apprentissage des intentions humaines à l'aide des DNN est quelque peu prometteur, mais les DNN sont toujours enclins à apprendre à résoudre des tâches en utilisant des concepts de «caractéristiques» très différents de ceux qui sont saillants pour les humains. En effet, c'est une source majeure de leur mauvaise généralisation sur les données hors distribution. En comprenant mieux les succès et les échecs de la généralisation DNN et les méthodes actuelles de spécification de nos intentions, nous visons à progresser vers des systèmes d'IA basés sur l'apprentissage en profondeur qui sont capables de comprendre les intentions des utilisateurs et d'agir en conséquence
    • …
    corecore