131 research outputs found

    Massive vectors and loop observables: the g2g-2 case

    Get PDF
    We discuss the use of massive vectors for the interpretation of some recent experimental anomalies, with special attention to the muon g2g-2. We restrict our discussion to the case where the massive vector is embedded into a spontaneously broken gauge symmetry, so that the predictions are not affected by the choice of an arbitrary energy cut-off. Extended gauge symmetries, however, typically impose strong constraints on the mass of the new vector boson and for the muon g2g-2 they basically rule out, barring the case of abelian gauge extensions, the explanation of the discrepancy in terms of a single vector extension of the standard model. We finally comment on the use of massive vectors for BB-meson decay and di-photon anomalies.Comment: 25 pages, 1 figure. References added, to appear in JHE

    Practical Attacks on Machine Learning: A Case Study on Adversarial Windows Malware

    Get PDF
    While machine learning is vulnerable to adversarial examples, it still lacks systematic procedures and tools for evaluating its security in different application contexts. In this article, we discuss how to develop automated and scalable security evaluations of machine learning using practical attacks, reporting a use case on Windows malware detection

    Modeling lens potentials with continuous neural fields in galaxy-scale strong lenses

    Full text link
    Strong gravitational lensing is a unique observational tool for studying the dark and luminous mass distribution both within and between galaxies. Given the presence of substructures, current strong lensing observations demand more complex mass models than smooth analytical profiles, such as power-law ellipsoids. In this work, we introduce a continuous neural field to predict the lensing potential at any position throughout the image plane, allowing for a nearly model-independent description of the lensing mass. We apply our method on simulated Hubble Space Telescope imaging data containing different types of perturbations to a smooth mass distribution: a localized dark subhalo, a population of subhalos, and an external shear perturbation. Assuming knowledge of the source surface brightness, we use the continuous neural field to model either the perturbations alone or the full lensing potential. In both cases, the resulting model is able to fit the imaging data, and we are able to accurately recover the properties of both the smooth potential and of the perturbations. Unlike many other deep learning methods, ours explicitly retains lensing physics (i.e., the lens equation) and introduces high flexibility in the model only where required, namely, in the lens potential. Moreover, the neural network does not require pre-training on large sets of labelled data and predicts the potential from the single observed lensing image. Our model is implemented in the fully differentiable lens modeling code Herculens

    Explaining Machine Learning DGA Detectors from DNS Traffic Data

    Get PDF
    One of the most common causes of lack of continuity of online systems stems from a widely popular Cyber Attack known as Distributed Denial of Service (DDoS), in which a network of infected devices (botnet) gets exploited to flood the computational capacity of services through the commands of an attacker. This attack is made by leveraging the Domain Name System (DNS) technology through Domain Generation Algorithms (DGAs), a stealthy connection strategy that yet leaves suspicious data patterns. To detect such threats, advances in their analysis have been made. For the majority, they found Machine Learning (ML) as a solution, which can be highly effective in analyzing and classifying massive amounts of data. Although strongly performing, ML models have a certain degree of obscurity in their decision-making process. To cope with this problem, a branch of ML known as Explainable ML tries to break down the black-box nature of classifiers and make them interpretable and human-readable. This work addresses the problem of Explainable ML in the context of botnet and DGA detection, which at the best of our knowledge, is the first to concretely break down the decisions of ML classifiers when devised for botnet/DGA detection, therefore providing global and local explanations

    Robust Machine Learning for Malware Detection over Time

    Get PDF
    The presence and persistence of Android malware is an on-going threat that plagues this information era, and machine learning technologies are now extensively used to deploy more effective detectors that can block the majority of these malicious programs. However, these algorithms have not been developed to pursue the natural evolution of malware, and their performances significantly degrade over time because of such concept-drift. Currently, state-of-the-art techniques only focus on detecting the presence of such drift, or they address it by relying on frequent updates of models. Hence, there is a lack of knowledge regarding the cause of the concept drift, and ad-hoc solutions that can counter the passing of time are still underinvestigated. In this work, we commence to address these issues as we propose (i) a drift-analysis framework to identify which characteristics of data are causing the drift, and (ii) SVM-CB, a time-aware classifier that leverages the drift-analysis information to slow down the performance drop. We highlight the efficacy of our contribution by comparing its degradation over time with a state-of-the-art classifier, and we show that SVM-CB better withstand the distribution changes that naturally characterizes the malware domain. We conclude by discussing the limitations of our approach and how our contribution can be taken as a first step towards more time-resistant classifiers that not only tackle, but also understand the concept drift that affect data

    Living-off-The-Land Reverse-Shell Detection by Informed Data Augmentation

    Full text link
    The living-off-the-land (LOTL) offensive methodologies rely on the perpetration of malicious actions through chains of commands executed by legitimate applications, identifiable exclusively by analysis of system logs. LOTL techniques are well hidden inside the stream of events generated by common legitimate activities, moreover threat actors often camouflage activity through obfuscation, making them particularly difficult to detect without incurring in plenty of false alarms, even using machine learning. To improve the performance of models in such an harsh environment, we propose an augmentation framework to enhance and diversify the presence of LOTL malicious activity inside legitimate logs. Guided by threat intelligence, we generate a dataset by injecting attack templates known to be employed in the wild, further enriched by malleable patterns of legitimate activities to replicate the behavior of evasive threat actors. We conduct an extensive ablation study to understand which models better handle our augmented dataset, also manipulated to mimic the presence of model-agnostic evasion and poisoning attacks. Our results suggest that augmentation is needed to maintain high-predictive capabilities, robustness to attack is achieved through specific hardening techniques like adversarial training, and it is possible to deploy near-real-time models with almost-zero false alarms

    Harnessing Synthetic Datasets: The Role of Shape Bias in Deep Neural Network Generalization

    Full text link
    Recent advancements in deep learning have been primarily driven by the use of large models trained on increasingly vast datasets. While neural scaling laws have emerged to predict network performance given a specific level of computational resources, the growing demand for expansive datasets raises concerns. To address this, a new research direction has emerged, focusing on the creation of synthetic data as a substitute. In this study, we investigate how neural networks exhibit shape bias during training on synthetic datasets, serving as an indicator of the synthetic data quality. Specifically, our findings indicate three key points: (1) Shape bias varies across network architectures and types of supervision, casting doubt on its reliability as a predictor for generalization and its ability to explain differences in model recognition compared to human capabilities. (2) Relying solely on shape bias to estimate generalization is unreliable, as it is entangled with diversity and naturalism. (3) We propose a novel interpretation of shape bias as a tool for estimating the diversity of samples within a dataset. Our research aims to clarify the implications of using synthetic data and its associated shape bias in deep learning, addressing concerns regarding generalization and dataset quality

    Statistical meta-analysis of presentation attacks for secure multibiometric systems

    Get PDF
    Prior work has shown that multibiometric systems are vulnerable to presentation attacks, assuming that their matching score distribution is identical to that of genuine users, without fabricating any fake trait. We have recently shown that this assumption is not representative of current fingerprint and face presentation attacks, leading one to overestimate the vulnerability of multibiometric systems, and to design less effective fusion rules. In this paper, we overcome these limitations by proposing a statistical meta-model of face and fingerprint presentation attacks that characterizes a wider family of fake score distributions, including distributions of known and, potentially, unknown attacks. This allows us to perform a thorough security evaluation of multibiometric systems against presentation attacks, quantifying how their vulnerability may vary also under attacks that are different from those considered during design, through an uncertainty analysis. We empirically show that our approach can reliably predict the performance of multibiometric systems even under never-before-seen face and fingerprint presentation attacks, and that the secure fusion rules designed using our approach can exhibit an improved trade-off between the performance in the absence and in the presence of attack. We finally argue that our method can be extended to other biometrics besides faces and fingerprints

    Explaining Vulnerabilities of Deep Learning to Adversarial Malware Binaries

    Get PDF
    Recent work has shown that deep-learning algorithms for malware detection are also susceptible to adversarial examples, i.e., carefully-crafted perturbations to input malware that enable misleading classification. Although this has questioned their suitability for this task, it is not yet clear why such algorithms are easily fooled also in this particular application domain. In this work, we take a first step to tackle this issue by leveraging explainable machine-learning algorithms developed to interpret the black-box decisions of deep neural networks. In particular, we use an explainable technique known as feature attribution to identify the most influential input features contributing to each decision, and adapt it to provide meaningful explanations to the classification of malware binaries. In this case, we find that a recently-proposed convolutional neural network does not learn any meaningful characteristic for malware detection from the data and text sections of executable files, but rather tends to learn to discriminate between benign and malware samples based on the characteristics found in the file header. Based on this finding, we propose a novel attack algorithm that generates adversarial malware binaries by only changing few tens of bytes in the file header. With respect to the other state-of-the-art attack algorithms, our attack does not require injecting any padding bytes at the end of the file, and it is much more efficient, as it requires manipulating much fewer bytes

    Adversarial ModSecurity: Countering Adversarial SQL Injections with Robust Machine Learning

    Full text link
    ModSecurity is widely recognized as the standard open-source Web Application Firewall (WAF), maintained by the OWASP Foundation. It detects malicious requests by matching them against the Core Rule Set, identifying well-known attack patterns. Each rule in the CRS is manually assigned a weight, based on the severity of the corresponding attack, and a request is detected as malicious if the sum of the weights of the firing rules exceeds a given threshold. In this work, we show that this simple strategy is largely ineffective for detecting SQL injection (SQLi) attacks, as it tends to block many legitimate requests, while also being vulnerable to adversarial SQLi attacks, i.e., attacks intentionally manipulated to evade detection. To overcome these issues, we design a robust machine learning model, named AdvModSec, which uses the CRS rules as input features, and it is trained to detect adversarial SQLi attacks. Our experiments show that AdvModSec, being trained on the traffic directed towards the protected web services, achieves a better trade-off between detection and false positive rates, improving the detection rate of the vanilla version of ModSecurity with CRS by 21%. Moreover, our approach is able to improve its adversarial robustness against adversarial SQLi attacks by 42%, thereby taking a step forward towards building more robust and trustworthy WAFs
    corecore