23 research outputs found

    XGV-BERT: Leveraging Contextualized Language Model and Graph Neural Network for Efficient Software Vulnerability Detection

    With the advancement of deep learning (DL) in various fields, there have been many attempts to reveal software vulnerabilities using data-driven approaches. Nonetheless, existing works lack an effective representation that can retain the non-sequential semantic characteristics and contextual relationships of source code attributes. Hence, in this work, we propose XGV-BERT, a framework that combines the pre-trained CodeBERT model and a Graph Convolutional Network (GCN) to detect software vulnerabilities. By jointly training the CodeBERT and GCN modules within XGV-BERT, the proposed model leverages the advantages of large-scale pre-training, harnessing vast raw data, and transfer learning by learning representations for training data through graph convolution. The results demonstrate that XGV-BERT significantly improves vulnerability detection accuracy compared to two existing methods, VulDeePecker and SySeVR. On the VulDeePecker dataset, XGV-BERT achieves an F1-score of 97.5%, significantly outperforming VulDeePecker, which achieved an F1-score of 78.3%. Likewise, on the SySeVR dataset, XGV-BERT achieves an F1-score of 95.5%, surpassing SySeVR's F1-score of 83.5%.

    Fed-LSAE: Thwarting Poisoning Attacks against Federated Cyber Threat Detection System via Autoencoder-based Latent Space Inspection

    The significant rise of security concerns in conventional centralized learning has promoted the adoption of federated learning (FL) for building intelligent applications without privacy breaches. In cybersecurity, the sensitive data, contextual information, and high-quality labeling held by each enterprise organization play an essential role in constructing high-performance machine learning (ML) models for detecting cyber threats. Nonetheless, the risk of poisoning by internal adversaries against FL systems has raised discussions about designing robust anti-poisoning frameworks. Whereas past defensive mechanisms were based on outlier detection, recent approaches tend to focus on latent space representation. In this paper, we investigate a novel robust aggregation method for FL, namely Fed-LSAE, which takes advantage of latent space representation via the penultimate layer and an Autoencoder to exclude malicious clients from the training process. The experimental results on the CIC-ToN-IoT and N-BaIoT datasets confirm the feasibility of our defensive mechanism against cutting-edge poisoning attacks for developing a robust FL-based threat detector in the context of IoT. More specifically, the FL evaluation witnesses an upward trend of approximately 98% across all metrics when integrated with our Fed-LSAE defense.

    Some algorithms to solve a bi-objectives problem for team selection

    In real life, many problems are instances of combinatorial optimization. Cross-functional team selection is a typical example. The decision-maker has to select a solution from among $\binom{k}{h}$ solutions in the decision space, where k is the number of all candidates and h is the number of members in the selected team. This paper continues our work since 2018; here, we introduce the completed version of the Min Distance to the Boundary model (MDSB), which allows access to both the "deep" and "wide" aspects of the selected team. The compromise programming approach enables decision-makers to ignore the parameters in the decision-making process. Instead, they point to the one scenario they expect. The model is constructed to find the solution that best matches that expectation. We develop two algorithms to find the optimal solution: a genetic algorithm, and another based on the philosophy of DC programming (DC) and its algorithm (DCA). We also compare the introduced algorithms with the MIQP-CPLEX search algorithm to show their effectiveness.
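The size of the decision space mentioned in this abstract, the number of ways to choose h team members from k candidates, can be illustrated with a minimal sketch. The function names and the candidate labels below are hypothetical and are not taken from the paper; the snippet only counts and enumerates candidate teams and does not implement the MDSB model, the genetic algorithm, or DCA.

```python
from itertools import combinations
from math import comb


def count_teams(k: int, h: int) -> int:
    """Number of distinct h-member teams that can be drawn from k candidates."""
    return comb(k, h)


def enumerate_teams(candidates, h):
    """List every h-member team explicitly (only feasible for small inputs)."""
    return list(combinations(candidates, h))


# Hypothetical toy instance: 5 candidates, teams of 3.
candidates = ["A", "B", "C", "D", "E"]
teams = enumerate_teams(candidates, 3)
print(count_teams(len(candidates), 3))  # 10
print(len(teams))                       # 10

# The count grows quickly, which is why exhaustive search becomes impractical.
print(count_teams(50, 5))  # 2118760
```

Even a modest pool of 50 candidates with teams of 5 yields over two million possible teams, which suggests why heuristic or optimization-based search is used instead of enumeration.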

    On the Effectiveness of Adversarial Samples against Ensemble Learning-based Windows PE Malware Detectors

    Recently, there has been a growing focus and interest in applying machine learning (ML) to the field of cybersecurity, particularly in malware detection and prevention. Several research works on malware analysis have been proposed, offering promising results for both academic and practical applications. In these works, the use of Generative Adversarial Networks (GANs) or Reinforcement Learning (RL) can aid malware creators in crafting metamorphic malware that evades antivirus software. In this study, we propose a mutation system to counteract ensemble learning-based detectors by combining GANs and an RL model, overcoming the limitations of the MalGAN model. Our proposed FeaGAN model is built based on MalGAN by incorporating an RL model called the Deep Q-network anti-malware Engines Attacking Framework (DQEAF). The RL model addresses three key challenges in performing adversarial attacks on Windows Portable Executable malware, including format preservation, executability preservation, and maliciousness preservation. In the FeaGAN model, ensemble learning is utilized to enhance the malware detector's evasion ability, with the generated adversarial patterns. The experimental results demonstrate that 100% of the selected mutant samples preserve the format of executable files, while certain successes in both executability preservation and maliciousness preservation are achieved, reaching a stable success rate.

    VNHSGE: VietNamese High School Graduation Examination Dataset for Large Language Models

    The VNHSGE (VietNamese High School Graduation Examination) dataset, developed exclusively for evaluating large language models (LLMs), is introduced in this article. The dataset, which covers nine subjects, was generated from the Vietnamese National High School Graduation Examination and comparable tests. It includes 300 literary essays and over 19,000 multiple-choice questions on a range of topics. By including both textual data and accompanying images, the dataset assesses LLMs in multitasking situations such as question answering, text generation, reading comprehension, visual question answering, and more. Using ChatGPT and BingChat, we evaluated LLMs on the VNHSGE dataset and compared their performance with that of Vietnamese students. The results show that ChatGPT and BingChat both perform at a human level in a number of areas, including literature, English, history, geography, and civics education. They still have room to improve, though, especially in mathematics, physics, chemistry, and biology. With its wide-ranging coverage and variety of activities, the VNHSGE dataset seeks to provide an adequate benchmark for assessing the abilities of LLMs. We intend to promote future developments in the creation of LLMs by making this dataset available to the scientific community, especially in resolving LLMs' limits in disciplines involving mathematics and the natural sciences. Comment: 74 pages, 44 figures

    Awareness and preparedness of healthcare workers against the first wave of the COVID-19 pandemic: A cross-sectional survey across 57 countries.

    BACKGROUND: Since the COVID-19 pandemic began, there have been concerns related to the preparedness of healthcare workers (HCWs). This study aimed to describe the level of awareness and preparedness of hospital HCWs at the time of the first wave. METHODS: This multinational, multicenter, cross-sectional survey was conducted among hospital HCWs from February to May 2020. We used a hierarchical logistic regression multivariate analysis to adjust the influence of variables based on awareness and preparedness. We then used association rule mining to identify relationships between HCW confidence in handling suspected COVID-19 patients and prior COVID-19 case-management training. RESULTS: We surveyed 24,653 HCWs from 371 hospitals across 57 countries and received 17,302 responses (an overall response rate of 70.2%). The median COVID-19 preparedness score was 11.0 (interquartile range [IQR] = 6.0-14.0) and the median awareness score was 29.6 (IQR = 26.6-32.6). HCWs at COVID-19 designated facilities with previous outbreak experience, or HCWs who were trained for dealing with the SARS-CoV-2 outbreak, had significantly higher levels of preparedness and awareness (p<0.001). Association rule mining suggests that nurses and doctors who had a 'great-extent-of-confidence' in handling suspected COVID-19 patients had participated in COVID-19 training courses. Male participants (mean difference = 0.34; 95% CI = 0.22, 0.46; p<0.001) and nurses (mean difference = 0.67; 95% CI = 0.53, 0.81; p<0.001) had higher preparedness scores compared to female participants and doctors, respectively. INTERPRETATION: There was an unsurprisingly high level of awareness and preparedness among HCWs who participated in COVID-19 training courses. However, disparity existed along the lines of gender and type of HCW. It is unknown whether the difference in COVID-19 preparedness that we detected early in the pandemic may have translated into a disproportionate SARS-CoV-2 burden of disease by gender or HCW type.

    Perpendicular magnetic anisotropy and the magnetization process in CoFeB/Pd multilayer films

    Perpendicular magnetic anisotropy (PMA) and dynamic magnetization reversal process in [CoFeB $t$ nm/Pd 1.0 nm]$_n$ ($t$ = 0.4, 0.6, 0.8, 1.0, and 1.2 nm; $n$ = 2 - 20) multilayer films have been studied by means of magnetic hysteresis and Kerr effect measurements. Strong and controllable PMA with an effective uniaxial anisotropy up to $7.7 \times 10^6$ J·m$^{-3}$ and a saturation magnetization as low as 200 emu/cc are achieved. Surface/interfacial anisotropy of CoFeB/Pd interfaces, the main contribution to the PMA, is separated from the effective uniaxial anisotropy of the films, and appears to increase with the number of the CoFeB/Pd bilayers. Observation of the magnetic domains during a magnetization reversal process using polar magneto-optical Kerr microscopy shows the detailed behavior of nucleation and displacement of the domain walls. Comment: 18 pages, 5 figures, original research article

    AAGAN: Android Malware Generation System Based on Generative Adversarial Network

    With the rapid evolution of mobile malware, especially Android malware, machine learning (ML)-based Android malware detection systems have drawn massive attention. Although ML algorithms have recently led to many vital breakthroughs in malware detection, they are still particularly vulnerable to adversarial example (AE) attacks. By applying small random perturbations (e.g., simply modifying different kinds of features from the application’s manifest file), an AE attack can cause the misclassification of legitimate applications. This paper proposes AAGAN, an automated Android malware generation system based on Generative Adversarial Networks (GAN) that can successfully deceive current ML detectors. Our experimental results indicate that AEs generated by our system can flip the prediction of state-of-the-art detection algorithms in 99% of cases using a real-world dataset. To defend against AE attacks, we improve the robustness of our detection system by alternately retraining with these newly generated AEs. Surprisingly, after retraining five times, AAGAN can still achieve an 89% success rate in bypassing our malware detection system.