XGV-BERT: Leveraging Contextualized Language Model and Graph Neural Network for Efficient Software Vulnerability Detection
With the advancement of deep learning (DL) in various fields, there have been
many attempts to reveal software vulnerabilities through data-driven
approaches. Nonetheless, existing works lack an effective representation that
retains the non-sequential semantic characteristics and contextual
relationships of source code attributes. Hence, in this work, we propose
XGV-BERT, a framework that combines the pre-trained CodeBERT model with a
Graph Convolutional Network (GCN) to detect software vulnerabilities. By
jointly training the CodeBERT and
GCN modules within XGV-BERT, the proposed model leverages the advantages of
large-scale pre-training, harnessing vast raw data, and transfer learning by
learning representations for training data through graph convolution. The
research results demonstrate that XGV-BERT significantly improves
vulnerability detection accuracy compared to two existing methods,
VulDeePecker and SySeVR. On the VulDeePecker dataset, XGV-BERT achieves an
F1-score of 97.5%, substantially outperforming VulDeePecker's 78.3%.
Likewise, on the SySeVR dataset, XGV-BERT achieves an F1-score of 95.5%,
surpassing SySeVR's 83.5%.
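The graph-convolution idea behind XGV-BERT can be sketched in miniature. The snippet below is a hypothetical illustration, not the authors' implementation: each node of a code graph (e.g., a statement carrying an embedding) averages its features with its neighbors', which is how non-sequential relations between source-code attributes propagate.

```python
# Minimal sketch of one graph-convolution step (hypothetical, not the
# XGV-BERT implementation). Each node mixes its features with its
# neighbors' features; the learned weight matrix and nonlinearity that
# follow each propagation step are omitted for brevity.

def gcn_layer(features, edges):
    """Average each node's features with its neighbors' (adjacency with
    self-loops, mean-normalized)."""
    n = len(features)
    neighbors = {i: [i] for i in range(n)}  # self-loops
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    dim = len(features[0])
    out = []
    for i in range(n):
        acc = [0.0] * dim
        for j in neighbors[i]:
            for d in range(dim):
                acc[d] += features[j][d]
        out.append([x / len(neighbors[i]) for x in acc])
    return out

# Toy code graph: 3 statements, edges for control/data flow.
H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
H1 = gcn_layer(H, [(0, 1), (1, 2)])
```

In the full system, node features would come from CodeBERT embeddings and the propagation layers would be trained jointly with the language model, as the abstract describes.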
Fed-LSAE: Thwarting Poisoning Attacks against Federated Cyber Threat Detection System via Autoencoder-based Latent Space Inspection
The significant rise of security concerns in conventional centralized
learning has promoted federated learning (FL) adoption in building intelligent
applications without privacy breaches. In cybersecurity, the sensitive data
along with the contextual information and high-quality labeling in each
enterprise organization play an essential role in constructing high-performance
machine learning (ML) models for detecting cyber threats. Nonetheless, the
risks posed by internal poisoning adversaries against FL systems have prompted
discussions about designing robust anti-poisoning frameworks. Whereas defensive
mechanisms in the past were based on outlier detection, recent approaches tend
to be more concerned with latent space representation. In this paper, we
investigate a novel robust aggregation method for FL, namely Fed-LSAE, which
takes advantage of latent space representation via the penultimate layer and
an Autoencoder to exclude malicious clients from the training process. The
experimental results on the CIC-ToN-IoT and N-BaIoT datasets confirm the
feasibility of our defensive mechanism against cutting-edge poisoning attacks
for developing a robust FL-based threat detector in the context of IoT. More
specifically, FL evaluation metrics reach approximately 98% across the board
when our Fed-LSAE defense is integrated.
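The filtering step described above can be sketched as follows. This is a simplified illustration, not the Fed-LSAE implementation: each client's penultimate-layer representation is scored by a reconstruction-error function, and only low-error (likely benign) clients enter the aggregate. The stand-in scoring against a reference latent profile is an assumption; the real system uses a trained Autoencoder.

```python
# Hypothetical sketch of latent-space filtering before aggregation.
# The "autoencoder" is a stand-in: squared distance to a reference
# latent profile learned from clean updates, playing the role of
# ||x - decode(encode(x))||^2.

def reconstruction_error(vec, reference):
    return sum((a - b) ** 2 for a, b in zip(vec, reference))

def filter_and_aggregate(client_reprs, client_updates, reference, threshold):
    # Keep only clients whose latent representation reconstructs well.
    kept = [u for r, u in zip(client_reprs, client_updates)
            if reconstruction_error(r, reference) <= threshold]
    dim = len(client_updates[0])
    # Average the surviving updates (plain FedAvg over benign clients).
    return [sum(u[d] for u in kept) / len(kept) for d in range(dim)]

# Two benign clients near the reference profile, one poisoned outlier.
reprs = [[0.1, 0.0], [0.0, 0.1], [5.0, 5.0]]
updates = [[1.0], [3.0], [100.0]]
agg = filter_and_aggregate(reprs, updates, reference=[0.0, 0.0], threshold=1.0)
```

The poisoned client's update never reaches the aggregate, which is the mechanism by which the defense keeps the global model intact.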
Some algorithms to solve a bi-objectives problem for team selection
In real life, many problems are instances of combinatorial optimization, and cross-functional team selection is one typical example. The decision-maker has to select a solution among C(k, h) possibilities in the decision space, where k is the number of candidates and h is the number of members in the selected team. This paper continues our work since 2018; here, we introduce the completed version of the Min Distance to the Boundary model (MDSB), which captures both the "deep" and "wide" aspects of the selected team. The compromise programming approach enables decision-makers to ignore the parameters in the decision-making process; instead, they specify the one scenario they expect. The model construction aims to find the solution that best matches this expectation. We develop two algorithms to find the optimal solution: a genetic algorithm and one based on the philosophy of DC (Difference of Convex functions) programming and its algorithm (DCA). We also compare the introduced algorithms with the MIQP-CPLEX search algorithm to show their effectiveness.
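As a toy illustration of the selection problem (not the paper's MDSB formulation), a compromise-programming-style search over the C(k, h) subsets can pick the team whose profile lies closest to the decision-maker's expected scenario. The skill scores and scenario below are invented.

```python
# Brute-force compromise-programming sketch: enumerate all C(k, h)
# teams and return the one whose averaged skill profile minimizes the
# squared distance to the expected scenario. Illustrative only; the
# paper's algorithms (GA, DCA) avoid full enumeration.
from itertools import combinations

def team_profile(team, skills):
    dim = len(next(iter(skills.values())))
    return [sum(skills[m][d] for m in team) / len(team) for d in range(dim)]

def best_team(candidates, skills, scenario, h):
    def dist(team):
        p = team_profile(team, skills)
        return sum((a - b) ** 2 for a, b in zip(p, scenario))
    return min(combinations(candidates, h), key=dist)

# Four candidates scored on two skill axes; the decision-maker expects
# a balanced [2, 2] profile.
skills = {"a": [3, 1], "b": [1, 3], "c": [2, 2], "d": [0, 0]}
team = best_team(["a", "b", "c", "d"], skills, scenario=[2, 2], h=2)
```

Because the number of subsets grows combinatorially in k, metaheuristics such as the paper's GA become necessary once k is large.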
On the Effectiveness of Adversarial Samples against Ensemble Learning-based Windows PE Malware Detectors
Recently, there has been growing interest in applying machine
learning (ML) to the field of cybersecurity, particularly in malware detection
and prevention. Several research works on malware analysis have been proposed,
offering promising results for both academic and practical applications. In
these works, the use of Generative Adversarial Networks (GANs) or Reinforcement
Learning (RL) can aid malware creators in crafting metamorphic malware that
evades antivirus software. In this study, we propose a mutation system to
counteract ensemble learning-based detectors by combining GANs and an RL model,
overcoming the limitations of the MalGAN model. Our proposed FeaGAN model is
built based on MalGAN by incorporating an RL model called the Deep Q-network
anti-malware Engines Attacking Framework (DQEAF). The RL model addresses three
key challenges in performing adversarial attacks on Windows Portable Executable
malware, including format preservation, executability preservation, and
maliciousness preservation. In the FeaGAN model, the target malware detector
is built with ensemble learning, and the generated adversarial patterns are
used to enhance evasion against it. The experimental results demonstrate that
100% of the selected mutant samples preserve the format of executable files,
while both executability preservation and maliciousness preservation achieve
certain successes, reaching a stable success rate.
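The core evasion idea can be illustrated with a toy loop (not the FeaGAN/DQEAF system itself): benign-looking features are added to a malware feature vector, additions only so that format and behavior are preserved, until the majority vote of an ensemble of detectors flips. The detectors and features below are invented.

```python
# Toy feature-addition evasion against a majority-vote ensemble.
# Features are binary; the attack only switches features on, mirroring
# the format/executability/maliciousness-preservation constraints.

def ensemble_says_malware(x, detectors):
    votes = sum(1 for d in detectors if d(x))
    return votes > len(detectors) / 2

def evade(x, addable, detectors):
    x = list(x)
    for idx in addable:                  # candidate benign features
        if not ensemble_says_malware(x, detectors):
            break                        # already evaded the majority
        x[idx] = 1                       # add a feature, never remove one
    return x

# Toy ensemble keyed on four binary features; feature 0 marks the
# malicious payload and is never touched.
detectors = [
    lambda x: x[0] == 1 and x[2] == 0,
    lambda x: x[0] == 1 and x[3] == 0,
    lambda x: x[1] == 0,
]
mutant = evade([1, 0, 0, 0], addable=[2, 3], detectors=detectors)
```

In the real systems, the choice of which feature to add is what the GAN generator and the RL agent learn, rather than a fixed scan as here.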
VNHSGE: VietNamese High School Graduation Examination Dataset for Large Language Models
The VNHSGE (VietNamese High School Graduation Examination) dataset, developed
exclusively for evaluating large language models (LLMs), is introduced in this
article. The dataset, which covers nine subjects, was generated from the
Vietnamese National High School Graduation Examination and comparable tests.
300 literary essays have been included, and there are over 19,000
multiple-choice questions on a range of topics. The dataset assesses LLMs in
multitasking situations such as question answering, text generation, reading
comprehension, visual question answering, and more by including both textual
data and accompanying images. Using ChatGPT and BingChat, we evaluated LLMs on
the VNHSGE dataset and compared their performance with that of Vietnamese
students. The results show that ChatGPT and
BingChat both perform at a human level in a number of areas, including
literature, English, history, geography, and civics education. However, they
still have room to grow, especially in mathematics, physics,
chemistry, and biology. The VNHSGE dataset seeks to provide an adequate
benchmark for assessing the abilities of LLMs with its wide-ranging coverage
and variety of activities. We intend to promote future developments in the
creation of LLMs by making this dataset available to the scientific community,
especially in addressing LLMs' limitations in disciplines involving
mathematics and the natural sciences. Comment: 74 pages, 44 figures
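A benchmark of this kind is typically scored by comparing model answers to the key per subject. The sketch below is hypothetical; the per-subject answer format (lists of letter choices) is an assumption, not the dataset's actual schema.

```python
# Hypothetical per-subject accuracy scoring for a VNHSGE-style
# multiple-choice evaluation. Predictions and the answer key are
# dictionaries mapping a subject to an ordered list of letter choices.

def accuracy_by_subject(predictions, answer_key):
    scores = {}
    for subject, preds in predictions.items():
        key = answer_key[subject]
        correct = sum(p == k for p, k in zip(preds, key))
        scores[subject] = correct / len(key)
    return scores

# Invented example: a model gets 3 of 4 physics questions right.
scores = accuracy_by_subject({"physics": ["A", "B", "C", "D"]},
                             {"physics": ["A", "B", "D", "D"]})
```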
Awareness and preparedness of healthcare workers against the first wave of the COVID-19 pandemic: A cross-sectional survey across 57 countries.
BACKGROUND: Since the COVID-19 pandemic began, there have been concerns about the preparedness of healthcare workers (HCWs). This study aimed to describe the level of awareness and preparedness of hospital HCWs at the time of the first wave.
METHODS: This multinational, multicenter, cross-sectional survey was conducted among hospital HCWs from February to May 2020. We used hierarchical multivariate logistic regression to adjust for the influence of variables on awareness and preparedness. We then used association rule mining to identify relationships between HCWs' confidence in handling suspected COVID-19 patients and prior COVID-19 case-management training.
RESULTS: We surveyed 24,653 HCWs from 371 hospitals across 57 countries and received 17,302 responses (an overall response rate of 70.2%). The median COVID-19 preparedness score was 11.0 (interquartile range [IQR] = 6.0-14.0) and the median awareness score was 29.6 (IQR = 26.6-32.6). HCWs at COVID-19-designated facilities with previous outbreak experience, or HCWs trained for dealing with the SARS-CoV-2 outbreak, had significantly higher levels of preparedness and awareness (p<0.001). Association rule mining suggests that nurses and doctors who had a 'great extent of confidence' in handling suspected COVID-19 patients had participated in COVID-19 training courses. Male participants (mean difference = 0.34; 95% CI = 0.22, 0.46; p<0.001) and nurses (mean difference = 0.67; 95% CI = 0.53, 0.81; p<0.001) had higher preparedness scores than female participants and doctors.
INTERPRETATION: There was an unsurprisingly high level of awareness and preparedness among HCWs who participated in COVID-19 training courses. However, disparities existed along the lines of gender and type of HCW. It is unknown whether the differences in COVID-19 preparedness that we detected early in the pandemic translated into a disproportionate SARS-CoV-2 burden of disease by gender or HCW type.
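The association-rule measure behind findings like "training implies confidence" can be sketched as follows; the respondent records below are invented, not survey data.

```python
# Confidence of an association rule A -> B: among respondents whose
# attribute set contains A, the fraction that also contains B.
# Records are sets of categorical attributes per respondent.

def rule_confidence(records, antecedent, consequent):
    has_a = [r for r in records if antecedent <= r]  # subset test
    if not has_a:
        return 0.0
    return sum(1 for r in has_a if consequent <= r) / len(has_a)

# Invented respondents: 3 of 4 had COVID-19 training, 2 of those 3
# reported high confidence, so the rule's confidence is 2/3.
records = [
    {"nurse", "covid_training", "high_confidence"},
    {"doctor", "covid_training", "high_confidence"},
    {"nurse"},
    {"doctor", "covid_training"},
]
c = rule_confidence(records, {"covid_training"}, {"high_confidence"})
```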
Perpendicular magnetic anisotropy and the magnetization process in CoFeB/Pd multilayer films
Perpendicular magnetic anisotropy (PMA) and the dynamic magnetization reversal
process in [CoFeB t nm/Pd 1.0 nm]_n (t = 0.4, 0.6, 0.8, 1.0, and 1.2 nm;
n = 2-20) multilayer films have been studied by means of magnetic
hysteresis and Kerr effect measurements. Strong and controllable PMA with an
effective uniaxial anisotropy up to 7.7 × 10 J·m⁻³ and a
saturation magnetization as low as 200 emu/cc are achieved.
anisotropy of CoFeB/Pd interfaces, the main contribution to the PMA, is
separated from the effective uniaxial anisotropy of the films, and appears to
increase with the number of the CoFeB/Pd bilayers. Observation of the magnetic
domains during a magnetization reversal process using polar magneto-optical
Kerr microscopy shows the detailed behavior of nucleation and displacement of
the domain walls. Comment: 18 pages, 5 figures, original research article
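The separation of the interfacial contribution described in this abstract typically follows the standard phenomenological decomposition for magnetic multilayers, stated here as background rather than taken from the paper's text:

```latex
% Effective anisotropy energy per magnetic layer, split into a volume
% term and an interface term (the factor 2 assumes two equivalent
% CoFeB/Pd interfaces per CoFeB layer of thickness t):
\begin{equation}
K_{\mathrm{eff}}\, t = K_{V}\, t + 2K_{S}
\end{equation}
% A linear fit of K_eff * t versus t then yields the volume anisotropy
% K_V from the slope and the interfacial anisotropy K_S from the
% intercept, which is how the interface contribution is separated from
% the effective uniaxial anisotropy.
```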
AAGAN: Android Malware Generation System Based on Generative Adversarial Network
With the rapid evolution of mobile malware, especially Android malware, machine learning (ML)-based Android malware detection systems have drawn massive attention. Although ML algorithms have recently led to many vital breakthroughs in malware detection, they remain particularly vulnerable to adversarial example (AE) attacks. By applying small perturbations (e.g., modifying different kinds of features from the application's manifest file), an AE attack can cause a malicious application to be misclassified as legitimate. This paper proposes AAGAN, an automated Android malware generation system based on Generative Adversarial Networks (GANs) that can successfully deceive current ML detectors. Our experimental results indicate that AEs generated by our system can flip the prediction of state-of-the-art detection algorithms in 99% of cases on a real-world dataset. To defend against AE attacks, we improve the robustness of our detection system by repeatedly retraining with these newly generated AEs. Surprisingly, even after five rounds of retraining, AAGAN can still achieve an 89% success rate in bypassing our malware detection system.
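The adversarial-retraining loop described above can be sketched schematically with toy stand-ins (a signature-set "detector" and an attack that always emits the same samples), not AAGAN itself:

```python
# Each round: generate adversarial examples against the current
# detector, record the attacker's success rate, then retrain the
# detector on the evasions. Toy stand-ins are used throughout.

def adversarial_retraining(detector, generate_aes, retrain, rounds=5):
    history = []
    for _ in range(rounds):
        aes = generate_aes(detector)            # attack current model
        evaded = [x for x in aes if not detector(x)]
        history.append(len(evaded) / len(aes))  # attacker success rate
        detector = retrain(detector, evaded)    # harden on the evasions
    return detector, history

# Stand-ins: membership in a growing signature set plays the role of a
# detector, and the "attack" always emits the same three samples.
known = set()
make_detector = lambda sigs: (lambda x: x in sigs)
generate_aes = lambda det: [1, 2, 3]

def retrain(det, evaded):
    known.update(evaded)          # learn the evasions as signatures
    return make_detector(known)

detector, history = adversarial_retraining(make_detector(known),
                                           generate_aes, retrain, rounds=3)
```

Against a static attacker this converges after one round; the abstract's point is that an adaptive generator like AAGAN keeps finding new evasions even after several such rounds.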