
13 research outputs found

Z2F: Heterogeneous graph-based Android malware detection.

Author: Nurbol Luktarhan
Ziwei Ma
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2024

Android malware is becoming more common, and its invasion of smart devices has caused immeasurable losses. Most existing Android malware detection methods extract features from the original application files without considering the high-order hidden information behind them, even though this hidden information can reflect malicious behavior. To solve this problem, this paper proposes Z2F, a detection framework for Android applications based on multidimensional Android feature extraction and graph neural networks. Z2F first extracts seven types of Android features from the original Android application and then embeds them into a heterogeneous graph. On this basis, we design 12 kinds of meta-structures to analyze different semantic spaces of the heterogeneous graph, mine high-order hidden semantic information, and adopt a multi-layer graph attention mechanism to iteratively embed and update information. In total, 14,429 Android applications were analyzed and 1,039,726 Android features were extracted, with a detection accuracy of 99.7%.

Directory of Open Access Journals

Android Malware Detection Using TCN with Bytecode Image

Author: Ding
Lu
Luktarhan
Zhang
Publication venue: MDPI AG
Publication date: 22/06/2021

With the rapid increase in the amount of Android malware, image-based analysis has become an effective way to defend against symmetric encryption and confusing malware. At present, the existing Android malware bytecode image detection method, based on a convolutional neural network (CNN), relies on a single DEX file feature and requires a large amount of computation. To solve these problems, we combine the visual features of the XML file with the data section of the DEX file for the first time, and propose a new Android malware detection model based on a temporal convolutional network (TCN). First, four gray-scale image datasets with four different combinations of texture features are created by combining XML files and DEX files. Then the images are brought to a uniform size and fed to the designed neural network, which is evaluated with three different convolution methods. The experimental results show that adding XML files is beneficial for Android malware detection. The detection accuracy of the TCN model is 95.44%, precision is 95.45%, recall is 95.45%, and F1-score is 95.44%. Compared with other methods based on the traditional CNN model or the lightweight MobileNetV2 model, the proposed TCN-based method can effectively utilize bytecode image sequence features, improve the accuracy of Android malware detection, and reduce computation.
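
The byte-to-image step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes one byte maps to one gray pixel, a fixed row width, zero padding of the last row, and simple concatenation of the XML bytes with the DEX data-section bytes. The function name and the sample byte strings are hypothetical.

```python
def bytes_to_gray_image(data: bytes, width: int) -> list[list[int]]:
    """Lay raw bytes out as rows of 8-bit gray pixels (0 = black, 255 = white)."""
    if width <= 0:
        raise ValueError("width must be positive")
    # Round the length up to a whole number of rows, then zero-pad the tail.
    padded_len = -(-len(data) // width) * width
    padded = data.ljust(padded_len, b"\x00")
    return [list(padded[i:i + width]) for i in range(0, padded_len, width)]


# Hypothetical stand-ins for an XML file and a DEX data section.
xml_bytes = b"<manifest/>"
dex_data_bytes = bytes([0, 17, 255, 128])

# Assumed combination rule: concatenate the two byte streams before imaging.
image = bytes_to_gray_image(xml_bytes + dex_data_bytes, width=4)
```

Every row has the same width, so images built this way can be resized to a common shape before being passed to a network.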

Multidisciplinary Digital Publishing Institute

LogLS: Research on System Log Anomaly Detection Method Based on Dual LSTM

Author: Dan Lv
Nurbol Luktarhan
Yiyong Chen
Publication venue: MDPI AG
Publication date: 24/02/2022

System logs record the status and important events of a system at different points in time. They are important resources for administrators to understand and manage the system. Detecting anomalies in logs is critical to identifying system faults in time. However, with the increasing size and complexity of today’s software systems, the number of logs has exploded. In many cases, the traditional manual log-checking method becomes impractical and time-consuming. On the other hand, existing automatic log anomaly detection methods are error-prone and often use indices or log templates. In this work, we propose LogLS, a system log anomaly detection method based on dual long short-term memory (LSTM) with a symmetric structure, which regards the system log as a natural-language sequence and models the log according to its preorder and postorder relationships. LogLS builds on the DeepLog method and is optimized to address the poor prediction performance of LSTM on long sequences. By providing a feedback mechanism, it can also predict logs that have not appeared before. To evaluate LogLS, we conducted experiments on two real datasets, and the experimental results demonstrate the effectiveness of the proposed method in log anomaly detection.

Multidisciplinary Digital Publishing Institute

An Efficient Intrusion Detection Method Based on LightGBM and Autoencoder

Author: Chaofei Tang
Nurbol Luktarhan
Yuxin Zhao
Publication venue: MDPI AG
Publication date: 04/09/2020

Due to the insidious characteristics of network intrusion behaviors, developing an efficient intrusion detection system is still a big challenge, especially in the era of big data, where both the volume of traffic and the dimensionality of each traffic feature are high. Because traditional machine learning algorithms have shortcomings in network intrusion detection, such as insufficient accuracy, a network intrusion detection system based on LightGBM and an autoencoder (AE) is proposed. The LightGBM-AE model proposed in this paper includes three steps: data preprocessing, feature selection, and classification. The model adopts the LightGBM algorithm for feature selection and then uses an autoencoder for training and detection. When data containing network intrusion behaviors is fed into the autoencoder, there is a large reconstruction error between the original input and the reconstructed output, which provides a basis for intrusion detection. Based on the reconstruction error, an appropriate threshold is set to distinguish symmetrically between normal behavior and attack behavior. The experiment is carried out on the NSL-KDD dataset and implemented using PyTorch. In addition to the autoencoder, a variational autoencoder (VAE) and a denoising autoencoder (DAE) are also used for intrusion detection and are compared with existing machine learning algorithms such as Decision Tree, Random Forest, KNN, GBDT, and XGBoost. The evaluation uses classification metrics such as accuracy, precision, recall, and F1-score. The experimental results show that the method can efficiently separate attack behavior from normal behavior according to the reconstruction error. Comparison with other methods verifies the effectiveness and superiority of this method.
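
The reconstruction-error decision rule in this abstract can be sketched as follows. The abstract does not say how the threshold is chosen, so the rule used here (mean plus k standard deviations of the errors observed on normal traffic) is an assumption, as are the function names and the sample error values. The autoencoder itself is left out; only the scoring and thresholding steps are shown.

```python
from statistics import mean, pstdev


def reconstruction_error(x, x_hat):
    """Mean squared error between an input vector and its reconstruction."""
    return mean((a - b) ** 2 for a, b in zip(x, x_hat))


def fit_threshold(normal_errors, k=3.0):
    """Assumed rule: mean + k * std of the errors seen on normal traffic."""
    return mean(normal_errors) + k * pstdev(normal_errors)


def is_attack(error, threshold):
    """Flag a sample whose reconstruction error exceeds the threshold."""
    return error > threshold


# Hypothetical errors measured on held-out normal traffic.
normal_errors = [0.010, 0.020, 0.015, 0.012]
threshold = fit_threshold(normal_errors)
```

A sample that the autoencoder reconstructs poorly, for example one with an error of 0.5, would be flagged, while an error close to the normal range would not.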

Multidisciplinary Digital Publishing Institute

BFCN: A Novel Classification Method of Encrypted Traffic Based on BERT and CNN

Author: Gaoqi Tian
Nurbol Luktarhan
Yangyang Song
Zhaolei Shi
Publication venue: MDPI AG
Publication date: 19/01/2023

With the rapid advancement of encryption technology and the exponential increase in applications, network traffic classification has become an increasingly important research topic. Existing methods for classifying encrypted traffic have certain limitations. For example, traditional approaches such as machine learning rely heavily on feature engineering, deep learning approaches are sensitive to the amount and distribution of labeled data, and pretrained models focus merely on global traffic features while ignoring local features. To solve these problems, we propose a BERT-based byte-level feature convolutional network (BFCN) model consisting of two novel modules. The first is a packet encoder module, in which we use a BERT model pretrained for encrypted traffic classification to capture global traffic features through its attention mechanism; the second is a CNN module, which captures byte-level local features in the traffic through convolutional operations. The packet-level and byte-level features are concatenated as the traffic’s final representation, which can better represent encrypted traffic. Our approach achieves state-of-the-art performance on the publicly available ISCX-VPN dataset for the traffic service and application identification tasks, with F1 scores of 99.11% and 99.41%, respectively. The experimental results demonstrate that our method further improves the performance of encrypted traffic classification.

Multidisciplinary Digital Publishing Institute

Malcertificate: Research and Implementation of a Malicious Certificate Detection Algorithm Based on GCN

Author: Jingru Liu
Nurbol Luktarhan
Wenjie Yu
Yuyuan Chang
Publication venue: MDPI AG
Publication date: 01/04/2022

Encryption is widely used to ensure the security and confidentiality of information. Because people trust encryption technology, a series of certificate-based attack methods has emerged. Malicious certificates shield many malicious behaviors and threaten data security. To counter this threat, machine learning algorithms are widely used in malicious certificate detection. However, the detection efficiency of such algorithms largely depends on whether the extracted features can effectively represent the data. In contrast, graph convolutional networks (GCNs) can automatically extract useful features. GCNs are powerful at fitting graph data, which can improve the effectiveness of learning systems by efficiently embedding prior knowledge in an end-to-end manner. In this paper, we propose an algorithm for detecting malicious digital certificates with GCNs. First, we transform the digital certificate dataset, which has a PEM document structure, into a graph-structured corpus based on attribute co-occurrence and document-attribute relations. Then, we feed the graph-structured certificate dataset into a GCN for training. The experimental results show that the GCN is very effective in certificate classification and outperforms traditional machine learning algorithms and existing neural network algorithms. Our algorithm detects malicious certificates with an accuracy of 97.41%.

Directory of Open Access Journals

A Novel Hierarchical Clustering Algorithm Based on Density Peaks for Complex Datasets

Author: Nurbol Luktarhan
Rong Zhou
Shengzhong Feng
Yong Zhang
Publication venue: Hindawi Limited
Publication date: 01/01/2018

Clustering aims to differentiate objects from different groups (clusters) by similarities or distances between pairs of objects. Numerous clustering algorithms have been proposed to investigate what factors constitute a cluster and how to efficiently find them. The clustering by fast search and find of density peaks algorithm was proposed to intuitively determine cluster centers and assign points to corresponding partitions for complex datasets. This method has a simple structure owing to its noniterative logic and few parameters; however, the guidelines for parameter selection and center determination are not explicit. To tackle these problems, we propose an improved hierarchical clustering method, HCDP, which aims to represent the complex structure of the dataset. A k-nearest neighbor strategy is integrated to compute the local density of each point, which avoids selecting the unnecessary global parameter dc and enables cluster smoothing and condensing. In addition, a new clustering evaluation approach is introduced to extract a “flat” and “optimal” partition solution from the structure by adaptively computing the clustering stability. The proposed approach is applied to several applications with complex datasets, and the results demonstrate that the novel method outperforms its counterparts to a large extent.
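
A k-nearest-neighbor local density of the kind mentioned in this abstract can be sketched as follows. The abstract does not give HCDP's density formula, so the estimate used here (the inverse of the mean distance to the k nearest neighbors) is an assumption chosen for illustration, and the function name is hypothetical. The point is only that no global cutoff distance dc is needed.

```python
from math import dist


def knn_local_density(points, k):
    """Assumed estimate: inverse of the mean distance to the k nearest neighbors."""
    if not 1 <= k < len(points):
        raise ValueError("k must satisfy 1 <= k < number of points")
    densities = []
    for i, p in enumerate(points):
        # Distances from p to every other point, nearest first.
        distances = sorted(dist(p, q) for j, q in enumerate(points) if j != i)
        mean_knn = sum(distances[:k]) / k
        # Small constant guards against division by zero for duplicate points.
        densities.append(1.0 / (mean_knn + 1e-12))
    return densities


# Three points close together and one far away.
densities = knn_local_density([(0, 0), (0, 1), (1, 0), (10, 10)], k=2)
```

Points inside the tight group receive a much higher density than the isolated point, which is the property a density-peak method relies on when choosing cluster centers.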

Directory of Open Access Journals

ETCNLog: A System Log Anomaly Detection Method Based on Efficient Channel Attention and Temporal Convolutional Network

Author: Jingru Liu
Nurbol Luktarhan
Qinglin Chen
Yuyuan Chang
Publication venue: MDPI AG
Publication date: 01/04/2023

The scale of systems and network applications is expanding, which places higher requirements on anomaly detection. The system log records system states and significant operational events at different critical points. Therefore, using the system log for anomaly detection can help with system maintenance and avoid unnecessary loss. The system log has obvious timing characteristics, and the execution sequence of the system log has a certain dependency relationship. However, these sequence dependencies are sometimes long. To handle longer log sequences in anomaly detection, this paper proposes a system log anomaly detection method based on efficient channel attention and a temporal convolutional network (ETCNLog). It builds a model by treating the system log as a natural-language sequence. To handle longer log sequences more effectively, ETCNLog uses the semantic and timing information of logs. It can automatically learn the importance of different log sequences and detect hidden dependencies within sequences to improve the accuracy of anomaly detection. We run extensive experiments on the public real-world log dataset BGL. The experimental results show that the precision and F1-score of ETCNLog reach 98.15% and 98.21%, respectively, both better than those of current anomaly detection methods.

Directory of Open Access Journals

ICLSTM: Encrypted Traffic Service Identification Based on Inception-LSTM Neural Network

Author: Bei Lu
Chao Ding
Nurbol Luktarhan
Wenhui Zhang
Publication venue: MDPI AG
Publication date: 17/06/2021

The wide application of encryption technology has made traffic classification a major challenge in the field of network security. Traditional methods such as machine learning, which rely heavily on feature engineering, can no longer fully meet the needs of encrypted traffic classification. Therefore, we propose an Inception-LSTM (ICLSTM) traffic classification method in this paper to achieve encrypted traffic service identification. This method converts traffic data into common gray images, and then uses the constructed ICLSTM neural network to extract key features and perform effective traffic classification. To alleviate the problem of category imbalance, a separate weight parameter is set for each category in the training phase, which treats the different categories of encrypted traffic more symmetrically and makes the identification results more balanced and reasonable. The method is validated on the public ISCX 2016 dataset, and the results of five classification experiments show that its accuracy exceeds 98% for both regular encrypted traffic service identification and VPN encrypted traffic service identification. At the same time, this deep learning-based classification method also greatly simplifies traffic feature extraction.
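
The per-category weighting mentioned in this abstract can be sketched as follows. The abstract does not state how the weights are chosen, so the inverse-frequency rule used here (total samples divided by the number of classes times the class count) is an assumption, and the function name and labels are hypothetical. Weights of this kind are typically passed to a weighted loss during training.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Assumed rule: weight = total / (n_classes * class_count), so rare classes weigh more."""
    counts = Counter(labels)
    total = len(labels)
    n_classes = len(counts)
    return {label: total / (n_classes * count) for label, count in counts.items()}


# Hypothetical imbalanced label set: three "chat" flows and one "voip" flow.
weights = inverse_frequency_weights(["chat", "chat", "chat", "voip"])
```

With this rule the rare "voip" class receives a weight three times that of "chat", so errors on the minority class contribute more to the training loss.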

Multidisciplinary Digital Publishing Institute

Malcertificate: Research and Implementation of a Malicious Certificate Detection Algorithm Based on GCN

Author: Jingru Liu
Nurbol Luktarhan
Wenjie Yu
Yuyuan Chang
Publication venue: MDPI AG
Publication date: 27/04/2022

Multidisciplinary Digital Publishing Institute