Network Intrusion Detection with Two-Phased Hybrid Ensemble Learning and Automatic Feature Selection
The use of network-connected devices has grown exponentially in recent years, revolutionizing our daily lives. However, it has also attracted the attention of cybercriminals, so that attacks targeting these devices have increased not only in number but also in sophistication. To detect such attacks, a Network Intrusion Detection System (NIDS) has become a vital component in network applications. However, network devices produce large-scale, high-dimensional data, which makes it difficult to accurately detect various known and unknown attacks. Moreover, the complex nature of network data makes the feature selection process of a NIDS a challenging task. In this study, we propose a machine-learning-based NIDS with Two-phased Hybrid Ensemble learning and Automatic Feature Selection. The proposed framework leverages four different machine learning classifiers to perform automatic feature selection based on their ability to detect the most significant features. The two-phased hybrid ensemble learning algorithm consists of two learning phases, with the first phase constructed using classifiers built from an adaptation of the One-vs-One framework, and the second phase constructed using classifiers built from combinations of attack classes. The proposed framework was evaluated on two well-referenced datasets for both wired and wireless applications, and the results demonstrate that the two-phased ensemble learning framework combined with the automatic feature selection engine has superior attack detection capability compared to other similar studies found in the literature
MvFS: Multi-view Feature Selection for Recommender System
Feature selection, which is a technique to select key features in recommender
systems, has received increasing research attention. Recently, Adaptive Feature
Selection (AdaFS) has shown remarkable performance by adaptively selecting
features for each data instance, considering that the importance of a given
feature field can vary significantly across data. However, this method still
has limitations in that its selection process could be easily biased to major
features that frequently occur. To address these problems, we propose
Multi-view Feature Selection (MvFS), which selects informative features for
each instance more effectively. Most importantly, MvFS employs a multi-view
network consisting of multiple sub-networks, each of which learns to measure
the feature importance of a part of data with different feature patterns. By
doing so, MvFS mitigates the bias problem towards dominant patterns and
promotes a more balanced feature selection process. Moreover, MvFS adopts an
effective importance score modeling strategy which is applied independently to
each field without incurring dependency among features. Experimental results on
real-world datasets demonstrate the effectiveness of MvFS compared to
state-of-the-art baselines. Comment: CIKM 202
A Comparative Performance Analysis of Explainable Machine Learning Models With And Without RFECV Feature Selection Technique Towards Ransomware Classification
Ransomware has recently emerged as one of the major global threats. The
alarming rate of increase in ransomware attacks and new ransomware variants
drives researchers in this domain to constantly examine the
distinguishing traits of ransomware and refine their detection or
classification strategies. Among the broad range of behavioral
characteristics, Application Programming Interface (API) calls and
network behaviors have been widely used as differentiating factors for
ransomware detection or classification. Many prior approaches
have shown promising results in detecting and classifying ransomware families
using these features without applying any feature selection technique.
Feature selection, however, is a potentially important step toward an efficient
detection or classification Machine Learning model because it reduces the
probability of overfitting by removing redundant data, improves the model's
accuracy by eliminating irrelevant features, and reduces training
time. A good number of feature selection techniques are currently
used in different security scenarios to optimize the performance
of Machine Learning models. Hence, the aim of this study is to present the
comparative performance analysis of widely utilized Supervised Machine Learning
models with and without RFECV feature selection technique towards ransomware
classification utilizing the API call and network traffic features. Thereby,
this study provides insight into the efficiency of the RFECV feature selection
technique in the case of ransomware classification which can be used by peers
as a reference for future work in choosing the feature selection technique in
this domain. Comment: arXiv admin note: text overlap with arXiv:2210.1123
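The with/without comparison described in the abstract above can be sketched roughly as follows. This is a minimal illustration only, not the study's pipeline: it assumes scikit-learn is available, uses synthetic data from `make_classification` in place of the API-call and network-traffic features, and picks logistic regression as an arbitrary estimator. The names `make_model`, `baseline`, and `reduced` are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for the ransomware features (assumption: the study's
# real dataset is not reproduced here).
X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           n_redundant=5, random_state=0)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)


def make_model():
    # Hypothetical estimator choice; any model exposing coef_ or
    # feature_importances_ can drive recursive feature elimination.
    return LogisticRegression(max_iter=1000)


# Accuracy without feature selection.
baseline = cross_val_score(make_model(), X, y, cv=cv, scoring="accuracy").mean()

# Accuracy with RFECV. Placing the selector inside the pipeline means the
# features are re-selected on each training fold, so the held-out fold
# never influences which features are kept.
with_rfecv = make_pipeline(
    RFECV(make_model(), step=1, cv=3, scoring="accuracy"),
    make_model(),
)
reduced = cross_val_score(with_rfecv, X, y, cv=cv, scoring="accuracy").mean()

# Fit once on all data just to inspect which features would be retained.
selector = RFECV(make_model(), step=1, cv=3, scoring="accuracy").fit(X, y)

print(f"accuracy without RFECV: {baseline:.3f}")
print(f"accuracy with RFECV:    {reduced:.3f}")
print(f"features kept: {selector.n_features_} of {X.shape[1]}")
```

Whether RFECV helps depends on the data and the estimator, so the two accuracy figures printed here say nothing about the results reported in the study.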
Sentiment Analysis on Social Media Via Machine Learning
Social media are shaping users' attitudes and behaviors through spreading information anytime and anywhere. Monitoring user opinions on social media is an effective solution to measure users' preferences towards brands or events. Currently, supervised machine learning-based methods dominate this area. However, as far as we know, there is no comprehensive comparison of the performance of different models to figure out which model will be better for individual datasets. The focus of this thesis is to compare the performance of different supervised machine learning models. In detail, we built six classifiers, including support vector machine, random forest, neural network, AdaBoost, decision tree, and Naive Bayes, on two datasets and compared their performance. Furthermore, we introduced feature selection to remove unrelated attributes when preprocessing the data and compared performance by building classifiers on the preprocessed data. Experimental results show that without feature selection, there is no significant difference in the performance. After feature selection, random forest outperformed other classifiers
Improving gender classification with feature selection in forensic anthropology
Gender classification is one of the most vital tasks in real-world problems, especially in death investigations. Developing a biological profile of an individual, including the identification of gender, is a crucial step in the forensic anthropology process. Forensic anthropologists use skeletal remains to produce a biological profile. Different parts of the skeleton contain different features that contribute to gender classification. However, not all features contribute to gender classification, and some lead to low classification accuracy. Therefore, a feature selection method is applied to identify the most significant features for gender classification. This paper presents the implementation of two feature selection approaches, Particle Swarm Optimization (PSO) and the Harmony Search (HS) algorithm, using three different datasets from the Goldman Osteometric Dataset, the Osteological Collection and the George Murray Black Collection. All three datasets contain 4081 samples of metric measurements and have been classified using a Back Propagation Neural Network (BPNN) and a Naïve Bayes classifier. The main scope of this paper is to identify the effect of feature selection on gender classification. The results show that the accuracy of gender classification for every dataset increased when feature selection was applied to the dataset. Among all the skeleton parts in this experiment, the clavicle achieved the highest increase in accuracy, from 89.76% to 96.06% for the PSO algorithm and 96.32% for HS
Context-aware multi-attribute decision making for radio access technology selection in ultra dense network
Ultra Dense Network (UDN) is the extreme densification of heterogeneous Radio Access Technology (RAT) that is deployed closely in a coordinated or uncoordinated manner. The densification of RAT forms an overlapping zone of signal coverage, leading to frequent service handovers among the RAT and thus degrading overall system performance. The current RAT selection approach is biased towards network-centric criteria pertaining to signal strength. However, the paradigm shift from a network-centric to a user-centric approach necessitates a multi-criteria selection process, with methodology relating to both network and user preferences in the context of future generation networks. Hence, an effective selection approach is required to avoid unnecessary handovers in RAT. The main aim of this study is to propose the Context-aware Multi-attribute decision making for RAT (CMRAT) selection for investigating the need to choose a new RAT and further determine the best amongst the available methods. The CMRAT consists of two mechanisms, namely the Context-aware Analytical Hierarchy Process (CAHP) and Context-aware Technique for Order Preference by Similarity to an Ideal Solution (CTOPSIS). The CAHP mechanism measures the need to switch from the current RAT, while CTOPSIS aids in decision making to choose the best target RAT. A series of experimental studies were conducted to validate the effectiveness of CMRAT for achieving improved system performance. The investigation utilises shopping mall and urban dense network scenarios to evaluate the performance of RAT selection through simulation. The findings demonstrated that the CMRAT approach reduces delay and the number of handovers, leading to an improvement in throughput and packet delivery ratio when compared to the commonly used A2A4-RSRQ approach. The CMRAT approach is effective in RAT selection within a UDN environment, thus supporting heterogeneous RAT deployment in future 5G networks. With context-aware selection, the user-centric feature is also emphasized
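For readers unfamiliar with TOPSIS, the ranking step that CTOPSIS builds on can be sketched as below. This is the standard, context-free TOPSIS and not the paper's context-aware variant. The candidate RATs, the three criteria (signal quality, delay, load), their values, and the weights are all hypothetical and chosen only for illustration.

```python
import math


def topsis(matrix, weights, benefit):
    """Score alternatives with standard TOPSIS.

    matrix  -- one row per alternative, one column per criterion
    weights -- criterion weights (assumed here to sum to 1)
    benefit -- True where larger is better, False where smaller is better
    Returns one closeness score in [0, 1] per alternative; higher is better.
    """
    n_crit = len(weights)
    # Vector-normalise each column, then apply the criterion weight.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) or 1.0
             for j in range(n_crit)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n_crit)]
         for row in matrix]
    # Ideal and anti-ideal points, respecting benefit/cost direction.
    cols = list(zip(*v))
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(cols)]
    anti = [min(c) if benefit[j] else max(c) for j, c in enumerate(cols)]
    scores = []
    for row in v:
        d_pos = math.sqrt(sum((a - b) ** 2 for a, b in zip(row, ideal)))
        d_neg = math.sqrt(sum((a - b) ** 2 for a, b in zip(row, anti)))
        total = d_pos + d_neg
        # If all alternatives are identical, both distances are zero.
        scores.append(d_neg / total if total > 0 else 0.5)
    return scores


# Hypothetical candidates: [signal quality, delay in ms, load fraction].
candidates = {
    "LTE": [0.8, 30.0, 0.7],
    "WiFi": [0.6, 20.0, 0.4],
    "5G": [0.9, 10.0, 0.5],
}
weights = [0.4, 0.4, 0.2]        # hypothetical preference weights
benefit = [True, False, False]   # signal is a benefit; delay and load are costs

scores = topsis(list(candidates.values()), weights, benefit)
ranking = sorted(zip(candidates, scores), key=lambda kv: kv[1], reverse=True)
best = ranking[0][0]
print(ranking)
```

With these made-up inputs the hypothetical "5G" candidate ranks first, because it is closest to the ideal point on the two most heavily weighted criteria. In a multi-criteria scheme of this kind the weights would normally come from a separate step such as AHP rather than being fixed by hand.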
The evolution of complex gene regulation by low specificity binding sites
Transcription factor binding sites vary in their specificity, both within and
between species. Binding specificity has a strong impact on the evolution of
gene expression, because it determines how easily regulatory interactions are
gained and lost. Nevertheless, we have a relatively poor understanding of what
evolutionary forces determine the specificity of binding sites. Here we address
this question by studying regulatory modules composed of multiple binding
sites. Using a population-genetic model, we show that more complex regulatory
modules, composed of a greater number of binding sites, must employ binding
sites that are individually less specific, compared to less complex regulatory
modules. This effect is extremely general, and it holds regardless of the
regulatory logic of a module. We attribute this phenomenon to the inability of
stabilising selection to maintain highly specific sites in large regulatory
modules. Our analysis helps to explain broad empirical trends in the yeast
regulatory network: those genes with a greater number of transcriptional
regulators feature less specific binding sites, and there is less variance
in their specificity, compared to genes with fewer regulators. Likewise, our
results also help to explain the well-known trend towards lower specificity in
the transcription factor binding sites of higher eukaryotes, which perform
complex regulatory tasks, compared to prokaryotes