
    Network Intrusion Detection with Two-Phased Hybrid Ensemble Learning and Automatic Feature Selection

    The use of network-connected devices has grown exponentially in recent years, revolutionizing our daily lives. However, it has also attracted the attention of cybercriminals, so attacks targeting these devices have increased not only in number but also in sophistication. To detect such attacks, a Network Intrusion Detection System (NIDS) has become a vital component in network applications. However, network devices produce large-scale, high-dimensional data, which makes it difficult to accurately detect various known and unknown attacks. Moreover, the complex nature of network data makes the feature selection process of a NIDS a challenging task. In this study, we propose a machine-learning-based NIDS with Two-phased Hybrid Ensemble learning and Automatic Feature Selection. The proposed framework leverages four different machine learning classifiers to perform automatic feature selection based on their ability to detect the most significant features. The two-phased hybrid ensemble learning algorithm consists of two learning phases, with the first phase constructed using classifiers built from an adaptation of the One-vs-One framework, and the second phase constructed using classifiers built from combinations of attack classes. The proposed framework was evaluated on two well-referenced datasets for both wired and wireless applications, and the results demonstrate that the two-phased ensemble learning framework combined with the automatic feature selection engine has superior attack detection capability compared to other similar studies found in the literature.
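
The abstract above mentions an adaptation of the One-vs-One framework without giving details. As a rough illustration only, the following sketch shows the plain, textbook One-vs-One scheme: one binary model per pair of classes, combined by majority vote. The nearest-centroid pairwise model, the function names, and the toy class labels are all assumptions made for this example and are not taken from the paper.

```python
from collections import Counter
from itertools import combinations


def centroid(rows):
    """Mean vector of a list of equal-length feature rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit_one_vs_one(X, y):
    """Build one binary model per unordered pair of classes.

    Each pairwise model here is just the two class centroids
    (a stand-in for any real binary classifier).
    """
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    centroids = {label: centroid(rows) for label, rows in by_class.items()}
    return [
        ((a, centroids[a]), (b, centroids[b]))
        for a, b in combinations(sorted(centroids), 2)
    ]


def predict_one_vs_one(models, x):
    """Let every pairwise model vote, then return the majority class."""
    votes = Counter()
    for (a, ca), (b, cb) in models:
        winner = a if sq_dist(x, ca) <= sq_dist(x, cb) else b
        votes[winner] += 1
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two features, three traffic classes.
X = [[0, 0], [0, 1], [5, 5], [5, 6], [10, 0], [10, 1]]
y = ["normal", "normal", "dos", "dos", "probe", "probe"]
models = fit_one_vs_one(X, y)
```

With three classes this builds three pairwise models; in general, k classes yield k(k-1)/2 of them, which is the main cost of the One-vs-One approach.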

    MvFS: Multi-view Feature Selection for Recommender System

    Feature selection, which is a technique to select key features in recommender systems, has received increasing research attention. Recently, Adaptive Feature Selection (AdaFS) has shown remarkable performance by adaptively selecting features for each data instance, considering that the importance of a given feature field can vary significantly across data. However, this method is still limited in that its selection process can easily be biased towards major features that occur frequently. To address this problem, we propose Multi-view Feature Selection (MvFS), which selects informative features for each instance more effectively. Most importantly, MvFS employs a multi-view network consisting of multiple sub-networks, each of which learns to measure the feature importance of a part of the data with different feature patterns. By doing so, MvFS mitigates the bias towards dominant patterns and promotes a more balanced feature selection process. Moreover, MvFS adopts an effective importance score modeling strategy which is applied independently to each field without incurring dependency among features. Experimental results on real-world datasets demonstrate the effectiveness of MvFS compared to state-of-the-art baselines. Comment: CIKM 202

    A Comparative Performance Analysis of Explainable Machine Learning Models With And Without RFECV Feature Selection Technique Towards Ransomware Classification

    Ransomware has recently emerged as one of the major global threats. The alarmingly increasing rate of ransomware attacks and new ransomware variants drives researchers in this domain to constantly examine the distinguishing traits of ransomware and refine their detection or classification strategies. Among the broad range of behavioral characteristics, Application Programming Interface (API) calls and network behaviors have been widely utilized as differentiating factors for ransomware detection or classification. Many prior approaches have shown promising results in detecting and classifying ransomware families using these features without applying any feature selection technique. Feature selection, however, is one of the potential steps toward an efficient detection or classification Machine Learning model because it reduces the probability of overfitting by removing redundant data, improves the model's accuracy by eliminating irrelevant features, and reduces training time. A good number of feature selection techniques are currently used in different security scenarios to optimize the performance of Machine Learning models. Hence, the aim of this study is to present a comparative performance analysis of widely utilized Supervised Machine Learning models with and without the RFECV feature selection technique for ransomware classification using API call and network traffic features. This study thereby provides insight into the efficiency of the RFECV feature selection technique for ransomware classification, which peers can use as a reference for future work when choosing a feature selection technique in this domain. Comment: arXiv admin note: text overlap with arXiv:2210.1123

    Sentiment Analysis on Social Media Via Machine Learning

    Social media are shaping users' attitudes and behaviors by spreading information anytime and anywhere. Monitoring user opinions on social media is an effective way to measure users' preferences towards brands or events. Currently, supervised machine learning-based methods dominate this area. However, as far as we know, there is no comprehensive comparison of the performance of different models to determine which model is better for individual datasets. The focus of this thesis is to compare the performance of different supervised machine learning models. In detail, we built six classifiers, including support vector machine, random forest, neural network, AdaBoost, decision tree, and Naive Bayes, on two datasets and compared their performance. Furthermore, we introduced feature selection to preprocess the data by removing unrelated attributes, and compared performance by building classifiers on the preprocessed data. Experimental results show that without feature selection, there is no significant difference in performance. After feature selection, random forest outperformed the other classifiers.

    Improving gender classification with feature selection in forensic anthropology

    Gender classification is one of the most vital tasks in real-world problems, especially in death investigations. Developing a biological profile of an individual, including the identification of gender, is a crucial step in the forensic anthropology process. Forensic anthropologists rely on skeletal remains to produce a biological profile. Different parts of the skeleton contain different features that contribute to gender classification. However, not all features contribute to gender classification, and some lead to low classification accuracy. Therefore, a feature selection method is applied to identify the most significant features for gender classification. This paper presents the implementation of two feature selection approaches, Particle Swarm Optimization (PSO) and the Harmony Search (HS) algorithm, using three different datasets: the Goldman Osteometric Dataset, the Osteological Collection, and the George Murray Black Collection. All three datasets contain 4081 samples of metric measurements and were classified using a Back Propagation Neural Network (BPNN) and a Naïve Bayes classifier. The main scope of this paper is to identify the effect of feature selection on gender classification. The results show that the accuracy of gender classification for every dataset increased when feature selection was applied. Among all the skeleton parts in this experiment, the clavicle achieved the largest increase in accuracy, from 89.76% to 96.06% for the PSO algorithm and 96.32% for HS.

    Context-aware multi-attribute decision making for radio access technology selection in ultra dense network

    Ultra Dense Network (UDN) is the extreme densification of heterogeneous Radio Access Technology (RAT) that is deployed closely in a coordinated or uncoordinated manner. The densification of RAT forms an overlapping zone of signal coverage, leading to frequent service handovers among the RATs and thus degrading overall system performance. The current RAT selection approach is biased towards network-centric criteria pertaining to signal strength. However, the paradigm shift from a network-centric to a user-centric approach necessitates a multi-criteria selection process, with methodology relating to both network and user preferences in the context of future generation networks. Hence, an effective selection approach is required to avoid unnecessary handovers in RAT. The main aim of this study is to propose Context-aware Multi-attribute decision making for RAT (CMRAT) selection, which investigates the need to choose a new RAT and then determines the best amongst the available ones. The CMRAT consists of two mechanisms, namely the Context-aware Analytical Hierarchy Process (CAHP) and the Context-aware Technique for Order Preference by Similarity to an Ideal Solution (CTOPSIS). The CAHP mechanism measures the need to switch from the current RAT, while CTOPSIS aids in decision making to choose the best target RAT. A series of experimental studies were conducted to validate the effectiveness of CMRAT for achieving improved system performance. The investigation utilises shopping mall and urban dense network scenarios to evaluate the performance of RAT selection through simulation. The findings demonstrate that the CMRAT approach reduces delay and the number of handovers, leading to an improvement in throughput and packet delivery ratio compared to the commonly used A2A4-RSRQ approach. The CMRAT approach is effective for RAT selection within the UDN environment, thus supporting heterogeneous RAT deployment in future 5G networks. With context-aware selection, the user-centric feature is also emphasized.
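
The CTOPSIS mechanism named above builds on the Technique for Order Preference by Similarity to an Ideal Solution. The abstract does not describe the context-aware extension, so the sketch below shows only the standard TOPSIS ranking step. The function name, the candidate values, the criteria (throughput and delay), and the equal weights are all hypothetical choices for illustration.

```python
import math


def topsis(matrix, weights, benefit):
    """Standard TOPSIS closeness scores (higher means better).

    matrix  : one row per alternative, one column per criterion
    weights : one weight per criterion
    benefit : True if larger is better for that criterion, False if smaller is
    Assumes no criterion column is all zeros and the alternatives are not
    all identical (either case would cause a division by zero).
    """
    n_crit = len(weights)
    # Vector-normalise each column, then apply the criterion weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n_crit)] for row in matrix]
    cols = list(zip(*v))
    # Ideal and anti-ideal points, respecting benefit vs. cost criteria.
    best = [max(c) if benefit[j] else min(c) for j, c in enumerate(cols)]
    worst = [min(c) if benefit[j] else max(c) for j, c in enumerate(cols)]
    scores = []
    for row in v:
        d_best = math.dist(row, best)
        d_worst = math.dist(row, worst)
        scores.append(d_worst / (d_best + d_worst))
    return scores


# Hypothetical candidates: [throughput (higher is better), delay (lower is better)]
candidates = [[50, 20], [30, 10], [10, 40]]
scores = topsis(candidates, weights=[0.5, 0.5], benefit=[True, False])
best_index = max(range(len(scores)), key=scores.__getitem__)
```

In this toy case the third candidate is worst on both criteria, so it coincides with the anti-ideal point and receives a closeness score of zero.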

    The evolution of complex gene regulation by low specificity binding sites

    Transcription factor binding sites vary in their specificity, both within and between species. Binding specificity has a strong impact on the evolution of gene expression, because it determines how easily regulatory interactions are gained and lost. Nevertheless, we have a relatively poor understanding of what evolutionary forces determine the specificity of binding sites. Here we address this question by studying regulatory modules composed of multiple binding sites. Using a population-genetic model, we show that more complex regulatory modules, composed of a greater number of binding sites, must employ binding sites that are individually less specific, compared to less complex regulatory modules. This effect is extremely general, and it holds regardless of the regulatory logic of a module. We attribute this phenomenon to the inability of stabilising selection to maintain highly specific sites in large regulatory modules. Our analysis helps to explain broad empirical trends in the yeast regulatory network: those genes with a greater number of transcriptional regulators feature less specific binding sites, and there is less variance in their specificity, compared to genes with fewer regulators. Likewise, our results also help to explain the well-known trend towards lower specificity in the transcription factor binding sites of higher eukaryotes, which perform complex regulatory tasks, compared to prokaryotes.