4 research outputs found

    Analysis of Feature Categories for Malware Visualization

    It is important to know which features are more effective for certain visualization types. Furthermore, selecting an appropriate visualization tool plays a key role in descriptive, diagnostic, predictive and prescriptive analytics. Moreover, analyzing the activities of malicious scripts or code depends on the extracted features. In this paper, the authors review and classify the most commonly extracted features that have been used for malware visualization, based on specified categories. This study examines the feature categories and their usefulness for effective malware visualization. Additionally, it focuses on the commonly extracted features that have been used in the malware visualization domain. The findings of the literature review reveal that the features can be categorized into four main categories, namely static, dynamic, hybrid, and application metadata. The contribution of this paper concerns feature selection: it illustrates which features are effective with which visualization tools for malware visualization.

    Systematic literature review for malware visualization techniques

    Analyzing the activities or behaviors of malicious scripts depends heavily on the extracted features. It is also important to know which features are more effective for certain visualization types. Similarly, selecting an appropriate visualization technique plays a key role in descriptive, diagnostic, predictive and prescriptive analytics. Thus, the visualization technique should provide understandable information about the activities of the malicious code. This paper follows a systematic literature review method to review the extracted features that are used to identify malware, the different types of visualization techniques, and guidelines for selecting the right visualization technique. An advanced search was performed in the most relevant digital libraries to obtain potentially relevant articles. The results identify significant resources and types of features that are important for analyzing malware activities, the common visualization techniques currently in use, and methods for choosing the right visualization technique to analyze security events effectively.

    Feature Space Modeling for Accurate and Efficient Learning From Non-Stationary Data

    A non-stationary dataset is one whose statistical properties, such as the mean, variance, correlation, and probability distribution, change over a specific interval of time. In contrast, a stationary dataset is one whose statistical properties remain constant over time. Apart from volatile statistical properties, non-stationary data poses other challenges, such as time and memory management under limited computational resources, a limitation caused mostly by recent advances in data collection technologies that generate a variety of data at an alarming pace and volume. Additionally, when the collected data is complex, managing the complexity that emerges from its dimensionality and heterogeneity can pose another challenge for effective computational learning. The problem is to enable accurate and efficient learning from non-stationary data in a continuous fashion over time while simultaneously managing the critical challenges of time, memory, concept change, and complexity. Feature space modeling is one of the most effective solutions to this problem. For non-stationary data, selecting relevant features is even more critical than for stationary data, because reducing the feature dimension helps make the best use of computational resources, allowing data mining algorithms to achieve higher accuracy and efficiency. In this dissertation, we investigated a variety of feature space modeling techniques to improve the overall performance of data mining algorithms. In particular, we built a Relief-based feature sub-selection method in combination with data complexity analysis to improve classification performance on ovarian cancer image data collected in a non-stationary batch mode. We also collected time series health sensor data in a streaming environment and deployed feature space transformation using Singular Value Decomposition (SVD). This reduced the dimensionality of the feature space, resulting in better accuracy and efficiency from the Density Ratio Estimation method in identifying potential change points in the data over time. We also built an unsupervised feature space model using matrix factorization and Lasso regression, which was successfully deployed in conjunction with Relative Density Ratio Estimation to address botnet attacks in a non-stationary environment. The Relief-based feature model improved the accuracy of the Fuzzy Forest classifier by 16%. For the change detection framework, we observed a 9% improvement in accuracy with the PCA feature transformation. With the unsupervised feature selection model, at 2% and 5% malicious traffic ratios, the proposed botnet detection framework exhibited on average 20% better accuracy than One-Class Support Vector Machine (OSVM) and on average 25% better accuracy than an Autoencoder. All these results demonstrate the effectiveness of these feature space models. The fundamental theme that recurs in this dissertation is modeling an efficient feature space to improve both the accuracy and the efficiency of selected data mining models. Every contribution in this dissertation has been subsequently and successfully employed to capitalize on those advantages to solve real-world problems. Our work bridges concepts from multiple disciplines in effective and surprising ways, leading to new insights, new frameworks, and ultimately to a cross-production of diverse fields like mathematics, statistics, and data mining.