15 research outputs found

    Random Access in Nondelimited Variable-length Record Collections for Parallel Reading with Hadoop

    Get PDF
    The industry standard Packet CAPture (PCAP) format for storing network packet traces is normally only readable in serial due to its lack of delimiters, indexing, or blocking. This presents a challenge for parallel analysis of large networks, where packet traces can be many gigabytes in size. In this work we present RAPCAP, a novel method for random access into variable-length record collections like PCAP by identifying a record boundary within a small number of bytes of the access point. Unlike related heuristic methods that can limit scalability with a nonzero probability of error, the new method offers a correctness guarantee with a well formed ïŹle and does not rely on prior knowledge of the contents. We include a practical implementation of the algorithm with an extension to the Hadoop framework, and a performance comparison to serial ingestion. Finally, we present a number of similar storage types that could utilize a modiïŹed version of RAPCAP for random access

    Design of Automation Environment for Analyzing Various IoT Malware

    Get PDF
    With the increasing proliferation of IoT systems, the security of IoT systems has become very important to individuals and businesses. IoT malware has been increasing exponentially since the emergence of Mirai in 2016. Because the IoT system environment is diverse, IoT malware also has various environments. In the case of existing analysis systems, there is no environment for dynamic analysis by running IoT malware of various architectures. It is inefficient in terms of time and cost to build an environment that analyzes malware one by one for analysis. The purpose of this paper is to improve the problems and limitations of the existing analysis system and provide an environment to analyze a large amount of IoT malware. Using existing open source analysis tools suitable for various IoT malicious codes and QEMU, a virtualization software, the environment in which the actual malicious code will run is built, and the library or system call that is actually called is statically and dynamically analyzed. In the text, the analysis system is applied to the actual collected malicious code to check whether it is analyzed and derive statistics. Information on the architecture of malicious code, attack method, command used, and access path can be checked, and this information can be used as a basis for malicious code detection research or classification research. The advantages are described of the system designed compared to the most commonly used automated analysis tools and improvements to existing limitations

    Transfer Learning for Detecting Unknown Network Attacks

    Get PDF
    Network attacks are serious concerns in today’s increasingly interconnected society. Recent studies have applied conventional machine learning to network attack detection by learning the patterns of the network behaviors and training a classification model. These models usually require large labeled datasets; however, the rapid pace and unpredictability of cyber attacks make this labeling impossible in real time. To address these problems, we proposed utilizing transfer learning for detecting new and unseen attacks by transferring the knowledge of the known attacks. In our previous work, we have proposed a transfer learning-enabled framework and approach, called HeTL, which can find the common latent subspace of two different attacks and learn an optimized representation, which was invariant to attack behaviors’ changes. However, HeTL relied on manual pre-settings of hyper-parameters such as relativeness between the source and target attacks. In this paper, we extended this study by proposing a clustering-enhanced transfer learning approach, called CeHTL, which can automatically find the relation between the new attack and known attack. We evaluated these approaches by stimulating scenarios where the testing dataset contains different attack types or subtypes from the training set. We chose several conventional classification models such as decision trees, random forests, KNN, and other novel transfer learning approaches as strong baselines. Results showed that proposed HeTL and CeHTL improved the performance remarkably. CeHTL performed best, demonstrating the effectiveness of transfer learning in detecting new network attacks

    Detecting scanning computer worms using machine learning and darkspace network traffic

    Get PDF
    The conference aimed at supporting and stimulating active productive research set to strengthen the technical foundations of engineers and scientists in the continent, through developing strong technical foundations and skills, leading to new small to medium enterprises within the African sub-continent. It also seeked to encourage the emergence of functionally skilled technocrats within the continent.The subject of this paper is computer worm detection in a network. Computers worms have been defined as a process that can cause a possibly evolved copy of it to execute on a remote computer. They do not require human intervention to propagate; neither do they need to attach themselves to existing files. Computer worms spread very rapidly and modern worm authors obfuscate their code to make it difficult to detect them. This paper proposes to use machine learning to detect them. The paper deviates from existing approaches in that it uses the darkspace network traffic attributed to an actual worm attack to validate the algorithms. In addition, it attempts to understand the threat model, the feature set and the detection algorithms to explain the best combination of features and why the best algorithms succeeds where others have failed.Strathmore University; Institute of Electrical and Electronics Engineers (IEEE

    A Review of Predictive Analytic Applications of Bayesian Network

    Get PDF
    Malware can be defined as malicious software that infiltrates a network and computer host in a variety of ways, from software flaws to social engineering. Due to the polymorphic and stealth nature of malware attacks, a signature-based analysis that is done statically is no longer sufficient to solve such a problem. Therefore, a behavioral or anomalous analysis will provide a more dynamic approach for the solution. However recent studies have shown that current behavioral methods at the network-level have several issues such as the inability to predict zero-day attacks, high-level assumptions, non-inferential analysis and performance issues. Other than performance issues, this study has identified common scientific characteristics which are reduced parameter, Ξ and lack of priori information p(Ξ) that causes the problems. Previous methods were proposed to address the problem however were still unable to resolve the stated scientific hitches. Due to the shortcomings, the Bayesian Network in terms of its probabilistic modelling would be the best method to deal with the stated scientific glitches which also have been proven in the area of Clinical Expert Systems, Artificial Intelligence and Pattern Recognition. This study will critically review the predictive analytic applications of Bayesian Network model in different research domain such as Clinical Expert Systems, Artificial Intelligence and Pattern Recognition and discover any potential approach available in the domain of Computer Networks. Based on the review, this paper has identified several Bayesian Network properties which have been used to overcome the abovementioned problems. Those properties will be applied in future studies to model the Behavioral Malware Predictive Analytics

    Mimicking anti-viruses with machine learning and entropy profiles

    Get PDF
    The quality of anti-virus software relies on simple patterns extracted from binary files. Although these patterns have proven to work on detecting the specifics of software, they are extremely sensitive to concealment strategies, such as polymorphism or metamorphism. These limitations also make anti-virus software predictable, creating a security breach. Any black hat with enough information about the anti-virus behaviour can make its own copy of the software, without any access to the original implementation or database. In this work, we show how this is indeed possible by combining entropy patterns with classification algorithms. Our results, applied to 57 different anti-virus engines, show that we can mimic their behaviour with an accuracy close to 98% in the best case and 75% in the worst, applied on Windows’ disk resident malware

    Mimicking anti-viruses with machine learning and entropy profiles

    Get PDF
    The quality of anti-virus software relies on simple patterns extracted from binary files. Although these patterns have proven to work on detecting the specifics of software, they are extremely sensitive to concealment strategies, such as polymorphism or metamorphism. These limitations also make anti-virus software predictable, creating a security breach. Any black hat with enough information about the anti-virus behaviour can make its own copy of the software, without any access to the original implementation or database. In this work, we show how this is indeed possible by combining entropy patterns with classification algorithms. Our results, applied to 57 different anti-virus engines, show that we can mimic their behaviour with an accuracy close to 98% in the best case and 75% in the worst, applied on Windows’ disk resident malware

    GRASE: Granulometry Analysis with Semi Eager Classifier to Detect Malware

    Get PDF
    Technological advancement in communication leading to 5G, motivates everyone to get connected to the internet including ‘Devices’, a technology named Web of Things (WoT). The community benefits from this large-scale network which allows monitoring and controlling of physical devices. But many times, it costs the security as MALicious softWARE (MalWare) developers try to invade the network, as for them, these devices are like a ‘backdoor’ providing them easy ‘entry’. To stop invaders from entering the network, identifying malware and its variants is of great significance for cyberspace. Traditional methods of malware detection like static and dynamic ones, detect the malware but lack against new techniques used by malware developers like obfuscation, polymorphism and encryption. A machine learning approach to detect malware, where the classifier is trained with handcrafted features, is not potent against these techniques and asks for efforts to put in for the feature engineering. The paper proposes a malware classification using a visualization methodology wherein the disassembled malware code is transformed into grey images. It presents the efficacy of Granulometry texture analysis technique for improving malware classification. Furthermore, a Semi Eager (SemiE) classifier, which is a combination of eager learning and lazy learning technique, is used to get robust classification of malware families. The outcome of the experiment is promising since the proposed technique requires less training time to learn the semantics of higher-level malicious behaviours. Identifying the malware (testing phase) is also done faster. A benchmark database like malimg and Microsoft Malware Classification challenge (BIG-2015) has been utilized to analyse the performance of the system. An overall average classification accuracy of 99.03 and 99.11% is achieved, respectively

    MDFRCNN: Malware Detection using Faster Region Proposals Convolution Neural Network

    Get PDF
    Technological advancement of smart devices has opened up a new trend: Internet of Everything (IoE), where all devices are connected to the web. Large scale networking benefits the community by increasing connectivity and giving control of physical devices. On the other hand, there exists an increased ‘Threat’ of an ‘Attack’. Attackers are targeting these devices, as it may provide an easier ‘backdoor entry to the users’ network’.MALicious softWARE (MalWare) is a major threat to user security. Fast and accurate detection of malware attacks are the sine qua non of IoE, where large scale networking is involved. The paper proposes use of a visualization technique where the disassembled malware code is converted into gray images, as well as use of Image Similarity based Statistical Parameters (ISSP) such as Normalized Cross correlation (NCC), Average difference (AD), Maximum difference (MaxD), Singular Structural Similarity Index Module (SSIM), Laplacian Mean Square Error (LMSE), MSE and PSNR. A vector consisting of gray image with statistical parameters is trained using a Faster Region proposals Convolution Neural Network (F-RCNN) classifier. The experiment results are promising as the proposed method includes ISSP with F-RCNN training. Overall training time of learning the semantics of higher-level malicious behaviors is less. Identification of malware (testing phase) is also performed in less time. The fusion of image and statistical parameter enhances system performance with greater accuracy. The benchmark database from Microsoft Malware Classification challenge has been used to analyze system performance, which is available on the Kaggle website. An overall average classification accuracy of 98.12% is achieved by the proposed method
    corecore