99 research outputs found

    Detection and Analysis of Drive-by Downloads and Malicious Websites

    Get PDF
    A drive by download is a download that occurs without users action or knowledge. It usually triggers an exploit of vulnerability in a browser to downloads an unknown file. The malicious program in the downloaded file installs itself on the victims machine. Moreover, the downloaded file can be camouflaged as an installer that would further install malicious software. Drive by downloads is a very good example of the exponential increase in malicious activity over the Internet and how it affects the daily use of the web. In this paper, we try to address the problem caused by drive by downloads from different standpoints. We provide in depth understanding of the difficulties in dealing with drive by downloads and suggest appropriate solutions. We propose machine learning and feature selection solutions to remedy the the drive-by download problem. Experimental results reported 98.2% precision, 98.2% F-Measure and 97.2% ROC area

    Evaluation with uncertainty

    Get PDF
    Experimental uncertainty arises as a consequence of: (1) bias (systematic error), and (2) variance in measurements. Popular evaluation techniques only account for the variance due to sampling of experimental units, and assume the other sources of uncertainty can be ignored. For example, only the uncertainty due to sampling of topics (queries) and sampling of training:test datasets is considered in standard information retrieval (IR) and classifier system evaluation respectively. However, incomplete relevance judgements, assessor disagreement, non-deterministic systems, and the measurement bias can also cause uncertainty in these experiments. In this thesis, the impact of other sources of uncertainty on evaluating IR and classification experiments are investigated. The uncertainty due to:(1) incomplete relevance judgements in IR test collections,(2) non-determinism in IR systems / classifiers, and (3) high variance of classifiers is analysed using case studies from distributed information retrieval and information security. The thesis illustrates the importance of reducing and accurately accounting for uncertainty when evaluating complex IR and classifier systems. Novel techniques to(1) reduce uncertainty due to test collection bias in IR evaluation and high classifier variance (overfitting) in detecting drive-by download attacks,(2) account for multidimensional variance due to sampling of IR systems instances from non-deterministic IR systems in addition to sampling of topics, and (3) account for repeated measurements due to non-deterministic classification algorithms are introduced

    Malware Resistant Data Protection in Hyper-connected Networks: A survey

    Full text link
    Data protection is the process of securing sensitive information from being corrupted, compromised, or lost. A hyperconnected network, on the other hand, is a computer networking trend in which communication occurs over a network. However, what about malware. Malware is malicious software meant to penetrate private data, threaten a computer system, or gain unauthorised network access without the users consent. Due to the increasing applications of computers and dependency on electronically saved private data, malware attacks on sensitive information have become a dangerous issue for individuals and organizations across the world. Hence, malware defense is critical for keeping our computer systems and data protected. Many recent survey articles have focused on either malware detection systems or single attacking strategies variously. To the best of our knowledge, no survey paper demonstrates malware attack patterns and defense strategies combinedly. Through this survey, this paper aims to address this issue by merging diverse malicious attack patterns and machine learning (ML) based detection models for modern and sophisticated malware. In doing so, we focus on the taxonomy of malware attack patterns based on four fundamental dimensions the primary goal of the attack, method of attack, targeted exposure and execution process, and types of malware that perform each attack. Detailed information on malware analysis approaches is also investigated. In addition, existing malware detection techniques employing feature extraction and ML algorithms are discussed extensively. Finally, it discusses research difficulties and unsolved problems, including future research directions.Comment: 30 pages, 9 figures, 7 tables, no where submitted ye

    Multi-level analysis of Malware using Machine Learning

    Get PDF
    Multi-level analysis of Malware using Machine Learnin

    Ransomware Simulator for In-Depth Analysis and Detection: Leveraging Centralized Logging and Sysmon for Improved Cybersecurity

    Get PDF
    Abstract Ransomware attacks have become increasingly prevalent and sophisticated, posing significant threats to organizations and individuals worldwide. To effectively combat these threats, security professionals must continuously develop and adapt their detection and mitigation strategies. This master thesis presents the design and implementation of a ransomware simulator to facilitate an in-depth analysis of ransomware Tactics, Techniques, and Procedures (TTPs) and to evaluate the effectiveness of centralized logging and Sysmon, including the latest event types, in detecting and responding to such attacks. The study explores the advanced capabilities of Sysmon as a logging tool and data source, focusing on its ability to capture multiple event types, such as file creation, process execution, and network traffic, as well as the newly added event types. The aim is to demonstrate the effectiveness of Sysmon in detecting and analyzing malicious activities, with an emphasis on the latest features. By focusing on the comprehensive aspects of a cyber-attack, the study showcases the versatility and utility of Sysmon in detecting and addressing various attack vectors. The ransomware simulator is developed using a PowerShell script that emulates various ransomware TTPs and attack scenarios, providing a comprehensive and realistic simulation of a ransomware attack. Sysmon, a powerful system monitoring tool, is utilized to monitor and log the activities associated with the simulated attack, including the events generated by the new Sysmon features. Centralized logging is achieved through the integration of Splunk Enterprise, a widely used platform for log analysis and management. The collected logs are then analyzed to identify patterns, indicators of compromise (IoCs), and potential detection and mitigation strategies. Through the development of the ransomware simulator and the subsequent analysis of Sysmon logs, this research contributes to strengthening the security posture of organizations and improving cybersecurity measures against ransomware threats, with a focus on the latest Sysmon capabilities. The results demonstrate the importance of monitoring and analyzing system events to effectively detect and respond to ransomware attacks. This research can serve as a basis for further exploration of ransomware detection and response strategies, contributing to the advancement of cybersecurity practices and the development of more robust security measures against ransomware threats

    Machine learning approaches for malware classification based on hybrid artefacts

    Get PDF
    Malware could be developed and transformed into various forms to deceive users and evade antivirus and security endpoint detection. Furthermore, if one machine in the network is compromised, it could be used for lateral movement--when malware spreads stealthily without sending an alarm to monitoring systems. Malware attacks pose security threats to modern enterprises and can cause massive financial, reputation, and data loss to major enterprises. Therefore, it is important to detect these attacks effectively to reduce the loss to the minimum level. The current research uses different approaches, including static and dynamic analysis, to detect and analyze malware categories using distinct feature sets, such as imported modules, opcodes, and API calls, which can improve performance in binary and multi-class classification problems. This thesis proposes a method for identifying and analyzing malware samples via static and dynamic approaches, including memory analysis and consecutive application operation sequences performed on the Windows 10 virtual environment. Standard classifiers and frequently used sequence models are utilized to expose the malware characteristics and benefit predictive capabilities. The features used in these algorithms are extracted from the static and dynamic analysis of malware samples, such as the rich header feature, debug information, temporary files, prefetch files, and event logs. The measurement of the classifiers and the degree of correctness are calculated using the accuracy, f1-score, Mean Absolute Error (MAE), confusion matrix, and Area under the ROC Curve (AUC). Combining two feature sets can provide the best classification performance on static file properties and dynamic analysis results, regardless of whether applying feature selection or not, achieving the accuracy and f1_score at 97% for integrating two datasets. For consecutive sequences, concatenating the Gated Recurrent Unit (GRU) and Transformers model can yield the highest accuracy at 97% for Noriben operations, while GRU can achieve the maximum accuracy for Opcode sequences at 89%

    Identifying Code Injection and Reuse Payloads In Memory Error Exploits

    Get PDF
    Today's most widely exploited applications are the web browsers and document readers we use every day. The immediate goal of these attacks is to compromise target systems by executing a snippet of malicious code in the context of the exploited application. Technical tactics used to achieve this can be classified as either code injection - wherein malicious instructions are directly injected into the vulnerable program - or code reuse, where bits of existing program code are pieced together to form malicious logic. In this thesis, I present a new code reuse strategy that bypasses existing and up-and-coming mitigations, and two methods for detecting attacks by identifying the presence of code injection or reuse payloads. Fine-grained address space layout randomization efficiently scrambles program code, limiting one's ability to predict the location of useful instructions to construct a code reuse payload. To expose the inadequacy of this exploit mitigation, a technique for "just-in-time" exploitation is developed. This new technique maps memory on-the-fly and compiles a code reuse payload at runtime to ensure it works in a randomized application. The attack also works in face of all other widely deployed mitigations, as demonstrated with a proof-of-concept attack against Internet Explorer 10 in Windows 8. This motivates the need for detection of such exploits rather than solely relying on prevention. Two new techniques are presented for detecting attacks by identifying the presence of a payload. Code reuse payloads are identified by first taking a memory snapshot of the target application, then statically profiling the memory for chains of code pointers that reuse code to implement malicious logic. Code injection payloads are identified with runtime heuristics by leveraging hardware virtualization for efficient sandboxed execution of all buffers in memory. Employing both detection methods together to scan program memory takes about a second and produces negligible false positives and false negatives provided that the given exploit is functional and triggered in the target application version. Compared to other strategies, such as the use of signatures, this approach requires relatively little effort spent on maintenance over time and is capable of detecting never before seen attacks. Moving forward, one could use these contributions to form the basis of a unique and effective network intrusion detection system (NIDS) to augment existing systems.Doctor of Philosoph

    Proposed Framework to Improving Performance of Familial Classification in Android Malware

    Get PDF
    Because of the recent developments in hardware and software technologies for mobile phones, people depend on their smartphones more than ever before. Today, people conduct a variety of business, health, and financial transactions on their mobile devices. This trend has caused an influx of mobile applications that require users' sensitive information. As these applications increase so too have the number of malicious applications increased, which may compromise users' sensitive information. Between all smartphone, Android receives major attention from security practitioners and researchers due to the large number of malicious applications. For the past twelve years, Android malicious applications have been clustered into groups for better identification. Characterizing the malware families can improve the detection process and understand the malware patterns. However, in the research community, detecting new malware families is a challenge. In this research, a framework is proposed to improve the performance of familial classification in Android malware. The framework is named a Reverse Engineering Framework (RevEng). Within RevEng, applications' permissions were selected and then fed into machine learning algorithms. Through our research, we created a reduced set of permissions using Extremely Randomized Trees algorithm that achieved high accuracy and a shorter execution time. Furthermore, we conducted two approaches based on the extracted information. The first approach used a binary value representation of the permissions. The second approach used the features' importance. We represented each selected permission in latter approach by its weight value instead of its binary value in the former approach. We conducted a comparison between the results of our two approaches and other relevant works. Our approaches achieved better results in both accuracy and time performance with a reduced number of permissions

    MalDetConv: Automated Behaviour-based Malware Detection Framework Based on Natural Language Processing and Deep Learning Techniques

    Full text link
    The popularity of Windows attracts the attention of hackers/cyber-attackers, making Windows devices the primary target of malware attacks in recent years. Several sophisticated malware variants and anti-detection methods have been significantly enhanced and as a result, traditional malware detection techniques have become less effective. This work presents MalBehavD-V1, a new behavioural dataset of Windows Application Programming Interface (API) calls extracted from benign and malware executable files using the dynamic analysis approach. In addition, we present MalDetConV, a new automated behaviour-based framework for detecting both existing and zero-day malware attacks. MalDetConv uses a text processing-based encoder to transform features of API calls into a suitable format supported by deep learning models. It then uses a hybrid of convolutional neural network (CNN) and bidirectional gated recurrent unit (CNN-BiGRU) automatic feature extractor to select high-level features of the API Calls which are then fed to a fully connected neural network module for malware classification. MalDetConv also uses an explainable component that reveals features that contributed to the final classification outcome, helping the decision-making process for security analysts. The performance of the proposed framework is evaluated using our MalBehavD-V1 dataset and other benchmark datasets. The detection results demonstrate the effectiveness of MalDetConv over the state-of-the-art techniques with detection accuracy of 96.10%, 95.73%, 98.18%, and 99.93% achieved while detecting unseen malware from MalBehavD-V1, Allan and John, Brazilian, and Ki-D datasets, respectively. The experimental results show that MalDetConv is highly accurate in detecting both known and zero-day malware attacks on Windows devices
    corecore