4 research outputs found

    Using Four Learning Algorithms for Evaluating Questionable Uniform Resource Locators (URLs)

    Malicious Uniform Resource Locators (URLs) are a common and serious threat to cyber security. Malicious URLs host unsolicited content (spam, phishing, drive-by exploits, etc.) and lure unsuspecting internet users into scams that result in monetary loss, theft, loss of information privacy and unwanted malware installation. This phenomenon has contributed to the rise of cybercrime on social media through the sharing of malicious URLs. This situation calls for efficient and reliable classification of a web page based on the information contained in its URL, so that the nature and status of a site can be understood before it is accessed. It is imperative to detect and act on URLs shared on social media platforms in a timely manner. Although researchers have carried out similar studies in the past, the conclusions drawn from their experiments conflict with one another. Against this backdrop, four machine learning algorithms were selected for the classification of fake and vulnerable URLs: Naïve Bayes, K-means, Decision Tree and Logistic Regression. The algorithms were implemented in the Java programming language. Statistical analysis and comparison of the four algorithms show that Naïve Bayes is the most efficient and effective on the metrics used.
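    Purely as an illustration of the kind of comparison described above (not the paper's Java implementation; the URLs, labels and feature choices below are made-up placeholders), the three supervised algorithms could be scored on lexical URL features along these lines. K-means, being unsupervised, would need its clusters mapped to labels before a comparable score could be computed.

```python
# Illustrative sketch only: comparing classifiers on lexical URL features.
# The URLs and labels below are made-up placeholders, not the paper's dataset.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

urls = [
    "http://example.com/login", "http://paypa1-secure.xyz/verify",
    "https://university.edu/courses", "http://free-prizes.win/claim.exe",
    "https://news.site.org/article", "http://bank-update.info/confirm",
]
labels = [0, 1, 0, 1, 0, 1]  # 0 = benign, 1 = malicious (placeholder labels)

# Character n-grams capture lexical patterns (odd tokens, digits, suspicious TLDs).
features = TfidfVectorizer(analyzer="char", ngram_range=(2, 4)).fit_transform(urls)

classifiers = {
    "Naive Bayes": MultinomialNB(),
    "Decision Tree": DecisionTreeClassifier(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}
for name, clf in classifiers.items():
    scores = cross_val_score(clf, features, labels, cv=3)
    print(f"{name}: mean accuracy {scores.mean():.2f}")
```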

    Real-time classification of malicious URLs on Twitter using Machine Activity Data

    Massive online social networks with hundreds of millions of active users are increasingly being used by cyber criminals to spread malicious software (malware) that exploits vulnerabilities on users' machines for personal gain. Twitter is particularly susceptible to such activity as, with its 140-character limit, it is common for people to include URLs in their tweets to link to more detailed information, evidence, news reports and so on. URLs are often shortened, so the endpoint is not obvious before a person clicks the link. Cyber criminals can exploit this to propagate malicious URLs on Twitter, for which the endpoint is a malicious server that performs unwanted actions on the person's machine. This is known as a drive-by download. In this paper we develop a machine classification system to distinguish between malicious and benign URLs within seconds of the URL being clicked (i.e. 'real-time'). We train the classifier using machine activity logs created while interacting with URLs extracted from Twitter data collected during a large global event, the Superbowl, and test it using data from another large sporting event, the Cricket World Cup. The results show that machine activity logs produce precision of up to 0.975 on training data from the first event and 0.747 on test data from the second event. Furthermore, we examine the properties of the learned model to explain the relationship between machine activity and malicious software behaviour, and build a learning curve for the classifier to illustrate that very small samples of training data can be used with only a small detriment to performance.
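    A hedged sketch of the kind of pipeline this abstract describes follows. The feature values are synthetic stand-ins for machine activity measurements, and the choice of Random Forest is an arbitrary illustrative assumption rather than the authors' model; the point is only the train-on-one-event, test-on-another structure.

```python
# Illustrative sketch: train on machine activity logs from one event,
# test on another. Feature values are synthetic placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score

rng = np.random.default_rng(0)

def synthetic_logs(n):
    """Stand-in for machine activity captured while a URL is interacted with,
    e.g. CPU use, bytes sent, processes created, files written."""
    X = rng.normal(size=(n, 4))
    y = (X[:, 1] + X[:, 3] + rng.normal(scale=0.5, size=n) > 0).astype(int)
    return X, y

X_train, y_train = synthetic_logs(400)   # stand-in for the first event's collection
X_test, y_test = synthetic_logs(200)     # stand-in for the second event's collection

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
print("precision on held-out event:",
      round(precision_score(y_test, clf.predict(X_test)), 3))
```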

    Two-stage classification model to detect malicious web pages

    Malicious web pages are an emerging security concern on the Internet due to their popularity and their potentially serious impact. Detecting and analysing them is very costly because of their characteristics and complexity. Several research approaches have been proposed to detect them; these can be classified into two main groups based on the analysis features they use: static-feature-based and run-time-feature-based approaches. While static-feature-based approaches have the advantage of being lightweight, run-time-feature-based approaches perform better in terms of detection accuracy. This paper presents a novel two-stage classification model to detect malicious web pages. Our approach divides the detection process into two stages: estimating the maliciousness of web pages, and then identifying malicious web pages. Static features are lightweight but less informative, so they are used in the first stage to identify potentially malicious web pages. Only potentially malicious web pages are forwarded to the second stage for further investigation. Run-time features, on the other hand, are costly but more informative, so they are used in the final stage to identify malicious web pages.
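    The two-stage idea can be sketched as a simple pipeline. The scoring rules, threshold and stubbed "run-time analysis" below are hypothetical stand-ins for trained models, not the paper's actual system; they only illustrate how cheap static cues gate access to the expensive second stage.

```python
# Illustrative two-stage detection sketch; the scoring rules, threshold and
# "run-time analysis" below are hypothetical stand-ins for trained models.
def static_score(url: str) -> float:
    """Stage 1: cheap lexical cues only (no page execution)."""
    score = 0.0
    if len(url) > 60:
        score += 0.3
    if any(tld in url for tld in (".xyz", ".win", ".info")):
        score += 0.4
    if "@" in url or url.count("-") > 3:
        score += 0.3
    return min(score, 1.0)

def runtime_analysis(url: str) -> bool:
    """Stage 2: costly run-time analysis (sandbox visit); stubbed out here."""
    print(f"  running sandbox analysis for {url} ...")
    return ".win" in url  # placeholder verdict

THRESHOLD = 0.3  # only pages scoring above this reach the expensive stage

def detect(url: str) -> str:
    if static_score(url) <= THRESHOLD:
        return "benign (filtered at stage 1)"
    return "malicious" if runtime_analysis(url) else "benign (cleared at stage 2)"

for u in ["https://university.edu/courses",
          "http://free-prizes.win/claim-your-reward-now-please-click"]:
    print(u, "->", detect(u))
```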

    A behavioural study in runtime analysis environments and drive-by download attacks

    In the information age, the growing availability of both technology and exploit kits has continuously contributed to a large volume of websites being compromised or set up with malicious intent. Drive-by-download attacks make up a high percentage (77%) of the known attacks against client systems. These attacks originate from malicious or compromised web servers and attack client systems by pushing malware upon interaction. Within the detection and intelligence-gathering area of research, high-interaction honeypot approaches have been a longstanding and well-established technology. These approaches are, however, not without challenges: analysing the entirety of the world wide web with them is unviable due to time and resource intensiveness. Furthermore, the volume of data generated by a run-time analysis of the interaction between a website and an analysis environment is huge, varied and not well understood. The number of malicious servers, in addition to the large datasets created by run-time analysis, contributes to the difficulty of analysing and verifying actual malicious behaviour. The work in this thesis attempts to overcome the difficulties in analysing log files in order to optimise malicious and anomalous behaviour detection. The main contribution of this work is focused on reducing the volume of data generated from run-time analysis so as to reduce the impact of noise within behavioural log file datasets. This thesis proposes an alternative, expert-led approach to filtering benign behaviour from potentially malicious and unknown behaviour. Expert-led filtering is designed in a risk-averse manner that takes into account known benign and expected behaviours before filtering the log file. Moreover, the approach relies upon behavioural investigation as well as the potential for system compromise before filtering out behaviour within dynamic analysis log files. Consequently, this results in a significantly smaller volume of data that can be analysed in greater detail. The proposed filtering approach has been implemented and tested in a real-world context using a prudent experimental framework. An average reduction of 96.96% in log file size has been achieved, which is transferable to behaviour analysis environments. The other contributions of this work include an understanding of observable operating system interactions. From the study of behaviour analysis environments, it was concluded that run-time analysis environments are sensitive to application and operating system versions. Understanding key changes in operating system behaviour within Windows is an unexplored area of research, yet Windows is currently one of the most popular client operating systems. As part of understanding system behaviours for the creation of behavioural filters, this study undertakes a number of experiments to identify the key behaviour differences between operating systems. The results show that there are significant changes in core processes and interactions which can be taken into account in the development of filters for updated systems. Finally, from the analysis of 110,000 potentially malicious websites, typical attacks are explored. These attacks actively exploited the honeypot and offer knowledge of a section of the active web-based attacks faced on the world wide web. Trends and attack vectors are identified and evaluated.
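    A rough, hedged sketch of the expert-led filtering idea follows. The log format, allowlist patterns and sample events are invented for illustration and are not taken from the thesis; the sketch only shows known-benign behaviour being dropped from a dynamic analysis log and the size reduction being reported.

```python
# Illustrative sketch of expert-led benign-behaviour filtering.
# The log lines and allowlist patterns are invented placeholders.
import re

# Patterns an expert would mark as known benign / expected OS activity.
BENIGN_PATTERNS = [
    r"process=svchost\.exe .*registry_read",
    r"process=explorer\.exe .*file_read .*\\AppData\\Local\\IconCache",
]

raw_log = [
    "process=svchost.exe action=registry_read key=HKLM\\SOFTWARE\\Microsoft",
    "process=explorer.exe action=file_read path=C:\\Users\\a\\AppData\\Local\\IconCache.db",
    "process=unknown.exe action=file_write path=C:\\Windows\\System32\\evil.dll",
    "process=iexplore.exe action=network_connect dst=203.0.113.7:8080",
]

def filter_benign(lines):
    """Keep only behaviour that does not match the expert allowlist."""
    compiled = [re.compile(p) for p in BENIGN_PATTERNS]
    return [ln for ln in lines if not any(p.search(ln) for p in compiled)]

kept = filter_benign(raw_log)
reduction = 100 * (1 - len(kept) / len(raw_log))
print(f"kept {len(kept)} of {len(raw_log)} entries ({reduction:.1f}% reduction)")
for ln in kept:
    print(" ", ln)
```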