498 research outputs found

    Malicious Web Sites Detection using C4.5 Decision Tree

    Get PDF
    The technology advancement poses the challenge to the cybercriminals for doing various online criminal acts, such as identity theft, extortion of money or simply, viruses and worms spreading. The common aim of the online criminals is to attract visitors to the Web site, which can be easily accessed by clicking on the URL. Blacklisting seems not to be the successful way of marking Web sites with the “bad” content, considering that many malicious Web sites are not blacklisted. The aim of this paper is to evaluate the ability of C4.5 decision tree classifier in detecting malicious Web sites, based on the features that characterize URLs. The classifier is evaluated through several performance evaluation criteria, namely accuracy, sensitivity, specificity and area under the ROC curve. C4.5 decision tree classifier achieved significant success in malicious Web sites detection, considering all four criteria (accuracy 96.5, sensitivity 96.4, specificity 96.5 and area under the curve 0.958)

    Bot recognition in a Web store: An approach based on unsupervised learning

    Get PDF
    Abstract Web traffic on e-business sites is increasingly dominated by artificial agents (Web bots) which pose a threat to the website security, privacy, and performance. To develop efficient bot detection methods and discover reliable e-customer behavioural patterns, the accurate separation of traffic generated by legitimate users and Web bots is necessary. This paper proposes a machine learning solution to the problem of bot and human session classification, with a specific application to e-commerce. The approach studied in this work explores the use of unsupervised learning (k-means and Graded Possibilistic c-Means), followed by supervised labelling of clusters, a generative learning strategy that decouples modelling the data from labelling them. Its efficiency is evaluated through experiments on real e-commerce data, in realistic conditions, and compared to that of supervised learning classifiers (a multi-layer perceptron neural network and a support vector machine). Results demonstrate that the classification based on unsupervised learning is very efficient, achieving a similar performance level as the fully supervised classification. This is an experimental indication that the bot recognition problem can be successfully dealt with using methods that are less sensitive to mislabelled data or missing labels. A very small fraction of sessions remain misclassified in both cases, so an in-depth analysis of misclassified samples was also performed. This analysis exposed the superiority of the proposed approach which was able to correctly recognize more bots, in fact, and identified more camouflaged agents, that had been erroneously labelled as humans

    Denial of Service in Web-Domains: Building Defenses Against Next-Generation Attack Behavior

    Get PDF
    The existing state-of-the-art in the field of application layer Distributed Denial of Service (DDoS) protection is generally designed, and thus effective, only for static web domains. To the best of our knowledge, our work is the first that studies the problem of application layer DDoS defense in web domains of dynamic content and organization, and for next-generation bot behaviour. In the first part of this thesis, we focus on the following research tasks: 1) we identify the main weaknesses of the existing application-layer anti-DDoS solutions as proposed in research literature and in the industry, 2) we obtain a comprehensive picture of the current-day as well as the next-generation application-layer attack behaviour and 3) we propose novel techniques, based on a multidisciplinary approach that combines offline machine learning algorithms and statistical analysis, for detection of suspicious web visitors in static web domains. Then, in the second part of the thesis, we propose and evaluate a novel anti-DDoS system that detects a broad range of application-layer DDoS attacks, both in static and dynamic web domains, through the use of advanced techniques of data mining. The key advantage of our system relative to other systems that resort to the use of challenge-response tests (such as CAPTCHAs) in combating malicious bots is that our system minimizes the number of these tests that are presented to valid human visitors while succeeding in preventing most malicious attackers from accessing the web site. The results of the experimental evaluation of the proposed system demonstrate effective detection of current and future variants of application layer DDoS attacks

    Clustering Web Users By Mouse Movement to Detect Bots and Botnet Attacks

    Get PDF
    The need for website administrators to efficiently and accurately detect the presence of web bots has shown to be a challenging problem. As the sophistication of modern web bots increases, specifically their ability to more closely mimic the behavior of humans, web bot detection schemes are more quickly becoming obsolete by failing to maintain effectiveness. Though machine learning-based detection schemes have been a successful approach to recent implementations, web bots are able to apply similar machine learning tactics to mimic human users, thus bypassing such detection schemes. This work seeks to address the issue of machine learning based bots bypassing machine learning-based detection schemes, by introducing a novel unsupervised learning approach to cluster users based on behavioral biometrics. The idea is that, by differentiating users based on their behavior, for example how they use the mouse or type on the keyboard, information can be provided for website administrators to make more informed decisions on declaring if a user is a human or a bot. This approach is similar to how modern websites require users to login before browsing their website; which in doing so, website administrators can make informed decisions on declaring if a user is a human or a bot. An added benefit of this approach is that it is a human observational proof (HOP); meaning that it will not inconvenience the user (user friction) with human interactive proofs (HIP) such as CAPTCHA, or with login requirement

    Categorization of Phishing Detection Features And Using the Feature Vectors to Classify Phishing Websites

    Get PDF
    abstract: Phishing is a form of online fraud where a spoofed website tries to gain access to user's sensitive information by tricking the user into believing that it is a benign website. There are several solutions to detect phishing attacks such as educating users, using blacklists or extracting phishing characteristics found to exist in phishing attacks. In this thesis, we analyze approaches that extract features from phishing websites and train classification models with extracted feature set to classify phishing websites. We create an exhaustive list of all features used in these approaches and categorize them into 6 broader categories and 33 finer categories. We extract 59 features from the URL, URL redirects, hosting domain (WHOIS and DNS records) and popularity of the website and analyze their robustness in classifying a phishing website. Our emphasis is on determining the predictive performance of robust features. We evaluate the classification accuracy when using the entire feature set and when URL features or site popularity features are excluded from the feature set and show how our approach can be used to effectively predict specific types of phishing attacks such as shortened URLs and randomized URLs. Using both decision table classifiers and neural network classifiers, our results indicate that robust features seem to have enough predictive power to be used in practice.Dissertation/ThesisMasters Thesis Computer Science 201

    Cross Site Scripting Attacks in Web-Based Applications

    Get PDF
    Web-based applications has turn out to be very prevalent due to the ubiquity of web browsers to deliver service oriented application on-demand to diverse client over the Internet and cross site scripting (XSS) attack is a foremost security risk that has continuously ravage the web applications over the years. This paper critically examines the concept of XSS and some recent approaches for detecting and preventing XSS attacks in terms of architectural framework, algorithm used, solution location, and so on. The techniques were analysed and results showed that most of the available recognition and avoidance solutions to XSS attacks are more on the client end than the server end because of the peculiar nature of web application vulnerability and they also lack support for self-learning ability in order to detect new XSS attacks. Few researchers as cited in this paper inculcated the self-learning ability to detect and prevent XSS attacks in their design architecture using artificial neural networks and soft computing approach; a lot of improvement is still needed to effectively and efficiently handle the web application security menace as recommended

    Machine Learning Interpretability in Malware Detection

    Get PDF
    The ever increasing processing power of modern computers, as well as the increased availability of large and complex data sets, has led to an explosion in machine learning research. This has led to increasingly complex machine learning algorithms, such as Convolutional Neural Networks, with increasingly complex applications, such as malware detection. Recently, malware authors have become increasingly successful in bypassing traditional malware detection methods, partly due to advanced evasion techniques such as obfuscation and server-side polymorphism. Further, new programming paradigms such as fileless malware, that is malware that exist only in the main memory (RAM) of the infected host, add to the challenges faced with modern day malware detection. This has led security specialists to turn to machine learning to augment their malware detection systems. However, with this new technology comes new challenges. One of these challenges is the need for interpretability in machine learning. Machine learning interpretability is the process of giving explanations of a machine learning model\u27s predictions to humans. Rather than trying to understand everything that is learnt by the model, it is an attempt to find intuitive explanations which are simple enough and provide relevant information for downstream tasks. Cybersecurity analysts always prefer interpretable solutions because of the need to fine tune these solutions. If malware analysts can\u27t interpret the reason behind a misclassification, they will not accept the non-interpretable or black box detector. In this thesis, we provide an overview of machine learning and discuss its roll in cyber security, the challenges it faces, and potential improvements to current approaches in the literature. We showcase its necessity as a result of new computing paradigms by implementing a proof of concept fileless malware with JavaScript. We then present techniques for interpreting machine learning based detectors which leverage n-gram analysis and put forward a novel and fully interpretable approach for malware detection which uses convolutional neural networks. We also define a novel approach for evaluating the robustness of a machine learning based detector
    • …
    corecore