7 research outputs found

    Identifikasi Malicious Web Menggunakan Metode Random Forest

    Abstract — A malicious web site is a hostile site that can disrupt computer operations. Internet criminals deceive users by compromising particular websites, and a malicious web site often looks like a benign one, making the two difficult for users to distinguish. In this study we identify malicious web sites using the Random Forest method. The study uses 7 predictor variables and 1 response variable. The results show an accuracy of 94%. Keywords — Identification, Random Forest, Malicious Web (harmful web)
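
    A minimal sketch of the kind of Random Forest classification this abstract describes, assuming a feature table with seven predictor columns and one binary response column; the file name, column names, and hyperparameters below are illustrative assumptions, not details from the paper.

```python
# Minimal sketch of the Random Forest classification described above.
# The CSV path, column names, and hyperparameters are illustrative
# assumptions, not details taken from the paper.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("web_features.csv")        # hypothetical table: 7 predictors + 1 label
X = df.drop(columns=["is_malicious"])       # the 7 predictor variables
y = df["is_malicious"]                      # the response variable (0 = benign, 1 = malicious)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```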

    MadDroid: Characterising and Detecting Devious Ad Content for Android Apps

    Advertisement drives the economy of the mobile app ecosystem. As a key component of the mobile ad business model, mobile ad content has been overlooked by the research community, and this neglect poses a number of threats, e.g., propagating malware and undesirable content. To understand the practice of these devious ad behaviors, we perform a large-scale study on the app contents harvested through automated app testing. In this work, we first provide a comprehensive categorization of devious ad contents, including five kinds of behaviors belonging to two categories: ad loading content and ad clicking content. Then, we propose MadDroid, a framework for automated detection of devious ad contents. MadDroid leverages an automated app testing framework with a sophisticated ad view exploration strategy to effectively collect ad-related network traffic and subsequently extract ad contents. We then integrate dedicated approaches into the framework to identify devious ad contents. We have applied MadDroid to 40,000 Android apps and found that roughly 6% of apps deliver devious ad contents, e.g., distributing malicious apps that cannot be downloaded via traditional app markets. Experiment results indicate that devious ad contents are prevalent, suggesting that our community should invest more effort into the detection and mitigation of devious ads towards building a trustworthy mobile advertising ecosystem. Comment: To be published in The Web Conference 2020 (WWW'20).
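
    As an illustration only (not MadDroid's actual implementation), the sketch below flags one of the devious ad-content behaviors the abstract mentions: an ad-click redirection chain that ends in an APK download hosted outside a known app market. The whitelist and example chain are assumptions.

```python
# Illustrative sketch only (not MadDroid's implementation): flag ad-click
# redirection chains that end in an APK download hosted outside a known
# app market, one of the devious behaviors described in the abstract.
from urllib.parse import urlparse

KNOWN_MARKETS = {"play.google.com"}        # assumed whitelist, for illustration only

def is_off_market_apk(redirect_chain):
    """redirect_chain: list of URLs visited after an ad click."""
    final = urlparse(redirect_chain[-1])
    return final.path.lower().endswith(".apk") and final.netloc not in KNOWN_MARKETS

chain = [
    "http://ads.example.com/click?id=1",
    "http://landing.example.net/offer",
    "http://cdn.example.org/free_game.apk",
]
print(is_off_market_apk(chain))            # True: off-market APK at the end of the chain
```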

    SIGL: Securing Software Installations Through Deep Graph Learning

    Many users implicitly assume that software can only be exploited after it is installed. However, recent supply-chain attacks demonstrate that application integrity must be ensured during installation itself. We introduce SIGL, a new tool for detecting malicious behavior during software installation. SIGL collects traces of system call activity, building a data provenance graph that it analyzes using a novel autoencoder architecture with a graph long short-term memory network (graph LSTM) for the encoder and a standard multilayer perceptron for the decoder. SIGL flags suspicious installations as well as the specific installation-time processes that are likely to be malicious. Using a test corpus of 625 malicious installers containing real-world malware, we demonstrate that SIGL has a detection accuracy of 96%, outperforming similar systems from industry and academia by up to 87% in precision and recall and 45% in accuracy. We also demonstrate that SIGL can pinpoint the processes most likely to have triggered malicious behavior, works on different audit platforms and operating systems, and is robust to training data contamination and adversarial attack. It can be used with application-specific models, even in the presence of new software versions, as well as application-agnostic meta-models that encompass a wide range of applications and installers. Comment: 18 pages, to appear in the 30th USENIX Security Symposium (USENIX Security '21).
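
    A simplified sketch of the encoder/decoder split the abstract describes, with a plain LSTM standing in for the paper's graph LSTM over the provenance graph and an MLP decoder reconstructing node features; a high reconstruction error on an installation trace would be flagged as suspicious. All dimensions and the random input are illustrative assumptions, not the paper's architecture.

```python
# Simplified sketch of the encoder/decoder split described in the abstract:
# a plain LSTM stands in for the paper's graph LSTM over the provenance graph,
# and an MLP decoder reconstructs node features. A high reconstruction error on
# an installation trace would be flagged as suspicious. All dimensions and the
# random input are illustrative assumptions.
import torch
import torch.nn as nn

class ProvenanceAutoencoder(nn.Module):
    def __init__(self, feat_dim=32, hidden_dim=64):
        super().__init__()
        self.encoder = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.decoder = nn.Sequential(              # standard multilayer perceptron
            nn.Linear(hidden_dim, 128),
            nn.ReLU(),
            nn.Linear(128, feat_dim),
        )

    def forward(self, node_seq):                   # node_seq: (batch, num_nodes, feat_dim)
        hidden, _ = self.encoder(node_seq)         # per-node hidden states
        return self.decoder(hidden)                # reconstructed node features

model = ProvenanceAutoencoder()
trace = torch.randn(1, 10, 32)                     # one installation trace with 10 nodes
reconstruction = model(trace)
anomaly_score = torch.mean((reconstruction - trace) ** 2)
print(float(anomaly_score))                        # high score => suspicious installation
```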

    Revealing Malicious Contents Hidden In The Internet

    In this age of ubiquitous communication, in which we can stay constantly connected with the rest of the world, we have one particular invention to thank for the most part: the Internet. But as the popularity of Internet connectivity grows, it has also become a very dangerous place, where objects of malicious content and intent can be hidden in plain sight. In this dissertation, we investigate different ways to detect and capture these malicious contents hidden in the Internet. First, we propose an automated system that mimics high-risk browsing activities, such as clicking on suspicious online ads, and as a result collects malicious executable files for further analysis and diagnosis. Using our system, we crawled the Internet and collected a considerable number of malicious executables with very limited resources. Malvertising has been one of the major recent threats to cyber security. Malvertisers apply a variety of evasion techniques to evade detection, whereas ad networks apply inspection techniques to reveal the malicious ads; however, both the malvertiser and the ad network operate under resource and time constraints. In the second part of this dissertation, we propose a game-theoretic approach to formulate the problem of inspecting the malware inserted by malvertisers into the Web-based advertising system. During malware collection, we used the online multi-AV scanning service VirusTotal to scan and analyze the samples, which can only generate an aggregation of antivirus scan reports; what is needed is a multi-scanner solution that can accurately determine the maliciousness of a given sample. In the third part of this dissertation, we introduce three theoretical models, which enable us to predict the accuracy levels of different combinations of scanners and determine the optimum configuration of a multi-scanner detection system to achieve maximum accuracy. Malicious communication generated by malware can also reveal its presence; in the case of botnets, their command and control (C&C) communication is a good candidate. Among the widely used C&C protocols, HTTP is becoming the most preferred one. However, detecting HTTP-based C&C packets, which constitute a minuscule portion of everyday HTTP traffic, is a formidable task. In the final part of this dissertation, we present an anomaly detection based approach to detect HTTP-based C&C traffic using statistical features based on client-generated HTTP request packets and DNS server-generated response packets.
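
    A rough sketch of the anomaly-detection idea in the final part of the dissertation: derive simple statistical features from client HTTP requests and score them against a model of benign traffic. The specific features, the IsolationForest detector, and the synthetic training data are assumptions for illustration, not the dissertation's method.

```python
# Rough sketch of the anomaly-detection idea: simple statistical features from
# client HTTP requests scored against a model of benign traffic. The feature
# set, the IsolationForest detector, and the synthetic training data are
# assumptions for illustration, not the dissertation's method.
import numpy as np
from sklearn.ensemble import IsolationForest

def features(request):
    """request: dict with 'url', 'headers', and 'interval' (seconds since last request)."""
    return [len(request["url"]), len(request["headers"]), request["interval"]]

# Stand-in for features extracted from benign browsing traffic.
rng = np.random.default_rng(0)
benign = rng.normal(loc=[40.0, 8.0, 5.0], scale=[10.0, 2.0, 3.0], size=(500, 3))
detector = IsolationForest(random_state=0).fit(benign)

beacon = {"url": "http://example.com/gate.php?id=ab12", "headers": {"Host": "x"}, "interval": 60.0}
print(detector.predict([features(beacon)]))   # -1 means anomalous (possible C&C beacon)
```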

    Analyse et perturbation d'un écosystème de fraude au clic

    ABSTRACT — Online advertising is a growing market, with global revenues of 159.8 billion dollars in 2015, which makes it a good target for fraudsters to make money on. In 2015, it was estimated that, globally, advertisers were defrauded of more than seven billion dollars. The security community is concerned by this kind of fraud, known as click fraud, and a lot of research aims to limit it. Current methods focus on studying malware binaries and performing botnet take-downs. These operations are useful for limiting the propagation of malware and protecting users from known threats, but they have no impact on the economic incentives of perpetrating click fraud. In order to diminish the attractiveness of the fraud, we first tried to better understand the click-fraud ecosystem and then evaluated disruption strategies on this ecosystem. Firstly, we collected network traces generated by a well-known click-fraud malware, Boaxxe. These data are HTTP redirection chains that show the links between all the intermediaries involved in the reselling of an ad; this constitutes the value chain. The redirection chains begin at a doorway search engine operated by fraudsters, pass through several ad networks, and land on the web site of the advertiser that bought the traffic. Secondly, we aggregated the collected data into a single graph showing the relationships between the domain names and IP addresses involved in the Boaxxe fraud. We then consolidated this graph by merging all the network nodes operated by a single organization, leveraging information obtained from open sources. The resulting graph is a representation of the click-fraud ecosystem. Thirdly, we evaluated disruption strategies on this ecosystem. The aim is to stop the monetization of the traffic generated by Boaxxe, which is equivalent to stopping the traffic going from the doorway search engine to the web sites of the advertisers. Among the strategies tested, the most suitable for our problem was the Keyplayer strategy; we showed that it is possible to protect a large number of advertisers by removing a small number of intermediaries from the ecosystem graph. Finally, we discuss how to perform the disruption operation in practice, focusing on raising the awareness of advertisers, who are in a strong position to limit click fraud. One way in which they could do so is by implementing controls to make sure they are not maintaining business relationships with unscrupulous ad networks.
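
    A small sketch of the kind of disruption evaluation the abstract describes, using betweenness centrality as a stand-in for the Keyplayer method (an assumption made for illustration); the redirection chains below are fabricated examples, not Boaxxe data.

```python
# Sketch of the disruption evaluation, with betweenness centrality used as a
# stand-in for the Keyplayer method (an assumption for illustration). The
# redirection chains below are fabricated examples, not Boaxxe data.
import networkx as nx

chains = [
    ["doorway", "adnet_A", "adnet_B", "advertiser_1"],
    ["doorway", "adnet_A", "advertiser_2"],
    ["doorway", "adnet_C", "adnet_B", "advertiser_3"],
]

G = nx.DiGraph()
for chain in chains:
    nx.add_path(G, chain)                          # one redirection chain = one path

advertisers = [n for n in G if n.startswith("advertiser")]
intermediaries = [n for n in G if n.startswith("adnet")]

# Remove the most central intermediary and count advertisers cut off from the doorway.
centrality = nx.betweenness_centrality(G)
target = max(intermediaries, key=centrality.get)
G.remove_node(target)
protected = [a for a in advertisers if not nx.has_path(G, "doorway", a)]
print(f"removed {target}; advertisers no longer reachable: {protected}")
```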

    Using Context to Improve Network-based Exploit Kit Detection

    Today, our computers are routinely compromised while performing seemingly innocuous activities like reading articles on trusted websites (e.g., the NY Times). These compromises are perpetrated via complex interactions involving the advertising networks that monetize these sites. Web-based compromises such as exploit kits are similar to any other scam -- the attacker wants to lure an unsuspecting client into a trap to steal private information, or resources -- generating 10s of millions of dollars annually. Exploit kits are web-based services specifically designed to capitalize on vulnerabilities in unsuspecting client computers in order to install malware without a user's knowledge. Sadly, it only takes a single successful infection to ruin a user's financial life, or lead to corporate breaches that result in millions of dollars of expense and loss of customer trust. Exploit kits use a myriad of techniques to obfuscate each attack instance, making current network-based defenses such as signature-based network intrusion detection systems far less effective than in years past. Dynamic analysis or honeyclient analysis on these exploits plays a key role in identifying new attacks for signature generation, but provides no means of inspecting end-user traffic on the network to identify attacks in real time. As a result, defenses designed to stop such malfeasance often arrive too late or not at all resulting in high false positive and false negative (error) rates. In order to deal with these drawbacks, three new detection approaches are presented. To deal with the issue of a high number of errors, a new technique for detecting exploit kit interactions on a network is proposed. The technique capitalizes on the fact that an exploit kit leads its potential victim through a process of exploitation by forcing the browser to download multiple web resources from malicious servers. This process has an inherent structure that can be captured in HTTP traffic and used to significantly reduce error rates. The approach organizes HTTP traffic into tree-like data structures, and, using a scalable index of exploit kit traces as samples, models the detection process as a subtree similarity search problem. The technique is evaluated on 3,800 hours of web traffic on a large enterprise network, and results show that it reduces false positive rates by four orders of magnitude over current state-of-the-art approaches. While utilizing structure can vastly improve detection rates over current approaches, it does not go far enough in helping defenders detect new, previously unseen attacks. As a result, a new framework that applies dynamic honeyclient analysis directly on network traffic at scale is proposed. The framework captures and stores a configurable window of reassembled HTTP objects network wide, uses lightweight content rendering to establish the chain of requests leading up to a suspicious event, then serves the initial response content back to the honeyclient in an isolated network. The framework is evaluated on a diverse collection of exploit kits as they evolve over a 1 year period. The empirical evaluation suggests that the approach offers significant operational value, and a single honeyclient can support a campus deployment of thousands of users. While the above approaches attempt to detect exploit kits before they have a chance to infect the client, they cannot protect a client that has already been infected. 
The final technique detects signs of post-infection behavior by intrusions that abuse the domain name system (DNS) to make contact with an attacker. Contemporary detection approaches utilize the structure of a domain name and require hundreds of DNS messages to detect such malware. As a result, these detection mechanisms cannot detect malware in a timely manner and are susceptible to high error rates. The final technique, based on sequential hypothesis testing, uses the DNS message patterns of a subset of DNS traffic to detect malware in as few as four DNS messages, and with an orders-of-magnitude reduction in error rates. The results of this work can make a significant operational impact on network security analysis, and open several exciting future directions for network security research. Doctor of Philosophy
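
    A minimal sketch of sequential hypothesis testing (Wald's SPRT) over a stream of DNS messages, in the spirit of the final technique above; the per-message probabilities and decision thresholds are illustrative assumptions, not values from the dissertation.

```python
# Minimal sketch of sequential hypothesis testing (Wald's SPRT) over a stream of
# DNS messages. The per-message probabilities and thresholds are illustrative
# assumptions, not values from the dissertation.
import math

P_SUSP_MALWARE = 0.9     # assumed: probability a malware DNS message looks suspicious
P_SUSP_BENIGN = 0.2      # assumed: probability a benign DNS message looks suspicious
UPPER = math.log(99)     # decide "malware" above this log-likelihood ratio
LOWER = -math.log(99)    # decide "benign" below this

def classify(observations):
    """observations: iterable of booleans, True if a DNS message looks suspicious."""
    llr, n = 0.0, 0
    for n, suspicious in enumerate(observations, start=1):
        p1 = P_SUSP_MALWARE if suspicious else 1 - P_SUSP_MALWARE
        p0 = P_SUSP_BENIGN if suspicious else 1 - P_SUSP_BENIGN
        llr += math.log(p1 / p0)
        if llr >= UPPER:
            return "malware", n
        if llr <= LOWER:
            return "benign", n
    return "undecided", n

print(classify([True, True, True, True]))   # with these numbers, decides after 4 messages
```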