13 research outputs found
Pattern Programmable Kernel Filter for Bot Detection
Bots earn their unique name as they perform a wide variety of automated task. These tasks include stealing sensitive user information. Detection of bots using solutions such as behavioral correlation of flow records, group activity in DNS traffic, observing the periodic repeatability in communication, etc., lead to monitoring the network traffic and then classifying them as Bot or normal traffic. Other solutions for Bot detection include kernel level key stroke verification, system call initialization, IP black listing, etc. In the first two solutions there is no assurance that the packet carrying user information is prevented from being sent to the attacker and the latter suffers from the problem of IP spoofing. This motivated us to think of a solution that would filter out the malicious packets before being put onto the network. To come out with such a solution, a real time bot attack was generated with SpyEye Exploit kit and traffic characteristics were analyzed. The analysis revealed the existence of a unique repeated communication between the Zombie machine and the botmaster. This motivated us to propose, a Pattern Programmable Kernel Filter (PPKF) for filtering out the malicious packets generated by bots. PPKF was developed using the windows filtering platform (WFP) filter engine. PPKF was programmed to filter out the packets with unique pattern which were observed from the bot attack experiments. Further PPKF was found to completely suppress the flow of packets having the programmed uniqueness in them thus preventing the functioning of bots in terms of user information being sent to the Botmaster.Defence Science Journal, 2012, 62(1), pp.174-179, DOI:http://dx.doi.org/10.14429/dsj.62.142
Wie repräsentativ sind die Messdaten eines Honeynet?
Zur Früherkennung von kritischen Netzphänomenen wurden in der Vergangenheit viele Arten von verteilten Sensornetze im Internet etabliert und erforscht. Wir betrachten das Phänomen Verteilung von bösartiger Software im Netz'', das punktuell etwa mit dem InMAS-Sensorsystem gemessen werden kann. Unklar war jedoch immer die Frage, wie repräsentativ die Daten sind, die durch ein solches Sensornetz gesammelt werden. In diesem Dokument wird ein methodisches Rahmenwerk beschrieben, mit dem Maßzahlen der Repräsentativität an Messungen von Malware-Sensornetzen geheftet werden können. Als methodischer Ansatz wurden Techniken der empirischen Sozialforschung verwendet. Als Ergebnis ist festzuhalten, dass ein Sensornetz mit mindestens 100 zufällig über den Netzbereich verteilten Sensoren notwendig erscheint, um überhaupt belastbare Aussagen über die Repräsentativität von Sensornetz-Messungen machen zu können
Machine Learning and Security of Non-Executable Files
Computer malware is a well-known threat in security which, despite the enormous time and effort invested in fighting it, is today more prevalent than ever. Recent years have brought a surge in one particular type: malware embedded in non-executable file formats, e.g., PDF, SWF and various office file formats. The result has been a massive number of infections, owed primarily to the trust that ordinary computer users have in these file formats. In addition, their feature-richness and implementation complexity have created enormous attack surfaces in widely deployed client software, resulting in regular discoveries of new vulnerabilities.
The traditional approach to malware detection – signature matching, heuristics and behavioral profiling – has from its inception been a labor-intensive manual task, always lagging one step behind the attacker. With the exponential growth of computers and networks, malware has become more diverse, wide-spread and adaptive than ever, scaling much faster than the available talent pool of human malware analysts. An automated and scalable approach is needed to fill the gap between automated malware adaptation and manual malware detection, and machine learning is emerging as a viable solution. Its branch called adversarial machine learning studies the security of machine learning algorithms and the special conditions that arise when machine learning is applied for security.
This thesis is a study of adversarial machine learning in the context of static detection of malware in non-executable file formats. It evaluates the effectiveness, efficiency and security of machine learning applications in this context. To this end, it introduces 3 data-driven detection methods developed using very large, high quality datasets. PJScan detects malicious PDF files based on lexical properties of embedded JavaScript code and is the fastest method published to date. SL2013 extends its coverage to all PDF files, regardless of JavaScript presence, by analyzing the hierarchical structure of PDF logical building blocks and demonstrates excellent performance in a novel long-term realistic experiment. Finally, Hidost generalizes the hierarchical-structure-based feature set to become the first machine-learning-based malware detector operating on multiple file formats. In a comprehensive experimental evaluation on PDF and SWF, it outperforms other academic methods and commercial antivirus systems in detection effectiveness.
Furthermore, the thesis presents a framework for security evaluation of machine learning classifiers in a case study performed on an independent PDF malware detector. The results show that the ability to manipulate a part of the classifier’s feature set allows a malicious adversary to disguise malware so that it appears benign to the classifier with a high success rate. The presented methods are released as open-source software.Schadsoftware ist eine gut bekannte Sicherheitsbedrohung. Trotz der enormen Zeit und des Aufwands die investiert werden, um sie zu beseitigen, ist sie heute weiter verbreitet als je zuvor. In den letzten Jahren kam es zu einem starken Anstieg von Schadsoftware, welche in nicht-ausführbaren Dateiformaten, wie PDF, SWF und diversen Office-Formaten, eingebettet ist. Die Folge war eine massive Anzahl von Infektionen, ermöglicht durch das Vertrauen, das normale Rechnerbenutzer in diese Dateiformate haben. Außerdem hat die Komplexität und Vielseitigkeit dieser Dateiformate große Angriffsflächen in weitverbreiteter Klient-Software verursacht, und neue Sicherheitslücken werden regelmäßig entdeckt.
Der traditionelle Ansatz zur Erkennung von Schadsoftware – Mustererkennung, Heuristiken und Verhaltensanalyse – war vom Anfang an eine äußerst mühevolle Handarbeit, immer einen Schritt hinter den Angreifern zurück. Mit dem exponentiellen Wachstum von Rechenleistung und Netzwerkgeschwindigkeit ist Schadsoftware diverser, zahlreicher und schneller-anpassend geworden als je zuvor, doch die Verfügbarkeit von menschlichen Schadsoftware-Analysten kann nicht so schnell skalieren. Ein automatischer und skalierbarer Ansatz ist gefragt, und maschinelles Lernen tritt als eine brauchbare Lösung hervor. Ein Bereich davon, Adversarial Machine Learning, untersucht die Sicherheit von maschinellen Lernverfahren und die besonderen Verhältnisse, die bei der Anwendung von machinellem Lernen für Sicherheit entstehen.
Diese Arbeit ist eine Studie von Adversarial Machine Learning im Kontext statischer Schadsoftware-Erkennung in nicht-ausführbaren Dateiformaten. Sie evaluiert die Wirksamkeit, Leistungsfähigkeit und Sicherheit von maschinellem Lernen in diesem Kontext. Zu diesem Zweck stellt sie 3 datengesteuerte Erkennungsmethoden vor, die alle auf sehr großen und diversen Datensätzen entwickelt wurden. PJScan erkennt bösartige PDF-Dateien anhand lexikalischer Eigenschaften von eingebettetem JavaScript-Code und ist die schnellste bisher veröffentliche Methode. SL2013 erweitert die Erkennung auf alle PDF-Dateien, unabhängig davon, ob sie JavaScript enthalten, indem es die hierarchische Struktur von logischen PDF-Bausteinen analysiert. Es zeigt hervorragende Leistung in einem neuen, langfristigen und realistischen Experiment. Schließlich generalisiert Hidost den auf hierarchischen Strukturen basierten Merkmalsraum und wurde zum ersten auf maschinellem Lernen basierten Schadsoftware-Erkennungssystem, das auf mehreren Dateiformaten anwendbar ist. In einer umfassenden experimentellen Evaulierung auf PDF- und SWF-Formaten schlägt es andere akademische Methoden und kommerzielle Antiviren-Lösungen bezüglich Erkennungswirksamkeit.
Überdies stellt diese Doktorarbeit ein Framework für Sicherheits-Evaluierung von auf machinellem Lernen basierten Klassifikatoren vor und wendet es in einer Fallstudie auf eine unabhängige akademische Schadsoftware-Erkennungsmethode an. Die Ergebnisse zeigen, dass die Fähigkeit, nur einen Teil von Features, die ein Klasifikator verwendet, zu manipulieren, einem Angreifer ermöglicht, Schadsoftware in Dateien so einzubetten, dass sie von der Erkennungsmethode mit hoher Erfolgsrate als gutartig fehlklassifiziert wird. Die vorgestellten Methoden wurden als Open-Source-Software veröffentlicht
Exploiting tactics, techniques, and procedures for malware detection
There has been a meteoric rise in the use of malware to perpetrate cybercrime and more generally, serve the interests of malicious actors. As a result, malware has evolved both in terms of its sheer variety and sophistication. There is hence a need for developing effective malware detection systems to counter this surge. Typically, most such systems nowadays are purely data-driven - they utilise Machine Learning (ML) based approaches which rely on large volumes of data, to spot patterns, detect anomalies, and thus detect malware. In this thesis, we propose a methodology for malware detection on networks that combines human domain knowledge with conventional malware detection approaches to more effectively identify, reason about, and be resilient to malware. Specifically, we use domain knowledge in the form of the Tactics, Techniques, and Procedures (TTPs) described in the MITRE ATT\&CK ontology of adversarial behaviour to build Network Intrusion Detection Systems (NIDS). Through the course of our research, we design and evaluate the first such NIDS that can effectively exploit TTPs for the purpose of malware detection. We then attempt to expand the scope of usability of these TTPs to systems other than our specialised NIDS, and develop a methodology that lets any generic ML-based NIDS exploit these TTPs as model features. We further expand and generalise our approach by modelling it as a multi-label classification problem, which enables us to: (i) detect malware more precisely on the basis of individual TTPs, and (ii) identify the malicious usage of uncommon or rarely-used TTPs. Throughout all our experiments, we rigorously evaluate all our systems on several metrics using large datasets of real-world malware and benign samples. We empirically demonstrate the usefulness of TTPs in the malware detection process, the benefits of a TTP-based approach in reasoning about malware and responding to various challenging conditions, and the overall robustness of our systems to adversarial attack. As a consequence, we establish and improve the state-of-the-art when it comes to detecting network-based malware using TTP-based information. This thesis overall represents a step forward in building automated systems that combine purely-data driven approaches with human expertise in the field of malware analysis
EmPoWeb: Empowering Web Applications with Browser Extensions
Browser extensions are third party programs, tightly integrated to browsers,
where they execute with elevated privileges in order to provide users with
additional functionalities. Unlike web applications, extensions are not subject
to the Same Origin Policy (SOP) and therefore can read and write user data on
any web application. They also have access to sensitive user information
including browsing history, bookmarks, cookies and list of installed
extensions. Extensions have a permanent storage in which they can store data
and can trigger the download of arbitrary files on the user's device. For
security reasons, browser extensions and web applications are executed in
separate contexts. Nonetheless, in all major browsers, extensions and web
applications can interact by exchanging messages. Through these communication
channels, a web application can exploit extension privileged capabilities and
thereby access and exfiltrate sensitive user information. In this work, we
analyzed the communication interfaces exposed to web applications by Chrome,
Firefox and Opera browser extensions. As a result, we identified many
extensions that web applications can exploit to access privileged capabilities.
Through extensions' APIS, web applications can bypass SOP, access user cookies,
browsing history, bookmarks, list of installed extensions, extensions storage,
and download arbitrary files on the user's device. Our results demonstrate that
the communications between browser extensions and web applications pose serious
security and privacy threats to browsers, web applications and more importantly
to users. We discuss countermeasures and proposals, and believe that our study
and in particular the tool we used to detect and exploit these threats, can be
used as part of extensions review process by browser vendors to help them
identify and fix the aforementioned problems in extensions.Comment: 40th IEEE Symposium on Security and Privacy May 2019 Application
security; Attacks and defenses; Malware and unwanted software; Mobile and Web
security and privacy; Privacy technologies and mechanism
EmPoWeb: Empowering Web Applications with Browser Extensions
International audienceBrowser extensions are third party programs, tightly integrated to browsers, where they execute with elevated privileges in order to provide users with additional functionalities. Unlike web applications, extensions are not subject to the Same Origin Policy (SOP) and therefore can read and write user data on any web application. They also have access to sensitive user information including browsing history, bookmarks, credentials (cookies) and list of installed extensions. They have access to a permanent storage in which they can store data as long as they are installed in the user's browser. They can trigger the download of arbitrary files and save them on the user's device. For security reasons, browser extensions and web applications are executed in separate contexts. Nonetheless, in all major browsers, extensions and web applications can interact by exchanging messages. Through these communication channels, a web application can exploit extension privileged capabilities and thereby access and exfiltrate sensitive user information. In this work, we analyzed the communication interfaces exposed to web applications by Chrome, Firefox and Opera browser extensions. As a result, we identified many extensions that web applications can exploit to access privileged capabilities. Through extensions' APIS, web applications can bypass SOP and access user data on any other web application, access user credentials (cookies), browsing history, bookmarks, list of installed extensions, extensions storage, and download and save arbitrary files in the user's device. Our results demonstrate that the communications between browser extensions and web applications pose serious security and privacy threats to browsers, web applications and more importantly to users. We discuss countermeasures and proposals, and believe that our study and in particular the tool we used to detect and exploit these threats, can be used as part of extensions review process by browser vendors to help them identify and fix the aforementioned problems in extensions
Generating of DNS Service Attacks
Dostupnost a bezpeÄŤnost patřà mezi nejdĹŻleĹľitÄ›jšà poĹľadavky internetovĂ˝ch sluĹľeb. Proto je nezbytnĂ© detekovat sĂĹĄovĂ© anomálie, kterĂ© majĂ na tyto potĹ™eby nejvÄ›tšà vliv. Aby detekÄŤnĂ mechanismy drĹľely krok se stále propracovanÄ›jšĂmi anomáliemi, musĂ bĂ˝t na jejich vĂ˝voj kladen velkĂ˝ dĹŻraz. CĂlem tĂ©to práce je analĂ˝za a replikovánĂ nejÄŤastÄ›jšĂch ĂştokĹŻ na sluĹľbu DNS. SesbĂraná data z tÄ›chto generátorĹŻ ĂştokĹŻ mohou byt pouĹľitá pro lepšà pochopenĂ jejich chovánĂ, co vede ke tvorbÄ› kvalitnÄ›jšĂch detekÄŤnĂch mechanizmĹŻ.Availability and security are the most important requirements for Internet services. It is therefore necessary to detect the network anomalies, which have the greatest impact on these requirements. Therefore the great emphasis must be placed on development of detection mechanisms that keep pace with increasingly sophisticated network anomalies. The aim of this work is to analyze and replicate the most common DNS service attacks. Collected data from these attack generators can be used for better understanding of their behavior, what leads to improved and more effective detection mechanisms.
Recommended from our members
Identifying and Preventing Large-scale Internet Abuse
The widespread access to the Internet and the ubiquity of web-based services make it easy to communicate and interact globally. Unfortunately, the software and protocols implementing the functionality of these services are often vulnerable to attacks. In turn, an attacker can exploit them to compromise, take over, and abuse the services for her own nefarious purposes. In this dissertation, we aim to better understand such attacks, and we develop methods and algorithms to detect and prevent them, which we evaluate on large-scale datasets.First, we detail Meerkat, a system to detect a visible way in which websites are being compromised, namely website defacements. They can inflict significant harm on the websites’ operators through the loss of sales, the loss in reputation, or because of legal ramifications. Meerkat requires no prior knowledge about the websites’ content or their structure, but only the Uniform Resource Identifier (URI) at which they can be reached. By design, Meerkat mimics how a human analyst decides if a website was defaced when viewing it in a browser, by using computer vision techniques. Thus, it tackles the problem of detecting website defacements through their attention-seeking nature, their goal and purpose, rather than code or data artifacts that they might exhibit. In turn, it is much harder for an attacker to evade our system, as she needs to change her modus operandi. When Meerkat detects a website as defaced, the website can automatically be put into maintenance mode or restored to a known good state.An attacker, however, is not limited to abuse a compromised website in a way that is visible to the website’s visitors. Instead, she can misuse the website to infect its visitors with malicious software (malware). Although malware is well studied, identifying malicious websites remains a major challenge in today’s Internet. Second, we introduce Delta, a novel, purely static analysis approach that extracts change-related features between two versions of the same website, uses machine learning to derive a model of website changes, detects if an introduced change was malicious or benign, identifies the underlying infection vector based on clustering, and generates an identifying signature. Furthermore, due to the way Delta clusters campaigns, it can uncover infection campaigns that leverage specific vulnerable applications as a distribution channel, and it can greatly reduce the human labor necessary to uncover the application responsible for a service’s compromise.Third, we investigate the practicality and impact of domain takeover attacks, which an attacker can similarly abuse to spread misinformation or malware, and we present a defense on how such takeover attacks can be rendered toothless. Specifically, the new elasticity of Internet resources, in particular Internet protocol (IP) addresses in the context of Infrastructure-as-a-Service cloud service providers, combined with previously made protocol assumptions can lead to security issues. In Cloud Strife, we show that this dynamic component paired with recent developments in trust-based ecosystems (e.g., Transport Layer Security (TLS) certificates) creates so far unknown attack vectors. For example, a substantial number of stale domain name system (DNS) records points to readily available IP addresses in clouds, yet, they are still actively attempted to be accessed. Often, these records belong to discontinued services that were previously hosted in the cloud. We demonstrate that it is practical, and time and cost-efficient for attackers to allocate the IP addresses to which stale DNS records point. Further considering the ubiquity of domain validation in trust ecosystems, an attacker can impersonate the service by obtaining and using a valid certificate that is trusted by all major operating systems and browsers, which severely increases the attackers’ capabilities. The attacker can then also exploit residual trust in the domain name for phishing, receiving and sending emails, or possibly distributing code to clients that load remote code from the domain (e.g., loading of native code by mobile apps, or JavaScript libraries by websites). To prevent such attacks, we introduce a new authentication method for trust-based domain validation that mitigates staleness issues without incurring additional certificate requester effort by incorporating existing trust into the validation process.Finally, the analyses of Delta, Meerkat, and Cloud Strife have made use of large-scale measurements to assess our approaches’ impact and viability. Indeed, security research in general has made extensive use of exhaustive Internet-wide scans over the recent years, as they can provide significant insights into the state of security of the Internet (e.g., if classes of devices are behaving maliciously, or if they might be insecure and could turn malicious in an instant). However, the address space of the Internet’s core addressing protocol (Internet Protocol version 4; IPv4) is exhausted, and a migration to its successor (Internet Protocol version 6; IPv6), the only accepted long-term solution, is inevitable. In turn, to better understand the security of devices connected to the Internet, in particular Internet of Things devices, it is imperative to include IPv6 addresses in security evaluations and scans. Unfortunately, it is practically infeasible to iterate through the entire IPv6 address space, as it is 296 times larger than the IPv4 address space. Without enumerating hosts prior to scanning, we will be unable to retain visibility into the overall security of Internet-connected devices in the future, and we will be unable to detect and prevent their abuse or compromise. To mitigate this blind spot, we introduce a novel technique to enumerate part of the IPv6 address space by walking DNSSEC-signed IPv6 reverse zones. We show (i) that enumerating active IPv6 hosts is practical without a preferential network position contrary to common belief, (ii) that the security of active IPv6 hosts is currently still lagging behind the security state of IPv4 hosts, and (iii) that unintended default IPv6 connectivity is a major security issue