330 research outputs found
Lessons Learnt on Reproducibility in Machine Learning Based Android Malware Detection
A well-known curse of computer security research is that it often produces systems that, while technically sound, fail operationally. To overcome this curse, the community generally seeks to assess proposed systems under a variety of settings in order to make explicit every potential bias. In this respect, recently, research achievements on machine learning based malware detection are being considered for thorough evaluation by the community. Such an effort of comprehensive evaluation supposes first and foremost the possibility to perform an independent reproduction study in order to sharpen evaluations presented by approachesâ authors. The question Can published approaches actually be reproduced? thus becomes paramount despite the little interest such mundane and practical aspects seem to attract in the malware detection field. In this paper, we attempt a complete reproduction of five Android Malware Detectors from the literature and discuss to what extent they are âreproducibleâ. Notably, we provide insights on the implications around the guesswork that may be required to finalise a working implementation. Finally, we discuss how barriers to reproduction could be lifted, and how the malware detection field would benefit from stronger reproducibility standardsâlike many various fields already have
SoK: Cryptographically Protected Database Search
Protected database search systems cryptographically isolate the roles of
reading from, writing to, and administering the database. This separation
limits unnecessary administrator access and protects data in the case of system
breaches. Since protected search was introduced in 2000, the area has grown
rapidly; systems are offered by academia, start-ups, and established companies.
However, there is no best protected search system or set of techniques.
Design of such systems is a balancing act between security, functionality,
performance, and usability. This challenge is made more difficult by ongoing
database specialization, as some users will want the functionality of SQL,
NoSQL, or NewSQL databases. This database evolution will continue, and the
protected search community should be able to quickly provide functionality
consistent with newly invented databases.
At the same time, the community must accurately and clearly characterize the
tradeoffs between different approaches. To address these challenges, we provide
the following contributions:
1) An identification of the important primitive operations across database
paradigms. We find there are a small number of base operations that can be used
and combined to support a large number of database paradigms.
2) An evaluation of the current state of protected search systems in
implementing these base operations. This evaluation describes the main
approaches and tradeoffs for each base operation. Furthermore, it puts
protected search in the context of unprotected search, identifying key gaps in
functionality.
3) An analysis of attacks against protected search for different base
queries.
4) A roadmap and tools for transforming a protected search system into a
protected database, including an open-source performance evaluation platform
and initial user opinions of protected search.Comment: 20 pages, to appear to IEEE Security and Privac
NoVT: Eliminating C++ Virtual Calls to Mitigate Vtable Hijacking
The vast majority of nowadays remote code execution attacks target virtual function tables (vtables).
Attackers hijack vtable pointers to change the control flow of a vulnerable program to their will, resulting in full control over the underlying system.
In this paper, we present NoVT, a compiler-based defense against vtable hijacking.
Instead of protecting vtables for virtual dispatch, our solution replaces them with switch-case constructs that are inherently control-flow safe, thus preserving control flow integrity of C++ virtual dispatch.
NoVT extends Clang to perform a class hierarchy analysis on C++ source code.
Instead of a vtable, each class gets unique identifier numbers which are used to dispatch the correct method implementation.
Thereby, NoVT inherently protects all usages of a vtable, not just virtual dispatch.
We evaluate NoVT on common benchmark applications and real-world programs including Chromium.
Despite its strong security guarantees, NoVT improves runtime performance of most programs (mean overhead -0.5%, -3.7% min, 2% max).
In addition, protected binaries are slightly smaller than unprotected ones.
NoVT works on different CPU architectures and protects complex C++ programs against strong attacks like COOP and ShrinkWrap
Combatting Advanced Persistent Threat via Causality Inference and Program Analysis
Cyber attackers are becoming more and more sophisticated. In particular, Advanced Persistent Threat (APT) is a new class of attack that targets a specifc organization and compromises systems over a long time without being detected. Over the years, we have seen notorious examples of APTs including Stuxnet which disrupted Iranian nuclear centrifuges and data breaches affecting millions of users. Investigating APT is challenging as it occurs over an extended period of time and the attack process is highly sophisticated and stealthy. Also, preventing APTs is diffcult due to ever-expanding attack vectors.
In this dissertation, we present proposals for dealing with challenges in attack investigation. Specifcally, we present LDX which conducts precise counter-factual causality inference to determine dependencies between system calls (e.g., between input and output system calls) and allows investigators to determine the origin of an attack (e.g., receiving a spam email) and the propagation path of the attack, and assess the consequences of the attack. LDX is four times more accurate and two orders of magnitude faster than state-of-the-art taint analysis techniques. Moreover, we then present a practical model-based causality inference system, MCI, which achieves precise and accurate causality inference without requiring any modifcation or instrumentation in end-user systems.
Second, we show a general protection system against a wide spectrum of attack vectors and methods. Specifcally, we present A2C that prevents a wide range of attacks by randomizing inputs such that any malicious payloads contained in the inputs are corrupted. The protection provided by A2C is both general (e.g., against various attack vectors) and practical (7% runtime overhead)
Prompted User Retrieval of Secret Entropy: The Passmaze Protocol
A prompting protocol permits users to securely retrieve secrets with
greater entropy than passwords. The retrieved user secrets can have
enough entropy to be used to derive cryptographic keys
Machine Learning and Security of Non-Executable Files
Computer malware is a well-known threat in security which, despite the enormous time and effort invested in fighting it, is today more prevalent than ever. Recent years have brought a surge in one particular type: malware embedded in non-executable file formats, e.g., PDF, SWF and various office file formats. The result has been a massive number of infections, owed primarily to the trust that ordinary computer users have in these file formats. In addition, their feature-richness and implementation complexity have created enormous attack surfaces in widely deployed client software, resulting in regular discoveries of new vulnerabilities.
The traditional approach to malware detection â signature matching, heuristics and behavioral profiling â has from its inception been a labor-intensive manual task, always lagging one step behind the attacker. With the exponential growth of computers and networks, malware has become more diverse, wide-spread and adaptive than ever, scaling much faster than the available talent pool of human malware analysts. An automated and scalable approach is needed to fill the gap between automated malware adaptation and manual malware detection, and machine learning is emerging as a viable solution. Its branch called adversarial machine learning studies the security of machine learning algorithms and the special conditions that arise when machine learning is applied for security.
This thesis is a study of adversarial machine learning in the context of static detection of malware in non-executable file formats. It evaluates the effectiveness, efficiency and security of machine learning applications in this context. To this end, it introduces 3 data-driven detection methods developed using very large, high quality datasets. PJScan detects malicious PDF files based on lexical properties of embedded JavaScript code and is the fastest method published to date. SL2013 extends its coverage to all PDF files, regardless of JavaScript presence, by analyzing the hierarchical structure of PDF logical building blocks and demonstrates excellent performance in a novel long-term realistic experiment. Finally, Hidost generalizes the hierarchical-structure-based feature set to become the first machine-learning-based malware detector operating on multiple file formats. In a comprehensive experimental evaluation on PDF and SWF, it outperforms other academic methods and commercial antivirus systems in detection effectiveness.
Furthermore, the thesis presents a framework for security evaluation of machine learning classifiers in a case study performed on an independent PDF malware detector. The results show that the ability to manipulate a part of the classifierâs feature set allows a malicious adversary to disguise malware so that it appears benign to the classifier with a high success rate. The presented methods are released as open-source software.Schadsoftware ist eine gut bekannte Sicherheitsbedrohung. Trotz der enormen Zeit und des Aufwands die investiert werden, um sie zu beseitigen, ist sie heute weiter verbreitet als je zuvor. In den letzten Jahren kam es zu einem starken Anstieg von Schadsoftware, welche in nicht-ausfĂŒhrbaren Dateiformaten, wie PDF, SWF und diversen Office-Formaten, eingebettet ist. Die Folge war eine massive Anzahl von Infektionen, ermöglicht durch das Vertrauen, das normale Rechnerbenutzer in diese Dateiformate haben. AuĂerdem hat die KomplexitĂ€t und Vielseitigkeit dieser Dateiformate groĂe AngriffsflĂ€chen in weitverbreiteter Klient-Software verursacht, und neue SicherheitslĂŒcken werden regelmĂ€Ăig entdeckt.
Der traditionelle Ansatz zur Erkennung von Schadsoftware â Mustererkennung, Heuristiken und Verhaltensanalyse â war vom Anfang an eine Ă€uĂerst mĂŒhevolle Handarbeit, immer einen Schritt hinter den Angreifern zurĂŒck. Mit dem exponentiellen Wachstum von Rechenleistung und Netzwerkgeschwindigkeit ist Schadsoftware diverser, zahlreicher und schneller-anpassend geworden als je zuvor, doch die VerfĂŒgbarkeit von menschlichen Schadsoftware-Analysten kann nicht so schnell skalieren. Ein automatischer und skalierbarer Ansatz ist gefragt, und maschinelles Lernen tritt als eine brauchbare Lösung hervor. Ein Bereich davon, Adversarial Machine Learning, untersucht die Sicherheit von maschinellen Lernverfahren und die besonderen VerhĂ€ltnisse, die bei der Anwendung von machinellem Lernen fĂŒr Sicherheit entstehen.
Diese Arbeit ist eine Studie von Adversarial Machine Learning im Kontext statischer Schadsoftware-Erkennung in nicht-ausfĂŒhrbaren Dateiformaten. Sie evaluiert die Wirksamkeit, LeistungsfĂ€higkeit und Sicherheit von maschinellem Lernen in diesem Kontext. Zu diesem Zweck stellt sie 3 datengesteuerte Erkennungsmethoden vor, die alle auf sehr groĂen und diversen DatensĂ€tzen entwickelt wurden. PJScan erkennt bösartige PDF-Dateien anhand lexikalischer Eigenschaften von eingebettetem JavaScript-Code und ist die schnellste bisher veröffentliche Methode. SL2013 erweitert die Erkennung auf alle PDF-Dateien, unabhĂ€ngig davon, ob sie JavaScript enthalten, indem es die hierarchische Struktur von logischen PDF-Bausteinen analysiert. Es zeigt hervorragende Leistung in einem neuen, langfristigen und realistischen Experiment. SchlieĂlich generalisiert Hidost den auf hierarchischen Strukturen basierten Merkmalsraum und wurde zum ersten auf maschinellem Lernen basierten Schadsoftware-Erkennungssystem, das auf mehreren Dateiformaten anwendbar ist. In einer umfassenden experimentellen Evaulierung auf PDF- und SWF-Formaten schlĂ€gt es andere akademische Methoden und kommerzielle Antiviren-Lösungen bezĂŒglich Erkennungswirksamkeit.
Ăberdies stellt diese Doktorarbeit ein Framework fĂŒr Sicherheits-Evaluierung von auf machinellem Lernen basierten Klassifikatoren vor und wendet es in einer Fallstudie auf eine unabhĂ€ngige akademische Schadsoftware-Erkennungsmethode an. Die Ergebnisse zeigen, dass die FĂ€higkeit, nur einen Teil von Features, die ein Klasifikator verwendet, zu manipulieren, einem Angreifer ermöglicht, Schadsoftware in Dateien so einzubetten, dass sie von der Erkennungsmethode mit hoher Erfolgsrate als gutartig fehlklassifiziert wird. Die vorgestellten Methoden wurden als Open-Source-Software veröffentlicht
Recommended from our members
Design and Implementation of Algorithms for Traffic Classification
Traffic analysis is the practice of using inherent characteristics of a network flow such as timings, sizes, and orderings of the packets to derive sensitive information about it. Traffic analysis techniques are used because of the extensive adoption of encryption and content-obfuscation mechanisms, making it impossible to infer any information about the flows by analyzing their content. In this thesis, we use traffic analysis to infer sensitive information for different objectives and different applications. Specifically, we investigate various applications: p2p cryptocurrencies, flow correlation, and messaging applications. Our goal is to tailor specific traffic analysis algorithms that best capture network trafficâs intrinsic characteristics in those applications for each of these applications. Also, the objective of traffic analysis is different for each of these applications. Specifically, in Bitcoin, our goal is to evaluate Bitcoin trafficâs resilience to blocking by powerful entities such as governments and ISPs. Bitcoin and similar cryptocurrencies play an important role in electronic commerce and other trust-based distributed systems because of their significant advantage over traditional currencies, including open access to global e-commerce. Therefore, it is essential to
the consumers and the industry to have reliable access to their Bitcoin assets. We also examine stepping stone attacks for flow correlation. A stepping stone is a host that an attacker uses to relay her traffic to hide her identity. We introduce two fingerprinting systems, TagIt and FINN. TagIt embeds a secret fingerprint into the flows by moving the packets to specific time intervals. However, FINN utilizes DNNs to embed the fingerprint by changing the inter-packet delays (IPDs) in the flow. In messaging applications, we analyze the WhatsApp messaging service to determine if traffic leaks any sensitive information such as membersâ identity in a particular conversation to the adversaries who watch their encrypted traffic. These messaging applicationsâ privacy is essential because these services provide an environment to dis- cuss politically sensitive subjects, making them a target to government surveillance and censorship in totalitarian countries. We take two technical approaches to design our traffic analysis techniques. The increasing use of DNN-based classifiers inspires our first direction: we train DNN classifiers to perform some specific traffic analysis task. Our second approach is to inspect and model the shape of traffic in the target application and design a statistical classifier for the expected shape of traffic. DNN- based methods are useful when the network is complex, and the trafficâs underlying noise is not linear. Also, these models do not need a meticulous analysis to extract the features. However, deep learning techniques need a vast amount of training data to work well. Therefore, they are not beneficial when there is insufficient data avail- able to train a generalized model. On the other hand, statistical methods have the advantage that they do not have training overhead
POISED: Spotting Twitter Spam Off the Beaten Paths
Cybercriminals have found in online social networks a propitious medium to
spread spam and malicious content. Existing techniques for detecting spam
include predicting the trustworthiness of accounts and analyzing the content of
these messages. However, advanced attackers can still successfully evade these
defenses.
Online social networks bring people who have personal connections or share
common interests to form communities. In this paper, we first show that users
within a networked community share some topics of interest. Moreover, content
shared on these social network tend to propagate according to the interests of
people. Dissemination paths may emerge where some communities post similar
messages, based on the interests of those communities. Spam and other malicious
content, on the other hand, follow different spreading patterns.
In this paper, we follow this insight and present POISED, a system that
leverages the differences in propagation between benign and malicious messages
on social networks to identify spam and other unwanted content. We test our
system on a dataset of 1.3M tweets collected from 64K users, and we show that
our approach is effective in detecting malicious messages, reaching 91%
precision and 93% recall. We also show that POISED's detection is more
comprehensive than previous systems, by comparing it to three state-of-the-art
spam detection systems that have been proposed by the research community in the
past. POISED significantly outperforms each of these systems. Moreover, through
simulations, we show how POISED is effective in the early detection of spam
messages and how it is resilient against two well-known adversarial machine
learning attacks
- âŠ