Leveraging the Cloud for Software Security Services.
This thesis seeks to leverage the advances in cloud computing in order to address modern
security threats, allowing for completely novel architectures that provide dramatic
improvements and asymmetric gains beyond what is possible using current approaches.
Indeed, many of the critical security problems facing the Internet and its users are inadequately
addressed by current security technologies. Current security measures often are deployed
in an exclusively network-based or host-based model, limiting their efficacy against
modern threats. However, advancements over the past decade in cloud computing and
high-speed networking have ushered in a new era of software services. Software services
that were previously deployed on-premise in organizations and enterprises are now being
outsourced to the cloud, leading to fundamentally new models in how software services are
sold, consumed, and managed.
This thesis focuses on how novel software security services can be deployed that leverage
the cloud to scale elegantly in their capabilities, performance, and management. First,
we introduce a novel architecture for malware detection in the cloud. Next, we propose
a cloud service to protect modern mobile devices, an ever-increasing target for malicious
attackers. Then, we discuss and demonstrate the ability for attackers to leverage the same
benefits of cloud-centric services for malicious purposes. Next, we present new techniques
for the large-scale analysis and classification of malicious software. Lastly, to demonstrate
the benefits of cloud-centric architectures outside the realm of malicious software,
we present a threshold signature scheme that leverages the cloud for robustness and resiliency.
Ph.D., Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/91385/1/jonojono_1.pd
Reviewer Integration and Performance Measurement for Malware Detection
We present and evaluate a large-scale malware detection system integrating
machine learning with expert reviewers, treating reviewers as a limited
labeling resource. We demonstrate that even in small numbers, reviewers can
vastly improve the system's ability to keep pace with evolving threats. We
conduct our evaluation on a sample of VirusTotal submissions spanning 2.5 years
and containing 1.1 million binaries with 778GB of raw feature data. Without
reviewer assistance, we achieve 72% detection at a 0.5% false positive rate,
performing comparably to the best vendors on VirusTotal. Given a budget of 80
accurate reviews daily, we improve detection to 89% and are able to detect 42%
of malicious binaries undetected upon initial submission to VirusTotal.
Additionally, we identify a previously unnoticed temporal inconsistency in the
labeling of training datasets. We compare the impact of training labels
obtained at the same time training data is first seen with training labels
obtained months later. We find that using training labels obtained well after
samples appear, and thus unavailable in practice for current training data,
inflates measured detection by almost 20 percentage points. We release our
cluster-based implementation, as well as a list of all hashes in our evaluation
and 3% of our entire dataset.
Comment: 20 papers, 11 figures, accepted at the 13th Conference on Detection
of Intrusions and Malware & Vulnerability Assessment (DIMVA 2016)
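The abstract above reports detection rates at a fixed 0.5% false positive rate. A minimal sketch of how such a figure can be computed from classifier scores follows. The function name `detection_at_fpr` and the convention that higher scores mean "more likely malicious" are assumptions for illustration, not details taken from the paper.

```python
def detection_at_fpr(scores, labels, max_fpr=0.005):
    """Return (threshold, detection_rate) such that the false positive
    rate on benign samples does not exceed max_fpr.

    scores: classifier outputs, higher = more suspicious (assumed convention)
    labels: 1 for malicious, 0 for benign
    """
    benign = sorted((s for s, y in zip(scores, labels) if y == 0), reverse=True)
    malicious = [s for s, y in zip(scores, labels) if y == 1]
    if not benign or not malicious:
        raise ValueError("need at least one benign and one malicious sample")

    # Number of benign samples we are allowed to flag by mistake.
    allowed = int(max_fpr * len(benign))
    # A sample is flagged when its score is strictly above the threshold,
    # so at most `allowed` benign samples can exceed it.
    threshold = benign[allowed]

    detected = sum(1 for s in malicious if s > threshold)
    return threshold, detected / len(malicious)


if __name__ == "__main__":
    scores = [0.1, 0.2, 0.3, 0.9, 0.95, 0.5, 0.85, 1.0]
    labels = [0, 0, 0, 0, 1, 1, 1, 1]
    print(detection_at_fpr(scores, labels, max_fpr=0.0))
```

Fixing the false positive budget first and then reading off the detection rate makes results comparable across classifiers whose raw scores are on different scales.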
RanDeter : using novel statistical and physical controls to deter ransomware attacks : a thesis presented in partial fulfillment of the requirements for the degree of Master of Information Sciences in Software Engineering at Massey University, Auckland, New Zealand
Crypto-ransomware is a type of extortion-based malware that encrypts victims'
personal files with strong encryption algorithms and blackmails victims into paying a
ransom to recover their files. The recurrent episodes of high-profile ransomware attacks like
WannaCry and Petya, particularly on healthcare, government agencies and large
corporations, have highlighted the immediate demand for effective defense mechanisms.
In this paper, RANDETER is introduced as a novel anti-crypto-ransomware solution
that deters ransomware activities, using novel statistical and physical controls inspired by
police anti-terrorism practice. Police try to maintain public safety by maintaining a
constant presence to patrol key public areas, identifying suspects who exhibit out-of-ordinary
characteristics, and restricting access to protected areas. Ransomware attacks are in
many ways like terrorist attacks: they are unexpected, malicious, and aim for the largest
number of victims. It is possible to try to detect and deter crypto-ransomware by
maintaining constant surveillance of the potential targets: the master boot record (MBR)
and user files, especially documents and photos.
RANDETER is implemented as two compatible and complementary modules:
PARTITION GUARD and FILE PATROL. PARTITION GUARD blocks modifications to the MBR
area of the boot disk. FILE PATROL checks all file activities in directories protected
by RANDETER against a list of Recognized Processes with Multi-Tier Security Rules.
Upon detecting violations of such rules that FILE PATROL judges may have been
initiated by crypto-ransomware, FILE PATROL freezes access to the monitored
directories, terminates the offending processes, and then resumes access to those directories.
Our evaluation demonstrated that RANDETER could ensure less, and often no,
irrecoverable file damage from current ransomware families, while imposing lower disk
performance overhead than existing anti-ransomware
implementations such as CRYPTOLOCK, SHIELDFS and REDEMPTION. In addition, RANDETER
was shown to be resilient against masquerading attacks and ransomware polymorphism.
Efficient Machine Learning for Attack Detection (Effizientes Maschinelles Lernen für die Angriffserkennung)
Detecting and fending off attacks on computer systems is an enduring
problem in computer security. In light of a plethora of different
threats and the growing automation used by attackers, we are in urgent
need of more advanced methods for attack detection.
In this thesis, we address the necessity of advanced attack detection
and develop methods to detect attacks using machine learning to
establish a higher degree of automation for reactive security. Machine
learning is data-driven and not free of bias. Effective application of
machine learning to attack detection therefore requires periodic
retraining over time. However, the training complexity of
many learning-based approaches is substantial. We show that with the
right data representation, efficient algorithms for mining substring
statistics, and implementations based on probabilistic data structures,
training the underlying model can be achieved in linear time.
In two different scenarios, we demonstrate the effectiveness of
so-called language models that allow us to generically portray the content
and structure of attacks: on the one hand, we learn the malicious
behavior of Flash-based malware using classification, and on the other
hand, we detect intrusions by learning normality in industrial control
networks using anomaly detection. With a data throughput of up to
580 Mbit/s during training, we not only meet our expectations with
respect to runtime but also outperform related approaches by up to an
order of magnitude in detection performance. The same techniques that
facilitate learning in the previous scenarios can also be used for
revealing malicious content, embedded in passive file formats, such as
Microsoft Office documents. As a further showcase, we additionally
develop a method based on the efficient mining of substring statistics
that is able to break obfuscations irrespective of the key length used,
at up to 25 Mbit/s, and thus succeeds where related approaches fail.
These methods significantly improve detection performance and enable
operation in linear time. In doing so, we counteract the trend of
compensating increasing runtime requirements with resources. While the
results are promising and the approaches provide urgently needed
automation, they cannot and are not intended to replace human experts or
traditional approaches, but are designed to assist and complement them.
Detecting and defending against attacks on end users and networks has been
a persistent problem in computer security for many years.
Given the large number of different attack vectors and
the increasing automation of attacks, modern methods for attack
detection are urgently needed.
In this doctoral thesis, approaches are developed to detect attacks
reliably and also efficiently using machine learning methods.
They counter the automation of attacks with a
correspondingly high degree of automation of defensive measures.
Training such methods, however, is computationally
expensive and is carried out on very large amounts of data. Runtime-efficient
learning methods are therefore crucial. We show that, by using
efficient algorithms for the statistical analysis of strings
and implementations based on probabilistic data structures,
learning effective attack detection is possible even in linear
time.
Based on two different use cases, we demonstrate
the effectiveness of models based on the extraction of so-called
n-grams: on the one hand, we consider the detection of
Flash-based malicious code using classification methods, and on the
other, the detection of attacks on industrial networks and
SCADA systems using anomaly detection. In doing so, we achieve
a data throughput of up to 580 Mbit/s while training these models
and at the same time clearly surpass the detection performance of
other approaches. The same techniques that make these learning
approaches possible can also be used to detect malicious code
that is embedded in other file formats and obfuscated with
simple encryption. To this end, we develop a
method that breaks simple encryption based on the statistical
evaluation of strings. The approach works
independently of the key length used, with a data throughput
of up to 25 Mbit/s, and thus enables successful deobfuscation
in cases where other approaches fail.
The results achieved with regard to runtime efficiency and
detection performance are promising. The presented methods
enable the urgently needed automation of
defensive measures, but are not intended to replace experts or
established methods; rather, they are meant to support and complement them.
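The abstract above combines n-gram models with probabilistic data structures to learn normality in linear time. As a rough illustration of that idea, the sketch below stores the byte n-grams of benign training data in a Bloom filter and scores new data by the fraction of n-grams never seen during training. The class and function names, the filter size, the number of hash functions, and the scoring rule are all assumptions chosen for illustration; the thesis's actual models and data structures may differ.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter over byte strings (illustrative parameters)."""

    def __init__(self, num_bits=1 << 20, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item):
        # Derive independent hash values by salting BLAKE2b with the index.
        for i in range(self.num_hashes):
            digest = hashlib.blake2b(item, digest_size=8, salt=bytes([i])).digest()
            yield int.from_bytes(digest, "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def ngrams(data, n=3):
    """Yield all overlapping byte n-grams of data."""
    return (data[i:i + n] for i in range(len(data) - n + 1))


def train(samples, n=3):
    """Insert every n-gram of the benign samples; one pass over the data."""
    bloom = BloomFilter()
    for sample in samples:
        for gram in ngrams(sample, n):
            bloom.add(gram)
    return bloom


def anomaly_score(bloom, data, n=3):
    """Fraction of n-grams in data that were not seen during training."""
    grams = list(ngrams(data, n))
    if not grams:
        return 0.0
    unseen = sum(1 for gram in grams if gram not in bloom)
    return unseen / len(grams)


if __name__ == "__main__":
    model = train([b"GET /index.html HTTP/1.1", b"GET /about.html HTTP/1.1"])
    print(anomaly_score(model, b"GET /index.html HTTP/1.1"))
    print(anomaly_score(model, b"\x90\x90\x90\x90\xcc\xcc"))
```

For a fixed n, both training and scoring touch each input byte a constant number of times, which is where the linear running time comes from. A Bloom filter never reports a stored n-gram as missing, so previously seen traffic always scores 0.0, while rare false positives can only lower the score of unseen data.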
Resilient and Scalable Android Malware Fingerprinting and Detection
Malicious software (Malware) proliferation reaches hundreds of thousands daily. The manual analysis of such a large volume of malware is daunting and time-consuming. The diversity of targeted systems in terms of architecture and platforms compounds the challenges of Android malware detection and malware in general. This highlights the need to design and implement new scalable and robust methods, techniques, and tools to detect Android malware. In this thesis, we develop a malware fingerprinting framework to cover accurate Android malware detection and family attribution. In this context, we emphasize the following: (i) the scalability over a large malware corpus; (ii) the resiliency to common obfuscation techniques; (iii) the portability over different platforms and architectures.
In the context of bulk and offline detection on the laboratory/vendor level: First, we propose an approximate fingerprinting technique for Android packaging that captures the underlying static structure of the Android apps. We also propose a malware clustering framework on top of this fingerprinting technique to perform unsupervised malware detection and grouping by building and partitioning a similarity network of malicious apps. Second, we propose an approximate fingerprinting technique for Android malware's behavior reports generated using dynamic analyses leveraging natural language processing techniques. Based on this fingerprinting technique, we propose a portable malware detection and family threat attribution framework employing supervised machine learning techniques. Third, we design an automatic framework to produce intelligence about the underlying malicious cyber-infrastructures of Android malware. We leverage graph analysis techniques to generate relevant, actionable, and granular intelligence that can be used to identify the threat effects induced by malicious Internet activity associated with malicious Android apps.
In the context of single-app and online detection on the mobile device level, we further propose the following: Fourth, we design a portable and effective Android malware detection system that is suitable for deployment on mobile and resource-constrained devices, using machine learning classification on raw method call sequences. Fifth, we elaborate a framework for Android malware detection that is resilient to common code obfuscation techniques and adaptive to operating system and malware changes over time, using natural language processing and deep learning techniques.
We also evaluate the portability of the proposed techniques and methods beyond Android platform malware, as follows: Sixth, we leverage the previously elaborated techniques to build a framework for cross-platform ransomware fingerprinting relying on raw hybrid features in conjunction with advanced deep learning techniques.
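The clustering step described in this abstract builds a similarity network of apps and partitions it into groups. A minimal sketch of that general idea follows, under stated assumptions: each app fingerprint is modeled as a plain set of features, similarity is Jaccard overlap, and the network is partitioned into connected components with union-find. The thesis does not specify these choices here, so the names `jaccard` and `cluster`, the threshold, and the partitioning method are all illustrative stand-ins.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two feature sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster(fingerprints, threshold=0.5):
    """Group apps whose fingerprints are linked by similarity >= threshold.

    fingerprints: dict mapping app name -> set of features (assumed format)
    Returns a list of sets of app names, one per connected component.
    """
    parent = {name: name for name in fingerprints}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # An edge of the similarity network exists when two apps are similar enough.
    for a, b in combinations(fingerprints, 2):
        if jaccard(fingerprints[a], fingerprints[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in fingerprints:
        groups.setdefault(find(name), set()).add(name)
    return sorted(groups.values(), key=lambda group: sorted(group))


if __name__ == "__main__":
    apps = {"a": {1, 2, 3}, "b": {2, 3, 4}, "c": {9, 10}}
    print(cluster(apps))
```

The pairwise comparison is quadratic in the number of apps, so a large corpus would need an approximate nearest-neighbor or hashing step in front of it; that part is left out of this sketch.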