119 research outputs found

    Provenance-Aware Tracing of Worm Break-in and Contaminations: A Process Coloring Approach

    Get PDF
    To investigate the exploitation and contamination by self-propagating Internet worms, a provenanceaware tracing mechanism is highly desirable. Provenance unawareness causes difficulties in fast and accurate identification of a worm’s break-in point (namely, a remotely-accessible vulnerable service running in the infected host), and incurs significant log data inspection overhead. This paper presents the design, implementation, and evaluation of process coloring, an efficient provenance-aware approach to worm breakin and contamination tracing. More specifically, process coloring assigns a “color”, a unique system-wide identifier, to each remotely-accessible server or process. The color will then be either inherited by spawned child processes or diffused indirectly through process actions (e.g., read or write operations). Process coloring brings two major advantages: (1) It enables fast color-based identification of the break-in point exploited by a worm even before detailed log analysis; (2) It naturally partitions log data according to their associated colors, effectively reducing the volume of log data that need to be examined and correspondingly, log processing overhead for worm investigation. A tamper-resistant log collection method is developed based on the virtual machine introspection technique. Our experiments with a number of real-world worms demonstrate the advantages of processing coloring. For example, to reveal detaile

    Detecting worm mutations using machine learning

    Get PDF
    Worms are malicious programs that spread over the Internet without human intervention. Since worms generally spread faster than humans can respond, the only viable defence is to automate their detection. Network intrusion detection systems typically detect worms by examining packet or flow logs for known signatures. Not only does this approach mean that new worms cannot be detected until the corresponding signatures are created, but that mutations of known worms will remain undetected because each mutation will usually have a different signature. The intuitive and seemingly most effective solution is to write more generic signatures, but this has been found to increase false alarm rates and is thus impractical. This dissertation investigates the feasibility of using machine learning to automatically detect mutations of known worms. First, it investigates whether Support Vector Machines can detect mutations of known worms. Support Vector Machines have been shown to be well suited to pattern recognition tasks such as text categorisation and hand-written digit recognition. Since detecting worms is effectively a pattern recognition problem, this work investigates how well Support Vector Machines perform at this task. The second part of this dissertation compares Support Vector Machines to other machine learning techniques in detecting worm mutations. Gaussian Processes, unlike Support Vector Machines, automatically return confidence values as part of their result. Since confidence values can be used to reduce false alarm rates, this dissertation determines how Gaussian Process compare to Support Vector Machines in terms of detection accuracy. For further comparison, this work also compares Support Vector Machines to K-nearest neighbours, known for its simplicity and solid results in other domains. The third part of this dissertation investigates the automatic generation of training data. Classifier accuracy depends on good quality training data -- the wider the training data spectrum, the higher the classifier's accuracy. This dissertation describes the design and implementation of a worm mutation generator whose output is fed to the machine learning techniques as training data. This dissertation then evaluates whether the training data can be used to train classifiers of sufficiently high quality to detect worm mutations. The findings of this work demonstrate that Support Vector Machines can be used to detect worm mutations, and that the optimal configuration for detection of worm mutations is to use a linear kernel with unnormalised bi-gram frequency counts. Moreover, the results show that Gaussian Processes and Support Vector Machines exhibit similar accuracy on average in detecting worm mutations, while K-nearest neighbours consistently produces lower quality predictions. The generated worm mutations are shown to be of sufficiently high quality to serve as training data. Combined, the results demonstrate that machine learning is capable of accurately detecting mutations of known worms

    An Introduction to Malware

    Get PDF

    Malware: A future framework for Device, Network and Service Management

    Get PDF
    While worms and their propagation have been a major security threat over the past years, causing major financial losses and down times for many enterprises connected to the Internet, we will argue in this paper that valuable lessons can be learned from them and that network management, which is the activity supposed to prevent them, can actually benefit from their use. We focus on five lessons learned from current malware that can benefit to the network management community. For each topic, we analyse how it is been addressed in standard management frameworks, we identify their limits and describe how current malware already provides efficient solutions to these limits. We illustrate our claim through a case study on a realistic application of worm based network management, which is currently developed in our group

    Polygraph: Automatically generating signatures for polymorphic worms

    Get PDF
    It is widely believed that content-signature-based intrusion detection systems (IDSes) are easily evaded by polymorphic worms, which vary their payload on every infection attempt. In this paper, we present Polygraph, a signature generation system that successfully produces signatures that match polymorphic worms. Polygraph generates signatures that consist of multiple disjoint content sub-strings. In doing so, Polygraph leverages our insight that for a real-world exploit to function properly, multiple invariant substrings must often be present in all variants of a payload; these substrings typically correspond to protocol framing, return addresses, and in some cases, poorly obfuscated code. We contribute a definition of the polymorphic signature generation problem; propose classes of signature suited for matching polymorphic worm payloads; and present algorithms for automatic generation of signatures in these classes. Our evaluation of these algorithms on a range of polymorphic worms demonstrates that Polygraph produces signatures for polymorphic worms that exhibit low false negatives and false positives. © 2005 IEEE

    The malSource dataset: quantifying complexity and code reuse in malware development

    Get PDF
    During the last decades, the problem of malicious and unwanted software (malware) has surged in numbers and sophistication. Malware plays a key role in most of today's cyberattacks and has consolidated as a commodity in the underground economy. In this paper, we analyze the evolution of malware from 1975 to date from a software engineering perspective. We analyze the source code of 456 samples from 428 unique families and obtain measures of their size, code quality, and estimates of the development costs (effort, time, and number of people). Our results suggest an exponential increment of nearly one order of magnitude per decade in aspects such as size and estimated effort, with code quality metrics similar to those of benign software. We also study the extent to which code reuse is present in our dataset. We detect a significant number of code clones across malware families and report which features and functionalities are more commonly shared. Overall, our results support claims about the increasing complexity of malware and its production progressively becoming an industry.This work was supported in part by the Spanish Government through MINECO grants SMOG-DEV (TIN2016-79095-C2-2-R) and DEDETIS (TIN2015-7013-R), and in part by the Regional Government of Madrid through grantsCIBERDINE (S2013/ICE-3095) and N-GREENS (S2013/ICE-2731)
    corecore