
    Probabilistic Naming of Functions in Stripped Binaries

    Debugging symbols in binary executables carry the names of functions and global variables. When present, they greatly simplify the process of reverse engineering, but they are almost always removed (stripped) for deployment. We present the design and implementation of punstrip, a tool which combines a probabilistic fingerprint of binary code based on high-level features with a probabilistic graphical model to learn the relationship between function names and program structure. As there are many naming conventions and developer styles, functions from different applications do not necessarily have exactly the same name, even when they implement exactly the same functionality. We therefore evaluate punstrip across three levels of name matching: exact matching; an approach based on natural language processing of name components; and Symbol2Vec, a new embedding of function names based on random walks over function call graphs. We show that our approach is able to recognize functions compiled with different compilers and optimization levels, and then demonstrate that punstrip can predict semantically similar function names based on code structure. We evaluate our approach on open source C binaries from the Debian Linux distribution and compare against the state of the art.
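    As a rough, hypothetical sketch of the Symbol2Vec idea described above (random walks over a function call graph treated as sentences for a skip-gram model), the snippet below shows one way such an embedding could be trained. The toy call graph, the use of networkx and gensim, and all parameter values are assumptions for illustration, not the punstrip implementation.

```python
# Hypothetical Symbol2Vec-style sketch (not the punstrip code): random walks
# over a call graph are fed to a skip-gram model so that functions which
# co-occur on walks end up with nearby vectors.
import random
import networkx as nx
from gensim.models import Word2Vec  # assumes gensim >= 4.0

def random_walks(call_graph: nx.DiGraph, walks_per_node=10, walk_len=8):
    walks = []
    for start in call_graph.nodes:
        for _ in range(walks_per_node):
            walk, node = [start], start
            for _ in range(walk_len - 1):
                callees = list(call_graph.successors(node))
                if not callees:
                    break
                node = random.choice(callees)
                walk.append(node)
            walks.append(walk)
    return walks

# Toy call graph with function names as nodes (illustrative only).
g = nx.DiGraph()
g.add_edges_from([("main", "parse_args"), ("main", "read_file"),
                  ("read_file", "xmalloc"), ("parse_args", "xmalloc")])

model = Word2Vec(random_walks(g), vector_size=32, window=3,
                 min_count=1, sg=1, epochs=20)
print(model.wv.most_similar("xmalloc", topn=2))
```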

    Malware classification using self organising feature maps and machine activity data

    In this article we use machine activity metrics to automatically distinguish between malicious and trusted portable executable software samples. The motivation stems from the growth of cyber attacks using techniques that have been employed to surreptitiously deploy Advanced Persistent Threats (APTs). APTs are becoming more sophisticated and are able to obfuscate many of their identifiable features through encryption, custom code bases and in-memory execution. Our hypothesis is that we can distinguish malicious from trusted samples with a high degree of accuracy using machine learning with features derived from the inescapable footprint left behind on a computer system during execution. This footprint includes CPU, RAM and swap use, and network traffic counted in bytes and packets. These features are continuous and allow more flexibility in classifying samples than discrete features such as API calls (which can also be obfuscated), which form the main feature of the extant literature. Using these continuous data, we develop a novel classification method based on Self-Organising Feature Maps, which reduces overfitting during training by building unsupervised clusters of similar 'behaviour' that are subsequently used as features for classification, rather than the raw data. We compare our method to a set of machine classification methods applied in previous research and, on an unseen dataset, demonstrate an increase in classification accuracy of between 7.24% and 25.68% over that range of methods.
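    A minimal sketch of the general approach, clustering continuous activity metrics with a self-organising map and then using each sample's best-matching unit as a compact feature for a supervised classifier, could look like the following. MiniSom, the synthetic feature layout and the random-forest classifier are illustrative assumptions, not the paper's exact pipeline.

```python
# Hedged sketch: map machine-activity metrics onto a small SOM, then use
# each sample's best-matching unit (BMU) as the feature for a classifier.
import numpy as np
from minisom import MiniSom                      # pip install minisom
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
# Toy data: rows = samples, cols = [cpu%, ram_mb, swap_mb, bytes_out, pkts_out]
X = rng.random((200, 5))
y = rng.integers(0, 2, 200)                      # 0 = trusted, 1 = malicious

som = MiniSom(6, 6, X.shape[1], sigma=1.0, learning_rate=0.5, random_seed=0)
som.train_random(X, 1000)

def bmu_features(samples):
    """Replace raw metrics with the (row, col) of the winning SOM node."""
    return np.array([som.winner(s) for s in samples], dtype=float)

clf = RandomForestClassifier(random_state=0).fit(bmu_features(X), y)
print(clf.predict(bmu_features(X[:5])))
```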

    Exploiting tactics, techniques, and procedures for malware detection

    There has been a meteoric rise in the use of malware to perpetrate cybercrime and, more generally, to serve the interests of malicious actors. As a result, malware has evolved both in sheer variety and in sophistication, and there is hence a need for effective malware detection systems to counter this surge. Most such systems nowadays are purely data-driven: they utilise Machine Learning (ML) based approaches which rely on large volumes of data to spot patterns, detect anomalies, and thus detect malware. In this thesis, we propose a methodology for malware detection on networks that combines human domain knowledge with conventional malware detection approaches to more effectively identify, reason about, and be resilient to malware. Specifically, we use domain knowledge in the form of the Tactics, Techniques, and Procedures (TTPs) described in the MITRE ATT&CK ontology of adversarial behaviour to build Network Intrusion Detection Systems (NIDS). Through the course of our research, we design and evaluate the first such NIDS that can effectively exploit TTPs for the purpose of malware detection. We then expand the scope of these TTPs to systems other than our specialised NIDS, developing a methodology that lets any generic ML-based NIDS exploit them as model features. We further generalise our approach by modelling it as a multi-label classification problem, which enables us to: (i) detect malware more precisely on the basis of individual TTPs, and (ii) identify the malicious usage of uncommon or rarely used TTPs. Throughout our experiments, we rigorously evaluate all our systems on several metrics using large datasets of real-world malware and benign samples. We empirically demonstrate the usefulness of TTPs in the malware detection process, the benefits of a TTP-based approach in reasoning about malware and responding to various challenging conditions, and the overall robustness of our systems to adversarial attack. As a consequence, we establish and improve the state of the art in detecting network-based malware using TTP-based information. Overall, this thesis represents a step forward in building automated systems that combine purely data-driven approaches with human expertise in the field of malware analysis.
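    The multi-label formulation mentioned above can be sketched with standard tooling. The snippet below is an illustrative example only, not the thesis's NIDS: the feature values are synthetic and the technique IDs are examples taken from MITRE ATT&CK.

```python
# Illustrative sketch of framing TTP detection as multi-label classification:
# each traffic sample may exhibit several ATT&CK techniques at once, so one
# binary label is predicted per TTP.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.multioutput import MultiOutputClassifier

rng = np.random.default_rng(1)
X = rng.random((300, 10))                       # e.g. flow statistics per sample
ttp_names = ["T1071", "T1041", "T1090"]         # example ATT&CK technique IDs
Y = rng.integers(0, 2, (300, len(ttp_names)))   # one column per TTP

model = MultiOutputClassifier(RandomForestClassifier(random_state=0)).fit(X, Y)
for row in model.predict(X[:3]):
    print([t for t, hit in zip(ttp_names, row) if hit] or ["no TTP detected"])
```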

    Automated Approach to Intrusion Detection in VM-based Dynamic Execution Environment

    Because virtual computing platforms change dynamically, it is difficult to build a high-quality intrusion detection system for them. In this paper, we present an automated approach to intrusion detection that maintains sufficient performance and reduces dependence on the execution environment. We discuss a hidden Markov model strategy for anomaly detection using frequent system call sequences, letting us identify attacks and intrusions automatically and efficiently. We also propose an automated mining algorithm, named AGAS, to generate frequent system call sequences. In our approach, the detection performance is adaptively tuned according to the execution state in each period, and to improve performance the period length is itself self-adjusted.
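    The core detection step, scoring system-call sequences against a hidden Markov model trained on normal behaviour and flagging low-likelihood sequences, can be sketched as follows. The use of hmmlearn, the toy syscall encoding and the threshold are assumptions for illustration; the AGAS mining step is not shown.

```python
# Hedged sketch of HMM-based anomaly detection over system-call sequences
# (the general technique, not the paper's system). Assumes hmmlearn >= 0.3,
# whose CategoricalHMM models discrete symbols; syscall IDs are invented.
import numpy as np
from hmmlearn import hmm

# Encode syscalls as integers; training data = sequences from normal runs.
normal_seqs = [[0, 1, 2, 1, 3], [0, 1, 1, 2, 3], [0, 2, 1, 2, 3]]
X = np.concatenate(normal_seqs).reshape(-1, 1)
lengths = [len(s) for s in normal_seqs]

model = hmm.CategoricalHMM(n_components=3, n_iter=100, random_state=0)
model.fit(X, lengths)

def is_anomalous(seq, threshold=-2.5):
    """Flag a sequence whose per-symbol log-likelihood falls below threshold."""
    seq = np.asarray(seq).reshape(-1, 1)
    return model.score(seq) / len(seq) < threshold

print(is_anomalous([0, 1, 2, 1, 3]))   # likely normal
print(is_anomalous([3, 3, 0, 0, 3]))   # likely flagged as anomalous
```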

    Effective, Efficient, and Robust Packing Detection and Classification

    Packing is a widespread tool to prevent static malware detection and analysis. Detecting and classifying the packer used by a given malware sample is fundamental to being able to unpack and study the malware, whether manually or automatically. Existing literature on packing detection and classification has focused on effectiveness, but does not consider the efficiency required to be part of a practical malware-analysis workflow. This paper studies how to train machine-learning-based packing detection and classification algorithms to be both highly effective and efficient. We first create ground truths by labeling more than 280,000 samples with three different techniques. We then perform feature selection, considering both the contribution and the computation cost of each feature, and iterate over more than 1,500 combinations of features, scenarios, and algorithms to determine which are the most effective and efficient, finding that giving up 1-2% of effectiveness can increase efficiency by 17-44 times. Next, we test how the best algorithms perform against malware collected after the training data, to assess them against new packing techniques and versions, and find that the ground truth used has a large impact on algorithm robustness. Finally, we perform an economic analysis and find that simple algorithms with small feature sets are more economical than complex algorithms with large feature sets, based on the uptime/training-time ratio.
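    The effectiveness-versus-efficiency comparison at the heart of the study can be illustrated with a small, self-contained sketch. The synthetic data, the feature split and the choice of classifiers below are assumptions, not the paper's feature sets or ground truths.

```python
# Minimal sketch of measuring accuracy against prediction time for a
# "complex" model on many features versus a "simple" model on few features.
import time
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(2)
X = rng.random((5000, 50))                       # pretend static features
y = (X[:, :5].sum(axis=1) > 2.5).astype(int)     # packed / not packed (toy rule)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for name, clf, cols in [
    ("complex, all 50 features", RandomForestClassifier(random_state=0), slice(None)),
    ("simple, 5 cheap features", DecisionTreeClassifier(random_state=0), slice(0, 5)),
]:
    clf.fit(X_tr[:, cols], y_tr)
    t0 = time.perf_counter()
    acc = accuracy_score(y_te, clf.predict(X_te[:, cols]))
    dt = time.perf_counter() - t0
    print(f"{name}: accuracy={acc:.3f}, prediction time={dt*1e3:.1f} ms")
```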

    A Systematic Review of the State of Cyber-Security in Water Systems

    Critical infrastructure systems are evolving from isolated bespoke systems to ones that use general-purpose computing hosts, IoT sensors, edge computing, wireless networks and artificial intelligence. Although this move improves sensing and control capacity and gives better integration with business requirements, it also increases the scope for attack by malicious entities intent on conducting industrial espionage and sabotage against these systems. In this paper, we review the state of cyber-security research focused on improving the security of the water supply and wastewater collection and treatment systems that form part of the critical national infrastructure. We cover the publication statistics of research in this area, the aspects of security being addressed, and the future work required to achieve better cyber-security for water systems.

    Análise de malware com suporte de hardware (Malware analysis with hardware support)

    Advisors: Paulo Lício de Geus, André Ricardo Abed Grégio. Master's dissertation (mestrado), Universidade Estadual de Campinas, Instituto de Computação.
    Abstract: Today's world is driven by the use of computer systems, which are present in all aspects of everyday life. The correct working of these systems is therefore essential to preserve the possibilities brought about by technological development. Ensuring this is not an easy task, however, as malicious individuals constantly attempt to subvert systems for their own benefit. The most common kind of subversion against computer systems is the malware attack, which can give an attacker complete control over a machine. The fight against this kind of threat is based on the analysis of collected malicious artifacts, enabling incident response and the development of future countermeasures. Attackers, however, have specialized in circumventing analysis systems in order to keep their operations active. For this purpose they employ a series of techniques called anti-analysis, which prevent the inspection of their malicious code. Among these techniques, I highlight evasion of the analysis procedure, that is, the use of samples able to detect the presence of an analysis system and then hide their malicious behavior. Evasive samples have become popular, and their impact on systems security is considerable, since analyses that were once fully automatic now require human supervision to look for signs of evasion, significantly raising the cost of keeping a system protected. The most common ways of detecting an analysis environment are: (i) detecting injected code, which analysts use to inspect applications; (ii) detecting virtual machines, which are used in analysis environments for scalability; and (iii) detecting execution side effects, usually caused by the emulators that analysts also use. To handle evasive malware, analysts rely on so-called transparent techniques, that is, techniques which neither require code injection nor cause execution side effects. One way to achieve transparency in an analysis process is to rely on hardware support. This work therefore covers the application of hardware support to the analysis of evasive threats. In the course of this text, I present an assessment of existing hardware support technologies, including hardware virtual machines, BIOS support, performance monitors and PCI cards. My critical evaluation of these technologies provides a basis for comparing different use cases, and I pinpoint development gaps that currently exist. More than that, I fill one of these gaps by proposing to expand the use of performance monitors for malware monitoring; specifically, I propose using the BTS (Branch Trace Store) monitor to build a tracer and a debugger. The proposed framework is also capable of dealing with ROP attacks, one of the most commonly used techniques for vulnerability exploitation. The evaluation of the framework shows that no side effects are introduced, thus allowing transparent analysis. Making use of this capability, I demonstrate how protected applications can be inspected and how evasion techniques can be identified.
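    One building block mentioned in the abstract, detecting ROP from a recorded branch trace, can be illustrated with a simple shadow-stack check. The trace format and addresses below are assumptions for the example, not the dissertation's actual BTS data layout.

```python
# Hedged sketch of a shadow-stack check over a recorded branch trace, the
# kind of ROP heuristic a BTS-based tracer could apply offline.
def detect_rop(trace):
    """Return source addresses of 'ret' branches that do not match a prior call."""
    shadow, violations = [], []
    for kind, src, dst, fall_through in trace:
        if kind == "call":
            shadow.append(fall_through)        # legitimate return target
        elif kind == "ret":
            expected = shadow.pop() if shadow else None
            if dst != expected:
                violations.append(src)         # hijacked return: likely ROP gadget
    return violations

benign = [("call", 0x1000, 0x2000, 0x1005), ("ret", 0x2010, 0x1005, None)]
ropish = [("call", 0x1000, 0x2000, 0x1005), ("ret", 0x2010, 0x3333, None)]
print(detect_rop(benign))                      # []
print([hex(a) for a in detect_rop(ropish)])    # ['0x2010']
```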

    Avatar CAPTCHA: telling computers and humans apart via face classification and mouse dynamics.

    Bots are malicious, automated computer programs that execute scripts and predefined functions on an affected computer. They pose cybersecurity threats and are among the most sophisticated and common cybercrime tools today: they spread viruses, generate spam, steal sensitive personal information, rig online polls and commit other types of online crime and fraud. They sneak into unprotected systems through the Internet by seeking vulnerable entry points, and they access a system's resources much as a human user does. The question, then, is how to counter them: how do we block bots while still allowing human users to access system resources? One solution is a CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart), a program that can generate and grade tests that most humans can pass but computers cannot. CAPTCHAs are used to distinguish humans from malicious bots; they are a class of Human Interactive Proofs (HIPs) meant to be easily solvable by humans and economically infeasible for computers. Text CAPTCHAs are very popular and commonly used: for each challenge they generate a sequence of letters by distorting standard fonts and ask users to identify and type them. However, they are vulnerable to character-segmentation attacks by bots, they depend on the English language, and they are becoming too complex for people to solve. One response is to design image CAPTCHAs, which use images instead of text and require users to identify certain images to solve the challenge; these are friendlier and more convenient for human users and a much harder problem for bots. In today's Internet, user profiling and user identification have also gained significance: identity theft and similar abuse can be prevented by granting access only to authorised users, and timely response to a security breach requires frequent user verification. This verification process must, however, be passive, transparent and non-obtrusive, and to be practical it must be accurate, efficient and difficult to forge. Behavioral biometric systems are usually less prominent than traditional biometric systems, yet they offer numerous and significant advantages: collecting behavioral data is non-obtrusive and cost-effective, as it requires no special hardware. While such systems are not distinctive enough for reliable human identification, they have been shown to be highly accurate in identity verification. In accomplishing everyday tasks, people apply different styles, strategies, skills and knowledge; these define a user's behavioral traits, which behavioral biometrics attempts to quantify in order to profile users and establish their identity. Human-computer interaction (HCI)-based biometrics comprise the interaction strategies and styles between a human and a computer, quantified to build profiles for identification. A specific category of HCI-based biometrics records human interaction with the mouse as the input device and is known as Mouse Dynamics: by monitoring the mouse activity a user produces while interacting with the GUI, a unique profile can be created that helps identify that user. Mouse-based verification approaches do not record sensitive user credentials such as usernames and passwords, and thus avoid privacy issues.
    An image CAPTCHA is proposed that incorporates Mouse Dynamics to help fortify it. It displays random images obtained from Yahoo's Flickr, and to solve the challenge the user must identify and select a certain class of images. Two theme-based challenges have been designed: Avatar CAPTCHA, which displays human and avatar faces, and Zoo CAPTCHA, which displays different animal species. In addition to the dynamically selected images, the way each user interacts with the mouse while solving the CAPTCHA (mouse clicks, mouse movements, cursor screen coordinates, etc.) is recorded non-obtrusively at regular time intervals. These recorded mouse movements constitute the Mouse Dynamics Signature (MDS) of the user, and this MDS provides an additional secure technique for segregating humans from bots. The security of the CAPTCHA is tested by an adversary executing a mouse bot that attempts to solve the CAPTCHA challenges.
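    As an illustration of how a mouse-dynamics profile might be derived from raw events, the sketch below computes a few simple features. The event format and the chosen features are assumptions for the example, not the thesis's MDS definition.

```python
# Illustrative sketch: turn raw mouse events into simple mouse-dynamics
# features (path length, mean speed, click count).
import math

def mouse_features(events):
    """events: list of (timestamp_s, x, y, kind) with kind in {'move', 'click'}."""
    moves = [(t, x, y) for t, x, y, kind in events if kind == "move"]
    clicks = sum(1 for *_, kind in events if kind == "click")
    path, duration = 0.0, 0.0
    for (t0, x0, y0), (t1, x1, y1) in zip(moves, moves[1:]):
        path += math.hypot(x1 - x0, y1 - y0)
        duration += t1 - t0
    return {
        "path_length_px": path,
        "mean_speed_px_s": path / duration if duration else 0.0,
        "click_count": clicks,
    }

sample = [(0.00, 10, 10, "move"), (0.05, 40, 30, "move"),
          (0.10, 80, 55, "move"), (0.12, 80, 55, "click")]
print(mouse_features(sample))
```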