1,984 research outputs found

    Neural-Augmented Static Analysis of Android Communication

    We address the problem of discovering communication links between applications in the popular Android mobile operating system, an important problem for security and privacy in Android. Any scalable static analysis in this complex setting is bound to produce an excessive number of false positives, rendering it impractical. To improve precision, we propose to augment static analysis with a trained neural-network model that estimates the probability that a communication link truly exists. We describe a neural-network architecture that encodes abstractions of communicating objects in two applications and estimates the probability that a link indeed exists. At the heart of our architecture are type-directed encoders (TDE), a general framework for elegantly constructing encoders of a compound data type by recursively composing encoders for its constituent types. We evaluate our approach on a large corpus of Android applications, and demonstrate that it achieves very high accuracy. Further, we conduct thorough interpretability studies to understand the internals of the learned neural networks. Comment: Appears in Proceedings of the 2018 ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE
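    The recursive composition behind type-directed encoders can be illustrated with a small sketch. This is not the architecture from the paper: the learned neural encoders are replaced here by fixed hash-based vectors and an element-wise mean, and every name (`encode`, `link_score`, `DIM`, the dictionary fields) is hypothetical. It only shows how an encoder for a compound value can be assembled from the encoders of its parts.

```python
import hashlib
import math
from typing import Any, List

DIM = 8  # length of every encoding vector (arbitrary choice for this sketch)


def _hash_vec(text: str) -> List[float]:
    """Deterministically map a string to DIM floats in [0, 1)."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [b / 256.0 for b in digest[:DIM]]


def _combine(vectors: List[List[float]]) -> List[float]:
    """Element-wise mean; a stand-in for a learned combining layer."""
    if not vectors:
        return [0.0] * DIM
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def encode(value: Any) -> List[float]:
    """Choose an encoder from the value's type, recursing into compound types."""
    if isinstance(value, (bool, int, str)):
        # Base types: tag with the type name so 1 and "1" encode differently.
        return _hash_vec(f"{type(value).__name__}:{value}")
    if isinstance(value, (list, tuple)):
        # Sequence type: compose the encoders of the elements.
        return _combine([encode(item) for item in value])
    if isinstance(value, dict):
        # Record type: encode each (field, value) pair, then compose the pairs.
        pairs = [_combine([encode(k), encode(v)]) for k, v in value.items()]
        return _combine(pairs)
    raise TypeError(f"no encoder for type {type(value).__name__}")


def link_score(sender: Any, receiver: Any) -> float:
    """Cosine similarity of two encodings; the paper trains a model instead."""
    a, b = encode(sender), encode(receiver)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical abstractions of a sending intent and a receiving filter.
intent = {"action": "android.intent.action.SEND", "categories": ["DEFAULT"]}
intent_filter = {"actions": ["android.intent.action.SEND"], "categories": ["DEFAULT"]}
score = link_score(intent, intent_filter)
```

    In a trained system the hash and mean would be replaced by learned components, but the dispatch on type and the recursion into constituent values would keep the same shape.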

    Deep Learning Software Repositories

    Bridging the abstraction gap between artifacts and concepts is the essence of software engineering (SE) research problems. SE researchers regularly use machine learning to bridge this gap, but there are three fundamental issues with traditional applications of machine learning in SE research. Traditional applications are too reliant on labeled data. They are too reliant on human intuition, and they are not capable of learning expressive yet efficient internal representations. Ultimately, SE research needs approaches that can automatically learn representations of massive, heterogeneous datasets in situ, apply the learned features to a particular task, and possibly transfer knowledge from task to task. Improvements in both computational power and the amount of memory in modern computer architectures have enabled new approaches to canonical machine learning tasks. Specifically, these architectural advances have enabled machines that are capable of learning deep, compositional representations of massive data depots. The rise of deep learning has ushered in tremendous advances in several fields. Given the complexity of software repositories, we presume deep learning has the potential to usher in new analytical frameworks and methodologies for SE research and the practical applications it reaches. This dissertation examines and enables deep learning algorithms in different SE contexts. We demonstrate that deep learners significantly outperform state-of-the-practice software language models at code suggestion on a Java corpus. Further, these deep learners for code suggestion automatically learn how to represent lexical elements. We use these representations to transmute source code into structures for detecting similar code fragments at different levels of granularity, without declaring features for how the source code is to be represented. Then we use our learning-based framework for encoding fragments to intelligently select and adapt statements in a codebase for automated program repair. In our work on code suggestion, code clone detection, and automated program repair, everything for representing lexical elements and code fragments is mined from the source code repository. Indeed, our work aims to move SE research from the art of feature engineering to the science of automated discovery.

    Software weaknesses detection using static-code analysis and machine learning techniques

    Dissertation submitted for the degree of Master in Informatics and Computer Engineering (Engenharia Informática e de Computadores). The software industry plays an essential role in the modern world in almost all fields. Vulnerabilities are prevalent in software systems and can have a negative impact on computer security. Although there are tools to detect vulnerable code, their accuracy and efficacy remain a challenging research question. To define features that identify vulnerabilities, many existing solutions require hard work from human experts. The constantly increasing number of revealed security vulnerabilities has become an important concern in the software industry and in the field of cybersecurity, implying that current approaches to vulnerability detection demand further improvement. This has motivated researchers in the software engineering and cybersecurity communities to apply machine learning to recognize the patterns and characteristics of vulnerable code. Following this line of research, this work presents a machine-learning-based vulnerability detection system that uses static code analysis to extract dependencies in the code and build data features from them. The dataset was collected from the National Vulnerability Database (NVD) and the test cases of the NIST SAMATE project; it contains Java source code, with null pointer dereference and command injection as the selected weaknesses. The data samples were generated from the source code of the vulnerable files by using a control flow graph (CFG) to extract features. Data-flow analysis techniques were also used for feature extraction. Experimental results demonstrate that our tool can achieve significantly fewer false negatives (with a reasonable number of false positives) compared to other approaches. We further applied the tool to real software products and were able to identify vulnerabilities, despite the number of false positives.

    Man-machine partial program analysis for malware detection

    With the meteoric rise in popularity of the Android platform, there is an urgent need to combat the accompanying proliferation of malware. Existing work addresses the area of consumer malware detection, but cannot detect novel, sophisticated, domain-specific malware that is targeted specifically at one aspect of an organization (e.g., ground operations of the US Military). Adversaries can exploit domain knowledge to camouflage malice within the legitimate behaviors of an app and behind a domain-specific trigger, rendering traditional approaches such as signature-matching, machine learning, and dynamic monitoring ineffective. Manual code inspections are also inadequate, scaling poorly and introducing human error. Yet, there is a dire need to detect this kind of malware before it causes catastrophic loss of life and property. This dissertation presents the Security Toolbox, our novel solution for this challenging new problem posed by DARPA's Automated Program Analysis for Cybersecurity (APAC) program. We employ a human-in-the-loop approach to amplify the natural intelligence of our analysts. Our automation detects interesting program behaviors and exposes them in an analysis Dashboard, allowing the analyst to brainstorm flaw hypotheses and ask new questions, which in turn can be answered by our automated analysis primitives. The Security Toolbox is built on top of Atlas, a novel program analysis platform made by EnSoft. Atlas uses a graph-based mathematical abstraction of software to produce a unified property multigraph, exposes a powerful API for writing analyzers using graph traversals, and provides both automated and interactive capabilities to facilitate program comprehension. The Security Toolbox is also powered by FlowMiner, a novel solution to mine fine-grained, compact data flow summaries of Java libraries. FlowMiner allows the Security Toolbox to complete a scalable and accurate partial program analysis of an application without including all of the libraries that it uses (e.g., Android). This dissertation presents the Security Toolbox, Atlas, and FlowMiner. We provide empirical evidence of the effectiveness of the Security Toolbox for detecting novel, sophisticated, domain-specific Android malware, demonstrating that our approach outperforms other cutting-edge research tools and state-of-the-art commercial programs in both time and accuracy metrics. We also evaluate the effectiveness of Atlas as a program analysis platform and FlowMiner as a library summary tool.