61 research outputs found

    Security Analysis of the OWASP Benchmark with Julia

    Among the various facets of cybersecurity, software security plays a crucial role. It requires assessing the security of programs and web applications that are exposed to the external world and are consequently potential targets of attacks such as SQL injection, cross-site scripting, boundary violations, and command injection. The OWASP Benchmark Project developed a Java benchmark containing thousands of test programs that feature such security breaches. Its goal is to measure an analysis tool's ability to identify vulnerabilities, and its precision in doing so. We present how the Julia static analyzer, a sound tool based on abstract interpretation, performs on this benchmark in terms of soundness and precision. We discuss the details of its security analysis, built over a taint analysis of data implemented through binary decision diagrams.
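As a concrete illustration of the kind of flow such a taint analysis detects, the following sketch propagates taint through a toy three-address program from a source to a sink. The operation names and the source/sink sets are invented for the example; this is not Julia's BDD-based implementation.

```python
# Toy explicit-taint propagation over a straight-line three-address program.
SOURCES = {"getParameter"}   # user input enters here (invented name)
SINKS = {"executeQuery"}     # injection-sensitive operation (invented name)

def find_tainted_flows(program):
    """program: list of (target, op, args) tuples; returns the sink
    calls that receive data derived from a tainted source."""
    tainted = set()
    alarms = []
    for target, op, args in program:
        if op in SOURCES:
            tainted.add(target)
        elif op in SINKS:
            if any(a in tainted for a in args):
                alarms.append((op, args))
        else:  # ordinary assignment/concatenation propagates taint
            if any(a in tainted for a in args):
                tainted.add(target)
    return alarms

program = [
    ("user", "getParameter", ["request"]),
    ("query", "concat", ["prefix", "user"]),
    ("rs", "executeQuery", ["query"]),
]
print(find_tainted_flows(program))  # [('executeQuery', ['query'])]
```

A real analysis encodes all such flows symbolically (in Julia, as Boolean formulas over BDDs) rather than enumerating them concretely.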

    Static analysis for discovering IoT vulnerabilities

    The Open Web Application Security Project (OWASP) released the "OWASP Top 10 Internet of Things 2018" list of high-priority security vulnerabilities for IoT systems. The diversity of these vulnerabilities poses a great challenge to the development of a robust solution for their detection and mitigation. In this paper, we discuss the relationship between these vulnerabilities and the ones listed by the OWASP Top 10 (focused on Web applications rather than IoT systems), how these vulnerabilities can actually be exploited, and in which cases static analysis can help in preventing them. Then, we present an extension of an industrial analyzer (Julia) that already covers five out of the top seven vulnerabilities of the OWASP Top 10, and we discuss which IoT Top 10 vulnerabilities might be detected by the existing analyses or their extensions. The experimental results present the application of some existing Julia analyses, and their extensions, to IoT systems, showing the effectiveness of the analysis on some representative case studies.
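As an illustration of one vulnerability class the two lists share, the sketch below contrasts an unsafe and a safe way of passing untrusted input to a shell command; a taint analysis of the kind described above would flag the first pattern. The function names are invented for the example.

```python
import subprocess

def run_cmd_unsafe(user_value: str) -> str:
    # UNSAFE: tainted string concatenated into a shell command line;
    # a value like "x; echo INJECTED" makes the shell run a second command.
    out = subprocess.run("echo " + user_value, shell=True,
                         capture_output=True, text=True)
    return out.stdout.strip()

def run_cmd_safe(user_value: str) -> str:
    # SAFE: argument vector, no shell; metacharacters stay inert data.
    out = subprocess.run(["echo", user_value],
                         capture_output=True, text=True)
    return out.stdout.strip()

print(run_cmd_safe("device42; echo INJECTED"))  # printed verbatim, not executed
```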

    Cross-Programming Language Taint Analysis for the IoT Ecosystem

    The Internet of Things (IoT) is a key component of the next disruptive technologies. However, IoT merges several diverse software layers: embedded, enterprise, and cloud programs interact with each other. In addition, security and privacy vulnerabilities in IoT software can be particularly dangerous because of the pervasiveness and physical nature of these systems. Over the last decades, static analysis, and in particular taint analysis, has been widely applied to detect software vulnerabilities. Unfortunately, these analyses assume that software is written entirely in a single programming language, and they are not immediately suitable for detecting IoT vulnerabilities, where many different software components, written in different programming languages, interact. This paper discusses how to extend existing static taint analyses to a cross-programming-language scenario.
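A minimal sketch of the composition step such a cross-language analysis needs: each component is analyzed separately in its own language, and its taint summary is consulted at calls that cross the language boundary. All names and the summary format below are invented for illustration.

```python
# Per-component taint summaries, as a single-language analysis would
# produce them: function -> set of parameter indices that reach a sink.
COMPONENT_SUMMARIES = {
    "cloud.storeReading": {0},      # e.g. a SQL sink inside the cloud code
    "embedded.readSensor": set(),
}

# Call edges crossing language boundaries: (caller value, callee, arg index).
CROSS_CALLS = [
    ("mqttPayload", "cloud.storeReading", 0),
]

def cross_language_alarms(tainted_values):
    """Report boundary calls where a tainted value feeds a parameter
    that the callee's summary says flows to a sink."""
    alarms = []
    for value, callee, idx in CROSS_CALLS:
        if value in tainted_values and idx in COMPONENT_SUMMARIES.get(callee, set()):
            alarms.append((value, callee))
    return alarms

print(cross_language_alarms({"mqttPayload"}))  # [('mqttPayload', 'cloud.storeReading')]
```

The point of the sketch is that neither single-language analysis alone sees the full source-to-sink path; only the composition does.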

    Concurrency and static analysis

    The thesis describes three important contributions developed during my doctoral course, all involving the use and verification of concurrent Java code. Binary decision diagrams, or BDDs, are data structures for representing Boolean functions, which are of great importance in many fields. BDDs are the state-of-the-art representation for Boolean functions, and virtually all real-world applications use a BDD library to represent and manipulate them. It can be desirable to perform Boolean operations from different threads at the same time; to allow this, the BDD library in use must let threads access BDD data safely, avoiding race conditions. We developed a Java BDD library that is fast in both single- and multi-threaded applications, and that we use in the Julia static program analyzer.
    We defined a sound static analysis that identifies if and where a Java bytecode program lets data flow from tainted user input (including servlet requests) into critical operations that might give rise to injections. Data flow is a prerequisite to injections, but the user of the analysis must later gauge the actual risk of the flow: analysis approximations might lead to false alarms, and proper input validation might make actual flows harmless. Our analysis works by translating Java bytecode into Boolean formulas that express all possible explicit flows of tainted data. The choice of Java bytecode simplifies the semantics and its abstraction (many high-level constructs need not be explicitly considered) and lets us analyze programs whose source code is not available, as is typically the case in industrial contexts that use software developed by third parties, such as banks.
    The standard approach to preventing data races is to follow a locking discipline while accessing shared data: always hold a given lock when accessing a given shared datum. It is all too easy for a programmer to violate the locking discipline, so tools are desirable for formally expressing the discipline and verifying adherence to it. The book Java Concurrency in Practice (JCIP) proposed the @GuardedBy annotation to express a locking discipline. The original @GuardedBy annotation was designed for declaring simple intra-class synchronization policies: @GuardedBy fields and methods are supposed to be accessed only when holding the appropriate lock, referenced by another field (or this) in the body of the class. In simple cases, a quick visual inspection of the class code is sufficient for the programmer to verify the correctness of the synchronization policy. However, when we think more deeply about the meaning of this annotation, and when we try to check and infer it, some ambiguities arise. Given these ambiguities in the specification of @GuardedBy, different tools interpret it in different ways; moreover, it does not prevent data races, and thus does not satisfy its design goals. We provide a formal specification that satisfies its design goals and prevents data races. We have also implemented our specification in the Julia analyzer, which uses abstract interpretation to infer valid @GuardedBy annotations for unannotated programs. It is not the goal of this implementation to detect data races or to guarantee that they do not exist: Julia determines what locking discipline a program uses, without judging whether the discipline is too strict or too lax for some particular purpose.
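The BDD data structure described above can be sketched as a minimal reduced ordered BDD with hash-consing. This toy is only illustrative: the thesis library is a Java implementation with thread-safe access, which this single-threaded sketch does not attempt.

```python
# Minimal reduced ordered BDD (ROBDD) with hash-consing.
class BDD:
    ZERO, ONE = 0, 1                     # terminal node ids
    def __init__(self):
        self.nodes = {}                  # (var, low, high) -> node id
        self.table = {0: None, 1: None}  # node id -> (var, low, high)

    def mk(self, var, low, high):
        if low == high:                  # redundant test: reduce away
            return low
        key = (var, low, high)
        if key not in self.nodes:        # hash-consing: share identical nodes
            nid = len(self.table)
            self.nodes[key] = nid
            self.table[nid] = key
        return self.nodes[key]

    def var(self, v):
        return self.mk(v, self.ZERO, self.ONE)

    def apply(self, op, u, v):
        """op: binary Boolean function on {0, 1}; Shannon expansion
        on the topmost variable of u and v (no memo cache, toy only)."""
        if u in (0, 1) and v in (0, 1):
            return op(u, v)
        uvar = self.table[u][0] if u > 1 else float("inf")
        vvar = self.table[v][0] if v > 1 else float("inf")
        top = min(uvar, vvar)
        u0, u1 = (self.table[u][1], self.table[u][2]) if uvar == top else (u, u)
        v0, v1 = (self.table[v][1], self.table[v][2]) if vvar == top else (v, v)
        return self.mk(top, self.apply(op, u0, v0), self.apply(op, u1, v1))

b = BDD()
x, y = b.var(0), b.var(1)
f = b.apply(lambda a, c: a & c, x, y)   # x AND y
g = b.apply(lambda a, c: a & c, y, x)   # same function, built differently
print(f == g)                           # True: canonicity via hash-consing
```

Canonicity is what makes BDDs attractive for the taint formulas above: semantically equal Boolean functions get the same node, so equivalence checks are constant-time.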

    Sawja: Static Analysis Workshop for Java

    Static analysis is a powerful technique for automatic verification of programs but raises major engineering challenges when developing a full-fledged analyzer for a realistic language such as Java. This paper describes the Sawja library: a static analysis framework fully compliant with Java 6 which provides OCaml modules for efficiently manipulating Java bytecode programs. We present the main features of the library, including (i) efficient functional data structures for representing programs with implicit sharing and lazy parsing, (ii) an intermediate stack-less representation, and (iii) fast computation and manipulation of complete programs.
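The stack-less intermediate representation mentioned in (ii) can be illustrated by rewriting a toy stack-machine program into three-address instructions. The opcode names below are simplified inventions for the example, not Sawja's actual API.

```python
# Rewrite stack-machine bytecode into variable-based three-address code.
def to_three_address(bytecode):
    stack, temps, out = [], 0, []
    for instr in bytecode:
        op = instr[0]
        if op == "push":                # constant onto the operand stack
            stack.append(str(instr[1]))
        elif op == "load":              # local variable onto the stack
            stack.append(instr[1])
        elif op in ("add", "mul"):      # binary op -> named temporary
            b, a = stack.pop(), stack.pop()
            t = f"t{temps}"; temps += 1
            out.append(f"{t} = {a} {'+' if op == 'add' else '*'} {b}")
            stack.append(t)
        elif op == "store":             # pop the stack into a local
            out.append(f"{instr[1]} = {stack.pop()}")
    return out

code = [("load", "x"), ("push", 1), ("add",),
        ("push", 2), ("mul",), ("store", "y")]
print(to_three_address(code))
# ['t0 = x + 1', 't1 = t0 * 2', 'y = t1']
```

Eliminating the operand stack this way makes each instruction's operands explicit, which is what simplifies the downstream analyses.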

    Preface


    Improving the Usability of Static Analysis Tools Using Machine Learning

    Static analysis can help developers detect critical security flaws and bugs in software. However, due to challenges such as scalability and undecidability, static analysis tools often have performance and precision issues that reduce their usability and thus limit their wide adoption. In this dissertation, we present machine learning-based approaches that improve the adoption of static analysis tools by addressing two usability challenges: false positive error reports and proper tool configuration. First, false positives are one of the main reasons developers give for not using static analysis tools. To address this issue, we developed a novel machine learning approach that learns directly from program code to classify analysis results as true or false positives. The approach has two steps: (1) data preparation, which transforms source code into input formats suitable for sophisticated machine learning techniques; and (2) applying those techniques to discover the code structures that cause false positive error reports and to learn false-positive classification models. To evaluate the effectiveness and efficiency of this approach, we conducted a systematic, comparative empirical study of four families of machine learning algorithms for classifying false positives: hand-engineered features, bag of words, recurrent neural networks, and graph neural networks. In this study, we considered two application scenarios using multiple ground-truth program sets. Overall, the results suggest that recurrent neural networks outperformed the other algorithms, although interesting tradeoffs are present among all techniques. Our observations also provide insight into the future research needed to speed the adoption of machine learning approaches in practice.
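A minimal sketch of steps (1) and (2) under the bag-of-words representation: alarm sites become token-count vectors, and a tiny hand-rolled logistic regression separates true from false positives. The training snippets and labels below are fabricated for illustration; the dissertation's models are far richer.

```python
import math
from collections import Counter

def featurize(snippet, vocab):
    # Step (1): bag-of-words token counts over a fixed vocabulary.
    counts = Counter(snippet.split())
    return [counts[w] for w in vocab]

def train(X, y, epochs=200, lr=0.5):
    # Step (2): stochastic gradient descent for logistic regression.
    w = [0.0] * len(X[0]); b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = 1 / (1 + math.exp(-(sum(wj * xj for wj, xj in zip(w, xi)) + b)))
            g = p - yi
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b

def predict(w, b, x):
    return 1 / (1 + math.exp(-(sum(wj * xj for wj, xj in zip(w, x)) + b))) > 0.5

# Fabricated training data: alarms after sanitization are false positives.
snippets = ["sanitize input query", "raw input query",
            "sanitize input exec", "raw input exec"]
labels = [0, 1, 0, 1]   # 1 = true positive, 0 = false positive
vocab = sorted({w for s in snippets for w in s.split()})
X = [featurize(s, vocab) for s in snippets]
w, b = train(X, labels)
print(predict(w, b, featurize("raw input query", vocab)))  # True
```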
    Second, many static program verification tools come with configuration options that trade off performance, precision, and soundness, allowing users to customize the tools for their needs. However, understanding the impact of these options and correctly tuning the configurations is a challenging task that requires domain expertise and extensive experimentation. To address this issue, we developed an automatic approach, auto-tune, to configure verification tools for given target programs. The key idea of auto-tune is to leverage a meta-heuristic search algorithm that probabilistically scans the configuration space, using machine learning models both as a fitness function and as a filter for incorrect results. auto-tune is tool- and language-agnostic, making it applicable to any off-the-shelf configurable verification tool. To evaluate its effectiveness and efficiency, we applied auto-tune to four popular program verification tools for C and Java and conducted experiments under two use-case scenarios. Overall, the results suggest that running verification tools with auto-tune produces results comparable to configurations manually tuned by experts, and in some cases improves upon them with reasonable precision.
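The auto-tune idea can be sketched as a meta-heuristic search, here plain hill climbing with random restarts, over a small configuration space. The option names and the stand-in fitness function below are invented; the real system uses learned models as the fitness function and result filter.

```python
import random

# Invented configuration space for a hypothetical verifier.
SPACE = {
    "unroll_depth": [1, 2, 4, 8],
    "widening_delay": [0, 1, 2],
    "track_heap": [False, True],
}

def fitness(cfg):
    # Stand-in for the learned model: pretend precision rises with heap
    # tracking and moderate unrolling; numbers are arbitrary.
    score = {1: 1, 2: 3, 4: 4, 8: 2}[cfg["unroll_depth"]]
    score += cfg["widening_delay"]
    score += 3 if cfg["track_heap"] else 0
    return score

def neighbors(cfg):
    # All configurations that differ from cfg in exactly one option.
    for k, choices in SPACE.items():
        for v in choices:
            if v != cfg[k]:
                yield {**cfg, k: v}

def auto_tune(restarts=5, seed=0):
    rng = random.Random(seed)
    best = None
    for _ in range(restarts):
        cfg = {k: rng.choice(v) for k, v in SPACE.items()}
        improved = True
        while improved:                      # greedy hill climbing
            improved = False
            for n in neighbors(cfg):
                if fitness(n) > fitness(cfg):
                    cfg, improved = n, True
        if best is None or fitness(cfg) > fitness(best):
            best = cfg
    return best

print(auto_tune())  # converges to unroll_depth=4, widening_delay=2, track_heap=True
```

Because this toy fitness is separable per option, hill climbing always finds the optimum; the real search problem is noisier, which is why auto-tune pairs the heuristic with learned filters.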

    Web Bot detection using mouse movement

    Non-legitimate traffic, in the form of automated internet bot traffic, is a long-standing problem that causes a huge economic impact and a lack of trust in companies and administrations worldwide. For years, Artificial Intelligence, and especially Machine Learning, has been a key player in fighting fraud, helping stakeholders analyse and detect fraud instances automatically. However, no reliable public ground-truth dataset exists for evaluating and comparing the methodologies proposed in the literature. This thesis develops a public dataset consisting of legitimate and fraudulent web mouse movements, the latter extracted from real bot engines. The dataset is evaluated with two Machine Learning models: one based on a decision-tree classifier, LightGBM, and a second based on Recurrent Neural Networks, which achieves the higher accuracy.