
    Path projection for user-centered static analysis tools

    The research and industrial communities have made great strides in developing sophisticated defect detection tools based on static analysis. However, to date most of the work in this area has focused on developing novel static analysis algorithms and has neglected other aspects of static analysis tools, in particular their user interfaces. In this work, we present a novel user interface toolkit called Path Projection that helps users visualize, navigate, and understand program paths, a common component of many static analysis tools’ error reports. We performed a controlled user study to measure the benefit of Path Projection in triaging error reports from Locksmith, a data race detection tool for C. We found that Path Projection reduced the time participants needed to complete this task without affecting accuracy, and that participants felt Path Projection was useful.

    Enhancing Usability Of Malware Analysis Pipelines With Reverse Engineering

    Much work has been done on analyzing software distributed in binary form. This is a challenging problem because of the relatively unstructured nature of binaries. Attempts to recover high-level structure have included both static and dynamic analysis. However, human inspection is often still required, because high-level structure is compiled away. Recent successes in this area include work on variable-name recovery, vulnerability discovery, and class recovery for object-oriented languages. We are interested in building a pipeline for users to analyze malware. In this thesis, we tackle two problems central to malware analysis pipelines. The first is D3RE, an interactive querying tool that lets users analyze binaries by writing declarative rules and visualizing the results projected onto the binary. The second is Assemblage, a tool that automatically scrapes GitHub for C and C++ repositories and builds them under different compilation settings to produce a variety of configurations. Together, these two tools give users enough data to analyze, as well as a means to perform interactive analysis. Finally, we present future work demonstrating a possible visualization combining D3RE and Ghidra, along with specific questions for future user studies.
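    The abstract does not describe D3RE's rule syntax, so the following is only a hypothetical illustration of what a declarative rule over facts extracted from a binary can look like: a Datalog-style transitive reachability rule over an invented `calls` relation, evaluated semi-naively until a fixpoint. The addresses and the helpers `reachable` and `functions_reaching` are assumptions for illustration, not part of D3RE.

```python
# Hypothetical call-edge facts extracted from a binary: (caller, callee) addresses.
CALLS = {
    (0x401000, 0x401200),
    (0x401200, 0x401400),
    (0x401400, 0x401600),
    (0x401800, 0x401000),
}


def reachable(calls):
    """Evaluate the Datalog-style rules
         reach(a, b) :- calls(a, b).
         reach(a, c) :- reach(a, b), calls(b, c).
    semi-naively: each round joins only the newly derived facts with `calls`."""
    reach = set(calls)
    delta = set(calls)
    while delta:
        derived = {(a, c) for (a, b) in delta for (b2, c) in calls if b == b2}
        delta = derived - reach  # keep only facts not seen before
        reach |= delta
    return reach


def functions_reaching(target, calls):
    """Query: which functions can transitively call `target`?"""
    return sorted(a for (a, b) in reachable(calls) if b == target)
```

    For example, `functions_reaching(0x401600, CALLS)` returns every function on the chain that eventually calls `0x401600`. A tool in this style would project such query results back onto the disassembly; that visualization step is not sketched here.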

    The Relational Database: a New Static Analysis Tool?

    Code comprehension is pivotal to reducing errors in software. Reading source code improves code comprehension and enables effective fixes, but as a code base grows, metadata become increasingly important. Static analysis techniques provide an avenue for software developers to learn more about their code through metadata while also helping them safely detect potential errors in their source code. Unfortunately, many static analysis tools have a steep learning curve and are limited in scope. This thesis seeks to make static analysis accessible and extensible by asking what ubiquitous tools like SQL and relational databases can and cannot offer. We begin to answer these questions by exploring the source code of three C++ projects (libodbc++, log4cxx, C++ Sockets Library) using a new static analysis tool called Trike. Initial results indicate that Trike is a promising and accessible tool for analyzing the structure of a code base. With further improvements, Trike should equal more established static analysis tools in scope and surpass them in usability.
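    Trike's actual schema is not given in the abstract. As a sketch of the kind of structural question a relational encoding of code facts can answer with plain SQL, the following uses Python's built-in `sqlite3` module with a hypothetical two-table schema (`functions`, `calls`) and invented facts; the table layout, file names, and helper names are all assumptions.

```python
import sqlite3


def build_db():
    """Create an in-memory database holding hypothetical code facts."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE functions (name TEXT PRIMARY KEY, file TEXT);
        CREATE TABLE calls (caller TEXT REFERENCES functions(name),
                            callee TEXT REFERENCES functions(name));
    """)
    db.executemany("INSERT INTO functions VALUES (?, ?)", [
        ("main", "main.cpp"),
        ("connect", "socket.cpp"),
        ("send", "socket.cpp"),
        ("log", "logger.cpp"),
        ("legacyInit", "socket.cpp"),
    ])
    db.executemany("INSERT INTO calls VALUES (?, ?)", [
        ("main", "connect"),
        ("main", "send"),
        ("connect", "log"),
        ("send", "log"),
    ])
    return db


def uncalled(db):
    """Functions that nothing calls (candidate dead code), excluding the entry point."""
    rows = db.execute("""
        SELECT f.name FROM functions f
        WHERE f.name <> 'main'
          AND NOT EXISTS (SELECT 1 FROM calls c WHERE c.callee = f.name)
        ORDER BY f.name
    """)
    return [name for (name,) in rows]


def fan_in(db):
    """Number of distinct callers per function, highest first."""
    rows = db.execute("""
        SELECT callee, COUNT(DISTINCT caller) AS n FROM calls
        GROUP BY callee ORDER BY n DESC, callee
    """)
    return rows.fetchall()
```

    With `db = build_db()`, `uncalled(db)` flags `legacyInit` and `fan_in(db)` ranks `log` first. The appeal suggested by the abstract is that such queries need no tool-specific language, only SQL that many developers already know.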

    Extending Automated Static Code Analysis with Social Coding

    This master's thesis first derives a definition of social coding. Various approaches to social coding are then classified into the three categories of the 3C model (communication, cooperation, and coordination) and grouped by the basic kind of approach. The approaches analyzed include online platforms such as Stack Overflow and GitHub, as well as development environments and their extensions, such as Cloud9 and Visual Studio Anywhere. Next, two approaches that extend the static code analysis software FindBugs with social coding are presented. The first lets users export the bugs found to online platforms, while the second implements its own bug-tracking system, centered on a commenting system stored in the project's source code repository, and presents it through a modern user interface.

    Collective program analysis

    Encouraged by the success of data-driven software engineering (SE) techniques, which have found numerous applications, e.g., in defect prediction and specification inference, the demand for mining and analyzing source code repositories at scale has significantly increased. However, analyzing source code at scale remains so expensive that data-driven solutions to certain SE problems are beyond our reach today. Existing techniques have focused on leveraging distributed computing to solve this problem, but with a concomitant increase in computational resource needs. In this thesis, we propose collective program analysis (CPA), a technique that accelerates ultra-large-scale source code mining without demanding more computational resources by exploiting the similarity between millions of source code artifacts. First, we describe the general concept of collective program analysis. Given a mining task that must be run on thousands of artifacts, the artifacts with similar interactions are clustered together, so that the mining task needs to be run on only one candidate from each cluster to produce the mining result, and the results for the other candidates in the same cluster can be produced by extrapolation. The two technical innovations of collective program analysis are mining-task-specific similarity and the interaction pattern graph. Mining-task-specific similarity captures whether two or more artifacts can be considered similar for a given mining task. An interaction pattern graph represents the interaction between the mining task and the artifact when the mining task is run on the artifact, and it is used to determine mining-task-specific similarity between artifacts. Given a mining task and an artifact, producing an interaction pattern graph soundly and efficiently can be very challenging. We propose a pre-analysis and program compaction technique to achieve this.
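    The abstract does not specify how interaction pattern graphs are built or compared, so the sketch below substitutes a deliberately simple stand-in to show the cluster-run-reuse loop. Each hypothetical artifact is a list of `(kind, operand)` statements; the toy mining task reads only statement kinds, so the "interaction pattern" is reduced to the tuple of kinds. Artifacts with equal keys are clustered, the task runs once per cluster, and the result is reused for the other members. `ARTIFACTS`, `calls_after_loop`, `pattern_key`, and `collective_run` are all illustrative names, not CPA's API.

```python
from collections import defaultdict

# Hypothetical artifacts: each is a list of (statement_kind, operand) pairs.
ARTIFACTS = {
    "A": [("assign", "x"), ("loop", "i"), ("call", "f")],
    "B": [("assign", "y"), ("loop", "j"), ("call", "g")],
    "C": [("call", "h"), ("call", "k")],
    "D": [("assign", "z"), ("loop", "n"), ("call", "f")],
}


def calls_after_loop(stmts):
    """Toy mining task: count call statements that appear after the first loop."""
    seen_loop, count = False, 0
    for kind, _operand in stmts:
        if kind == "loop":
            seen_loop = True
        elif kind == "call" and seen_loop:
            count += 1
    return count


def pattern_key(stmts):
    """Stand-in for an interaction pattern: the task reads only statement kinds,
    so operands (variable and function names) are abstracted away."""
    return tuple(kind for kind, _operand in stmts)


def collective_run(artifacts, task, key):
    """Cluster artifacts by `key`, run `task` once per cluster, reuse the result."""
    clusters = defaultdict(list)
    for name, stmts in artifacts.items():
        clusters[key(stmts)].append(name)
    results, runs = {}, 0
    for members in clusters.values():
        value = task(artifacts[members[0]])  # run on one representative only
        runs += 1
        for name in members:  # reuse the result for every member of the cluster
            results[name] = value
    return results, runs
```

    Here `collective_run(ARTIFACTS, calls_after_loop, pattern_key)` runs the task twice instead of four times. Reuse is safe in this toy only because the task's output depends solely on the key; establishing that kind of task-specific similarity soundly for real programs is the hard part the abstract attributes to interaction pattern graphs.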
    Given a source code mining task and thousands of input programs on which it must be run, our technique first extracts information about which parts of an input program are relevant to the mining task and then removes the irrelevant parts from the input programs before running the mining task on them. Our key technical contributions are a static analysis that extracts information about the parts of a program that are relevant to a mining task, and a sound program compaction technique that produces a reduced program on which the mining task yields output similar to that on the original program. Once interaction pattern graphs have been produced for thousands of artifacts, they must be clustered, and mining task results must be reused across similar artifacts to achieve acceleration. In the final part of this thesis, we fully describe collective program analysis and illustrate it by mining millions of control flow graphs (CFGs), clustering similar CFGs.
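    The thesis's pre-analysis and compaction operate on real programs; the sketch below only illustrates the shape of the compaction idea on a hypothetical control flow graph. Nodes the mining task does not care about are dropped, and each relevant node is connected to the relevant nodes it reaches through irrelevant ones, so the order of relevant operations along every path is preserved. The CFG encoding (a dict of successor sets), the node names, and the `RELEVANT` set are assumptions standing in for the output of a relevance pre-analysis.

```python
# Hypothetical CFG: node -> set of successor nodes.
CFG = {
    "entry": {"log1"},
    "log1": {"open"},
    "open": {"log2"},
    "log2": {"read", "tmp"},
    "tmp": {"read"},
    "read": {"close"},
    "close": {"exit"},
    "exit": set(),
}

# Assumed result of a pre-analysis for a task that checks open/read/close order.
RELEVANT = {"entry", "open", "read", "close", "exit"}


def compact(cfg, relevant):
    """Return a CFG over only the `relevant` nodes.

    An edge u -> v is kept when v is reachable from u along a path whose
    intermediate nodes are all irrelevant.
    """
    compacted = {}
    for u in relevant:
        targets, seen = set(), set()
        stack = list(cfg.get(u, ()))
        while stack:
            n = stack.pop()
            if n in seen:
                continue
            seen.add(n)
            if n in relevant:
                targets.add(n)  # stop at the first relevant node on this path
            else:
                stack.extend(cfg.get(n, ()))  # skip through the irrelevant node
        compacted[u] = targets
    return compacted
```

    `compact(CFG, RELEVANT)` yields the five-node chain `entry -> open -> read -> close -> exit`, so a task that only checks the order of these operations sees the same answer on a graph with three fewer nodes.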