6 research outputs found

    Taming the Static Analysis Beast

    While industrial-strength static analysis over large, real-world codebases has become commonplace, so too have difficult-to-analyze language constructs, large libraries, and popular frameworks. These features make constructing and evaluating a novel, sound analysis painful, error-prone, and tedious. We motivate the need for research to address these issues by highlighting some of the many challenges faced by static analysis developers in today's software ecosystem. We then propose our short- and long-term research agenda to make static analysis over modern software less burdensome.

    Lossless, Persisted Summarization of Static Callgraph, Points-To and Data-Flow Analysis

    Static analysis is used to automatically detect bugs and security breaches, and aids compiler optimization. Whole-program analysis (WPA) can yield high precision, but it causes long analysis times and thus does not match common software-development workflows, which often makes it impractical for large, real-world applications. This paper thus presents the design and implementation of ModAlyzer, a novel static-analysis approach that aims at accelerating whole-program analysis by making the analysis modular and compositional. It shows how to compute lossless, persisted summaries for callgraph, points-to and data-flow information, and it reports under which circumstances this function-level compositional analysis outperforms WPA. We implemented ModAlyzer as an extension to LLVM and PhASAR, and applied it to 12 real-world C and C++ applications. At analysis time, ModAlyzer modularly and losslessly summarizes the analysis effect of the library code those applications share, hence avoiding its repeated re-analysis. The experimental results show that the reuse of these summaries can save, on average, 72% of analysis time over WPA. Moreover, because it is lossless, the module-wise analysis fully retains precision and recall. Surprisingly, as our results show, it sometimes even yields precision superior to WPA. The initial summary generation, on average, takes about 3.67 times as long as WPA.
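
The two averages quoted in this abstract (summary generation at about 3.67 times the cost of one WPA run, and a 72% saving per run when summaries are reused) suggest a rough break-even estimate. The sketch below is only a back-of-the-envelope reading of those figures, not a result from the paper: it assumes the summary cost is paid once, that every later run costs a fixed 28% of a WPA run, and that averages can be combined this way. The function names are hypothetical.

```python
import math


def modular_cost(runs: int, summary_cost: float = 3.67, saving: float = 0.72) -> float:
    """Total cost of `runs` summary-based analyses, in units of one WPA run.

    Assumption: a one-off summary generation cost plus a fixed
    (1 - saving) fraction of a WPA run for every analysis.
    """
    return summary_cost + runs * (1.0 - saving)


def break_even_runs(summary_cost: float = 3.67, saving: float = 0.72) -> int:
    """Smallest number of runs at which the modular approach is no more
    expensive than running WPA every time.

    From summary_cost + n * (1 - saving) <= n it follows that
    n >= summary_cost / saving.
    """
    if not 0.0 < saving < 1.0:
        raise ValueError("saving must be a fraction strictly between 0 and 1")
    return math.ceil(summary_cost / saving)


if __name__ == "__main__":
    n = break_even_runs()
    print(f"break-even after {n} runs")
    print(f"modular cost at {n} runs: {modular_cost(n):.2f} WPA units")
```

Under these assumptions the one-off summary cost would be recovered after about six analysis runs; real projects will deviate from the reported averages.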

    Securing the software-defined networking control plane by using control and data dependency techniques

    Software-defined networking (SDN) fundamentally changes how network and security practitioners design, implement, and manage their networks. SDN decouples the decision-making about traffic forwarding (i.e., the control plane) from the traffic being forwarded (i.e., the data plane). SDN also allows for network applications, or apps, to programmatically control network forwarding behavior and policy through a logically centralized control plane orchestrated by a set of SDN controllers. As a result of logical centralization, SDN controllers act as network operating systems in the coordination of shared data plane resources and comprehensive security policy implementation. SDN can support network security through the provision of security services and the assurances of policy enforcement. However, SDN’s programmability means that a network’s security considerations are different from those of traditional networks. For instance, an adversary who manipulates the programmable control plane can leverage significant control over the data plane’s behavior. In this dissertation, we demonstrate that the security posture of SDN can be enhanced using control and data dependency techniques that track information flow and enable understanding of application composability, control and data plane decoupling, and control plane insight. We support that statement through investigation of the various ways in which an attacker can use control flow and data flow dependencies to influence the SDN control plane under different threat models. We systematically explore and evaluate the SDN security posture through a combination of runtime, pre-runtime, and post-runtime contributions in both attack development and defense designs. We begin with the development of a conceptual accountability framework for SDN.
We analyze the extent to which various entities within SDN are accountable to each other, what they are accountable for, mechanisms for assurance about accountability, standards by which accountability is judged, and the consequences of breaching accountability. We discover significant research gaps in SDN’s accountability that impact SDN’s security posture. In particular, the results of applying the accountability framework showed that more control plane attribution is necessary at different layers of abstraction, and that insight motivated the remaining work in this dissertation. Next, we explore the influence of apps in the SDN control plane’s secure operation. We find that existing access control protections that limit what apps can do, such as role-based access controls, prove to be insufficient for preventing malicious apps from damaging control plane operations. The reason is SDN’s reliance on shared network state. We analyze SDN’s shared state model to discover that benign apps can be tricked into acting as “confused deputies”; malicious apps can poison the state used by benign apps, and that leads the benign apps to make decisions that negatively affect the network. That violates an implicit (but unenforced) integrity policy that governs the network’s security. Because of the strong interdependencies among apps that result from SDN’s shared state model, we show that apps can be easily co-opted as “gadgets,” and that allows an attacker who minimally controls one app to make changes to the network state beyond his or her originally granted permissions. We use a data provenance approach to track the lineage of the network state objects by assigning attribution to the set of processes and agents responsible for each control plane object. 
We design the ProvSDN tool to track API requests from apps as they access the shared network state’s objects, and to check requests against a predefined integrity policy to ensure that low-integrity apps cannot poison high-integrity apps. ProvSDN acts as both a reference monitor and an information flow control enforcement mechanism. Motivated by the strong inter-app dependencies, we investigate whether implicit data plane dependencies affect the control plane’s secure operation too. We find that data plane hosts typically have an outsized effect on the generation of the network state in reactive-based control plane designs. We also find that SDN’s event-based design, and the apps that subscribe to events, can induce dependencies that originate in the data plane and that eventually change forwarding behaviors. That combination gives attackers that are residing on data plane hosts significant opportunities to influence control plane decisions without having to compromise the SDN controller or apps. We design the EventScope tool to automatically identify where such vulnerabilities occur. EventScope clusters apps’ event usage to decide in which cases unhandled events should be handled, statically analyzes controller and app code to understand how events affect control plane execution, and identifies valid control flow paths in which a data plane attacker can reach vulnerable code to cause unintended data plane changes. We use EventScope to discover 14 new vulnerabilities, and we develop exploits that show how such vulnerabilities could allow an attacker to bypass an intended network (i.e., data plane) access control policy. This research direction is critical for SDN security evaluation because such vulnerabilities could be induced by host-based malware campaigns. Finally, although there are classes of vulnerabilities that can be removed prior to deployment, it is inevitable that other classes of attacks will occur that cannot be accounted for ahead of time. 
In those cases, a network or security practitioner would need to have the right amount of after-the-fact insight to diagnose the root causes of such attacks without being inundated with too much information. Challenges remain in 1) the modeling of apps and objects, which can lead to overestimation or underestimation of causal dependencies; and 2) the omission of a data plane model that causally links control and data plane activities. We design the PicoSDN tool to mitigate causal dependency modeling challenges, to account for a data plane model through the use of the data plane topology to link activities in the provenance graph, and to account for network semantics to appropriately query and summarize the control plane’s history. We show how prior work can hinder investigations and analysis in SDN-based attacks and demonstrate how PicoSDN can track SDN control plane attacks.
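
The integrity check that this abstract attributes to ProvSDN (tracking which apps wrote each shared state object and refusing flows from low-integrity apps to high-integrity apps) can be illustrated with a small sketch. This is a simplified, Biba-style "no read down" model written for illustration only; it is not ProvSDN's implementation, and every class, method, and app name below is hypothetical.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Integrity(IntEnum):
    LOW = 0
    HIGH = 1


@dataclass
class StateObject:
    """A shared network-state object with its provenance (hypothetical model)."""
    name: str
    writers: set[str] = field(default_factory=set)  # apps that have written it


class ReferenceMonitor:
    """Mediates app access to shared state and enforces the integrity policy."""

    def __init__(self, app_levels: dict[str, Integrity]):
        self.app_levels = app_levels
        self.objects: dict[str, StateObject] = {}

    def object_level(self, name: str) -> Integrity:
        # An object is only as trustworthy as its least trusted writer.
        obj = self.objects.get(name)
        if obj is None or not obj.writers:
            return Integrity.HIGH
        return min(self.app_levels[w] for w in obj.writers)

    def write(self, app: str, name: str) -> None:
        # Record provenance: which app touched which object.
        self.objects.setdefault(name, StateObject(name)).writers.add(app)

    def read(self, app: str, name: str) -> bool:
        # Deny reads that would let lower-integrity data reach a
        # higher-integrity app (the "confused deputy" path).
        return self.object_level(name) >= self.app_levels[app]


if __name__ == "__main__":
    monitor = ReferenceMonitor({"forwarding": Integrity.HIGH,
                                "untrusted": Integrity.LOW})
    monitor.write("forwarding", "host_table")
    print(monitor.read("forwarding", "host_table"))  # allowed
    monitor.write("untrusted", "host_table")
    print(monitor.read("forwarding", "host_table"))  # denied
```

In this sketch the provenance is a flat set of writers per object; the abstract describes a richer lineage of processes and agents, which a real system would have to model as a graph.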

    Learning representations for effective and explainable software bug detection and fixing

    Software has an integral role in modern life; hence software bugs, which undermine software quality and reliability, have substantial societal and economic implications. The advent of machine learning and deep learning in software engineering has led to major advances in bug detection and fixing approaches, yet they fall short of desired precision and recall. This shortfall arises from the absence of a 'bridge,' known as learning code representations, that can transform information from source code into a suitable representation for effective processing via machine and deep learning. This dissertation builds such a bridge. Specifically, it presents solutions for effectively learning code representations using four distinct methods (context-based, testing results-based, tree-based, and graph-based), thus improving bug detection and fixing approaches, as well as providing developers insight into the foundational reasoning. The experimental results demonstrate that using learning code representations can significantly enhance explainable bug detection and fixing, showcasing the practicability and meaningfulness of the approaches formulated in this dissertation toward improving software quality and reliability.

    Efficient Graph-based Computation and Analytics

    With the data explosion in many domains, such as social media, big code repositories, the Internet of Things (IoT), and inertial sensors, only 32% of the data available to academia and industry is put to work, and the remaining 68% goes unleveraged. Moreover, people face an increasing number of obstacles to complex analytics on data of this sheer size, including: 1) How to perform dynamic graph analytics in a parallel and robust manner within a reasonable time? 2) How to conduct performance optimizations on a property graph representing the semantics of code, data, and runtime systems for big data applications? 3) How to innovate neural graph approaches (i.e., Transformer) to solve realistic research problems, such as automated program repair and inertial navigation? To tackle these problems, I present two efforts along this road: efficient graph-based computation and intelligent graph analytics. Specifically, I first propose two theory-based dynamic graph models to characterize temporal trends in large social media networks, then implement and optimize them atop Apache Spark GraphX to improve their performance. In addition, I investigate a semantics-aware optimization framework consisting of offline static analysis and online dynamic analysis on a property graph representing the skeleton of a data-intensive application, to interactively and semi-automatically assist programmers in scrutinizing the performance problems camouflaged in the source code. In the design of intelligent graph-based algorithms, I develop novel neural graph-based approaches with multi-task learning techniques to repair a broad range of programming bugs automatically, and also to improve the accuracy of pedestrian navigation systems using only sensor data from Inertial Measurement Units (IMUs, i.e., accelerometer, gyroscope, and magnetometer).
In this dissertation, I elaborate on the definitions of these research problems and leverage the knowledge of graph computation, program analysis, and deep learning techniques to seek solutions to them, followed by comprehensive comparisons with the state-of-the-art baselines and discussions on future research.