828 research outputs found
Generalized techniques for using system execution traces to support software performance analysis
This dissertation proposes generalized techniques to support software performance analysis using system execution traces in the absence of software development artifacts such as source code. The proposed techniques do not require modifications to the source code, or to the software binaries, for the purpose of software analysis (non-intrusive). The proposed techniques are also not tightly coupled to the architecture specific details of the system being analyzed. This dissertation extends the current techniques of using system execution traces to evaluate software performance properties, such as response times, service times. The dissertation also proposes a novel technique to auto-construct a dataflow model from the system execution trace, which will be useful in evaluating software performance properties. Finally, it showcases how we can use execution traces in a novel technique to detect Excessive Dynamic Memory Allocations software performance anti-pattern. This is the first attempt, according to the author\u27s best knowledge, of a technique to detect automatically the excessive dynamic memory allocations anti-pattern. The contributions from this dissertation will ease the laborious process of software performance analysis and provide a foundation for helping software developers quickly locate the causes for negative performance results via execution traces
Generating Predicate Callback Summaries for the Android Framework
One of the challenges of analyzing, testing and debugging Android apps is
that the potential execution orders of callbacks are missing from the apps'
source code. However, bugs, vulnerabilities and refactoring transformations
have been found to be related to callback sequences. Existing work on control
flow analysis of Android apps have mainly focused on analyzing GUI events. GUI
events, although being a key part of determining control flow of Android apps,
do not offer a complete picture. Our observation is that orthogonal to GUI
events, the Android API calls also play an important role in determining the
order of callbacks. In the past, such control flow information has been modeled
manually. This paper presents a complementary solution of constructing program
paths for Android apps. We proposed a specification technique, called Predicate
Callback Summary (PCS), that represents the callback control flow information
(including callback sequences as well as the conditions under which the
callbacks are invoked) in Android API methods and developed static analysis
techniques to automatically compute and apply such summaries to construct apps'
callback sequences. Our experiments show that by applying PCSs, we are able to
construct Android apps' control flow graphs, including inter-callback
relations, and also to detect infeasible paths involving multiple callbacks.
Such control flow information can help program analysis and testing tools to
report more precise results. Our detailed experimental data is available at:
http://goo.gl/NBPrKsComment: 11 page
Model Based Analysis and Test Generation for Flight Software
We describe a framework for model-based analysis and test case generation in the context of a heterogeneous model-based development paradigm that uses and combines Math- Works and UML 2.0 models and the associated code generation tools. This paradigm poses novel challenges to analysis and test case generation that, to the best of our knowledge, have not been addressed before. The framework is based on a common intermediate representation for different modeling formalisms and leverages and extends model checking and symbolic execution tools for model analysis and test case generation, respectively. We discuss the application of our framework to software models for a NASA flight mission
Recommended from our members
End-to-end deep reinforcement learning in computer systems
Abstract
The growing complexity of data processing systems has long led systems designers to imagine systems (e.g. databases, schedulers) which can self-configure and adapt based on environmental cues. In this context, reinforcement learning (RL) methods have since their inception appealed to systems developers. They promise to acquire complex decision policies from raw feedback signals. Despite their conceptual popularity, RL methods are scarcely found in real-world data processing systems. Recently, RL has seen explosive growth in interest due to high profile successes when utilising large neural networks (deep reinforcement learning). Newly emerging machine learning frameworks and powerful hardware accelerators have given rise to a plethora of new potential applications.
In this dissertation, I first argue that in order to design and execute deep RL algorithms efficiently, novel software abstractions are required which can accommodate the distinct computational patterns of communication-intensive and fast-evolving algorithms. I propose an architecture which decouples logical algorithm construction from local and distributed execution semantics. I further present RLgraph, my proof-of-concept implementation of this architecture. In RLgraph, algorithm developers can explore novel designs by constructing a high-level data flow graph through combination of logical components. This dataflow graph is independent of specific backend frameworks or notions of execution, and is only later mapped to execution semantics via a staged build process. RLgraph enables high-performing algorithm implementations while maintaining flexibility for rapid prototyping.
Second, I investigate reasons for the scarcity of RL applications in systems themselves. I argue that progress in applied RL is hindered by a lack of tools for task model design which bridge the gap between systems and algorithms, and also by missing shared standards for evaluation of model capabilities. I introduce Wield, a first-of-its-kind tool for incremental model design in applied RL. Wield provides a small set of primitives which decouple systems interfaces and deployment-specific configuration from representation. Core to Wield is a novel instructive experiment protocol called progressive randomisation which helps practitioners to incrementally evaluate different dimensions of non-determinism. I demonstrate how Wield and progressive randomisation can be used to reproduce and assess prior work, and to guide implementation of novel RL applications
Semantics-driven dataflow diagram processing.
Dataflow diagram is a commonly used tool of structured analysis and design techniques in specifications and design of a software system, and in analysis of an existing system as well. While automatic generating dataflow diagram saves system designers from tedious drawing and help them develop a new system, simulating dataflow diagrams provides system analysts with a dynamic graph and help them understand an existing system. CASE tools for dataflow diagrams play an important role in software engineering. Methodologies applied to the tools are dominant issues extensively evaluated by tools designers. Executable specifications with dataflow diagrams turn out an opportunity to execute graphic dataflow diagrams for systems analysts to simulate the behavior of a system. In this thesis, a syntax representation of dataflow diagram was developed, and a formal specification for dataflow diagram was established. A parser of this developed CASE tool translates the syntax representation of DFDs into their semantic representation. An interpreter of this tool then analyzes the DFDs semantic notations and builds a set of services of a system represented by the DFDs. This CASE tool can be used to simulate system behavior, check equivalence of two systems and detect deadlock. Based on its features, this tool can be used in every phase through entire software life cycle. Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis1998 .Z46. Source: Masters Abstracts International, Volume: 39-02, page: 0535. Adviser: Indra A. Tjandra. Thesis (M.Sc.)--University of Windsor (Canada), 1998
NPS: A Framework for Accurate Program Sampling Using Graph Neural Network
With the end of Moore's Law, there is a growing demand for rapid
architectural innovations in modern processors, such as RISC-V custom
extensions, to continue performance scaling. Program sampling is a crucial step
in microprocessor design, as it selects representative simulation points for
workload simulation. While SimPoint has been the de-facto approach for decades,
its limited expressiveness with Basic Block Vector (BBV) requires
time-consuming human tuning, often taking months, which impedes fast innovation
and agile hardware development. This paper introduces Neural Program Sampling
(NPS), a novel framework that learns execution embeddings using dynamic
snapshots of a Graph Neural Network. NPS deploys AssemblyNet for embedding
generation, leveraging an application's code structures and runtime states.
AssemblyNet serves as NPS's graph model and neural architecture, capturing a
program's behavior in aspects such as data computation, code path, and data
flow. AssemblyNet is trained with a data prefetch task that predicts
consecutive memory addresses.
In the experiments, NPS outperforms SimPoint by up to 63%, reducing the
average error by 38%. Additionally, NPS demonstrates strong robustness with
increased accuracy, reducing the expensive accuracy tuning overhead.
Furthermore, NPS shows higher accuracy and generality than the state-of-the-art
GNN approach in code behavior learning, enabling the generation of high-quality
execution embeddings
Improving Model-Based Software Synthesis: A Focus on Mathematical Structures
Computer hardware keeps increasing in complexity. Software design needs to keep up with this. The right models and abstractions empower developers to leverage the novelties of modern hardware. This thesis deals primarily with Models of Computation, as a basis for software design, in a family of methods called software synthesis.
We focus on Kahn Process Networks and dataflow applications as abstractions, both for programming and for deriving an efficient execution on heterogeneous multicores. The latter we accomplish by exploring the design space of possible mappings of computation and data to hardware resources. Mapping algorithms are not at the center of this thesis, however. Instead, we examine the mathematical structure of the mapping
space, leveraging its inherent symmetries or geometric properties to improve mapping methods in general.
This thesis thoroughly explores the process of model-based design, aiming to go beyond the more established software synthesis on dataflow applications. We starting with the problem of assessing these methods through benchmarking, and go on to formally examine the general goals of benchmarks. In this context, we also consider the role modern machine learning methods play in benchmarking.
We explore different established semantics, stretching the limits of Kahn Process Networks. We also discuss novel models, like Reactors, which are designed to be a deterministic, adaptive model with time as a first-class citizen. By investigating abstractions and transformations in the Ohua language for implicit dataflow programming, we also focus on programmability.
The focus of the thesis is in the models and methods, but we evaluate them in diverse use-cases, generally centered around Cyber-Physical Systems. These include the 5G telecommunication standard, automotive and signal processing domains. We even go beyond embedded systems and discuss use-cases in GPU programming and microservice-based architectures
Recommended from our members
Automated Testing and Debugging for Big Data Analytics
The prevalence of big data analytics in almost every large-scale software system has generated a substantial push to build data-intensive scalable computing (DISC) frameworks such as Google MapReduce and Apache Spark that can fully harness the power of existing data centers. However, frameworks once used by domain experts are now being leveraged by data scientists, business analysts, and researchers. This shift in user demographics calls for immediate advancements in the development, debugging, and testing practices of big data applications, which are falling behind compared to the DISC framework design and implementation. In practice, big data applications often fail as users are unable to test all behaviors emerging from interleaving dataflow operators, user-defined functions, and framework's code. "Testing based on a random sample" rarely guarantees the reliability and "trial and error" and "print" debugging methods are expensive and time-consuming. Thus, the current practice of developing a big data application must be improved and the tools built to enhance the developer's productivity must adapt to the distinct characteristics of data-intensive scalable computing. By synthesizing ideas from software engineering and database systems, our hypothesis is that we can design effective and scalable testing and debugging algorithms for big data analytics without compromising the performance and efficiency of the underlying DISC framework. To design such techniques, we investigate how we can build interactive and responsive debugging primitives that significantly reduce the debugging time, yet do not pose much performance overhead on big data applications. Furthermore, we investigate how we can leverage data provenance techniques from databases and fault-isolation algorithms from software engineering to pinpoint the minimal subset of failure-inducing inputs efficiently. To improve the reliability of big data analytics, we investigate how we can abstract the semantics of dataflow operators and use them in tandem with the semantics of user-defined functions to generate a minimum set of synthetic test inputs capable of revealing more defects than the entire input dataset.To examine the first hypothesis, we introduce interactive, real-time debugging primitives for big data analytics through innovative and scalable debugging features such as simulated breakpoint, dynamic watchpoint, and crash culprit identification. Second, we design a new automated fault localization approach that combines insights from both the software engineering and database literature to bring delta debugging closer to a reality in the big data applications by leveraging data provenance and by constructing systems optimizations for debugging provenance queries. Lastly, we devise a new symbolic-execution based white-box testing algorithm for big data applications that abstracts the implementation of dataflow operators using logical specifications instead of modeling their implementations and combines them with the semantics of any arbitrary user-defined function. We instantiate the idea of an interactive debugging algorithm as BigDebug, the idea of an automated debugging algorithm as BigSift, and the idea of symbolic execution-based testing as BigTest. Our investigation shows that the interactive debugging primitives can scale to terabytes---our record-level tracing incurs less than 25% overhead on average and provides up to 100% time saving compared to the baseline replay debugger. Second, we observe that by combining data provenance with delta debugging, we can identify the minimum faulty input in just under 30% of the original job execution time. Lastly, we verify that by abstracting dataflow operators using logical specifications, we can efficiently generate the most concise test data suitable for local testing while revealing twice as many faults as prior approaches. Our investigations collectively demonstrate that developer productivity can be significantly improved through effective and scalable testing and debugging techniques for big data analytics, without impacting the DISC framework's performance. This dissertation affirms the feasibility of automated debugging and testing techniques for big data analytics---techniques that were previously considered infeasible for large-scale data processing
- …