
    Automated tailoring of system software stacks

    In many industrial sectors, device manufacturers are moving away from expensive special-purpose hardware units and consolidating their systems on commodity hardware. As part of this change, developers can run their applications on general-purpose operating systems like Linux, which already supports thousands of different devices out of the box and can be used in a wide range of target scenarios. Furthermore, the Linux ecosystem allows them to integrate existing implementations of standard functionality in the form of shared libraries. However, as the libraries and the Linux kernel are designed as generic building blocks in order to support as many applications as possible, they cannot make assumptions about specific use cases for a single-purpose device. This generality leads to unnecessary overheads in narrowly defined target scenarios, as unneeded components not only take up space on the target system but also have to be maintained over the lifetime of the device. While the Linux kernel provides a configuration system to disable unneeded functionality like device drivers, determining the required features from over 16,000 options is an infeasible task. Even worse, most shared libraries cannot be customized, even though only around 10 percent of their functions are ever used by applications. In this thesis, I present my approaches for the automated identification and removal of unnecessary components in all layers of the software stack. As the configuration system is an integral part of the Linux kernel, we embrace its presence and automatically generate custom-fitted configurations for observed target scenarios with the help of an extracted variability model. For the much more diverse realm of shared libraries, with different programming languages, build systems, and a lack of configurability, I demonstrate a different approach. By identifying individual functions as logically distinct units, we construct a symbol-level dependency graph across the applications and all their required libraries. We then remove unneeded code at the binary level and rearrange the remaining parts to take up minimal space in the binary file by formulating their placement as an optimization problem. To lower the number of unnecessary updates to unused components in a deployed system, I lastly present an automated method to determine the impact of software changes on a target scenario and provide guidance for developers on whether they need to update their systems. Applying these techniques to different target systems, I demonstrate that we can disable up to 87 percent of configuration options in a Debian Linux kernel, shrink the size of an embedded OpenWrt kernel by 59 percent, and speed up the boot process of the embedded system by 21 percent. As part of the shared library tailoring process, we can remove 13,060 functions from all libraries in OpenWrt and reduce their total size by 31 percent. In the memcached Docker container, we identify 381 entirely unneeded shared libraries and shrink the container image size by 82 percent. An analysis of the development history of two large library projects over the course of more than two years further shows that between 68 and 82 percent of all changes are not required for an OpenWrt appliance, reducing the number of patch days by up to 69 percent. These results demonstrate the broad applicability of our automated methods for both the Linux kernel and shared libraries to a wide range of scenarios. From embedded systems to server applications, custom-tailored system software stacks contribute to the reduction of overheads in space and time.
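
    A rough sketch of the symbol-level dependency idea described above, assuming GNU binutils' nm is available on PATH; the paths and the removal heuristic are hypothetical, and the thesis tooling operates on the binaries themselves rather than on nm output.

# Minimal sketch: build a symbol-level dependency graph for a set of ELF
# binaries by matching each file's undefined dynamic symbols against the
# dynamic symbols defined by the other files.
import subprocess
from collections import defaultdict

def dynamic_symbols(path, undefined):
    """Return the dynamic symbol names defined (or undefined) in an ELF file."""
    flag = "--undefined-only" if undefined else "--defined-only"
    out = subprocess.run(["nm", "-D", flag, path],
                         capture_output=True, text=True).stdout
    return {line.split()[-1] for line in out.splitlines() if line.strip()}

def build_symbol_graph(binaries):
    """Map each (binary, needed symbol) pair to the libraries that define it."""
    definers = defaultdict(set)              # symbol -> set of defining files
    for path in binaries:
        for sym in dynamic_symbols(path, undefined=False):
            definers[sym].add(path)
    graph = defaultdict(set)
    for path in binaries:
        for sym in dynamic_symbols(path, undefined=True):
            for provider in definers.get(sym, ()):
                graph[(path, sym)].add(provider)
    return graph

# Hypothetical usage: symbols of a library that never appear as a needed
# edge of any application are candidates for removal.
# graph = build_symbol_graph(["/usr/bin/app", "/usr/lib/libfoo.so"])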

    LIPIcs, Volume 261, ICALP 2023, Complete Volume

    LIPIcs, Volume 261, ICALP 2023, Complete Volume

    Analyzing the Unanalyzable: an Application to Android Apps

    In general, software is unreliable. Its behavior can deviate from users’ expectations because of bugs, vulnerabilities, or even malicious code. Manually vetting software is a challenging, tedious, and highly costly task that does not scale. To alleviate excessive costs and analysts’ burdens, automated static analysis techniques have been proposed by both the research and practitioner communities, making static analysis a central topic in software engineering. In the meantime, mobile apps have considerably grown in importance. Today, most humans carry software in their pockets, with the Android operating system leading the market. Millions of apps have been proposed to the public so far, targeting a wide range of activities such as games, health, banking, GPS, etc. Hence, Android apps collect and manipulate a considerable amount of sensitive information, which puts users’ security and privacy at risk. Consequently, it is paramount to ensure that apps distributed through public channels (e.g., Google Play) are free from malicious code. Hence, the research and practitioner communities have put much effort into devising new automated techniques to vet Android apps against malicious activities over the last decade. Analyzing Android apps is, however, challenging. On the one hand, the Android framework proposes constructs that can be used to evade dynamic analysis by triggering the malicious code only under certain circumstances, e.g., if the device is not an emulator and is currently connected to power. Hence, dynamic analyses can easily be fooled by malicious developers by making some code fragments difficult to reach. On the other hand, static analyses are challenged by Android-specific constructs that limit the coverage of off-the-shelf static analyzers. The research community has already addressed some of these constructs, including inter-component communication or lifecycle methods. However, other constructs, such as implicit calls (i.e., when the Android framework asynchronously triggers a method in the app code), make some app code fragments unreachable to the static analyzers, while these fragments are executed when the app is run. Altogether, many apps’ code parts are unanalyzable: they are either not reachable by dynamic analyses or not covered by static analyzers. In this manuscript, we describe our contributions to the research effort from two angles: ① statically detecting malicious code that is difficult for dynamic analyzers to reach because it is only triggered under specific circumstances; and ② statically analyzing code not accessible to existing static analyzers to improve the comprehensiveness of app analyses. More precisely, in Part I, we first present a replication study of a state-of-the-art static logic bomb detector to better show its limitations. We then introduce a novel hybrid approach for detecting suspicious hidden sensitive operations towards triaging logic bombs. We finally detail the construction of a dataset of Android apps automatically infected with logic bombs. In Part II, we present our work to improve the comprehensiveness of Android apps’ static analysis. More specifically, we first show how we contributed to account for atypical inter-component communication in Android apps. Then, we present a novel approach to unify both the bytecode and native code in Android apps to account for the multi-language trend in app development. Finally, we present our work to resolve conditional implicit calls in Android apps to improve static and dynamic analyzers.
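
    A toy illustration of the trigger-condition idea mentioned above (malicious code guarded by emulator or power-state checks): a keyword scan over decompiled sources. The pattern lists, file layout, and sensitive-API set are assumptions made for illustration, not the detector studied in the manuscript.

# Toy heuristic: flag decompiled Java files that contain both a trigger-like
# environment check and a sensitive operation, a rough proxy for code that is
# hard to reach dynamically.
import re
from pathlib import Path

TRIGGER_PATTERNS = [
    r"Build\.FINGERPRINT",          # common emulator-detection check
    r"Build\.MODEL",
    r"BatteryManager",              # power-state checks
    r"ACTION_POWER_CONNECTED",
]
SENSITIVE_APIS = [
    r"sendTextMessage",             # SMS-based exfiltration
    r"getDeviceId",
    r"DexClassLoader",              # dynamic code loading
]

def suspicious_files(root):
    """Return files containing both a trigger-like check and a sensitive API."""
    hits = []
    for path in Path(root).rglob("*.java"):
        text = path.read_text(errors="ignore")
        has_trigger = any(re.search(p, text) for p in TRIGGER_PATTERNS)
        has_sensitive = any(re.search(p, text) for p in SENSITIVE_APIS)
        if has_trigger and has_sensitive:
            hits.append(str(path))
    return hits

# Example (path is hypothetical): suspicious_files("decompiled_app/")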

    Compiler-Based Approach to Enhance BliMe Hardware Usability

    Outsourced computing has emerged as an efficient platform for data processing, but it has raised security concerns due to the potential exposure of sensitive data through runtime and side-channel attacks. To address these concerns, the BliMe hardware extensions offer a hardware-enforced taint tracking policy to prevent secret-dependent data exposure. However, such strict policies can hinder software usability on BliMe hardware. While existing solutions can transform software to make it constant-time and more compatible with BliMe policies, they are not fully compatible with BliMe hardware. To strengthen the usability of BliMe hardware, we propose a compiler-based tool to detect and transform policy violations, ensuring constant-time compliance with BliMe. Our tool employs static analysis for taint tracking and applies transformation techniques including array access expansion, control-flow linearization and branchless select. We have implemented the tool on LLVM-11 to automatically convert existing source code. We then conducted experiments on WolfSSL and OISA to examine the accuracy of the analysis and the effect of the transformations. Our evaluation indicates that our tool can successfully transform multiple code patterns. However, we acknowledge that certain code patterns are challenging to transform. Therefore, we also discuss manual approaches and explore potential future work to expand the coverage of our automatic transformations.
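
    A minimal sketch of the branchless-select transformation named above, written in Python purely for illustration; the actual tool rewrites LLVM IR, not source-level Python.

# Branchless select on fixed-width integers: the chosen value does not depend
# on the secret condition through a branch, only through data flow.
MASK64 = (1 << 64) - 1

def ct_select(cond, a, b):
    """Return a if cond is 1, else b, without branching on cond."""
    mask = (-cond) & MASK64                      # all ones if cond == 1, else 0
    return ((a & mask) | (b & ~mask & MASK64)) & MASK64

# Secret-dependent branch (policy violation):  x = a if secret else b
# Branchless replacement:                      x = ct_select(secret, a, b)
assert ct_select(1, 7, 9) == 7 and ct_select(0, 7, 9) == 9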

    Structured parallelism discovery with hybrid static-dynamic analysis and evaluation technique

    Parallel computer architectures have dominated the computing landscape for the past two decades; a trend that is only expected to continue and intensify, with increasing specialization and heterogeneity. This creates huge pressure across the software stack to produce programming languages, libraries, frameworks and tools which will efficiently exploit the capabilities of parallel computers, not only for new software, but also for revitalizing existing sequential code. Automatic parallelization, despite decades of research, has had limited success in transforming sequential software to take advantage of efficient parallel execution. This thesis investigates three approaches that use commutativity analysis as the enabler for parallelization. This has the potential to overcome limitations of traditional techniques. We introduce the concept of liveness-based commutativity for sequential loops. We examine the use of a practical analysis utilizing liveness-based commutativity in a symbolic execution framework. Symbolic execution represents input values as groups of constraints, consequently deriving the output as a function of the input and enabling the identification of further program properties. We employ this feature to develop an analysis and discern commutativity properties between loop iterations. We study the application of this approach on loops taken from real-world programs in the OLDEN and NAS Parallel Benchmark (NPB) suites, and identify its limitations and related overheads. Informed by these findings, we develop Dynamic Commutativity Analysis (DCA), a new technique that leverages profiling information from program execution with specific input sets. Using profiling information, we track liveness information and detect loop commutativity by examining the code’s live-out values. We evaluate DCA against almost 1400 loops of the NPB suite, discovering 86% of them as parallelizable. Comparing our results against dependence-based methods, we match the detection efficacy of two dynamic approaches and outperform three static ones. Additionally, DCA is able to automatically detect parallelism in loops which iterate over Pointer-Linked Data Structures (PLDSs), taken from a wide range of benchmarks used in the literature, where all other techniques we considered failed. Parallelizing the discovered loops, our methodology achieves an average speedup of 3.6× across NPB (and up to 55×) and up to 36.9× for the PLDS-based loops on a 72-core host. We also demonstrate that our methodology, despite relying on specific input values for profiling each program, is able to correctly identify parallelism that is valid for all potential input sets. Lastly, we develop a methodology to utilize liveness-based commutativity, as implemented in DCA, to detect latent loop parallelism in the shape of patterns. Our approach applies a series of transformations which subsequently enable multiple applications of DCA over the generated multi-loop code section and match its loop commutativity outcomes against the expected criteria for each pattern. Applying our methodology to sets of sequential loops, we are able to identify well-known parallel patterns (i.e., maps, reductions and scans). This extends the scope of parallelism detection to loops, such as those performing scan operations, which cannot be determined as parallelizable by simply evaluating liveness-based commutativity conditions on their original form.
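
    A small sketch of the liveness-based commutativity idea underlying DCA, assuming two iterations commute on a given input when running them in either order leaves the same live-out values; the toy loop body and live-out set below are illustrative only, not the DCA implementation.

# Two loop iterations commute on a given input if executing them in either
# order produces identical live-out state.
import copy

def commute_on_input(body, state, i, j, live_out):
    """Check whether iterations i and j of `body` commute on `state`."""
    s1, s2 = copy.deepcopy(state), copy.deepcopy(state)
    body(s1, i); body(s1, j)                 # original iteration order
    body(s2, j); body(s2, i)                 # swapped iteration order
    return all(s1[k] == s2[k] for k in live_out)

# Toy loop body: histogram accumulation commutes across iterations.
def histogram_body(state, i):
    state["hist"][state["data"][i] % 4] += 1

state = {"data": [3, 1, 2, 7, 5], "hist": [0, 0, 0, 0]}
print(commute_on_input(histogram_body, state, 0, 1, live_out=["hist"]))  # True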

    Away From Linear Models of Concurrent Programs

    Traditional approaches to imperative programming language semantics rely on first defining how each individual statement modifies the memory state, and then composing these definitions into a whole program via the interpretation of the sequential composition operator: the humble semicolon. The creation of the multiprocessor and the advent of parallelism began to challenge this model. No longer was a program a single, linear sequence of statements; it had statements which might occur in one order or another, or even simultaneously. To add to the complexity, compilers and hardware began to optimise their input programs, reordering and removing statements to improve runtime performance. The resulting stack of transformations and complications caused runtime executions to drift progressively further away from the program that a programmer believed they were writing. Several approaches to this have appeared: process calculi, which forbid processes from sharing memory and instead force them to communicate directly; maintaining sequential consistency, in which an execution must at least appear to respect the ordered sequence of statements; and permitting weak memory ordering, in which an execution must maintain orders involving explicitly synchronised accesses but is free to reorder everything else. While weak memory is preferred by engineers building high-performance code, due to the relatively high cost of both passing messages and maintaining sequential consistency, the problem of creating a sound weak memory semantics for a real-world programming language with shared memory concurrency has yet to be fully solved. Here we present a weakly ordered semantics for shared memory concurrency, given as an extension to a previously published model. We show that the existing model can be integrated into reasoning techniques which rely on an operational semantics, and that program transformations which cannot introduce new behaviours can be expressed as a relation over the objects of this semantics. We then add a layer of abstraction to the model which allows us to represent dynamic memory allocation in a weak memory context for the first time.
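
    A brief illustration of why the distinction matters, using the classic store-buffering litmus test: enumerating all sequentially consistent interleavings shows the relaxed outcome never occurs under sequential consistency, whereas weaker hardware models (e.g. x86-TSO) permit it. The encoding below is a didactic sketch, not the semantics developed in the thesis.

# Store buffering: Thread 0 runs x = 1; r1 = y and Thread 1 runs y = 1; r2 = x.
# Under SC, the outcome r1 = r2 = 0 is impossible; weak models allow it.
from itertools import permutations

EVENTS = [("T0", "W", "x"), ("T0", "R", "y"), ("T1", "W", "y"), ("T1", "R", "x")]

def sc_outcomes():
    outcomes = set()
    for order in permutations(range(4)):
        # Keep only interleavings that preserve each thread's program order.
        if order.index(0) > order.index(1) or order.index(2) > order.index(3):
            continue
        mem, regs = {"x": 0, "y": 0}, {}
        for idx in order:
            tid, kind, var = EVENTS[idx]
            if kind == "W":
                mem[var] = 1
            else:
                regs[tid] = mem[var]
        outcomes.add((regs["T0"], regs["T1"]))
    return outcomes

print(sc_outcomes())   # {(0, 1), (1, 0), (1, 1)} -- (0, 0) never appears under SC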

    Improving loop optimization with histogram profiling

    Production compilers use numerous techniques to generate performant code. One such technique is profile-guided optimization (PGO). The principle of this technique is to insert instrumentation during compilation, gather information about program behaviour with training runs, and use this information during recompilation to improve optimization. The thesis aims to improve the precision of loop optimizations in the GNU Compiler Collection (GCC) with PGO. Currently in GCC, only the average iteration count of a loop is known with PGO. This leads to inefficiencies in both the performance and size of the binary. We implement infrastructure for measuring more information about loop iterations and add new counters, namely the histogram of iterations and the histogram of iterations modulo its size. With the histogram of iterations, we improve loop peeling and implement a new case of the loop versioning optimization. This significantly improves the performance of the generated code with reasonable overhead.
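
    A sketch of how an iteration-count histogram can drive a peeling decision where a plain average would mislead; the coverage threshold and peel limit below are assumptions for illustration, not GCC's actual heuristics.

# An iteration-count histogram reveals that most executions of a loop run very
# few iterations even when a handful of long runs inflate the average.
def choose_peel_count(histogram, coverage=0.9, max_peel=8):
    """histogram[i] = number of times the loop ran exactly i iterations.
    Return the smallest peel count covering `coverage` of runs, or None."""
    total = sum(histogram.values())
    if total == 0:
        return None
    covered = 0
    for iters in sorted(histogram):
        covered += histogram[iters]
        if covered / total >= coverage:
            return iters if iters <= max_peel else None
    return None

# 90% of runs take at most 3 iterations, although the mean is skewed upwards.
hist = {0: 100, 1: 300, 2: 350, 3: 200, 64: 50}
print(choose_peel_count(hist))   # 3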

    Late-bound code generation

    Each time a function or method is invoked during the execution of a program, a stream of instructions is issued to some underlying hardware platform. But exactly what underlying hardware, and which instructions, is usually left implicit. However, in certain situations it becomes important to control these decisions. For example, particular problems can only be solved in real-time when scheduled on specialised accelerators, such as graphics coprocessors or computing clusters. We introduce a novel operator for hygienically reifying the behaviour of a runtime function instance as a syntactic fragment, in a language which may in general differ from the source function definition. Translation and optimisation are performed by recursively invoked, dynamically dispatched code generators. Side-effecting operations are permitted, and their ordering is preserved. We compare our operator with other techniques for pragmatic control, observing that the use of our operator supports lifting arbitrary mutable objects, and neither requires rewriting sections of the source program in a multi-level language, nor interferes with the interface to individual software components. Due to its lack of interference at the abstraction level at which software is composed, we believe that our approach poses a significantly lower barrier to practical adoption than current methods. The practical efficacy of our operator is demonstrated by using it to offload the user interface rendering of a smartphone application to an FPGA coprocessor, including both statically and procedurally defined user interface components. The generated pipeline is an application-specific, statically scheduled processor-per-primitive rendering pipeline, suitable for place-and-route style optimisation. To demonstrate the compatibility of our operator with existing languages, we show how it may be defined within the Python programming language. We introduce a transformation for weakening mutable to immutable named bindings, termed let-weakening, to solve the problem of propagating information pertaining to named variables between modular code-generating units.
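
    A rough Python sketch of the general reification idea (capturing a runtime function as a syntactic fragment and handing it to a pluggable, recursively invoked code generator); it is not the thesis's operator, and the C-like emitter is a hypothetical backend that only handles simple return statements.

# Reify a function's definition as an AST fragment, then let a visitor-style
# code generator walk the fragment and emit code for a different target.
import ast
import inspect

def reify(fn):
    """Return the AST of a function's definition as a reusable fragment."""
    source = inspect.getsource(fn)
    return ast.parse(source).body[0]          # the FunctionDef node

class CLikeEmitter(ast.NodeVisitor):
    """Hypothetical backend: prints a C-like rendering of simple returns."""
    def visit_FunctionDef(self, node):
        print(f"/* generated from {node.name} */")
        self.generic_visit(node)
    def visit_Return(self, node):
        print("return " + ast.unparse(node.value) + ";")

def scale(x):
    return x * 2 + 1

fragment = reify(scale)            # syntactic fragment of `scale`
CLikeEmitter().visit(fragment)     # recursively invoked code generator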

    An alternative SSA construction algorithm for GCC

    SSA form is a very important concept in compiler internal code representation. Φ-functions are an integral part of SSA form. Braun, Buchwald, Hack, Leißa, Mallon and Zwinkau introduce a new algorithm for SSA construction and another related algorithm for reducing the number of Φ-functions. These algorithms are not yet implemented in the GCC compiler. Firstly, we introduce, implement and test a basic code generation API based on the SSA construction algorithm. We list the possible extensions and use cases of the API. Then we implement the Φ optimization as a standalone pass. We use it to measure the number of redundant Φ-functions produced by other GCC passes. Finally, we conclude that GCC would benefit from including both of these algorithms.
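
    A compact sketch of the on-the-fly SSA construction idea from Braun et al., simplified for illustration: block sealing and the full rewiring of a trivial Φ-function's users are omitted.

# Variables are looked up lazily: a local definition wins, otherwise the
# lookup recurses into predecessors and places a Φ at join points, removing
# it again when it turns out to be trivial.
class Phi:
    def __init__(self, block):
        self.block, self.operands = block, []

class Block:
    def __init__(self, preds=()):
        self.preds = list(preds)
        self.defs = {}                         # variable -> current SSA value

def write_variable(var, block, value):
    block.defs[var] = value

def read_variable(var, block):
    if var in block.defs:                      # local definition wins
        return block.defs[var]
    return read_variable_recursive(var, block)

def read_variable_recursive(var, block):
    if len(block.preds) == 1:                  # single predecessor: no Φ needed
        value = read_variable(var, block.preds[0])
    else:                                      # join point: place a Φ
        phi = Phi(block)
        write_variable(var, block, phi)        # break lookup cycles first
        phi.operands = [read_variable(var, p) for p in block.preds]
        value = try_remove_trivial_phi(phi)
    write_variable(var, block, value)
    return value

def try_remove_trivial_phi(phi):
    others = {op for op in phi.operands if op is not phi}
    return others.pop() if len(others) == 1 else phi   # trivial Φ folds away

# Tiny diamond CFG: entry defines x, both branches fall through to the join.
entry = Block(); left = Block([entry]); right = Block([entry])
join = Block([left, right])
write_variable("x", entry, "x0")
print(read_variable("x", join))                # "x0": the trivial Φ is removed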