2 research outputs found

    Memory-aware platform description and framework for source-level embedded MPSoC software optimization

    Get PDF
    Developing optimizing source-level transformations, consists of numerous non-trivial subtasks. Besides identifying actual optimization goals within a particular target-platform and compiler setup, the actual implementation is a tedious, error-prone and often recurring work. Providing appropriate support for this development work is a challenging task. Defining and implementing a well-suited target-platform description which can be used by a wide set of optimization techniques while being precise and easy to maintain is one dimension of this challenging task. Another dimension, which has also been tackled in this work, deals with provision of an infrastructure for optimization-step representation, interaction and data retention. Finally, an appropriate source-code representation has been integrated into this approach. These contributions are tightly related to each other, they have been bundled into the MACCv2 framework, a fullfledged optimization-technique implementation and integration approach. Together, they significantly alleviate the effort required for implementation of source-level memory-aware optimization techniques for Multi Processor Systems on a Chip (MPSoCs). The system-modeling approach presented in this dissertation has been located at the processor-memory-switch (PMS) abstraction level. It offers a novel combined structural and semantical description. It combines a locally-scoped, structural modeling approach, as preferred by system designers, and a fast, database-like interface, best suited for optimization technique developers. It supports model refinement and requires only limited effort for an initial abstract system model. The general structure consists of components and channels. Based on this structure, the system model provides mechanisms for database-like access to system-global target-platform properties, while requiring only definition of locally-scoped input data annotated to system-model items. A typical set of these properties contains energy-consumption and access-latency values. The request-based retrieval of system properties is a unique feature, which makes this approach superior to state-of-the-art table-lookup-based or full-system-simulation-based approaches. Combining such component-local properties to system-global target-platform data is performed via aspect handlers. These handlers define computational rules which are applied to correlated locally-scoped data along access paths in the memory-subsystem hierarchy. This approach is capable of calculating these system-global values at a rate similar to plain table lookups, while maintaining a precision close to full-system-simulation-based estimations. This has been shown for both, energy-consumption values as well as access-latency values of the MPARM platform. The MACCv2 framework provides a set of fundamental services to the optimization technique developer. On top of these services, a system model and source-code representation are provided. Further, framework-based optimization-technique implementations are encapsulated into self-contained entities exposing well-defined interfaces. This framework has been successfully used within the European Commission funded MNEMEE project. The hierarchical processing-step representation in MACCv2 allows for encapsulation of tasks at various granularity levels. For simplified reuse in future projects, the entire toolchain as well as individual optimization techniques have been represented as processing-step entities in terms of MACCv2. A common notion of target-platform structure and properties as well as inter-processing-step communication, is achieved via framework-provided services. The system-modeling approach and the framework show the right set of properties needed to support development of memory-aware optimization techniques. The MNEMEE project, continued research work, teaching activities and PhD theses have been successfully founded on approaches and the framework proposed in this dissertation

    Efficient implementation of resource-constrained cyber-physical systems using multi-core parallelism

    Get PDF
    The quest for more performance of applications and systems became more challenging in the recent years. Especially in the cyber-physical and mobile domain, the performance requirements increased significantly. Applications, previously found in the high-performance domain, emerge in the area of resource-constrained domain. Modern heterogeneous high-performance MPSoCs provide a solid foundation to satisfy the high demand. Such systems combine general processors with specialized accelerators ranging from GPUs to machine learning chips. On the other side of the performance spectrum, the demand for small energy efficient systems exposed by modern IoT applications increased vastly. Developing efficient software for such resource-constrained multi-core systems is an error-prone, time-consuming and challenging task. This thesis provides with PA4RES a holistic semiautomatic approach to parallelize and implement applications for such platforms efficiently. Our solution supports the developer to find good trade-offs to tackle the requirements exposed by modern applications and systems. With PICO, we propose a comprehensive approach to express parallelism in sequential applications. PICO detects data dependencies and implements required synchronization automatically. Using a genetic algorithm, PICO optimizes the data synchronization. The evolutionary algorithm considers channel capacity, memory mapping, channel merging and flexibility offered by the channel implementation with respect to execution time, energy consumption and memory footprint. PICO's communication optimization phase was able to generate a speedup almost 2 or an energy improvement of 30% for certain benchmarks. The PAMONO sensor approach enables a fast detection of biological viruses using optical methods. With a sophisticated virus detection software, a real-time virus detection running on stationary computers was achieved. Within this thesis, we were able to derive a soft real-time capable virus detection running on a high-performance embedded system, commonly found in today's smart phones. This was accomplished with smart DSE algorithm which optimizes for execution time, energy consumption and detection quality. Compared to a baseline implementation, our solution achieved a speedup of 4.1 and 87\% energy savings and satisfied the soft real-time requirements. Accepting a degradation of the detection quality, which still is usable in medical context, led to a speedup of 11.1. This work provides the fundamentals for a truly mobile real-time virus detection solution. The growing demand for processing power can no longer satisfied following well-known approaches like higher frequencies. These so-called performance walls expose a serious challenge for the growing performance demand. Approximate computing is a promising approach to overcome or at least shift the performance walls by accepting a degradation in the output quality to gain improvements in other objectives. Especially for a safe integration of approximation into existing application or during the development of new approximation techniques, a method to assess the impact on the output quality is essential. With QCAPES, we provide a multi-metric assessment framework to analyze the impact of approximation. Furthermore, QCAPES provides useful insights on the impact of approximation on execution time and energy consumption. With ApproxPICO we propose an extension to PICO to consider approximate computing during the parallelization of sequential applications
    corecore