14 research outputs found

    A Methodology for Invasive Programming on Virtualizable Embedded MPSoC Architectures

    Get PDF
    AbstractExploiting the huge logic resources in current embedded devices has led to a plethora of on-chip multi-processor architec- tures. However, besides instantiating more and more soft-core processors on a chip, developing applications suited for such architectures still remains a hard task. A further step in the evolution of embedded multi-processing might be the so called Invasive Programming. In this paradigm, an application may be switched from sequential to parallel execution at runtime. A task may then dynamically invade currently unused processor resources in a multi-processor system to resume in parallel execution mode. This hardens existing problems, however, because not only the development of suited software, but also the creation of multi-processor architectures supporting this paradigm is needed. Therefore, this work presents a concise methodology to enable Invasive Programming properties on an embedded Multi-Processor System-on-Chip (MPSoC). This is achieved by combining a designer-guided code parallelization approach with a virtualizable, generic, and scalable embedded MPSoC architecture. To resolve data dependencies during task invasion, a processor-independent task-based communication scheme for the MPSoC is proposed. Moreover, a tool framework dedicated to the generic creation of virtualizable MPSoC is provided. The approach is demonstrated by the generation of an MPSoC featuring eight processors executing an application which dynamically switches at runtime between sequential and parallel execution

    Low power architectures for streaming applications

    Get PDF

    Design Space Exploration and Resource Management of Multi/Many-Core Systems

    Get PDF
    The increasing demand of processing a higher number of applications and related data on computing platforms has resulted in reliance on multi-/many-core chips as they facilitate parallel processing. However, there is a desire for these platforms to be energy-efficient and reliable, and they need to perform secure computations for the interest of the whole community. This book provides perspectives on the aforementioned aspects from leading researchers in terms of state-of-the-art contributions and upcoming trends

    Proceedings of the 5th International Workshop on Reconfigurable Communication-centric Systems on Chip 2010 - ReCoSoC\u2710 - May 17-19, 2010 Karlsruhe, Germany. (KIT Scientific Reports ; 7551)

    Get PDF
    ReCoSoC is intended to be a periodic annual meeting to expose and discuss gathered expertise as well as state of the art research around SoC related topics through plenary invited papers and posters. The workshop aims to provide a prospective view of tomorrow\u27s challenges in the multibillion transistor era, taking into account the emerging techniques and architectures exploring the synergy between flexible on-chip communication and system reconfigurability

    Predictable and composable system-on-chip memory controllers

    Get PDF
    Contemporary System-on-Chip (SoC) become more and more complex, as increasing integration results in a larger number of concurrently executing applications. These applications consist of tasks that are mapped on heterogeneous multi-processor platforms with distributed memory hierarchies, where SRAMs and SDRAMs are shared by a variety of arbiters. Some applications have real-time requirements, meaning that they must perform a particular computation before a deadline to guarantee functional correctness, or to prevent quality degradation. Mapping the applications on the platform such that all real-time requirements are satisfied is very challenging. The number of possible mappings of tasks to processing elements and data structures to memories may be large, and appropriate configuration settings must be determined once the mapping is chosen. Verifying that a particular mapping satisfies all application requirements is typically done by system-level simulation. However, resource sharing causes interference between applications, making their temporal behaviors inter-dependent. All concurrently executing applications must hence be verified together, causing the verification complexity of the system to increase exponentially with the number of applications. Together these factors contribute to making the integration and verification process a dominant part of SoC development, both in terms of time and money. Predictable and composable systems are proposed to manage the increasing verification complexity. Predictable systems provide lower bounds on application performance, while applications in composable systems are completely isolated and cannot affect each other’s temporal behavior by even a single clock cycle. Predictable systems enable formal verification that covers all possible interactions with the platform. However, this assumes that the behavior of an application is captured in a performance model, which is not the case for many applications. Composability offers a complementary verification approach by letting these applications be verified independently by simulation with linear verification complexity. A limitation of current predictable and composable systems is that there are no memory controllers supporting the concepts in a general way. Current SRAM controllers can be shared in a predictable way with a variety of arbiters, but are only composable if statically scheduled or shared using time-division multiplexing. Existing SDRAM controllers are not composable, and are either unpredictable or limited to applications that are statically scheduled. This thesis addresses the limitations of current predictable and composable systems by proposing a general predictable and composable memory controller, thereby addressing the mapping and verification problem in embedded systems. The proposed memory controller is divided into a front-end and a back-end. The back-end is specific for DDR2/DDR3 SDRAM and makes the memory behave in a predictable manner using precomputed memory patterns that are dynamically combined at run time. The front-end contains buffering and an arbiter in the class of Latency-Rate (LR) servers, which is a class with many well-known predictable arbiters. We extend this class with a Credit-Controlled Static-Priority (CCSP) arbiter that is developed specifically for shared resources with latency-critical requestors and high loads, such as memories. Three key features of CCSP are: 1) It accommodates latency-critical requestors with low bandwidth requirements without wasting bandwidth. 2) Over-allocated bandwidth can be made negligible at an increased area cost, without affecting latency. 3) It has a small implementation that runs fast enough to keep up with most DDR2/DDR3 memories. The proposed front-end is general and can be used with other predictable resources, such as SRAM controllers. The proposed memory controller hence supports multiple arbiter and memory types, thus addressing the diversity in modern SoCs. The combination of front-end and predictable memory behaves like a LR server, which is the shared resource abstraction used in this work. In essence, a LR server guarantees a requestor a minimum bandwidth and a maximum latency, enabling formal verification of real-time requirements. The LR server model is compatible with several commonly used formal analysis frameworks, such as network calculus and data-flow analysis. Our memory controller hence allows any combination of predictable memory and LR arbiter to be used transparently for formal verification of applications with any of these frameworks. Sharing a predictable memory at run-time results in interference between requestors, making the memory controller non-composable. This is addressed by adding a Delay Block to the front-end that delays all signals sent from the front-end to a requestor to always emulate worst-case interference. This makes requestors unable to affect each other’s temporal behavior, which is sufficient to guarantee composability on the level of applications. Our predictable memory controller hence offers composable service with a variety of memory and arbiter types, which widely extends the scope of composable platforms. Another benefit of this approach is that it enables composable service to be dynamically enabled and disabled, enabling requestors that do not require composable service to use slack bandwidth to improve performance. The predictable and composable memory controller is supported by a configuration flow that automatically computes memory patterns and arbiter settings to satisfy given bandwidth and latency requirements. The flow uses abstraction to separate the configuration of the memory and the arbiter, enabling settings to be computed in a streamlined fashion for all supported memories and arbiters

    Efficient implementation of resource-constrained cyber-physical systems using multi-core parallelism

    Get PDF
    The quest for more performance of applications and systems became more challenging in the recent years. Especially in the cyber-physical and mobile domain, the performance requirements increased significantly. Applications, previously found in the high-performance domain, emerge in the area of resource-constrained domain. Modern heterogeneous high-performance MPSoCs provide a solid foundation to satisfy the high demand. Such systems combine general processors with specialized accelerators ranging from GPUs to machine learning chips. On the other side of the performance spectrum, the demand for small energy efficient systems exposed by modern IoT applications increased vastly. Developing efficient software for such resource-constrained multi-core systems is an error-prone, time-consuming and challenging task. This thesis provides with PA4RES a holistic semiautomatic approach to parallelize and implement applications for such platforms efficiently. Our solution supports the developer to find good trade-offs to tackle the requirements exposed by modern applications and systems. With PICO, we propose a comprehensive approach to express parallelism in sequential applications. PICO detects data dependencies and implements required synchronization automatically. Using a genetic algorithm, PICO optimizes the data synchronization. The evolutionary algorithm considers channel capacity, memory mapping, channel merging and flexibility offered by the channel implementation with respect to execution time, energy consumption and memory footprint. PICO's communication optimization phase was able to generate a speedup almost 2 or an energy improvement of 30% for certain benchmarks. The PAMONO sensor approach enables a fast detection of biological viruses using optical methods. With a sophisticated virus detection software, a real-time virus detection running on stationary computers was achieved. Within this thesis, we were able to derive a soft real-time capable virus detection running on a high-performance embedded system, commonly found in today's smart phones. This was accomplished with smart DSE algorithm which optimizes for execution time, energy consumption and detection quality. Compared to a baseline implementation, our solution achieved a speedup of 4.1 and 87\% energy savings and satisfied the soft real-time requirements. Accepting a degradation of the detection quality, which still is usable in medical context, led to a speedup of 11.1. This work provides the fundamentals for a truly mobile real-time virus detection solution. The growing demand for processing power can no longer satisfied following well-known approaches like higher frequencies. These so-called performance walls expose a serious challenge for the growing performance demand. Approximate computing is a promising approach to overcome or at least shift the performance walls by accepting a degradation in the output quality to gain improvements in other objectives. Especially for a safe integration of approximation into existing application or during the development of new approximation techniques, a method to assess the impact on the output quality is essential. With QCAPES, we provide a multi-metric assessment framework to analyze the impact of approximation. Furthermore, QCAPES provides useful insights on the impact of approximation on execution time and energy consumption. With ApproxPICO we propose an extension to PICO to consider approximate computing during the parallelization of sequential applications

    3rd Many-core Applications Research Community (MARC) Symposium. (KIT Scientific Reports ; 7598)

    Get PDF
    This manuscript includes recent scientific work regarding the Intel Single Chip Cloud computer and describes approaches for novel approaches for programming and run-time organization

    Embedded System Design

    Get PDF
    A unique feature of this open access textbook is to provide a comprehensive introduction to the fundamental knowledge in embedded systems, with applications in cyber-physical systems and the Internet of things. It starts with an introduction to the field and a survey of specification models and languages for embedded and cyber-physical systems. It provides a brief overview of hardware devices used for such systems and presents the essentials of system software for embedded systems, including real-time operating systems. The author also discusses evaluation and validation techniques for embedded systems and provides an overview of techniques for mapping applications to execution platforms, including multi-core platforms. Embedded systems have to operate under tight constraints and, hence, the book also contains a selected set of optimization techniques, including software optimization techniques. The book closes with a brief survey on testing. This fourth edition has been updated and revised to reflect new trends and technologies, such as the importance of cyber-physical systems (CPS) and the Internet of things (IoT), the evolution of single-core processors to multi-core processors, and the increased importance of energy efficiency and thermal issues

    Embedded System Design

    Get PDF
    A unique feature of this open access textbook is to provide a comprehensive introduction to the fundamental knowledge in embedded systems, with applications in cyber-physical systems and the Internet of things. It starts with an introduction to the field and a survey of specification models and languages for embedded and cyber-physical systems. It provides a brief overview of hardware devices used for such systems and presents the essentials of system software for embedded systems, including real-time operating systems. The author also discusses evaluation and validation techniques for embedded systems and provides an overview of techniques for mapping applications to execution platforms, including multi-core platforms. Embedded systems have to operate under tight constraints and, hence, the book also contains a selected set of optimization techniques, including software optimization techniques. The book closes with a brief survey on testing. This fourth edition has been updated and revised to reflect new trends and technologies, such as the importance of cyber-physical systems (CPS) and the Internet of things (IoT), the evolution of single-core processors to multi-core processors, and the increased importance of energy efficiency and thermal issues

    Model-based symbolic design space exploration at the electronic system level: a systematic approach

    Get PDF
    In this thesis, a novel, fully systematic approach is proposed that addresses the automated design space exploration at the electronic system level. The problem is formulated as multi-objective optimization problem and is encoded symbolically using Answer Set Programming (ASP). Several specialized solvers are tightly coupled as background theories with the foreground ASP solver under the ASP modulo Theories (ASPmT) paradigm. By utilizing the ASPmT paradigm, the search is executed entirely systematically and the disparate synthesis steps can be coupled to explore the search space effectively.In dieser Arbeit wird ein vollständig systematischer Ansatz präsentiert, der sich mit der Entwurfsraumexploration auf der elektronischen Systemebene befasst. Das Problem wird als multikriterielles Optimierungsproblem formuliert und symbolisch mit Hilfe von Answer Set Programming (ASP) kodiert. Spezialisierte Solver sind im Rahmen des ASP modulo Theories (ASPmT) Paradigmas als Hintergrundtheorien eng mit dem ASP Solver gekoppelt. Durch die Verwendung von ASPmT wird die Suche systematisch ausgeführt und die individuellen Schritte können gekoppelt werden, um den Suchraum effektiv zu durchsuchen
    corecore