220 research outputs found

    Exploiting Hardware Abstraction for Parallel Programming Framework: Platform and Multitasking

    Get PDF
    With the help of the parallelism provided by the fine-grained architecture, hardware accelerators on Field Programmable Gate Arrays (FPGAs) can significantly improve the performance of many applications. However, designers are required to have excellent hardware programming skills and unique optimization techniques to explore the potential of FPGA resources fully. Intermediate frameworks above hardware circuits are proposed to improve either performance or productivity by leveraging parallel programming models beyond the multi-core era. In this work, we propose the PolyPC (Polymorphic Parallel Computing) framework, which targets enhancing productivity without losing performance. It helps designers develop parallelized applications and implement them on FPGAs. The PolyPC framework implements a custom hardware platform, on which programs written in an OpenCL-like programming model can launch. Additionally, the PolyPC framework extends vendor-provided tools to provide a complete development environment including intermediate software framework, and automatic system builders. Designers\u27 programs can be either synthesized as hardware processing elements (PEs) or compiled to executable files running on software PEs. Benefiting from nontrivial features of re-loadable PEs, and independent group-level schedulers, the multitasking is enabled for both software and hardware PEs to improve the efficiency of utilizing hardware resources. The PolyPC framework is evaluated regarding performance, area efficiency, and multitasking. The results show a maximum 66 times speedup over a dual-core ARM processor and 1043 times speedup over a high-performance MicroBlaze with 125 times of area efficiency. It delivers a significant improvement in response time to high-priority tasks with the priority-aware scheduling. Overheads of multitasking are evaluated to analyze trade-offs. With the help of the design flow, the OpenCL application programs are converted into executables through the front-end source-to-source transformation and back-end synthesis/compilation to run on PEs, and the framework is generated from users\u27 specifications

    MCFlow: Middleware for Mixed-Criticality Distributed Real-Time Systems

    Get PDF
    Traditional fixed-priority scheduling analysis for periodic/sporadic task sets is based on the assumption that all tasks are equally critical to the correct operation of the system. Therefore, every task has to be schedulable under the scheduling policy, and estimates of tasks\u27 worst case execution times must be conservative in case a task runs longer than is usual. To address the significant under-utilization of a system\u27s resources under normal operating conditions that can arise from these assumptions, several \emph{mixed-criticality scheduling} approaches have been proposed. However, to date there has been no quantitative comparison of system schedulability or run-time overhead for the different approaches. In this dissertation, we present what is to our knowledge the first side-by-side implementation and evaluation of those approaches, for periodic and sporadic mixed-criticality tasks on uniprocessor or distributed systems, under a mixed-criticality scheduling model that is common to all these approaches. To make a fair evaluation of mixed-criticality scheduling, we also address some previously open issues and propose modifications to improve schedulability and correctness of particular approaches. To facilitate the development and evaluation of mixed-criticality applications, we have designed and developed a distributed real-time middleware, called MCFlow, for mixed-criticality end-to-end tasks running on multi-core platforms. The research presented in this dissertation provides the following contributions to the state of the art in real-time middleware: (1) an efficient component model through which dependent subtask graphs can be configured flexibly for execution within a single core, across cores of a common host, or spanning multiple hosts; (2) support for optimizations to inter-component communication to reduce data copying without sacrificing the ability to execute subtasks in parallel; (3) a strict separation of timing and functional concerns so that they can be configured independently; (4) an event dispatching architecture that uses lock free algorithms where possible to reduce memory contention, CPU context switching, and priority inversion; and (5) empirical evaluations of MCFlow itself and of different mixed criticality scheduling approaches both with a single host and end-to-end across multiple hosts. The results of our evaluation show that in terms of basic distributed real-time behavior MCFlow performs comparably to the state of the art TAO real-time object request broker when only one core is used and outperforms TAO when multiple cores are involved. We also identify and categorize different use cases under which different mixed criticality scheduling approaches are preferable

    Real-Time Application Mapping for Many-Cores Using a Limited Migrative Model

    Get PDF
    Many-core platforms are an emerging technology in the real-time embedded domain. These devices offer various options for power savings, cost reductions and contribute to the overall system flexibility, however, issues such as unpredictability, scalability and analysis pessimism are serious challenges to their integration into the aforementioned area. The focus of this work is on many-core platforms using a limited migrative model (LMM). LMM is an approach based on the fundamental concepts of the multi-kernel paradigm, which is a promising step towards scalable and predictable many-cores. In this work, we formulate the problem of real-time application mapping on a many-core platform using LMM, and propose a three-stage method to solve it. An extended version of the existing analysis is used to assure that derived mappings (i) guarantee the fulfilment of timing constraints posed on worst-case communication delays of individual applications, and (ii) provide an environment to perform load balancing for e.g. energy/thermal management, fault tolerance and/or performance reasons

    Dynamic Resource Allocation in Embedded, High-Performance and Cloud Computing

    Get PDF
    The availability of many-core computing platforms enables a wide variety of technical solutions for systems across the embedded, high-performance and cloud computing domains. However, large scale manycore systems are notoriously hard to optimise. Choices regarding resource allocation alone can account for wide variability in timeliness and energy dissipation (up to several orders of magnitude). Dynamic Resource Allocation in Embedded, High-Performance and Cloud Computing covers dynamic resource allocation heuristics for manycore systems, aiming to provide appropriate guarantees on performance and energy efficiency. It addresses different types of systems, aiming to harmonise the approaches to dynamic allocation across the complete spectrum between systems with little flexibility and strict real-time guarantees all the way to highly dynamic systems with soft performance requirements. Technical topics presented in the book include: Load and Resource Models Admission Control Feedback-based Allocation and Optimisation Search-based Allocation Heuristics Distributed Allocation based on Swarm Intelligence Value-Based Allocation Each of the topics is illustrated with examples based on realistic computational platforms such as Network-on-Chip manycore processors, grids and private cloud environments.Note.-- EUR 6,000 BPC fee funded by the EC FP7 Post-Grant Open Access Pilo

    Dynamic Resource Allocation in Embedded, High-Performance and Cloud Computing

    Get PDF
    The availability of many-core computing platforms enables a wide variety of technical solutions for systems across the embedded, high-performance and cloud computing domains. However, large scale manycore systems are notoriously hard to optimise. Choices regarding resource allocation alone can account for wide variability in timeliness and energy dissipation (up to several orders of magnitude). Dynamic Resource Allocation in Embedded, High-Performance and Cloud Computing covers dynamic resource allocation heuristics for manycore systems, aiming to provide appropriate guarantees on performance and energy efficiency. It addresses different types of systems, aiming to harmonise the approaches to dynamic allocation across the complete spectrum between systems with little flexibility and strict real-time guarantees all the way to highly dynamic systems with soft performance requirements. Technical topics presented in the book include: • Load and Resource Models• Admission Control• Feedback-based Allocation and Optimisation• Search-based Allocation Heuristics• Distributed Allocation based on Swarm Intelligence• Value-Based AllocationEach of the topics is illustrated with examples based on realistic computational platforms such as Network-on-Chip manycore processors, grids and private cloud environments

    Dynamic Resource Allocation in Embedded, High-Performance and Cloud Computing

    Get PDF
    The availability of many-core computing platforms enables a wide variety of technical solutions for systems across the embedded, high-performance and cloud computing domains. However, large scale manycore systems are notoriously hard to optimise. Choices regarding resource allocation alone can account for wide variability in timeliness and energy dissipation (up to several orders of magnitude). Dynamic Resource Allocation in Embedded, High-Performance and Cloud Computing covers dynamic resource allocation heuristics for manycore systems, aiming to provide appropriate guarantees on performance and energy efficiency. It addresses different types of systems, aiming to harmonise the approaches to dynamic allocation across the complete spectrum between systems with little flexibility and strict real-time guarantees all the way to highly dynamic systems with soft performance requirements. Technical topics presented in the book include: • Load and Resource Models• Admission Control• Feedback-based Allocation and Optimisation• Search-based Allocation Heuristics• Distributed Allocation based on Swarm Intelligence• Value-Based AllocationEach of the topics is illustrated with examples based on realistic computational platforms such as Network-on-Chip manycore processors, grids and private cloud environments

    Designing a scalable dynamic load -balancing algorithm for pipelined single program multiple data applications on a non-dedicated heterogeneous network of workstations

    Get PDF
    Dynamic load balancing strategies have been shown to be the most critical part of an efficient implementation of various applications on large distributed computing systems. The need for dynamic load balancing strategies increases when the underlying hardware is a non-dedicated heterogeneous network of workstations (HNOW). This research focuses on the single program multiple data (SPMD) programming model as it has been extensively used in parallel programming for its simplicity and scalability in terms of computational power and memory size.;This dissertation formally defines and addresses the problem of designing a scalable dynamic load-balancing algorithm for pipelined SPMD applications on non-dedicated HNOW. During this process, the HNOW parameters, SPMD application characteristics, and load-balancing performance parameters are identified.;The dissertation presents a taxonomy that categorizes general load balancing algorithms and a methodology that facilitates creating new algorithms that can harness the HNOW computing power and still preserve the scalability of the SPMD application.;The dissertation devises a new algorithm, DLAH (Dynamic Load-balancing Algorithm for HNOW). DLAH is based on a modified diffusion technique, which incorporates the HNOW parameters. Analytical performance bound for the worst-case scenario of the diffusion technique has been derived.;The dissertation develops and utilizes an HNOW simulation model to conduct extensive simulations. These simulations were used to validate DLAH and compare its performance to related dynamic algorithms. The simulations results show that DLAH algorithm is scalable and performs well for both homogeneous and heterogeneous networks. Detailed sensitivity analysis was conducted to study the effects of key parameters on performance

    HW-SW Emulation Framework for Temperature-Aware Design in MPSoCs

    Get PDF
    New tendencies envisage Multi-Processor Systems-On-Chip (MPSoCs) as a promising solution for the consumer electronics market. MPSoCs are complex to design, as they must execute multiple applications (games, video), while meeting additional design constraints (energy consumption, time-to-market). Moreover, the rise of temperature in the die for MPSoCs can seriously affect their final performance and reliability. In this paper, we present a new hardware-software emulation framework that allows designers a complete exploration of the thermal behavior of final MPSoC designs early in the design flow. The proposed framework uses FPGA emulation as the key element to model the hardware components of the considered MPSoC platform at multi-megahertz speeds. It automatically extracts detailed system statistics that are used as input to our software thermal library running in a host computer. This library calculates at run-time the temperature of on-chip components, based on the collected statistics from the emulated system and the final floorplan of the MPSoC. This enables fast testing of various thermal management techniques. Our results show speed-ups of three orders of magnitude compared to cycle-accurate MPSoC simulator

    Dynamic Resource Management of Network-on-Chip Platforms for Multi-stream Video Processing

    Get PDF
    This thesis considers resource management in the context of parallel multiple video stream decoding, on multicore/many-core platforms. Such platforms have tens or hundreds of on-chip processing elements which are connected via a Network-on-Chip (NoC). Inefficient task allocation configurations can negatively affect the communication cost and resource contention in the platform, leading to predictability and performance issues. Efficient resource management for large-scale complex workloads is considered a challenging research problem; especially when applications such as video streaming and decoding have dynamic and unpredictable workload characteristics. For these type of applications, runtime heuristic-based task mapping techniques are required. As the application and platform size increase, decentralised resource management techniques are more desirable to overcome the reliability and performance bottlenecks in centralised management. In this work, several heuristic-based runtime resource management techniques, targeting real-time video decoding workloads are proposed. Firstly, two admission control approaches are proposed; one fully deterministic and highly predictable; the other is heuristic-based, which balances predictability and performance. Secondly, a pair of runtime task mapping schemes are presented, which make use of limited known application properties, communication cost and blocking-aware heuristics. Combined with the proposed deterministic admission controller, these techniques can provide strict timing guarantees for hard real-time streams whilst improving resource usage. The third contribution in this thesis is a distributed, bio-inspired, low-overhead, task re-allocation technique, which is used to further improve the timeliness and workload distribution of admitted soft real-time streams. Finally, this thesis explores parallelisation and resource management issues, surrounding soft real-time video streams that have been encoded using complex encoding tools and modern codecs such as High Efficiency Video Coding (HEVC). Properties of real streams and decoding trace data are analysed, to statistically model and generate synthetic HEVC video decoding workloads. These workloads are shown to have complex and varying task dependency structures and resource requirements. To address these challenges, two novel runtime task clustering and mapping techniques for Tile-parallel HEVC decoding are proposed. These strategies consider the workload communication to computation ratio and stream-specific characteristics to balance predictability improvement and communication energy reduction. Lastly, several task to memory controller port assignment schemes are explored to alleviate performance bottlenecks, resulting from memory traffic contention
    • …
    corecore