2 research outputs found

    Optimization Techniques for Parallel Programming of Embedded Many-Core Computing Platforms

    Get PDF
    Nowadays many-core computing platforms are widely adopted as a viable solution to accelerate compute-intensive workloads at different scales, from low-cost devices to HPC nodes. It is well established that heterogeneous platforms including a general-purpose host processor and a parallel programmable accelerator have the potential to dramatically increase the peak performance/Watt of computing architectures. However the adoption of these platforms further complicates application development, whereas it is widely acknowledged that software development is a critical activity for the platform design. The introduction of parallel architectures raises the need for programming paradigms capable of effectively leveraging an increasing number of processors, from two to thousands. In this scenario the study of optimization techniques to program parallel accelerators is paramount for two main objectives: first, improving performance and energy efficiency of the platform, which are key metrics for both embedded and HPC systems; second, enforcing software engineering practices with the aim to guarantee code quality and reduce software costs. This thesis presents a set of techniques that have been studied and designed to achieve these objectives overcoming the current state-of-the-art. As a first contribution, we discuss the use of OpenMP tasking as a general-purpose programming model to support the execution of diverse workloads, and we introduce a set of runtime-level techniques to support fine-grain tasks on high-end many-core accelerators (devices with a power consumption greater than 10W). Then we focus our attention on embedded computer vision (CV), with the aim to show how to achieve best performance by exploiting the characteristics of a specific application domain. To further reduce the power consumption of parallel accelerators beyond the current technological limits, we describe an approach based on the principles of approximate computing, which implies modification to the program semantics and proper hardware support at the architectural level

    Supporting localized openvx kernel execution for efficient computer vision application development on sthorm many-core platform

    No full text
    Nowadays Embedded Computer Vision (ECV) is considered a technology enabler for next generation killer apps, and scientific and industrial communities are showing a growing interest in developing applications on high-end embedded systems. Modern many-core accelerators are a promising target for running common ECV algorithms, since their architectural features are particularly suitable in terms of data access patterns and program control flow. In this work we propose a set of software optimization techniques, mainly based on data tiling and local buffering policies, which are specifically targeted to accelerate the execution of OpenVX-based ECV applications by exploiting the memory hierarchy of STHORM many-core accelerator
    corecore