
Dynamic partitioning of loop iterations on heterogeneous PC clusters

Author: Yang CT (Yang, Chao-Tung)
Publication venue: Asia University

Loop partitioning on parallel and distributed systems has been a critical problem, and it becomes even more difficult to handle in emerging heterogeneous PC cluster environments. Several loop self-scheduling schemes applicable to heterogeneous clusters have been proposed in the past. In this paper, we propose a performance-based approach that partitions loop iterations according to the performance ratio of the cluster nodes. To verify the proposed approach, we built a heterogeneous cluster and implemented three types of application programs to run on this testbed. Experimental results show that the proposed approach performs better than traditional schemes.

Asia University Repository
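The abstract does not state the exact partitioning formula, so the following is only a minimal Python sketch of the general idea of splitting loop iterations by node performance ratio. It assumes each node already has a positive performance score (for example, from a benchmark run); the function name `partition_by_performance` and the largest-remainder rounding are illustrative choices, not the paper's method.

```python
from typing import List, Tuple


def partition_by_performance(n_iters: int, perf: List[float]) -> List[Tuple[int, int]]:
    """Split iterations [0, n_iters) into contiguous half-open ranges, one per node,
    with sizes proportional to each node's (hypothetical) performance score."""
    if n_iters < 0:
        raise ValueError("n_iters must be non-negative")
    if not perf or any(p <= 0 for p in perf):
        raise ValueError("performance scores must be positive")

    total = sum(perf)
    exact = [n_iters * p / total for p in perf]  # ideal fractional share per node
    counts = [int(x) for x in exact]             # floor, since shares are non-negative

    # Largest-remainder rounding: give the leftover iterations to the nodes
    # with the biggest fractional parts so the counts sum exactly to n_iters.
    leftover = n_iters - sum(counts)
    order = sorted(range(len(perf)), key=lambda i: exact[i] - counts[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1

    # Turn per-node counts into contiguous iteration ranges.
    ranges = []
    start = 0
    for c in counts:
        ranges.append((start, start + c))
        start += c
    return ranges
```

For example, `partition_by_performance(100, [3.0, 2.0, 1.0])` returns `[(0, 50), (50, 83), (83, 100)]`: the fastest node gets half of the loop, and the single leftover iteration goes to the node whose ideal share (about 16.67) was rounded down the most. A purely static split like this is one end of the spectrum; self-scheduling schemes instead hand out chunks at run time.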

Architecture-Aware Configuration and Scheduling of Matrix Multiplication on Asymmetric Multicore Processors

Authors: Catalán Sandra; Igual Francisco D.; Mayo Rafael; Quintana-Ortí Enrique S.; Rodríguez-Sánchez Rafael
Publication date: 30/06/2015

Asymmetric multicore processors (AMPs) have recently emerged as an appealing technology for severely energy-constrained environments, especially in mobile appliances, where heterogeneity in applications is mainstream. In addition, given the growing interest in low-power high performance computing, these architectures are also being investigated as a means to improve the throughput-per-Watt of complex scientific applications. In this paper, we design and embed several architecture-aware optimizations into a multi-threaded general matrix multiplication (gemm), a key operation of the BLAS, in order to obtain a high performance implementation for ARM big.LITTLE AMPs. Our solution is based on the reference implementation of gemm in the BLIS library, and integrates a cache-aware configuration as well as asymmetric static and dynamic scheduling strategies that carefully tune and distribute the operation's micro-kernels among the big and LITTLE cores of the target processor. The experimental results on a Samsung Exynos 5422, a system-on-chip with ARM Cortex-A15 and Cortex-A7 clusters that implements the big.LITTLE model, show that our cache-aware versions of gemm with asymmetric scheduling attain significant performance gains over their architecture-oblivious counterparts, while exploiting all the resources of the AMP to deliver considerable energy efficiency.

arXiv.org e-Print Archive

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Repositori Institucional de la Universitat Jaume I
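To make the difference between asymmetric static and dynamic scheduling concrete, here is a minimal Python sketch of both ideas applied to the n dimension of C, divided into `nr`-wide column panels. It is not the BLIS API or the authors' code: the names `asymmetric_split` and `dynamic_panels`, the per-core speed `ratio`, and the panel width `nr` are hypothetical, and Python threads are used only to illustrate the scheduling logic (they give no real speedup because of the GIL).

```python
import threading
from typing import List, Tuple


def asymmetric_split(n: int, nr: int, big_cores: int, little_cores: int,
                     ratio: float) -> Tuple[int, int]:
    """Static asymmetric split: give the big cluster a share of the n columns
    proportional to its assumed throughput, rounded to whole nr-wide panels.
    ratio is a hypothetical speed of one big core relative to one LITTLE core."""
    big_weight = big_cores * ratio
    total_weight = big_weight + little_cores
    panels = -(-n // nr)  # ceil(n / nr): number of column panels
    big_panels = round(panels * big_weight / total_weight)
    big_cols = min(n, big_panels * nr)  # the last panel may be partial
    return big_cols, n - big_cols


def dynamic_panels(n: int, nr: int, num_workers: int) -> List[List[int]]:
    """Dynamic scheduling: workers repeatedly claim the next panel from a
    shared counter, so faster cores naturally end up processing more panels."""
    panels = -(-n // nr)
    next_panel = 0
    lock = threading.Lock()
    taken: List[List[int]] = [[] for _ in range(num_workers)]

    def worker(wid: int) -> None:
        nonlocal next_panel
        while True:
            with lock:  # claim the next panel atomically
                p = next_panel
                next_panel += 1
            if p >= panels:
                return
            # Placeholder: a real gemm would run its micro-kernels on panel p here.
            taken[wid].append(p)

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return taken
```

With 4 big and 4 LITTLE cores (the Exynos 5422 configuration) and an assumed ratio of 3.0, `asymmetric_split(1024, 4, 4, 4, 3.0)` returns `(768, 256)`, so three quarters of the columns go to the big cluster. The static split needs a good ratio estimate up front; the dynamic version adapts at run time at the cost of synchronizing on the shared counter.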

Concurrent Design of Embedded Control Software

Authors: Broenink Jan; Frijns Raymond; Groothuis Marcel; Voeten Jeroen
Publication venue: EASST
Publication date: 01/01/2009

Embedded software design for mechatronic systems is becoming an increasingly time-consuming and error-prone task. To cope with this heterogeneity and complexity, a systematic model-driven design approach is needed in which several parts of the system can be designed concurrently. There is, however, a trade-off between concurrency efficiency and integration efficiency. In this paper, we present a case study on the development of the embedded control software for a real-world mechatronic system, in order to evaluate how concurrently and largely independently designed embedded software parts can be integrated efficiently. The case study was carried out using our embedded control system design methodology, which employs a systematic model-based design approach that supports a concurrent design process while still allowing a fast integration phase through automatic code synthesis. The result was a predictable, concurrently designed embedded software realization with a short integration time.

Pure OAI Repository

University of Twente Research Information

Electronic Communications of the EASST (European Association of Software Science and Technology)