Search CORE

681 research outputs found

Recommended from our members

Percolation scheduling for non-VLIW machines

Author: Brownhill Carrie J.
Nicolau Alexandru
Publication venue: eScholarship, University of California
Publication date: 15/01/1990
Field of study

Percolation Scheduling, a technique for compile-time code parallelization, has proven very successful for exploiting fine-grain irregular parallelism in ordinary programs. Currently, this technology is targeted only to VLIW (Very Long Instruction Word) machines, which have the advantages of 'free' synchronization and communication. Shared memory multi-processors can simulate the execution characteristics of VLIW machines with the use of static barriers. Preliminary results show that Percolation Scheduling can be used with good results on this type of architecture by increasing the granularity from operation level to source statement level, removing any redundant synchronization, and providing an efficient implementation of multi-way jumps

eScholarship - University of California

Overview of Parallel Platforms for Common High Performance Computing

Author: Adamec Filip
Fryza Tomas
Marsalek Roman
Prokopec Jan
Svobodova Jitka
Publication venue: Společnost pro radioelektronické inženýrství
Publication date: 01/04/2012
Field of study

The paper deals with various parallel platforms used for high performance computing in the signal processing domain. More precisely, the methods exploiting the multicores central processing units such as message passing interface and OpenMP are taken into account. The properties of the programming methods are experimentally proved in the application of a fast Fourier transform and a discrete cosine transform and they are compared with the possibilities of MATLAB's built-in functions and Texas Instruments digital signal processors with very long instruction word architectures. New FFT and DCT implementations were proposed and tested. The implementation phase was compared with CPU based computing methods and with possibilities of the Texas Instruments digital signal processing library on C6747 floating-point DSPs. The optimal combination of computing methods in the signal processing domain and new, fast routines' implementation is proposed as well

Directory of Open Access Journals

Digital library of Brno University of Technology

SynZEN: a hybrid TTA/VLIW architecture with a distributed register file

Author: Hauser Stefan
Juurlink Ben
Moser Nico
Publication venue
Publication date: 01/01/2012
Field of study

The quest for higher performance within a certain power budget in the fields of embedded computing demands unconventional architectural approaches. To this end, in this paper we present synZEN (sZ): a (micro-)architecture that combines features of very long instruction word (VLIW) and transport triggered architectures (TTAs) to cover the needs of different applications. SynZEN features a distributed register file (RF) (i.e., each functional unit (FU) has its own RF) and a wide memory connection to exploit spatial data locality. FPGA synthesis results demonstrate that due to the distributed RF the sZ design can be implemented in less area (in terms of FPGA slices) than existing TTA and VLIW designs. Furthermore, using two micro-benchmarks we show that because of the wide memory connection, sZ outperforms both the TTA as well as the VLIW design

Virtual cluster scheduling through the scheduling graph

Author: Codina Viñas Josep M.
González Colás Antonio María
Sánchez Navarro Jesús
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2007
Field of study

This paper presents an instruction scheduling and cluster assignment approach for clustered processors. The proposed technique makes use of a novel representation named the scheduling graph which describes all possible schedules. A powerful deduction process is applied to this graph, reducing at each step the set of possible schedules. In contrast to traditional list scheduling techniques, the proposed scheme tries to establish relations among instructions rather than assigning each instruction to a particular cycle. The main advantage is that wrong or poor schedules can be anticipated and discarded earlier. In addition, cluster assignment of instructions is performed using another novel concept called virtual clusters, which define sets of instructions that must execute in the same cluster. These clusters are managed during the deduction process to identify incompatibilities among instructions. The mapping of virtual to physical clusters is postponed until the scheduling of the instructions has finalized. The advantages this novel approach features include: (1) accurate scheduling information when assigning, and, (2) accurate information of the cluster assignment constraints imposed by scheduling decisions. We have implemented and evaluated the proposed scheme with superblocks extracted from Speclnt95 and MediaBench. The results show that this approach produces better schedules than the previous state-of-the-art. Speed-ups are up to 15%, with average speed-ups ranging from 2.5% (2-Clusters) to 9.5% (4-Clusters).Peer ReviewedPostprint (published version

Modeling and visualizing networked multi-core embedded software energy consumption

Author: Eder Kerstin
Kerrison Steve
Publication venue
Publication date: 09/09/2015
Field of study

In this report we present a network-level multi-core energy model and a software development process workflow that allows software developers to estimate the energy consumption of multi-core embedded programs. This work focuses on a high performance, cache-less and timing predictable embedded processor architecture, XS1. Prior modelling work is improved to increase accuracy, then extended to be parametric with respect to voltage and frequency scaling (VFS) and then integrated into a larger scale model of a network of interconnected cores. The modelling is supported by enhancements to an open source instruction set simulator to provide the first network timing aware simulations of the target architecture. Simulation based modelling techniques are combined with methods of results presentation to demonstrate how such work can be integrated into a software developer's workflow, enabling the developer to make informed, energy aware coding decisions. A set of single-, multi-threaded and multi-core benchmarks are used to exercise and evaluate the models and provide use case examples for how results can be presented and interpreted. The models all yield accuracy within an average +/-5 % error margin

arXiv.org e-Print Archive