480 research outputs found

Globally asynchronous locally synchronous configurable array architecture for algorithm embeddings

Author: Gao Bo
Publication venue: The University of Edinburgh
Publication date: 01/01/1996

A comparative study of synchronous and self-timed systolic array architectures.

Author: Hogg R. S.
Publication venue: Sheffield Hallam University

This thesis examines systolic array architectures and their methods of control and communication synchronisation. Systolic array processors suffer from synchronisation problems associated with the clocking mechanism that causally restricts their scalability. To overcome this problem, both return-to-zero (RTZ) and non-return-to-zero (NRTZ) delay-insensitive self-timed (ST) techniques can be used to realise architectures that operate correctly in the presence of arbitrary delays at all levels in their design. As a consequence, RTZ and NRTZ versions of an existing systolic array architecture, namely the Single Instruction Systolic Array (SISA), have been developed in order to investigate the potential for realising architecturally scalable systolic arrays. The new architectures, called the RTZ and NRTZ ST-SISAs, have been compared with each other and against their synchronous counterpart to establish their relative trade-offs. The new designs exhibit several novel features, including variable-length bit-serial data words, average-case processing speeds dependent on data word length as well as computational complexity, a novel autonomous inter-processor data communication mechanism, and architectural scalability independent of fabrication technology. This thesis introduces an implementation of the RTZ and NRTZ ST-SISA architectures, along with their performance and area characteristics. Guidelines have been developed from the resulting RTZ and NRTZ architectures, allowing novel self-timed systolic architectures to be derived.

Sheffield Hallam University Research Archive

Dynamically reconfigurable architecture for embedded computer vision systems

Author: Nieto Lareo Alejandro Manuel
Publication date: 01/01/2013

The objective of this research work is to design, develop and implement a new architecture which integrates on the same chip all the processing levels of a complete Computer Vision system, so that execution is efficient without compromising power consumption while keeping cost low. For this purpose, an analysis and classification of the mathematical operations and algorithms commonly used in Computer Vision is carried out, as well as an in-depth review of the image processing capabilities of current-generation hardware devices. This makes it possible to determine the requirements and the key aspects of an efficient architecture. A representative set of algorithms is employed as a benchmark to evaluate the proposed architecture, which is implemented on an FPGA-based system-on-chip. Finally, the prototype is compared to other related approaches in order to determine its advantages and weaknesses.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Repositorio Institucional da Universidade de Santiago de Compostela

Architectural support for task dependence management with flexible software scheduling

Author: Beivide Palacio Ramon
Bosque Jose L.
Casas Marc
Castillo Emilio
Moreto Planas Miquel
Valero Cortés Mateo
Vallejo Enrique
Álvarez Martí Lluc
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2018

The growing complexity of multi-core architectures has motivated a wide range of software mechanisms to improve the orchestration of parallel executions. Task parallelism has become a very attractive approach thanks to its programmability, portability and potential for optimizations. However, with the expected increase in core counts, finer-grained tasking will be required to exploit the available parallelism, which will increase the overheads introduced by the runtime system. This work presents Task Dependence Manager (TDM), a hardware/software co-designed mechanism to mitigate runtime system overheads. TDM introduces a hardware unit, denoted Dependence Management Unit (DMU), and minimal ISA extensions that allow the runtime system to offload costly dependence tracking operations to the DMU and to still perform task scheduling in software. With lower hardware cost, TDM outperforms hardware-based solutions and enhances the flexibility, adaptability and composability of the system. Results show that TDM improves performance by 12.3% and reduces EDP by 20.4% on average with respect to a software runtime system. Compared to a runtime system fully implemented in hardware, TDM achieves an average speedup of 4.2% with 7.3x less area requirements and significant EDP reductions. In addition, five different software schedulers are evaluated with TDM, illustrating its flexibility and performance gains. This work has been supported by the RoMoL ERC Advanced Grant (GA 321253), by the European HiPEAC Network of Excellence, by the Spanish Ministry of Science and Innovation (contracts TIN2015-65316-P, TIN2016-76635-C2-2-R and TIN2016-81840-REDT), by the Generalitat de Catalunya (contracts 2014-SGR-1051 and 2014-SGR-1272), and by the European Union’s Horizon 2020 research and innovation programme under grant agreements No. 671697 and No. 671610. M. Moretó has been partially supported by the Ministry of Economy and Competitiveness under Juan de la Cierva postdoctoral fellowship number JCI-2012-15047.

Crossref

UPCommons. Portal del coneixement obert de la UPC

Accelerating digital forensic searching through GPGPU parallel processing techniques

Author: Bayne Ethan
Publication date: 01/02/2017

Abertay Research Portal

A proposed synthesis method for Application-Specific Instruction Set Processors

Author: Horváth Péter
Hosszú Gábor
Kovács Ferenc
Publication date: 01/01/2015

Due to rapid technological advancement in the integrated circuit era, the need for high computational performance, together with increasing complexity and manufacturing costs, has raised the demand for high-performance configurable designs; therefore, Application-Specific Instruction Set Processors (ASIPs) are widely used in SoC design. The automated generation of software tools for ASIPs is a commonly used technique, but automated hardware model generation is less frequently applied to final RTL implementations. Instead, the final register-transfer level models are usually created, at least partly, manually. This paper presents a novel approach for automated hardware model generation for ASIPs. The new solution is based on a novel abstract ASIP model and a modeling language (Algorithmic Microarchitecture Description Language, AMDL) optimized for this architecture model. The proposed AMDL-based pre-synthesis method is based on a set of pre-defined VHDL implementation schemes, which ensure the quality of the automatically generated register-transfer level models in terms of resource requirements and operating frequency. The design framework implementing the algorithms required by the synthesis method is also presented.

Repository of the Academy's Library

A Survey on Hardware-aware and Heterogeneous Computing on Multicore Processors and Accelerators

Author: Buchty Rainer
Heuveline Vincent
Karl Wolfgang
Weiß Jan-Philipp
Publication venue: Karlsruher Institut für Technologie
Publication date: 01/01/2009

KITopen