773 research outputs found
Coarse-grained reconfigurable array architectures
Coarse-Grained Reconfigurable Array (CGRA) architectures accelerate the same inner loops that benefit from the high ILP support in VLIW architectures. By executing non-loop code on other cores, however, CGRAs can focus on such loops to execute them more efficiently. This chapter discusses the basic principles of CGRAs, and the wide range of design options available to a CGRA designer, covering a large number of existing CGRA designs. The impact of different options on flexibility, performance, and power-efficiency is discussed, as well as the need for compiler support. The ADRES CGRA design template is studied in more detail as a use case to illustrate the need for design space exploration, for compiler support and for the manual fine-tuning of source code
Video Processing Acceleration using Reconfigurable Logic and Graphics Processors
A vexing question is `which architecture will prevail as the core feature of the next state of
the art video processing system?' This thesis examines the substitutive and collaborative
use of the two alternatives of the reconfigurable logic and graphics processor architectures.
A structured approach to executing architecture comparison is presented - this includes a
proposed `Three Axes of Algorithm Characterisation' scheme and a formulation of perfor-
mance drivers. The approach is an appealing platform for clearly defining the problem,
assumptions and results of a comparison. In this work it is used to resolve the advanta-
geous factors of the graphics processor and reconfigurable logic for video processing, and
the conditions determining which one is superior. The comparison results prompt the
exploration of the customisable options for the graphics processor architecture. To clearly
define the architectural design space, the graphics processor is first identifed as part of
a wider scope of homogeneous multi-processing element (HoMPE) architectures. A novel
exploration tool is described which is suited to the investigation of the customisable op-
tions of HoMPE architectures. The tool adopts a systematic exploration approach and a
high-level parameterisable system model, and is used to explore pre- and post-fabrication
customisable options for the graphics processor. A positive result of the exploration is the
proposal of a reconfigurable engine for data access (REDA) to optimise graphics processor
performance for video processing-specific memory access patterns. REDA demonstrates
the viability of the use of reconfigurable logic as collaborative `glue logic' in the graphics
processor architecture
Adaptive Wireless Networking
This paper presents the Adaptive Wireless Networking (AWGN) project. The project aims to develop methods and technologies that can be used to design efficient adaptable and reconfigurable mobile terminals for future wireless communication systems. An overview of the activities in the project is given. Furthermore our vision on adaptivity in wireless communications and suggestions for future activities are presented
Recommended from our members
ENERGY EFFICIENCY EXPLORATION OF COARSE-GRAIN RECONFIGURABLE ARCHITECTURE WITH EMERGING NONVOLATILE MEMORY
With the rapid growth in consumer electronics, people expect thin, smart and powerful devices, e.g. Google Glass and other wearable devices. However, as portable electronic products become smaller, energy consumption becomes an issue that limits the development of portable systems due to battery lifetime. In general, simply reducing device size cannot fully address the energy issue.
To tackle this problem, we propose an on-chip interconnect infrastructure and pro- gram storage structure for a coarse-grained reconfigurable architecture (CGRA) with emerging non-volatile embedded memory (MRAM). The interconnect is composed of a matrix of time-multiplexed switchboxes which can be dynamically reconfigured with the goal of energy reduction. The number of processors performing computation can also be adapted. The use of MRAM provides access to high-density storage and lower memory energy consumption versus more standard SRAM technologies. The combination of CGRA, MRAM, and flexible on-chip interconnection is considered for signal processing. This application domain is of interest based on its time-varying computing demands.
To evaluate CGRA architectural features, prototype architectures have been pro- totyped in a field-programmable gate array (FPGA). Measurements of energy, power, instruction count, and execution time performance are considered for a scalable num- ber of processors. Applications such as adaptive Viterbi decoding and Reed Solomon coding are used for evaluation. To complete this thesis, a time-scheduled switchbox was integrated into our CGRA model. This model was prototyped on an FPGA. It is shown that energy consumption can be reduced by about 30% if dynamic design reconfiguration is performed
A Finite Domain Constraint Approach for Placement and Routing of Coarse-Grained Reconfigurable Architectures
Scheduling, placement, and routing are important steps in Very Large Scale Integration (VLSI) design. Researchers have developed numerous techniques to solve placement and routing problems. As the complexity of Application Specific Integrated Circuits (ASICs) increased over the past decades, so did the demand for improved place and route techniques. The primary objective of these place and route approaches has typically been wirelength minimization due to its impact on signal delay and design performance. With the advent of Field Programmable Gate Arrays (FPGAs), the same place and route techniques were applied to FPGA-based design. However, traditional place and route techniques may not work for Coarse-Grained Reconfigurable Architectures (CGRAs), which are reconfigurable devices offering wider path widths than FPGAs and more flexibility than ASICs, due to the differences in architecture and routing network. Further, the routing network of several types of CGRAs, including the Field Programmable Object Array (FPOA), has deterministic timing as compared to the routing fabric of most ASICs and FPGAs reported in the literature. This necessitates a fresh look at alternative approaches to place and route designs. This dissertation presents a finite domain constraint-based, delay-aware placement and routing methodology targeting an FPOA. The proposed methodology takes advantage of the deterministic routing network of CGRAs to perform a delay aware placement
Performance evaluation of MPEG-4 Video encoder on Adres
Curs 2006-2007Actualment un típic embedded system (ex. telèfon mòbil) requereix alta qualitat per portar a terme tasques com codificar/descodificar a temps real; han de consumir poc energia per funcionar hores o dies utilitzant bateries lleugeres; han de ser el suficientment flexibles per integrar múltiples aplicacions i estàndards en un sol aparell; han de ser dissenyats i verificats en un període de temps curt tot i l’augment de la complexitat. Els dissenyadors lluiten contra aquestes adversitats, que demanen noves innovacions en arquitectures i metodologies de disseny.
Coarse-grained reconfigurable architectures (CGRAs) estan emergent com a candidats potencials per superar totes aquestes dificultats. Diferents tipus d’arquitectures han estat presentades en els últims anys. L’alta granularitat redueix molt el retard, l’àrea, el consum i el temps de configuració comparant amb les FPGAs. D’altra banda, en comparació amb els tradicionals processadors coarse-grained programables, els alts recursos computacionals els permet d’assolir un alt nivell de paral•lelisme i eficiència. No obstant, els CGRAs existents no estant sent aplicats principalment per les grans dificultats en la programació per arquitectures complexes.
ADRES és una nova CGRA dissenyada per I’Interuniversity Micro-Electronics Center (IMEC). Combina un processador very-long instruction word (VLIW) i un coarse-grained array per tenir dues opcions diferents en un mateix dispositiu físic. Entre els seus avantatges destaquen l’alta qualitat, poca redundància en les comunicacions i la facilitat de programació. Finalment ADRES és un patró enlloc d’una arquitectura concreta. Amb l’ajuda del compilador DRESC (Dynamically Reconfigurable Embedded System Compile), és possible trobar millors arquitectures o arquitectures específiques segons l’aplicació.
Aquest treball presenta la implementació d’un codificador MPEG-4 per l’ADRES. Mostra l’evolució del codi per obtenir una bona implementació per una arquitectura donada. També es presenten les característiques principals d’ADRES i el seu compilador (DRESC).
Els objectius són de reduir al màxim el nombre de cicles (temps) per implementar el codificador de MPEG-4 i veure les diferents dificultats de treballar en l’entorn ADRES. Els resultats mostren que els cícles es redueixen en un 67% comparant el codi inicial i final en el mode VLIW i un 84% comparant el codi inicial en VLIW i el final en mode CGA.Nowadays, a typical embedded system requires high performance to perform tasks such as video encoding/decoding at run-time. It should consume little energy to work hours or even days using a lightweight battery. It should be flexible enough to integrate multiple applications and standards in one single device. It has to be designed and verified in short time-to-market despite substantially increased complexity. The designers are struggling to meet these huge challenges, which call for innovations of both architectures and design methodology.
Coarse-grained reconfigurable architectures (CGRAs) are emerging as potential candidates to meet the above challenges. Many of them were proposed in recent years. This coarse granularity greatly reduces delay, area, power and configuration time compared with FPGAs. On the other hand, compared with traditional "coarse-grained" programmable processors, their massive computational resources enable them to achieve high parallelism and efficiency. However, existing CGRAs have yet been widely adopted mainly because of programming difficulty for such complex architecture.
ADRES is a novel CGRA designed by Interuniversity Micro-Electronics Center (IMEC). It tightly couples a very-long instruction word (VLIW) processor and a coarse-grained array by providing two functional views on the same physical resources. It brings advantages such as high performance, low communication overhead and easiness of programming. Finally, ADRES is a template instead of a concrete architecture. With the retargetable compilation support from DRESC (Dynamically Reconfigurable Embedded System Compile), architectural exploration becomes possible to discover better architectures or design domain-specific architectures.
In this thesis, a performance of an MPEG-4 encoder in ADRES is presented. The thesis shows the code evolution to obtain a good implementation for a given architecture. The main features of ADRES and its compiler (DRESC) are presented. The objectives are to reduce as much as possible the amount of cycles (time) spent to encode video in MPEG-4 and test different issues working with ADRES environment.
The cycles decrease a 67% comparing initial and final code in VLIW and
84% between initial VLIW and CGA mode.Director/a: Moisès Serra i SerraSupervisor: Eric Delfoss
Recommended from our members
Resource-Aware Predictive Models in Cyber-Physical Systems
Cyber-Physical Systems (CPS) are composed of computing devices interacting with physical systems. Model-based design is a powerful methodology in CPS design in the implementation of control systems. For instance, Model Predictive Control (MPC) is typically implemented in CPS applications, e.g., in path tracking of autonomous vehicles. MPC deploys a model to estimate the behavior of the physical system at future time instants for a specific time horizon. Ordinary Differential Equations (ODE) are the most commonly used models to emulate the behavior of continuous-time (non-)linear dynamical systems. A complex physical model may comprise thousands of ODEs that pose scalability, performance and power consumption challenges. One approach to address these model complexity challenges are frameworks that automate the development of model-to-model transformation. In this dissertation, a state-based model with tunable parameters is proposed to operate as a reconfigurable predictive model of the physical system. Moreover, we propose a run-time switching algorithm that selects the best model using machine learning. We employed a metric that formulates the trade-off between the error and computational savings due to model reduction. Building statistical models are constrained to having expert knowledge and an actual understanding of the modeled phenomenon or process. Also, statistical models may not produce solutions that are as robust in a real-world context as factors outside the model, like disruptions would not be taken into account. Machine learning models have emerged as a solution to account for the dynamic behavior of the environment and automate intelligence acquisition and refinement. Neural networks are machine learning models, well-known to have the ability to learn linear and nonlinear relations between input and output variables without prior knowledge. However, the ability to efficiently exploit resource-hungry neural networks in embedded resource-bound settings is a major challenge.Here, we proposed Priority Neuron Network (PNN), a resource-aware neural networks model that can be reconfigured into smaller sub-networks at runtime. This approach enables a trade-off between the model's computation time and accuracy based on available resources. The PNN model is memory efficient since it stores only one set of parameters to account for various sub-network sizes. We propose a training algorithm that applies regularization techniques to constrain the activation value of neurons and assigns a priority to each one. We consider the neuron's ordinal number as our priority criteria in that the priority of the neuron is inversely proportional to its ordinal number in the layer. This imposes a relatively sorted order on the activation values. We conduct experiments to employ our PNN as the predictive model in a CPS application. We can see that not only our technique will resolve the memory overhead of DNN architectures but it also reduces the computation overhead for the training process substantially. The training time is a critical matter especially in embedded systems where many NN models are trained on the fly
- …