218 research outputs found

    Techniques d'exploration architecturale de design à usage spécifique pour l'accélération de boucles

    Get PDF
    RÉSUMÉ De nos jours, les industriels privilĂ©gient les architectures flexibles afin de rĂ©duire le temps et les coĂ»ts de conception d’un systĂšme. Les processeurs Ă  usage spĂ©cifique (ASIP) fournissent beaucoup de flexibilitĂ©, tout en atteignant des performances Ă©levĂ©es. Une tendance qui a de plus en plus de succĂšs dans le processus de conception d’un systĂšme sur puce consiste Ă  spĂ©cifier le comportement du systĂšme en langage Ă©voluĂ© tel que le C, SystemC, etc. La spĂ©cification est ensuite utilisĂ©e durant le partitionement pour dĂ©terminer les composantes logicielles et matĂ©rielles du systĂšme. Avec la maturitĂ© des gĂ©nĂ©rateurs automatiques de ASIP, les concepteurs peuvent rajouter dans leurs boĂźtes Ă  outils un nouveau type d’architecture, Ă  savoir les ASIP, en sachant que ces derniers sont conçus Ă  partir d’une spĂ©cification dĂ©crite en langage Ă©voluĂ©. D’un autre cĂŽtĂ©, dans le monde matĂ©riel, et cela depuis trĂšs longtemps, les chercheurs ont vu l’avantage de baser le processus de conception sur un langage Ă©voluĂ©. Cette recherche a abouti Ă  l’avĂ©nement de gĂ©nĂ©rateurs automatiques de matĂ©riel sur le marchĂ© qui sont des outils d’aide Ă  la conception comme CapatultC, Forte’s Cynthetizer, etc. Ainsi, avec tous ces outils basĂ©s sur le langage C, les concepteurs ont un choix de types de design Ă©largi mais, d’un autre cĂŽtĂ©, les options de designs possibles explosent, ce qui peut allonger au lieu de rĂ©duire le temps de conception. C’est dans ce cadre que notre thĂšse doctorale s’inscrit, puisqu’elle prĂ©sente des mĂ©thodologies d’exploration architecturale de design Ă  usage spĂ©cifique pour l’accĂ©lĂ©ration de boucles afin de rĂ©duire le temps de conception, entre autres. Cette thĂšse a dĂ©butĂ© par l’exploration de designs de ASIP. Les boucles de traitement sont de bonnes candidates Ă  l’accĂ©lĂ©ration, si elles comportent de bonnes possibilitĂ©s de parallĂ©lisme et si ces derniĂšres sont bien exploitĂ©es. Le matĂ©riel est trĂšs efficace Ă  profiter des possibilitĂ©s de parallĂ©lisme au niveau instruction, donc, une mĂ©thode de conception a Ă©tĂ© proposĂ©e. Cette derniĂšre extrait le parallĂ©lisme d’une boucle afin d’exĂ©cuter plus d’opĂ©rations concurrentes dans des instructions spĂ©cialisĂ©es. Notre mĂ©thode se base aussi sur l’optimisation des donnĂ©es dans l’architecture du processeur.---------- ABSTRACT Time to market is a very important concern in industry. That is why the industry always looks for new CAD tools that contribute to reducing design time. Application-specific instruction-set processors (ASIPs) provide flexibility and they allow reaching good performance if they are well designed. One trend that gains more and more success is C-based design that uses a high level language such as C, SystemC, etc. The C-based specification is used during the partitionning phase to determine the software and hardware components of the system. Since automatic processor generators are mature now, designers have a new type of tool they can rely on during architecture design. In the hardware world, high level synthesis was and is still a hot research topic. The advances in ESL lead to commercial high-level synthesis tools such as CapatultC, Forte’s Cynthetizer, etc. The designers have more tools in their box but they have more solutions to explore, thus their use can have a reverse effect since the design time can increase instead of being reduced. Our doctoral research tackles this issue by proposing new methodologies for design space exploration of application specific architecture for loop acceleration in order to reduce the design time while reaching some targeted performances. Our thesis starts with the exploration of ASIP design. We propose a method that targets loop acceleration with highly coupled specialized-instructions executing loop operations. Loops are good candidates for acceleration when the parallelism they offer is well exploited (if they have any parallelization opportunities). Hardware components such as specialized-instructions can leverage parallelization opportunities at low level. Thus, we propose to extract loop parallelization opportunities and to execute more concurrent operations in specialized-instructions. The main contribution of this method is a new approach to specialized-instruction (SI) design based on loop acceleration where loop optimization and transformation are done in SIs directly, instead of optimizing the software code. Another contribution is the design of tightly-coupled specialized-instructions associated with loops based on a 5-pattern representation

    Design of multimedia processor based on metric computation

    Get PDF
    Media-processing applications, such as signal processing, 2D and 3D graphics rendering, and image compression, are the dominant workloads in many embedded systems today. The real-time constraints of those media applications have taxing demands on today's processor performances with low cost, low power and reduced design delay. To satisfy those challenges, a fast and efficient strategy consists in upgrading a low cost general purpose processor core. This approach is based on the personalization of a general RISC processor core according the target multimedia application requirements. Thus, if the extra cost is justified, the general purpose processor GPP core can be enforced with instruction level coprocessors, coarse grain dedicated hardware, ad hoc memories or new GPP cores. In this way the final design solution is tailored to the application requirements. The proposed approach is based on three main steps: the first one is the analysis of the targeted application using efficient metrics. The second step is the selection of the appropriate architecture template according to the first step results and recommendations. The third step is the architecture generation. This approach is experimented using various image and video algorithms showing its feasibility

    Instruction-set architecture synthesis for VLIW processors

    Get PDF

    Automated design of domain-specific custom instructions

    Get PDF

    KAVUAKA: a low-power application-specific processor architecture for digital hearing aids

    Get PDF
    The power consumption of digital hearing aids is very restricted due to their small physical size and the available hardware resources for signal processing are limited. However, there is a demand for more processing performance to make future hearing aids more useful and smarter. Future hearing aids should be able to detect, localize, and recognize target speakers in complex acoustic environments to further improve the speech intelligibility of the individual hearing aid user. Computationally intensive algorithms are required for this task. To maintain acceptable battery life, the hearing aid processing architecture must be highly optimized for extremely low-power consumption and high processing performance.The integration of application-specific instruction-set processors (ASIPs) into hearing aids enables a wide range of architectural customizations to meet the stringent power consumption and performance requirements. In this thesis, the application-specific hearing aid processor KAVUAKA is presented, which is customized and optimized with state-of-the-art hearing aid algorithms such as speaker localization, noise reduction, beamforming algorithms, and speech recognition. Specialized and application-specific instructions are designed and added to the baseline instruction set architecture (ISA). Among the major contributions are a multiply-accumulate (MAC) unit for real- and complex-valued numbers, architectures for power reduction during register accesses, co-processors and a low-latency audio interface. With the proposed MAC architecture, the KAVUAKA processor requires 16 % less cycles for the computation of a 128-point fast Fourier transform (FFT) compared to related programmable digital signal processors. The power consumption during register file accesses is decreased by 6 %to 17 % with isolation and by-pass techniques. The hardware-induced audio latency is 34 %lower compared to related audio interfaces for frame size of 64 samples.The final hearing aid system-on-chip (SoC) with four KAVUAKA processor cores and ten co-processors is integrated as an application-specific integrated circuit (ASIC) using a 40 nm low-power technology. The die size is 3.6 mm2. Each of the processors and co-processors contains individual customizations and hardware features with a varying datapath width between 24-bit to 64-bit. The core area of the 64-bit processor configuration is 0.134 mm2. The processors are organized in two clusters that share memory, an audio interface, co-processors and serial interfaces. The average power consumption at a clock speed of 10 MHz is 2.4 mW for SoC and 0.6 mW for the 64-bit processor.Case studies with four reference hearing aid algorithms are used to present and evaluate the proposed hardware architectures and optimizations. The program code for each processor and co-processor is generated and optimized with evolutionary algorithms for operation merging,instruction scheduling and register allocation. The KAVUAKA processor architecture is com-pared to related processor architectures in terms of processing performance, average power consumption, and silicon area requirements
    • 

    corecore