
    HIGH THROUGHPUT IMPLEMENTATION OF 64-BIT MODIFIED WALLACE MAC USING MULTIOPERAND ADDERS

    Although redundant addition is widely used to design parallel multioperand adders for ASIC implementations, redundant adders have generally been avoided on Field Programmable Gate Arrays (FPGAs). The main reasons are the efficient implementation of carry-propagate adders (CPAs) on these devices, thanks to their specialized carry-chain resources, and the area overhead that redundant adders incur when mapped to FPGAs. This project presents different approaches to the efficient implementation of generic carry-save compressor trees. In computing, and especially in digital signal processing, the multiply-accumulate operation is a common step that computes the product of two numbers and adds that product to an accumulator. The hardware unit that performs the operation is known as a multiplier-accumulator (MAC, or MAC unit); the operation itself is also often called a MAC operation. Power dissipation is one of the most important design objectives in integrated circuits, second only to speed. The main building block of digital signal processing (DSP) circuits is the Multiplier-Accumulator (MAC) unit, so a high-speed, low-power MAC unit is desirable for any DSP processor, where speed and throughput are constant concerns. A MAC unit consists of a multiplier, an adder, and an accumulator, and it preserves a unique mapping between the input and output vectors of the circuit. Here, the MAC operation is performed in two parts: a Partial Product Generation (PPG) circuit and a Multi-Operand Addition (MOA) circuit.
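
    The following is a minimal C++ sketch, not taken from the paper, that illustrates the two phases named in the abstract: Partial Product Generation (PPG) followed by Multi-Operand Addition (MOA) with carry-save 3:2 compressors, so that only one carry-propagate addition remains at the end. Function names and the 32x32 operand width are illustrative assumptions.

```cpp
#include <cstdint>
#include <vector>
#include <iostream>

// 3:2 compressor (carry-save adder): three operands in, a sum/carry pair out,
// with no carry propagation across bit positions.
static void compress3to2(uint64_t a, uint64_t b, uint64_t c,
                         uint64_t& sum, uint64_t& carry) {
    sum   = a ^ b ^ c;
    carry = ((a & b) | (a & c) | (b & c)) << 1;   // carries shifted into place
}

// PPG: AND-and-shift partial products of a 32x32 multiplication.
static std::vector<uint64_t> partial_products(uint32_t x, uint32_t y) {
    std::vector<uint64_t> pp;
    for (int i = 0; i < 32; ++i)
        if ((y >> i) & 1u) pp.push_back(static_cast<uint64_t>(x) << i);
    return pp;
}

// MOA: reduce all partial products plus the accumulator to a sum/carry pair,
// then resolve it with the single remaining carry-propagate addition.
uint64_t mac(uint32_t x, uint32_t y, uint64_t acc) {
    std::vector<uint64_t> ops = partial_products(x, y);
    ops.push_back(acc);                            // fold the accumulator into the tree
    while (ops.size() > 2) {
        uint64_t s, c;
        compress3to2(ops[0], ops[1], ops[2], s, c);
        ops.erase(ops.begin(), ops.begin() + 3);
        ops.push_back(s);
        ops.push_back(c);
    }
    return ops.size() == 2 ? ops[0] + ops[1] : ops[0];
}

int main() {
    std::cout << mac(1234u, 5678u, 42u) << '\n';   // 1234*5678 + 42 = 7006694
    return 0;
}
```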

    Efficient Use of Redundant Arithmetic in FPGAs (Uso eficiente de aritmética redundante en FPGAs)

    Until a few years ago, redundant arithmetic had been ruled out for use in FPGAs for two main reasons. First, carry-propagate adders already performed well, thanks to the dedicated carry logic built into FPGAs and the small operand sizes of typical FPGA applications. Second, synthesis tools consumed an excessive amount of resources when mapping carry-save units. This work shows that carry-save redundant arithmetic can be used efficiently in FPGAs, achieving a significant speed improvement at a reasonable resource cost. A new redundant format, double carry-save, is introduced, and it is shown that the optimal way to implement large word-length multipliers is to combine embedded multipliers with carry-save adders.
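
    A minimal C++ sketch of the carry-save principle this thesis builds on, assuming a simple running accumulation as the workload: the total is kept in redundant (sum, carry) form so each step is a single constant-depth 3:2 compression, and only one carry-propagate addition is needed at the end. The class and its members are illustrative, not the thesis's design.

```cpp
#include <cstdint>
#include <numeric>
#include <vector>
#include <iostream>

// Running total kept in redundant carry-save form.
struct CarrySave {
    uint64_t sum   = 0;
    uint64_t carry = 0;

    // Absorb one operand: a 3:2 compression, independent of word length.
    void add(uint64_t x) {
        uint64_t s = sum ^ carry ^ x;
        uint64_t c = ((sum & carry) | (sum & x) | (carry & x)) << 1;
        sum = s;
        carry = c;
    }

    // Resolve the redundant form with the only carry-propagate addition.
    uint64_t resolve() const { return sum + carry; }
};

int main() {
    std::vector<uint64_t> data(1000);
    std::iota(data.begin(), data.end(), 1);   // 1..1000

    CarrySave cs;
    for (uint64_t v : data) cs.add(v);        // no carry chains inside the loop
    std::cout << cs.resolve() << '\n';        // 500500
    return 0;
}
```

    The per-step logic depth does not grow with operand width, which is the property redundant arithmetic exploits; the double carry-save format and the combination with embedded multipliers described in the abstract are FPGA-specific refinements of this basic scheme.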

    Unified Design of Hardware and Software Components for Embedded Systems (Projeto unificado de componentes em hardware e software para sistemas embarcados)

    Master's dissertation, Universidade Federal de Santa Catarina, Centro Tecnológico, Graduate Program in Computer Science, Florianópolis, 2013. The increasing complexity of embedded systems is pushing their design to higher levels of abstraction, leading to a convergence between hardware and software design methodologies. This work narrows the gap between hardware and software design by introducing a strategy that handles both domains in a unified fashion, enabling hardware and software components to be implemented from a single C++ description. The proposed techniques build on Object-oriented Programming (OOP) and Aspect-oriented Programming (AOP) concepts to guide a domain-engineering strategy that cleanly separates a component's base structure and behaviour from the characteristics that are specific to hardware or software implementations. Aspects that differ significantly between the two domains, such as resource allocation and the communication interface, are factored out and encapsulated in aspect programs that are applied to the unified descriptions only when the final hardware/software partitioning is defined. Aspect application is implemented through static metaprogramming with C++ templates, so extracting a hardware or software implementation from the unified description is a direct, language-level transformation supported by a wide range of compilers and High-level Synthesis (HLS) tools. To evaluate the approach, a flexible platform for implementing Systems-on-Chip (SoCs) on programmable logic devices was developed; its hardware/software infrastructure relies on a Network-on-Chip (NoC) based architecture to provide a transparent communication mechanism between hardware and software components. The approach was evaluated through the implementation of a SoC for PABX applications. The results show that the strategy yields flexible, reusable components with efficiency very close to that of components implemented specifically for software or hardware, at the cost of an acceptable overhead compared to software-only C/C++ and hardware-only C++ implementations.
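
    A hypothetical C++ sketch of the template-based aspect binding described above (the class and policy names are invented for illustration and are not the thesis's actual API): the component's behaviour is written once, and a target-specific communication aspect is woven in by static metaprogramming only when the hardware/software partitioning is decided.

```cpp
#include <cstdint>
#include <iostream>
#include <queue>

// Software-side communication aspect: a plain in-memory queue.
struct SoftwareComm {
    std::queue<int32_t> q;
    void send(int32_t v) { q.push(v); }
    int32_t receive()    { int32_t v = q.front(); q.pop(); return v; }
};

// Hardware-side communication aspect: stand-in for a NoC/HLS stream port;
// it logs each transfer to make the swapped aspect visible.
struct HardwareComm {
    std::queue<int32_t> q;                        // placeholder for a FIFO port
    void send(int32_t v) { std::cout << "[hw port] <- " << v << '\n'; q.push(v); }
    int32_t receive()    { int32_t v = q.front(); q.pop(); return v; }
};

// Unified component: structure and behaviour are target-independent; only
// the woven CommAspect differs between hardware and software implementations.
template <typename CommAspect>
class Scaler {
    CommAspect comm;
public:
    void process(int32_t v) { comm.send(2 * v); } // base behaviour, written once
    int32_t read()          { return comm.receive(); }
};

int main() {
    Scaler<SoftwareComm> sw;                      // software partition
    Scaler<HardwareComm> hw;                      // hardware partition
    sw.process(21);
    hw.process(21);
    std::cout << sw.read() << ' ' << hw.read() << '\n';   // 42 42
    return 0;
}
```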

    Dynamic Scheduling in High-Level-Language Compilation for Adaptive Computers (Dynamisches Scheduling in der Hochsprachen-Compilierung für adaptive Rechner)

    The single-thread performance of conventional CPUs has not improved significantly since 2003 due to the stagnation of CPU clock frequencies. Adaptive computers, which combine a CPU with a reconfigurable hardware unit used as a hardware accelerator, represent a promising alternative compute platform. Over the past ten years, much research has gone into tools that make adaptive computers more practical to use. An important goal is an adaptive compiler that generates hardware descriptions fully automatically from a common high-level language such as C. Most compilers developed to date use static scheduling for the generated hardware. For complex programs containing nested loops, irregular control flow, and arbitrary pointer accesses, however, dynamic scheduling is more appropriate. This work examines the feasibility of compiling to dynamically scheduled hardware, an approach that has so far received only limited research attention. Building on previous work, we have developed the adaptive compiler COMRADE 2.0, which generates synthesizable, dynamically scheduled hardware descriptions from ANSI C. The compiler relies on our COMRADE Controller Micro-Architecture (COCOMA), an intermediate representation that correctly models even complex control and memory dependences and is therefore particularly suitable for a compile flow that supports complex C programs. We examine the effects of parameter variations and low-level optimizations on the simulation and synthesis results. The most effective optimization with respect to runtime is memory localization, which provides the compiled hardware kernels with significantly higher memory bandwidth at low latency. Using memory localization, we have measured hardware-kernel speed-ups of up to 37x over an embedded superscalar CPU.
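
    A hypothetical C++ sketch of the memory-localization idea credited above with the largest speed-ups: instead of the compiled kernel issuing one external memory access per element, the working set is copied in bursts into a small local buffer (modelling on-chip memory), and the kernel loop then runs entirely out of that buffer. The tile size and the kernel itself are illustrative assumptions, not COMRADE's implementation.

```cpp
#include <cstdint>
#include <cstring>
#include <iostream>

enum { TILE = 256 };

// Kernel body as it might be offloaded: a simple accumulation over one tile.
static int64_t kernel(const int32_t* local_buf, int n) {
    int64_t acc = 0;
    for (int i = 0; i < n; ++i)
        acc += local_buf[i];                      // every access hits local memory
    return acc;
}

int64_t run_localized(const int32_t* external_mem, int total) {
    int32_t local_buf[TILE];                      // models the on-chip buffer
    int64_t result = 0;
    for (int base = 0; base < total; base += TILE) {
        int n = (total - base < TILE) ? (total - base) : TILE;
        std::memcpy(local_buf, external_mem + base,   // one burst transfer per tile
                    n * sizeof(int32_t));
        result += kernel(local_buf, n);
    }
    return result;
}

int main() {
    int32_t data[1024];
    for (int i = 0; i < 1024; ++i) data[i] = i;
    std::cout << run_localized(data, 1024) << '\n';   // 0 + 1 + ... + 1023 = 523776
    return 0;
}
```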