14 research outputs found

    VLSI DESIGN FOR CARRY-PROTECT FORMATTED DATA

    Hardware acceleration has proven to be a very promising implementation strategy for the digital signal processing (DSP) domain. Rather than adopting a monolithic application-specific integrated circuit design approach, in this brief we present a novel accelerator architecture composed of flexible computational units that support the execution of a large set of operation templates found in DSP kernels. Research has shown that arithmetic optimizations at abstraction levels higher than the structural circuit level have a considerable effect on datapath performance. Carry-save (CS) representation has been widely used to design fast arithmetic circuits because of its inherent advantage of eliminating long carry-propagation chains. We differ from previous work on flexible accelerators by enabling computations to be performed aggressively with CS-formatted data. Advanced arithmetic design concepts, i.e., recoding techniques, are employed, enabling CS optimizations to be carried out in a larger scope than in previous approaches. Extensive experimental evaluations show that the proposed accelerator architecture delivers average gains of up to 61.91% in area-delay product and 54.43% in energy consumption compared with state-of-the-art flexible datapaths.
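
The carry-save idea mentioned in this abstract can be illustrated with a minimal sketch. It is not the accelerator described in the paper; the function names `carry_save_add`, `cs_accumulate` and `resolve` are hypothetical, and Python integers stand in for bit vectors. Each step reduces three operands to a sum word and a carry word using only bitwise logic, so no carry ripples across bit positions until the single final addition.

```python
def carry_save_add(a: int, b: int, c: int) -> tuple[int, int]:
    """3:2 compression of three non-negative integers into (sum, carry)."""
    partial_sum = a ^ b ^ c                      # per-bit sum, carries ignored
    carry = ((a & b) | (a & c) | (b & c)) << 1   # per-bit majority, moved to the next weight
    return partial_sum, carry


def cs_accumulate(operands):
    """Accumulate operands while keeping the running total in carry-save form."""
    s, c = 0, 0
    for x in operands:
        s, c = carry_save_add(s, c, x)
    return s, c


def resolve(s: int, c: int) -> int:
    """The only carry-propagating addition, done once at the end."""
    return s + c


total = resolve(*cs_accumulate([13, 7, 22, 5]))
```

The invariant is that `s + c` always equals the sum of the operands consumed so far, which is why the carry-propagate adder is needed only once.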

    Synthesis of Multimode digital signal processing systems

    In this paper, we propose a design methodology for implementing a multimode (or multi-configuration) and multi-throughput system in a single hardware architecture. The inputs of the design flow are the data flow graphs (DFGs) representing the different modes (i.e., the different applications to be implemented), with their respective throughput constraints. While traditional approaches merge the DFGs together before the synthesis process, we propose to use ad hoc scheduling and binding steps during the synthesis of each DFG. The scheduling, which assigns operations to specific time steps, maximizes the similarity between the control steps and thus decreases the controller complexity. The binding process, which assigns operations to specific functional units and data to specific storage elements, maximizes the similarity between datapaths and thus minimizes steering logic and register overhead. First results demonstrate the value of the proposed synthesis flow.
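
As a rough illustration of what "assigning operations to time steps" means, the sketch below computes a plain as-soon-as-possible (ASAP) schedule for one DFG. This is a textbook baseline, not the similarity-maximizing scheduler proposed in the paper; the function name `asap_schedule`, the operation names, and the assumption that every operation takes one time step are all hypothetical.

```python
def asap_schedule(deps):
    """Map each operation to the earliest time step at which it can run.

    deps maps an operation name to the list of operations it depends on.
    Assumes the graph is acyclic and every operation takes one time step.
    """
    step = {}

    def visit(op):
        if op not in step:
            # An operation runs one step after its latest predecessor.
            step[op] = 1 + max((visit(p) for p in deps[op]), default=0)
        return step[op]

    for op in deps:
        visit(op)
    return step


# Hypothetical DFG: two multiplications feed an addition, which feeds another.
schedule = asap_schedule({"m1": [], "m2": [], "a1": ["m1", "m2"], "a2": ["a1"]})
```

A multimode flow would then compare such schedules across modes; that comparison is outside this sketch.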

    Circuit Merging versus Dynamic Partial Reconfiguration - The HoMade Implementation

    One goal of reconfiguration is to save power and occupied resources. In this paper we compare two kinds of reconfiguration available on field-programmable gate arrays (FPGAs) and discuss their pros and cons. The first method we study is circuit merging, which consists of sharing common resources between different circuits. The second method we explore is dynamic partial reconfiguration (DPR). It is specific to some FPGAs and allows well-defined reconfigurable regions to be modified at run time. We show that DPR, when available, gives good and more predictable results in terms of occupied area, but there is still a large overhead in terms of time and power consumption during the reconfiguration phase. We therefore show that circuit merging remains an interesting solution on FPGAs because it is not vendor specific and its reconfiguration time is around one clock cycle. Moreover, good merging algorithms exist, even though the FPGA physical synthesis flow makes it hard to predict the real performance of the merged circuit during optimization. We carry out our comparison in the context of the HoMade processor.

    Area Estimation for Fast Design Space Exploration of Multi-mode System

    Thesis (Master's), Seoul National University Graduate School, Department of Electrical and Computer Engineering, February 2013. 최기영.
    Abstract: SoCs (Systems on Chip) are gaining more functions and growing larger as semiconductor design complexity increases, so chip area optimization has become one of the main topics of SoC design. A multi-mode architecture is one good solution to this area optimization problem. It is a hardware architecture with multiple configurations, so the same hardware module can perform multiple functions by changing its configuration; area is saved by sharing the parts common to those functions. An SoC consists of many applications, and each application is made up of many functional modules, so the design space for a multi-mode SoC is generally huge. When a multi-mode design methodology is applied to selected functional modules in this design space, the area saving varies widely depending on which modules are selected for merging. Design space exploration (DSE) to find the functional modules to merge is therefore indispensable for achieving maximal total area reduction. Previous research on multi-mode design, however, has mostly focused on how to merge target functional modules chosen by the designer according to arbitrary criteria, and work on finding the best combination within the huge design space is hard to find. Exhaustive search is practically impossible given the size of the design space, so a fast and reliable heuristic DSE algorithm is needed. If the area saving for a given combination of functional modules can be estimated efficiently in advance, the estimate can serve as the key component of such an algorithm. The area saving of a multi-mode architecture comes from sharing the functional units and registers needed by each target functional module. Sharing requires additional muxes, however, so the final area saving is the difference between the area decrease due to sharing and the area increase due to the added muxes. This thesis proposes a method for estimating the minimum area saving obtained by applying multi-mode high-level synthesis (HLS) to selected functional modules under initiation interval constraints. The estimate considers the minimum number of functional units after sharing, the increase in muxes predicted with a simple mux-minimizing binding algorithm, and a simple estimate of the number of registers. The thesis also validates the proposed estimation by comparing the estimated area savings with the actual area reductions measured in experiments. The proposed method has low computational complexity and therefore enables fast exploration of a huge design space; this should allow the design space to be reduced significantly by excluding meaningless combinations or by finding clearly good combinations first.
    Contents: Abstract (Korean); Table of Contents; List of Tables; List of Figures; Chapter 1 Introduction; Chapter 2 Related Work; 2.1 High-Level Synthesis; 2.2 Multi-mode Architecture Design; 2.2.1 Multi-mode HLS vs. Datapath Merging; 2.3 System-Level Design Space Exploration for Multi-mode Design; Chapter 3 Area Saving Estimation for Multi-mode Design; 3.1 Overview of Area Saving Estimation; 3.2 Combinational Logic Area Saving Estimation; 3.2.1 Functional Unit Sharing Area Estimation; 3.2.2 Mux Sharing Area Estimation; 3.3 Estimation of Non-combinational Logic Area; Chapter 4 Experimental Results; 4.1 Experimental Overview; 4.1.1 Implementation of the In-house HLS Tool; 4.2 Validation of the In-house HLS Tool; 4.3 Validation of the Multi-mode Area Saving Estimation; 4.4 Analysis of Area Savings of Multi-mode Architecture; Chapter 5 Conclusion and Future Work; References; ABSTRACT; Acknowledgments
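
The trade-off stated in this abstract (area gained by sharing functional units minus area lost to added muxes) can be sketched as below. This is a simplified illustration under loud assumptions, not the thesis's estimator: the unit areas in `FU_AREA` and `MUX_AREA` are made-up numbers, registers are ignored, each shared unit is assumed to need one 2-input mux per operand input, and the function name `estimate_area_saving` is hypothetical.

```python
from collections import Counter

# Hypothetical areas in arbitrary units; not taken from the thesis.
FU_AREA = {"add": 100, "mul": 800}
MUX_AREA = 20  # assumed area of one 2-input mux


def estimate_area_saving(modules, inputs_per_fu=2):
    """Estimate the area saved by merging mutually exclusive modules.

    Each module is a dict mapping a functional-unit type to the number of
    units of that type it needs.
    """
    # Area if every module keeps its own functional units.
    separate = sum(FU_AREA[t] * n for m in modules for t, n in m.items())

    # Only one mode is active at a time, so the merged design needs the
    # per-type maximum rather than the per-type sum.
    merged = Counter()
    for m in modules:
        for t, n in m.items():
            merged[t] = max(merged[t], n)
    merged_area = sum(FU_AREA[t] * n for t, n in merged.items())

    # A unit is shared when at least two modules use it; per type, that
    # count equals the second-largest requirement among the modules.
    shared_units = 0
    for t in merged:
        counts = sorted((m.get(t, 0) for m in modules), reverse=True)
        shared_units += counts[1] if len(counts) > 1 else 0
    mux_overhead = shared_units * inputs_per_fu * MUX_AREA

    return separate - merged_area - mux_overhead


saving = estimate_area_saving([{"add": 2, "mul": 1}, {"add": 1, "mul": 1}])
```

In this example the separate designs cost 1900, the merged functional units cost 1000, and two shared units add 80 of mux area, giving an estimated saving of 820. A cheap estimate of this kind is what allows many module combinations to be ranked without running full synthesis on each.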

    CHIPS: Custom Hardware Instruction Processor Synthesis


    Algoritmos para alocação de recursos em arquiteturas reconfiguraveis (Algorithms for resource allocation in reconfigurable architectures)

    Advisor: Guido Costa Souza de Araujo. Thesis (doctorate), Universidade Estadual de Campinas, Instituto de Computação.
    Abstract: Recent work on reconfigurable architectures shows that they offer better performance than general purpose processors (GPPs) while offering more flexibility than ASICs (Application Specific Integrated Circuits). A single reconfigurable architecture can be adapted to implement different applications, allowing the hardware to be specialized according to the computational demands of the application. In this work we study the design of embedded systems based on a reconfigurable architecture. We adopt an instruction set extension approach, in which instructions specialized for an application are added to the instruction set of a GPP. These instructions correspond to sections of the application and are executed in a dynamically reconfigurable datapath added to the GPP's hardware. The central focus of this thesis is the resource sharing problem in the design of reconfigurable datapaths. Given that the application sections are modeled as control/data-flow graphs (CDFGs), the CDFG merging problem consists of designing a reconfigurable datapath with minimum area. We prove that this problem is NP-complete. Our main contributions are two heuristic algorithms for the CDFG merging problem. The first aims to minimize the interconnection area of the reconfigurable datapath, while the second minimizes its total area. Experimental evaluation showed that our first heuristic produced an average 26.2% reduction in interconnection area with respect to the most widely used method in the literature. The maximum error of our solutions was on average 4.1%, and some optimal solutions were found. Our second algorithm approached the execution times of the fastest previously known method and produced datapaths with an average area reduction of 20%. Compared with the best known method for area, our heuristic produced slightly smaller areas while achieving an average speedup of 2500. The proposed algorithm also produced smaller areas than a commercial synthesis tool.

    Efficient datapath merging for partially reconfigurable architectures

    Abstract: Reconfigurable systems have been shown to achieve significant performance speedup through architectures that map the most time-consuming application kernel modules or inner loops to a reconfigurable datapath. As each portion of the application starts to execute, the system partially reconfigures the datapath so as to perform the corresponding computation. The reconfigurable datapath should have as few and simple hardware blocks and interconnections as possible, in order to reduce its cost, area, and reconfiguration overhead. To achieve that, hardware blocks and interconnections should be reused as much as possible across the application. We represent each piece of the application as a data-flow graph (DFG). The DFG merging process identifies similarities among the DFGs, and produces a single datapath that can be dynamically reconfigured and has a minimum area cost, when considering both hardware blocks and interconnections. In this paper we present a novel technique for the DFG merge problem, and we evaluate it using programs from the MediaBench benchmark. Our algorithm execution time approaches the fastest previous solution to this problem and produces datapaths with an average area reduction of 20%. When compared to the best known area solution, our approach produces datapaths with area costs equivalent to (and in many cases better than) it, while achieving impressive speedups. Index Terms: High-level synthesis, reconfigurable computing, resource sharing.
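
To make the DFG merging idea concrete, here is a deliberately naive greedy sketch. It is not the technique presented in the paper: it reuses a hardware block whenever an unused block of the same operation type exists, and it shares an interconnection only when both endpoints happen to map onto an existing edge. The graph encoding, the `g2_` prefix for newly added blocks, and the function name `merge_dfgs` are assumptions made for illustration.

```python
def merge_dfgs(g1, g2):
    """Greedily merge DFG g2 into the datapath of DFG g1.

    Each graph is (nodes, edges): nodes maps a node id to its operation
    type, and edges is a set of (source id, destination id) pairs.
    """
    nodes1, edges1 = g1
    nodes2, edges2 = g2
    merged_nodes = dict(nodes1)
    merged_edges = set(edges1)
    mapping = {}   # node of g2 -> block in the merged datapath
    used = set()   # blocks of g1 already claimed by some node of g2

    for n2, op in nodes2.items():
        # Reuse the first unclaimed block that has the same operation type.
        match = next(
            (n1 for n1, op1 in nodes1.items() if op1 == op and n1 not in used),
            None,
        )
        if match is None:
            match = f"g2_{n2}"        # no reusable block: add a new one
            merged_nodes[match] = op
        else:
            used.add(match)
        mapping[n2] = match

    # Edges that land on an existing interconnection are shared for free
    # because merged_edges is a set.
    for src, dst in edges2:
        merged_edges.add((mapping[src], mapping[dst]))
    return merged_nodes, merged_edges


# Hypothetical example: a multiply-add merged with a multiply-add-subtract.
g1 = ({"a": "mul", "b": "add"}, {("a", "b")})
g2 = ({"x": "mul", "y": "add", "z": "sub"}, {("x", "y"), ("y", "z")})
blocks, wires = merge_dfgs(g1, g2)
```

Here the merged datapath has three blocks and two interconnections instead of five and three. A real merger also has to choose among competing matches so that interconnection sharing is maximized, which is the difficult part of the problem.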
