7 research outputs found

    Vesyla-II: An Algorithm Library Development Tool for Synchoros VLSI Design Style

    Full text link
    High-level synthesis (HLS) has been researched for decades and is still limited to fast FPGA prototyping and algorithmic RTL generation. A feasible end-to-end system-level synthesis solution has never been rigorously proven. Modularity and composability are the keys to enabling such a system-level synthesis framework that bridges the huge gap between system-level specification and physical level design. It implies that 1) modules in each abstraction level should be physically composable without any irregular glue logic involved and 2) the cost of each module in each abstraction level is accurately predictable. The ultimate reasons that limit how far the conventional HLS can go are precisely that it cannot generate modular designs that are physically composable and cannot accurately predict the cost of its design. In this paper, we propose Vesyla, not as yet another HLS tool, but as a synthesis tool that positions itself in a promising end-to-end synthesis framework and preserving its ability to generate physically composable modular design and to accurately predict its cost metrics. We present in the paper how Vesyla is constructed focusing on the novel platform it targets and the internal data structures that highlights the uniqueness of Vesyla. We also show how Vesyla will be positioned in the end-to-end synchoros synthesis framework called SiLago

    A Compiler Framework for a Coarse-Grained Reconfigurable Array

    Get PDF
    The number of transistors on a chip is increasing with time giving rise to multiple design challenges. In this context, reconfigurable architectures have emerged to provide high flexibility, less power/energy consumption yet while delivering high performance levels. The success of an embedded architecture depends on powerful compiler support. Current studies are focused on developing compilers to reduce the designer’s effort by introducing many automation related features. In this thesis work, a compiler framework is presented for a scalable Coarse-Grained Reconfigurable Array (CGRA) called SCREMA. The compiler framework presented in this thesis replaces the exiting GUI compiler with an added feature of automatic placement and routing. The compiler receives a Reverse Polish Notation (RPN) description of the target algorithm by the user. It extracts the computational information from the RPN description and performs placement and routing over the CGRA template. The first configuration stream generated by the compiler is the main processing context. Furthermore, if additional configuration patterns have to be designed, the compiler framework gives the possibility to implement them in two different design paradigms: a preprocessing context and a canonical context. Pre-processing context is used to align the data into a CGRA to facilitate post-processing. Canonical context allows the user to perform additions in sum-of-products related algorithms. The compiler framework has been tested by implementing real integer Matrix-Vector Multiplication (MVM) algorithms. Specifically, the tested MVM orders are 4th, 8th, 16th and 32nd on the CGRA sizes of 4x4, 4x8, 4x16 and 4x32 PEs, respectively. All the implementation are based on the RPN description of 4th-order MVM. Other than implementing 4th-order MVM, the rest of tested MVM algorithms need preprocessing and canonical contexts to be designed and implemented. The user effort which was needed to Place and Route (P&R) an algorithm manually on SCREMA is now reduced by using this compiler framework as it provides an automatic P&R mechanism

    Algoritmos para alocação de recursos em arquiteturas reconfiguraveis

    Get PDF
    Orientador: Guido Costa Souza de AraujoTese (doutorado) - Universidade Estadual de Campinas, Instituto de ComputaçãoResumo: Pesquisas recentes na área de arquiteturas reconfiguráveis mostram que elas oferecem um desempenho melhor que os processadores de propósito geral (GPPs - General Purpose Processors), aliado a uma maior flexibilidade que os ASICs (Application Specific Integrated Circuits). Uma mesma arquitetura recongurável pode ser adaptada para implementar aplicações diferentes, permitindo a especialização do hardware de acordo com a demanda computacional da aplicação. Neste trabalho, nos estudamos o projeto de sistemas dedicados baseado em uma arquitetura reconfigurável. Adotamos a abordagem de extensão do conjunto de instruções, na qual o conjunto de instruções de um GPP e acrescido de instruções especializadas para uma aplicação. Estas instruções correspondem a trechos da aplicação e são executadas em um datapath dinamicamente recongurável, adicionado ao hardware do GPP. O tema central desta tese e o problema de compartilhamento de recursos no projeto do datapath reconfigurável. Dado que os trechos da aplicação são modelados como grafos de luxo de dados e controle (Control/Data-Flow Graphs ¿ CDFGs), o problema de combinação de CDFGs consiste em projetar um datapath reconfigurável com área mínima. Nos apresentamos uma demonstração de que este problema e NP-completo. Nossas principais contribuições são dois algoritmos heurísticos para o problema de combinação de CDFGs. O primeiro tem o objetivo de minimizar a área das interconexões do datapath reconfigurável, enquanto que o segundo visa a minimização da área total. Avaliações experimentais mostram que nossa primeira heurística resultou em uma redução media de 26,2% na área das interconexões, em relação ao método mais utilizado na literatura. O erro máximo de nossas soluções foi em media 4,1% e algumas soluções ótimas foram obtidas. Nosso segundo algoritmo teve tempos de execução comparáveis ao método mais rápido conhecido, obtendo uma redução media de 20% na área. Em relação ao melhor método para área conhecido, nossa heurística produziu áreas um pouco menores, alcançando um speed up médio de 2500. O algoritmo proposto também produziu áreas menores, quando comparado a uma ferramenta de síntese comercialAbstract: Recent work in reconfigurable architectures shows that they ofter a better performance than general purpose processors (GPPs), while offering more exibility than ASICs (Application Specific Integrated Circuits). A reconfigurable architecture can be adapted to implement different applications, thus allowing the specialization of the hardware according to the computational demands. In this work we describe an embedded systems project based on a reconfigurable architecture. We adopt an instruction set extension technique, where specialized instructions for an application are included into the instruction set of a GPP. These instructions correspond to sections of the application, and are executed in a dynamically reconfigurable datapath, added to the GPP's hardware. The central focus of this theses is the resource sharing problem in the design of reconfigurable datapaths. Since the application sections are modeled as control/data-ow graphs (CDFGs), the CDFG merging problem consists in designing a reconfigurable datapath with minimum area. We prove that this problem is NP-complete. Our main contributions are two heuristic algorithms to the CDFG merging problem. The first has the goal of minimizing the reconfigurable datapath interconnection area, while the second minimizes its total area. Experimental evaluation showed that our first heuristic produced an average 26.2% area reduction, with respect to the most used method. The maximum error of our solutions was on average 4.1%, and some optimal solutions were found. Our second algorithm approached, in execution times, the fastest previous solution, and produced datapaths with an average area reduction of 20%. When compared to the best known area solution, our approach produced slightly better areas, while achieving an average speedup of 2500. The proposed algorithm also produced smaller areas, when compared to an industry synthesis toolDoutoradoDoutor em Ciência da Computaçã

    Das FPGA-Entwicklungssystem CHDL

    Get PDF
    In dieser Arbeit wurde das Konzept der C++-basierten Hardwarebeschreibung für Field Programmable Gate Arrays (FPGAs) weiterentwickelt und optimiert. Ergebnis ist ein homogenes System, das eine deutlich verbesserte Unterstützung für FPGA-Koprozessoren bietet als bisher verfügbare Werkzeuge: Das FPGA-Entwicklungssystem CHDL. CHDL integriert mehrere parallel einsetzbare Beschreibungsebenen von der detaillierten strukturellen Spezifikation über Zustandsmaschinen bis hin zur Hochsprachenbeschreibung. Die Simulation kann durch Nachbilden der Hardwareumgebung mittels C++-Funktionen das gesamte zu untersuchende System umfassen. Auch die Softwarekomponente des FPGA-Koprozessors ist in die Simulation einbezogen. Zusätzlich wird die Anwendung moderner Debugging-Verfahren wie Readback und partielle Rekonfiguration unterstützt. Die Ausgabe der Netzlisten erfolgt direkt im XNF- oder EDIF-Format. Beim Einsatz von CHDL muß der Entwickler nur eine einzige Sprache beherrschen, um Anwendungen für FPGA-Koprozessoren zu implementieren: C++. Ein handelsüblicher C++-Kompiler sowie die Place&Route-Software des FPGA-Herstellers reichen aus, um mit CHDL FPGA-Anwendungen zu entwickeln. Es werden keine weiteren Werkzeuge benötigt, insbesondere keine VHDL-Kompiler

    Fast Compilation for Pipelined Reconfigurable Fabrics

    No full text
    In this paper we describe a compiler which quickly synthesizes high quality pipelined datapaths for pipelined reconfigurable devices. The compiler uses the same internal representation to perform synthesis, module generation, optimization, and place and route. The core of the compiler is a linear time place and route algorithm more than two orders of magnitude faster than traditional CAD tools. The key behind our approach is that we never backtrack, rip-up, or re-route. Instead, the graph representing the computation is preprocessed to guarantee routability by inserting lazy noops. The preprocessing steps provides enough information to make a greedy strategy feasible. The compilation speed is approximately 3000 bit-operations/second (on a PII/400Mhz) for a wide range of applications. The hardware utilization averages 60% on the target device, PipeRench

    High level compilation for gate reconfigurable architectures

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2001.Includes bibliographical references (p. 205-215).A continuing exponential increase in the number of programmable elements is turning management of gate-reconfigurable architectures as "glue logic" into an intractable problem; it is past time to raise this abstraction level. The physical hardware in gate-reconfigurable architectures is all low level - individual wires, bit-level functions, and single bit registers - hence one should look to the fetch-decode-execute machinery of traditional computers for higher level abstractions. Ordinary computers have machine-level architectural mechanisms that interpret instructions - instructions that are generated by a high-level compiler. Efficiently moving up to the next abstraction level requires leveraging these mechanisms without introducing the overhead of machine-level interpretation. In this dissertation, I solve this fundamental problem by specializing architectural mechanisms with respect to input programs. This solution is the key to efficient compilation of high-level programs to gate reconfigurable architectures. My approach to specialization includes several novel techniques. I develop, with others, extensive bitwidth analyses that apply to registers, pointers, and arrays. I use pointer analysis and memory disambiguation to target devices with blocks of embedded memory. My approach to memory parallelization generates a spatial hierarchy that enables easier-to-synthesize logic state machines with smaller circuits and no long wires.(cont.) My space-time scheduling approach integrates the techniques of high-level synthesis with the static routing concepts developed for single-chip multiprocessors. Using DeepC, a prototype compiler demonstrating my thesis, I compile a new benchmark suite to Xilinx Virtex FPGAs. Resulting performance is comparable to a custom MIPS processor, with smaller area (40 percent on average), higher evaluation speeds (2.4x), and lower energy (18x) and energy-delay (45x). Specialization of advanced mechanisms results in additional speedup, scaling with hardware area, at the expense of power. For comparison, I also target IBM's standard cell SA-27E process and the RAW microprocessor. Results include sensitivity analysis to the different mechanisms specialized and a grand comparison between alternate targets.by Jonathan William Babb.Ph.D
    corecore