515 research outputs found

    Performance Aspects of Synthesizable Computing Systems

    Get PDF

    Heracles: A Tool for Fast RTL-Based Design Space Exploration of Multicore Processors

    Get PDF
    This paper presents Heracles, an open-source, functional, parameterized, synthesizable multicore system toolkit. Such a multi/many-core design platform is a powerful and versatile research and teaching tool for architectural exploration and hardware-software co-design. The Heracles toolkit comprises the soft hardware (HDL) modules, application compiler, and graphical user interface. It is designed with a high degree of modularity to support fast exploration of future multicore processors of di erent topologies, routing schemes, processing elements (cores), and memory system organizations. It is a component-based framework with parameterized interfaces and strong emphasis on module reusability. The compiler toolchain is used to map C or C++ based applications onto the processing units. The GUI allows the user to quickly con gure and launch a system instance for easy factorial development and evaluation. Hardware modules are implemented in synthesizable Verilog and are FPGA platform independent. The Heracles tool is freely available under the open-source MIT license at: http://projects.csail.mit.edu/heracle

    Heracles: Fully Synthesizable Parameterized MIPS-Based Multicore System

    Get PDF
    Heracles is an open-source complete multicore system written in Verilog. It is fully parameterized and can be reconfigured and synthesized into different topologies and sizes. Each processing node has a 7-stage pipeline, fully bypassed, microprocessor running the MIPS-III ISA, a 4-stage input-buffer, virtual-channel router, and a local variable-size shared memory. Our design is highly modular with clear interfaces between the core, the memory hierarchy, and the on-chip network. In the baseline design, the microprocessor is attached to two caches, one instruction cache and one data cache, which are oblivious to the global memory organization. The memory system in Heracles can be configured as one single global shared memory (SM), or distributed shared memory (DSM), or any combination thereof. Each core is connected to the rest of the network of processors by a parameterized, realistic, wormhole router. We show different topology configurations of the system, and their synthesis results on the Xilinx Virtex-5 LX330T FPGA board. We also provide a small MIPS cross-compiler toolchain to assist in developing software for Heracles

    Electronic System-Level Synthesis Methodologies

    Full text link

    A System-Level Simulation Model for a Protocol Processor

    Get PDF
    As the development of integrated circuit technology continues to follow Moore’s law the complexity of circuits increases exponentially. Traditional hardware description languages such as VHDL and Verilog are no longer powerful enough to cope with this level of complexity and do not provide facilities for hardware/software codesign. Languages such as SystemC are intended to solve these problems by combining the powerful expression of high level programming languages and hardware oriented facilities of hardware description languages. To fully replace older languages in the desing flow of digital systems SystemC should also be synthesizable. The devices required by modern high speed networks often share the same tight constraints for e.g. size, power consumption and price with embedded systems but have also very demanding real time and quality of service requirements that are difficult to satisfy with general purpose processors. Dedicated hardware blocks of an application specific instruction set processor are one way to combine fast processing speed, energy efficiency, flexibility and relatively low time-to-market. Common features can be identified in the network processing domain making it possible to develop specialized but configurable processor architectures. One such architecture is the TACO which is based on transport triggered architecture. The architecture offers a high degree of parallelism and modularity and greatly simplified instruction decoding. For this M.Sc.(Tech) thesis, a simulation environment for the TACO architecture was developed with SystemC 2.2 using an old version written with SystemC 1.0 as a starting point. The environment enables rapid design space exploration by providing facilities for hw/sw codesign and simulation and an extendable library of automatically configured reusable hardware blocks. Other topics that are covered are the differences between SystemC 1.0 and 2.2 from the viewpoint of hardware modeling, and compilation of a SystemC model into synthesizable VHDL with Celoxica Agility SystemC Compiler. A simulation model for a processor for TCP/IP packet validation was designed and tested as a test case for the environment.Siirretty Doriast

    Digital signal processing: the impact of convergence on education, society and design flow

    Get PDF
    Design and development of real-time, memory and processor hungry digital signal processing systems has for decades been accomplished on general-purpose microprocessors. Increasing needs for high-performance DSP systems made these microprocessors unattractive for such implementations. Various attempts to improve the performance of these systems resulted in the use of dedicated digital signal processing devices like DSP processors and the former heavyweight champion of electronics design – Application Specific Integrated Circuits. The advent of RAM-based Field Programmable Gate Arrays has changed the DSP design flow. Software algorithmic designers can now take their DSP algorithms right from inception to hardware implementation, thanks to the increasing availability of software/hardware design flow or hardware/software co-design. This has led to a demand in the industry for graduates with good skills in both Electrical Engineering and Computer Science. This paper evaluates the impact of technology on DSP-based designs, hardware design languages, and how graduate/undergraduate courses have changed to suit this transition

    FINN: A Framework for Fast, Scalable Binarized Neural Network Inference

    Full text link
    Research has shown that convolutional neural networks contain significant redundancy, and high classification accuracy can be obtained even when weights and activations are reduced from floating point to binary values. In this paper, we present FINN, a framework for building fast and flexible FPGA accelerators using a flexible heterogeneous streaming architecture. By utilizing a novel set of optimizations that enable efficient mapping of binarized neural networks to hardware, we implement fully connected, convolutional and pooling layers, with per-layer compute resources being tailored to user-provided throughput requirements. On a ZC706 embedded FPGA platform drawing less than 25 W total system power, we demonstrate up to 12.3 million image classifications per second with 0.31 {\mu}s latency on the MNIST dataset with 95.8% accuracy, and 21906 image classifications per second with 283 {\mu}s latency on the CIFAR-10 and SVHN datasets with respectively 80.1% and 94.9% accuracy. To the best of our knowledge, ours are the fastest classification rates reported to date on these benchmarks.Comment: To appear in the 25th International Symposium on Field-Programmable Gate Arrays, February 201
    • 

    corecore