
    Reconfigurable Hardware Accelerators: Opportunities, Trends, and Challenges

    With the emergence of big-data applications such as machine learning, speech recognition, artificial intelligence, and DNA sequencing in recent years, the computer architecture research community is facing explosive growth in data volume. To achieve high efficiency in data-intensive computing, research on heterogeneous accelerators targeting these applications has become a major topic in computer architecture. At present, heterogeneous accelerators are mainly implemented with heterogeneous computing units such as application-specific integrated circuits (ASICs), graphics processing units (GPUs), and field-programmable gate arrays (FPGAs). Among these architectures, FPGA-based reconfigurable accelerators have two advantages. First, an FPGA contains a large number of reconfigurable circuits, which can meet the high-performance and low-power requirements of specific applications. Second, FPGA-based reconfigurable architectures allow prototype systems to be built rapidly and offer excellent customizability and reconfigurability. In recent years, a number of accelerator designs based on FPGAs or other reconfigurable architectures have appeared at top-tier computer architecture conferences. To review recent work on reconfigurable computing accelerators, this survey takes the latest high-impact research on reconfigurable accelerator architectures and algorithm applications as its basis. We compare the main research topics and application domains, and analyze the advantages, disadvantages, and challenges of reconfigurable accelerators. Finally, we discuss future development trends in accelerator architectures, hoping to provide a reference for computer architecture researchers.

    Study on the Availability Prediction of the Reconfigurable Networked Software System

    This paper describes a multi-agent-based availability prediction approach for reconfigurable networked software systems.

    High Level Hardware/Software Embedded System Design with Redsharc

    As tools for designing multiprocessor systems-on-chip (MPSoCs) continue to evolve to meet the demands of developers, there remain systematic gaps that must be bridged to provide a more cohesive hardware/software development environment. We present Redsharc to address these problems and to enable system generation, software/hardware compilation and synthesis, and run-time control and execution of MPSoCs. The efforts presented in this paper extend our previous work to provide a rich API, build infrastructure, and runtime, enabling developers to design a system of simultaneously executing kernels, in software or hardware, that communicate seamlessly. In this work we take Redsharc further to support a broader class of applications across a larger number of devices, which requires a more unified system development environment and build infrastructure. To accomplish this, we leverage existing tools and extend Redsharc with build and control infrastructure that relieves the burden of system development, allowing software programmers to focus their efforts on application and kernel development. Comment: Presented at the First International Workshop on FPGAs for Software Programmers (FSP 2014) (arXiv:1408.4423)

    Timing verification of dynamically reconfigurable logic for Xilinx Virtex FPGA series

    This paper reports on a method for extending the existing VHDL design and verification software available for the Xilinx Virtex series of FPGAs. It allows the designer to apply standard hardware design and verification tools to the design of dynamically reconfigurable logic (DRL). The technique converts a dynamic design into multiple static designs suitable for input to standard synthesis and automatic place-and-route (APR) tools. For timing and functional verification after APR, the sections of the design can then be recombined into a single dynamic system. The technique has been automated by extending an existing DRL design tool named DCSTech, which is part of the Dynamic Circuit Switching (DCS) CAD framework. The principles behind the tools are generic and should be readily extensible to other architectures and CAD toolsets. Implementing the dynamic system involves producing partial configuration bitstreams to load sections of circuitry. The process of creating such bitstreams, the final stage of our design flow, is summarized.

    The future of computing beyond Moore's Law.

    Moore's Law is a techno-economic model that has enabled the information technology industry to double the performance and functionality of digital electronics roughly every 2 years within a fixed cost, power, and area budget. Advances in silicon lithography have enabled this exponential miniaturization of electronics, but as transistors reach atomic scale and fabrication costs continue to rise, the classical technological driver that has underpinned Moore's Law for 50 years is failing and is anticipated to flatten by 2025. This article provides an updated view of what a post-exascale system will look like and the challenges ahead, based on our most recent understanding of technology roadmaps. It also discusses the tapering of historical improvements and how it affects the options available for continued scaling of successors to the first exascale machine. Lastly, this article covers the many opportunities and strategies available to continue improving computing performance in the absence of historical technology drivers. This article is part of a discussion meeting issue 'Numerical algorithms for high-performance computational science'.
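    The compounding implied by a doubling "roughly every 2 years" is easy to underestimate. A minimal Python sketch, assuming an idealized, exact 2-year doubling period (the real cadence varied), computes the growth factor 2^(t/2) after t years; over the 50 years mentioned above that is 2^25. The function name is hypothetical.

```python
def moores_law_factor(years: float, doubling_period: float = 2.0) -> float:
    """Idealized growth factor after `years` if capability doubles every `doubling_period` years."""
    return 2.0 ** (years / doubling_period)


# Idealized 50-year span, as in the abstract: 2 ** 25
print(f"{moores_law_factor(50):,.0f}")  # 33,554,432
```

    Halving the doubling rate (e.g. `doubling_period=4.0`) over the same span gives only 2^12.5, which is one way to see why a flattening trend matters for post-exascale planning.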

    Prototyping scalable digital signal processing systems for radio astronomy using dataflow models

    There is a growing trend toward using high-level tools for the design and implementation of radio astronomy digital signal processing (DSP) systems. Such tools, for example those from the Collaboration for Astronomy Signal Processing and Electronics Research (CASPER), are usually platform-specific and lack high-level, platform-independent, portable, and scalable application specifications. This limits the designer's ability to experiment with designs at a high level of abstraction and early in the development cycle. We address some of these issues using a model-based design approach employing dataflow models. We demonstrate this approach by applying it to the design of a tunable digital downconverter (TDD) used for narrow-bandwidth spectroscopy. Our design targets an FPGA platform called the Interconnect Break-out Board (IBOB), which is available from CASPER. We use the term TDD to refer to a digital downconverter whose decimation factor and center frequency can be reconfigured without regenerating the hardware code. Such a design is currently not available in the CASPER DSP library. The work presented in this paper focuses on two aspects. First, we introduce and demonstrate a dataflow-based design approach that uses the dataflow interchange format (DIF) tool for high-level application specification, and we integrate this approach with the CASPER tool flow. Second, we explore the trade-off between the flexibility of TDD designs and the low hardware cost of fixed-configuration digital downconverter (FDD) designs that use the available CASPER DSP library. We further explore this trade-off in the context of a two-stage downconversion scheme employing a combination of TDD or FDD designs. Comment: Accepted for publication in Radio Science
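    The defining property of a TDD above is that center frequency and decimation factor are run-time parameters rather than baked into the hardware. A minimal software model in Python/NumPy can illustrate the signal path (mix to baseband, low-pass filter, decimate). This is a sketch under stated assumptions, not the paper's DIF/CASPER implementation: the function name `tunable_ddc`, the Hamming-windowed-sinc filter, the 63-tap length, and the cutoff at 0.4/decimation are all illustrative choices.

```python
import numpy as np


def tunable_ddc(x, fs, f_center, decimation, num_taps=63):
    """Shift the band around f_center to 0 Hz, low-pass filter, and decimate.

    f_center (Hz) and decimation are ordinary run-time arguments, mirroring the
    idea that a TDD can be retuned without regenerating hardware.
    """
    n = np.arange(len(x))
    # Numerically controlled oscillator (NCO): complex mix moves f_center to DC.
    baseband = np.asarray(x) * np.exp(-2j * np.pi * f_center * n / fs)

    # Hamming-windowed-sinc low-pass FIR, cutoff a bit below the new Nyquist
    # frequency fs / (2 * decimation), expressed in cycles per input sample.
    cutoff = 0.4 / decimation
    k = np.arange(num_taps) - (num_taps - 1) / 2
    taps = 2 * cutoff * np.sinc(2 * cutoff * k) * np.hamming(num_taps)
    taps /= taps.sum()  # unity gain at DC

    filtered = np.convolve(baseband, taps, mode="same")
    # Keep every `decimation`-th sample; output rate is fs / decimation.
    return filtered[::decimation]


# Usage: a real tone 200 Hz above the chosen center frequency.
fs = 48_000.0
t = np.arange(4800) / fs
x = np.cos(2 * np.pi * 6_200.0 * t)
y = tunable_ddc(x, fs, f_center=6_000.0, decimation=8)  # retune by changing arguments
```

    In this model, retuning is just a different function call; in hardware, the equivalent flexibility costs extra logic (for example, a programmable NCO and filter control), which is the TDD-versus-FDD trade-off the abstract describes.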

    ACACES 2011 poster abstracts: July 13, 2011: Fiuggi, Italy


    NeuroFlow: A General Purpose Spiking Neural Network Simulation Platform using Customizable Processors

    © 2016 Cheung, Schultz and Luk. NeuroFlow is a scalable spiking neural network simulation platform for off-the-shelf high-performance computing systems that uses customizable hardware processors such as field-programmable gate arrays (FPGAs). Unlike multi-core processors and application-specific integrated circuits, the processor architecture of NeuroFlow can be redesigned and reconfigured to suit a particular simulation and deliver optimized performance, for example by choosing the degree of parallelism to employ. The compilation process supports PyNN, a simulator-independent neural network description language, for configuring the processor. NeuroFlow supports a number of commonly used current- or conductance-based neuronal models, such as the integrate-and-fire and Izhikevich models, as well as the spike-timing-dependent plasticity (STDP) rule for learning. A 6-FPGA system can simulate a network of up to ~600,000 neurons and achieves real-time performance with 400,000 neurons. Using one FPGA, NeuroFlow delivers a speedup of up to 33.6 times over an 8-core processor, or 2.83 times over GPU-based platforms. With high flexibility and throughput, NeuroFlow provides a viable environment for large-scale neural network simulation.
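    The Izhikevich model named above is compact enough to sketch, which is part of why it suits hardware simulators. The following is a minimal Python reference model, assuming the commonly published regular-spiking parameters (a = 0.02, b = 0.2, c = -65 mV, d = 8) and a 1 ms step in which v is integrated with two 0.5 ms forward-Euler half-steps. It is not NeuroFlow's FPGA implementation or its PyNN interface, and the class name `IzhikevichNeuron` is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class IzhikevichNeuron:
    """Izhikevich neuron: dv/dt = 0.04v^2 + 5v + 140 - u + I, du/dt = a(bv - u)."""

    a: float = 0.02
    b: float = 0.2
    c: float = -65.0   # reset value of v (mV) after a spike
    d: float = 8.0     # increment of u after a spike
    v: float = -65.0   # membrane potential (mV)
    u: float = -13.0   # recovery variable, initialized to b * v

    def step(self, current: float) -> bool:
        """Advance the neuron by 1 ms under input `current`; return True on a spike."""
        # Two 0.5 ms Euler half-steps for v improve numerical stability.
        for _ in range(2):
            self.v += 0.5 * (0.04 * self.v ** 2 + 5.0 * self.v + 140.0 - self.u + current)
        self.u += self.a * (self.b * self.v - self.u)
        # Spike threshold at 30 mV, followed by the after-spike reset.
        if self.v >= 30.0:
            self.v = self.c
            self.u += self.d
            return True
        return False


# Usage: count spikes over 1 s (1000 steps) of constant input current.
neuron = IzhikevichNeuron()
spike_count = sum(neuron.step(10.0) for _ in range(1000))
```

    Each neuron update is a handful of multiply-adds and one comparison, with no data dependence between neurons within a step, which is the kind of parallelism a customizable processor can replicate across many neurons.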

    Designing a Million-Qubit Quantum Computer Using Resource Performance Simulator

    The optimal design of a fault-tolerant quantum computer involves finding an appropriate balance between the burden of large-scale integration of noisy components and the load of improving the reliability of hardware technology. This balance can be evaluated by quantitatively modeling the execution of quantum logic operations on realistic quantum hardware with limited computational resources. In this work, we report a complete performance simulation software tool capable of (1) searching the hardware design space by varying resource architecture and technology parameters, (2) synthesizing and scheduling a fault-tolerant quantum algorithm within the hardware constraints, (3) quantifying performance metrics such as the execution time and the failure probability of the algorithm, and (4) analyzing the breakdown of these metrics to highlight performance bottlenecks and visualizing resource utilization to evaluate the adequacy of the chosen design. Using this tool, we investigate a vast design space for implementing key building blocks of Shor's algorithm to factor a 1,024-bit number with a baseline budget of 1.5 million qubits. We show that a trapped-ion quantum computer designed with twice as many qubits and one-tenth of the baseline infidelity of the communication channel can factor a 2,048-bit integer in less than five months. Comment: 24 pages, 13 figures and 6 tables