Reconfigurable Hardware Accelerators: Opportunities, Trends, and Challenges
With the emergence of big data applications such as machine learning, speech
recognition, artificial intelligence, and DNA sequencing in recent years, the
computer architecture research community faces an explosive growth in data
scale. To achieve high efficiency in data-intensive computing, the study of
heterogeneous accelerators targeting these emerging applications has become a
major topic in the computer architecture domain. At present, heterogeneous
accelerators are mainly implemented with heterogeneous computing units such as
Application-Specific Integrated Circuits (ASICs), Graphics Processing Units
(GPUs), and Field Programmable Gate Arrays (FPGAs). Among these architectures,
FPGA-based reconfigurable accelerators offer two merits. First, an FPGA
contains a large number of reconfigurable circuits, which satisfy the
requirements of high performance and low power consumption when specific
applications are running. Second, FPGA-based reconfigurable architectures
enable rapid prototyping and feature excellent customizability and
reconfigurability. A growing number of accelerator designs based on FPGAs or
other reconfigurable architectures now appear at top-tier computer architecture
conferences. To review recent work on reconfigurable computing accelerators,
this survey takes the latest research on reconfigurable accelerator
architectures and their algorithmic applications as its basis. We compare
active research topics and application domains, and analyze the advantages,
disadvantages, and challenges of reconfigurable accelerators. Finally, we
discuss likely directions for future accelerator architectures, hoping to
provide a reference for computer architecture researchers.
Study on the Availability Prediction of the Reconfigurable Networked Software System
This paper describes a multi-agent-based availability prediction approach for
reconfigurable networked software systems.
High Level Hardware/Software Embedded System Design with Redsharc
As tools for designing multiple processor systems-on-chips (MPSoCs) continue
to evolve to meet the demands of developers, there exist systematic gaps that
must be bridged to provide a more cohesive hardware/software development
environment. We present Redsharc to address these problems and enable: system
generation, software/hardware compilation and synthesis, run-time control and
execution of MPSoCs. The efforts presented in this paper extend our previous
work to provide a rich API, build infrastructure, and runtime enabling
developers to design a system of simultaneously executing kernels in software
or hardware that communicate seamlessly. In this work, we take Redsharc further
to support a broader class of applications across a larger number of devices
requiring a more unified system development environment and build
infrastructure. To accomplish this we leverage existing tools and extend
Redsharc with build and control infrastructure to relieve the burden of system
development allowing software programmers to focus their efforts on application
and kernel development. Comment: Presented at the First International Workshop
on FPGAs for Software Programmers (FSP 2014) (arXiv:1408.4423).
Timing verification of dynamically reconfigurable logic for Xilinx Virtex FPGA series
This paper reports on a method for extending existing VHDL design and verification software available for the Xilinx Virtex series of FPGAs. It allows the designer to apply standard hardware design and verification tools to the design of dynamically reconfigurable logic (DRL). The technique involves the conversion of a dynamic design into multiple static designs, suitable for input to standard synthesis and automatic place-and-route (APR) tools. For timing and functional verification after APR, the sections of the design can then be recombined into a single dynamic system. The technique has been automated by extending an existing DRL design tool named DCSTech, which is part of the Dynamic Circuit Switching (DCS) CAD framework. The principles behind the tools are generic and should be readily extensible to other architectures and CAD toolsets. Implementation of the dynamic system involves the production of partial configuration bitstreams to load sections of circuitry. The process of creating such bitstreams, the final stage of our design flow, is summarized.
The future of computing beyond Moore's Law.
Moore's Law is a techno-economic model that has enabled the information technology industry to double the performance and functionality of digital electronics roughly every 2 years within a fixed cost, power and area. Advances in silicon lithography have enabled this exponential miniaturization of electronics, but, as transistors reach atomic scale and fabrication costs continue to rise, the classical technological driver that has underpinned Moore's Law for 50 years is failing and is anticipated to flatten by 2025. This article provides an updated view of what a post-exascale system will look like and the challenges ahead, based on our most recent understanding of technology roadmaps. It also discusses the tapering of historical improvements, and how it affects options available to continue scaling of successors to the first exascale machine. Lastly, this article covers the many different opportunities and strategies available to continue computing performance improvements in the absence of historical technology drivers. This article is part of a discussion meeting issue 'Numerical algorithms for high-performance computational science'.
Prototyping scalable digital signal processing systems for radio astronomy using dataflow models
There is a growing trend toward using high-level tools for design and
implementation of radio astronomy digital signal processing (DSP) systems. Such
tools, for example, those from the Collaboration for Astronomy Signal
Processing and Electronics Research (CASPER), are usually platform-specific,
and lack high-level, platform-independent, portable, scalable application
specifications. This limits the designer's ability to experiment with designs
at a high-level of abstraction and early in the development cycle.
We address some of these issues using a model-based design approach employing
dataflow models. We demonstrate this approach by applying it to the design of a
tunable digital downconverter (TDD) used for narrow-bandwidth spectroscopy. Our
design is targeted toward an FPGA platform, called the Interconnect Break-out
Board (IBOB), that is available from CASPER. We use the term TDD to refer
to a digital downconverter for which the decimation factor and center frequency
can be reconfigured without the need for regenerating the hardware code. Such a
design is currently not available in the CASPER DSP library.
The work presented in this paper focuses on two aspects. Firstly, we
introduce and demonstrate a dataflow-based design approach using the dataflow
interchange format (DIF) tool for high-level application specification, and we
integrate this approach with the CASPER tool flow. Secondly, we explore the
trade-off between the flexibility of TDD designs and the low hardware cost of
fixed-configuration digital downconverter (FDD) designs that use the available
CASPER DSP library. We further explore this trade-off in the context of a
two-stage downconversion scheme employing a combination of TDD or FDD designs. Comment: Accepted for publication in Radio Science.
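The tunable downconverter described above can be illustrated with a minimal sketch (an assumption-laden illustration, not the CASPER/DIF implementation): mix the input against a numerically controlled oscillator tuned to the band of interest, low-pass filter to the new Nyquist band, and decimate. The parameters `f_center` and `decim` are the quantities the abstract says a TDD can retune without regenerating hardware; the filter design here is a generic windowed sinc chosen for the example.

```python
import numpy as np

def tunable_ddc(x, fs, f_center, decim, num_taps=101):
    """Digital downconversion sketch: mix to baseband, low-pass, decimate."""
    n = np.arange(len(x))
    # Mix: shift the band centered at f_center down to 0 Hz
    baseband = x * np.exp(-2j * np.pi * f_center / fs * n)
    # Windowed-sinc low-pass with cutoff at the post-decimation Nyquist rate
    cutoff = 0.5 * fs / decim
    t = np.arange(num_taps) - (num_taps - 1) / 2
    h = np.sinc(2 * cutoff / fs * t) * np.hamming(num_taps)
    h /= h.sum()
    filtered = np.convolve(baseband, h, mode="same")
    # Decimate: keep every decim-th sample
    return filtered[::decim]

fs = 1_000_000.0      # 1 MS/s input stream (illustrative rate)
f_tone = 200_500.0    # narrowband tone 500 Hz above the tuned center
n = np.arange(4096)
x = np.cos(2 * np.pi * f_tone / fs * n)
y = tunable_ddc(x, fs, f_center=200_000.0, decim=16)
```

Retuning amounts to calling the same function with a different `f_center` or `decim`, which is the flexibility the abstract contrasts against fixed-configuration (FDD) designs whose parameters are baked in at synthesis time.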
NeuroFlow: A General Purpose Spiking Neural Network Simulation Platform using Customizable Processors
© 2016 Cheung, Schultz and Luk. NeuroFlow is a scalable spiking neural network simulation platform for off-the-shelf high performance computing systems using customizable hardware processors such as Field-Programmable Gate Arrays (FPGAs). Unlike multi-core processors and application-specific integrated circuits, the processor architecture of NeuroFlow can be redesigned and reconfigured to suit a particular simulation to deliver optimized performance, such as the degree of parallelism to employ. The compilation process supports using PyNN, a simulator-independent neural network description language, to configure the processor. NeuroFlow supports a number of commonly used current or conductance based neuronal models such as integrate-and-fire and Izhikevich models, and the spike-timing-dependent plasticity (STDP) rule for learning. A 6-FPGA system can simulate a network of up to ~600,000 neurons and can achieve a real-time performance of 400,000 neurons. Using one FPGA, NeuroFlow delivers a speedup of up to 33.6 times the speed of an 8-core processor, or 2.83 times the speed of GPU-based platforms. With high flexibility and throughput, NeuroFlow provides a viable environment for large-scale neural network simulation.
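The Izhikevich model mentioned above is compact enough to sketch directly. The following is a generic Euler-integration example (a software sketch, not NeuroFlow's FPGA implementation) using the standard regular-spiking parameters from Izhikevich's original formulation:

```python
# Single Izhikevich neuron with regular-spiking parameters (a=0.02, b=0.2,
# c=-65, d=8), driven by a constant input current, integrated with Euler steps.
a, b, c, d = 0.02, 0.2, -65.0, 8.0
v, u = c, b * c            # start at the resting state
dt, I = 0.5, 10.0          # 0.5 ms step, constant input current
spikes = []                # spike times in ms
for step in range(2000):   # 1 second of simulated time
    if v >= 30.0:          # spike threshold: record, then reset
        spikes.append(step * dt)
        v, u = c, u + d
    # Membrane potential and recovery variable updates (Izhikevich dynamics)
    v += dt * (0.04 * v * v + 5.0 * v + 140.0 - u + I)
    u += dt * a * (b * v - u)
```

In a platform like the one described, this per-neuron update is what gets replicated across many parallel hardware units; the sketch only shows the arithmetic each unit performs.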
Designing a Million-Qubit Quantum Computer Using Resource Performance Simulator
The optimal design of a fault-tolerant quantum computer involves finding an
appropriate balance between the burden of large-scale integration of noisy
components and the load of improving the reliability of hardware technology.
This balance can be evaluated by quantitatively modeling the execution of
quantum logic operations on a realistic quantum hardware containing limited
computational resources. In this work, we report a complete performance
simulation software tool capable of (1) searching the hardware design space by
varying resource architecture and technology parameters, (2) synthesizing and
scheduling fault-tolerant quantum algorithm within the hardware constraints,
(3) quantifying the performance metrics such as the execution time and the
failure probability of the algorithm, and (4) analyzing the breakdown of these
metrics to highlight the performance bottlenecks and visualizing resource
utilization to evaluate the adequacy of the chosen design. Using this tool we
investigate a vast design space for implementing key building blocks of Shor's
algorithm to factor a 1,024-bit number with a baseline budget of 1.5 million
qubits. We show that a trapped-ion quantum computer designed with twice as many
qubits and one-tenth of the baseline infidelity of the communication channel
can factor a 2,048-bit integer in less than five months. Comment: 24 pages, 13 figures, and 6 tables.
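The kind of resource arithmetic such a tool automates can be illustrated with a toy calculation (this is a deliberately simplified model, not the paper's simulator). Assuming roughly 2n logical qubits to factor an n-bit integer and a surface-code overhead of about 2d² physical qubits per logical qubit at code distance d (both common rule-of-thumb figures, not values taken from the paper), a fixed physical-qubit budget caps the achievable code distance:

```python
def max_code_distance(physical_budget, logical_qubits, overhead=2):
    """Largest odd code distance d such that
    logical_qubits * overhead * d**2 fits within physical_budget."""
    d = 1
    while logical_qubits * overhead * (d + 2) ** 2 <= physical_budget:
        d += 2  # surface-code distances are odd
    return d

logical = 2 * 2048        # ~2n logical qubits for Shor on a 2,048-bit integer
budget = 3_000_000        # twice the paper's 1.5-million-qubit baseline
d = max_code_distance(budget, logical)
```

Since the logical error rate falls steeply with distance while the physical cost grows only quadratically, trading a larger qubit budget (or better component fidelity) for distance is exactly the kind of design-space exploration the abstract describes.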