5,211 research outputs found
OPTIMAL AREA AND PERFORMANCE MAPPING OF K-LUT BASED FPGAS
FPGA circuits are increasingly used in many fields: for rapid prototyping of new products (including fast ASIC implementation), for logic emulation, for producing a small number of a device, or if a device should be reconfigurable in use (reconfigurable computing). Determining if an arbitrary, given wide, function can be implemented by a programmable logic block, unfortunately, it is generally, a very difficult problem. This problem is called the Boolean matching problem. This paper introduces a new implemented algorithm able to map, both for area and performance, combinational networks using k-LUT based FPGAs.k-LUT based FPGAs, combinational circuits, performance-driven mapping.
Empowering parallel computing with field programmable gate arrays
After more than 30 years, reconfigurable computing has grown from a concept to a mature field of science and technology. The cornerstone of this evolution is the field programmable gate array, a building block enabling the configuration of a custom hardware architecture. The departure from static von Neumannlike architectures opens the way to eliminate the instruction overhead and to optimize the execution speed and power consumption. FPGAs now live in a growing ecosystem of development tools, enabling software programmers to map algorithms directly onto hardware. Applications abound in many directions, including data centers, IoT, AI, image processing and space exploration. The increasing success of FPGAs is largely due to an improved toolchain with solid high-level synthesis support as well as a better integration with processor and memory systems. On the other hand, long compile times and complex design exploration remain areas for improvement. In this paper we address the evolution of FPGAs towards advanced multi-functional accelerators, discuss different programming models and their HLS language implementations, as well as high-performance tuning of FPGAs integrated into a heterogeneous platform. We pinpoint fallacies and pitfalls, and identify opportunities for language enhancements and architectural refinements
Transformations of High-Level Synthesis Codes for High-Performance Computing
Specialized hardware architectures promise a major step in performance and
energy efficiency over the traditional load/store devices currently employed in
large scale computing systems. The adoption of high-level synthesis (HLS) from
languages such as C/C++ and OpenCL has greatly increased programmer
productivity when designing for such platforms. While this has enabled a wider
audience to target specialized hardware, the optimization principles known from
traditional software design are no longer sufficient to implement
high-performance codes. Fast and efficient codes for reconfigurable platforms
are thus still challenging to design. To alleviate this, we present a set of
optimizing transformations for HLS, targeting scalable and efficient
architectures for high-performance computing (HPC) applications. Our work
provides a toolbox for developers, where we systematically identify classes of
transformations, the characteristics of their effect on the HLS code and the
resulting hardware (e.g., increases data reuse or resource consumption), and
the objectives that each transformation can target (e.g., resolve interface
contention, or increase parallelism). We show how these can be used to
efficiently exploit pipelining, on-chip distributed fast memory, and on-chip
streaming dataflow, allowing for massively parallel architectures. To quantify
the effect of our transformations, we use them to optimize a set of
throughput-oriented FPGA kernels, demonstrating that our enhancements are
sufficient to scale up parallelism within the hardware constraints. With the
transformations covered, we hope to establish a common framework for
performance engineers, compiler developers, and hardware developers, to tap
into the performance potential offered by specialized hardware architectures
using HLS
Watermarking FPGA Bitfile for Intellectual Property Protection
Intellectual property protection (IPP) of hardware designs is the most important requirement for many Field Programmable Gate Array (FPGA) intellectual property (IP) vendors. Digital watermarking has become an innovative technology for IPP in recent years. Existing watermarking techniques have successfully embedded watermark into IP cores. However, many of these techniques share two specific weaknesses: 1) They have extra overhead, and are likely to degrade performance of design; 2) vulnerability to removing attacks. We propose a novel watermarking technique to watermark FPGA bitfile for addressing these weaknesses. Experimental results and analysis show that the proposed technique incurs zero overhead and it is robust against removing attacks
Technology Mapping for Circuit Optimization Using Content-Addressable Memory
The growing complexity of Field Programmable Gate Arrays (FPGA's) is leading to architectures with high input cardinality look-up tables (LUT's). This thesis describes a methodology for area-minimizing technology mapping for combinational logic, specifically designed for such FPGA architectures. This methodology, called LURU, leverages the parallel search capabilities of Content-Addressable Memories (CAM's) to outperform traditional mapping algorithms in both execution time and quality of results. The LURU algorithm is fundamentally different from other techniques for technology mapping in that LURU uses textual string representations of circuit topology in order to efficiently store and search for circuit patterns in a CAM. A circuit is mapped to the target LUT technology using both exact and inexact string matching techniques. Common subcircuit expressions (CSE's) are also identified and used for architectural optimization---a small set of CSE's is shown to effectively cover an average of 96% of the test circuits. LURU was tested with the ISCAS'85 suite of combinational benchmark circuits and compared with the mapping algorithms FlowMap and CutMap. The area reduction shown by LURU is, on average, 20% better compared to FlowMap and CutMap. The asymptotic runtime complexity of LURU is shown to be better than that of both FlowMap and CutMap
CMOL: Second Life for Silicon?
This report is a brief review of the recent work on architectures for the
prospective hybrid CMOS/nanowire/ nanodevice ("CMOL") circuits including
digital memories, reconfigurable Boolean-logic circuits, and mixed-signal
neuromorphic networks. The basic idea of CMOL circuits is to combine the
advantages of CMOS technology (including its flexibility and high fabrication
yield) with the extremely high potential density of molecular-scale
two-terminal nanodevices. Relatively large critical dimensions of CMOS
components and the "bottom-up" approach to nanodevice fabrication may keep CMOL
fabrication costs at affordable level. At the same time, the density of active
devices in CMOL circuits may be as high as 1012 cm2 and that they may provide
an unparalleled information processing performance, up to 1020 operations per
cm2 per second, at manageable power consumption.Comment: Submitted on behalf of TIMA Editions
(http://irevues.inist.fr/tima-editions
An Intermediate Language and Estimator for Automated Design Space Exploration on FPGAs
We present the TyTra-IR, a new intermediate language intended as a
compilation target for high-level language compilers and a front-end for HDL
code generators. We develop the requirements of this new language based on the
design-space of FPGAs that it should be able to express and the
estimation-space in which each configuration from the design-space should be
mappable in an automated design flow. We use a simple kernel to illustrate
multiple configurations using the semantics of TyTra-IR. The key novelty of
this work is the cost model for resource-costs and throughput for different
configurations of interest for a particular kernel. Through the realistic
example of a Successive Over-Relaxation kernel implemented both in TyTra-IR and
HDL, we demonstrate both the expressiveness of the IR and the accuracy of our
cost model.Comment: Pre-print and extended version of poster paper accepted at
international symposium on Highly Efficient Accelerators and Reconfigurable
Technologies (HEART2015) Boston, MA, USA, June 1-2, 201
- …