12 research outputs found
Hardware division by small integer constants
International audienceThis article studies the design of custom circuits for division by a small positive constant. Such circuits can be useful to specific FPGA and ASIC applications. The first problem studied is the Euclidean division of an unsigned integer by a constant, computing a quotient and a remainder. Several new solutions are proposed and compared against the state of the art. As the proposed solutions use small look-up tables, they match well the hardware resources of an FPGA. The article then studies whether the division by the product of two constants is better implemented as two successive dividers or as one atomic divider. It also considers the case when only a quotient or only a remainder are needed. Finally, it addresses the correct rounding of the division of a floating-point number by a small integer constant. All these solutions, and the previous state of the art, are compared in terms of timing, area, and area-timing product. In general, the relevance domains of the various techniques are very different on FPGA and on ASIC
Geometric augmented product codes
We propose a new simple decomposable code construction technique that generates codes with the full information rate for all of the minimum Hamming distance-4 binary linear block codes of even length greater than or equal to eight. Additionally, some optimal Hamming distance-8 and higher distance codes are obtained with our proposed scheme. A generic trellis structure for the proposed codes was also designed. It is shown that our trellis structures provide lower decoding complexity in comparison to the trellises of some other well-known block codes
Semi- and Fully-Random Access LUTs for Smooth Functions
International audienceLook-Up Table (LUT) implementation of complicated functions often offers lower latency compared to algebraic implementations at the expense of significant area penalty. If the function is smooth, MultiPartite table method (MP) can circumvent the area problem by breaking up the implementation into multiple smaller LUTs. However, even some of these smaller LUTs may be big in high accuracy MP implementations. Lossless LUT compression can be applied to these LUTs to further improve area and even timing in some cases. The state-of-the-art in the literature decomposes the Table of Initial Values (TIV) of MP into a table of pivots and tables of differences from the pivots. Our technique instead places differences of consecutive elements in the difference tables and result in a smaller range of differences that fit in fewer bits. Constraining the difference of consecutive input values, hence semi-random access, allows us to further optimize designs. We also propose variants of our techniques with variable length coding. We implemented Verilog generators of MP for sine and exponential using conventional LUT as well as different versions of the state-of-the-art and our technique. We synthesized the generated designs on FPGA and found that our techniques produce up to 29% improvement in area, 11% improvement in timing, and 26% improvement in area-time product over the state-of-the-art
VLSI-SoC: At the Crossroads of Emerging Trends: 21st IFIP WG 10.5/IEEE International Conference on Very Large Scale Integration, VLSI-SoC 2013, Istanbul, Turkey, October 6–9, 2013
International audienceBook Front Matter of AICT 46
Recommended from our members
PyTorch and CEDR: Enabling Deployment of Machine Learning Models on Heterogeneous Computing Systems
The PyTorch programming interface enables efficient deployment of machine learning models, leveraging the parallelism offered by GPU architectures. In this study, we present the integration of the PyTorch framework with a compiler and runtime ecosystem. Our aim is to demonstrate the ability to deploy PyTorch-based models on FPGA-based SoC platforms, without requiring users to possess prior FPGA-based design experience. The proposed PyTorch model transformation approach expands the range of hardware architectures that PyTorch developers can target, enabling them to take advantage of the energy-efficient execution provided by heterogeneous computing systems. Our experiments involve compiling and executing real-life applications on heterogeneous SoC configurations emulated on the Xilinx Zynq Ultrascale+ ZCU102 system. We showcase our ability to deploy three distinct PyTorch applications, encompassing object detection, visual geometry group (VGG), and speech classification, using the integrated compiler and runtime system without loss of model accuracy. Furthermore, we extend our analysis by evaluating dynamically arriving workload scenarios, consisting of a mix of PyTorch models and non-PyTorch-based applications. Through these experiments, we vary the hardware composition and scheduling heuristics. Our findings indicate that when PyTorch-based applications coexist with unrelated applications, our integrated scheduler fairly dispatches tasks to the FPGA platform's accelerator and CPU cores, without compromising the target throughput for each application.Defense Advanced Research Projects AgencyImmediate accessThis item from the UA Faculty Publications collection is made available by the University of Arizona with support from the University of Arizona Libraries. If you have questions, please contact us at [email protected]
VLSI-SoC: Opportunities and Challenges Beyond the Internet of Things: 25th IFIP WG 10.5/IEEE International Conference on Very Large Scale Integration, VLSI-SoC 2017, Abu Dhabi, United Arab Emirates, October 23–25, 2017, Revised and Extended Selected Papers
International audienceBook Front Matter of AICT 50