322 research outputs found
Design and analysis of multiple read port techniques using bank division with XOR method for multi-ported-memory on FPGA platform
The multiple read and write operations are performed simultaneously by multi-ported memories and are used in advanced digital design applications on reprogrammable field-programmable gate arrays (FPGAs) to achieve higher bandwidth. The Memory modules are configured by block RAM (BRAMs), which utilizes more area and power on FPGA. In this manuscript, the techniques to increase the read ports for multi-ported memory modules are designed using the bank division with XOR (BDX) approach. The read port techniques like two read-one write (2R1W) memory, hybrid mode approach either 2R1W or 4R memory, and hierarchical BDX (HBDX) Approach using 2R1W/4R memory are designed on FPGA platform. The Proposed work utilizes only slices and look-up table (LUT's) rather than BRAMs while designing the memory modules on FPGA, which reduces the computational complexity and improves the system performance. The experimental results are analyzed on Artix-7 FPGA. The performance parameters like slices, LUT utilization, maximum frequency (Fmax), and hardware efficiency are analyzed by concerning different memory depths. The 4R1W memory design using the HBDX approach utilizes 4% slices and works at 449.697 MHz operating frequency on Artix-7 FPGA. The proposed work provides a better platform to choose the proper read port technique to design an efficient modular multiport memory architecture
A Lyra2 FPGA Core for Lyra2REv2-Based Cryptocurrencies
Lyra2REv2 is a hashing algorithm that consists of a chain of individual
hashing algorithms and it is used as a proof-of-work function in several
cryptocurrencies that aim to be ASIC-resistant. The most crucial hashing
algorithm in the Lyra2REv2 chain is a specific instance of the general Lyra2
algorithm. In this work we present the first FPGA implementation of the
aforementioned instance of Lyra2 and we explain how several properties of the
algorithm can be exploited in order to optimize the design.Comment: 5 pages, to be presented at the IEEE International Symposium on
Circuits and Systems (ISCAS) 201
Design Space Exploration of Algorithmic Multi-Port Memories in High-Performance Application-Specific Accelerators
Memory load/store instructions consume an important part in execution time
and energy consumption in domain-specific accelerators. For designing highly
parallel systems, available parallelism at each granularity is extracted from
the workloads. The maximal use of parallelism at each granularity in these
high-performance designs requires the utilization of multi-port memories.
Currently, true multiport designs are less popular because there is no inherent
EDA support for multiport memory beyond 2-ports, utilizing more ports requires
circuit-level implementation and hence a high design time. In this work, we
present a framework for Design Space Exploration of Algorithmic Multi-Port
Memories (AMM) in ASICs. We study different AMM designs in the literature,
discuss how we incorporate them in the Pre-RTL Aladdin Framework with different
memory depth, port configurations and banking structures. From our analysis on
selected applications from the MachSuite (accelerator benchmark suite), we
understand and quantify the potential use of AMMs (as true multiport memories)
for high performance in applications with low spatial locality in memory access
patterns
A Scalable Unsegmented Multiport Memory for FPGA-Based Systems
On-chip multiport memory cores are crucial primitives for many modern high-performance reconfigurable architectures and multicore systems. Previous approaches for scaling memory cores come at the cost of operating frequency, communication overhead, and logic resources without increasing the storage capacity of the memory. In this paper, we present two approaches for designing multiport memory cores that are suitable for reconfigurable accelerators with substantial on-chip memory or complex communication. Our design approaches tackle these challenges by banking RAM blocks and utilizing interconnect networks which allows scaling without sacrificing logic resources. With banking, memory congestion is unavoidable and we evaluate our multiport memory cores under different memory access patterns to gain insights about different design trade-offs. We demonstrate our implementation with up to 256 memory ports using a Xilinx Virtex-7 FPGA. Our experimental results report high throughput memories with resource usage that scales with the number of ports
A Standalone FPGA-based Miner for Lyra2REv2 Cryptocurrencies
Lyra2REv2 is a hashing algorithm that consists of a chain of individual
hashing algorithms, and it is used as a proof-of-work function in several
cryptocurrencies. The most crucial and exotic hashing algorithm in the
Lyra2REv2 chain is a specific instance of the general Lyra2 algorithm. This
work presents the first hardware implementation of the specific instance of
Lyra2 that is used in Lyra2REv2. Several properties of the aforementioned
algorithm are exploited in order to optimize the design. In addition, an
FPGA-based hardware implementation of a standalone miner for Lyra2REv2 on a
Xilinx Multi-Processor System on Chip is presented. The proposed Lyra2REv2
miner is shown to be significantly more energy efficient than both a GPU and a
commercially available FPGA-based miner. Finally, we also explain how the
simplified Lyra2 and Lyra2REv2 architectures can be modified with minimal
effort to also support the recent Lyra2REv3 chained hashing algorithm.Comment: 13 pages, accepted for publication in IEEE Trans. Circuits Syst. I.
arXiv admin note: substantial text overlap with arXiv:1807.0576
Design of an FPGA-based parallel SIMD machine for power flow analysis
Power flow analysis consists of computationally intensive calculations on large matrices, consumes several hours of computational time, and has shown the need for the implementation of application-specific parallel machines. The potential of Single-Instruction stream Multiple-Data stream (SIMD) parallel architectures for efficient operations on large matrices has been demonstrated as seen in the case of many existing supercomputers. The unsuitability of existing parallel machines for low-cost power system applications, their long design cycles, and the difficulty in using them show the need for application-specific SIMI) machines. Advances in VLSI technology and Field-Programmable Gate-Arrays (FPGAs) enable the implementation of Custom Computing Machines (CCMs) which can yield better performance for specific applications. The advent of SoftCore processors made it possible to integrate reconfigurable logic as a slave to a peripheral bus and has demonstrated the ability in the rapid prototyping of complete systems on programmable chips. This thesis aims at designing and implementing an FPGA-based SIMI) machine for power flow analysis. It presents the architecture of an SIMI) machine that consists of an array of processing elements with mesh interconnection and a Soft-Core processor; the latter is used as the host. The FPGAbased SIMI) machine is implemented on the Annapolis Microsystems Wildstar-II board that contains multiple Virtex-II FPGAs. The Soft-Core processor used is the Xilinx Microblaze and the application targeted is matrix multiplication
Minimising DSP block usage through multi-pumping
Resource sharing in the mapping of an algorithm to an architecture allows the same resource to be scheduled for different uses in different cycles, generally at the cost of increased schedule length. Multi-pumping is a method whereby a resource is clocked at a frequency that is a multiple of the surrounding circuit, thereby offering multiple executions per global clock, and therefore sharing in the same clock cycle. This concept maps well to FPGA architectures, where hard macro blocks are typically capable of running at higher frequencies than standard logic. While this technique has been demonstrated for multipliers, modern DSP blocks are more complex with multiple computational nodes. In this paper, we apply multi-pumping to minimise DSP block usage, while taking advantage of the multiple nodes they support. The proposed approach uses, on average, 39% fewer DSP blocks, at a cost of 19% more LUTs and 7% more registers
ASC: A stream compiler for computing with FPGAs
Published versio
Soft Error Resistant Design of the AES Cipher Using SRAM-based FPGA
This thesis presents a new architecture for the reliable implementation of the symmetric-key algorithm Advanced Encryption Standard (AES) in Field Programmable Gate Arrays (FPGAs). Since FPGAs are prone to soft errors caused by radiation, and AES is highly sensitive to errors, reliable architectures are of significant concern. Energetic particles hitting a device can flip bits in FPGA SRAM cells controlling all aspects of the implementation. Unlike previous research, heterogeneous error detection techniques based on properties of the circuit and functionality are used to provide adequate reliability at the lowest possible cost. The use of dual ported block memory for SubBytes, duplication for the control circuitry, and a new enhanced parity technique for MixColumns is proposed. Previous parity techniques cover single errors in datapath registers, however, soft errors can occur in the control circuitry as well as in SRAM cells forming the combinational logic and routing. In this research, propagation of single errors is investigated in the routed netlist. Weaknesses of the previous parity techniques are identified. Architectural redesign at the register-transfer level is introduced to resolve undetected single errors in both the routing and the combinational logic.
Reliability of the AES implementation is not only a critical issue in large scale FPGA-based systems but also at both higher altitudes and in space applications where there are a larger number of energetic particles. Thus, this research is important for providing efficient soft error resistant design in many current and future secure applications
- âŠ