434 research outputs found
Maximizing resource utilization by slicing of superscalar architecture
Superscalar architectural techniques increase instruction throughput from one instruction per cycle to more than one instruction per cycle. Modern processors make use of several processing resources to achieve this kind of throughput. Control units perform various functions to minimize stalls and to ensure a continuous feed of instructions to execution units. It is vital to ensure that instructions ready for execution do not encounter a bottleneck in the execution stage. This thesis proposes a dynamic scheme, called block slicing, to increase the efficiency of the execution stage. Implementing this concept in a wide, superscalar pipelined architecture introduces minimal additional hardware and delay in the pipeline. The hardware required to implement the proposed scheme is designed and assessed in terms of cost and delay. Performance measures of speed-up, throughput and efficiency have been evaluated for the resulting pipeline and analyzed.
Microarchitecture optimization for timing and layout
In recent years the drive to produce more complex integrated circuits while spending less design time has driven the demand for design automation tools. The search for design automation methods has resulted in the design of numerous behavioral synthesis and logic synthesis tools. This report describes a system that fills the gap between traditional behavioral synthesis and logic synthesis tools. Techniques are introduced for improving the microarchitecture structure and using feedback from lower-level optimization tools to guide design optimizations while attempting to meet user-specified area and time constraints. These techniques include the capability for mixing layout styles, such as custom layout for random-logic components and bit-slicing for regularly structured components. In this manner the entire design, control logic and datapath, can be optimized at the same time. Further, this paper presents a new methodology for microarchitecture-level optimization that greatly reduces the amount of technology-specific knowledge necessary to perform the optimizations.
Synthesis from VHDL : Rockwell-counter case study
This report describes the design process and synthesis tools used in the UC Irvine CADLAB design environment to design a representative benchmark. The steps taken and rationale used in each stage of the design process are discussed. The benchmark is initially described using a VHDL behavioral description; results produced by each intermediate tool are presented, showing the system flow and integration of tools. The final silicon layout is performed in 3 micron CMOS technology
A C++-embedded Domain-Specific Language for programming the MORA soft processor array
MORA is a novel platform for high-level FPGA programming of streaming vector and matrix operations, aimed at multimedia applications. It consists of a soft array of pipelined low-complexity SIMD processors-in-memory (PIM). We present a Domain-Specific Language (DSL) for high-level programming of the MORA soft processor array. The DSL is embedded in C++, providing designers with a familiar language framework and the ability to compile designs using a standard compiler for functional testing before generating the FPGA bitstream using the MORA toolchain. The paper discusses the MORA-C++ DSL and the compilation route into the assembly for the MORA machine, and provides examples to illustrate the programming model and performance.
High Speed Low Power Cyclic Redundancy Check-32 using FPGA
Cyclic Redundancy Check (CRC) is an error-detection method used to verify data integrity. CRC takes a block of a message's bits and divides it by a binary number called the polynomial; the remainder of this division is the checksum that is appended to the message. On the receiver side, the same division is performed and the resulting remainder is compared with the transmitted checksum; if the two do not differ, no errors were detected. This paper aims to design the CRC32 used in the Ethernet frame on a Virtex-7 Field Programmable Gate Array (FPGA). Lookup tables and the slicing-by-16 algorithm are used together to calculate the CRC32 in parallel. Xilinx ISE is used as the IDE and synthesis tool, and I-Sim is used for simulation. The design achieves a processing time of 1.250 ns and a throughput of 102.4 Gbps; furthermore, both the power consumption and the device utilization are very low.
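The division and table-lookup scheme described in this abstract can be illustrated in software. The following is a minimal Python sketch, not the paper's FPGA design: it assumes the standard reflected Ethernet CRC-32 parameters (polynomial 0x04C11DB7, initial value and final XOR of 0xFFFFFFFF), and the function names `crc32_bitwise`, `make_tables` and `crc32_sliced` are hypothetical. The bitwise version performs the polynomial division one bit at a time; the sliced version consumes 16 bytes per step using 16 lookup tables, which is the general idea behind slicing-by-16.

```python
import zlib  # used only to cross-check the sketch against a known implementation

# Reflected form of the Ethernet CRC-32 polynomial 0x04C11DB7 (assumed parameters).
POLY = 0xEDB88320


def crc32_bitwise(data: bytes) -> int:
    """Bit-at-a-time polynomial division over GF(2)."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (POLY if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def make_tables(n: int) -> list:
    """Build n lookup tables; table k advances a byte through k extra zero bytes."""
    t0 = []
    for i in range(256):
        c = i
        for _ in range(8):
            c = (c >> 1) ^ (POLY if c & 1 else 0)
        t0.append(c)
    tables = [t0]
    for k in range(1, n):
        prev = tables[k - 1]
        tables.append([(prev[i] >> 8) ^ t0[prev[i] & 0xFF] for i in range(256)])
    return tables


SLICE = 16
TABLES = make_tables(SLICE)


def crc32_sliced(data: bytes) -> int:
    """Process 16 bytes per iteration with 16 independent table lookups."""
    crc = 0xFFFFFFFF
    pos = 0
    while len(data) - pos >= SLICE:
        block = bytearray(data[pos:pos + SLICE])
        # Fold the running CRC into the first four bytes (little-endian order).
        for j in range(4):
            block[j] ^= (crc >> (8 * j)) & 0xFF
        crc = 0
        for j in range(SLICE):
            # The lookups are independent of each other, so hardware can do them in parallel.
            crc ^= TABLES[SLICE - 1 - j][block[j]]
        pos += SLICE
    # Handle any remaining bytes one at a time with the first table.
    for byte in data[pos:]:
        crc = (crc >> 8) ^ TABLES[0][(crc ^ byte) & 0xFF]
    return crc ^ 0xFFFFFFFF


frame = bytes(range(64))  # stand-in payload, not a real Ethernet frame
assert crc32_sliced(frame) == crc32_bitwise(frame) == zlib.crc32(frame)
```

A receiver would run the same function over the received payload and compare the result with the transmitted checksum; a mismatch indicates a detected error.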
Efficient Fault Injection based on Dynamic HDL Slicing Technique
This work proposes a fault injection methodology where Hardware Description
Language (HDL) code slicing is exploited to prune fault injection locations,
thus enabling more efficient campaigns for safety mechanisms evaluation. In
particular, the dynamic HDL slicing technique provides for a highly collapsed
critical fault list and allows avoiding injections at redundant locations or
time-steps. Experimental results show that the proposed methodology integrated
into a commercial tool flow doubles the simulation speed compared to the
state-of-the-art industrial-grade EDA tool flows.
Comment: arXiv admin note: substantial text overlap with arXiv:2001.0998