661 research outputs found
A Synthesizable single-cycle multiply-accumulator
The multiplication and multiply-accumulate operations are expensive to implement in hardware for Digital Signal Processing, video, and graphics applications. A standard multiply-accumulator has three inputs and a single output that is equal to the product of two of its inputs added to the third input. For some applications it is desirable for a multiply-accumulator to have two outputs: one that is the product of the first two inputs, and a second that is the multiply-accumulate result. The goal of this thesis is to investigate algorithms and architectures used to design multipliers and multiply-accumulators, and to create a multiply-accumulator that computes both outputs in a single clock cycle. In high-speed designs, the most time-consuming operations are often pipelined to meet the system timing requirements. If the multiply-accumulate computation can be reduced to a single-cycle operation, overall processor performance can be improved for many applications. A multiply-accumulator with two outputs can be created using a combination of standard multiply, add, or multiply-accumulate components. Of these combinations, a multiplier paired with a multiply-accumulator produces the outputs in the most time-efficient manner, whereas a multiplier paired with an adder results in a smaller design with a larger worst-case delay. Therefore, the goal is to create a multiply-accumulator that is comparable in speed to, but requires less area than, a design using an industry-standard multiplier and multiply-accumulator.
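The two-output behavior described in this abstract can be illustrated with a small behavioral model. This is only a sketch under stated assumptions, not the thesis's design: the function name `dual_output_mac`, the `width` parameter, and the choice of unsigned operands are all hypothetical, and the model says nothing about area or delay.

```python
def dual_output_mac(a, b, c, width=16):
    """Behavioral model of a two-output multiply-accumulator (hypothetical).

    Assumes unsigned operands: a and b are truncated to `width` bits and
    c to 2 * width bits. Returns (product, mac) where
    product = a * b and mac = a * b + c.
    """
    operand_mask = (1 << width) - 1
    addend_mask = (1 << (2 * width)) - 1
    a &= operand_mask
    b &= operand_mask
    c &= addend_mask

    product = a * b      # first output: plain product
    mac = product + c    # second output: multiply-accumulate result
    return product, mac


if __name__ == "__main__":
    p, m = dual_output_mac(3, 4, 5)
    print(p, m)
```

In hardware, the multiplier-plus-adder arrangement computes `mac` from `product` in series, while the multiplier-plus-multiply-accumulator arrangement computes the two outputs in parallel units; the model above captures only the shared input/output behavior.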
Total delay optimization for column reduction multipliers considering non-uniform arrival times to the final adder
Column Reduction Multiplier techniques provide the fastest multiplier designs and involve three steps. First, a partial product array of terms is formed by logically ANDing each bit of the multiplier with each bit of the multiplicand. Second, adders or counters are used to reduce the number of terms in each bit column to a final two. This activity is commonly described as column reduction and occurs in multiple stages. Finally, some form of carry propagate adder (CPA) is applied to the final two terms to sum them and produce the final product of the multiplication. Since forming the partial products in the first step is simply forming an array of the logical ANDs of two bits, there is little opportunity for delay improvement in that step. Much work has been done on optimizing the reduction stages in the second step. All of the reduction approaches of the second step result in non-uniform arrival times at the inputs of the final carry propagate adder. Carry propagate adders, however, have been designed on the assumption that all input bits have the same arrival time. It is not evident that the non-uniform arrival times from the columns impact the performance of the multiplier. A thorough analysis of the several column reduction methods and the impact of carry propagate adder designs, taken together with the column reduction design step, to provide the fastest possible final results for an array of multiplier widths has not been undertaken. This dissertation investigates the design impact of three carry propagate adders, with different performance attributes, on the final delay results for four column reduction multipliers and suggests general ways to optimize the total delay for the multipliers. Electrical and Computer Engineerin
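The three steps described in this abstract can be sketched as a bit-level model. This is a minimal illustration under stated assumptions: it uses a simple greedy reduction with full adders only, which is not necessarily any of the four reduction methods the dissertation studies, and a ripple-carry adder stands in for the final CPA. All function names are hypothetical, operands are assumed unsigned and smaller than `2**width`, and no timing is modeled.

```python
def full_adder(x, y, z):
    """Add three bits; return (sum bit, carry bit)."""
    return x ^ y ^ z, (x & y) | (x & z) | (y & z)


def partial_product_columns(a, b, width):
    """Step 1: AND every bit of a with every bit of b, grouped by column weight."""
    columns = [[] for _ in range(2 * width)]
    for i in range(width):
        for j in range(width):
            columns[i + j].append(((a >> i) & 1) & ((b >> j) & 1))
    return columns


def reduce_columns(columns):
    """Step 2: reduce in stages until every column holds at most two bits."""
    stages = 0
    while any(len(col) > 2 for col in columns):
        nxt = [[] for _ in range(len(columns) + 1)]
        for k, col in enumerate(columns):
            grouped = len(col) // 3 * 3
            for t in range(0, grouped, 3):
                s, c = full_adder(col[t], col[t + 1], col[t + 2])
                nxt[k].append(s)        # sum stays in the same column
                nxt[k + 1].append(c)    # carry moves to the next column
            nxt[k].extend(col[grouped:])  # leftover bits pass through
        columns = nxt
        stages += 1
    return columns, stages


def ripple_carry_add(columns):
    """Step 3: sum the final two rows with a ripple-carry adder (stand-in CPA)."""
    result, carry = 0, 0
    for k, col in enumerate(columns):
        bits = col + [0] * (2 - len(col))
        s, carry = full_adder(bits[0], bits[1], carry)
        result |= s << k
    return result | (carry << len(columns))


def column_reduction_multiply(a, b, width=8):
    columns = partial_product_columns(a, b, width)
    columns, _ = reduce_columns(columns)
    return ripple_carry_add(columns)
```

Because carries produced in one stage are only consumed in the next, bits in different columns of the final two rows pass through different numbers of adders, which is the source of the non-uniform arrival times the abstract refers to.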