LOW POWER COMPRESSOR BASED MAC ARCHITECTURE FOR DSP APPLICATIONS by Parvathi, Yeturu. & Penchalaiah, Siddu.
Yeturu Parvathi* et al.
(IJITR) INTERNATIONAL JOURNAL OF INNOVATIVE TECHNOLOGY AND RESEARCH
Volume No.6, Issue No.4, June - July 2018, 8428-8432.
2320 –5547 @ 2013-2018 http://www.ijitr.com All rights Reserved. Page | 8428
Low Power Compressor Based MAC
Architecture for DSP Applications
YETURU. PARVATHI
Pursuing M.Tech (VLSI&ESD) from SKR College
of Engineering & Technology, Manubolu, SPSR
Nellore.AP.
SIDDU. PENCHALAIAH
M. Tech, Assistant Professor in Department of
ECE, SKR College of Engineering & Technology,
Manubolu, SPSR Nellore.AP.
Abstract: This paper presents a low power compressor based Multiply-Accumulate (MAC) architecture for DSP
applications. In VLSI, highly computed arithmetic cells, including adders and multipliers, are the most
abundantly used components. Efficient implementations of arithmetic logic units, floating point units and other
dedicated functional components are used in most microprocessors and digital signal processors
(DSPs). Hence, in this brief, a compressor circuit has been designed for low power applications
and the impact of datapath circuits has also been demonstrated. The proposed low power compressor
architecture was applied to a MAC unit and compared against conventional compressor based MAC units, and
it was observed that the proposed architecture reduces leakage power by a significant amount.
1.INTRODUCTION
Over the last decade the semiconductor
industry has experienced exponential
growth in the integration of digital multimedia
applications into portable devices. The
major concern of portable devices is battery
life, which impacts real-time processing
applications and the dynamic range of their
data signals for additional features.
It is high time to explore the challenging criteria
of these emerging low power, low area and high
performance digital signal processing chips [1].
In digital VLSI circuits, computation is the
critical part and it determines the power consumption and
operating speed of the designs. The arithmetic circuits
used for computation include adders and multipliers,
which are the most abundantly used components.
Digital signal processors performing filtering,
convolution and so forth depend on the efficient
implementation of these adder, multiplier and MAC
arithmetic units.
As the criticality of multipliers determines the power
consumption and operating speed of digital
circuits, there is potential at the circuit design
level to optimize the power and delay constraints.
Many researchers in the past have developed and
demonstrated several architectures to improve the
efficiency of multipliers. Booth encoders and
their modifications were developed to reduce the
delay by reducing the number of rows in the
partial product generation stage. Compressors were
used in the partial product reduction stage to increase
the speed of the multiplication operation [3 - 5]. A
complementary pass-transistor logic based adiabatic 8-bit
multiplier is designed in [6] to reduce the delay and
power consumption of the multiplier architecture.
Vedic sutras were also used in the multiplier
architecture to increase the speed of MAC architectures
[7]. To reduce the delay further in MAC
architectures, the carry propagate addition stage of the
multiplier and the adder stage of the accumulator are merged
using compressors in this work.
A low power compressor architecture is proposed in this brief
to reduce the power consumption of the MAC
architecture, which contains a large number of
compressors. The impact of circuit design
level (datapath) optimizations is addressed at
the MAC level for DSP applications. In the MAC,
the carry propagate additions
involved in the multiplier and accumulate stages are
additionally merged to exploit and increase the number
of compressors in the MAC architecture. Designs were
benchmarked in the ASIC and FPGA domains as per
the standard design methodology. The remaining sections
of the paper are organized as follows. Compressor
and MAC architectures are discussed and the limitations
of existing architectures are described in Section 2.
Results are evaluated in Section 3 and the paper is
concluded in Section 4. The last section gives the
references.
2.ARCHITECTURE
A. Compressor
Compressors are digital circuits that can
add five, six or seven bits at a time and are
hence called column compressors. A typical
five-input compressor is illustrated in this brief. It takes
4 regular inputs and 1 intermediate carry-in input
and generates 1 sum bit, 1 carry-out bit and another
intermediate carry bit. The intermediate carry bits are
the carry-in from the previous-stage compressor and
the carry-out to the next-stage compressor (called
horizontal carry propagation). The carry-out bit (also
called the vertical carry) is the final carry generated
along with the sum bit.
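The input/output behaviour described above can be sketched as a small behavioural model. This is only an illustration in Python, not the authors' Verilog: the function names (`full_adder`, `compressor_4_2`, `check_all_inputs`) and the two-full-adder decomposition are assumptions chosen for clarity. The check confirms that the three outputs always encode the number of ones among the five inputs.

```python
from itertools import product


def full_adder(a, b, c):
    """Return (sum, carry) for three 1-bit inputs."""
    s = a ^ b ^ c
    carry = (a & b) | (b & c) | (a & c)
    return s, carry


def compressor_4_2(x1, x2, x3, x4, cin):
    """Behavioural model of a five-input compressor.

    Returns (sum, carry, cout):
      cout  - horizontal carry handed to the next-stage compressor
      carry - vertical carry generated along with the sum bit
    """
    t, cout = full_adder(x1, x2, x3)   # cout does not depend on cin
    s, carry = full_adder(t, x4, cin)
    return s, carry, cout


def check_all_inputs():
    """Exhaustively check x1+x2+x3+x4+cin == sum + 2*(carry + cout)."""
    for bits in product((0, 1), repeat=5):
        s, carry, cout = compressor_4_2(*bits)
        if sum(bits) != s + 2 * (carry + cout):
            return False
    return True
```

Because `cout` is computed without `cin`, the horizontal carry cannot ripple through a whole row of compressors, which is the property that makes such cells attractive for column compression.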
Since compressors form the basic and critical
components of multipliers and large-input adders,
several compressor architectures were developed
in the past to address different constraints. Some of
the compressor architectures described in the past
are shown in Fig. 1 & Fig. 2 [8, 9].
Fig. 1: Full Adder based Compressor [8]
The compressor architecture shown in Fig. 1 is built
using full adders. This architecture has only
two cells and therefore minimum interconnect, but
each cell needs to generate both a sum and a carry
path, and one of these paths depends on the other.
This requires larger drive strength to drive the
chain of compressors and hence the power
consumption is higher. The higher drive
strength does, however, reduce the delay significantly.
Fig. 2 shows the compressor architecture built
using lower fan-in gates. Implementing logic
with lower fan-in gates leads to a larger number of
interconnects, which has a significant impact on
glitch power & delay. In lower technology nodes
the interconnect power dominates the gate
power, hence the architecture of [9] leads to high
power consumption.
Fig. 2: David Harris Compressor cell [9]
Fig. 3 shows the proposed compressor architecture.
The proposed compressor architecture is built with
larger fan-in gates and uses separate logic
for the sum and carry paths. In the sum path, four
2-input XOR cells are replaced by two 3-input XOR
cells, and in the carry path two 2-input AND cells &
one 2-input OR cell are replaced by one 6-input
AND-OR (AO222) logic cell. Larger fan-in gates
cover a larger part of the logic and help in
minimizing the number of gates required for
implementation. Fewer gates lead to smaller area
and minimum interconnect delay. Thus the
proposed compressor architecture helps in reducing
the power consumption.
Fig. 3: Proposed compressor Cell
The proposed compressor architecture therefore enables
design-specific and constraint-specific
architectures and can be used for low power
applications. The optimizations provided by the
proposed architecture are:
1. Minimum interconnect in the sum path, which reduces
the interconnect delay and associated glitches
2. Reduced power consumption with minimum
interconnects
3. Independent carry logic to reduce the
horizontal carry delay
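A gate-level view of a compressor built from 3-input XOR and AO222 cells can be sketched as follows. The exact wiring of Fig. 3 is not reproduced in the text, so the connections below are an assumption: each carry is written as a three-input majority function, which maps onto one AO222 cell, and the sum path uses two XOR3 cells. The helper names (`xor3`, `ao222`, `compressor_large_fan_in`) are hypothetical.

```python
from itertools import product


def xor3(a, b, c):
    """3-input XOR cell."""
    return a ^ b ^ c


def ao222(a, b, c, d, e, f):
    """6-input AND-OR cell: (a AND b) OR (c AND d) OR (e AND f)."""
    return (a & b) | (c & d) | (e & f)


def compressor_large_fan_in(x1, x2, x3, x4, cin):
    """Assumed wiring of a five-input compressor using XOR3 and AO222 cells."""
    # Sum path: two 3-input XOR cells.
    t = xor3(x1, x2, x3)
    s = xor3(t, x4, cin)
    # Carry path: majority(a, b, c) = ab + bc + ac fits one AO222 cell.
    cout = ao222(x1, x2, x2, x3, x1, x3)    # horizontal carry, independent of cin
    carry = ao222(t, x4, x4, cin, t, cin)   # vertical carry
    return s, carry, cout


def is_valid_compressor():
    """Check the arithmetic identity for all 32 input combinations."""
    return all(
        sum(bits) == s + 2 * (carry + cout)
        for bits in product((0, 1), repeat=5)
        for s, carry, cout in [compressor_large_fan_in(*bits)]
    )
```

In this sketch the horizontal carry is produced directly from `x1`, `x2` and `x3` without passing through the sum path, which corresponds to the independent carry logic listed as the third optimization.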
B. Multiply-Accumulate Unit
The MAC is the basic and most frequently used
component in DSP. It performs filtering,
convolution, etc., and accelerates FIR or FFT
computations [2]. A regular MAC unit contains a
multiplier, adders and registers as shown in Fig. 4,
where the previous output of the MAC unit is
added to the multiplier output and accumulated.
Fig. 4: Regular MAC architecture
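The accumulate behaviour of a regular MAC unit can be illustrated with a short behavioural sketch. The class name `MacUnit`, the default 32-bit accumulator width and the restriction to unsigned operands are assumptions made for this example; they are not taken from the paper.

```python
class MacUnit:
    """Behavioural model of a regular MAC: acc <- acc + a * b."""

    def __init__(self, acc_bits=32):
        self.mask = (1 << acc_bits) - 1   # accumulator register width (assumed)
        self.acc = 0

    def step(self, a, b):
        """Multiply two unsigned operands and add the product to the register."""
        self.acc = (self.acc + a * b) & self.mask
        return self.acc


def fir_output(coeffs, samples):
    """One FIR output sample computed as a sequence of MAC steps."""
    mac = MacUnit()
    for c, x in zip(coeffs, samples):
        mac.step(c, x)
    return mac.acc
```

For example, `fir_output([1, 2, 3], [4, 5, 6])` performs three MAC steps (4, then 14, then 32) and returns 32.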
Multipliers are implemented in three stages,
namely partial product generation, partial product
reduction and carry propagate addition. Regular
architectures use half and full adders in the
partial product stages, but because of their performance
limitations compressor cells were used instead. Some
past architectures reduced the number of
reduction steps in the partial product reduction
stage by introducing Booth encoding in the partial
product generation stage, to reduce the overall delay [3
- 5].
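The three multiplier stages can be sketched at word level as follows. This is a simplified illustration for unsigned operands without Booth encoding; the function names and the row-based reduction order are assumptions, and a real reduction tree works column by column with 1-bit compressor cells.

```python
def partial_products(a, b, n):
    """Stage 1: one shifted copy of a for each set bit of the n-bit operand b."""
    return [(a << i) if (b >> i) & 1 else 0 for i in range(n)]


def carry_save(x, y, z):
    """Word-level 3:2 counter: x + y + z == s + c, no carry propagation."""
    s = x ^ y ^ z
    c = ((x & y) | (y & z) | (x & z)) << 1
    return s, c


def compress_4_to_2(w, x, y, z):
    """Word-level 4:2 compression: four rows in, two rows out."""
    s1, c1 = carry_save(w, x, y)
    return carry_save(s1, c1, z)


def reduce_rows(rows):
    """Stage 2: reduce the partial products to two rows."""
    rows = list(rows)
    while len(rows) > 2:
        if len(rows) >= 4:
            s, c = compress_4_to_2(*rows[:4])
            rows = rows[4:] + [s, c]
        else:
            s, c = carry_save(*rows)
            rows = [s, c]
    return rows


def multiply(a, b, n=8):
    """Stage 3: a single carry propagate addition of the remaining rows."""
    return sum(reduce_rows(partial_products(a, b, n)))
```

Only the final `sum` propagates carries across the full word; every step before it keeps the result in redundant sum/carry form, which is why reduction with compressors is faster than a chain of ordinary adders.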
Using compressors in the multiplier reduces
the number of gates required for implementation, which
in turn reduces the number of interconnects. This
results in reduced interconnect delay and fewer associated
glitches, yielding a low power design.
Thus an efficient multiplier improves the
efficiency of the MAC unit.
A circuit level design tailored to a
particular constraint is more efficient in
ASIC designs. For example, the proposed
low power compressor architecture improves
power efficiency and suits low power
applications. To demonstrate the impact of the
compressor architecture, a MAC unit architecture
that contains a large number of compressors is
chosen from [2].
In [2], the authors used compressors in the partial
product reduction stage of the multiplier and in the
accumulation stage of the MAC unit, where the
carry propagate stage of the multiplier is merged
with the input of the accumulate add stage. Fig. 5
shows the state of the art MAC architecture.
In total, 29 compressors were used to implement
the MAC unit of Fig. 5. Besides compressors,
half and full adders were also required for the
implementation.
Fig. 5: State of the art MAC architecture [2]
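The general idea of merging the accumulate addition into the compressor tree can be sketched as below. This is not the two-cycle architecture of [2]; it is a minimal word-level illustration, with assumed names and unsigned operands, in which the accumulator value is treated as one more row of the reduction so that only one carry propagate addition remains.

```python
def carry_save(x, y, z):
    """Word-level 3:2 counter: x + y + z == s + c, no carry propagation."""
    s = x ^ y ^ z
    c = ((x & y) | (y & z) | (x & z)) << 1
    return s, c


def merged_mac(acc, a, b, n=8):
    """Compute acc + a * b with the accumulator folded into the reduction."""
    rows = [a << i for i in range(n) if (b >> i) & 1]   # partial products
    rows.append(acc)               # accumulator joins the tree as an extra row
    while len(rows) > 2:
        s, c = carry_save(rows.pop(), rows.pop(), rows.pop())
        rows += [s, c]
    return sum(rows)               # the single carry propagate addition
```

Compared with a regular MAC, which needs one carry propagate addition inside the multiplier and a second one in the accumulator, this arrangement trades the second adder for extra compression work, which is why the compressor count of the merged MAC is high.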
Both the conventional and proposed compressor
architectures were applied in the state of the art
MAC architecture, to illustrate the impact of
compressor architectures.
3.RESULTS & DISCUSSIONS
Both the regular and proposed architectures at the
compressor and MAC unit level were designed and
modeled using Verilog HDL. The designs were
functionally verified with the Mentor Graphics
ModelSim simulator using its waveform editor and were
synthesized targeting TSMC's 65nm
technology library using Cadence RTL
Compiler. The designs were also synthesized in the
FPGA domain, targeting a Virtex 7 device.
Results of the compressor and MAC units were
benchmarked as per the standard design
methodology for both ASIC and FPGA domains.
Table 1: Comparison of the synthesis results of
existing and proposed compressor architectures
Table 1 shows the results of the regular and
proposed compressor architectures. It can be
observed that the proposed compressor
architecture is more efficient in all design
parameters than the architecture of [9]. As
mentioned in the architecture section, building the
logic from lower fan-in gates requires more
gates and more interconnects,
due to which the area required is larger and
the dynamic power consumption is also higher.
The larger number of interconnects and the lower fan-in
logic gates have increased the delay and power
consumption of the compressor architecture of
[9].
Having only two cells, the compressor architecture of [8]
has reduced interconnect delay, which is reflected as
reduced overall delay. However, the dependency between the
sum and carry paths of the full adder requires higher
drive strength to drive the signal faster, resulting
in higher power consumption than the proposed
compressor architecture.
As the proposed compressor architecture uses
larger fan-in gates, its transistor stack is
taller, which gives a higher resistance between
the power supply rails and results in reduced leakage
power. Since the proposed architecture generates
the sum and carries simultaneously, it does not
require a higher drive strength.
Table 2 shows the results of the MAC units with the
conventional and proposed compressor
architectures. As in Table 1, the proposed design is
also more efficient at the MAC level. Here
too, the power consumption of the MAC unit
built with the proposed compressor architecture
is reduced by a significant amount. This
suggests that the proposed architecture, designed
specifically for the power constraint, behaves
similarly at the cell level and at the sub-system level. It
also indicates that optimizations at the circuit
design level have an impact at the sub-system
level. From this it can be inferred that
optimizations at the circuit design level can be
applied at any level of hierarchical abstraction.
Further, the proposed architecture can be
generalized to any bit-width and to any level of
abstraction in the design hierarchy.
Table 2: Comparison of the synthesis results of
MAC architectures using existing and
proposed Compressor architectures in ASIC
domain
The circuit level design optimization was also
evaluated in the FPGA domain and the synthesis
results are tabulated in Table 3. In the FPGA domain
the designs were targeted to the Virtex 7 family. It
can be observed from Table 2 and Table 3 that
designs behave differently in different domains
due to different mapping logic, which
suggests that optimizations should be domain
specific. In ASIC the logic is mapped to
standard cells of the libraries, and in the FPGA
domain the logic is mapped to look-up tables
(LUTs). Table 3 shows that the proposed
architecture has better results than the existing
architectures in the FPGA domain.
The existing compressor architecture of [8] has one
interconnect and three outputs; hence it requires
four LUTs to implement one compressor cell.
Since the logic of the proposed compressor
architecture is implemented in parallel
(parallel sum and carry logic), the interconnect is
avoided, reducing the LUT requirement to 3
against the 4 of the existing compressor
architecture.
Table 3: Comparison of the synthesis results of
MAC architectures using existing and
proposed Compressor architectures in FPGA
domain
As a large number of compressors is required in
the MAC architecture, the proposed MAC
architecture requires fewer LUTs, which
leads to fewer interconnects and results in
reduced delay against the existing MAC
architecture with the compressor architecture of [8].
Since the number of LUTs is higher in the existing
MAC architecture, and since a larger area implies
higher power consumption, the power
consumption of the existing MAC architecture is
higher than that of the proposed MAC architecture. The
larger number of interconnects also contributes to power
consumption. Thus the parallelism in the proposed
architecture gives better efficiency than the existing
architectures. Further improvements can be
obtained by tailoring the design to the FPGA
architecture.
From the results of Table 2 and Table 3, it can be
suggested that the proposed architecture holds
good for both ASIC and FPGA domains.
It can also be inferred from the above discussion of the
result tables that the proposed architecture can be
generalized to an n-bit MAC and is independent of number
representation (radix, base) & bit width.
An increase in the MAC bit-width requires more
compressors, so the impact of this optimization
will be higher. When the bit-width
increases from N bits to 2N bits, the
number of compressors increases by
approximately 5 times.
4.CONCLUSION
A design and domain specific low power
compressor based MAC architecture has been
demonstrated in this work. The importance of the
circuit design level and its impact on DSP
applications is addressed. The use of higher fan-in
gates and its merits for low power
applications are discussed. The proposed architectures
yielded better efficiency in the ASIC and
FPGA domains when modeled in Verilog HDL
and synthesized with Cadence RTL Compiler and
Xilinx ISE respectively. Designs were mapped to
TSMC's 65nm technology node and the Virtex 7
FPGA family respectively.
5.REFERENCES
[1]. Chang, Chip-Hong, Jiangmin Gu, and
Mingyan Zhang. "Ultra low-voltage low-
power CMOS 4-2 and 5-2 compressors for
fast arithmetic circuits." Circuits and
Systems I: Regular Papers, IEEE
Transactions on 51.10 (2004): 1985-1997.
[2]. Tung Thanh Hoang; Sjalander, M.;
Larsson-Edefors, P., "A High-Speed,
Energy-Efficient Two-Cycle Multiply-
Accumulate (MAC) Architecture and Its
Application to a Double-Throughput MAC
Unit," Circuits and Systems I: Regular
Papers, IEEE Transactions on, vol. 57,
no. 12, pp. 3073-3081, Dec. 2010.
[3]. Chen Ping-hua; Zhao Juan, "High-speed
Parallel 32×32-b Multiplier Using a Radix-
16 Booth Encoder," Intelligent Information
Technology Application Workshops, 2009.
IITAW '09. Third International Symposium
on, pp. 406-409, 21-22 Nov. 2009.
[4]. Kiwon Choi; Minkyu Song, "Design of a
high performance 32×32-bit multiplier with
a novel sign select Booth encoder," Circuits
and Systems, 2001. ISCAS 2001. The 2001
IEEE International Symposium on, vol. 2,
pp. 701-704, 6-9 May 2001.
[5]. Rajput, R.P.; Swamy, M.N.S., "High Speed
Modified Booth Encoder Multiplier for
Signed and Unsigned Numbers," Computer
Modelling and Simulation (UKSim), 2012
UKSim 14th International Conference on,
pp. 649-654, 28-30 March 2012.
[6]. Yangbo Wu; Weijiang Zhang; Jianping Hu,
"Adiabatic 4-2 compressors for low-power
multiplier," Circuits and Systems, 2005.
48th Midwest Symposium on, vol. 2,
pp. 1473-1476, 7-10 Aug. 2005.
[7]. Jaina, D.; Sethi, K.; Panda, R., "Vedic
Mathematics Based Multiply Accumulate
Unit," Computational Intelligence and
Communication Networks (CICN), 2011
International Conference on,
pp. 754-757, 7-9 Oct. 2011.
[8]. Aliparast, Peiman, Ziaadin D.
Koozehkanani, and Farhad Nazari. "An
Ultra High Speed Digital 4-2 Compressor
in 65-nm CMOS." International Journal of
Computer Theory & Engineering 5.4
(2013).
[9]. N. Weste and David Harris, “CMOS VLSI
Design- A Circuits & System Perspective”,
Pearson Education, 2008.
[10]. ChandraMohan U, “Low Power Area
Efficient Digital Counters”, Proceedings of
the 7th VLSI Design and Test Workshops,
VDAT, August 2003.
[11]. Narendra C P & Ravi K M Kumar,
“Efficient Comparator based Sum of
Absolute Differences Architecture for
Digital Image Processing Applications”,
Foundation of Computer Science, New
York, USA, International Journal of
Computer Applications, 96(4):17-24, June
2014.
AUTHORS’ PROFILE
Yeturu. Parvathi, Pursuing M.Tech
(VLSI&ESD) from SKR College of
Engineering & Technology,
Manubolu, SPSR Nellore.AP.
Siddu. Penchalaiah, M.Tech,
Assistant Professor in Department of
ECE, SKR College of Engineering &
Technology, Manubolu, SPSR
Nellore.AP.
