International Journal of Electronics and Electical Engineering
Volume 1

Issue 2

Article 2

October 2012

Design of High Performance Modified Wave pipelined DAA Filter
with Critical Path Approach
Charanjit Singh
VLSI- ES Division, Centre for Development of Advanced Computing, Mohali, India,
charanjitscientist00@gmail.com

Balwinder Singh
VLSI- ES Division, Centre for Development of Advanced Computing, Mohali, India,
balwinder.singh@gmail.com


Recommended Citation
Singh, Charanjit and Singh, Balwinder (2012) "Design of High Performance Modified Wave pipelined DAA
Filter with Critical Path Approach," International Journal of Electronics and Electical Engineering: Vol. 1 :
Iss. 2 , Article 2.
DOI: 10.47893/IJEEE.2012.1016
Available at: https://www.interscience.in/ijeee/vol1/iss2/2


Design of High Performance Modified Wave pipelined DAA Filter with Critical Path Approach
Charanjit Singh, Balwinder Singh
VLSI- ES Division
Centre for Development of Advanced Computing
Mohali, India
Email:charanjitscientist00@gmail.com
Abstract: In this paper, a new high-speed control circuit is proposed that acts as the critical path for data travelling from input to output, in order to improve the performance of wave-pipelined circuits. Wave pipelining is a high-performance circuit design method that implements pipelining in logic without the use of intermediate registers. It has been widely used in the past few years and can improve both speed and efficiency at low hardware cost. Wave pipelining is used in a wide range of applications such as digital filters, network routers, multipliers, fast convolvers, MODEMs, image processing, control systems, radars and many others. In previous work, the operating speed of a wave-pipelined circuit was increased through three tasks: adjustment of the clock period, adjustment of the clock skew, and equalization of path delays. Path-delay equalization can be done in theory, but the real challenge is to accomplish it in the presence of various delay variations. The main objective of this paper is therefore to solve the path-delay equalization problem by inserting a control circuit into the wave-pipelined circuit, where it acts as the critical path for data moving from input to output. The proposed technique is evaluated for DSP applications by designing a 4-tap FIR filter using the distributed arithmetic algorithm (DAA). This design is then compared with 4-tap FIR filter designs using conventional pipelining and no pipelining. Synthesis and simulation results based on Xilinx ISE Project Navigator 12.3 show that the wave-pipelined DAA-based filter is faster than the non-pipelined one by a factor of 1.43, and that the conventional pipelined filter is faster than the non-pipelined one by a factor of 1.61, but at the cost of logic utilization increased by 200%. The wave-pipelined DA filters designed with the proposed control circuit can thus operate at a higher frequency than the non-pipelined filter but at a lower frequency than the pipelined one. The speed gain of the pipelined filter over the wave-pipelined filter comes at the cost of increased area and more dissipated power. When latency is considered, the wave-pipelined filters designed with the proposed scheme have the lowest latency among the three schemes.
Keywords: control circuit, DAA, wave pipelining, FIR, critical path

I. INTRODUCTION
The complexity of digital circuits is growing day by day, so reducing delay and adopting a proper clocking methodology are very important for maintaining overall system performance. Designers have always tried to reduce the total delay of a circuit to make the design faster. In modern CMOS technology, however, the time required to propagate a logic value from the input to the output of a gate is typically less than 1 ns, while the overall system clock period remains greater than 10 ns. This implies about 10% logic utilization, i.e. at any particular instant, 90% of the logic gates are idle. Utilization can therefore be increased by exploiting this idle time, which is what the wave pipelining technique achieves.
In an ordinary pipelined system, there is one wave of data between register stages. When a new set of data has been clocked into one set of registers, the values propagate to the next register stage before the first set of registers is clocked again. This technique improves speed, but at the cost of more registers, area, latency and power.
In a wave-pipelined (WP) system, by contrast, multiple waves of data propagate between storage elements. The technique was first introduced by Cotten [12], who named it maximum rate pipelining. He observed that the rate at which logic can propagate through a circuit depends not only on the longest path delay but also on the difference between the longest and shortest path delays. As a result, several computational “waves”, i.e., logic signals related to different clock cycles, can propagate through the logic simultaneously. In wave pipelining, the operating speed can be increased by adjusting the clock period, adjusting the clock skew and equalizing the path delays. In this paper, a new high-speed control circuit is introduced that acts as the critical path.
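The dependence of the clock rate on the delay spread can be illustrated with a small numerical sketch. It uses a deliberately simplified model: the conventional period is bounded by the longest path delay plus a lumped overhead, while the wave-pipelined period is bounded by the delay spread plus the same overhead. The function names, the lumped `t_overhead` term (standing in for setup, hold and skew margins) and the delay values are assumptions made for illustration only; the full timing constraints are given in the survey cited as [1].

```python
import math

def conventional_min_period(d_max, t_overhead):
    """Clock period when only one wave travels between registers."""
    return d_max + t_overhead

def wave_min_period(d_max, d_min, t_overhead):
    """Simplified wave-pipelining bound: limited by the spread Dmax - Dmin."""
    return (d_max - d_min) + t_overhead

def waves_in_flight(d_max, period):
    """Number of data waves simultaneously inside the logic block."""
    return math.ceil(d_max / period)

# Hypothetical delays in nanoseconds (not taken from the paper).
d_max, d_min, t_overhead = 10.0, 8.0, 1.0

t_conv = conventional_min_period(d_max, t_overhead)
t_wave = wave_min_period(d_max, d_min, t_overhead)
n_waves = waves_in_flight(d_max, t_wave)

print(f"conventional period: {t_conv} ns")
print(f"wave-pipelined period: {t_wave} ns with {n_waves} waves in flight")
```

Under these assumed numbers, narrowing the gap between the longest and shortest paths is what shortens the clock period, which is why path-delay equalization is central to the technique.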
II. OVERVIEW
Wave pipelining is a combinational logic design technique that implements pipelining without the use of intermediate storage or sequential elements. Traditional pipelining is done by inserting flip-flops between intermediate stages of the circuit, all clocked by a common clock signal. The maximum operating frequency of such designs is determined by the worst-case path delay and the setup and hold time requirements of the flip-flops. This can become a bottleneck for the maximum operating frequency of the circuit.

Fig 1. A combinational logic circuit with I/O registers

Cotten [12] showed that the maximum rate at which data can propagate through the circuit depends on (Dmax - Dmin), where Dmax and Dmin are the maximum and minimum propagation delays of the circuit, and not just on the maximum propagation delay. Hence, Cotten proposed that decreasing (Dmax - Dmin) increases the maximum rate at which data can propagate, and called this Maximum Rate Pipelining [12]. Figure 1 is a block diagram representation of wave pipelining. In this figure, the skew Δ = (Dmax - Dmin).

Fig 2. Temporal/spatial diagram of combinational logic circuit

International Journal of Electrical and Electronics Engineering (IJEEE), ISSN (PRINT): 2231 – 5284, Volume-I, Issue-2

III. REVIEW OF RELATED WORK
Wave pipelining has been proposed as one of the techniques for achieving high speed without the cost of increased area and circuit complexity. In this approach, the main function is partitioned into many independent but interconnected sub-functions, which are processed in the successive stages of the circuit. The basic criterion for partitioning the execution path into stages is that the stages have nearly equal computation delay, so that all stages can be kept busy for the entire clock cycle. This criterion is hard to achieve in practice because of the differing amounts of logic per stage and the variations in delay per logic element. The concept of wave pipelining has been described in a number of previous works [2][3][7][20].

There are many ways of implementing wave-pipelined circuits. In these circuits, the main task is to equalize, or balance, the path delays, i.e. to reduce (Dmax - Dmin), in order to maximize the speed. Previously, maximizing the operating speed of a wave-pipelined circuit required three tasks: adjustment of the clock period, adjustment of the clock skew (δ), and equalization of path delays. All three tasks require the delays to be measured and altered if necessary. To design wave-pipelined circuits, it is therefore necessary to balance the path delays. For practical circuits, the nominal path delay usually has to be a predefined constant Dmax, because the circuit must interface to other components in the system. While in theory the path-delay equalization problem has been solved, the real challenge is to accomplish it in the presence of a variety of static and dynamic delay tolerances. Delay balancing therefore remains a major challenge for designers. In this paper, to overcome this problem, active delay elements forming a control circuit are inserted into the architecture. This process is also called rough tuning. The control circuit is very fast and acts as the critical path for data moving from input to output, which increases the overall operating frequency of the circuit.

IV. PROPOSED CRITICAL PATH SCHEME
A. Concept of control circuit used for critical path scheme
The control circuit consists of flip-flops and XOR gates [17] and is a high-speed circuit. It is placed in the wave-pipelined architecture as shown in Fig. 3 to improve the operating speed. In a wave-pipelined circuit, data move from input to output in successive waves. In previous work, designers calculated the maximum (Dmax) and minimum (Dmin) delays from input to output and improved the speed by reducing the difference (Dmax - Dmin). This requires first calculating all the path delays and then identifying the maximum and minimum among them, which is a time-consuming process that requires more hardware. The modified wave-pipelined circuit is shown in Fig. 3. In this circuit, whatever data enter at the input X appear unchanged at the output, but at very high speed. The advantage of this circuit is that it is an automatic high-speed circuit that automatically acts as the critical path for data moving from input to output.

Fig 3. Modified wave-pipelined circuit with control circuit

Fig 4. Control circuit having D flip-flops and XOR gates
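For contrast with the proposed scheme, the conventional path-delay balancing described in the related work can be sketched as follows. This is only an idealized illustration of padding every path up to the longest one; it does not model the control circuit of Fig. 4, whose internal timing is not given in the text. The path names and delay values are hypothetical, and real delay elements have tolerances, so the spread would not reach exactly zero in practice.

```python
def equalize(path_delays):
    """Pad each path up to the longest one.

    Returns the padding per path, the delay spread (Dmax - Dmin) before
    padding, and the spread after ideal padding.
    """
    d_max = max(path_delays.values())
    d_min = min(path_delays.values())
    padding = {name: d_max - d for name, d in path_delays.items()}
    padded = {name: d + padding[name] for name, d in path_delays.items()}
    spread_after = max(padded.values()) - min(padded.values())
    return padding, d_max - d_min, spread_after

# Hypothetical input-to-output path delays in nanoseconds.
delays = {"p0": 4.0, "p1": 6.5, "p2": 9.0}
padding, spread_before, spread_after = equalize(delays)

print("padding per path:", padding)
print("spread before:", spread_before, "ns, after:", spread_after, "ns")
```

The sketch makes the cost of the conventional approach visible: every path delay has to be known before the padding can be computed, which is the measurement step the control circuit is intended to avoid.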


V. FIR THEORY
An FIR filter with constant coefficients is an LTI digital filter. The output of an FIR filter of order or length L, for an input time series x[n], is given by a finite version of the convolution sum, namely

$$y[n] = x[n] * f[n] = \sum_{k=0}^{L-1} f[k]\, x[n-k] \tag{1}$$

where f[0] ≠ 0 through f[L - 1] ≠ 0 are the filter's L coefficients. They also correspond to the FIR filter's impulse response. For LTI systems it is sometimes more convenient to express (1) in the z-domain as

$$Y(z) = F(z)\, X(z) \tag{2}$$

where F(z) is the FIR filter's transfer function, defined in the z-domain by

$$F(z) = \sum_{k=0}^{L-1} f[k]\, z^{-k} \tag{3}$$

The Lth-order LTI FIR filter is shown graphically in Fig. 5. It consists of a “tapped delay line,” adders, and multipliers. One of the operands presented to each multiplier is an FIR coefficient, often referred to as a tap weight. Historically, the FIR filter is also known as a transversal filter. The roots of the polynomial F(z) in (3) define the zeros of the filter. The presence of only zeros is the reason that FIR filters are sometimes called all-zero filters.

Fig 5. Direct form FIR filter [34].

VI. DISTRIBUTED ARITHMETIC ALGORITHM (DAA)
Distributed arithmetic (DA) was first introduced by Croisier et al. and further developed by Peled and Liu. DA is a multiplier-less implementation for computing the inner product of a pair of vectors [8], a common computation in digital signal processing. It is well suited to implementing high-throughput FIR filters and signal transforms such as discrete cosine transforms or fast Fourier transforms. DA is a bit-serial computation that forms an inner product of a pair of vectors in a few steps by storing all possible combination sums of weights in a memory table. It is assumed that the inputs to the filter are represented as B-bit 2's complement binary numbers with only the sign bit to the left of the radix point. A discrete-time linear finite impulse response filter generates the output y[n] as a sum of delayed and scaled input samples x[n]. In other words,

$$y[n] = \sum_{i=0}^{K-1} w_i\, x[n-i] \tag{4}$$

Let the signal samples to the filter be represented as B-bit 2's complement binary numbers,

$$x[n-i] = -b_{i0} + \sum_{l=1}^{B-1} b_{il}\, 2^{-l} \tag{5}$$

where b_{il} is the lth bit in the 2's complement representation of x[n - i]. Substituting equation (5) into equation (4) and swapping the order of the summations yields

$$y[n] = -\left[\sum_{i=0}^{K-1} b_{i0}\, w_i\right] + \sum_{l=1}^{B-1} \left[\sum_{i=0}^{K-1} b_{il}\, w_i\right] 2^{-l} \tag{6}$$

For a given set of w_i (i = 0, ..., K - 1), the terms in the square brackets may take only one of 2^K possible values, which may be stored in a memory table, denoted the DA filtering memory table (DA-F-MEM). The entry in the DA-F-MEM addressed by r is given by

$$\text{DA-F-MEM}(r) = \sum_{i=0}^{K-1} c_i(r)\, w_i \tag{7}$$

where c_i(r) is the ith bit in the K-bit representation of the address r.

A 4-tap (K = 4) implementation of the DA FIR filter is shown in Fig. 6. The DA-F-MEM contains all 16 possible combination sums of the filter weights w0, w1, w2, and w3. The bank of shift registers in Fig. 6 stores 4 consecutive input samples (x[n - i], i = 0, ..., 3). The concatenation of the rightmost bits of the shift registers forms the address of the memory table. The shift registers are shifted right at every clock cycle. The corresponding memory table entries are shifted and accumulated B consecutive times to generate the output y[n], where B is the precision of the input data. The sign-bit control changes the addition to a subtraction for the sign bits, which appear in the first square brackets of equation (6). Computing the filtering operation with the DA filter therefore takes B clock cycles regardless of the filter size K. Thus a high throughput rate is possible with the DA implementation, especially if K >> B. Also due to the regular

Fig 6. DA implementation of a 4-tap (K = 4) FIR filter. [19]
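The table lookup and shift-accumulate procedure described for the 4-tap DA filter can be checked against a direct multiply-accumulate in software. The following is a minimal sketch, not the authors' hardware design: it works on signed integers instead of fractions (the same two's complement bits scaled by 2^(B-1)), it shifts each table entry left instead of shifting the accumulator right, and the weights and samples are made-up values chosen only for the check.

```python
def build_table(weights):
    """Precompute all 2^K combination sums; bit i of address r selects weights[i]."""
    k = len(weights)
    return [
        sum(w for i, w in enumerate(weights) if (r >> i) & 1)
        for r in range(1 << k)
    ]

def da_fir_output(samples, table, bits):
    """One output y[n] from samples[i] = x[n-i], given as signed `bits`-bit integers."""
    mask = (1 << bits) - 1
    regs = [s & mask for s in samples]      # two's complement bit patterns
    acc = 0
    for l in range(bits):                   # one bit plane per clock cycle, LSB first
        addr = 0
        for i, reg in enumerate(regs):
            addr |= ((reg >> l) & 1) << i   # bit l of every sample forms the address
        term = table[addr] << l             # weight the table entry by 2^l
        if l == bits - 1:
            acc -= term                     # the sign-bit plane is subtracted
        else:
            acc += term
    return acc

# Hypothetical 4-tap weights and 8-bit input samples.
weights = [3, -1, 4, 2]
samples = [5, -7, 100, -128]

table = build_table(weights)
y_da = da_fir_output(samples, table, bits=8)
y_direct = sum(w * x for w, x in zip(weights, samples))

print(len(table), y_da, y_direct)
```

The loop runs once per input bit, so the number of iterations depends on the word length B and not on the number of taps K, which matches the throughput argument made in the text.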

VII. IMPLEMENTATION AND RESULTS
A. Implementation of 4-tap FIR filter using Virtex-4 FPGA
A 4-tap FIR filter with 8-bit inputs and 8-bit coefficients, as shown in Fig. 6, is implemented along with the proposed critical path approach on a Virtex-4 FPGA using Xilinx ISE Project Navigator 12.3. Simulation results are also checked using the same tool. The filter is implemented in three schemes (non-pipelined, pipelined and wave-pipelined), and the schemes are compared in terms of operating frequency, area and latency. From Table 1 and Fig. 7, it can be observed that the wave-pipelined DA filter is faster than the non-pipelined DA filter by a factor of 1.43. The pipelined DA filter is faster than the non-pipelined DA filter by a factor of 1.61, but this gain in speed is achieved at the cost of increased area: logic utilization increases by 200%. Also, the latency measured at the maximum operating frequency is lowest for the wave-pipelined DA filter, compared with both the non-pipelined and pipelined filters.

Fig 7. Comparison of speed and area of 4-tap DA filter (bar chart of Freq (MHz) and Slices for wave pipelining, non-pipelining and pipelining)

VIII. CONCLUSION AND FUTURE SCOPE
In this paper, a new critical path approach to speeding up the wave-pipelining technique for a DAA-based FIR filter using a control circuit has been presented. Previous results showed that a wave-pipelined DAA-based FIR filter is faster than non-pipelined techniques by a factor of 1.36. In this work, the synthesis and simulation results based on Xilinx ISE Project Navigator 12.3 show that the wave-pipelined DAA-based filter is faster than the non-pipelined one by a factor of 1.43, and that the conventional pipelined filter is faster than the non-pipelined one by a factor of 1.61, but at the cost of logic utilization increased by 200%. The wave-pipelined DA filters designed with the proposed control circuit can thus operate at a higher frequency than the non-pipelined filter but at a lower frequency than the pipelined one. The speed gain of the conventional pipelined filter over the wave-pipelined filter comes at the cost of increased area. Moreover, the power dissipation of pipelined circuits is higher than that of wave-pipelined circuits. When latency is considered, the wave-pipelined filters designed with the proposed scheme have the lowest latency among the three schemes. The results show that, although the area and power of wave-pipelined circuits are lower than those of pipelined circuits, their speed is also lower.
Further work can therefore be done on the wave pipelining technique to improve its performance.
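The speed and area comparisons can be recomputed directly from the Table 1 figures. The sketch below only divides the tabulated values; the dictionary layout and function name are illustrative choices. With these numbers the wave-pipelined speedup comes out at about 1.43, as stated, while the pipelined-to-non-pipelined frequency ratio comes out at about 1.69, somewhat higher than the 1.61 quoted in the text.

```python
# Values copied from Table 1 (4-tap DA filter).
results = {
    "wave":          {"freq_mhz": 386.033, "slices": 166, "lut4": 308},
    "non_pipelined": {"freq_mhz": 269.578, "slices": 141, "lut4": 216},
    "pipelined":     {"freq_mhz": 454.907, "slices": 332, "lut4": 324},
}

def ratio(scheme, baseline, key):
    """Ratio of one Table 1 quantity between two schemes."""
    return results[scheme][key] / results[baseline][key]

speedup_wave = ratio("wave", "non_pipelined", "freq_mhz")
speedup_pipe = ratio("pipelined", "non_pipelined", "freq_mhz")
slices_pipe_vs_wave = ratio("pipelined", "wave", "slices")
slices_pipe_vs_non = ratio("pipelined", "non_pipelined", "slices")

print(f"wave vs non-pipelined speedup:      {speedup_wave:.2f}")
print(f"pipelined vs non-pipelined speedup: {speedup_pipe:.2f}")
print(f"pipelined vs wave slices:           {slices_pipe_vs_wave:.2f}")
print(f"pipelined vs non-pipelined slices:  {slices_pipe_vs_non:.2f}")
```

In terms of slices, the pipelined design uses exactly twice as many as the wave-pipelined design (332 against 166), which is one way to read the 200% logic utilization figure.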

TABLE 1. IMPLEMENTATION RESULTS FOR 4-TAP DA FILTER IN VIRTEX-4

Pipelining technique    Wave pipelining    Non pipelining    Pipelining
Freq (MHz)              386.033            269.578           454.907
Slices                  166                141               332
4 I/P LUT               308                216               324

REFERENCES

[1] W. P. Burleson, M. Ciesielski, F. Klass, and W. Liu, “Wave-pipelining: a tutorial and research survey,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 6, no. 3, pp. 464–474, 1998.
[2] J. G. Delgado-Frias and J. Nyathi, “A hybrid wave-pipelined network router,” Proceedings of the IEEE Computer Society Workshop on VLSI, 2001, pp. 165–170.
[3] S. B. Tatapudi and J. G. Delgado-Frias, “A High Performance Hybrid Wave-Pipelined Multiplier,” Proceedings of the IEEE Computer Society Annual Symposium on VLSI, 2005, pp. 282–283.
[4] G. Lakshminarayanan and B. Venkataramani, “Optimization Techniques for FPGA-Based Wave-Pipelined DSP Blocks,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 13, no. 7, pp. 783–793, 2005.
[5] W. J. Kim and Y.-B. Kim, “Automating wave-pipelined circuit design,” IEEE Design & Test of Computers, vol. 20, no. 6, pp. 51–58, 2003.
[6] Akshaya Srivatsa, Kiran Chandran, Krishna Sankara Subramanian, and Sriharsha Niverty, “Implementation of Wavepipelined 16 Bit ALU,” Department of ECE, The University of Texas at Austin, Austin, TX 78712, 2008.
[7] G. Seetharaman and B. Venkataramani, “Automation Schemes for FPGA Implementation of Wave-Pipelined Circuits,” ACM Transactions on Reconfigurable Technology and Systems (TRETS), vol. 2, no. 2, Article 11, June 2009, ISSN 1936-7406.
[8] Syed Shahzad Shah, Saqib Yaqub, and Faisal Suleman, “Distributed Arithmetic for the Design of High Speed FIR Filter using FPGAs,” Chameleon Logics, #301, Kiran Plaza, F-8 Markaz, Islamabad, 2010.
[9] G. Lakshminarayanan, B. Venkataramani, K. P. Senthilkumar, M. Sasitharan, and V. A. Kiran Kottapalli, “Design and implementation of FPGA based wavepipelined fast convolver,” Proceedings of TENCON 2000, vol. 3, pp. 212–217, 2000.
[10] T. Feng, B. Jin, J. Wang, N. Park, Y. B. Kim, and F. Lombardi, “Fault tolerant clockless wave pipeline design,” Proceedings of the 1st Conference on Computing Frontiers, Ischia, Italy, 2004, pp. 350–356.
[11] M. Santhi, G. Seetharaman, R. Silwal, and G. Lakshminarayanan, “A novel online clock skew scheme for FPGA based Asynchronous Wave-pipelined circuits,” 5th International Conference on Future Information Technology (FutureTech), 2010.
[12] L. Cotten, “Maximum rate pipelined systems,” in Proc. AFIPS Spring Joint Comput. Conf., 1969.
[13] Ali Al-Haj, “Configurable Multirate Filter Banks,” American Journal of Applied Sciences, vol. 5, no. 7, pp. 788–797, 2008, ISSN 1546-9239.
[14] S. A. White, “Applications of distributed arithmetic to digital signal processing: a tutorial review,” IEEE ASSP Magazine, vol. 6, no. 3, pp. 4–19, 1989.

[15] U. Meyer-Baese, Digital Signal Processing with Field Programmable Gate Arrays, 3rd ed., Springer.
[16] Manish Garg, “High performance pipelining method for static circuits using heterogeneous pipelining elements,” Proceedings of the 29th European Solid-State Circuits Conference (ESSCIRC '03), 2003, pp. 185–188.
[17] M. Alioto and G. Palumbo, “Modeling and optimized design of current mode MUX/XOR and D flip-flop,” IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, vol. 47, no. 5, pp. 452–461, 2000.
[18] C. Metra, F. Giovanelli, M. Soma, and B. Ricco, “Self-checking scheme for very fast clocks' skew correction,” Proceedings of the International Test Conference, 1999, pp. 652–661.
[19] Walter G. Huang, “Implementation of adaptive digital FIR and reprogrammable mixed-signal filters using distributed arithmetic,” Master of Science dissertation, School of Electrical and Computer Engineering, Georgia Institute of Technology, 2009.
[20] Hirak Kumar Maity, Mitra Barun Sarkar, and A. Chakrobarty, “Wave Pipelining: An Analysis for High Performance Digital Circuits,” International Journal of Electronic Engineering Research, vol. 1, no. 3, pp. 269–278, 2009.
[21] N. Suresh Kumar, D. V. Rama Koti Reddy, P. Surya Chandra, Uttam Mande, and N. Krishna Santosh, “Two Way Clock Scheme to Minimize the Clock Skew in Digital Frequency Measurement,” Advances in Computational Sciences and Technology, ISSN 0973-6107, vol. 3, no. 3, pp. 359–367, 2010.
[22] T. Mak, P. Sedcole, P. Y. K. Cheung, and W. Luk, “Wave-pipelined signaling for on-FPGA communication,” ICECE Technology, FPT 2008, International Conference, 2008, pp. 9–16.


