Online self-repair of FIR filters by Benso, Alfredo et al.
Infrastructure IP
50 0740-7475/03/$17.00 © 2003 IEEE Copublished by the IEEE CS and the IEEE CASS IEEE Design & Test of Computers
THE DSP MARKET has achieved astonishing growth in the past few years, and no single vendor seems to profit at the expense of another. Driving this growth is the trend for more and more analog-based products to move to digital technologies. Examples include TV and audio ICs; serial- and optical-communication ICs; the Global Positioning System; cellular and Personal Communication Service (PCS) ICs; and multimedia ICs and modules.
In fact, more than 80 companies use digital signal
processing in their chips but do not sell them as DSP
chips. The market is even bigger than the revenue for
traditional DSPs and embraces a group of relatively big
DSP vendors, such as Siemens, Rockwell, and Zilog.
Will Strauss, president of Forward Concepts, a market
research ﬁrm in Tempe, Arizona,1 predicts that the DSP
market will grow by a compounded rate of 36% per year
over the next ﬁve years.
Digital filters are perhaps the most widely imple-
mented class of DSP applications because they are
basic building blocks of many complex systems. To
enable a ﬁltering process at high bandwidth, designers
use specialized hardware that can operate at much
higher throughputs than are possible with general-pur-
pose DSPs. This hardware includes ASICs, which allow
hardware optimization of certain popular signal-pro-
cessing algorithms or functions at the cost of ﬂexibility.
Comparing ASICs with DSP microprocessors, it’s clear
that DSPs offer slower speed but maximum flexibility
(due to programmability), whereas
ASICs provide higher speed with minimal
ﬂexibility.
With commercial products incorpo-
rating DSP functions and the market’s
increasing quality requirements, tradi-
tional architectures are becoming inad-
equate. Pressing issues among DSP
designers include new design approach-
es to reduce time to market, as well as new architectures
allowing high testability, reliability, and programmabil-
ity without affecting performance.
Although researchers have proposed several archi-
tectures to reach optimal ﬁlter performance,2-4 no one
has adequately addressed the problem of designing 
self-repairing digital ﬁlters with programmability char-
acteristics. Self-repairing technology could enrich com-
mercial applications requiring high availability and
serviceability. It could also benefit space or defense
applications that must survive and perform at optimal
functionality for long durations in unknown, harsh, and
possibly changing environments.
In that light, we present a self-testable, self-repairable
architecture for ﬁnite impulse response (FIR) digital ﬁl-
ters. This architecture allows repair of permanent faults
without interfering with the ﬁlter behavior or introduc-
ing performance overheads. Although we developed
this approach to repair a single permanent fault, you
can easily scale it to repair any number of faults occur-
ring in the ﬁlter logic. A key feature of the architecture is
its modularity, which allows automatic generation—
that is, reduced design time. We’ve implemented a tool
for automatic ﬁlter synthesis to generate VHDL descrip-
tions of FIR ﬁlters, starting from functional parameters.
Filter architecture
FIR filters are perhaps the most widely implemented class of digital filters. Until recently, engineers designed digital filters using a recipe taken directly from signal theory. Starting from the transfer function, they could easily deploy filters by following the signal flow and adding glue logic and delay blocks. Although this time-consuming recipe can lead to various designs for the same parameters and function, it worked well when designs used only a few digital filters. Today, however, it is incompatible with the emerging market's constraints and reliability requirements.

Editor's note:
Chip-level failure detection has been a target of research for some time, but today's very deep-submicron technology is forcing such research to move beyond detection. Repair, especially self-repair, has become very important for containing the susceptibility of today's chips. This article introduces a self-repair solution for the digital FIR filter, one of the key blocks used in DSPs.
Yervant Zorian, Virage Logic

Alfredo Benso, Stefano Di Carlo, Giorgio Di Natale, and Paolo Prinetto
Politecnico di Torino
FIR filters essentially perform a moving, weighted average of a sequence of input samples, as the following equation indicates:

$$y[n] = \sum_{i=0}^{N} h_i \, x[n-i]$$

where N is the filter order; y[n] is the output signal and x[n] is the input signal, at time n; and h_i (0 ≤ i ≤ N) is the set of filter coefficients, which also corresponds to the filter's impulse response. This representation is easy to implement with a modular circuit, including a weighted delay line, as Figure 1 shows.
The delay line is usually a chain of registers working
in a pipeline. In an N-order filter, this sample-in-bus
pipeline contains N + 1 delay elements. This means pro-
cessing each output sample requires a ﬁltering window
of N + 1 input samples. The input samples shift every
time a new sample is ready. A set of multipliers then
multiplies these samples by the relative hi coefficient,
and a cascade of adders (see the lower part of Figure 1)
adds the results to generate the output samples.
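The delay-line behavior described above can be sketched in a few lines of Python. This is only a behavioral illustration, not the article's hardware: the function name `fir_filter` is hypothetical, and samples before time 0 are assumed to be zero.

```python
from collections import deque

def fir_filter(coeffs, samples):
    """Direct-form FIR: y[n] = sum_i coeffs[i] * x[n - i], with x[k] = 0 for k < 0."""
    # Delay line with N + 1 elements (N = filter order), initially cleared.
    delay_line = deque([0] * len(coeffs), maxlen=len(coeffs))
    outputs = []
    for x in samples:
        # Shift the window: the newest sample enters, the oldest one drops out.
        delay_line.appendleft(x)
        # Multiply each tap by its coefficient and add the products.
        outputs.append(sum(h * tap for h, tap in zip(coeffs, delay_line)))
    return outputs

# An impulse at the input reproduces the coefficients (the impulse response).
print(fir_filter([1, 2, 3], [1, 0, 0, 0]))  # [1, 2, 3, 0]
```

Feeding an impulse is a quick sanity check, because the output must then equal the coefficient set h_0 ... h_N followed by zeros.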
The downside of the architecture in Figure 1 is its use of multipliers, which are very costly blocks with massive footprints. Moreover, multipliers are critical for reliability, which decreases quadratically with area. Many researchers are striving to reduce the hardware complexity of FIR filters to alleviate this problem. One example is signed-power-of-two (SPT) algebra, which allows multiplication using only shift-and-add operations.5-7 SPT algebra expresses numbers as sums and differences of negative powers of two, often called SPT terms. In general, many equivalent SPT representations, with different numbers of SPT terms, exist for a single number. To minimize the number of SPT terms, we opted for the canonic signed digit (CSD) form, which is a unique SPT representation with the minimum number of terms.8
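As a rough illustration of the CSD form, the following Python sketch converts a quantized coefficient into signed power-of-two terms. It uses plain rounding to a fixed number of fractional bits followed by the standard non-adjacent-form recoding; it is not the trellis search used in the article, and the names `to_csd`, `csd_terms`, and `frac_bits` are assumptions made for this example.

```python
def to_csd(value):
    """CSD digits (-1, 0, +1) of a non-negative integer, least significant first."""
    digits = []
    while value != 0:
        if value & 1:
            # value % 4 == 1 gives +1, value % 4 == 3 gives -1, so that the
            # next digit is always 0 (no two adjacent nonzero digits).
            digit = 2 - (value % 4)
            value -= digit
        else:
            digit = 0
        digits.append(digit)
        value //= 2
    return digits

def csd_terms(coefficient, frac_bits):
    """SPT terms (sign, exponent) of a non-negative coefficient, largest power first."""
    scaled = round(coefficient * 2 ** frac_bits)
    terms = [(d, k - frac_bits) for k, d in enumerate(to_csd(scaled)) if d != 0]
    return terms[::-1]

# 0.07 quantized to 14 fractional bits: 2^-4 + 2^-7 - 2^-12 - 2^-14
print(csd_terms(0.07, 14))  # [(1, -4), (1, -7), (-1, -12), (-1, -14)]
```

Each returned pair maps to one shift (the exponent) and one add or subtract (the sign), which is why fewer nonzero digits mean less hardware work per coefficient.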
Despite its advantages, SPT algebra can introduce a
loss of precision in the representation of coefficients
and consequently in the ﬁlter’s output. The ﬁrst rule of
thumb we could infer from physics is that the coeffi-
cients’ precision must be at least equal to the desired
output precision. But in ﬁlter design, this might not be
enough.
Bellomo demonstrated that a simple coefﬁcient trun-
cation close to the desired precision does not produce
the desired precision in the output.9 Therefore, he
coded an implementation of the Trellis search algo-
rithm to choose the best approximating coefficients
under precision constraints.10 We use this algorithm to
obtain the SPT terms in our experiments.
For example, we express coefficient h_i = 0.0001 with precision $10^{-6}$ as

$$h_i = 2^{-13} - 2^{-15} + 2^{-17} + 2^{-20}$$

The multiplication of a generic input sample x[n] by h_i becomes

$$h_i \, x[n] = (2^{-13} - 2^{-15} + 2^{-17} + 2^{-20}) \, x[n] = 2^{-13}x[n] - 2^{-15}x[n] + 2^{-17}x[n] + 2^{-20}x[n]$$
Adders and shifters are necessary to implement this function. For each SPT term, a programmable shifter moves x[n] to the right by the number of positions given by the magnitude of the term's exponent, and an adder then combines the shifted samples according to each term's sign. We call the basic block that performs these operations the multiply-accumulate (MAC) cell, shown in Figure 2. The MAC cell is the basic building block of our modular architecture, which Figure 3 shows.

Figure 1. General FIR filter layout. [Diagram: the input x[n] enters a chain of Delay elements; the taps are weighted by the coefficients h0, h1, h2, ..., hN−1, hN and summed by a cascade of adders to produce y[n], that is, $y[n] = \sum_{i=0}^{N} h_i \, x[n-i]$.]
The MAC cell's behavior is easy to explain. Each MAC cell is fed by the input samples and programmed by a set of SPT terms stored in external registers. Each set of SPT terms represents a given h_i coefficient. The MAC cell processes an input sample M times (where M is the number of SPT terms), each time shifting it by a number of positions equal to the related SPT value. The programmable shifter dynamically shifts the sample and, if necessary, takes the 2s complement of the shifted term based on its sign. The adder adds it to the value stored in the accumulator. This operation requires M clock cycles; therefore, the entire block runs on a clock M times faster than the sample clock (Clock × M in Figure 2).
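A behavioral sketch of one MAC cell may help fix the sequence of operations. It assumes integer samples, truncating right shifts, and SPT terms given as (sign, right-shift) pairs; the class and method names are hypothetical and do not come from the authors' VHDL.

```python
class MacCell:
    """Behavioral model of a MAC cell programmed with M SPT terms."""

    def __init__(self, spt_terms):
        self.spt_terms = list(spt_terms)  # (sign, right_shift) pairs
        self.counter = 0                  # modulus-M counter
        self.accumulator = 0
        self.valid = False

    def fast_clock(self, sample):
        """Advance one fast-clock cycle (M of these per input sample)."""
        if self.counter == 0:
            self.accumulator = 0          # start a new product
        sign, shift = self.spt_terms[self.counter]
        shifted = sample >> shift         # programmable shifter
        if sign < 0:
            shifted = -shifted            # complement for negative terms
        self.accumulator += shifted       # adder plus accumulator
        self.counter = (self.counter + 1) % len(self.spt_terms)
        self.valid = self.counter == 0    # product is ready after M cycles
        return self.accumulator

def mac_multiply(cell, sample):
    """Run the M fast-clock cycles needed for one input sample."""
    for _ in cell.spt_terms:
        cell.fast_clock(sample)
    return cell.accumulator

# SPT terms of the 0.0001 example: 2^-13 - 2^-15 + 2^-17 + 2^-20
H_TERMS = [(+1, 13), (-1, 15), (+1, 17), (+1, 20)]
print(mac_multiply(MacCell(H_TERMS), 1 << 20))  # 105, close to 0.0001 * 2**20 = 104.86
```

The `valid` flag goes high only on the last of the M cycles, which mirrors why the cell needs a clock M times faster than the sample clock to keep up with the input stream.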
To allow ﬂexibility and
programmability, we can
serially load the SPT terms
from an outside source.
This solution lets the user
change the filter charac-
teristics. Obviously, the
SPT register size limits 
the precision of the SPT
terms’ representation.
A cascade of adders adds the MAC cell output val-
ues together to obtain the output signal. Because the
number of SPT terms is not necessarily the same in each
MAC cell, a delay network synchronizes the operation
among the MAC cells.
Now, consider the cascade of adders in the lower
part of Figure 3. Because of the long path between the
ﬁrst and last adder, the characteristics of these compo-
nents heavily influence the filter’s performance. The
main constraints are low area and high speed.
Sklansky's topology seems to be the best choice in terms of both complexity, $(n/2)\log_2 n$ (where n is the adder parallelism), and total delay, $\log_2 n$.11
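The complexity and delay figures can be checked with a small bit-level model of the parallel-prefix formulation commonly called the Sklansky adder. This is a sketch under assumptions: n is a power of two, the carry-in is zero, and the function name `sklansky_add` and its return tuple are hypothetical.

```python
def sklansky_add(a, b, n):
    """Add two n-bit unsigned integers (n a power of two) with a Sklansky prefix tree.

    Returns (sum modulo 2**n, carry_out, prefix_cell_count, levels).
    """
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]    # generate bits
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]  # propagate bits
    half_sum = p[:]                 # kept for the final sum bits
    cells = 0
    levels = 0
    span = 1
    while span < n:
        for i in range(n):
            if i & span:                     # upper half of each 2*span block
                j = (i // span) * span - 1   # top bit of the lower half
                g[i] |= p[i] & g[j]          # prefix cell: combine (g, p) pairs
                p[i] &= p[j]
                cells += 1
        span *= 2
        levels += 1
    carries = [0] + g[:-1]          # carry into bit i = group generate of bits 0..i-1
    total = sum((half_sum[i] ^ carries[i]) << i for i in range(n))
    return total, g[-1], cells, levels

# 8-bit example: 12 prefix cells = (8/2) * log2(8), 3 levels = log2(8)
print(sklansky_add(200, 100, 8))  # (44, 1, 12, 3)
```

Every level updates exactly half of the bit positions, which is where the (n/2) log2 n cell count comes from, while the number of levels gives the log2 n delay.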
Test strategies
Our work aims to obtain high testability and reliability with respect to hard and soft errors. We propose two solutions, each covering different faults at different stages of the filter's operating life:
• Power-on self-test (POST). This solution, which the system's power-on enables, mainly serves to detect permanent faults that affect the filter's logic.
• Online self-repair. The filter can execute self-tests concurrently with its normal behavior and, when necessary, replace faulty blocks.
Power-on self-test
POST is an offline test strategy, usually enabled at the system's power-on to detect permanent faults. The idea is to give the filter a set of ad hoc test samples, let the filter work on these samples, and compare the results with previously computed results.

Figure 2. Multiply-accumulate (MAC) cell. [Diagram labels: input x[n]; SPT registers SPT0, SPT1, SPT2; modulus-M counter driven by Clock × M; programmable shifter (complementing); adder; accumulator; Delay; outputs h_i x[n − i] and Valid.]

Figure 3. Modular filter architecture. [Diagram labels: input x[n]; chain of Delay elements; MAC cells h0, h1, h2, ..., hN−1, hN, each with its SPT terms (SPT0, SPT1, SPT2, ..., SPTN−1, SPTN); Serial_load_in and Serial_load_out; cascade of adders; output y[n].]
Based on the general layout in Figure 3, we identify three categories of components to test:
• the sample-in bus,
• the MAC cells, and
• the cascade of adders.
Figure 4 shows the sample-in bus's structure. We must carefully test both registers and interconnections to detect stuck-at faults and faulty connections, which could greatly influence system performance.
In normal mode, the sample-in bus streams the
digital samples for processing. In test mode, a set of
test patterns, called the mini bus test, feeds this bus.
The test pattern generator applies a single sample to
the filter and after N clock cycles (where N is the
number of registers in the sample-in bus), the output
data evaluator (ODE) observes the same pattern at
the chain’s output. If the ODE reads a different word,
the bus is faulty.
The fault coverage depends on the test patterns
applied to the sample-in bus. Detecting stuck-at faults
requires only two test patterns (000 … 0 and 111 … 1),
but detecting couplings or shorts between bus lines
requires additional patterns. Background patterns, usu-
ally applied during memory testing, provide a good
tradeoff between coverage and the number of test pat-
terns.12 The architecture can also accommodate custom
test patterns if the designer has particular reliability
requirements. If the block passes the mini bus test, the
architecture considers both registers and interconnec-
tions in the sample-in bus as fault free.
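The effect of pattern choice on coverage can be illustrated with a toy model of the mini bus test. Everything here is an assumption made for the example: an 8-bit bus, two hypothetical fault models (`stuck_at` and a wired-AND short between two lines), and a fault that is applied at every register stage, which gives the same observed word as a single faulty stage because both models are idempotent.

```python
def stuck_at(bit, value):
    """Fault model: one bus line permanently at `value` (0 or 1)."""
    def fault(word):
        return word | (1 << bit) if value else word & ~(1 << bit)
    return fault

def wired_and_short(bit_a, bit_b):
    """Fault model: two bus lines shorted so that both carry their logical AND."""
    def fault(word):
        both = (word >> bit_a) & (word >> bit_b) & 1
        cleared = word & ~((1 << bit_a) | (1 << bit_b))
        return cleared | (both << bit_a) | (both << bit_b)
    return fault

def mini_bus_test(patterns, stages, fault=None):
    """Shift each pattern through the register chain; pass if it arrives unchanged."""
    for pattern in patterns:
        word = pattern
        for _ in range(stages):
            if fault is not None:
                word = fault(word)
        if word != pattern:       # the ODE reads a different word: bus is faulty
            return False
    return True

STUCK_AT_PATTERNS = [0x00, 0xFF]
BACKGROUND_PATTERNS = [0x55, 0xAA, 0x33, 0xCC]

print(mini_bus_test(STUCK_AT_PATTERNS, 5, stuck_at(3, 0)))               # False: detected
print(mini_bus_test(STUCK_AT_PATTERNS, 5, wired_and_short(2, 3)))        # True: escapes
print(mini_bus_test(BACKGROUND_PATTERNS, 5, wired_and_short(2, 3)))      # False: detected
```

The all-zeros and all-ones words catch every stuck-at line but drive shorted lines to equal values, so the short only shows up once a background pattern puts opposite values on the two lines.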
Now consider the MAC cell in Figure 2. A set of registers that store the SPT terms programs the MAC cell. The MAC cell receives input patterns from the sample-in bus, which the mini bus test feeds. During POST, the system reconfigures the SPT registers to work as a linear-feedback shift register (LFSR) that provides test patterns to the MAC cell. The architecture verifies the absence of faults by reconfiguring the accumulator to work as a multiple-input signature register (MISR) and by checking the final signature.
Concerning timing, the sample-in bus produces test patterns according to the normal system clock, whereas the MAC cell runs on a clock M times faster and receives test patterns from the SPT inputs at this higher frequency. The combination of high-speed patterns from the SPT inputs and low-speed patterns from the sample-in bus provides high fault coverage.
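The LFSR and MISR roles described above can be sketched as follows. The register width, the feedback taps, the seed, and the stand-in `unit_under_test` are all illustrative assumptions; the article does not specify the polynomials or the exact reconfiguration logic.

```python
WIDTH = 8
MASK = (1 << WIDTH) - 1
TAPS = (7, 5, 4, 3)  # feedback bit positions; an illustrative assumption

def lfsr_step(state):
    """One shift of a Fibonacci LFSR: XOR of the tapped bits enters at bit 0."""
    feedback = 0
    for tap in TAPS:
        feedback ^= (state >> tap) & 1
    return ((state << 1) | feedback) & MASK

def lfsr_patterns(seed, count):
    """Pseudorandom test patterns, as produced by the reconfigured SPT registers."""
    state, patterns = seed, []
    for _ in range(count):
        patterns.append(state)
        state = lfsr_step(state)
    return patterns

def misr_signature(responses, state=0):
    """Compact a response stream: shift like the LFSR, then XOR in each word."""
    for word in responses:
        state = lfsr_step(state) ^ (word & MASK)
    return state

def unit_under_test(pattern, stuck_bit=None):
    """Hypothetical stand-in for the MAC datapath: a fixed shift-and-add."""
    result = ((pattern >> 1) + (pattern >> 3)) & MASK
    if stuck_bit is not None:
        result &= ~(1 << stuck_bit)   # stuck-at-0 on one output line
    return result

patterns = lfsr_patterns(seed=0x01, count=64)
golden = misr_signature(unit_under_test(p) for p in patterns)
observed = misr_signature(unit_under_test(p, stuck_bit=2) for p in patterns)
print(hex(golden), hex(observed), "fault detected:", golden != observed)
```

Only the final signature is compared against the precomputed one, so the test needs no storage for individual responses; the price is a small probability that a faulty response stream aliases to the correct signature.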
Now we address the problem of testing the chain of
adders in the lower part of Figure 3. The idea is to
exploit the entropy produced by the MAC cells during
POST to produce test patterns for the adders. Every
adder connects to two different registers, each coming
from a different block. POST conﬁgures these registers
as MISRs so that their content changes continually and
pseudorandomly. 
This approach has a drawback, however: If every MAC cell's POST begins simultaneously, the entropy on the adders will be insufficient to ensure high fault coverage, because the adders will see pairs of similar operands. The solution is conceptually very simple, although its implementation adds some complexity overhead: When enabling each MAC block's POST, we insert a delay of one clock pulse between that block and the previous block.
The system finally signs the adder chain's output using a MISR to detect faults. A controller governs POST procedures by synchronizing the test structures and checking the test results. Figure 5 shows the general architecture for the POST.

Figure 4. Sample-in bus test. [Diagram labels: test program generator (TPG); Normal/test selection; input x[n]; sample-in bus of Delay elements; output data evaluator (ODE).]
Online self-test and self-repair
The filter’s modular structure is perfectly suited to
implement efficient online BIST and self-repair strate-
gies. A basic ﬁlter module (BFM) is the union of a sam-
ple register with its MAC cell and its final adder. Our
approach is to introduce one or more spare BFMs into
the architecture. During normal behavior, a spare BFM
periodically replaces each module. The system tests
each module using an approach similar to (and reusing
the same structures as) the POST mechanism just
described. 
If the test detects no faults, the system reintroduces the module and selects the next one for test. If the test fails, the BFM is faulty, and the system interrupts the replacement mechanism. Thereafter, the system can no longer detect and correct the occurrence of any other fault online, but it can still work without degradation.

Figure 5. General power-on self-test (POST) architecture. [Diagram labels: TPG; input x[n]; Delay elements; MAC cells h0, h1, ..., hN with LFSR, MISR, and SPT0, SPT1, ..., SPTN blocks; adders; output MISR; y[n]; ODE; power-on self-test controller.]

Figure 6. Filter before (a) and after (b) the repair process. The shaded area is the spare module. [Diagram: modules A through H in both panels.]
Figure 6 shows the fault-free ﬁlter structure and the
repaired structure, in which the spare module shifts the
pipeline’s functionality and acts as a substitute for the
faulty module.
Implementing this repair scheme requires alternative routing paths that exclude the cell under test from the chain without introducing any delay in the filtering process. Switching devices manage the alternative routing-path mechanism.
Figure 7 shows the ﬁlter’s new layout, where pairs of
multiplexers ensure the correct input at every stage of
the sample-in bus. After the repair operation, the system
asserts an output signal to inform the user that the out-
put values can be temporarily unreliable and that, if an
error occurs again, the system will not be able to repair
the chip.
Bypassing a faulty BFM is not enough to completely
repair the ﬁlter. In fact, the blocks are all equal, but they
receive different SPT terms from the SPT registers. The
repair process must remap the SPT terms in the new
conﬁguration. A switching element distributes the SPT
terms to different modules. The replacement mecha-
nism requires programming each module with both its
own SPT values and those of the previous module,
thereby minimizing the area overhead introduced by
the switching device.
Table 1 shows some routing scenarios. For the BFMS test (where S refers to the spare cell), the system programs each module with its own SPT terms and tests the additional module. For the BFM0 test, routing is switched so that BFM1 receives the SPT0 terms, BFM2 receives the SPT1 terms, and the additional BFMS receives the SPTN terms.
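The remapping rule behind these scenarios, together with the rotation of the module under test, can be written as a short sketch. The function names, the string labels, and the `test_passes` callback are hypothetical; the model assumes the spare sits at the end of the chain and that the rotation stops at the first failing module, as the text describes.

```python
def spt_routing(under_test, num_modules):
    """SPT set received by each module while one module is excluded.

    under_test: index of the regular module being tested, or None when the
    spare itself is under test. num_modules counts BFM0 ... BFMN (N + 1 modules).
    """
    routing = {}
    for k in range(num_modules):
        if under_test is None or k < under_test:
            routing[f"BFM{k}"] = f"SPT{k}"        # own terms
        elif k == under_test:
            routing[f"BFM{k}"] = "under test"
        else:
            routing[f"BFM{k}"] = f"SPT{k - 1}"    # terms of the previous module
    if under_test is None:
        routing["BFMS"] = "under test"
    else:
        routing["BFMS"] = f"SPT{num_modules - 1}"  # spare takes the last set
    return routing

def rotate_until_fault(test_passes, num_modules):
    """Test the modules in turn; on the first failure keep that module bypassed."""
    for k in range(num_modules):
        routing = spt_routing(k, num_modules)
        if not test_passes(k):
            return k, routing                      # rotation stops here
    return None, spt_routing(None, num_modules)

# Four regular modules (N = 3); module 2 is assumed to be faulty.
faulty, routing = rotate_until_fault(lambda k: k != 2, 4)
print(faulty, routing)
```

Because a module only ever receives its own terms or those of its predecessor, each switching element needs just two inputs, which is the area argument made in the text.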
The downside of this strategy is that the switching
elements and SPT registers are not repairable. However,
these components represent only 2.4% of the total ﬁlter
area, making the architecture’s dependability level ade-
quate for most of today’s applications. If a higher relia-
bility level is necessary and the area is available, it is
possible to duplicate these elements to reach full
repairability.
Experimental results
We analyzed our architecture’s performance by
implementing a fourth-order ﬁlter with 16-bit input sam-
ples. The ﬁlter’s transfer function is
$$y_n = 0.1\,x_n + 0.05\,x_{n-1} + 0.07\,x_{n-2} + 0.01\,x_{n-3}$$

The SPT terms representing the filter's coefficients are as follows:

$$0.1 \approx 2^{-3} - 2^{-5} + 2^{-7} - 2^{-9} + 2^{-11} - 2^{-14}$$
$$0.05 \approx 2^{-4} - 2^{-6} + 2^{-8} - 2^{-10} + 2^{-12} - 2^{-14}$$
$$0.07 \approx 2^{-4} + 2^{-7} - 2^{-12} - 2^{-14}$$
$$0.01 \approx 2^{-7} + 2^{-9} + 2^{-12}$$

This representation guarantees a precision of $10^{-4}$.
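The stated precision can be checked with exact rational arithmetic. The sketch below reads every exponent in the expansions above as negative and stores each term as a (sign, shift) pair; the names `SPT_TERMS` and `spt_value` are assumptions for this example.

```python
from fractions import Fraction

# (sign, shift) means sign * 2**(-shift)
SPT_TERMS = {
    "0.1":  [(+1, 3), (-1, 5), (+1, 7), (-1, 9), (+1, 11), (-1, 14)],
    "0.05": [(+1, 4), (-1, 6), (+1, 8), (-1, 10), (+1, 12), (-1, 14)],
    "0.07": [(+1, 4), (+1, 7), (-1, 12), (-1, 14)],
    "0.01": [(+1, 7), (+1, 9), (+1, 12)],
}

def spt_value(terms):
    """Exact value of a sum of signed powers of two."""
    return sum(Fraction(sign, 2 ** shift) for sign, shift in terms)

for target, terms in SPT_TERMS.items():
    error = abs(spt_value(terms) - Fraction(target))
    # Each line reports the coefficient, its SPT value, and whether the error is below 1e-4.
    print(target, float(spt_value(terms)), error < Fraction(1, 10 ** 4))
```

All four expansions stay within 10^-4 of their target, and the largest number of terms (six) sets the value of M that the fast clock has to accommodate.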
To evaluate the area overhead, we described the filter in VHDL and synthesized it with Synopsys' design_compiler, using Austriamicrosystems' csx_HDRLIB library. Figure 8 shows the implementation. We implemented three different solutions in terms of dependability: no test, POST, and built-in self-repair (BISR). Table 2 lists the area (in Synopsys gate count values) for each of these solutions.
The resulting area overhead is 13% for POST and 33%
for BISR. To evaluate the solutions’ fault coverage, we used
Synopsys’ TetraMax for fault simulations. The BISR solu-
tion detected and repaired 97.2% of single stuck-at faults.
We tested the entire circuit in (N + 6) × M × 6 × 256 fast clock pulses, which corresponds to (N + 6) × 6 × 256 system clock pulses (where N is the filter order, and M is the number of SPT terms used to represent a filter coefficient). For N = 128 coefficients with a 100-MHz clock, POST requires 205,824 clock pulses, that is, about 2.06 ms.
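The test-length arithmetic is easy to reproduce. In this sketch the function names are hypothetical, and the value M = 6 used for the fast-clock count is only an assumed example, since the text does not give M for the N = 128 case.

```python
def post_clock_pulses(filter_order, spt_terms=1):
    """Return (system clock pulses, fast clock pulses) needed by POST."""
    system_pulses = (filter_order + 6) * 6 * 256
    fast_pulses = system_pulses * spt_terms   # the MAC clock is M times faster
    return system_pulses, fast_pulses

def post_duration_ms(filter_order, clock_hz):
    """POST duration in milliseconds at the given system clock frequency."""
    system_pulses, _ = post_clock_pulses(filter_order)
    return 1e3 * system_pulses / clock_hz

print(post_clock_pulses(128, spt_terms=6))   # (205824, 1234944)
print(post_duration_ms(128, 100e6))          # 2.05824
```

The duration depends only on the system clock count, because the fast clock delivers M pulses in the time of one system pulse.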
Figure 7. Alternative routing paths.

Table 1. Routing scenarios: signed-power-of-two (SPT) terms received by each module.

Test of   BFM0         BFM1         BFM2    BFMN      BFMS
BFMS      SPT0         SPT1         SPT2    SPTN      Under test
BFM0      Under test   SPT0         SPT1    SPTN–1    SPTN
BFM1      SPT0         Under test   SPT1    SPTN–1    SPTN
FUTURE WORK will continue to apply and refine the design methodology presented here to achieve higher levels of testability and dependability. In particular, more work is necessary to identify hard-to-test areas in the circuit. Fault coverage higher than 99% is essential for mass production. In addition, new solutions should be explored to overcome the repairability shortcomings of the switching elements and SPT registers without strongly impacting the design area.
Acknowledgments
This work is partially supported by Instituto
Superiore per le ICT Mario Boella under contract Test
D.O.C.: Quality and Reliability of Complex SoC.
References
1. Electronics Market Research, Forward Concepts Co., Tempe, Ariz.; http://www.fwdconcepts.com.
2. L. Goodby and A. Orailoglu, "Redundancy and Testability in Digital Filter Datapaths," IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems, vol. 18, no. 5, May 1999, pp. 631-644.
3. C. Counil and G. Cambon, "A Functional BIST Approach for FIR Digital Filters," Proc. IEEE VLSI Test Symp. (VTS 92), IEEE CS Press, 1992, pp. 90-95.
4. C.-W. Wu and J.-C. Wang, "Testable Design of Bit-Level Systolic Block FIR Filters," Proc. IEEE Int'l Symp. Circuits and Systems (ISCAS 92), vol. 3, IEEE Press, 1992, pp. 1129-1132.
5. N. Benvenuto, L.E. Franks, and F.S. Hill Jr., "Dynamic Programming Methods for Designing FIR Filters Using Coefficients –1, 0 and +1," IEEE Trans. Acoustics, Speech, and Signal Processing, vol. 34, no. 4, Aug. 1986, pp. 785-792.
6. D. Li, J. Song, and Y.C. Lim, "A Polynomial Time Algorithm for Designing Digital Filters with Powers-of-Two Coefficients," Proc. IEEE Int'l Symp. Circuits and Systems (ISCAS 93), vol. 1, IEEE Press, 1993, pp. 84-87.
7. Y.C. Lim and S.R. Parker, "FIR Filter Design over a Discrete Powers-of-Two Coefficient Space," IEEE Trans. Acoustics, Speech, and Signal Processing, vol. 31, no. 3, June 1983, pp. 583-590.
8. R. Hartley, "Subexpression Sharing in Filters Using Canonic Signed Digit Multipliers," IEEE Trans. Circuits and Systems II: Analog and Digital Signal Processing, vol. 43, no. 10, Oct. 1996, pp. 677-688.
9. P. Bellomo, Study of a Decimator for Sigma-Delta Conversion Suitable for Space Applications, master's thesis, Dipartimento di Automatica e Informatica, Politecnico di Torino, Turin, Italy, 2000.
10. C.L. Chen and A.N. Wilson Jr., "A Trellis Search Algorithm for the Design of FIR Filters with Signed-Powers-of-Two Coefficients," IEEE Trans. Circuits and Systems II: Analog and Digital Signal Processing, vol. 46, no. 1, Jan. 1999, pp. 29-39.
11. J. Sklansky, "Conditional-Sum Addition Logic," IRE Trans. Electronic Computers, vol. 9, no. 2, June 1960, pp. 236-240.
12. A.J. Van de Goor, Testing Semiconductor Memories: Theory and Practice, Wiley, 1991.

Figure 8. Fourth-order filter implementation used to analyze architecture performance.

Table 2. Area occupied by architecture for three different solutions.

Solution   Area (no. of gates)
No test    839,123
POST       713,698
BISR       900,908
Alfredo Benso is a researcher in the
Department of Automation and Infor-
mation Technology at Politecnico di
Torino in Turin, Italy. His research
interests include DFT techniques, BIST
for complex digital systems, dependability analysis of
computer-based systems, and software-implemented
hardware fault tolerance. Benso has an MS in com-
puter engineering and a PhD in information technolo-
gies, both from Politecnico di Torino. He chairs the
IEEE Computer Society Test Technology Technical
Council (TTTC) Web-Based Activities Group.
Stefano Di Carlo is a PhD candi-
date in the Department of Automation
and Information Technology at Politec-
nico di Torino. His research interests
include DFT techniques, SoC testing,
BIST, and FPGA testing. Di Carlo has an MS in com-
puter engineering. He is the chair of the TTTC’s Elec-
tronic Submissions committee.
Giorgio Di Natale is a PhD candi-
date in the Department of Automation
and Information Technology at Politec-
nico di Torino. His research interests
include DFT techniques, BISR, and
FPGA testing. Di Natale has an MS in computer engi-
neering from Politecnico di Torino. He is an associate
Webmaster of the TTTC.
Paolo Prinetto is a full professor of
computer engineering at Politecnico
di Torino and a joint professor at the
University of Illinois at Chicago. His
research interests include testing, test
generation, BIST, and dependability. Prinetto has an
MS in electronic engineering from Politecnico di Tori-
no. He is a Golden Core Member of the IEEE Comput-
er Society and the TTTC’s chair-elect.
Direct questions and comments about this article
to Stefano Di Carlo, Politecnico di Torino, Corso Duca
degli Abruzzi 24, 10129 Turin, Italy; dicarlo@polito.it.
For further information on this or any other computing
topic, visit our Digital Library at http://computer.org/
publications/dlib.
