JGN II R&D Project : Research and Development of Surrounding Computing Technology by FUKUMOTO, Masahiro et al.
Kochi University of Technology Academic Resource Repository
Title: JGN II R&D Project : Research and Development of Surrounding Computing Technology
Author(s): FUKUMOTO, Masahiro; SHIMAMURA, Kazunori; IWATA, Makoto; HAMAMURA, Masanori; SAKAI, Keiichi; MENDORI, Takahiko; TSUZUKI, Shinji; YAMAGUCHI, Takumi; HAYASHI, Hideki
Citation: ????????, 2(1): 149-159
Date of issue: 2005-03-31
URL: http://hdl.handle.net/10173/140
Rights:
Text version: publisher
Kochi, JAPAN
http://kutarr.lib.kochi-tech.ac.jp/dspace/
JGN II R&D Project 
Research and Development of Surrounding Computing Technology 
FUKUMOTO Masahiro*, SHIMAMURA Kazunori, IWATA Makoto
HAMAMURA Masanori, SAKAI Keiichi, MENDORI Takahiko
TSUZUKI Shinji, YAMAGUCHI Takumi and HAYASHI Hideki 
Kochi JGN II Research Center 185 Miyanokuchi, Tosayamada-cho,
Kami-gun, Kochi 782-8502, Japan 
E-mail : * fukumoto.masahiro@kochi-tech.ac.jp
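Summary : To realize a more comfortable information environment, it is important to provide diverse services on demand. In particular, processing very heavy information such as video calls for technology that makes free use of resources distributed over the network. In a ubiquitous environment, data can be collected from multiple distant locations and processed in real time, but when the flow of data is considered there is no inherent need to aggregate it, and distributed processing is the natural choice. We therefore aim to establish a surrounding computing environment: an evolving ubiquitous environment in which computing and database resources on the network can be used freely without awareness of network or terminal functions. This paper describes a firewall based on a data-driven processor, which is useful in ubiquitous environments, and a signal processing method suited to information reproduction; both form the basis of surrounding computing technology.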
Abstract : For comfortable information networking, it is necessary to provide a variety of services in response to requirements and to make flexible use of distributed resources. In a "ubiquitous" environment, distributed processing is a natural way to push data for real-time applications. The purpose of this research is to establish "surrounding computing technology", which is an evolution of the ubiquitous environment. In this paper, an embedded data-driven firewall processor and a signal processing method suitable for information reproduction are proposed.
1. Introduction
JGN II is an open testbed network environment for research and development. It succeeds JGN (Japan Gigabit Network : Gigabit network for R&D), which operated from April 1999 to March 2004, and has been expanded by the National Institute of Information and Communications Technology (hereinafter NICT) into a new ultra-high-speed testbed network for R&D collaboration among industry, academia, and government. Its aim is to promote a broad spectrum of research and development projects, ranging from fundamental core research and development to advanced experimental testing, in areas including next-generation network technologies and a diverse range of network application technologies.
In addition, seven research centers of NICT's own (hereinafter Research Centers) are provided in collaboration with JGN II.
The seven Research Centers collaborating with JGN II are conducting research and development on the following four themes under Research and Development on Advanced Networks and Application Technologies.
1. Highly Reliable Core Network Technology
2. Access Network Technology
3. Grid Technology
4. Platform and Application Technology
Kochi JGN II Research Center was established in April 2004 as a center for ordinary research within the JGN II project, and carries out R&D on "Platform and Application Technology". Specifically, we are focusing on "Surrounding Computing Technology".
2. Design Concept of an Embedded Data-Driven Firewall Processor
With the rapid advancement of information networking technology, various networked systems and devices are permeating our daily lives and offices. A surrounding networking/computing environment can be expected to emerge around us soon. To keep this comfortable environment robust and safe against malicious or legitimate intruders and viruses, many security products are being developed, such as firewall systems, network intrusion detection systems, and virus protection systems [1][2]. Since most firewall systems monitor packets on the network wire, they cannot completely prevent all accidents and attacks, especially those among local hosts. On the other hand, software-based security solutions like [3] can be rendered useless if the OS is exploited, compromising the computer and potentially the internal network.
This paper describes the design concept of a hardware-based firewall processor embedded inside local hosts in order to eliminate the possibility of internal attacks from behind the perimeter firewall, and then presents some experimental results from our feasibility study. Such an embedded firewall processor is required to be robust, secure, and evolvable even against newly discovered attacks, as well as to offer low power consumption and high performance. We therefore decided to design our firewall processor on the self-timed super-pipelined data-driven chip-multiprocessor architecture [4], incorporating a specific instruction set for firewall functions. The data-driven principle provides natural multiprocessing capability without any process scheduling or complex interrupt handling. Furthermore, the self-timed pipeline scheme provides flexible pipeline processing capability for high-speed packet streams as well as a flexible power-saving feature.
The first prototype system implemented for our feasibility study is equipped with packet classification, stateful packet inspection for the layer 4 protocols, and simple URL filtering. Since every function is realized by super-pipelined algorithms, it can be executed in a highly parallel manner on the data-driven chip-multiprocessor. Preliminary evaluation results in our feasibility study show that our firewall processor potentially operates at a wire speed of over 100 Mb/s, even when equipped with a single processor. Furthermore, since performance is also observed to scale with the number of processors, a single chip incorporating 10 processors could be expected to handle a packet stream of over 1 Gb/s.
2.1 Design Considerations on Embedded Firewall
2.1.1 Specific Features of Embedded Firewall
Recently, the numbers of mobile users, telecommuters, and business-to-business extranets have been increasing significantly. These are no longer protected by normal perimeter firewalls alone, because they can directly access the protected intranet using dial-in, P2P, or encrypted application traffic. Such hosts can thus become unwitting insiders if they are infected with a virus or worm outside the protected network. These trends create the need for an embedded firewall attached to mobile hosts such as mobile PCs, PDAs, and mobile phones. The embedded firewall is placed on the network interface card of a host computer and filters Internet Protocol (IP) traffic to and from the host. It is tamper-resistant because it is independent of the host's operating system. The basic concept of an embedded firewall was originally proposed by C. Payne et al. [5].
The basic functions of the embedded firewall are shown in Fig.1. First, an incoming packet to the host is stored in the packet buffer, and its IP header and TCP/UDP header are transferred to a dynamic packet filtering function. In this case, its destination IP address must be the same as the IP address of the host, except for multicast addresses. For an outgoing packet from the host, its source IP address must be the same as that of the host, as long as the host is not a willing or unwilling intruder. This means that the filtering cost for one of the IP address fields can be saved. In the dynamic packet filtering function, the TCP header is examined to determine whether it belongs to a new connection. If it would establish a new connection, a packet classification module (classifier) checks it against a filtering rule database. If it belongs to an existing connection managed in the firewall, the packet is examined by stateful packet inspection (SPI) to determine whether it would cause a legitimate state transition of the connection. Although UDP is a connectionless protocol, a virtual connection for a UDP stream can be assumed from the pair of IP addresses and port numbers. Therefore, SPI is useful for making the host more secure even for UDP packets. After the stateful inspection of the layer-4 packet header, only acceptable packets are passed to application layer filtering (APF), which inspects contents such as URLs or e-mail attachments. Finally, the packet to be forwarded is determined and forwarded from the packet buffer.
Figure 1 Basic Function of the Embedded Firewall. (Block diagram: incoming packets to the host and outgoing packets from the host enter the Packet Buffer; headers go to dynamic packet filtering, consisting of the Classifier and SPI for TCP, UDP, and ICMP; payload content goes to APF; packets are forwarded or discarded. APF: Application Filtering; SPI: Stateful Packet Inspection.)
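The decision flow described above can be sketched in a few lines. This is only an illustrative model, not the authors' DDNP program: the `Header` fields, `HOST_IP`, and `ALLOWED_PORTS` (a stand-in for the filtering rule database) are hypothetical, and both the SPI state check and the application layer filtering are reduced to comments.

```python
from dataclasses import dataclass

HOST_IP = "192.0.2.10"        # hypothetical address of the protected host
ALLOWED_PORTS = {80, 443}     # hypothetical stand-in for the filtering rule database


@dataclass(frozen=True)
class Header:
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    proto: str = "TCP"
    syn: bool = False         # True for a TCP segment that opens a connection


def classify(h: Header) -> bool:
    """Static rule check, applied only to packets that open a connection."""
    return h.dst_port in ALLOWED_PORTS


def filter_incoming(h: Header, connections: set) -> str:
    """Return "forward" or "discard" for one incoming packet header."""
    if h.dst_ip != HOST_IP:               # multicast exception omitted for brevity
        return "discard"
    # The destination address is fixed, so it is not part of the lookup key.
    key = (h.src_ip, h.src_port, h.dst_port, h.proto)
    if key in connections:
        # Existing connection: a full SPI would validate the state transition here.
        return "forward"
    if h.proto == "TCP" and not h.syn:    # an unknown TCP flow must start with SYN
        return "discard"
    if not classify(h):
        return "discard"
    connections.add(key)                  # create a new tracked connection
    return "forward"
```

Because the destination address of every incoming packet equals the host address, the connection key needs only one IP address and two port numbers, which is the field reduction the text points out.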
In order to accept high-speed packet streams at the embedded firewall, pipelined parallel algorithms for those functions must be investigated, and an efficient hardware platform for them should be established. Our final objective is to realize a small, low-power, high-performance embedded firewall processor that can be built into a CF card or a mobile phone.
2.1.2 Data-Driven Processing Paradigm
The data-driven computation paradigm allows us to represent system functions in a natural way, covering both software and hardware portions seamlessly. This advantage improves the design productivity of system-on-chip (SoC) LSIs in contrast to conventional von Neumann computation. Furthermore, data-driven execution control fits the self-timed super-pipeline circuit very well. Such a circuit is necessary for realizing high-performance, low-power, and easy-to-design SoC systems, even when they are fabricated in a deep sub-micron semiconductor process. These features have been demonstrated by the successful development of a self-timed super-pipelined data-driven multimedia processor (DDMP) chip in which ten processors are interconnected via an on-chip packet router [4].
Furthermore, in order to apply this high-speed stream processing capability to Internet routers and IP forwarding engines, a multi-protocol data-driven network processor (DDNP) is currently being developed. DDNP can accept both IPv4 and IPv6 packet streams without any process scheduling, so it can process both protocols simultaneously at around 2 Gb/s. We also proposed a super-pipelined IP lookup scheme on DDNP that introduces some compound lookup instructions. This scheme can search for a forwarding path in a large routing table (100 K routing entries) at over 50 M IP packets/s [6], and can classify layer 4 packets at around 12 M IP packets/s [7]. In this paper, the design concept of an embedded data-driven firewall system is proposed as one application of DDNP.
2.2 Architecture of Embedded Data-Driven Firewall
Essential data dependencies among the functions of the embedded firewall can be expressed as shown in Fig.2. If the maximum dataflow based on these data dependencies is always sustained on DDNP, the architectural advantages of DDNP can be fully utilized. That is, the dataflow diagram in Fig.2 points to the following key requirements in designing the data-driven firewall system.
(a) Efficient dynamic multiprocessing, i.e., process creation, execution, and deletion, including state-transition processes.
(b) Parallel implementation of all filtering functions, i.e., classification, stateful packet inspection (SPI), and application layer filtering (APF).
(c) A high-speed packet buffering mechanism capable of accessing off-chip memory modules, e.g., SDRAM, DDR-RAM, etc.
Figure 2 Basic Dataflow of the Embedded Firewall System. (Dataflow diagram: input IP packets are split into header and payload; the payload goes to the Packet Buffer; the header goes to Association, which refers to the Active Connections. Unknown connections go to Packet Classification using the Filtering Rules, leading to discarding or to the creation of a new SPI process; existing connections go to one of multiple SPI processes using the SPI Rules, leading to forwarding, discarding, or deletion of the SPI process; forwarded packets pass through Application Filtering.)
2.2.1 Software Structure
(a) Dynamic multiprocessing based on the data-driven computation
Under the dynamic data-driven computation principle, active data belonging to each process can be identified by its tag identifier. Thus, the data-driven processor can execute multiple instances of the same program in a highly parallel manner, even if they are state-transition processes. A state-transition process such as SPI is basically a kind of history-sensitive process in which the next state and the function to be selected are determined only by the current state and the input data. The current state of each process can be represented as tagged token data without storing it in memory. Therefore, such processes can be executed in parallel in the same way as normal functional processes under the data-driven firing rule. However, in the case of multiple state-transition processes, the processor must associate every input packet header with one of the existing connections at the firewall and then process it with the corresponding current state. This kind of association is essential for the dynamic data-driven processor to create, execute, and delete multiprocessing instances associated with the tag identifiers. No existing data-driven processor supports it.
The association process for SPI must accept the following queries: a refer request from every input IP packet, and an update request whenever a connection is established or finished. This calls for a fast search scheme for an associative memory. In our implementation, the associative memory function is realized by a hybrid scheme combining software hashing with a small content-addressable memory (CAM) [8]. Since large CAM modules are still expensive due to their complex memory cells and WTA circuits, our CAM is used only to store the entries that collide in the hashing module, which reduces the required CAM capacity. This is particularly useful for embedded firewalls because the distribution of IP addresses they see is narrower than that seen by perimeter firewalls. In fact, a preliminary evaluation of the proposed scheme indicates that it reduces the search time by 30%–90% in comparison with a software implementation without CAM.
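A minimal software model of such a hybrid association is sketched below. It is an assumption-laden illustration rather than the scheme of [8]: the class name, the one-entry-per-bucket hash table, and the dictionary standing in for the small CAM are all hypothetical choices made to show how collided entries overflow into a small exact-match store.

```python
class HybridAssociativeMemory:
    """Hash table with one slot per bucket; collisions overflow into a small 'CAM'."""

    def __init__(self, n_buckets=1024, cam_capacity=16):
        self.buckets = [None] * n_buckets   # each slot holds one (key, value) pair
        self.cam = {}                       # models the small CAM (exact match)
        self.cam_capacity = cam_capacity

    def _index(self, key):
        return hash(key) % len(self.buckets)

    def update(self, key, value):
        """Insert or overwrite an entry; return False if the CAM is full."""
        if key in self.cam:
            self.cam[key] = value
            return True
        i = self._index(key)
        slot = self.buckets[i]
        if slot is None or slot[0] == key:
            self.buckets[i] = (key, value)
            return True
        if len(self.cam) >= self.cam_capacity:
            return False                    # collided entry cannot be stored
        self.cam[key] = value               # collided entry goes to the CAM
        return True

    def refer(self, key):
        """Look up the value for a key, or None if no connection matches."""
        slot = self.buckets[self._index(key)]
        if slot is not None and slot[0] == key:
            return slot[1]
        return self.cam.get(key)

    def delete(self, key):
        """Remove an entry when its connection is finished."""
        i = self._index(key)
        slot = self.buckets[i]
        if slot is not None and slot[0] == key:
            self.buckets[i] = None
        else:
            self.cam.pop(key, None)
```

In this model the CAM only needs to be as large as the expected number of simultaneous hash collisions, which is the capacity argument made in the text.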
(b) Parallel implementation of filtering functions
As described in 2.1.1, the embedded firewall is required to support static packet classification, stateful packet inspection, and application layer filtering. For static packet filtering, our super-pipelined packet classification scheme can be applied. Although the level-compressed (LC) trie structure used in our classification incurs a slightly higher update cost in exchange for a reduced search space, the classification rules in an embedded firewall are not updated often. It is therefore adopted for static packet filtering in the embedded firewall. As discussed in 2.1.1, the classification fields of layer 4 packets can be reduced to three: one IP address and two TCP/UDP port numbers. Other fields, if needed, can be checked using a simple exact-matching method. Preliminary evaluation estimates that the filtering performance could be from 3.5 M to 5.0 M IP packets/sec. and that the required memory space could be 8 k bytes, assuming a classification rule set of around 2 k entries [9].
Secondly, SPI is required to detect malicious packets by simulating the state transitions of the connection and checking the TCP flags against the current state of the connection. Normally, SPI for an embedded firewall should handle several thousand connections at the same time. Our SPI implementation works at 329 k IP packets/sec. on a single processor [8]. Of course, the performance can be scalably improved in proportion to the number of available processors because of the elegant multiprocessing capability of DDNP.
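The idea that SPI is a history-sensitive process, in which the next state depends only on the current state and the observed TCP flags, can be illustrated with a small transition table. The table below is a deliberately simplified, hypothetical state machine (state names and allowed transitions are assumptions), not the SPI rules used in [8].

```python
# (current state, observed flag) -> next state; any other pair is rejected.
# Simplified, hypothetical transitions for one direction of a TCP connection.
TRANSITIONS = {
    ("NEW", "SYN"): "SYN_SEEN",
    ("SYN_SEEN", "SYN+ACK"): "SYNACK_SEEN",
    ("SYNACK_SEEN", "ACK"): "ESTABLISHED",
    ("ESTABLISHED", "ACK"): "ESTABLISHED",
    ("ESTABLISHED", "FIN"): "CLOSING",
    ("CLOSING", "ACK"): "CLOSED",
}


def spi_step(state, flag):
    """Return the next state, or None if the transition is not legitimate."""
    return TRANSITIONS.get((state, flag))


def inspect(flags):
    """Check a whole flag sequence; True if every transition is legitimate."""
    state = "NEW"
    for flag in flags:
        state = spi_step(state, flag)
        if state is None:
            return False
    return True
```

Since `spi_step` takes the current state as an argument instead of reading it from shared memory, many connections can be stepped independently, which mirrors how the text describes carrying the state in a tagged token.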
Thirdly, application filtering is required to support various protocols such as http/https, smtp, and snmp, analyzing their message contents to find malicious messages. This analysis is a very heavy task and usually needs decryption engines and a syntactic parsing engine. In our feasibility study, simple URL filtering is implemented in software without any specific hardware mechanism. This URL filtering program ran at around 9.1 k IP packets/sec. on a single processor when all http packets were "GET" messages. It could be improved by introducing powerful string-matching instructions in hardware. In addition, an embedded firewall could be made highly functional by utilizing information about the processes running on the local host OS. This means that even if the host is infected by a virus, the embedded firewall can prevent malicious network attacks by checking the originating process of packets. Even if the host intends to act as a malicious intruder using smart VPN software like SoftEther [10], this could be detected using the process information.
Table 1 Evaluation Results on a single processor.
                               Classification   SPI     URL Filtering
Throughput [IP packets/sec.]   3.5 M ∼ 5.0 M    329 K   9.1 K ∼ 1.5 M
Program size [DDNP nodes]      16               443     1027
Figure 3 Basic Architecture of the Embedded Data-Driven Firewall Processor. (Block diagram with Input Data and Output Data. Legend: M: Flow Merging Module; MM: Matching Memory; HSP: History-Sensitive Processing Unit; FP: Functional Processing Unit; CPS: Cache Program Storage; D: Flow Diverting Module; IM: Internal On-chip Memory (CAM, SRAM); EM: External Off-Chip Memory; TF: Timer Function.)
2.2.2 Basic Hardware Architecture
Fig.3 shows the basic hardware architecture of the data-driven firewall processor. An input IP packet is first divided into several data-driven packets, each holding 32 bits of data. Each data-driven packet is then fed from outside the processor into a merging module (M). After passing through M, the packet arrives at a location in the matching memory (MM) and is stored there until its counterpart arrives. In MM, a pair of packets is identified by comparing their tag identifiers and destination node numbers. If a match occurs, the two packets are combined to form an operation packet containing an operation code, a destination node number, a color, and a pair of operands. The operation packet is then delivered to the history-sensitive processing unit (HSP) and the functional processing unit (FP), where the operation(s) indicated by the operation code are performed. After the specified operation is executed, a result packet is generated and sent to the cache program storage (CPS). The next destination of the packet is read from the CPS, and the old destination node number is replaced with the new one. The result packet generated by CPS is fed to a diverting module (D) and switched to an output according to the destination named in the header of the packet.
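The pairing step in the matching memory can be modelled as follows. This is a toy sketch under stated assumptions: the class and method names are hypothetical, the operation code lookup is omitted, and operand order is simply arrival order, which the real hardware may handle differently.

```python
class MatchingMemory:
    """Toy model: hold a packet until its partner with the same key arrives."""

    def __init__(self):
        self.waiting = {}   # (tag, dest_node) -> operand of the first arrival

    def arrive(self, tag, dest_node, operand):
        """Return an operation packet when a pair is complete, else None."""
        key = (tag, dest_node)
        if key in self.waiting:
            first = self.waiting.pop(key)
            # Both operands are present, so the node can fire.
            return {"dest_node": dest_node, "tag": tag, "operands": (first, operand)}
        self.waiting[key] = operand
        return None
```

Packets with different tag identifiers never match each other even when they target the same node, which is what lets multiple instances of one program run concurrently.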
In the firewall processor, two lookup instructions for high-speed classification are implemented in HSP. Furthermore, HSP allows an operation packet to access three kinds of memory modules: an on-chip SRAM module, an on-chip CAM module, and an off-chip SDRAM module. To access the RAM modules flexibly, three memory addressing modes are supported: absolute addressing, addressing relative to the data of the packet, and addressing relative to the tag identifier of the packet. The last mode, which is a unique feature of the data-driven processor, helps operation packets modify data belonging to their own program instance (process), and is very useful in multiprocessing. The small CAM module is used for the efficient association described in section 2.2.1. The off-chip SDRAM is used as an IP packet buffer that temporarily stores the payload of the IP packet. To exploit the burst access mode of SDRAM, a small cache memory is implemented in the interface circuit of the off-chip memory module.
Since network processing often requires bit-wise operations such as extracting each field of the IP header, FP supports some bit-wise instructions. It also supports a timer function to realize various time-out operations.
With the customization of FP and HSP, a specific instruction set suitable for the embedded firewall is realized on DDNP. Using this instruction set, firewall programs on DDNP were developed for our feasibility study. To estimate the total performance of the embedded data-driven firewall processor, layer 4 packet classification, SPI for TCP packets, and URL filtering were chosen and implemented. Their preliminary evaluation results are summarized in Table 1. The table shows that our firewall processor potentially operates at a wire speed of over 100 Mb/s, even when using only a single processor. Furthermore, since performance is also observed to scale with the number of processors, a single chip equipped with 10 processors could be expected to handle a packet stream of over 1 Gb/s.
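The text does not state the packet size behind the 100 Mb/s figure. The sketch below only shows how a packets-per-second figure relates to a link rate under an assumed Ethernet framing (20 bytes of preamble and inter-frame gap per frame); the function name and the frame sizes are assumptions for illustration.

```python
def max_packet_rate(link_bps, frame_bytes=64, overhead_bytes=20):
    """Packets per second needed to fill a link with equal-size Ethernet frames.

    overhead_bytes is the assumed per-frame preamble plus inter-frame gap.
    """
    return link_bps / ((frame_bytes + overhead_bytes) * 8)


# Packet rates that saturate a 100 Mb/s link for the smallest and largest
# standard Ethernet frames (64 and 1518 bytes).
rate_small = max_packet_rate(100e6, frame_bytes=64)
rate_large = max_packet_rate(100e6, frame_bytes=1518)
print(f"64-byte frames:   {rate_small:,.0f} packets/s")
print(f"1518-byte frames: {rate_large:,.0f} packets/s")
```

Comparing these rates with the per-function throughputs in Table 1 indicates which traffic mixes a single processor could keep up with under this framing assumption.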
2.3 Conclusion
This paper presents the design concept of our embedded data-driven firewall processor and describes its software structure and processor architecture. Preliminary evaluation results show that our firewall processor potentially operates at a wire speed of over 100 Mb/s, even when using only a single processor. Furthermore, since performance is also observed to scale with the number of processors, a single chip integrating 10 processors could be expected to handle a packet stream of over 1 Gb/s.
The proposed embedded data-driven firewall processor is one application of our data-driven network processor, DDNP. The DDNP scheme has many advantages, such as elegant parallel multiprocessing capability, natural power-saving capability, and ease of design for SoCs. Furthermore, the self-timed pipeline scheme introduced for the development of DDNP is the most promising scheme for realizing highly functional hardware modules for larger SoCs. For example, an autonomous priority-based queuing chip utilizing a self-timed folded pipeline has been developed for Diffserv queuing [11]. This queuing hardware could be integrated into DDNP to realize a traffic engineering processor. However, such large self-timed systems need efficient development tools such as a high-speed simulator [12] or emulator [13]. These tools, as well as DDNP applications, are to be investigated in our further work.
3. Upper Limit of Step Gain for NLMS Algorithm in Noisy Environments
With advancements in LSI technology, adaptive filtering has recently been put to practical use in noise cancellers, automatic equalizers, echo cancellers, and so forth. In such applications, a fast adaptive algorithm is necessary to achieve real-time processing. For example, an acoustic echo canceller requires thousands of adaptive filter coefficients. Complex algorithms are therefore unsuitable for such systems. Although many algorithms have been proposed that improve convergence speed and estimation accuracy, the adaptive algorithms useful in practice are only simple ones such as the LMS algorithm or the normalized LMS (NLMS) algorithm. Since the NLMS algorithm requires few operations, it is widely used. However, this algorithm behaves unstably when the norm of the input vector becomes close to zero, owing to the division in the NLMS update. One solution to this problem is to add a positive constant to the squared norm of the input vector. Another known way to stabilize the NLMS algorithm is to interrupt the update of the adaptive filter coefficients when the norm of the input vector is smaller than a threshold. Although this method improves stability, the threshold must be chosen so as to obtain the desired properties. The effects of interrupting the coefficient update have been analyzed, yielding a guarantee value (the least upper bound of the convergence value) and the stochastic fastest convergence step gain (SF-NLMS algorithm). However, interrupting the coefficient update reduces the convergence speed. In this paper, we show the upper limit of the step gain that satisfies the guarantee value without interrupting the coefficient update.
In this paper, an (L × N) matrix A and an (N × 1) vector b are denoted by $A_{LN}$ and $b_N$, respectively.
3.1 Normalized LMS Algorithm
We first define the notation and review the NLMS algorithm.
Let us define the input vector
\[ x_N(i) = [x(i), x(i-1), \cdots, x(i-N+1)]^T, \tag{1} \]
and the coefficient vector of the adaptive filter
\[ h_N(i) = [h(1), h(2), \cdots, h(N)]^T, \tag{2} \]
where N and T denote the number of filter coefficients and the transpose of a vector, respectively. The output signal of the adaptive filter is expressed as
\[ y(i) = h_N^T(i)\, x_N(i). \tag{3} \]
Assuming that $w_N$ represents the coefficient vector of the unknown system, the desired signal is given by
\[ d(i) = w_N^T x_N(i), \tag{4} \]
and the output error signal is defined by
\[ e(i) = d(i) - y(i). \tag{5} \]
The NLMS algorithm is given by
\[ h_N(i+1) = h_N(i) + \alpha \frac{x_N(i)}{\|x_N(i)\|^2}\, e(i), \tag{6} \]
where α is the step gain.
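As a concrete reading of eqs. (1) to (6), the following pure-Python sketch identifies a short unknown system with the NLMS update. The function name, the white Gaussian input, the noise-free desired signal, and the parameter values are assumptions chosen for illustration only.

```python
import random


def nlms_identify(w, n_samples, alpha=0.5, seed=0):
    """Estimate the unknown coefficient vector w with the NLMS update of eq. (6)."""
    rng = random.Random(seed)
    n = len(w)
    h = [0.0] * n                 # adaptive filter coefficients h_N(i)
    x = [0.0] * n                 # input vector x_N(i) = [x(i), ..., x(i-N+1)]
    for _ in range(n_samples):
        x = [rng.gauss(0.0, 1.0)] + x[:-1]            # shift in a new sample
        d = sum(wk * xk for wk, xk in zip(w, x))      # desired signal, eq. (4)
        y = sum(hk * xk for hk, xk in zip(h, x))      # filter output, eq. (3)
        e = d - y                                     # error signal, eq. (5)
        norm2 = sum(xk * xk for xk in x)              # squared norm of x_N(i)
        if norm2 > 0.0:                               # avoid division by zero
            h = [hk + alpha * xk / norm2 * e for hk, xk in zip(h, x)]
    return h


if __name__ == "__main__":
    unknown = [0.5, -0.3, 0.2, 0.1]                   # hypothetical unknown system
    print(nlms_identify(unknown, n_samples=2000))
```

The guard on `norm2` is the simplest possible protection against the division in eq. (6); the following sections discuss more principled ways of handling a small input norm.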
3.2 SF-NLMS Algorithm
In this section, we review the SF-NLMS algorithm.
We define the observed output signal as
\[ d'(i) = d(i) + v(i), \tag{7} \]
where v(i) is the observation noise.
3.2.1 Guarantee Value
The criterion for interrupting the coefficient update to stabilize the behavior of the NLMS algorithm is
\[ \|x_N(i)\|^2 \le \varepsilon, \tag{8} \]
where ε is the threshold. If the coefficient update is interrupted according to this criterion, the worst case is that the state $\|x_N(i)\|^2 \simeq \varepsilon$ continues. Thus the converged norm of the weight error vector ($\|\theta'_N(i+1)\|^2 = \|h'_N(i+1) - w_N\|^2$) is smaller than the norm of the weight error vector under the following condition:
\[ E\left[\|x_N(i)\|^2\right] = \varepsilon. \tag{9} \]
The converged norm of the weight error vector is
\[ E\left[\|\theta'_N(\infty)\|^2\right] \approx \frac{\alpha^2 \sigma_v^2}{N \sigma_x^2}\, \frac{e^{\mu}}{e^{\mu} - 1}, \tag{10} \]
and therefore the guarantee value of the norm of the weight error vector is given by
\[ q = \frac{\alpha^2 \sigma_v^2}{\varepsilon}\, \frac{e^{\mu}}{e^{\mu} - 1}, \tag{11} \]
where $\sigma_v^2$ is the variance of the observation noise and
\[ \mu = \frac{2\alpha - \alpha^2}{N}. \tag{12} \]
3.2.2 Threshold of Interrupting Coefficients Update
From (11), the threshold that ensures the guarantee value is given by
\[ \varepsilon = \frac{\alpha^2 \sigma_v^2}{q}\, \frac{e^{\mu}}{e^{\mu} - 1}. \tag{13} \]
3.2.3 Stochastic Fastest Convergence Step Gain
The probability of executing the coefficient update is expressed as
\[ P = \int_{\varepsilon}^{\infty} \frac{1}{\sqrt{4\pi N \sigma_x^4}} \exp\left\{ -\frac{\left(\|x_N(i)\|^2 - N\sigma_x^2\right)^2}{4N\sigma_x^4} \right\} d\|x_N(i)\|^2. \tag{14} \]
The time constant τ is expressed as
\[ \tau = -\frac{1}{P \log_e(1 - \mu)}. \tag{15} \]
The time constant is the number of samples needed to decay to 1/e (e is the base of the natural logarithm). We define the convergence speed as
\[ c = \frac{1}{\tau}. \tag{16} \]
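Eqs. (14) to (16) can be evaluated numerically. In the sketch below the integral of eq. (14) is written with the complementary error function, using the fact that the integrand is a Gaussian density with mean $N\sigma_x^2$ and variance $2N\sigma_x^4$; the function names and argument names are our own.

```python
import math


def update_probability(eps, n, sigma_x2):
    """Eq. (14): Gaussian tail probability that the squared norm exceeds eps."""
    # Mean N*sigma_x^2, standard deviation sqrt(2N)*sigma_x^2, so the argument
    # of erfc is (eps - mean) / (std * sqrt(2)) = (eps - mean) / (2*sqrt(N)*sigma_x^2).
    z = (eps - n * sigma_x2) / (2.0 * math.sqrt(n) * sigma_x2)
    return 0.5 * math.erfc(z)


def time_constant(p, mu):
    """Eq. (15): number of samples needed to decay to 1/e."""
    return -1.0 / (p * math.log(1.0 - mu))


def convergence_speed(alpha, eps, n, sigma_x2):
    """Eq. (16), with mu taken from eq. (12)."""
    mu = (2.0 * alpha - alpha ** 2) / n
    return 1.0 / time_constant(update_probability(eps, n, sigma_x2), mu)
```

For a fixed step gain, a larger threshold lowers P and therefore lowers the convergence speed, which is the trade-off that motivates the search for the fastest step gain.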
Figure 4 Comparison of step gain. (Plot of step gain, from 0 to 1, versus $\|x_N(i)\|^2$; labelled values: ε, $\alpha_F$, $N\sigma_d^2$; curves: Proposed, SF-NLMS.)
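Since the convergence speed c is concave for 0 < α ≤ 1, the stochastic fastest convergence step gain is obtained by setting the derivative $\partial c / \partial \alpha$ to zero. The derivative is given by
\[ \frac{\partial c}{\partial \alpha} = -\frac{2\exp(\lambda)}{N(\alpha - 2)} \left\{ \frac{\left[\alpha^2 - \left(3 + \frac{\iota}{\sigma_x^2}\right)\alpha + 2\right] \exp\left(-\frac{\iota\alpha}{\sigma_x^2(\alpha - 2)}\right)}{\left[\exp\left(-\frac{\iota\alpha}{\sigma_x^2(\alpha - 2)}\right) + \exp(\lambda)\right]^2} + \frac{\exp(\lambda)(\alpha - 1)(\alpha - 2)}{\left[\exp\left(-\frac{\iota\alpha}{\sigma_x^2(\alpha - 2)}\right) + \exp(\lambda)\right]^2} \right\}. \tag{17} \]
The stochastic fastest convergence step gain $\alpha_F$ is
\[ \alpha_F = \frac{2\left\{\lambda - \log_e\left(\frac{\iota}{\sigma_x^2}\right)\right\}}{\lambda\left(1 + \frac{\kappa}{\sigma_x^2}\right) - \log_e\left(\frac{\iota}{\sigma_x^2}\right)}, \tag{18} \]
where
\[ \iota = 0.85\sqrt{2N}\, \frac{\sigma_v^2}{q}, \tag{19} \]
\[ \kappa = \frac{\sigma_v^2}{q}, \tag{20} \]
and
\[ \lambda = 0.85\sqrt{2N}. \tag{21} \]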
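The closed form of eqs. (18) to (21) is straightforward to evaluate. The sketch below reads the fractions in those equations as $\iota/\sigma_x^2$ and $\kappa/\sigma_x^2$ with $\iota = \lambda\kappa$, which is our interpretation of the printed formulas; the function name and the example parameter values are hypothetical.

```python
import math


def sf_nlms_step_gain(n, sigma_x2, sigma_v2, q):
    """Stochastic fastest convergence step gain alpha_F, eq. (18)."""
    lam = 0.85 * math.sqrt(2.0 * n)        # eq. (21)
    kappa = sigma_v2 / q                   # eq. (20)
    iota = lam * kappa                     # eq. (19)
    log_term = math.log(iota / sigma_x2)
    return 2.0 * (lam - log_term) / (lam * (1.0 + kappa / sigma_x2) - log_term)


if __name__ == "__main__":
    # Hypothetical values: N = 100, unit input variance, noise variance 0.01,
    # guarantee value q = 0.01.
    print(sf_nlms_step_gain(n=100, sigma_x2=1.0, sigma_v2=0.01, q=0.01))
```

Under this reading, $\alpha_F$ is exactly the step gain at which the exponent $\iota\alpha / (\sigma_x^2 (2 - \alpha))$ appearing in eq. (17) equals $\lambda - \log_e(\iota/\sigma_x^2)$, which gives a quick consistency check on any implementation.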
3.3 Upper Limit of Step Gain for Guarantee Value
This section modifies the threshold for the guarantee value and derives the upper limit of the step gain that satisfies the guarantee value without interrupting the coefficient update.
3.3.1 Upper Limit of Step Gain
From eq.(11), the guarantee value when the coefficients are updated is
\[ q = \frac{\alpha^2 \sigma_v^2}{\varepsilon}\, \frac{e^{\mu}}{e^{\mu} - 1}. \tag{22} \]
If $N \gg 1$, the guarantee value q can be written as
\[ q = \frac{N\sigma_v^2}{\varepsilon}\, \frac{\alpha}{2 - \alpha}. \tag{23} \]
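The step from eq.(22) to eq.(23) is not spelled out in the text. One way to obtain it, assuming the first-order approximation $e^{\mu} \approx 1 + \mu$ for small μ (our reading, not a derivation given by the authors), is the following.

```latex
% For N >> 1, mu = (2*alpha - alpha^2)/N from eq. (12) is small,
% so e^mu is approximated by 1 + mu and the leading term 1/mu is kept.
\begin{align*}
\frac{e^{\mu}}{e^{\mu}-1}
  &\approx \frac{1+\mu}{\mu} \approx \frac{1}{\mu}
   = \frac{N}{\alpha(2-\alpha)}, \\
q = \frac{\alpha^{2}\sigma_v^{2}}{\varepsilon}\,\frac{e^{\mu}}{e^{\mu}-1}
  &\approx \frac{\alpha^{2}\sigma_v^{2}}{\varepsilon}\,\frac{N}{\alpha(2-\alpha)}
   = \frac{N\sigma_v^{2}}{\varepsilon}\,\frac{\alpha}{2-\alpha}.
\end{align*}
```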
To assure the guarantee value without interrupting the coefficient update, $\|x_N(i)\|^2 \ge \varepsilon$ must always be satisfied. The threshold ε is therefore set to
\[ \varepsilon = \|x_N(i)\|^2. \tag{24} \]
From eq.(23), the guarantee value q becomes
\[ q = \frac{N\sigma_v^2}{\|x_N(i)\|^2}\, \frac{\alpha}{2 - \alpha}, \tag{25} \]
and the step gain α that satisfies this guarantee value is given by
\[ \alpha = \frac{2q\|x_N(i)\|^2}{q\|x_N(i)\|^2 + N\sigma_v^2}. \tag{26} \]
In general the step gain satisfies α ≤ 1, which requires
\[ \frac{2q\|x_N(i)\|^2}{q\|x_N(i)\|^2 + N\sigma_v^2} < 1, \tag{27} \]
or
\[ \|x_N(i)\|^2 < \frac{N\sigma_v^2}{q}; \tag{28} \]
if $\|x_N(i)\|^2 \ge N\sigma_v^2 / q$, then the step gain is set to α = 1.
Consequently, the coefficient update method that satisfies the guarantee value is
\[ h_N(i+1) = h_N(i) + \gamma(i) \frac{x_N(i)}{\|x_N(i)\|^2}\, e(i), \tag{29} \]
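where γ(i) is the upper limit of the step gain:
\[ \gamma(i) = \begin{cases} \dfrac{2q\|x_N(i)\|^2}{q\|x_N(i)\|^2 + N\sigma_v^2}, & \text{if } \|x_N(i)\|^2 < \dfrac{N\sigma_v^2}{q}, \\ 1, & \text{otherwise.} \end{cases} \tag{30} \]
If the guarantee value q is assumed to be
\[ q = \frac{\sigma_v^2}{\sigma_d^2}, \tag{31} \]
the step gain γ(i) for $\|x_N(i)\|^2 < N\sigma_v^2 / q$ is given by
\[ \gamma(i) = \frac{2q\|x_N(i)\|^2}{q\|x_N(i)\|^2 + N\sigma_v^2} = \frac{2\|x_N(i)\|^2}{\|x_N(i)\|^2 + N\sigma_d^2}, \tag{32} \]
where $\sigma_d^2$ is the variance of the desired signal d(i). In this case,
\[ \|x_N(i)\|^2 < \frac{N\sigma_v^2}{q} = N\sigma_d^2, \tag{33} \]
and therefore the step gain γ(i) for $q = \sigma_v^2 / \sigma_d^2$ is given by
\[ \gamma(i) = \begin{cases} \dfrac{2\|x_N(i)\|^2}{\|x_N(i)\|^2 + N\sigma_d^2}, & \text{if } \|x_N(i)\|^2 < N\sigma_d^2, \\ 1, & \text{otherwise.} \end{cases} \tag{34} \]
Figure 4 shows a comparison of the step gain of the proposed method (eq.(34)) and the SF-NLMS algorithm [16].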
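The two forms of the step gain, eq. (30) for a general guarantee value and eq. (34) for $q = \sigma_v^2/\sigma_d^2$, can be written as small functions. The function and argument names below are our own; the formulas follow the equations directly.

```python
def step_gain_upper_limit(norm2, n, sigma_v2, q):
    """gamma(i) of eq. (30) for a general guarantee value q."""
    if norm2 < n * sigma_v2 / q:
        return 2.0 * q * norm2 / (q * norm2 + n * sigma_v2)
    return 1.0


def step_gain_upper_limit_sd(norm2, n, sigma_d2):
    """gamma(i) of eq. (34), i.e. eq. (30) with q = sigma_v^2 / sigma_d^2."""
    if norm2 < n * sigma_d2:
        return 2.0 * norm2 / (norm2 + n * sigma_d2)
    return 1.0


if __name__ == "__main__":
    # Hypothetical values: N = 100, desired-signal variance 0.5.
    for norm2 in (0.0, 10.0, 50.0, 200.0):
        print(norm2, step_gain_upper_limit_sd(norm2, n=100, sigma_d2=0.5))
```

In the low-norm branch of eq. (34), the factor $\gamma(i)/\|x_N(i)\|^2$ used in eq. (29) simplifies to $2/(\|x_N(i)\|^2 + N\sigma_d^2)$, so the update stays bounded as the input norm approaches zero and no interruption is needed.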
3.4 Computer Simulation
This section presents the results of a computer simulation.
The input signal and the observation noise are voiced speech segments sampled at 8 kHz. The S/N is set to 20 dB. The impulse response length of the adaptive filter is 100 (N = 100).
The stochastic fastest convergence step gain of the SF-NLMS algorithm is set to 0.816. The variances of the input signal and the desired signal are calculated, respectively, as
\[ \sigma_x^2 = (1 - b) \sum_{k=0}^{\infty} b^k x^2(i - k), \tag{35} \]
\[ \sigma_d^2 = (1 - b) \sum_{k=0}^{\infty} b^k d^2(i - k), \tag{36} \]
where b is set to 0.9999.
Figure 5 Comparison of convergence properties. (Two panels, (a) and (b); horizontal axis: Time [s], 0 to 20; vertical axis: Norm of weight error vector [dB], -30 to 0; curves: Proposed, SF-NLMS.)
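The exponentially weighted sums of eqs. (35) and (36) do not need to be recomputed from scratch at every sample: they satisfy the recursion $\sigma^2(i) = b\,\sigma^2(i-1) + (1-b)\,x^2(i)$. The sketch below uses this recursion, assuming all samples before the start of the sequence are zero; the function name is our own.

```python
def running_variance(samples, b=0.9999):
    """Recursive form of eqs. (35)/(36): est(i) = b*est(i-1) + (1-b)*x(i)^2."""
    est = 0.0                     # samples before the sequence are assumed zero
    out = []
    for x in samples:
        est = b * est + (1.0 - b) * x * x
        out.append(est)
    return out
```

With b = 0.9999 the effective averaging window is on the order of 1/(1 - b) = 10000 samples, which corresponds to about 1.25 s at the 8 kHz sampling rate used in the simulation.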
Figure 5 shows the convergence properties.
The proposed algorithm surpasses the SF-NLMS
algorithm in tracking ability.
3.5 Conclusion
This paper has shown the upper limit of the step gain that satisfies the guarantee value without interrupting the coefficient update. The proposed method improves on the convergence speed of the SF-NLMS algorithm. The proposed method is clearly very useful for acoustic systems, particularly when the observation noise is unsteady.
Acknowledgements
This research project is supported by the Na-
tional Institute of Information and Communica-
tions Technology (NICT).
References
[1] R. Zalenski, “Firewall technologies,” IEEE Potentials, Vol. 21, No. 1, pp. 24–29, Feb. 2002.
[2] J. McHugh, A. Christie, and J. Allen, “Defending Yourself: The Role of Intrusion Detection Systems,” IEEE Software, Vol. 17, No. 5, pp. 42–51, Sep. 2000.
[3] S. Ioannidis, A. D. Keromytis, S. M. Bellovin, and J. M. Smith, “Implementing a Distributed Firewall,” ACM Conf. on Computer and Communications Security, pp. 190–199, Nov. 2000.
[4] H. Terada, S. Miyata, and M. Iwata, “DDMP’s: Self-Timed Super-Pipelined Data-Driven Processors,” Proceedings of the IEEE, Vol. 87, No. 2, pp. 282–296, Feb. 1999.
[5] C. Payne and T. Markham, “Architecture and Applications for a Distributed Embedded Firewall,” 17th Computer Security Applications Conf. (ACSAC), pp. 329–336, Dec. 2001.
[6] D. Morikawa, M. Iwata, H. Hayashi, and H. Terada, “Superpipelined IP-Address Lookups on a Data-Driven Network Processor,” Int. Conf. on Parallel and Distributed Computing and Systems, pp. 431–436, Aug. 2001.
[7] D. Morikawa, M. Iwata, and H. Terada, “Super-Pipelined Implementation of IP Packet Classification,” Journal of Intelligent Automation and Soft Computing, Vol. 10, No. 2, pp. 175–184, Aug. 2004.
[8] R. Zhang, M. Iwata, Y. Shirane, T. Asahiyama, W. Su, and Y. Zheng, “High Speed Stateful Packet Inspection in Embedded Data-Driven Firewall,” NEINE’04, Sep. 2004 (to be presented).
[9] D. Morikawa, T. Matsumoto, and M. Iwata, “Fast Packet Filtering in Data-Driven Embedded Firewall,” NEINE’04, Sep. 2004 (to be presented).
[10] SoftEther Corp., “SoftEther VPN System,” www.softether.com, ver. 1.01, Sep. 2004.
[11] M. Iwata, M. Ogura, Y. Ohishi, H. Hayashi, and H. Terada, “100MPacket/s Fully Self-Timed Priority Queue: FQ,” Int. Solid-State Circuits Conf. (ISSCC) 2004, Session 8, No. 1, San Francisco, CA, U.S.A., Feb. 2004.
[12] S. Sannomiya, Y. Ohmori, and M. Iwata, “A Macroscopic Behavior Model for Self-Timed Pipeline Systems,” 17th Workshop on Parallel and Distributed Simulation (PADS 2003), pp. 133–140, San Diego, CA, U.S.A., June 2003.
[13] S. Ogasawara, S. Sannomiya, Y. Omori, and M. Iwata, “An On-Chip Trace-Driven Emulation for Self-Timed Data-Driven Processors,” NEINE’04, Sep. 2004 (to be presented).
[14] J. Nagumo and A. Noda, “A learning identification method for system identification,” IEEE Trans. Automat. Contr., Vol. 63, pp. 1692–1716, Dec. 1975.
[15] M. Fukumoto, H. Kubota, and S. Tsujii, “Improvement in stability and convergence speed on normalized LMS algorithm,” Proc. IEEE ISCAS ’95, Vol. 2, pp. 1243–1246, Seattle, WA, May 1995.
[16] M. Fukumoto, H. Kubota, and S. Tsujii, “Simplification of stochastic fastest NLMS algorithm,” Proc. IEEE ISCAS ’99, 38.7, Orlando, FL, June 1999.
