TOWARDS A GENERIC PROGRAMMING 
MODEL FOR NETWORK PROCESSORS 
Kevin Lee, Geoff Coulson, Gordon Blair, Ackbar Joolia, Jo Ueyama 
Lancaster University, Lancaster, LA1 4YR, UK 
Abstract: Network Processors (NPs) are emerging as a cost-effective network element technology that can be more readily updated and evolved than custom hardware or ASIC-based designs. Moreover, NPs promise support for run-time reconfiguration of low-level networking software. However, it is notoriously difficult to develop software for NPs because of their complex design, architectural heterogeneity, and demanding performance constraints. In this paper we present a runtime component-based approach to programming NPs. The approach promotes conceptual uniformity and design portability across a wide variety of NP types while simultaneously exploiting hardware assists that are specific to individual NPs. To show how our approach can be applied to a wide range of NP types, we characterise the design space of NPs and demonstrate the applicability of our concepts to the various classes identified. Then, as a detailed case study, we focus on programming the Intel IXP1200 NP. This demonstrates that our approach can be applied effectively, e.g. in terms of performance, in a demanding real-world NP environment.
I. INTRODUCTION 
Network Processors (NPs) are an attempt by hardware vendors to fulfill the growing need for low-priced, specialised network hardware elements that are more future-proof than conventional custom hardware or ASIC-based designs, and that can be applied in a wider range of situations (e.g. in networked devices, as edge-network routers, and even in the network core). In addition, some see NPs as potential vehicles for deploying technologies derived from active networking [1], which exploit the potential of NPs for run-time software reconfiguration. Architecturally, NPs are multiprocessor-based hardware units that support a number of network ports and provide software-programmable packet processing facilities. They can perform relatively complex packet processing at line speed.
There is a downside to current NP designs, however: they are notoriously difficult to program [2], [3]. This is because of their complex design (e.g. multiple processors, including both general-purpose and specialised processors, and multiple memory and interconnect technologies), their extreme architectural heterogeneity across vendors and products [4], and their demanding performance constraints.
As a result, NPs often exhibit richly-featured hardware designs that remain underexploited by software [5], and their extreme heterogeneity tends to inhibit the transfer of software, software designs, or even skills across brands. The problem is exacerbated by the need for high performance and runtime reconfiguration, both of which add considerably to software complexity. Indeed, because of this complexity, many NP software toolkits provide no support at all for runtime reconfiguration.
The aim of the research discussed in this paper is to develop a generic programming model for NPs that accommodates complex architectures and architectural heterogeneity while also supporting design portability, high performance and runtime reconfigurability. Our approach is based on a runtime software component model. This promotes conceptual uniformity and design portability across a wide variety of NP types while simultaneously exploiting hardware assists that are specific to individual NPs. It features a distributed runtime with a low memory footprint, employs formally specified interfaces, supports components written in different programming languages, and uniformly abstracts over different processor types and different inter-processor communication mechanisms without loss of performance. It also explicitly supports run-time reconfiguration of software.
The remainder of the paper is structured as follows. In section II, we characterise the design space of NPs as a basis for arguing the genericity of our approach, and survey a number of existing programming models provided both by NP manufacturers and by independent researchers. In section III, we present our approach to programming NPs and show how it improves on existing approaches. Then, in section IV, we provide a detailed case study of the application of our approach to the Intel IXP1200 NP. Finally, in section V we offer our conclusions.
II. NETWORK PROCESSORS
A. Classification 
As mentioned, the field of NPs is notable for its great architectural heterogeneity. In general, however, it can safely be said that NPs universally provide programmable support for processing packets, and that this usually takes the form of one or more packet processors. These can be supported either on a single chip or across multiple chips. In addition, NPs universally support a number of MAC-level ports, some memory, and some form or forms of processor interconnect.
In this section we attempt to capture the design space of NPs in terms of a small number of orthogonal dimensions. In particular, we have chosen four key dimensions which, we believe, most usefully partition the NP design space. These are:
0-7803-8783-X/04/$20.00 © 2004 IEEE 504
Fig. 1. The Intel IXP1200 (from [7])
• the packet processor dimension: the range of types of packet processors supported by an architecture;
• the memory architecture dimension: the range of technologies and organisations of the memory provided;
• the interconnect dimension: the range of interconnect technologies employed;
• the control and management dimension: the degree of support for centralised control and management.
We also demonstrate how some prominent NP products map to this space. In so doing, we lay the groundwork for a discussion of how our component-based programming approach can accommodate the full diversity of NPs.
1) The Packet Processor Dimension: Most NPs feature multiple packet processors, but their nature can vary from CPUs with very general instruction sets to single-purpose, non-programmable units dedicated to, e.g., checksumming or hashing. Furthermore, some NPs feature only one type of packet processor while others support a number of different types.
For example, the Intel IXP1200 NP [6] (see figure 1) supports a uniform set of six so-called microengines which serve as packet processors. These are 233-600 MHz CPUs whose instruction set includes I/O to/from MAC ports, packet queuing support, and checksumming. They support hardware threads with zero context-switch overhead and can be programmed either in assembler or in C. The IXP1200 also includes a general-purpose StrongARM CPU which serves as a controller and typically also performs slow-path operations.
On the other hand, the Motorola C-Port [8] employs so-called channel processors, which are generic packet processors grouped in sets of four that share an area of fast memory. In addition, it supports a range of dedicated, non-programmable processors that perform functions such as queue management, table lookup, and buffer management.
As a third example, the EZChip NP-1 [9] has no fully generic processors. Rather, it employs dedicated packet processors that perform specific tasks such as parsing packets, table lookup or packet modification. Although these are dedicated to their given 'domain', they are quite flexible and programmable within that domain.
2) The Memory Dimension: Memory is used in all the fundamental operations of an NP, including packet storage, table lookup, queuing and synchronisation. Different memory types typically differ in size and speed, whereas memory organisations differ in their degree of centralisation and in their accessibility from different packet processors.
Memory types and organisations greatly affect the structure of NP software. To deal with the memory organisation of a particular platform, the programmer has to choose the best memory-use strategy for each operation. For example, when creating a flow table for high-speed connections, an Intel IXP1200 programmer might choose on-chip scratch memory, whereas an IBM PowerNP [10] programmer might use that architecture's high-speed internal SRAM.
3) The Interconnect Dimension: Different NPs provide different mechanisms for inter-processor communication, such as shared registers, buses (of varying types), shared memory (perhaps of a range of types that make different trade-offs between capacity and speed), and dedicated channels.
For example, the IXP1200 provides a fast bus for communication between its microengines, MAC ports and memory. It also provides shared registers and a range of memory types (SRAM and SDRAM). The shared registers and memory are typically used together at the software level to realise inter-processor communication. Intel's newer IXP2400 NP also provides 'next-neighbour' registers that form a dedicated interconnect between two 'adjacent' microengines.
The Motorola C-Port employs shared fast memory to interconnect grouped channel processors (as mentioned above). It also employs multiple onboard buses for communication between these groups, and shared memory that is managed by a dedicated processor.
Unlike the two examples above, the EZChip offers a very static and limited interconnect which arranges the packet processors in a strict pipeline topology. The Cisco PXF [11] uses a variant of this approach: it offers multiple parallel pipelines and some capability for communication between pipelines. Clearly, these architectures are less flexible, although potentially faster, than the bus-based interconnects discussed above.
4) The Control and Management Dimension: Apart from the genericity/specificity of their packet processors, different NPs make different choices regarding the centralisation or decentralisation of control and management. For example, some NPs rely exclusively on external control in the form of a host workstation. Others (e.g. the IXP1200) incorporate a commodity CPU on the NP itself which runs an operating system, and others support packet processors that are sufficiently powerful and general for any of them to serve as a locus of control and management.
The IXP1200's on-board StrongARM CPU runs a commodity OS such as Linux. As well as handling slow-path packet processing, the StrongARM is responsible for loading code onto the microengines and stopping and starting them as required.
The Motorola C-Port, on the other hand, has no built-in 
centralised controller. Instead, it relies on a host workstation 
to load and supervise the operation of its ‘channel controller’ 
packet processors. Nevertheless, it is theoretically possible to 
dedicate one of the channel controllers to take the supervisory 
role, especially if fine-grained dynamic reconfiguration of the 
NP is a goal. 
Similarly, the EZChip relies on a host workstation for 
control and management. In this case, there is no alternative 
because dedicating one of the packet processors, even if 
possible (cf. their lack of generality), would introduce an 
unacceptable bottleneck in the pipeline. 
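To make the classification concrete, the three example NPs can be recorded along the four dimensions in a small data structure. The Python sketch below is only an illustrative encoding: the field names and the `Control` categories are our own labels, and each entry records just what the preceding text says (for instance, the EZChip memory organisation is left empty because it is not described here).

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class Control(Enum):
    """Possible loci of control and management (control and management dimension)."""
    HOST = "external host workstation"
    ONBOARD_CPU = "commodity CPU on the NP"
    PACKET_PROCESSOR = "a sufficiently general packet processor"


@dataclass(frozen=True)
class NPProfile:
    name: str
    generic_processors: Tuple[str, ...]    # packet processor dimension: fully programmable
    dedicated_processors: Tuple[str, ...]  # packet processor dimension: task-specific units
    memories: Tuple[str, ...]              # memory dimension
    interconnects: Tuple[str, ...]         # interconnect dimension
    control: Tuple[Control, ...]           # control dimension: feasible loci of control


IXP1200 = NPProfile(
    name="Intel IXP1200",
    generic_processors=("6 x microengine",),
    dedicated_processors=(),
    memories=("on-chip scratch", "SRAM", "SDRAM"),
    interconnects=("fast bus", "shared registers", "shared memory"),
    control=(Control.ONBOARD_CPU,),  # the StrongARM
)

C_PORT = NPProfile(
    name="Motorola C-Port",
    generic_processors=("channel processors (groups of four)",),
    dedicated_processors=("queue management", "table lookup", "buffer management"),
    memories=("shared fast memory per group", "managed shared memory"),
    interconnects=("shared fast memory", "onboard buses", "managed shared memory"),
    # Host control is the norm; a channel processor could in theory supervise.
    control=(Control.HOST, Control.PACKET_PROCESSOR),
)

EZCHIP_NP1 = NPProfile(
    name="EZChip NP-1",
    generic_processors=(),
    dedicated_processors=("parsing", "table lookup", "packet modification"),
    memories=(),  # not described in the text
    interconnects=("strict pipeline",),
    control=(Control.HOST,),
)

PROFILES = (IXP1200, C_PORT, EZCHIP_NP1)


def without_generic_processors(profiles: Tuple[NPProfile, ...]) -> List[str]:
    """Names of NPs that offer no fully generic packet processor."""
    return [p.name for p in profiles if not p.generic_processors]


print(without_generic_processors(PROFILES))  # ['EZChip NP-1']
```

Such a table is what a generic programming model has to cover: any abstraction that silently assumes one row (e.g. an on-board control CPU) will not carry over to the others.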
B. Software for Network Processors
The provision of software development environments for 
different NPs is almost as diverse as NP hardware architecture. 
In this section we examine both proprietary and research- 
derived programming environments and show that each is hard 
to generalise beyond the specific architecture at which it is 
targeted. 
In terms of proprietary software, we focus on programming models and development environments for the IXP1200 and the IBM PowerNP. Information on the software environments used by other NPs is unfortunately hard to obtain without signing non-disclosure agreements.
Intel's MicroACE [12] is targeted at the IXP1200 and other Intel IXA products. In this model, proxy-like software elements (called active computing elements, or ACEs) on the IXP1200's StrongARM control processor are 'mirrored' by blocks of code (called microblocks) that run on microengines. Thanks to this mirroring, when the programmer loads a StrongARM element, the corresponding microblock is transparently loaded onto a microengine as a side effect. The microblock can choose to offload packets to its associated ACE for handling on the slow path.
Although it provides a useful degree of abstraction, the MicroACE approach is limited to IXP1200-like architectures that employ a tightly integrated control processor. Furthermore, the model leaves linkages between microblocks implicit in the way the microblocks are written: it is not possible to combine microblocks in unanticipated topologies or to exploit interconnect mechanisms other than those explicitly chosen by the microblock author. Also, the ACE approach cannot be used to perform dynamic software reconfiguration, as it takes no account of the integrity of a running configuration: if a component is replaced, a neighbouring component will inevitably fail because components expect to interact directly.
Teja NP [13] is another commercial product targeted at the IXP1200, although it also runs on the IBM PowerNP series, which is architecturally very similar to the IXP1200 (at least in terms of our classification scheme). Rather than offer an abstract programming model like MicroACE, Teja focuses on providing an integrated tool chain and development environment. Although this eases the development of NP software, it provides minimal architectural abstraction and therefore minimal design portability.
Turning to research-derived programming environments, NetBind [14] provides the abstraction of a set of packet-processing components that can be bound into a data path. This is done by adopting the convention of a standard entry and exit instruction sequence for microblocks, and by offering the capability to dynamically 'morph' jump instructions in these sequences so that execution is transferred to the entry point of the microblock to be executed next. This separates the raw functionality of a microblock from the way it is composed with others, and also gives the NetBind programmer the ability to dynamically reconfigure compositions of microblocks.
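The jump-morphing idea can be illustrated with a toy model of an instruction store. In the Python sketch below, each hypothetical microblock ends with a standard exit jump whose target is initially unset; binding rewrites ('morphs') that jump to point at the next block's entry point, and rebinding reconfigures the data path without touching any block body. This is a simulation under our own simplifying assumptions, not NetBind's code or the real microengine instruction format.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Instr:
    op: str                      # "work" (stands in for real microcode) or "jump"
    arg: Optional[object] = None


class InstructionStore:
    """Toy microengine instruction store holding several microblocks."""

    def __init__(self) -> None:
        self.code: List[Instr] = []
        self.blocks: Dict[str, Tuple[int, int]] = {}  # name -> (entry, exit-jump address)

    def add_block(self, name: str, body: List[str]) -> None:
        entry = len(self.code)
        self.code.extend(Instr("work", step) for step in body)
        exit_addr = len(self.code)
        self.code.append(Instr("jump"))  # standard exit sequence; target not yet bound
        self.blocks[name] = (entry, exit_addr)

    def bind(self, src: str, dst: str) -> None:
        """Morph src's exit jump so that execution continues at dst's entry point."""
        _, exit_addr = self.blocks[src]
        self.code[exit_addr].arg = self.blocks[dst][0]

    def run(self, start: str) -> List[str]:
        """Execute from a block's entry; an unbound exit jump ends the data path."""
        trace: List[str] = []
        pc: Optional[int] = self.blocks[start][0]
        while pc is not None:
            instr = self.code[pc]
            if instr.op == "jump":
                pc = instr.arg
            else:
                trace.append(instr.arg)
                pc += 1
        return trace


store = InstructionStore()
store.add_block("classify", ["parse header", "classify"])
store.add_block("forward", ["route lookup", "enqueue"])
store.add_block("drop", ["free buffer"])

store.bind("classify", "forward")
print(store.run("classify"))  # ['parse header', 'classify', 'route lookup', 'enqueue']

store.bind("classify", "drop")  # dynamic reconfiguration: re-morph a single jump
print(store.run("classify"))  # ['parse header', 'classify', 'free buffer']
```

Because the composition lives only in the exit jumps, the block bodies stay untouched; note that this toy, like NetBind, has no notion of multiple instances of a block.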
NetBind goes beyond MicroACE in supporting flexible composition of microblocks, but it offers no abstraction over the NP's memory organisation, its interconnects, or the different sorts of processors (e.g. the microengines, StrongARM, and workstation host of an IXP1200-based router). It therefore offers no more design portability across different NPs than MicroACE.
NP-Click [15] is another component-based programming model for NPs; it is derived from an earlier PC-based software router model called Click. Again, NP-Click has been primarily targeted at the IXP1200. It is founded on a much richer model of components than NetBind. Whereas NetBind microblocks communicate over low-level, untyped entry and exit points, Click components have typed ports, and connections between ports can be designated as either 'push' or 'pull', which provides declarative control over flow of control and threading. In addition, NP-Click abstracts, to a degree, over the different memory technologies offered by the IXP1200 by providing keywords such as 'global', 'regional' or 'local', which cause the associated component to be automatically allocated an appropriate memory type. Furthermore, it provides low-level abstractions such as malloc() and free() to facilitate and manage the allocation of NP resources such as microengine LIFO registers.
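The push/pull distinction determines which side of a connection initiates each packet transfer. The Python sketch below is a generic illustration of Click-style push and pull connections; the class names echo Click's FromDevice, Counter, Queue and ToDevice elements but are simplified, hypothetical stand-ins, and the sketch does not model how NP-Click maps threads onto microengines.

```python
from collections import deque
from typing import Deque, Optional


class Queue:
    """Push input, pull output: the usual boundary between push and pull paths."""

    def __init__(self) -> None:
        self._packets: Deque[str] = deque()

    def push(self, pkt: str) -> None:
        self._packets.append(pkt)

    def pull(self) -> Optional[str]:
        return self._packets.popleft() if self._packets else None


class Counter:
    """A push element: counts each packet, then pushes it downstream."""

    def __init__(self, downstream: Queue) -> None:
        self.downstream = downstream
        self.count = 0

    def push(self, pkt: str) -> None:
        self.count += 1
        self.downstream.push(pkt)


class FromDevice:
    """Push output: the upstream element drives the transfer (e.g. on packet arrival)."""

    def __init__(self, downstream: Counter) -> None:
        self.downstream = downstream

    def receive(self, pkt: str) -> None:
        self.downstream.push(pkt)


class ToDevice:
    """Pull input: the downstream element drives the transfer (e.g. when the link is free)."""

    def __init__(self, upstream: Queue) -> None:
        self.upstream = upstream

    def transmit_one(self) -> Optional[str]:
        return self.upstream.pull()


queue = Queue()
counter = Counter(queue)
rx = FromDevice(counter)
tx = ToDevice(queue)

rx.receive("pkt-1")
rx.receive("pkt-2")
print(counter.count, tx.transmit_one(), tx.transmit_one(), tx.transmit_one())
# 2 pkt-1 pkt-2 None
```

The queue is where the thread of control changes hands: packets enter on the receiver's initiative and leave on the transmitter's, which is the kind of decision the push/pull designation makes explicit.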
NP-Click does a much better job of abstracting NP architecture than NetBind, but it still falls short of providing a generic approach to NP programming. While it abstracts particular features of the IXP1200, it has no notion of abstracting arbitrary architectures in a principled way, and thereby of encouraging design portability and transferable skills across NP types. That is, there is no necessary commonality between the abstractions provided over different architectures (e.g. NPs other than the IXP1200 may not use LIFOs). In addition, NP-Click provides no support for dynamic reconfiguration.
VERA [16], [17] is an extensible software router architecture that comprises three layers: the top layer provides the abstraction of a router, the bottom layer abstracts the hardware, and a middle 'distributed operating system' layer mediates between the two. The middle layer organises the available packet processors into a hierarchy of levels. Initial classification occurs on a 'low level' processor attached to the MAC port, and if the packet requires further or more complex processing, a 'higher level' processor is used.
Fig. 2. Illustration of capsules and caplets
This provides a high degree of abstraction, but it is heavily dependent on the IXP1200 architecture. For example, it is hard to see how the same abstraction of levels could be maintained on the EZChip NP (see section II).
Apart from the work discussed above, additional research has focused on creating toolsets for specific NPs, such as C compilers, simulators, debuggers and benchmarking tools; some of this work is described in [18], [19], [20]. Like Teja, however, this work focuses on making tools more usable rather than on providing NP-tailored programming models that promote design portability and transferable programming skills.
Finally, the Network Processor Forum [21], an industry consortium that aims to facilitate and accelerate the development of NP products, is starting to take an interest in NP programming interface standardisation. To date its focus has been on hardware-level interoperability, but it has recently announced the formation of a study group that will define a software API for network-computing applications. However, it is not yet envisaged that this API will address low-level programming of individual NP products.
III. TOWARDS A GENERIC PROGRAMMING MODEL FOR NETWORK PROCESSORS
A. Overview of the OpenCOM Component Model 
A high-level view of our proposed component model, called OpenCOM [22], is given in figure 2. This depicts the central concepts of components (the filled circles), capsules (the outer dotted box), caplets (the inner dotted boxes), interfaces (the small circles), receptacles (the small cups), and bindings (the implied association between adjacent interfaces and receptacles).
Components, Capsules and Caplets: Components are encapsulated units of functionality and deployment that interact with their environment (i.e. other components) exclusively through interfaces and receptacles. Components carry negligible inherent overhead and can effectively be used in extremely fine-grained compositions. Crucially, OpenCOM is a runtime component model, meaning that (unlike, say, NP-Click) components can be dynamically deployed at any time during run-time. The locus of deployment is either a capsule or a caplet. Both of these concepts represent a scope for component deployment; caplets are sub-scopes of capsules (and can be nested to arbitrary depth). In principle (if the deployment environment permits), caplets can be created and destroyed at run-time. Different caplets can also host components written in different languages (e.g. to accommodate interpreted languages like Java, or to accommodate different machine languages where caplets run on different CPU types).
Each capsule offers a simple run-time API for component lifecycle management (i.e. loading components into the capsule, and instantiating and destroying them) and for interface/receptacle binding (see below). To accomplish loading, the model supports the notion of plug-in loaders. New loaders with different behaviours can be added at run-time, and they can be selected according to their particular properties. Examples are given below. Importantly, the loading of components into a capsule can be requested by any component hosted by the capsule, no matter which caplet is hosting it (this is referred to as third-party deployment).
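The capsule/caplet scoping, run-time registration of plug-in loaders, and third-party deployment can be sketched as follows. This is a minimal Python model under our own assumptions: it is not the OpenCOM API, and the names `Capsule`, `Caplet`, `Loader`, the `cpu:` property strings and the `place` hook are all hypothetical. Loaders are selected here by a single property (the CPU type of the target caplet).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet, List, Optional


@dataclass
class Caplet:
    """A deployment sub-scope of a capsule, tagged with the CPU type it runs on."""
    name: str
    cpu_type: str
    components: Dict[str, str] = field(default_factory=dict)  # instance id -> type


@dataclass
class Loader:
    """A plug-in loader; 'properties' are what the capsule uses to select it."""
    name: str
    properties: FrozenSet[str]
    place: Callable[[Caplet, str], None]  # hypothetical hook doing the real work


class Capsule:
    def __init__(self) -> None:
        self.caplets: Dict[str, Caplet] = {}
        self.loaders: List[Loader] = []
        self._next_id = 0

    def add_caplet(self, caplet: Caplet) -> None:
        self.caplets[caplet.name] = caplet

    def register_loader(self, loader: Loader) -> None:
        # New loaders may be plugged in while the system is running.
        self.loaders.append(loader)

    def hosts(self, instance_id: str) -> bool:
        return any(instance_id in c.components for c in self.caplets.values())

    def load(self, component_type: str, caplet_name: str,
             requested_by: Optional[str] = None) -> str:
        # Third-party deployment: the requester may sit in any caplet of this capsule.
        if requested_by is not None and not self.hosts(requested_by):
            raise PermissionError(f"{requested_by} is not hosted in this capsule")
        caplet = self.caplets[caplet_name]
        wanted = f"cpu:{caplet.cpu_type}"
        loader = next((ld for ld in self.loaders if wanted in ld.properties), None)
        if loader is None:
            raise LookupError(f"no loader with property {wanted}")
        loader.place(caplet, component_type)
        self._next_id += 1
        instance_id = f"{component_type}#{self._next_id}"
        caplet.components[instance_id] = component_type
        return instance_id


placed = []
capsule = Capsule()
capsule.add_caplet(Caplet("arm", "strongarm"))
capsule.add_caplet(Caplet("me0", "microengine"))
capsule.register_loader(Loader("arm-loader", frozenset({"cpu:strongarm"}),
                               lambda c, t: placed.append((c.name, t))))
capsule.register_loader(Loader("me-loader", frozenset({"cpu:microengine"}),
                               lambda c, t: placed.append((c.name, t))))

manager = capsule.load("FlowManager", "arm")
fwd = capsule.load("Forwarder", "me0", requested_by=manager)  # third-party load
print(manager, fwd, placed)
# FlowManager#1 Forwarder#2 [('arm', 'FlowManager'), ('me0', 'Forwarder')]
```

A component on the StrongARM caplet deploys a component into a microengine caplet here; the capsule, not the requester, decides which loader does the work.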
Interfaces and Receptacles: Interfaces are units of service provision offered by components; they are expressed as sets of operation signatures and associated datatypes. For language independence, OMG IDL [23] is used as the specification language. As in Click, components can support multiple interfaces: this is useful in recognising separations of concerns (e.g. between base functionality and management). Receptacles are 'anti-interfaces' used to make explicit the dependencies of components on other components: whereas an interface represents a unit of service provision, a receptacle represents a unit of service requirement. Receptacles are key to supporting a third-party style of composition (to complement the third-party deployment referred to above): when third-party-deploying a component into a capsule, one knows, by looking at the component's receptacles, precisely which other component types must be present to satisfy its dependencies.
Bindings: Finally, bindings are associations between a single interface and a single receptacle that reside in a common capsule (but not necessarily a common caplet). Similarly to plug-in loaders, OpenCOM also supports a notion of plug-in binders. Again, the idea is to give access to an extensible range of binding mechanisms with varying characteristics (see below for examples). As mentioned, the creation of bindings is inherently third-party in nature; it can be performed by any party within the capsule (i.e. not only by the 'first-party' components whose interface or receptacle participates in the binding).
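The rules for interfaces, receptacles and bindings can be captured in a few lines. In the Python sketch below (our own model, not OpenCOM's API), a component's unbound receptacles list exactly what it still needs, `bind` may be called by any party, a binding is refused unless both ends share a capsule, and, when no binder is named, one is chosen from the components' caplet locations. The interface type `IPacketSink` and the binder names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Component:
    name: str
    capsule: str
    caplet: str
    interfaces: Dict[str, object] = field(default_factory=dict)  # provided: type -> impl
    receptacles: Dict[str, Optional["Binding"]] = field(default_factory=dict)  # required

    def unsatisfied(self) -> List[str]:
        """Receptacle types that still need a provider (visible before deployment)."""
        return [t for t, b in self.receptacles.items() if b is None]


@dataclass
class Binding:
    binder: str
    interface_type: str
    provider: Component
    user: Component


def bind(user: Component, receptacle_type: str, provider: Component,
         binder: Optional[str] = None) -> Binding:
    """Third-party binding: any party in the capsule may call this."""
    if user.capsule != provider.capsule:
        raise ValueError("interface and receptacle must reside in a common capsule")
    if receptacle_type not in user.receptacles:
        raise KeyError(f"{user.name} has no receptacle {receptacle_type}")
    if receptacle_type not in provider.interfaces:
        raise TypeError(f"{provider.name} does not provide {receptacle_type}")
    if binder is None:
        # Transparent selection from location meta-data (hypothetical binder names).
        binder = "intra-caplet" if user.caplet == provider.caplet else "inter-caplet"
    binding = Binding(binder, receptacle_type, provider, user)
    user.receptacles[receptacle_type] = binding
    return binding


classifier = Component("classifier", "np", "me0", receptacles={"IPacketSink": None})
forwarder = Component("forwarder", "np", "me1", interfaces={"IPacketSink": object()})
print(classifier.unsatisfied())            # ['IPacketSink']
b = bind(classifier, "IPacketSink", forwarder)
print(b.binder, classifier.unsatisfied())  # inter-caplet []
```

Passing an explicit `binder` argument corresponds to opting out of transparency and choosing a particular mechanism.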
B. Applying OpenCOM in NPs 
We now consider how the above concepts can be applied in 
the diverse range of NP types characterised in section 11. First, 
the scoping-related notions of capsules and caplets are useful 
in distinguishing different processors and types of processors 
on the NP in a generic manner (cf. the packet processor 
dimension). For example, in an IXP1200, we might map a 
single capsule onto the entire NP, and sub-scope individual mi- 
croengines, and the StrongARM control processor, as caplets. 
The capsule runtime in such a mapping would reside on the 
StrongARM where it  could run in a standard operating system 
environment. An alternative mapping could encapsulate all the 
SO7 
microengines within a single caplet. Furthermore, a plug-in 
loader associated with this caplet could perform intelligent 
load balancing of components across microengines, thus pro- 
viding a higher level of abstraction than the first alternative. 
The notion of caplets is also useful in isolating untrusted code, 
which is important in active networking environments. For 
example, a Java sandbox could be isolated as a caplet. 
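The text does not say how such a load-balancing loader would choose a microengine. The Python sketch below assumes one simple policy, least-loaded placement by free instruction-store space, purely for illustration; the class name, the capacity figure and the component sizes are hypothetical.

```python
from typing import Dict, List, Tuple


class LoadBalancingLoader:
    """Hypothetical loader for a single caplet that spans several microengines."""

    def __init__(self, engines: List[str], capacity: int) -> None:
        self.free: Dict[str, int] = {e: capacity for e in engines}  # free instruction words
        self.placement: Dict[str, Tuple[str, int]] = {}             # component -> (engine, size)

    def load(self, component: str, size: int) -> str:
        # Least-loaded policy: pick the engine with the most free space.
        # max() returns the first engine among equals, so ties go to list order.
        engine = max(self.free, key=self.free.__getitem__)
        if self.free[engine] < size:
            raise MemoryError(f"no microengine has room for {component}")
        self.free[engine] -= size
        self.placement[component] = (engine, size)
        return engine

    def unload(self, component: str) -> None:
        engine, size = self.placement.pop(component)
        self.free[engine] += size


loader = LoadBalancingLoader(["me0", "me1"], capacity=100)  # capacity is illustrative
jobs = [("classifier", 60), ("forwarder", 50), ("scheduler", 30)]
print([loader.load(name, size) for name, size in jobs])  # ['me0', 'me1', 'me1']
```

The caller only names the caplet; which microengine ends up running the code is the loader's business, which is the extra abstraction the single-caplet mapping buys.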
The IXP1200 is situated towards the 'centralised' end of the control and management dimension defined in section II-A. In an NP with less centralisation, such as the Motorola C-Port or the EZChip, the capsule abstraction could span both the NP itself and its host workstation. In this case, the capsule runtime would execute on the host. Alternatively, the capsule abstraction could be restricted to the NP itself, and the capsule runtime could execute on one of the general packet processors, if present. This would be possible in principle on the Motorola C-Port, but not on the EZChip, which has no general-purpose processors.
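The placement options for the capsule runtime discussed above can be condensed into a small decision function. This Python sketch is our own summary with hypothetical parameter names; it lists candidate sites and deliberately claims no preference order among them.

```python
from typing import List


def runtime_sites(onboard_control_cpu: bool,
                  supervising_packet_processor: bool,
                  capsule_includes_host: bool) -> List[str]:
    """Candidate locations for the capsule runtime, following the discussion above."""
    sites: List[str] = []
    if onboard_control_cpu:
        sites.append("on-board control CPU")        # e.g. the IXP1200's StrongARM
    if capsule_includes_host:
        sites.append("host workstation")            # capsule spans NP and host
    if supervising_packet_processor:
        sites.append("dedicated packet processor")  # NP-only capsule, general PPs present
    return sites


print(runtime_sites(False, True, False))   # C-Port, capsule restricted to the NP
print(runtime_sites(False, False, False))  # EZChip, NP-only capsule: no candidate
print(runtime_sites(False, False, True))   # EZChip, capsule spanning the host
```

An empty result signals a mapping that cannot work, as with an NP-only capsule on the EZChip; the remedy is to widen the capsule to include the host.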
The pluggable loader concept is closely associated with capsules and caplets. Typically, at least one loader will be provided for each type of caplet, and each will know how to load components into the hardware (and/or language) environment underlying its particular caplet type. For example, in the IXP1200 mapping referred to above, there would be (at least) one loader for the StrongARM caplet and another for the microengine caplets. Importantly, the OpenCOM API allows selective transparency in the use of loaders. If full loader-selection transparency is desired, one can issue a call such as load(component.1, caplet.1), which will deduce an appropriate loader type from meta-data attached to component.1 and use this to load the component into the designated caplet. This essentially masks the fact that different components may be implemented in different machine languages. Even more transparency can be requested by issuing a call of the form load(component.1), which causes the runtime to load component.1 into a default capsule using a default loader. Alternatively, one can opt for complete control and zero transparency by issuing a call of the form load(component.1, caplet.1, loader.3).
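The three forms of load() differ only in how much the caller decides and how much is deduced. The Python sketch below shows that resolution step alone, under stated assumptions: the loader names, the default target and default loader, and the `machine_language` meta-data field are hypothetical, and the function returns the decision rather than actually loading anything.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Hypothetical names: the text does not specify loader identifiers or defaults.
LOADER_FOR_LANGUAGE: Dict[str, str] = {
    "arm": "strongarm-loader",
    "microcode": "microengine-loader",
}
DEFAULT_TARGET = "default capsule"
DEFAULT_LOADER = "default-loader"


@dataclass(frozen=True)
class PackagedComponent:
    name: str
    machine_language: str  # meta-data attached to the packaged component


def load(component: PackagedComponent, caplet: Optional[str] = None,
         loader: Optional[str] = None) -> Tuple[str, str]:
    """Return (target, loader) for the three call forms of selective transparency."""
    if caplet is None:
        if loader is not None:
            # Not one of the call forms described; rejected here for simplicity.
            raise TypeError("a loader can only be named together with a caplet")
        return DEFAULT_TARGET, DEFAULT_LOADER  # most transparent form
    if loader is None:
        # Loader deduced from the component's meta-data.
        return caplet, LOADER_FOR_LANGUAGE[component.machine_language]
    return caplet, loader  # zero transparency: everything named explicitly


fwd = PackagedComponent("forwarder", "microcode")
print(load(fwd))                     # ('default capsule', 'default-loader')
print(load(fwd, "me0"))              # ('me0', 'microengine-loader')
print(load(fwd, "me0", "my-loader"))  # ('me0', 'my-loader')
```

The middle form is what hides machine-language differences from the caller: the same call works for StrongARM and microengine components because the choice of loader comes from the package, not the call site.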
The pluggable binder concept is equally central to the component model's abstraction power. In this case, the abstraction is over the interconnect dimension. For example, on the IXP1200 we can encapsulate the NetBind binding mechanism (see section II-B) as a plug-in binder that is competent to bind components that have been loaded into a common caplet representing a single underlying microengine. But equally, we can provide a binder that is competent to bind components on different microengines (e.g. based on a shared memory or a next-neighbour register mechanism), or even between components on a microengine and components on the StrongARM. Again, the use of plug-in binders is selectively transparent. If we do not know or care in which caplets our two target components are located, we can say bind(interface.1, receptacle.15), and an appropriate binder will be selected according to location-related meta-data attached to the components that own the specified interface and receptacle. On the other hand, if it is important to select a particular mechanism, we can say bind(interface.1, receptacle.15, binder.4). And so on.
Note that the abstract model of binding provided by the pluggable binder framework is consistent across all types of NP, regardless of the nature and diversity of the interconnects between packet processors. For example, it can accommodate the fixed hardware channels supported by the pipeline-oriented EZChip, or the bus and shared memory interconnects of the Motorola C-Port, in just the same way as the various mechanisms supported by the IXP1200. Of course, different NP architectures may impose constraints on the form of possible bindings. For example, it would not be straightforward to directly bind components on non-adjacent processors on the EZChip NP, although even here it would be possible (if perhaps undesirable) to provide a plug-in binder that implemented this type of binding by transparently instantiating a forwarder on the intermediary processor(s).
The component concept alone provides considerable abstraction power in accommodating dedicated non-programmable processors such as those provided by the Motorola C-Port. These processors can be accommodated by representing each with a 'dummy' component and an associated special plug-in loader and binder pair. Loading the component and binding it to the client component has the effect of making the service provided by the dedicated processor (e.g. table lookup) look as if it were a normal software component.
A final crucial property of the component model is its radically third-party nature in terms of loading and binding. Thanks to this, a component on an IXP1200 microengine can load and bind two components on the StrongARM control processor, or even on the host workstation, if that comes within the scope of the capsule.
Note that, for lack of space, we omit from this paper any discussion of the important OpenCOM notion of component frameworks, which are used to support safe dynamic software reconfiguration. Information on this topic is available in the literature [22].
IV. CASE STUDY: OPENCOM ON THE INTEL IXP1200
For the past year we have been working to deploy and evaluate the OpenCOM component model on the Intel IXP1200. The IXP1200 was selected because of its open and well-documented architecture, and because it is a richly-featured NP in terms of the dimensions presented in section II-A.
To generate useful components with which to populate the implementation, we have taken as our starting point various modules (e.g. classifiers, forwarders, schedulers, etc.) provided by the NetBind project [14] from Columbia University. We have transformed these bare modules, which are written in C or assembler for both the StrongARM CPU and the microengines, into standard OpenCOM components by attaching appropriate meta-data (e.g. IDL interfaces, and loader and binder attributes) to produce standardly packaged, deployable units.
The mapping of OpenCOM capsules and caplets to the IXP1200 that we currently employ involves a single capsule that encompasses both the NP and the host workstation, and contains separate caplets for the host workstation (actually, a single Linux process on the workstation), the StrongARM (again, a single Linux process), and each of the six microengines. The OpenCOM runtime runs in the StrongARM caplet; all the other caplets are 'slaves' of this 'central' runtime and incur only minimal memory overheads (see below). The memory footprint of the central runtime itself is of the order of 300 KB, although we believe that there is considerable scope for reducing this.
The central runtime in the StrongARM caplet communicates with the other caplets by means of so-called caplet channels. The role of these is to bootstrap the plug-in loaders and binders associated with non-central caplets, and to support communication between their two parts: such loaders/binders are implemented as a 'delegator' part that resides in the central StrongARM caplet and a (minimal) 'delegate' part that resides in the other caplet. As examples, we now briefly describe the loader and binder plug-ins that are associated with the microengine caplets.
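The delegator/delegate split can be illustrated with a simple message channel. In the Python sketch below, which is our own simplification and not the actual caplet-channel protocol, the delegator in the central caplet keeps all bookkeeping and merely enqueues requests, while the minimal delegate in the slave caplet only applies what it receives. The message format and class names are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, Set, Tuple

Message = Tuple[str, str, bytes]  # (operation, component name, code image)


class CapletChannel:
    """Hypothetical one-way message channel from the central runtime to a slave caplet."""

    def __init__(self) -> None:
        self.messages: Deque[Message] = deque()


class LoaderDelegator:
    """Central (StrongARM) part: keeps the bookkeeping and makes the decisions."""

    def __init__(self, channel: CapletChannel) -> None:
        self.channel = channel
        self.loaded: Set[str] = set()

    def load(self, name: str, image: bytes) -> None:
        if name in self.loaded:
            raise ValueError(f"{name} is already loaded")
        self.loaded.add(name)
        self.channel.messages.append(("install", name, image))

    def unload(self, name: str) -> None:
        self.loaded.remove(name)
        self.channel.messages.append(("remove", name, b""))


class LoaderDelegate:
    """Minimal part in the slave caplet: it only carries out received requests."""

    def __init__(self, channel: CapletChannel) -> None:
        self.channel = channel
        self.installed: Dict[str, bytes] = {}

    def poll(self) -> int:
        handled = 0
        while self.channel.messages:
            op, name, image = self.channel.messages.popleft()
            if op == "install":
                self.installed[name] = image
            elif op == "remove":
                del self.installed[name]
            handled += 1
        return handled


channel = CapletChannel()
delegator, delegate = LoaderDelegator(channel), LoaderDelegate(channel)
delegator.load("classifier", b"\x01\x02")
delegator.load("forwarder", b"\x03")
delegator.unload("classifier")
print(delegate.poll(), delegate.installed)  # 3 {'forwarder': b'\x03'}
```

Keeping the logic on the delegator side is what lets the slave-caplet footprint stay small, in line with the minimal overheads reported above.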
The microengine loader plug-in is of interest in that it
provides the illusion of dynamic loading despite the fact that
the microengine hardware only allows modification of its
instruction store when the microengine is stopped [12]. The basic
capability provided by the microengine hardware is to stop
the microengine, read/ write arbitrary instruction store
locations, and then restart it at a hard-wired address. To achieve
transparent dynamic loading it is therefore necessary for the
loader not only to load the new component but also to patch
the code at the (hard-wired) restart address so that subsequent
execution resumes at the point where it left off. The loader can
also autonomously move code around within the instruction store
to avoid memory fragmentation as components are loaded and
unloaded. The loader is also of interest in that it constrains
the form of OpenCOM components it is willing to load. Having
particular loaders restrict the components they can work with
is a general and powerful pattern in OpenCOM. In the present
case, the IDL interfaces of loaded components are restricted to
operations that accept and return a single integer. This
restriction, which is enforced by inspecting the component’s IDL
meta-data at load time, is imposed partly to simplify the design
of the associated binder (see below), and partly because the
assumed model of component composition on the microengines
(borrowed from the above-mentioned NetBind work) is that
components are bound into a more-or-less linear sequence and
cooperatively work on queues of network packets whose addresses
are passed as integer arguments.
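The stop/ patch/ restart sequence and the load-time IDL check described above can be sketched as a simplified Python model. Assumptions: the instruction store is a list of tuples, the hard-wired restart address is taken to be 0 (so no component is loaded there), `"long"` (CORBA IDL's 32-bit integer type) stands in for the single integer type, and all names are hypothetical. Code relocation to avoid fragmentation is omitted.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Instr = Tuple[object, ...]  # toy instruction, e.g. ("jump", target) or ("add",)
RESTART_ADDR = 0            # assumed hard-wired restart address


@dataclass(frozen=True)
class Operation:
    """IDL meta-data for one operation (hypothetical representation)."""
    name: str
    params: Tuple[str, ...]
    result: str


class Microengine:
    """Toy microengine: the instruction store is writable only while stopped."""

    def __init__(self, size: int) -> None:
        self.store: List[Instr] = [("nop",)] * size
        self.pc = 1
        self.running = True

    def stop(self) -> int:
        self.running = False
        return self.pc  # where execution was interrupted

    def write(self, addr: int, instr: Instr) -> None:
        if self.running:
            raise RuntimeError("instruction store is writable only when stopped")
        self.store[addr] = instr

    def restart(self) -> None:
        # The hardware always restarts at the hard-wired address...
        self.pc = RESTART_ADDR
        self.running = True
        self.step()  # ...so execute whatever instruction sits there.

    def step(self) -> None:
        instr = self.store[self.pc]
        self.pc = instr[1] if instr[0] == "jump" else self.pc + 1


def check_idl(ops: Sequence[Operation]) -> None:
    """Accept only operations that take and return a single integer."""
    for op in ops:
        if op.params != ("long",) or op.result != "long":
            raise ValueError(f"operation {op.name!r} is not single-int in/out")


def load_component(me: Microengine, base: int, code: Sequence[Instr],
                   ops: Sequence[Operation]) -> None:
    check_idl(ops)  # enforced at load time, before the engine is touched
    resume_at = me.stop()
    for offset, instr in enumerate(code):
        me.write(base + offset, instr)
    # Patch the restart address so execution jumps back to where it was.
    me.write(RESTART_ADDR, ("jump", resume_at))
    me.restart()


# Usage: the engine was executing at address 5 when the load began.
me = Microengine(16)
me.pc = 5
load_component(me, 8, [("add",), ("ret",)],
               [Operation("process", ("long",), "long")])
```

After `restart()` the engine has executed the patched jump and is back at address 5, which is the "illusion" of dynamic loading the text describes.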
Our intra-microengine binder plug-in is strongly coupled to
the loader just described. It builds on the above-mentioned
NetBind technique (see section II-B) of creating bindings
by ‘morphing’ jump instructions. However, the binder is
more complex than the NetBind implementation because,
together with the loader, it supports multiple instantiations
of components (unlike NetBind, which only supports singleton
components). The single argument and return value are
passed via a designated register. The necessary entry and exit
point information is obtained from IDL meta-data attached to
the packaged component; the loader transforms these relative
offsets into absolute addresses. Note that the IDL-specified
interfaces themselves incur no run-time performance overhead:
the overhead of the binder in calling a null operation with no
arguments or return values is only five (1-cycle) instructions.
These include passing on the stack a pointer to the per-instance
state vector of the called component, and the return address.
NetBind incurs an overhead of just two 1-cycle jump instructions
(for the call and the return), but only because it does not
support multiple instantiations of components. Crucially,
however, we could easily recover the NetBind performance
in the OpenCOM environment simply by implementing and
installing a new binder plug-in that understands components
that observe the NetBind calling convention and (therefore)
does not support multiple component instantiation. The
essential point is that OpenCOM’s plug-in architecture enables
us to support any appropriate trade-off. More generally, the
performance of the OpenCOM programming model as a whole
depends almost entirely on the performance of the binding
mechanisms used. Almost all the value-added features of
OpenCOM are confined to the central runtime and do not
‘get in the way’ when components communicate with each other
on the NP’s fast path.
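The following Python sketch illustrates, at a very abstract level, why per-instance state matters to the binder and what the quoted instruction counts imply for a linear chain. It is not microengine code: the `successor` field merely stands in for the jump instruction that the binder morphs, the component and helper names are hypothetical, and `chain_overhead` assumes one binding crossing per pair of adjacent components and 1-cycle instructions, as stated in the text.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

OPENCOM_CALL_CYCLES = 5  # per-call overhead quoted for the multi-instance binder
NETBIND_CALL_CYCLES = 2  # per-call overhead quoted for NetBind (call + return)


@dataclass
class Instance:
    """One instantiation of a component: shared entry code, private state."""
    entry: Callable[[Dict[str, int], int], int]  # (state, packet_addr) -> packet_addr
    state: Dict[str, int] = field(default_factory=dict)
    successor: Optional["Instance"] = None  # stands in for the morphed jump


def bind(caller: Instance, callee: Instance) -> None:
    """Redirect the caller's exit 'jump' to the callee's entry point."""
    caller.successor = callee


def run_chain(head: Instance, packet_addr: int) -> int:
    """Pass one packet address along a linear chain of bound instances."""
    inst: Optional[Instance] = head
    while inst is not None:
        # Each call receives a reference to the callee's own state vector;
        # this is what lets several instances share one copy of the code.
        packet_addr = inst.entry(inst.state, packet_addr)
        inst = inst.successor
    return packet_addr


def count_packets(state: Dict[str, int], packet_addr: int) -> int:
    """Example component body: counts packets and passes the address on."""
    state["count"] = state.get("count", 0) + 1
    return packet_addr


def chain_overhead(n_components: int, cycles_per_call: int) -> int:
    """Binding overhead per packet, assuming one call per adjacent pair."""
    return max(n_components - 1, 0) * cycles_per_call


# Usage: two instances of the same component, each with its own counter.
first, second = Instance(count_packets), Instance(count_packets)
bind(first, second)
for addr in (0x1000, 0x1040):
    run_chain(first, addr)
```

Under these assumptions a four-component chain incurs 15 cycles of binding overhead per packet with the multi-instance binder versus 6 with NetBind-style singletons, which is the trade-off a replacement binder plug-in could choose to make.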
Apart from the microengine loader and binder discussed
above, we are currently implementing loaders that load
components into StrongARM and host workstation caplets, and
binders that bind components across any pairwise combination
of the three caplet types. Bindings between the microengines
and the other two caplet types are considerably more
complex than intra- and inter-microengine bindings, as they
require stubs and skeletons to map the parameter and return
value to a bus packet. To minimise memory overhead, the
microengine-side stubs/ skeletons can be hand coded rather
than generated automatically from the IDL specification.
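A hand-coded stub/ skeleton pair of the kind mentioned above might marshal the single integer argument and result into fixed-layout bus packets. The sketch below assumes a hypothetical layout (instance identifier, operation index and argument as big-endian unsigned 32-bit words) and uses Python's standard `struct` module; the actual bus packet format is not specified in this paper.

```python
import struct
from typing import Callable, Dict, List

# Hypothetical bus-packet layouts (big-endian unsigned 32-bit words).
REQUEST = struct.Struct(">III")  # instance id, operation index, argument
REPLY = struct.Struct(">II")     # instance id, result

Dispatch = Dict[int, List[Callable[[int], int]]]


def stub_marshal(instance_id: int, op_index: int, arg: int) -> bytes:
    """Caller-side stub: pack the single integer argument into a request."""
    return REQUEST.pack(instance_id, op_index, arg & 0xFFFFFFFF)


def skeleton_dispatch(packet: bytes, table: Dispatch) -> bytes:
    """Callee-side skeleton: unpack, invoke the operation, pack the result."""
    instance_id, op_index, arg = REQUEST.unpack(packet)
    result = table[instance_id][op_index](arg)
    return REPLY.pack(instance_id, result & 0xFFFFFFFF)


def stub_unmarshal(reply: bytes) -> int:
    """Caller-side stub: extract the single integer result."""
    _, result = REPLY.unpack(reply)
    return result


# Usage: instance 7 exposes one operation that increments its argument.
table: Dispatch = {7: [lambda x: x + 1]}
request = stub_marshal(7, 0, 41)
reply = skeleton_dispatch(request, table)
```

Because the microengine loader already restricts every operation to a single integer in and out, one fixed layout presumably serves all operations, which is what makes hand coding the microengine side practical.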
V. CONCLUSIONS AND FUTURE WORK
In this paper we have characterised the design space of
NPs and proposed a component-based programming model
that, we have argued, can be applied generally within this
design space. The component model, mainly through its
plug-in loaders and binders and its associated notion of caplets,
provides a high degree of design portability and potential
for skill transfer. We have also demonstrated how plug-in
loaders and binders can exploit NP-specific features to provide
both high performance (for example, our microengine binder
incurs overheads comparable to NetBind on the IXP1200) and
value-added behavior (for example, our microengine loader/
binder supports multiple instantiations of components and
transparently optimises instruction store use as components
are dynamically loaded/ instantiated/ destroyed).
Most importantly, we have argued that our abstractions
are generally applicable. NP-Click [15] also abstracts
NP-specific features; e.g. it provides an API to manage and
allocate microengine LIFO resources on the IXP1200. But this
API would be meaningless on an NP that did not support LIFOs.
The OpenCOM approach would instead be to provide a plug-in
binder (a generic abstraction) that internally uses, manages
and allocates LIFOs (if present).
OpenCOM also supports run-time reconfiguration. In this
paper we have discussed the basic mechanisms behind this
(i.e. receptacles, and the ability to bind and unbind components
at runtime), but we have not elaborated on OpenCOM's
approach to managing integrity, consistency, safety and security
when performing reconfiguration operations. As mentioned,
we rely on the notion of component frameworks [22] to support
this. We have already explored the provision of component
frameworks in other domains in which we have applied
OpenCOM (e.g. middleware [24]); one aspect of our future
work will be to further explore this interesting and demanding
area in the NP domain.
The main thrust of our future work, however, will be to
explore the use of OpenCOM in other NP environments. We
are already investigating Intel's more advanced IXP2400
and the IBM PowerNP, but we would also like to provide
further evidence for the generality of our approach by looking
in more detail at NPs elsewhere in the design space outlined
in this paper.
REFERENCES
[1] L. Ruf, R. Pletka, P. Erni, and P. Droz. Towards High-Performance
Active Networking. In Proceedings of the Fifth Annual International
Working Conference on Active Networks (IWAN 2003), Kyoto, Japan,
December 2003.
[2] C. Kulkarni, M. Gries, C. Sauer, and K. Keutzer. Programming
Challenges in Network Processor Deployment. In Int. Conference on
Compilers, Architecture, and Synthesis for Embedded Systems (CASES),
October 2003.
[3] Agere Systems Proprietary. The Challenge for Next Generation Network
Processors. April 2001.
[4] M. Tsai, C. Kulkarni, C. Sauer, N. Shah, and K. Keutzer. A Benchmarking
Methodology for Network Processors. In 1st Workshop on Network
Processors, held along with HPCA 2002, February 2002.
[5] P. G. Paulin. Network Processors: A Perspective on Market Requirements,
Processor Architectures and Embedded S/W Tools. STMicroelectronics,
2001.
[6] M. Adiletta et al. The Next Generation of Intel Network Processors.
Intel Technology Journal, Volume 6, Issue 3, August 15, 2002.
[7] Radisys Corporation. IXP1200 White Paper: Using the Intel IXP1200
Network Processor to Optimize Packet-Processing Application
Development. 2001.
[8] Motorola Research. Architecture Guide, C-5e/C-3e Network Processor,
Silicon Revision B0. 2003.
[9] EZchip Technologies. Network Processor Designs for Next-Generation
Networking Equipment. White Paper, 2003.
[10] J. Allen et al. IBM PowerNP Network Processor: Hardware, Software
and Applications. IBM Journal of Research and Development,
47(2/3):177-194, March/May 2003.
[11] Cisco Systems Inc. Parallel Express Forwarding on the Cisco 10000
Series. 2003.
[12] Intel Press. MicroACE Design Document, revision 1.0. Intel
Corporation, 2001.
[13] A. Deshpande, K. Crozier, and M. Baines. The Teja Software Platform
for Network Processors.
[14] A. T. Campbell, M. E. Kounavis, D. A. Villela, J. B. Vicente, H. G.
de Meer, K. Miki, and K. S. Kalaichelvan. NetBind: A Binding
Tool for Constructing Data Paths in Network Processor-based Routers.
In 5th IEEE International Conference on Open Architectures
(OPENARCH'02), June 2002.
[15] N. Shah, W. Plishker, and K. Keutzer. NP-Click: A Programming Model
for the Intel IXP1200. In 2nd Workshop on Network Processors (NP-2)
at the 9th International Symposium on High Performance Computer
Architecture (HPCA-9), Anaheim, CA, February 2003.
[16] S. Karlin and L. Peterson. VERA: An Extensible Router Architecture.
In 4th International Conference on Open Architectures and Network
Programming (OPENARCH), April 2001.
[17] T. Spalink, S. Karlin, L. Peterson, and Y. Gottlieb. Building a Robust
Software-Based Router Using Network Processors. October 2001.
[18] J. Wagner and R. Leupers. C Compiler Design for an Industrial Network
Processor. In Proceedings of the 2001 ACM SIGPLAN Workshop on
Optimization of Middleware and Distributed Systems, 2001.
[19] G. Memik, W. H. Mangione-Smith, and W. Hu. NetBench: A
Benchmarking Suite for Network Processors. ICCAD, 2001.
[20] M. Gries, C. Kulkarni, C. Sauer, and K. Keutzer. Comparing Analytical
Modeling with Simulation for Network Processors: A Case Study.
In Design, Automation and Test in Europe (DATE), Munich, Germany,
March 2003.
[21] Working Group, Network Processing Forum. Network Processing Forum
Backgrounder, October 2002. http://www.npforum.org/.
[22] G. Coulson, G. Blair, D. Hutchison, A. Joolia, K. Lee, J. Ueyama,
A. Gomes, and Y. Ye. NETKIT: A Software Component-Based
Approach to Programmable Networking. ACM SIGCOMM Computer
Communication Review, Vol. 33, No. 5, October 2003.
[23] Object Management Group, Inc. CORBA 3.0 - IDL Syntax and
Semantics chapter. formal/02-06-07.
[24] G. Coulson, G. Blair, M. Clarke, and N. Parlavantzas. The Design of
a Highly Configurable and Reconfigurable Middleware Platform. ACM
Distributed Computing Journal, 15(2):109-126, April 2002.
0-7803-8781-X/04/$20.00 © 2004 IEEE
