I. INTRODUCTION
Asynchronous Transfer Mode (ATM) [1] is a high-speed packet switching technology for the broadband integrated services digital network (B-ISDN), in which voice, video and data are multiplexed and transferred over high-speed links. This technology has gained acceptance as the high-speed backbone network of the future, and also as the lower-bandwidth link to homes and the desktop, e.g. for interactive multimedia services. It supports different service classes, such as Constant Bit Rate (CBR), Variable Bit Rate (VBR), Unspecified Bit Rate (UBR) and Available Bit Rate (ABR).
In a switch, cells which arrive for an output link faster than they can be transmitted are stored in an output buffer. However, when too many cells are destined for a particular link, the output buffer can fill up and cell delay will increase; worse still, cells are discarded when the buffer overflows. Such cell delay and cell loss may seriously degrade the quality of users' communication sessions.
ATM traffic management or congestion control methods [2] are needed to ensure that quality of service (QoS) parameters such as cell delay and cell loss probability are within agreed limits. It is difficult to build an efficient traffic control system because of the diversity in multimedia traffic characteristics, especially since each user has different cell generation characteristics and different QoS requirements.
Mathematical analysis usually provides useful results only in steady-state situations and when certain assumptions are satisfied. In real-world situations, however, traffic conditions change frequently and the environment is non-stationary. Thus, characterising traffic sources and network behaviour is difficult, especially if it must be done in real time. Neural network approaches, which do not require precise models of the network processes, have been used with varying degrees of success for traffic management and congestion control [8], [SI], [11], [10].
II. CONNECTION ADMISSION CONTROL (CAC)
When a user wishes to establish a connection with another party, the user's terminal sends a connection set-up request to the Connection Admission Control (CAC) controller. In this request it declares the required QoS and its own traffic parameters, which describe the cell generation characteristics of the source, e.g. Peak Cell Rate (PCR), Average Cell Rate (ACR), burstiness and peak duration. The controller estimates the resulting network situation if the connection were accepted, and accepts the connection only if the estimate indicates that the QoS requirements of the new and existing connections, especially those using the CBR and VBR service classes, will not be violated.
There is usually a difference between the declared and actual traffic parameters, making QoS estimation through analytical methods difficult. The number of connections can be large, so estimation errors for individual connections accumulate. Neural network approaches rely on the actual observed values of the quantities of interest to form a mapping between different data sets, which can then be used in traffic management algorithms. However, the neural network architecture, and its input and output values, must be carefully designed so that the learned mapping is correct and consistent.
Many approaches to CAC use bandwidth allocation methods, which reserve the estimated amount of bandwidth a connection requires for the network to provide the required QoS. Typically, the bandwidth allocated to a Variable Bit Rate (VBR) source should be less than its peak bit rate but greater than its average bit rate. The sum of the peak rates of connections multiplexed onto a link can be greater than the link capacity, as long as the sum of their statistical or average bandwidth allocations is less than or equal to the link capacity.
There has been much research on determining the equivalent or effective bandwidth needs of multiplexed sources [4], [5], [6]; e.g. in [4], the effective bandwidth of on-off fluid-flow sources is derived for both individual and multiplexed connections. Many of these equivalent bandwidth methods are over-conservative, causing network resources to be under-utilised.
In certain cases, detailed knowledge of higher-order moments of the traffic, which are difficult to estimate in real time, is also required. However, complex models whose parameters cannot be easily estimated are of little practical value. Because of these difficulties, some researchers have turned to neural networks for traffic management.
III. MULTI-SERVICE CONNECTION ADMISSION CONTROL
In earlier work [15], we investigated CAC for a two-node network in which all sources generate traffic at the same bit rate and all sessions have the same QoS requirement, using a simpler version of the algorithm presented later in this section. In this paper, we consider CAC for the more realistic multi-service case with different traffic sources, each having its own traffic characteristics and QoS requirement.
In our scheme, all connections are grouped into categories according to the following parameters: 1. declared traffic parameters (PCR and ACR), 2. QoS requirements (the maximum allowable cell loss rate, CLR), and 3. service types (e.g. CBR, VBR or ABR). Two priority queues are used at each output port of an ATM switch. When a cell arrives at a switch from an input link, the switch looks up its local routing table to determine the output link to which the cell should be forwarded. If the link is currently idle, the cell is immediately transmitted down the link. Otherwise, the cell waits for transmission in one of the priority queues, namely the CBR/VBR queue or the ABR queue, depending on the service type of the cell (see Figure 1). Cells in the CBR/VBR queue have priority over cells in the ABR queue, i.e. the ABR queue is served only when the CBR/VBR queue is empty. In each priority queue, cells are queued in first-in-first-out (FIFO) order in a fixed-size buffer. In both queues, cells are discarded when the buffer is full, regardless of which source they originated from. Each queue maintains a separate counter to track its CLR.
According to Hiramatsu [SI], the behaviour of cells from many connections multiplexed together on an ATM link can be characterised by the number of active connections in each category, since connections belonging to the same category have almost the same cell generation characteristics. For practical reasons, the number of categories is usually less than 100. In our experiments, the maximum allowable CLR is the only QoS parameter considered for the various sources.
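The queueing discipline above can be sketched as a slotted model of one output port. This is a minimal illustration, not the authors' simulator: the class name `PriorityOutputPort`, the buffer sizes and the one-cell-per-slot transmission are assumptions, and the immediate-transmission case for an idle link is folded into the same slot.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class PriorityOutputPort:
    """One output port: high-priority CBR/VBR queue and low-priority ABR queue.

    Each queue is a finite FIFO buffer; a cell arriving at a full buffer is
    dropped regardless of its source, and each queue counts its own losses.
    """
    cbr_vbr_capacity: int  # buffer size in cells (hypothetical)
    abr_capacity: int      # buffer size in cells (hypothetical)
    cbr_vbr: deque = field(default_factory=deque)
    abr: deque = field(default_factory=deque)
    arrived: dict = field(default_factory=lambda: {"CBR/VBR": 0, "ABR": 0})
    lost: dict = field(default_factory=lambda: {"CBR/VBR": 0, "ABR": 0})

    def enqueue(self, cell, service: str) -> None:
        """Place a cell in the queue for its service class, or drop it if full."""
        if service == "ABR":
            key, queue, cap = "ABR", self.abr, self.abr_capacity
        else:  # CBR and VBR share the high-priority queue
            key, queue, cap = "CBR/VBR", self.cbr_vbr, self.cbr_vbr_capacity
        self.arrived[key] += 1
        if len(queue) >= cap:
            self.lost[key] += 1
        else:
            queue.append(cell)

    def transmit(self):
        """Send one cell per slot; the ABR queue is served only if CBR/VBR is empty."""
        if self.cbr_vbr:
            return self.cbr_vbr.popleft()
        if self.abr:
            return self.abr.popleft()
        return None

    def clr(self, key: str) -> float:
        """Observed cell loss ratio for one queue ("CBR/VBR" or "ABR")."""
        return self.lost[key] / self.arrived[key] if self.arrived[key] else 0.0
```

The separate `lost` and `arrived` counters per queue correspond to the per-queue CLR counters mentioned in the text.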
The proposed CAC-controller uses the Hierarchical Mixtures of Experts (HME) [13] modular neural network architecture.
The HME predicts the resulting CLRs for both the CBR/VBR queue and the ABR queue at an output port if a new connection passing through this output port is accepted. Thus, the output vector of the HME is two-dimensional. An HME-based CAC-controller is implemented at each ATM switch. At each output queue of an ATM switch, the number of active connections in each category, $n_i$ for $i = 1, \ldots, k$, is determined, where k is the total number of traffic categories under consideration. These values form the components of a k-dimensional vector, which serves as the input pattern to the HME at the output port of interest.
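A one-level HME is a gated mixture of experts. The sketch below shows only the forward pass that maps the k-dimensional connection-count vector to the two predicted CLRs. It assumes linear experts and a softmax gating network, the basic form of the Jordan-Jacobs architecture; the class name `OneLevelHME`, the initialisation scale and the use of raw (rather than, say, log-scaled) CLR outputs are hypothetical, and training is not shown.

```python
import math
import random


def softmax(z):
    """Numerically stable softmax."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


class OneLevelHME:
    """Forward pass of a one-level HME: softmax gate over linear experts.

    Input: k-dimensional vector of connection counts per category.
    Output: gate-weighted blend of expert outputs, here 2 values
    (CBR/VBR CLR, ABR CLR). Each weight vector has k entries plus a bias.
    """

    def __init__(self, k, n_experts=8, n_out=2, scale=0.01, seed=0):
        rng = random.Random(seed)

        def small():  # small random initial weights, bias stored last
            return [rng.uniform(-scale, scale) for _ in range(k + 1)]

        self.gate = [small() for _ in range(n_experts)]
        self.experts = [[small() for _ in range(n_out)] for _ in range(n_experts)]

    @staticmethod
    def _affine(w, x):
        # zip stops after the k input weights; w[-1] is the bias
        return w[-1] + sum(wi * xi for wi, xi in zip(w, x))

    def predict(self, x):
        g = softmax([self._affine(w, x) for w in self.gate])
        outs = [[self._affine(w, x) for w in expert] for expert in self.experts]
        n_out = len(outs[0])
        return [sum(g[e] * outs[e][d] for e in range(len(g))) for d in range(n_out)]
```

The default of 8 experts mirrors the configuration reported in Section IV-C; with seven traffic categories, for example, the model would be built as `OneLevelHME(k=7)`.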
For training, the input pattern is
$$n_z(t) = [\,n_{z1}(t), n_{z2}(t), \ldots, n_{zk}(t)\,],$$
which is the connection pattern at the z-th output port at training instant t. For prediction, the input pattern becomes
$$n'_z(t) = [\,n_{z1}(t), n_{z2}(t), \ldots, n_{zj}(t)+1, \ldots, n_{zk}(t)\,],$$
where j is the category of the new connection request. … be reused for training, thus speeding up the learning process of the HME. It is assumed that the number of connections does not change rapidly, so that the connection pattern observed at the training instant is responsible for the CLRs $l_{VBR,z}(t)$ and $l_{ABR,z}(t)$ observed over the past $\Delta t$ interval. After the current input pattern $n_z(t)$ and CLRs $l_{VBR,z}(t)$ and $l_{ABR,z}(t)$ (i.e. the current training pattern) have been used for training, training is repeated using previously observed training patterns selected from the pattern tables. The current training pattern is always used for training, unlike in [15], where training only takes place when the previous decision was to accept a connection. This is because the mapping between the input pattern and the targets in the current scheme is always consistent and independent of the previous decisions made. However, as there are multiple CLR requirements, the contents of the pattern tables differ from those of [8], [15].
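Building the two input patterns from the list of active connections at port z is straightforward. In this sketch, categories are numbered 1 to k, and the helper names `connection_pattern` and `prediction_pattern` are hypothetical.

```python
from collections import Counter


def connection_pattern(active_categories, k):
    """Training input n_z(t): number of active connections per category 1..k."""
    counts = Counter(active_categories)
    return [counts.get(i, 0) for i in range(1, k + 1)]


def prediction_pattern(active_categories, k, new_category):
    """Prediction input n'_z(t): n_z(t) with the new request's category j incremented."""
    if not 1 <= new_category <= k:
        raise ValueError("category out of range")
    pattern = connection_pattern(active_categories, k)
    pattern[new_category - 1] += 1
    return pattern
```

For example, with two Category 1 connections, one Category 4 and one Category 6 among seven categories, a Category 3 request yields `[2, 0, 1, 1, 0, 1, 0]`.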
The observed training patterns cannot simply be classified into low-cell-loss-rate and high-cell-loss-rate events, because what is "high" for one category may be considered "low" for another category with a less stringent CLR requirement. Instead, two pattern tables are still used: one stores no-cell-loss events, while the other stores events with cell loss.
We define a special type of operations, administration and maintenance (OAM) cell, referred to as a "CAC-cell":
CAC(SE_CLRV, SE_CLRA, MSCV, MSCA),
where
SE_CLRV: sum of estimated CLRs along the path for the CBR/VBR queue,
SE_CLRA: sum of estimated CLRs along the path for the ABR queue,
MSCV: most stringent category connected along the path for the CBR/VBR queue, and
MSCA: most stringent category connected along the path for the ABR queue.
The CAC-cell is sent from the source to the destination, in addition to the other signalling cells, when a connection set-up request is received. Upon receiving a CAC-cell, the destination returns it to the source. Figure 2 outlines the behaviour at the source and destination, while Figure 3 outlines the behaviour at the switch. The algorithm for the HME-based controller is shown in Figure 4.
Source behaviour:
1. Before sending a connection request, categorise the connection into one of the k categories based on its declared QoS requirements and traffic characteristics.
…
      accept connection request
   else reject connection request
else (i.e. current connection is ABR)
   if (SE_CLRA < MSCA's CLR) accept connection request
   else reject connection request
Destination behaviour:
1. Upon receiving each forward CAC(SE_CLRV, SE_CLRA, MSCV, MSCA), return a backward CAC(SE_CLRV, SE_CLRA, MSCV, MSCA) to the source.
Prediction requested for output buffer z:
1. The HME receives an input pattern representing the resulting connection pattern for the z-th output port if the new connection request is accepted:
   $n'_z(t) = [\,n_{z1}(t), n_{z2}(t), \ldots, n_{zj}(t)+1, \ldots, n_{zk}(t)\,]$
2. Produce $\hat{l}_{VBR,z}(t)$ and $\hat{l}_{ABR,z}(t)$, the predicted CLRs for the CBR/VBR and ABR queues corresponding to this connection pattern.
Training requested:
1. For each output port z ∈ Z, where Z is the set of output ports belonging to the ATM switch, form an input pattern representing the current connection pattern, $n_z(t)$. The corresponding targets are the CLRs $l_{VBR,z}(t)$ and $l_{ABR,z}(t)$ observed over the last $\Delta t$ interval.
2. Train the HME with the input-output pairs obtained from each output port z ∈ Z, and store these observed training patterns in their respective pattern tables.
3. Repeat training with M training patterns from the pattern tables. For each of these M patterns, the cell-loss table is selected with probability $P_i$, while the no-cell-loss table is selected with probability $(1 - P_i)$. A training pattern is then chosen randomly from the selected pattern table.
Fig. 4. HME prediction and training procedure.
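The pattern tables and the replay step of Fig. 4 can be sketched as follows. Assumptions not in the text: a pattern counts as a cell-loss event if either queue lost cells, each table is bounded by a hypothetical `max_size` with oldest-first eviction, and sampling falls back to the other table when the chosen one is empty. The HME update itself is omitted; `replay` only returns the M patterns to train on.

```python
import random


class PatternTables:
    """Cell-loss and no-cell-loss tables of (pattern, (CLR_vbr, CLR_abr)) pairs."""

    def __init__(self, p_loss, max_size=1000, seed=None):
        self.p_loss = p_loss        # probability (P_i in Fig. 4) of using the loss table
        self.max_size = max_size    # hypothetical bound on each table
        self.loss, self.no_loss = [], []
        self.rng = random.Random(seed)

    def store(self, pattern, clr_vbr, clr_abr):
        """File an observed pattern by whether either queue lost cells (assumption)."""
        table = self.loss if (clr_vbr > 0 or clr_abr > 0) else self.no_loss
        table.append((tuple(pattern), (clr_vbr, clr_abr)))
        if len(table) > self.max_size:
            table.pop(0)            # evict the oldest entry

    def replay(self, m):
        """Draw M stored patterns: choose a table, then an entry uniformly at random."""
        batch = []
        for _ in range(m):
            use_loss = self.rng.random() < self.p_loss
            table = self.loss if use_loss else self.no_loss
            if not table:           # fall back to the other table if this one is empty
                table = self.no_loss if use_loss else self.loss
            if table:
                batch.append(self.rng.choice(table))
        return batch
```

Biasing the replay towards the cell-loss table lets rare loss events keep shaping the HME even when most recent observations are loss-free.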
On its return trip, the CAC-cell sums up the estimated CLRs at each forward output buffer, thus obtaining an estimate of the overall end-to-end CLR when it reaches the requesting source. This scheme is robust against any re-routing that may occur, as each HME predictor only predicts the cell loss for its own output buffer instead of the end-to-end cell loss. The CAC-cell also records the category number of the connection with the most stringent CLR requirement along the forward path. Two additional rules are also incorporated into the switch behaviour (Rule IC in Figure 3):
• If the current ABR queue is very congested (i.e. the buffer queue length exceeds the fast-down queue threshold, DQT), reject the connection request.
• If the total sustainable cell rate (SCR) of all ABR sessions exceeds the specified maximum ABR bandwidth allocation, reject the new ABR connection request.
The first rule is straightforward: new connections should be rejected if the ABR queue is in danger of buffer overflow. The second rule limits the total sustainable bit rate used by the ABR sessions; this is a fairness issue, explained shortly. To enforce the rejection of a connection request when either of these rules applies, SE_CLRA is set to 1, the largest possible CLR.
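The per-switch handling of the CAC-cell described above can be sketched as follows. The field names, the `PortState` record and the per-category CLR table `clr_req` are hypothetical encodings. For the second rule we assume the check includes the new connection's rate, and the summed estimate is not capped at 1, which does not change the outcome because any value of 1 or more forces rejection.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CACCell:
    """Hypothetical encoding of CAC(SE_CLRV, SE_CLRA, MSCV, MSCA)."""
    se_clrv: float = 0.0          # summed estimated CLR, CBR/VBR queues
    se_clra: float = 0.0          # summed estimated CLR, ABR queues
    mscv: Optional[int] = None    # most stringent CBR/VBR category seen so far
    msca: Optional[int] = None    # most stringent ABR category seen so far


@dataclass
class PortState:
    """What one switch knows about the output port the connection would use."""
    pred_clr_vbr: float           # HME prediction for the CBR/VBR queue
    pred_clr_abr: float           # HME prediction for the ABR queue
    vbr_categories: List[int]     # categories currently connected (CBR/VBR)
    abr_categories: List[int]     # categories currently connected (ABR)
    abr_queue_len: int
    dqt: int                      # fast-down queue threshold
    abr_rate_total: float         # rate already committed to ABR sessions
    abr_rate_max: float           # maximum ABR bandwidth allocation


def most_stringent(current, categories, clr_req):
    """Category with the smallest allowed CLR, or None if there is none."""
    candidates = list(categories) + ([current] if current is not None else [])
    return min(candidates, key=lambda c: clr_req[c]) if candidates else None


def switch_update(cell, port, clr_req, new_is_abr, new_abr_rate=0.0):
    """Add this port's predicted CLRs and apply the two extra rejection rules."""
    cell.se_clrv += port.pred_clr_vbr
    cell.se_clra += port.pred_clr_abr
    cell.mscv = most_stringent(cell.mscv, port.vbr_categories, clr_req)
    cell.msca = most_stringent(cell.msca, port.abr_categories, clr_req)
    congested = port.abr_queue_len > port.dqt
    over_limit = new_is_abr and port.abr_rate_total + new_abr_rate > port.abr_rate_max
    if congested or over_limit:
        cell.se_clra = 1.0        # largest possible CLR forces rejection
    return cell
```

Because each switch only adds its own port's prediction, the sum is assembled hop by hop, matching the re-routing robustness argument above.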
Upon receiving the backward CAC-cell, the source makes the admission decision for a CBR/VBR connection slightly differently from that for an ABR connection. Because of priority queuing, a CBR/VBR connection, if accepted, may cause cell loss for existing ABR connections even if it causes no cell loss for existing CBR/VBR connections. Thus, it is not sufficient to look only at the estimated CLR for CBR/VBR when a CBR/VBR connection request arrives. To honour the agreed CLR requirements in the existing service contracts for all traffic types, a new CBR/VBR connection can only be accepted if the total estimated CLRs for both the CBR/VBR and ABR queues do not violate the CLR requirements of their most stringent connections along the entire path.
The admission decision for ABR connections should not depend on the estimated CLRs of the CBR/VBR queues, because the ABR service has lower priority and its acceptance should have a negligible effect on the CBR/VBR CLR (assuming that the switching fabric is fast enough). In addition, cell loss in the CBR/VBR queue is not always accompanied by cell loss in the ABR queue. Therefore, a new ABR connection can be accepted as long as the total estimated CLR for ABR alone does not violate the CLR requirement of the most stringent ABR connection.
However, these decision rules may lead to starvation of CBR/VBR services. New CBR/VBR connection requests may be persistently rejected when predictions indicate that existing ABR sessions may not meet their CLR targets, while new ABR connection requests are accepted without any regard for existing CBR/VBR connections. As a result, the ABR services may dominate the link such that the ABR output queues are always just below the onset of congestion, while the CBR/VBR applications receive little share of the bandwidth. This is why a limit has to be imposed on the total minimum cell rate (MCR) of all connected ABR sessions, as implemented by Rule IC of the switch behaviour shown in Figure 3.
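The source-side decision rules above reduce to a few comparisons. This sketch assumes the returned CAC-cell carries category numbers that the source maps to CLR requirements through a hypothetical table `clr_req`, and that a traffic class with no connections on the path (category `None`) imposes no constraint. It uses a strict `<`, as in the ABR rule of Figure 2.

```python
def admit(service, se_clrv, se_clra, mscv, msca, clr_req):
    """Source-side admission decision on the backward CAC-cell.

    service: "CBR", "VBR" or "ABR"; mscv/msca: most stringent category numbers
    (None if no such connection on the path); clr_req: category -> max CLR.
    """
    def within(estimate, category):
        # With no connection of that class on the path there is nothing to violate.
        return category is None or estimate < clr_req[category]

    if service == "ABR":
        # Lower priority: the ABR estimate alone decides.
        return within(se_clra, msca)
    # CBR/VBR: must not violate the most stringent requirement in either queue.
    return within(se_clrv, mscv) and within(se_clra, msca)
```

A VBR request whose CBR/VBR estimate is fine can still be refused because of the ABR estimate, which is exactly the asymmetry that motivates the ABR bandwidth limit discussed next.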
IV. EXPERIMENTAL DETAILS
A. Network Topology
The network topology used in the multi-service experiments is shown in Figure 5. … serves as a bottleneck link for virtual paths (VP) 1, 3, 4 and 5, while link LS12 serves as a bottleneck link for VP 2.
B. Traffic Sources and Parameters Used
The CBR sources send cells at PCR for the entire connection lifetime. The ABR sources used are also assumed to be persistent.
There are many models for simulating VBR sources. A popular example is the interrupted fluid process (IFP) [7], [4], in which both the burst (ON) period and the silence (OFF) period are drawn from exponential distributions. During the ON period, cells are transmitted at the PCR (see Figure 6). Parameters such as the mean burst length ($t_{ON}$), the mean interval between bursts ($t_{OFF}$) and the PCR have to be specified. … Table IV.
From Table I, it can be seen that all three CBR categories have the same mean holding time and mean inter-arrival time. This implies that all three categories will have the same average number of connections in the absence of a CAC controller; any difference is therefore due to the decisions made by the CAC controller in its effort to guarantee the CLR requirements of the connected sessions. Among the three categories, Category 3 has the largest PCR. Thus, a wrong decision that admits an extra Category 3 connection at the onset of congestion is more severe than the same error for either of the other two categories. Nevertheless, a fair CAC scheme should not starve any of the categories. Table II shows that the mean holding time and mean inter-arrival time of the VBR categories are the same as those of the CBR categories. The implication described above for the CBR categories applies here as well: a wrong decision for Category 5 is more severe. Categories 4 and 5 both use the IFP model described above.
From Table III, it can be seen that the ABR categories have the same mean inter-arrival time as the other traffic sources. However, their mean holding time cannot be specified, as it depends on the network load condition: transmission rates for the ABR categories are high when the network is lightly loaded, and low otherwise. Note that each ABR category comes with a non-zero MCR. This implies that the ABR categories will transmit cells at a non-zero rate even in the event of network congestion.
As a result, CAC has to be applied to the ABR categories as well, to guarantee the CLR requirements of the connected sessions.
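An IFP source is easy to generate. The sketch below draws exponential OFF and ON periods and, following the fluid view, counts PCR × ON time as the emitted traffic. Starting each source in the OFF state and the function names are assumptions; over a long horizon the mean rate should approach PCR · t_ON / (t_ON + t_OFF).

```python
import random


def ifp_on_periods(t_on_mean, t_off_mean, horizon, seed=None):
    """ON intervals of an interrupted fluid process (IFP) source on [0, horizon].

    ON and OFF durations are exponential with the given means; the source
    is assumed to start in the OFF state.
    """
    rng = random.Random(seed)
    t, periods = 0.0, []
    while t < horizon:
        t += rng.expovariate(1.0 / t_off_mean)   # silence period
        if t >= horizon:
            break
        on = rng.expovariate(1.0 / t_on_mean)    # burst period
        periods.append((t, min(t + on, horizon)))
        t += on
    return periods


def cells_sent(periods, pcr):
    """Fluid approximation: traffic emitted = PCR x total ON time."""
    return pcr * sum(end - start for start, end in periods)
```

For example, `t_on_mean=0.01`, `t_off_mean=0.03` and a PCR of 1000 cells/s give a long-run mean of about 250 cells/s.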
C. Experiments
In order to evaluate the performance of the proposed HME-based CAC scheme, simulations were carried out using the following four CAC schemes:
PCRA: a CAC scheme based on PCR allocation to the sources
ACRA: a CAC scheme based on ACR allocation to the sources
EB: an equivalent bandwidth method proposed in [4], with some modification
HME: the proposed HME-based CAC scheme
For the PCRA scheme, each CBR or VBR session requests its PCR, while an ABR session requests its MCR. A connection is rejected if there is insufficient bandwidth on any of the links in its path. This is an over-conservative scheme, which is expected to have zero cell loss if all sources conform to their traffic descriptors and if the flow control mechanism for the ABR sources works well.
The ACRA scheme is similar to the PCRA scheme described above, except that each VBR session requests its ACR instead of its PCR. In contrast to the PCRA scheme, the ACRA scheme is likely to accept too many connections, leading to a high CLR. It will only work well under two circumstances:
1. if all traffic sources belong to the CBR service class, for which PCR = ACR; or
2. if the switch buffers are much larger than the maximum burst sizes of the VBR sources (ignoring end-to-end delay and delay variation).
For moderate buffer sizes and multi-service traffic sources, as in the simulation experiments, a large number of cells is expected to be lost. The results for this scheme indicate that a VBR source requires a bandwidth allocation higher than its ACR in order to satisfy its CLR requirements.
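The PCRA and ACRA rules amount to a per-link reservation check along the path. In this sketch the link names and capacities are placeholders (the text gives 150Mbps for the bottleneck link), the dictionaries `reserved` and `capacity` are hypothetical bookkeeping, and releasing bandwidth at connection teardown is not shown.

```python
def bandwidth_request(service, pcr, acr, mcr, scheme):
    """Bandwidth a connection asks for under PCRA or ACRA."""
    if service == "ABR":
        return mcr                  # ABR sessions request their MCR in both schemes
    if scheme == "PCRA" or service == "CBR":
        return pcr                  # for CBR, PCR == ACR anyway
    if scheme == "ACRA":
        return acr                  # VBR requests its ACR under ACRA
    raise ValueError(f"unknown scheme {scheme!r}")


def admit_on_path(path, reserved, capacity, request):
    """Accept only if every link on the path has spare capacity, then reserve it."""
    if any(reserved[link] + request > capacity[link] for link in path):
        return False
    for link in path:
        reserved[link] += request
    return True
```

The only difference between the two schemes is the value passed as `request`; the path check is identical.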
The EB scheme uses the equations derived in [4] to determine the equivalent bandwidth of multiple VBR sources (Categories 4 and 5) multiplexed together; these equations are only valid for sources following the IFP model. CBR and ABR sources are allocated their PCR and MCR, respectively. This agrees with the intuition that a CBR source has an equivalent bandwidth equal to its PCR, since it exhibits no bit rate fluctuation. Also, the inclusion of the lower-priority ABR sources does not affect the service received by the CBR/VBR sources. This permits a fair comparison between the EB scheme and the other schemes.
For the proposed HME-based CAC scheme, the HME architecture comprises a 1-level hierarchy with 8 experts. All weights of the gating and expert networks were initialised to small random values. The HME-based controller makes wrong control decisions initially, while the HME is still learning the relationship between the traffic patterns and CLRs, but improves rapidly thereafter.
For each of the four CAC schemes, the simulation duration was 100s. To evaluate the performance of the different CAC schemes under rapidly changing traffic conditions, there were no ABR connection requests (i.e. Categories 6 and 7) in the 0-10s and 20-30s intervals.
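For reference, a widely used closed form for on-off fluid sources is the equivalent-capacity approximation of Guérin, Ahmadi and Naghshineh (1991): a per-source fluid term combined with a Gaussian stationary term, taking the minimum of the two. We have not checked that this is the exact form used in [4], the paper's modification is not reproduced, and the parameter names are ours.

```python
import math


def equivalent_capacity_single(peak, rho, burst, buffer, eps):
    """Fluid-flow equivalent capacity of one on-off source.

    peak: rate while ON; rho: fraction of time ON, t_ON / (t_ON + t_OFF);
    burst: mean ON period; buffer: buffer size in the same data unit as
    peak * burst; eps: target loss probability.
    """
    if rho >= 1.0:
        return peak                 # constant-rate (CBR-like) source
    y = math.log(1.0 / eps) * burst * (1.0 - rho) * peak
    root = math.sqrt((y - buffer) ** 2 + 4.0 * buffer * rho * y)
    return peak * (y - buffer + root) / (2.0 * y)


def equivalent_capacity_aggregate(sources, buffer, eps):
    """min(Gaussian stationary approximation, sum of per-source capacities).

    sources: iterable of (peak, rho, burst) tuples.
    """
    sources = list(sources)
    mean = sum(rho * peak for peak, rho, _ in sources)
    var = sum(peak * peak * rho * (1.0 - rho) for peak, rho, _ in sources)
    alpha = math.sqrt(-2.0 * math.log(eps) - math.log(2.0 * math.pi))
    fluid = sum(equivalent_capacity_single(p, r, b, buffer, eps) for p, r, b in sources)
    return min(mean + alpha * math.sqrt(var), fluid)
```

With no buffer the single-source value equals the peak rate, and with a very large buffer it tends to the mean rate, so the result always lies between the ACRA and PCRA allocations.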
V. RESULTS AND DISCUSSION
As CAC decisions between two end-users are mainly influenced by the availability of bandwidth along the entire path, attention should be focused on the bottleneck links of the network. For the network topology used in the simulations, the major bottleneck link is LS34. Hence, the main concern is the performance at the output port of SW3 which serves link LS34.
The performance of the four CAC schemes is evaluated based on: (1) CLR, (2) link utilisation, (3) throughput, (4) the total declared peak and mean bit rates of all connected CBR/VBR sessions, and (5) the connection rejection rate (CRR) of the different traffic categories. Item (4) shows whether a scheme is biased against bursty VBR sources with high peak bit rates but low mean bit rates. Note that ABR source parameters are excluded from the total declared peak and mean bit rate computations, as ABR sources have no declared mean bit rate and their peak bit rates are set equal to the link capacity.
Table VI shows the CRR for each traffic category under the different CAC schemes. In general, for each of the four schemes, the CRR increases for categories with higher bit rate requirements. For instance, among the CBR categories, Category 3, which has the largest peak bit rate, also has the largest CRR. For the VBR categories, Category 5 has a higher CRR than Category 4, since it has a higher PCR and ACR. The same observation applies to the ABR services, where Category 7 has a higher CRR than Category 6 since it has a higher MCR. Note that the CRRs for the ABR sources are much higher than those of the other categories, because the total MCR of all connected ABR sources cannot exceed 5Mbps in the simulations. It is also interesting to compare the CRRs of the CBR and VBR traffic sources, keeping in mind that all CBR and VBR categories have the same average connection arrival rates and connection holding times. Ideally, the VBR sources benefit from statistical multiplexing, and their CRRs should not be too different from those of CBR sources with comparable ACRs. Any large difference in CRRs is thus an indication that the CAC scheme is biased against certain categories. Detailed comparisons are made in the following discussions of the individual CAC schemes.
A. PCRA
From Figure 7 (a), it is observed that, as expected, no cell loss occurs throughout the entire simulation period. This is because sufficient bandwidth has been allocated to each CBR/VBR session to accommodate its PCR. However, as the VBR sources do not transmit at PCR most of the time, the overall bandwidth utilisation of the CBR/VBR sessions is rather low. As can be seen from the link utilisation plot, a lot of bandwidth is wasted when ABR sources are absent. The ABR sources, even though they are admitted based on MCR allocation, transmit at rates higher than their MCRs most of the time, as they scavenge the abundant bandwidth left unused by the CBR/VBR sources. As a result, the PCRA scheme has the lowest throughput for the CBR/VBR sources among the four schemes but achieves the highest throughput for the ABR sources (see Table V).
From Figure 7 (a), it is observed that the total declared peak bit rate is maintained below 150Mbps throughout the simulation. This is consistent with the fact that the PCRA scheme rejects any new connection request that would cause the total peak bit rate to exceed the link capacity of 150Mbps.
The narrow gap between the peak and mean bit rate plots indicates that the PCRA scheme tends to reject bursty VBR sources with high peak bit rates, even though some of these sources could actually be accommodated because of statistical multiplexing gain. This observation is confirmed by the CRRs in Table VI: Category 5, with the largest peak bit rate, also has the highest CRR among the CBR and VBR sources. Its CRR is much higher than that of Category 3 (CBR), even though its ACR is lower. The total declared mean bit rate, on the other hand, is far below 150Mbps. The PCRA scheme clearly does not benefit from statistical multiplexing gain and is over-conservative in its bandwidth allocation.
B. ACRA
Under the ACRA scheme, a large number of cells from both CBR/VBR and ABR sources are lost throughout the entire simulation period (see Table V and Figure 7(b)). This is a consequence of its optimistic bandwidth allocation strategy, which only considers the declared average bandwidth requirements and ignores short-term bit rate fluctuations that may lead to temporary buffer overflow. From the link utilisation plot, it can be seen that the CBR/VBR link utilisation is close to 100%, leaving little bandwidth for the ABR sources. Table V shows that this scheme has the highest throughput for CBR/VBR sources but the lowest for ABR sources among the four schemes. The penalty for the high throughput is the extremely high number of cells lost. These results show that each VBR source has to be allocated a bandwidth greater than its ACR but less than its PCR in order to achieve high throughput while meeting CLR requirements. The correct bandwidth allocation depends on the achievable statistical multiplexing gain, which is difficult to determine, especially for complicated VBR traffic types.
From Figure 7 (b), it is observed that the total declared mean bit rate is maintained at approximately 150Mbps throughout the simulation period, since bandwidth is allocated according to the declared mean bit rates. The gap between the total declared peak and mean bit rates is the widest among the four schemes; together with the CRRs shown in Table VI, this shows that the ACRA scheme is the least biased against bursty VBR sources, as it does not consider the declared peak bit rate at all.
C. EB
… simulation period, link utilisation reaches 100% very frequently. During the intervals 0-10s and 20-30s, when ABR traffic is absent, extremely high link utilisation is achieved without any cell loss. During the other intervals, when ABR traffic is present, a small number of ABR cells and no CBR/VBR cells are lost. From Table V, it is observed that the EB scheme achieves a high average link utilisation of approximately 94%. Thus, the EB scheme appears nearly ideal here in terms of achieving high throughput with low CLR. The total declared peak and mean bit rate plots in Figure 7 (c) also exhibit a wider gap than for the PCRA scheme, indicating that the EB scheme is less biased against bursty VBR sources. As can be seen from Table VI, Category 5 has only a slightly larger CRR than Category 3. The total declared peak bit rate is close to 200Mbps most of the time, so the EB scheme benefits considerably from statistical multiplexing gain.
D. HME
As the HME-based controller starts with random internal weights, it makes some decision errors which lead to cell losses in the first half of the simulation period. These errors, however, enable the HME-based controller to learn to recognise high-cell-loss-rate traffic patterns and improve its performance. The observations for different intervals within the simulation period are presented below (refer to Figure 7 (d)):
0-10s: This interval contains only CBR and VBR traffic. After some initial cell losses, the HME learns and is able to maintain high link utilisation, occasionally reaching full link utilisation.
10-20s: When the traffic characteristics change with the injection of ABR traffic, utilisation by CBR/VBR traffic decreases slightly, because new CBR/VBR connections are rejected whenever the ABR queue faces congestion. The ABR sources use nearly all the bandwidth left unused by the CBR/VBR sources. As no cell loss occurs in this interval, the HME is unable to learn any high-cell-loss patterns for the mixture of CBR, VBR and ABR traffic.
20-30s: With ABR traffic absent again, the CBR/VBR utilisation increases, demonstrating the controller's adaptability to traffic changes. However, there are occasional cell losses in this interval, as the weight values learnt during the 0-10s interval changed slightly while the HME was learning low-cell-loss patterns in the 10-20s interval.
30-60s: When ABR traffic is introduced again, the HME attempts to adapt to the current traffic characteristics. Occasional cell losses occur in both the CBR/VBR and ABR queues as the HME searches for the decision boundary. This happens now because no high-cell-loss patterns occurred during the 10-20s interval, when CBR, VBR and ABR traffic were first mixed, so the HME did not have a chance to learn such patterns properly.
60-100s: In this period the HME has become accustomed to the traffic characteristics and performs well. As can be seen from Figure 7 (d), high link utilisation is achieved by the CBR/VBR sources, occasionally reaching full link utilisation without incurring any cell loss. Zero cell loss is achieved even for the lower-priority ABR sources, which have MCRs to satisfy.
Although Table V shows a large number of cells lost for CBR/VBR connections, all of these losses occurred during the first half of the simulation period, when the HME had not yet learnt the traffic patterns accurately.
For the lower-priority ABR sources, the number of cells lost was low, and all of these losses occurred in the first half of the simulation period. By the time ABR traffic is injected, the HME's internal weights are no longer random, so the HME requires less time to adapt to the new traffic characteristics. In contrast, the EB scheme incurred cell loss for ABR traffic even in the 80-90s interval.
The HME's throughput for the CBR/VBR sources is approximately 5% lower than that of the EB scheme. This is because each cell-loss episode during the first half of the simulation period is followed by a period of relatively low link utilisation. Such fluctuations do not occur during the second half of the simulation, after the HME has learnt the traffic patterns properly.
From the total declared peak and mean bit rate plots, it is observed that the gap between the two plots is wider than for the EB scheme, so the HME-based scheme is less biased against bursty VBR sources. As with the ACRA scheme, Category 5's CRR is lower than Category 3's CRR. The total declared peak bit rate is maintained close to 200Mbps during the last phase of the simulation, demonstrating that the HME-based scheme is able to benefit from statistical multiplexing gain.
We conclude from these results that a trained HME-based CAC controller is able to achieve throughput and cell loss performance comparable to, or better than, that of the EB scheme.
VI. COMMENTS
The key advantage about the HME-based scheme is that it does not require any assumption about the traffic model, whereas the EB scheme described in Section IV-C assumes that the VBR sources follow the IFP model. The performance of analytical methods is expected to deteriorate when traffic sources do not conform to the traffic models under which the methods are derived. The HME-based scheme can be applied even in the presence of complicated types of traffic which are either intractable or too computationally intensive for real-time implementation.
In experiments performed with a more bursty Interrupted Bernoulli Process (IBP) traffic model replacing the IFP model for Category 5 VBR traffic, the equivalent bandwidth (EB) method over-estimates the bandwidth required and exhibits a high CRR for such traffic. On the other hand, the same HMEbased CAC scheme as the one used above is able to adapt to this different traffic situation and continue to give high link utilisation with low CLR.
In the simulations, all multiplexer buffers use FIFO queuing. As a result, cell loss probabilities for all sessions sharing the same buffer are identical. When many sessions having different CLR requirements are multiplexed together, the switch has to ensure that the most stringent CLR requirement among all sessions is met. Hence, sessions which do not require such stringent CLR performance receive better QoS than they require. Higher throughput could be achieved, and more connections accepted, if each session obtained only the CLR performance it requires. However, this can only be achieved by implementing some form of buffer management and scheduling policy at the switches which distinguishes between cells from different sessions.
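The consequence of FIFO sharing for an admission decision can be illustrated with a small sketch. It assumes, hypothetically, that a predictor (for instance a trained HME) has already produced a predicted CLR for each switch on the new connection's path; the function names and the exact acceptance rule (reject if any switch's prediction exceeds the strictest requirement sharing that buffer) are illustrative assumptions, not the paper's published algorithm.

```python
from typing import Sequence


def effective_clr_target(clr_requirements: Sequence[float]) -> float:
    """All sessions in one FIFO buffer see the same CLR, so the buffer
    must satisfy the most stringent (smallest) requirement among them."""
    return min(clr_requirements)


def admit(predicted_clr: Sequence[float],
          existing_requirements: Sequence[Sequence[float]],
          new_requirement: float) -> bool:
    """Hypothetical CAC check along a path.

    predicted_clr[i]: predicted CLR at switch i if the call is accepted
                      (assumed to come from a trained predictor)
    existing_requirements[i]: CLR requirements of sessions already in switch i's buffer
    """
    if len(predicted_clr) != len(existing_requirements):
        raise ValueError("need one prediction and one requirement list per switch")
    for predicted, existing in zip(predicted_clr, existing_requirements):
        target = effective_clr_target(list(existing) + [new_requirement])
        if predicted > target:
            return False  # this switch would violate the strictest session's CLR
    return True


# A lax new session cannot loosen the target set by a stricter existing one.
print(admit([2e-6], [[1e-6]], new_requirement=1e-3))  # False
```

Because the target is a minimum over all sessions in the buffer, the check is only as permissive as the strictest session allows, which is exactly the inefficiency the paragraph above describes.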
VII. CONCLUSION
In this paper, we have proposed an HME-based CAC method which uses predictions of the CLR at each switch on the path from source to destination in order to make the CAC decision. It does not require prior knowledge of traffic characteristics in order to work well. Although the controller performs poorly initially, and again when traffic conditions change drastically, the fast learning ability of the HME enables quick adaptation, and good performance is regained after a short time. Although the proposed method requires some numerical computation to be done at each switch, this can be performed at the required rates using inexpensive DSP or RISC processors.
In future work, we hope to harness the prediction ability of HMEs to determine the amount of bandwidth to allocate for different kinds of VBR traffic along the lines of the equivalent bandwidth methods, but with the ability to switch models automatically when necessary. HMEs can also be used to determine the values of parameters in traffic and network models which are too computationally intensive to be evaluated in real-time but which can be estimated on-line. 
Software Radios for Wireless Networking

Introduction
The recent rapid growth in wireless network technology has greatly expanded the capabilities of mobile computing devices. However, the multitude of wireless network standards hinders seamless interoperability by requiring different physical devices to interoperate with different networks. Not only do wireless LANs operate in different RF bands, but even those using the same band employ different coding, modulation, and network protocols. The implementation of network interface cards (NICs) in dedicated hardware limits the flexibility of these devices. Our approach to solving this problem is to implement as much of the processing as possible in software, allowing the NIC functionality to be dynamically modified. Our software system provides all […] Our approach does not rely on special-purpose DSP processors, but rather on general-purpose processors such as the Intel Pentium or DEC Alpha. The rapid advances in processor clock speeds have made cycles available on the scale required to perform these real-time tasks. Despite the fact that today's general-purpose processors and operating systems were not explicitly designed to handle the constraints imposed by signal processing applications, we have succeeded in designing a system for implementing computationally intensive real-time signal processing applications on such platforms.
The next section describes the software architecture and the partitioning of the network and radio functions. Section 3 outlines our general architecture for implementing software network interfaces. An example network interface with performance results is presented in section 4.
Software Architecture
Today, wireless networks are statically specified by their built-in link and physical layer functions. In the future, we envision systems that can dynamically modify their functionality to interact with different systems and/or adapt to changing conditions. For example, a cellular base station could tailor its channel allocation and modulation scheme based on traffic and environmental conditions, and then indicate to each mobile unit what kind of radio to compile. However, this requires a well-defined way of describing a communication system. The software radio architecture presented in this paper consists of several well-defined processing layers, which can be used to completely specify a wireless communications system.
The layering presented in section 2.1 is a refinement of the OSI layering model [Tan88] which subdivides the existing Link and Physical layers. The signal processing involved in these layers can be naturally subdivided into a finer-grain model, but has traditionally been lumped into one layer because of its implementation in dedicated hardware. For our purposes, however, this is too coarse. To interoperate with different networks, it may only be necessary to change small parts of the existing layers. For example, two different systems may employ the same modulation and coding but use different multiple access protocols, or a given system may only need to change the type of coding to dynamically adapt to changing channel conditions. To facilitate this flexibility, we would like to create new network interfaces by simply combining existing functional modules, rather than by writing a new piece of software for each NIC that encompasses all of the functions in the link and/or physical layers.
While layering provides an excellent framework for modularity, it can lead to an inefficient implementation [CT90]. Since many software radio applications involve intensive data manipulation functions, the overhead of a layered implementation can be quite significant. In the short term, our engineering approach has been to group layers together where necessary, but to ensure that the grouping still uses our interfaces at its edges. The penalty is that, when these grouped layers are implemented, software modules can only be re-used at a coarser granularity. Our approach to dealing with the tradeoff between modularity and efficiency is discussed in section 2.2.
Processing Layers
The definition of new layers is not something to be done lightly. Too many layers would result in a cumbersome programming model, so the layering must balance this cost against the flexibility to be gained. The layers were defined according to the design principles of the OSI model [Tan88]. […] appearance of a transmission medium that is free from errors.
The physical layer is concerned with transmitting bits over a communication channel. Spectrum control over the physical layer can be achieved through the use of line coding, the first sublayer of the physical layer. Unlike channel codes, line codes are not concerned with errors, but rather with controlling the statistics of the data symbols, such as removing baseline drift or undesirable correlations in the symbol stream. The desired parameters are determined by the physical characteristics of the transmission medium.
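As an illustration of a line code (the example is ours; the paper does not say which line codes it supports), Manchester coding maps every bit to a pair of opposite half-bit levels, so each pair has zero mean and a long run of identical bits cannot cause baseline drift. The level convention below is one of two in common use and is an assumption.

```python
def nrz_encode(bits):
    """Plain NRZ mapping, 1 -> +1 and 0 -> -1 (no spectral shaping)."""
    return [1 if b else -1 for b in bits]


def manchester_encode(bits):
    """Manchester line code; convention assumed here: 1 -> (+1, -1), 0 -> (-1, +1)."""
    levels = []
    for b in bits:
        if b not in (0, 1):
            raise ValueError("bits must be 0 or 1")
        levels.extend((1, -1) if b == 1 else (-1, 1))
    return levels


def running_sum(levels):
    """Running sum of the levels, a simple proxy for baseline (DC) drift."""
    total, trace = 0, []
    for v in levels:
        total += v
        trace.append(total)
    return trace


run_of_ones = [1] * 8
print(max(running_sum(nrz_encode(run_of_ones))))         # 8: drift grows with the run
print(max(running_sum(manchester_encode(run_of_ones))))  # 1: drift stays bounded
```

The price of the bounded drift is a doubled symbol rate, which is the kind of tradeoff the line-coding sublayer makes according to the transmission medium.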
The modulation sublayer is concerned with the transformation between symbols and signals. This includes not only traditional modulation functions such as QAM, but also channel equalization functions. It is tempting to define equalization as its own layer, but this would not be appropriate for two reasons. First, it would violate the layering principle that layer n on one machine carries on a conversation with layer n on another machine. In general, equalization is concerned with correcting for effects imposed by the channel, not by a corresponding function on the transmitter, so there is no comparable layer on the transmission side with which to communicate. The second reason is that equalizers are an integral part of the transformation between signals and symbols, and therefore should be part of the same functional layer as modulation techniques.
Finally, we have the multiple-access sublayer, which includes techniques such as TDMA and FDMA. The choice of multiple access technique is usually independent of the modulation technique, and therefore should occupy its own layer.
In general, a given system may contain only a subset of the layers. However, one could think of such a system as containing all of the layers, with default functions in some of the layers that do not manipulate the data in any way. This model provides a cleaner way of thinking about a system design.
One of the goals of layering is to minimize the information flow across the layer boundaries. This is enforced by the implementation of clean interfaces. There are four different data types that exist at the interfaces of the software portion of the layered structure described above: bytes, bits, symbols and discrete signals. The data type required for each interface is indicated on the right-hand side of figure 1. The interface data structure also contains parameters relevant to the data type, such as sampling frequency and bits per sample in the case of discrete signals.
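The four interface data types, and the extra parameters carried with discrete signals, might be represented as follows. This is a hypothetical sketch: the class and field names are ours, and the paper's actual interface structures are not shown in this section.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class InterfaceType(Enum):
    """The four data types exchanged between software layers."""
    BYTES = "bytes"
    BITS = "bits"
    SYMBOLS = "symbols"
    DISCRETE_SIGNAL = "discrete signal"


@dataclass
class DiscreteSignal:
    """Discrete-signal interface: samples plus the parameters needed to interpret them."""
    samples: List[int]
    sampling_frequency_hz: float
    bits_per_sample: int
    kind: InterfaceType = field(default=InterfaceType.DISCRETE_SIGNAL, init=False)

    def duration_s(self) -> float:
        return len(self.samples) / self.sampling_frequency_hz

    def storage_bytes(self) -> int:
        # Each sample is stored in whole bytes, e.g. a 12-bit sample occupies 2 bytes.
        return len(self.samples) * ((self.bits_per_sample + 7) // 8)


block = DiscreteSignal(samples=[0] * 1000, sampling_frequency_hz=10e6, bits_per_sample=12)
print(block.duration_s(), block.storage_bytes())
```

Carrying the sampling frequency and sample width with the data lets a downstream layer interpret a block without out-of-band configuration, which keeps the interface self-describing.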
Integrated Layer Processing
A layered architecture provides functional modularity, but often at the cost of efficiency in the implementation. One approach is to use the layered architecture as a design tool, but to separate this model from the engineering of any particular application, where integrating layers can provide significant performance gains [CT90], [AP93]. However, we do wish to maintain some of the modularity of the layered model in the implementation, so that a given wireless NIC can be dynamically modified by changing only a small amount of code. The ability to dynamically incorporate different protocols is enabled by a layered implementation.
On the other hand, the processing involved is quite intensive, and it is often necessary to combine layers to achieve the desired latency or throughput characteristics. In particular, the layers involved with the processing of discrete signals involve many load and store operations because their data sets are typically quite large. For example, the waveform associated with a single bit in the above example requires a buffer of size BitPeriod × SamplingFrequency = 16 samples, and each sample requires two bytes of storage. A typical IP packet containing an ICMP packet generated by the "ping" application is 64 bytes. By the time each of these bits is framed and then modulated up to the IF frequency, the waveform requires over 20 Kbytes of storage. In order to balance the tradeoff between flexibility and performance, we follow a few guidelines for implementing integrated layer processing:
• Combine layers only if necessary to meet performance requirements. We wish to leave as many layers separate as possible, so it makes sense to start by integrating the layers that require the most processing time. Using valid interfaces still allows for modularity, albeit at a coarser level, and by separating the OSI layers we leave open the ability to interoperate with other software or hardware systems that implement these layers.
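The storage figures quoted above can be checked with a few lines of arithmetic. The 16 samples per bit and 2 bytes per sample come from the text; the 11 bits per byte assume the byte framing described in section 4 (one start bit, eight data bits, one parity bit and one stop bit) and ignore the packet-level start code, length code and byte stuffing, which add slightly more.

```python
SAMPLES_PER_BIT = 16       # BitPeriod * SamplingFrequency, from the text
BYTES_PER_SAMPLE = 2       # from the text
FRAMED_BITS_PER_BYTE = 11  # assumption: start + 8 data + parity + stop bits


def waveform_bytes(packet_bytes: int, bits_per_byte: int) -> int:
    """Storage needed for the sampled waveform of a packet."""
    return packet_bytes * bits_per_byte * SAMPLES_PER_BIT * BYTES_PER_SAMPLE


print(waveform_bytes(64, 8))                     # 16384 bytes before byte framing
print(waveform_bytes(64, FRAMED_BITS_PER_BYTE))  # 22528 bytes after byte framing
```

A 64-byte packet thus grows by a factor of about 350 on its way to the signal layers, which is why those layers dominate the load/store traffic.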
Architecture
Our system architecture moves the analog/digital boundary as close to the antenna as possible and moves the software/hardware boundary right up to the wideband A/D converter. This increases flexibility by bringing more functions under software control. Since current A/D technology and available processors will not support the direct sampling of wide RF bands, our approach, as illustrated in figure 2, is to use hardware only to convert the desired RF band to the IF frequency, then directly sample the wideband IF waveform and transfer these samples into host memory. All subsequent processing is performed in user-level software.
The introduction of even this minimal amount of hardware could significantly reduce our flexibility by locking us into a single RF band. However, multi-band frontends are becoming available which allow the software to select the center frequency and width of an RF band in the range spanning 2 MHz to 2 GHz, and to sample the specified band at a resolution of 12 bits¹. Such frontends will allow for the construction of true multi-band, multi-mode software radios. The system described in this paper used a frontend that operates in the 2.4 GHz ISM band², which allowed all functions except RF band selection to be implemented in software.
From Analog IF to Software
There are two stages in the conversion of the IF signal to a software-accessible form: digitizing the signal and transporting the digital samples into the host computer's main memory. Wideband radio applications require that data be transferred into host memory at a very high rate. For example, to transfer a stream of 16-bit samples of a 10 MHz wide IF band (i.e., a minimum 20 MHz sampling rate) to the application, a data rate of 320 Mbits/sec is required. Conventional workstations have two I/O bottlenecks which had to be overcome. First, the available I/O ports on today's workstations cannot handle the required data rates. Second, the path through the operating system between a device driver and the application is inefficient [Ous90]. To overcome these limitations, we developed a PCI-based I/O system which consists of two parts. The hardware component, the GuPPI (for General Purpose PCI I/O), physically connects the analog frontend to the workstation's I/O bus. The software component, a set of operating system additions, provides the means for the application to efficiently access the sample streams.
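The 320 Mbits/sec figure follows directly from sampling at twice the IF bandwidth with 16 bits per sample. The sketch assumes, as the text does, a minimum sampling rate of twice the 10 MHz IF bandwidth; the function name is ours.

```python
def required_rate_bps(if_bandwidth_hz: float, bits_per_sample: int) -> float:
    """Data rate needed to stream samples of a real IF band sampled at twice its bandwidth."""
    min_sampling_rate_hz = 2 * if_bandwidth_hz
    return min_sampling_rate_hz * bits_per_sample


print(required_rate_bps(10e6, 16) / 1e6)  # 320.0 Mbits/sec
```

The rate scales linearly with both the IF bandwidth and the sample width, so wider bands quickly exceed what a conventional workstation I/O port can deliver.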
The GuPPI provides the ability to burst data between the analog frontend and main memory at near the maximum I/O bus rate. In order to accommodate jitter in the availability of resources, we use memory to temporally decouple [TB96] the sample stream. FIFOs on the GuPPI, connected to the A/D and D/A converters, decouple the timing between the fixed-rate domain of the analog frontend and the variable-rate I/O bus without losing any samples. This effectively absorbs any jitter caused by the bursty access to the PCI bus. These functions are performed without significant intervention from the processor; the required processing overhead per sample is less than half a cycle.
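The role of the GuPPI FIFOs can be illustrated with a small software model: a producer deposits one sample per tick, as the A/D converter does, while the consumer only gets the bus every few ticks and then drains a burst. The class, the tick model and all parameters are hypothetical; the point is that no samples are lost as long as the FIFO can hold everything produced between bus grants and each burst can drain it.

```python
from collections import deque


class SampleFifo:
    """Bounded FIFO between a fixed-rate producer and a bursty consumer."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.buf = deque()
        self.overflows = 0  # samples lost because the FIFO was full

    def push(self, sample: int) -> None:
        if len(self.buf) >= self.capacity:
            self.overflows += 1
            return
        self.buf.append(sample)

    def drain(self, max_samples: int) -> list:
        n = min(max_samples, len(self.buf))
        return [self.buf.popleft() for _ in range(n)]


def simulate(capacity: int, ticks: int, bus_period: int, burst: int):
    """One sample per tick; every bus_period ticks the bus moves up to `burst` samples."""
    fifo = SampleFifo(capacity)
    received = []
    for t in range(ticks):
        fifo.push(t)
        if t % bus_period == bus_period - 1:
            received.extend(fifo.drain(burst))
    return received, fifo.overflows


print(simulate(capacity=8, ticks=64, bus_period=8, burst=16)[1])  # 0 samples lost
print(simulate(capacity=4, ticks=64, bus_period=8, burst=16)[1])  # 32 samples lost
```

With an 8-sample FIFO the stream arrives intact; halving the FIFO loses half the samples even though the bus bandwidth is unchanged, which is why sizing the buffer to the worst-case gap between bus grants matters.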
The operating system support consists of a device driver for the GuPPI and several small additions to the virtual memory system, all for the Linux kernel. The total size of the code is just under 1200 lines, with the virtual memory system additions representing just 200 of those. Another important aspect of the additions is that they do not affect the performance or functionality of any part of the system not related to the GuPPI; all other applications run completely unperturbed. The device driver provides for the continuous transfer of data between the GuPPI and main memory while absorbing jitter due to the scheduling of the signal processing applications. Data buffers in the driver, which can be as large as several hundred pages each, allow us to store data when we don't have enough cycles available, and then catch up by processing the buffer faster than real-time when the cycles become available. The virtual memory additions provide low-overhead, high-bandwidth transfer of data between the application and the device driver by eliminating the expensive data copying between kernel and user space.

¹ For example, the Rockwell 95x family of wideband receivers.
² Constructed using evaluation boards from RF Micro Devices [RF97].
On a 200 MHz PentiumPro running Linux with a 33 MHz, 32-bit-wide PCI bus, the I/O system has been shown to support continuous sample streams at rates up to 512 Mbits/sec. The peak burst rates are 933 Mbits/sec for input and 790 Mbits/sec for output. Details on the implementation can be found in [Ism98].
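To put these rates in perspective, the theoretical peak of a 33 MHz, 32-bit PCI bus is 1056 Mbits/sec (using the 33 MHz figure quoted; the nominal PCI clock of 33.33 MHz raises this by about 1%). The sketch below, which ignores bus protocol overhead, expresses the measured rates as fractions of that peak.

```python
PCI_CLOCK_HZ = 33e6   # bus clock as quoted in the text
PCI_WIDTH_BITS = 32

PEAK_MBPS = PCI_CLOCK_HZ * PCI_WIDTH_BITS / 1e6  # 1056 Mbits/sec, ignoring protocol overhead


def bus_fraction(rate_mbps: float) -> float:
    """Fraction of the theoretical PCI peak used by a given data rate."""
    return rate_mbps / PEAK_MBPS


for label, rate in [("sustained", 512), ("burst input", 933), ("burst output", 790)]:
    print(f"{label}: {bus_fraction(rate):.0%}")
```

This gives roughly 48% for the sustained stream, 88% for input bursts and 75% for output bursts, consistent with the statement that the GuPPI bursts at near the maximum I/O bus rate.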
Once the data is in memory, the software must process the data to produce a network frame. We have chosen to interact with the operating system at the IP layer; the interface to the kernel's network stack is through our SoftLink device driver. This driver appears to the kernel as just another network device driver. However, instead of exchanging packets between a hardware device and the IP layer, the SoftLink driver exchanges packets between our user-level application and the IP layer. For reception, the processing application converts incoming IF data into an IP packet; this packet is then handed to the SoftLink driver, which passes it on to the IP layer in the kernel, just as the device driver for any network card does. For transmission, the SoftLink driver accepts IP packets from the network layer and hands them up to the user-level process, where the packet is processed to produce the IF waveform. This waveform is transferred, via the GuPPI, to a 12-bit D/A converter (AD9713) which outputs the analog IF waveform.
Example Network Interface
This section presents an implementation of a software NIC designed to be compatible with a commercial frequency-hopping radio operating in the 2.4 GHz ISM band, employing FSK modulation and supporting a data rate of up to 625 kbps [Sha97]³. Parameters such as the FSK frequency deviation and the spacing of the hopping channels can be dynamically modified in software; the only constraints imposed by the hardware are the width of the IF band and the RF band to which the signal is converted. The results reported here utilized a 4.8 MHz wide IF band sampled at 10 MSPS with 12-bit resolution and an RF band centered at 2.45 GHz⁴. The transmission system generates continuous-phase waveforms at a sustainable data rate of 320 kbps while hopping 1000 times per second; reception currently runs at a rate of 64 kbps, supporting the same hopping rate. The following is a description of the transmission and reception applications, together with a discussion of the design issues and an evaluation of the performance of the system. The sequence of processing modules for the transmission application is shown in figure 3. The system interfaces with the host at the IP layer, through our SoftLink device driver.

³ The NIC was designed to be compatible with the 2.4 GHz frequency-hopping radio from GEC Plessey, model DE6003.
The first level of processing is the network framing. For this example, the packets were framed by inserting a start code and byte stuffing the data. A length code, indicating the total length of the packet including the stuffed values, was also inserted after the start code. The next module, representing the channel coding, takes the sequence of bits output by the network framing layer and performs byte framing, inserting start, stop and parity bits. Note that this system does not contain a line coding layer, which means that the symbols input to the modulation layer are actually bits.
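The two framing steps can be sketched as follows. The paper does not give the start-code value, the stuffing rule, the width of the length field or the parity convention, so the HDLC-style escape scheme, the 0x7E/0x7D byte values, the unstuffed 2-byte length and even parity used below are all hypothetical choices.

```python
START_CODE = 0x7E  # hypothetical start-code value
ESCAPE = 0x7D      # hypothetical escape byte for stuffing


def stuff(payload: bytes) -> bytes:
    """Byte stuffing so that START_CODE never appears inside the frame body."""
    out = bytearray()
    for b in payload:
        if b in (START_CODE, ESCAPE):
            out += bytes((ESCAPE, b ^ 0x20))
        else:
            out.append(b)
    return bytes(out)


def network_frame(packet: bytes) -> bytes:
    """Start code, then the length of the stuffed body (assumed 2 bytes, big-endian), then the body."""
    body = stuff(packet)
    return bytes((START_CODE,)) + len(body).to_bytes(2, "big") + body


def byte_frame(frame: bytes) -> list:
    """Channel-coding step: wrap each byte as start bit 0, 8 data bits (LSB first),
    an even-parity bit and stop bit 1 (bit order and parity sense are assumptions)."""
    bits = []
    for b in frame:
        data = [(b >> i) & 1 for i in range(8)]
        bits += [0] + data + [sum(data) % 2] + [1]
    return bits


frame = network_frame(bytes([0x01, 0x7E, 0x02]))
print(frame.hex())             # 7e0004017d5e02
print(len(byte_frame(frame)))  # 77 bits: 7 bytes x 11 bits
```

Because the output of `byte_frame` is a list of bits and the system has no line coding layer, it can be handed directly to the modulation layer as its symbol stream.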
The conversion of each symbol into a discrete signal is performed by the FSK module, representing the modulation layer. The multiple access technique is frequency hopping, which assigns the waveform to the appropriate IF frequency.
In this implementation we combined the modulation and multiple access layers, which resulted in significant computational savings. This allowed us to directly generate the IF sinusoid corresponding to the particular bit and hopping frequency, rather than generating a sinusoid for each bit and then re-modulating that sinusoid to the appropriate hopping frequency. All of the possible transmission waveforms are known a priori: there are two possible waveforms, corresponding to 1 or 0, for each hop frequency. All of these waveforms can be precomputed and stored at startup, significantly reducing the computation required to produce the transmitted waveform. On a 180 MHz PentiumPro, 2.2 μs were required to produce the IF waveform corresponding to a single bit. This corresponds to a maximum possible transmit data rate of approximately 450 kbps⁵.
The generation of continuous-phase waveforms is fairly straightforward in software. The precomputed waveforms are actually oversampled, and only a sub-sampled set, corresponding to the output sampling rate, is copied into the output buffer. The oversampling allows us to index into the buffer to match the phase, and the pattern is treated as a circular buffer, allowing the generation of waveforms for any bit period. After copying the samples to the output buffer, the phase value is updated and used as the index for the waveform corresponding to the next bit.
In a similar manner, we are able to maintain continuous phase between hops, even when the hop occurs in the middle of a bit.

⁴ Since these results were obtained, our system has been shown to support real-time processing of IF bands sampled at 25 MSPS.
⁵ The measurements were obtained using the PentiumPro cycle counter. To ensure that branch prediction did not lead to erroneous cycle counts, the serializing "cpuid" instruction was executed prior to each reading of the counter. The overhead of this instruction, plus that of reading the cycle counter, was quantified and subtracted out.
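The table-driven, phase-continuous generation described above might look like the following minimal sketch. Assumptions: one precomputed period per (hop channel, bit) tone, stored at a hypothetical 8× oversampling of the 10 MSPS output rate; tone frequencies chosen so that a period is a whole number of oversampled points; and phase carried between bits and hops as a fraction of a cycle. The frequencies, names and oversampling factor are illustrative, not the values used in the paper.

```python
import math

FS_HZ = 10e6     # output sampling rate (10 MSPS, as in the text)
OVERSAMPLE = 8   # hypothetical oversampling factor of the stored tables


def make_table(freq_hz: float) -> list:
    """Precompute one period of a tone at OVERSAMPLE * FS_HZ."""
    length = OVERSAMPLE * FS_HZ / freq_hz
    if abs(length - round(length)) > 1e-9:
        raise ValueError("tone period must be a whole number of oversampled points")
    length = int(round(length))
    return [math.sin(2 * math.pi * i / length) for i in range(length)]


class PhaseContinuousFsk:
    """Table-driven FSK/frequency-hopping generator that keeps phase continuous."""

    def __init__(self, tables: dict):
        self.tables = tables  # (hop_channel, bit) -> precomputed oversampled period
        self.phase = 0.0      # phase reached so far, as a fraction of a cycle in [0, 1)

    def emit_bit(self, hop_channel: int, bit: int, n_samples: int) -> list:
        table = self.tables[(hop_channel, bit)]
        length = len(table)
        # Start where the previous bit left off (nearest oversampled point).
        index = round(self.phase * length) % length
        # Sub-sample the circular table: every OVERSAMPLE-th point is one output sample.
        out = [table[(index + k * OVERSAMPLE) % length] for k in range(n_samples)]
        # Store the phase of the next sample for the following bit or hop.
        self.phase = ((index + n_samples * OVERSAMPLE) % length) / length
        return out


# Hypothetical tones for bit 0 and bit 1 on hop channel 0.
tables = {(0, 0): make_table(2.0e6), (0, 1): make_table(2.5e6)}
gen = PhaseContinuousFsk(tables)
waveform = gen.emit_bit(0, 1, 31) + gen.emit_bit(0, 0, 31)
```

In this sketch a hop in the middle of a bit is handled the same way, by splitting that bit's samples between two `emit_bit` calls with different hop channels; the oversampling factor sets how finely the starting phase can be matched.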
Reception
As is usually the case, reception is considerably more complex than transmission, although the sequence of processing modules is essentially the reverse of the transmission system shown in figure 3. The receiver must detect the presence of a valid transmission and synchronize to it, as well as perform the inverse function of each of the transmission layers. Again, by combining the parameters of the frequency hopping and the FSK demodulation, we constrained the receiver to look for one of the two valid waveforms at a given hop frequency. Separate functions were implemented to track the hopping sequences, lock onto a bit boundary and demodulate the bits. These bits are de-framed, and then the IP packet is extracted. The SoftLink driver then hands the packet off to the host IP layer for processing. The number of cycles required for each reception function is given in […].
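The final demodulation step can be sketched as a choice between the two valid tones at the current hop frequency. This sketch assumes that hop tracking and bit-boundary lock have already been done, so it receives exactly one bit period of samples; the non-coherent correlator and all names and parameters are our assumptions, one common way to make the decision rather than the paper's implementation.

```python
import math


def tone_energy(samples, freq_hz: float, fs_hz: float) -> float:
    """Non-coherent correlation: energy of the projection onto cos and sin at freq_hz."""
    i_sum = q_sum = 0.0
    for n, x in enumerate(samples):
        angle = 2 * math.pi * freq_hz * n / fs_hz
        i_sum += x * math.cos(angle)
        q_sum += x * math.sin(angle)
    return i_sum * i_sum + q_sum * q_sum


def demodulate_bit(samples, f0_hz: float, f1_hz: float, fs_hz: float) -> int:
    """Pick whichever of the two valid tones carries more energy in this bit period."""
    return 1 if tone_energy(samples, f1_hz, fs_hz) > tone_energy(samples, f0_hz, fs_hz) else 0


fs = 10e6              # 10 MSPS, as in the text
f0, f1 = 2.0e6, 2.5e6  # hypothetical tones for 0 and 1 at one hop frequency
bit_period = [math.sin(2 * math.pi * f1 * n / fs + 0.3) for n in range(40)]
print(demodulate_bit(bit_period, f0, f1, fs))  # 1
```

Using both the cosine and sine projections makes the decision insensitive to the unknown carrier phase, so the receiver does not need phase lock to choose between the two waveforms.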
Real-Time performance
The system we have constructed is not a hard real-time system, since it is implemented on a multi-tasking operating system without explicit real-time support. In a hard real-time system, each task has a deadline for completion, and there is some mechanism to ensure that this deadline is met. Rather than ensuring real-time performance with the tight synchronous control over the processing that is typical of many DSP and digital hardware designs, we take an approach that is statistical in nature. Although the actual number of cycles available over any given period of time varies due to demands on the system, the buffering and temporal decoupling provided by the I/O subsystem allow […]
In order to quantify the performance of our system, we introduce the notion of statistical real-time performance. We characterize the system by defining a probability that the work will be completed within the specified time limit, and specifying the action that is to be taken when the deadline is not met. The probability is determined by profiling the algorithm under the expected conditions on the workstation. In this case the expected load was simply a Linux workstation running an X server, an NFS file system, and the usual network daemons (e.g. sendmail, amd, etc.). In another case, the expected load might involve other signal processing tasks, or significant user activity. Figure 4 shows the distribution of times required to extract one bit using the FSK demodulation function from the example of section 4. The probability that more than 10 μs is required is less than 0.003. If we chose to stop the processing after 10 μs and make an arbitrary decision as to the […]
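The statistical characterization can be computed directly from profiling data: the probability of missing a deadline is estimated as the fraction of profiled runs that exceed it, and a deadline can be chosen to meet a target miss probability. The timing values below are synthetic placeholders, not the measurements of figure 4, and the function names are hypothetical.

```python
from typing import Sequence


def miss_probability(times_us: Sequence[float], deadline_us: float) -> float:
    """Empirical probability that the work takes longer than deadline_us."""
    if not times_us:
        raise ValueError("need at least one profiled run")
    return sum(1 for t in times_us if t > deadline_us) / len(times_us)


def deadline_for(times_us: Sequence[float], max_miss: float) -> float:
    """Smallest profiled time whose empirical miss probability is at most max_miss."""
    if max_miss < 0:
        raise ValueError("max_miss must be non-negative")
    for d in sorted(set(times_us)):
        if miss_probability(times_us, d) <= max_miss:
            return d
    raise ValueError("need at least one profiled run")


# Synthetic per-bit demodulation times in microseconds (placeholders).
profile = [3.1, 3.2, 3.2, 3.4, 3.5, 3.9, 4.2, 5.0, 7.5, 12.0]
print(miss_probability(profile, 10.0))  # 0.1
print(deadline_for(profile, 0.1))       # 7.5
```

With real profiles, the estimate is only as good as the match between the profiling load and the expected load, which is why the text specifies the workstation conditions under which the distribution was measured.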
Summary
In this paper, we have described an approach for building software wireless interfaces. We also described the design and implementation of one instantiation of that architecture, a NIC designed to interoperate with an existing commercial 2.4 GHz ISM band frequency-hopping spread-spectrum radio.
This implementation and our experience with it demonstrated some important facts about software NICs in general and […]:
• It is feasible to build software NICs that achieve good performance.
• It is relatively easy, using our environment, to build software NICs. The code implementing this particular radio totaled only 520 lines of C++.
• Our architecture facilitates constructing devices that interoperate with existing systems.
• Software NICs can ride the technology curve of commodity PCs. As PCs get faster, our wireless NIC will automatically get faster.
Our experiments did not illustrate one important potential advantage of software NICs, the ability to implement functionality that is not available through NICs that perform their signal processing on dedicated hardware. In the long run, this may be the most important advantage of all.
