For the implementation of multipoint-to-point connections in ATM, various approaches exist, each with its own advantages and disadvantages. VP-based methods require unique sender identification but do not require reassembly at the merging points. In contrast, VC-based methods do not require unique sender identification but do require reassembly at the merging points. It is likely that VC merging will be the method of choice as it is scalable and yet relatively simple to implement. One of its drawbacks is the increased output buffer space required at the switches because of packet reassembly at the merging points. This paper investigates the impact of the switch architecture and characteristics on the output buffer space by means of simulation. The results obtained demonstrate that for typical switch architectures, VC merging does not require significant additional buffering compared to VP merging.
Introduction
In current ATM networks, there exist only point-to-point (pt-to-pt) and point-to-multipoint (pt-to-mpt) connections. For the interconnection of routers across an ATM network as well as for many other information-gathering applications, multipoint-to-point (mpt-to-pt) connections appear to be more appropriate. Interconnection of N routers requires on the order of N^2 labels for the order of N^2 pt-to-pt connections, as described in [1]. With mpt-to-pt connections only N labels for the N associated connections are necessary. This significantly reduces the required label space and thus makes the method more scalable. The same new ATM connection type could also be used in the context of merged connections for MPLS [2].
To implement mpt-to-pt connections, different solutions are possible. We focus on the two most important methods: VP merging and VC merging.
VP merging: Each sender is assigned a globally unique identifier having the format of a VCI. The identifier is carried in the VCI field of the ATM cell. The ATM switch translates incoming VPIs for the same destination to the same outgoing VPI. The receiver distinguishes among the different sources by their different VCIs. The key advantage of this scheme is that no VCC resources are required in the switching nodes as only VP switching is performed. This implies no change of hardware but only a change of the connection establishment protocol. Some of the disadvantages of VP merging are the lack of scalability caused by the VPI address space limitation of 4096 entries and the need for a global "VCI uniqueness" protocol. There are proposals to circumvent the nonscalability by enlarging the VPI address space at the expense of the VCI address space. This is not desirable, however, because it requires changes in the switching hardware.
VC merging: This method avoids the requirement for globally unique sender identifiers, and it consumes only one VCI per traversed link. These characteristics make this approach scalable. The ATM switch translates incoming VCIs belonging to the same connection to a single outgoing VCI. This means that cells of packets belonging to different senders could be interleaved. As the receiver is not able to distinguish cells from different senders, packet reassembly has to be performed at the merging points, and all cells from a given packet must be sent contiguously so that reassembly at subsequent merging points and at the receiver will be possible. AAL 3/4 would solve the problem by introducing the Message Identifier (MID) field for sender identification in every cell. The use of AAL 3/4, however, has other drawbacks such as the limited space of the MID field, the inefficient encapsulation method, and the less powerful CRC capability.
In this paper we consider the employment of AAL 5 because it is widely available and supported in ATM switches, especially in data networks. Packet reassembly at the merging points introduces additional buffer requirements on the switching architecture because all of the cells of a packet sent by a sender belonging to an mpt-to-pt connection have to be stored and must wait for the last cell of the packet, identified by the "End Of Packet" (EOP) marker used by AAL 5, to arrive. Figure 1 depicts the cell interleaving problem. Packet reassembly also introduces additional delay for packets transported over a merged connection and adds burstiness to the traffic. This is because all the cells of a packet have to wait at every merging point. They appear afterwards as a burst of a whole packet at the output link. This burstiness becomes even worse as it is often cascaded and thus accumulated over numerous merging points. Reference [3] gives some hints about how to solve the problems involved in mpt-to-pt VC merging.
A third possibility is to handle an mpt-to-pt connection of N senders to one receiver like N pt-to-pt connections without applying any merging. This possibility again requires on the order of N^2 labels for the order of N^2 pt-to-pt connections.
Of the above possible solutions, VC merging appears to be the method of choice as it is relatively easy to implement and yet scalable. At the ATM Forum, VC merging has been accepted and will be introduced in the PNNI v2.0 specification expected to be finished in the autumn of 1998. The only concern is with the reassembly required in the switches in terms of additional buffering and delay. The numerous simulations presented in the following sections are used to investigate the required additional buffer overhead for VC merging. It is also very likely that different methods of merging and nonmerging will exist simultaneously in an ATM network. Some interworking aspects of these methods are discussed in [4].
Section 2 of this paper describes our switching architecture model for VC merging and the model of the arriving traffic. In Section 3 we show our simulation setup and discuss the results of the simulations. In Section 4 we give a summary and derive some conclusions.
Switch and Traffic Model
Switch Model
In this paper we consider the general class of single-stage, nonblocking M x M packet switches with both input and output queuing [5, 6]. The shared output buffer is assumed to be sufficiently large so that the switch performance is close to optimal, corresponding to pure output queuing. Cells are transferred from the head of the input queues to the shared buffer. The speed of the input and output switch ports is denoted R_S, and the speed of the outgoing links is denoted R_L. Let k denote the ratio of the switch speed per port to the outgoing link speed, i.e. k = R_S / R_L. Typically k is greater than one, which implies that an output queue should be provided in order to cope with the speed mismatch.
As described above, VC merging requires an amount of additional output buffering due to the packet reassembly. We introduce a so-called reassembly buffer at each output port of the ATM switch. Figure 2 shows the concept of the reassembly buffer. A switch has M input ports and N sources of mpt-to-pt connections because it is likely that different connections will coexist. Hence N can be much larger than M. The model considered in this paper is valid for the case of N M. The case of N M is not covered by the present switch model and is therefore a subject for further investigation. At every merging point, each of the sources participating in the corresponding mpt-to-pt connection is associated with a distinct reassembly buffer at the output queue. When the last cell of a packet with the EOP marker arrives at the reassembly buffer, all of the cells of the packet are instantly transferred into a single output buffer per output port. Physically the reassembly and the output buffers of one output port share a common memory pool. The transfer from the reassembly to the output buffer can easily be done by a pointer movement and will therefore not incur additional delay. The simulation models for VC and VP merging are shown in Figures 3 and 4, respectively. Cells belonging to the various VCs are transferred from the head of the switch input queues into the shared buffer and, subsequently, to the corresponding output queues. It is assumed that the traffic is uniform, i.e. the destination of an arbitrary packet can, with equal probability, be any of the output ports, and that successive packets are independent regarding their output port destinations. Owing to traffic symmetry, all of the output queues have identical behavior. Let us turn our attention to a particular output queue and study its behavior. The corresponding simulation model considers N sources feeding the output queue in a round-robin fashion governed by the factor k.
This model is also appropriate for the case where the switch fabric is capable of transferring only a limited number of cells to any given output [7].
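The reassembly mechanism described above can be illustrated with a minimal Python sketch. It is not the implementation used in the simulations; the class name `VCMergingOutputPort` and its methods are hypothetical, and the pointer movement within the shared memory pool is modelled simply by appending the reassembled cells to a queue.

```python
from collections import deque


class VCMergingOutputPort:
    """Hypothetical output port: one reassembly buffer per source, one output queue."""

    def __init__(self):
        self.reassembly = {}         # source id -> cells still waiting for their EOP
        self.output_queue = deque()  # cells ready for the outgoing link

    def receive(self, source, payload, eop):
        """Store a cell; when the EOP cell arrives, release the whole packet."""
        cells = self.reassembly.setdefault(source, [])
        cells.append((source, payload))
        if eop:
            # In a switch this would be a pointer move inside the shared
            # memory pool; here it is modelled by extending the deque.
            self.output_queue.extend(cells)
            self.reassembly[source] = []

    def transmit(self):
        """Send one cell on the outgoing link, or None if nothing is ready."""
        return self.output_queue.popleft() if self.output_queue else None


port = VCMergingOutputPort()
# Cells of two packets from sources "A" and "B" arrive interleaved.
arrivals = [("A", 1, False), ("B", 1, False), ("A", 2, False),
            ("B", 2, True), ("A", 3, True)]
for source, payload, eop in arrivals:
    port.receive(source, payload, eop)

# Cells leave packet by packet, in the order in which the packets completed.
sent = [port.transmit() for _ in range(5)]
```

In this example the packet from "B" completes first, so its two cells leave contiguously before the three cells from "A", although the cells arrived interleaved.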
Traffic Model
The traffic and simulation model we use is shown in Figures 3 and 4. We use N arrival processes, which correspond to the traffic destined to the output queue. Packets are assumed to arrive according to either a Poisson process (nonbursty traffic) with the mean arrival rate λ or a hyperexponential process (bursty traffic). The hyperexponential process is generated by a two-stage hyperexponential distribution. The mean values corresponding to the two stages are 0.51 and 16.48, respectively. The corresponding routing probabilities for the two stages are 0.97 and 0.03, respectively, so that the mean arrival rate is again equal to λ. Each packet is assumed to contain a number of cells geometrically distributed with a mean of E cells [8, 9]. We used E = 10, E = 30, or E = 180 cells (10 cells correspond to 472 bytes, 30 cells to 1432 bytes, and 180 cells to 8632 bytes).
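A minimal Python sketch of the arrival processes described above follows. It relies on two assumptions that the text does not state explicitly: the stage means 0.51 and 16.48 are taken to be expressed in units of the mean interarrival time 1/λ, and the geometric packet length is taken to start at one cell. All function and variable names (`lam`, `hyperexp_interarrival`, and so on) are illustrative.

```python
import random

P_STAGE = (0.97, 0.03)       # routing probabilities of the two stages (from the text)
MEAN_FACTOR = (0.51, 16.48)  # ASSUMPTION: stage means in units of 1/lam


def hyperexp_mean_factor():
    """Mean interarrival time of the two-stage process, in units of 1/lam."""
    return sum(p * m for p, m in zip(P_STAGE, MEAN_FACTOR))


def poisson_interarrival(lam, rng):
    """Nonbursty traffic: exponential interarrival time with rate lam."""
    return rng.expovariate(lam)


def hyperexp_interarrival(lam, rng):
    """Bursty traffic: pick a stage, then draw an exponential with that stage's mean."""
    factor = MEAN_FACTOR[0] if rng.random() < P_STAGE[0] else MEAN_FACTOR[1]
    return rng.expovariate(lam / factor)  # expovariate expects a rate, not a mean


def packet_length_cells(mean_cells, rng):
    """ASSUMPTION: geometric distribution on 1, 2, 3, ... with the given mean."""
    p = 1.0 / mean_cells
    length = 1
    while rng.random() >= p:
        length += 1
    return length


rng = random.Random(1)
sample_gap = hyperexp_interarrival(lam=1.0, rng=rng)
sample_lengths = [packet_length_cells(10, rng) for _ in range(20000)]
```

Under the stated assumption, the weighted stage means give 0.97 x 0.51 + 0.03 x 16.48 = 0.9891, i.e. a mean interarrival time of roughly 1/λ, which is consistent with the statement that the mean arrival rate is again equal to λ.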
It is shown in [9] that the mean packet size in a core network where ATM is likely to be applied is about 289 bytes. This yields 6.2 ATM cells of data using AAL 5 with the null encapsulation method as described in [10] (the additional overhead of AAL 5 is 8 bytes per packet). The dominant packet sizes in an Internet backbone are 40 or 44 bytes at about 36% of the traffic (TCP acknowledgment packets, TCP control segments such as SYN, FIN, ..., and Telnet packets carrying single characters), 552 or 576 bytes at about 25% (512 and 536 bytes are used by TCP implementations without path MTU discovery as the default maximum segment size (MSS) for nonlocal IP destinations, yielding a 552- or 576-byte packet size), 185 bytes at about 2.7%, and 1500 bytes at about 1.5% (Ethernet traffic). These statistics were collected on Feb 10, 1996, in the FIX-West network as a sample wide-area network, and are shown in [11]. A more recent study of traffic characteristics in an Internet backbone, conducted in August of 1997, is presented in [12]. It is shown that almost 50% of the traffic is 40 or 44 bytes in packet length. Other prominent packet sizes are 532, 576, and 1500 bytes, each representing 15% of the traffic. Comparing the two studies we observe a shift to smaller packets of size 40 or 44 bytes and larger packets of size 1500 bytes.
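The conversions between packet sizes in bytes and in cells quoted above can be reproduced with a short Python sketch. It assumes the standard 48-byte ATM cell payload together with the 8 bytes of AAL 5 overhead per packet stated in the text; the function names are illustrative.

```python
CELL_PAYLOAD = 48  # payload bytes carried by one ATM cell
AAL5_OVERHEAD = 8  # AAL 5 overhead per packet, as stated in the text


def aal5_cells(packet_bytes):
    """Number of whole cells needed for one packet (null encapsulation)."""
    total = packet_bytes + AAL5_OVERHEAD
    return (total + CELL_PAYLOAD - 1) // CELL_PAYLOAD  # integer ceiling


def max_packet_bytes(cells):
    """Largest packet that fits into the given number of cells."""
    return cells * CELL_PAYLOAD - AAL5_OVERHEAD


def mean_cells(mean_packet_bytes):
    """Fractional cell count for a mean packet size (no rounding up)."""
    return (mean_packet_bytes + AAL5_OVERHEAD) / CELL_PAYLOAD


sizes = {cells: max_packet_bytes(cells) for cells in (10, 30, 180)}
```

Here `sizes` reproduces the 472, 1432, and 8632 bytes used for E = 10, 30, and 180 cells, and `mean_cells(289)` evaluates to 6.1875, which rounds to the 6.2 cells quoted for the mean packet size. A 40-byte packet fits into a single cell, whereas a 44-byte packet already needs two.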
For the future development of packet sizes, the spreading use of path MTU (PMTU) discovery will have a significant impact. PMTU discovery will affect MTUs in IPv4 as proposed in [13] and, even more, MTUs in IPv6 over faster LANs. There will be numerous packets with possible sizes up to 64 kilobytes (the maximum packet size for AAL 5 is 64 kilobytes [14]). A single packet of this size involved in reassembly could alone fill the entire reassembly buffer in a switch output queue. Reference [15] gives an overview of other typical frame sizes being applied on AAL 5. These are the 8 kilobytes used by the Network File System (NFS) and the 9180 bytes of IP MTU over SMDS [16] that became the default value for IP MTU over ATM AAL 5 [14]. These big packet sizes in conjunction with VC merging could present problems that VP merging would not encounter. On the other hand there will also be much
Figures 5-13 show the VC merging buffer size (solid line) and the corresponding VP merging buffer size (dashed line). The results serve to compare VP and VC merging. They cannot, however, be used directly to show the required output buffer space in an ATM switch because no flow control has been taken into account. The simulations were carried out for an extremely large number of events such that the 95% confidence intervals were very small.
, and about 30 to 37 cells for l = 30% over several orders of magnitude of overflow probability. At high loads the output queue contains a large number of cells, which translates to long delays. Therefore, by the time the first cell of a packet is ready for transmission at the output link, the corresponding last cell has most likely arrived and, consequently, the packet reassembly has been completed. In this case, therefore, the additional overhead due to reassembly is almost negligible. It is also important to note that the workload of today's switches normally lies at high levels of around 70% or 80%. In contrast, at low loads, the first cell may be ready for transmission while the reassembly is in progress.
In this case it has to be delayed until the reassembly process has been completed. However, owing to the low load, the number of packets under reassembly is small and, therefore, the additional buffer requirement of VC merging is minimal. The results obtained are in agreement with those presented in [9]. We then made the same simulations with bursty arrival processes. We model the bursty arrival process by a hyperexponential packet arrival process as described in the previous section. The results for l = 30%, 70%, and 90% are shown in Figure 6. We see that the buffer requirements for both VC and VP merging grow significantly for high loads. Of course flow control would alleviate this problem to some extent due to the overall load reduction. The additional buffer requirements for VC merging compared to VP merging are minimal even for the case of bursty traffic. In particular, for high loads they become negligible for the reasons given above.
Simulation results were obtained for different values of k and different loads l. By varying k we expected to see an influence on the additional buffer requirement. Surprisingly, only the extreme value k = 1 resulted in a large additional buffer requirement for VC merging. It is obvious that VP merging requires almost no output buffer with k = 1 as the speed of the switch output port R_S is equal to the speed of the output link R_L. We then tried to determine the critical k for every load factor l considered. The critical value of k is defined as follows: For all values of k larger than the critical value, there is practically no distinction between VC curves and VP curves, whereas for all smaller values of k the curves start becoming distinguishable. We found that the critical k lies close to the extreme value k = 1. The range of the critical k is between 1.1 and 1.3 for l = 90% and l = 70%, respectively. This means that the critical k becomes larger with lower loads, but it is still far away from the values implemented in today's switches (greater than 2). To substantiate these observations we investigated the critical k for l = 30%, too. In this case the critical value for k is approximately 1.5, which is still much smaller than 2 and thus confirms our theory. Figure 7 shows the results of our simulations for k = 1.1, 1.2, 16 at l = 90%. We observe that all of the curves for the output buffer of VC merging at different values of k lie close together. The value of k = 1.1 is the critical one because the corresponding curve starts to show a deviation. The same applies to the curves for VP merging. For values of k greater than the critical one, the additional buffer requirement for VC merging at low overflow probabilities is minimal. However, the difference between VC and VP merging becomes noticeable for values of k less than the critical value. Figure 8 shows the results of our simulations for k = 1.2, 1.3, 16 at a lower load of l = 70%.
In this case, we observe that the different curves for VC and VP merging lie close together. Here again, the additional buffer requirements for VC merging at low overflow probabilities become noticeable for values of k less than the critical value. Figure 9 shows the results of our simulations for k = 1.5, 2, 16 at a low load of l = 30%. We observe again the similarity of the curves for VC merging over the entire range of k. The curves for VP merging vary slightly so that the additional buffer space becomes smaller for a larger k, with a critical k at about k = 1.5. There is a noticeable additional buffer requirement for VC merging in the entire range of values of k. Furthermore, the additional requirement increases as k decreases.
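The qualitative interplay between the load l, the speed ratio k, and reassembly can be explored with a toy slotted model in Python. This sketch is not the simulator used to produce the figures and does not reproduce their overflow probabilities. It makes several simplifying assumptions: k is an integer, time is slotted in cell times of the outgoing link, packet arrivals are Bernoulli per slot (a discrete stand-in for the Poisson process), and each source contributes at most one cell per slot. All names are illustrative.

```python
import random
from collections import deque


def simulate(load, k, n_sources=16, mean_cells=10, slots=20000, seed=1):
    """Return the peak buffer occupancy (VP merging, VC merging) in cells."""
    rng = random.Random(seed)
    p_new = load / (n_sources * mean_cells)  # packet arrival prob. per source and slot
    backlog = [deque() for _ in range(n_sources)]  # input queues holding EOP flags
    reassembly = [0] * n_sources  # VC merging: cells waiting for their EOP
    vc_out = 0                    # VC merging: cells in the output buffer
    vp_out = 0                    # VP merging: cells in the output buffer
    max_vp = max_vc = 0
    start = 0                     # round-robin starting point
    for _ in range(slots):
        for s in range(n_sources):
            if rng.random() < p_new:
                length = 1
                while rng.random() >= 1.0 / mean_cells:
                    length += 1
                backlog[s].extend([False] * (length - 1) + [True])
        # The fabric moves up to k cells per slot, round robin over the sources.
        moved = 0
        for i in range(n_sources):
            if moved == k:
                break
            s = (start + i) % n_sources
            if backlog[s]:
                eop = backlog[s].popleft()
                moved += 1
                vp_out += 1          # VP merging: cell is immediately ready
                reassembly[s] += 1   # VC merging: cell waits for its EOP
                if eop:
                    vc_out += reassembly[s]
                    reassembly[s] = 0
        start = (start + 1) % n_sources
        # Occupancy is recorded before the link transmits one cell.
        max_vp = max(max_vp, vp_out)
        max_vc = max(max_vc, vc_out + sum(reassembly))
        vp_out -= 1 if vp_out else 0
        vc_out -= 1 if vc_out else 0
    return max_vp, max_vc


if __name__ == "__main__":
    for k in (1, 2, 16):
        print(k, simulate(load=0.9, k=k))
```

In this model the VP merging buffer never holds more than one cell when k = 1, because at most one cell arrives and one cell departs per slot, which mirrors the observation that VP merging requires almost no output buffer at k = 1. Since both variants see identical arrivals and the VC variant may leave the link idle while cells await their EOP, the total VC occupancy (reassembly plus output buffer) is never below the VP occupancy.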
We then tried to investigate the possible influence of more specific traffic characteristics, such as larger packet sizes and increased numbers of sources in an mpt-to-pt connection, on the additional requirements of VC merging compared to VP merging. First, we performed simulations for a larger mean packet size of the arrival process, E = 30. Figure 10 shows the curves for l = 90% and k = 1.1, 1.2, 16 with a mean packet size of E = 30. Compared to Figure 7 we observe a greater difference between the curves for k = 1.1 and for k = 1.2. It appears that the critical k is shifted to a value slightly larger than k = 1.1 (between k = 1.1 and k = 1.2). Furthermore we see that the mean packet size, which is three times larger, requires an output buffer size that is also three times larger. Moreover, the additional output buffers for VC merging are about three times larger for E = 30. Therefore the additional buffer requirement for VC merging appears to grow linearly with the mean packet size. This trend is also verified by our simulations for E = 180. Figure 11 shows the results of the same simulation for l = 70%, k = 1.2, 1.5, 16 and an increased mean packet size of E = 30. Compared to Figure 8 we again observe a shift of the critical k from a value of about k = 1.2 to a slightly larger value. Concerning the additional buffer requirement for VC merging, the same observations were made as in the case of load l = 90%. This means that, also at this load, as the packet size increases, the additional buffer requirement increases by the same factor. Finally we performed simulations for a larger N (N = 64, 128) to assess the influence of a large number of sources associated with one mpt-to-pt connection on the additional buffer requirements of VC merging due to reassembly. An increased number of sources could translate to an increased degree of reassembly. This again would lead to a significantly larger required buffer space for reassembly than for nonreassembly.
Figure 12 shows the results of the simulations for N = 16, 64 sources, factor k = 16, and nonbursty traffic. The results obtained also apply in the case of bursty traffic. This is explained by the Palm-Khintchine theorem [17, p. 156], which states that summing up a large number of iid processes (for instance hyperexponential processes as used for our bursty traffic) results in a process of Poisson type (our nonbursty traffic). As our simulation has 64 sources, each with an iid process for the arrival traffic, we are able to apply this theorem and to simulate nonbursty arrival traffic. All of the corresponding curves in Figure 12 lie close together. Concerning VP merging, as N increases, the corresponding curves converge because the aggregated arrival process tends to a Poisson one. For VC merging, the buffer requirement does not increase with the number of sources. This is because increasing the number of sources translates to decreasing the arrival rate per source such that the load at the output link remains constant. This shows that our previous simulation results also hold for a larger scenario with a larger number of senders. We have also investigated the impact of varying k given large values of N. Previously we found a critical k of about 1.1 to 1.3 at N = 16. Figure 13 shows the results of the simulations with N = 64 and k = 2, 4, 16 with nonbursty traffic. The different curves for VC and VP merging again lie close together and we see no significant difference between the curves belonging to the values k = 2 and k = 16. Consequently the critical value of k is smaller than 2. Once again, for values of k greater than the critical value, the additional buffer requirement for VC merging does not increase.
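The effect of the Palm-Khintchine theorem invoked above can be checked numerically with a short Python sketch. It superposes N independent hyperexponential renewal streams and estimates the squared coefficient of variation (SCV) of the interarrival times of the merged stream, which equals 1 for a Poisson process. As an assumption, the stage means 0.51 and 16.48 are taken in units of the per-source mean interarrival time; the function names are illustrative, and the SCV is only a crude indicator of how close the merged stream is to a Poisson one.

```python
import heapq
import random

P_STAGE = (0.97, 0.03)       # routing probabilities of the two stages (from the text)
MEAN_FACTOR = (0.51, 16.48)  # ASSUMPTION: stage means in units of the mean interarrival time


def hyperexp(mean, rng):
    """Two-stage hyperexponential interarrival time with roughly the given mean."""
    factor = MEAN_FACTOR[0] if rng.random() < P_STAGE[0] else MEAN_FACTOR[1]
    return rng.expovariate(1.0 / (factor * mean))


def merged_scv(n_sources, n_arrivals=20000, seed=1):
    """SCV of the interarrival times of n_sources superposed iid streams."""
    rng = random.Random(seed)
    per_source_mean = float(n_sources)  # keeps the aggregate rate near one
    # The heap holds the next arrival time of every source.
    heap = [(hyperexp(per_source_mean, rng), s) for s in range(n_sources)]
    heapq.heapify(heap)
    gaps, last = [], 0.0
    for _ in range(n_arrivals):
        t, s = heapq.heappop(heap)
        gaps.append(t - last)
        last = t
        heapq.heappush(heap, (t + hyperexp(per_source_mean, rng), s))
    mean = sum(gaps) / len(gaps)
    var = sum((g - mean) ** 2 for g in gaps) / len(gaps)
    return var / (mean * mean)


if __name__ == "__main__":
    for n in (1, 16, 64):
        print(n, merged_scv(n))
```

A single hyperexponential source with these parameters has an SCV far above 1, whereas the merged stream of 64 sources should yield a value much closer to 1, in line with the argument that the aggregate of many sources behaves like nonbursty traffic.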
Summary and Conclusions
VC merging is likely to become the method of choice to implement mpt-to-pt connections in ATM networks. Because of the cell interleaving problem created by VC merging, reassembly has to be performed at the merging points. The effect of reassembly has been investigated assuming an output queue switch architecture. The results obtained demonstrate that, at high loads and for arbitrary arrival processes, the implementation of VC merging in the switches will not require much additional buffer at the output queues of the switches. In contrast, at low loads, additional buffer is required but this is minimal. Furthermore, it was found that the additional buffer requirement for VC merging is proportional to the average packet size. Consequently, large packet sizes can result in large reassembly buffer requirements. We further investigated the effect of the speed ratio between switch output port and output link and came to the conclusion that for sufficiently large speed ratio values (k ≥ 2) the output buffer requirements for VP merging and for VC merging each remain unchanged. We found a critical k which grows with decreasing utilization and also with growing mean packet sizes of the arrival traffic, but it always remains between 1.1 and 1.3 for high utilizations of 70% and 90%.
