High-speed Asynchronous Transfer Mode (ATM) links are typically shared by several users. For these reasons, ATM users may want access to shared, flexible encryption that supports multiple concurrent ATM VCs and multiple concurrent ATM users. Software-based encryption is flexible, but it sometimes raises performance and security concerns. Hence, some users require hardware-based encryption. Key-agile encryptors are the current state of the art in high-speed, shared, hardware-based ATM encryptors [2, 3].
access to several different key-agile encryptors. This raises both cost and administrative issues.
An algorithm-agile encryptor would allow each VC to use a different encryption algorithm [4]. Of course, an algorithm-agile encryptor still has to maintain separate end-to-end encryption contexts and encryption keys for each VC. However, a single algorithm-agile encryptor will likely provide both initial-cost savings and administrative-cost savings over several key-agile encryptors. Algorithm-agile encryptors may also simplify connection setup if security policy becomes a routing metric within protocols such as ATM's PNNI (Private Network-Network Interface).
The context-switching time between the different VCs' encryption-contexts can limit the performance of both key-agile and algorithm-agile encryptors. Ideally, the context-switching mechanism would sustain rates between 106,132 contexts/sec (for 45 Mb/s DS3 links) and 5,660,377 contexts/sec (for 2.4 Gb/s SONET OC-48 links).
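The two rates quoted above appear to correspond to one context switch per ATM cell on a fully loaded link. The following minimal sketch reproduces them under two assumptions that are not stated in the text: the nominal line rate is used without subtracting physical-layer overhead, and every cell belongs to a different VC than its predecessor. The function name is illustrative only.

```python
ATM_CELL_BITS = 53 * 8  # one ATM cell is 53 bytes = 424 bits


def max_contexts_per_sec(link_rate_bps: float) -> int:
    """Worst-case context switches per second.

    Assumes (hypothetically) one context switch per cell and that the
    nominal link rate is entirely available for ATM cells.
    """
    return int(link_rate_bps // ATM_CELL_BITS)


if __name__ == "__main__":
    print(max_contexts_per_sec(45e6))   # 45 Mb/s DS3
    print(max_contexts_per_sec(2.4e9))  # 2.4 Gb/s SONET OC-48
```

Under these assumptions the sketch returns 106,132 and 5,660,377 contexts/sec, matching the figures in the text.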
For algorithm-agile encryptors, there are additional performance limitations. First, if the algorithm-agile encryptor uses a single shared-processor, then the context-switching time between the different encryption algorithms may further limit performance. Preliminary results indicate that the switching time between different encryption algorithms may be much larger than the switching time between different VCs' encryption-contexts. (For example, the key-switching and algorithm-switching times are 60 ns and 61 ms, respectively, for Altera 10K100 complex programmable logic devices.) Hence, a single, shared-processor architecture may not scale to Gb/s ATM link rates.
Alternatively, an algorithm-agile encryptor could use multiple parallel-processors. Each processor could then execute a different encryption algorithm. During call-setup, a VC would request a given encryption algorithm. The call-request would proceed if either: a) that algorithm is already loaded in one of the multiple processors, or b) there is an idle processor that can load the requested algorithm. In this scenario, M processors could support N algorithms, where N may be greater than M. This may provide a cost saving over N key-agile encryptors. This multi-processor architecture
Section 2 of this paper proposes an architecture for an algorithm-agile encryptor that consists of M parallel pipelines that feed a common output-queue for the outgoing ATM link (see Figure 1). Each pipeline is a key-agile encryptor that provides a single encryption algorithm for many VCs. Section 3 then bounds the CDV generated
PARALLEL-PIPELINE ARCHITECTURE for ALGORITHM-AGILE ENCRYPTION
Algorithm-agile encryptors come in two basic varieties: a single shared-processor or multiple parallel-processors. In both cases, the encryptor serves one input ATM link and one output ATM link. So, the encryptor is single-input, single-output (SISO). Performance analysis for the single shared-processor case is a classic scheduling problem. However, a single, shared-processor architecture may not scale to Gb/s ATM link rates because of the context-switching overhead between different encryption algorithms. So, this paper considers the multiple parallel-processor architecture shown in Figure 1. As previously mentioned, this parallel architecture may trade increased scalability for some call-blocking. Section 2.1 describes this generic architecture in greater detail. Section 2.2 briefly compares this encryptor architecture with ATM-switch designs.

Components:
In Figure 1, the Input Link extracts the incoming ATM cells from the incoming physical layer. It also performs the Header Error Check (HEC) function. It does not perform a Segmentation and Reassembly (SAR) function since the ATM Security Specification [1] uses per-cell encryption. The Output Link represents the corresponding functions for the outgoing ATM link.
The Cell Sorter assigns incoming cells to the correct pipeline (i.e., encryption algorithm). During call-setup, the Cell Sorter might load the association between a VC and its encryption pipeline into a table structure. These table lookups are another potential bottleneck at Gb/s rates. The design considerations include flat memory versus associative memory and cache sizes. This paper does not explore these issues further.
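As a purely illustrative software analogue of the Cell Sorter's table, the sketch below maps a VC identifier to a pipeline index. The class and method names are hypothetical, the (VPI, VCI) pair is assumed to be the lookup key, and a hash table stands in for whatever flat or associative memory a hardware design would use.

```python
from typing import Dict, Optional, Tuple

VcId = Tuple[int, int]  # assumed key: (VPI, VCI) taken from the cell header


class CellSorter:
    """Hypothetical VC-to-pipeline table, loaded at call-setup time."""

    def __init__(self) -> None:
        self._table: Dict[VcId, int] = {}

    def bind(self, vpi: int, vci: int, pipeline: int) -> None:
        # Called during call-setup: record which pipeline serves this VC.
        self._table[(vpi, vci)] = pipeline

    def release(self, vpi: int, vci: int) -> None:
        # Called at call teardown: forget the association if it exists.
        self._table.pop((vpi, vci), None)

    def route(self, vpi: int, vci: int) -> Optional[int]:
        # Called once per cell: return the pipeline index, or None if the
        # VC was never set up.
        return self._table.get((vpi, vci))


if __name__ == "__main__":
    sorter = CellSorter()
    sorter.bind(0, 32, pipeline=1)
    print(sorter.route(0, 32))
    print(sorter.route(0, 33))
```

The per-cell `route` call is the operation whose latency would matter at Gb/s rates; the sketch says nothing about how fast a hardware lookup could be.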
The M pipelines represent M different parallel-processors. Each pipeline, $P_i$, implements a different encryption algorithm, $A_i$. Every VC that uses encryption algorithm $A_i$ shares pipeline $P_i$. (Note: define the ATM cells that use $P_i$ as "type i" cells.) So, each individual pipeline must be a key-agile encryptor. Since few encryption algorithms can encrypt an entire ATM cell (384 payload bits) within one ATM-cell interarrival-time (about 2.8 µs for transport in 155.52 Mb/s SONET OC-3c), each encryption pipeline might use $S_i$ stages. Each stage, $P_{ik}$, implements a portion of algorithm $A_i$ on each ATM cell. (Section 3.2 discusses the general case where the encryption-block size is not equal to 384 bits, or one ATM cell.) Section 3 discusses constraints on execution speed versus pipeline depth in greater detail. These constraints yield upper bounds on the CDV generated by this algorithm-agile encryptor.
The encryption pipelines may have different execution times. Hence, output-queueing is required. For example, let cells for VCs 1 and 2 use pipelines 1 and 2, respectively. Let algorithms 1 and 2 have normalized (to the ATM-cell interarrival-time) encryption times of 2 and 1, respectively. Then, for example, an input cell-sequence of 1, 2, 1, 2, ..., at normalized times 0, 1, 2, 3, ..., has simultaneous arrivals at the output queue at (normalized) times 2 and 4. The output-queueing discipline can be quite general. Section 3 gives upper bounds on the CDV generated by the FIFO, priority and oldest-job-first queueing disciplines.
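The example above can be checked with a few lines of code. This is a minimal sketch, assuming one cell arrives per normalized time unit starting at time 0 and that each pipeline adds a fixed delay; the function names are illustrative.

```python
from collections import Counter
from typing import Dict, List, Sequence


def output_arrival_times(cell_types: Sequence[int],
                         exec_time: Dict[int, int]) -> List[int]:
    """Time at which each cell reaches the output queue.

    Cell n arrives at the input at normalized time n and leaves its
    pipeline exec_time[type] time units later.
    """
    return [t + exec_time[c] for t, c in enumerate(cell_types)]


def simultaneous_arrivals(cell_types: Sequence[int],
                          exec_time: Dict[int, int]) -> List[int]:
    """Normalized times at which more than one cell reaches the output queue."""
    counts = Counter(output_arrival_times(cell_types, exec_time))
    return sorted(t for t, n in counts.items() if n > 1)


if __name__ == "__main__":
    # Input sequence 1,2,1,2 with encryption times 2 (type 1) and 1 (type 2).
    print(simultaneous_arrivals([1, 2, 1, 2], {1: 2, 2: 1}))  # [2, 4]
```

The collisions at times 2 and 4 are what force the output queue to hold one of the two cells for an extra cell time.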
Finally, this generic architecture glosses over the control issues associated with swapping N algorithms between M pipelines. However, that swapping occurs over the time scale of VC call-setup requests rather than at the ATM-cell rate. So, the performance issues are blocking and added call-setup delay rather than context-switching speed.
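The call-setup rule for sharing M pipelines among N algorithms (admit the call if the algorithm is already loaded, or if an idle pipeline can load it; otherwise block) can be sketched as follows. The class, its fields and the first-fit choice of idle pipeline are assumptions made for illustration; the paper does not specify a control mechanism.

```python
from typing import List, Optional


class PipelinePool:
    """Hypothetical call-admission bookkeeping for M encryption pipelines."""

    def __init__(self, num_pipelines: int) -> None:
        self.loaded: List[Optional[str]] = [None] * num_pipelines
        self.active_vcs: List[int] = [0] * num_pipelines

    def admit(self, algorithm: str) -> Optional[int]:
        """Return the pipeline index for a new VC, or None if the call blocks."""
        # (a) The algorithm is already loaded in some pipeline.
        for i, alg in enumerate(self.loaded):
            if alg == algorithm:
                self.active_vcs[i] += 1
                return i
        # (b) An idle pipeline (no active VCs) can load the algorithm.
        for i, n in enumerate(self.active_vcs):
            if n == 0:
                self.loaded[i] = algorithm
                self.active_vcs[i] = 1
                return i
        return None  # call is blocked

    def release(self, pipeline: int) -> None:
        """A VC on this pipeline has been torn down."""
        if self.active_vcs[pipeline] > 0:
            self.active_vcs[pipeline] -= 1


if __name__ == "__main__":
    pool = PipelinePool(2)
    print(pool.admit("DES"))         # loads DES into pipeline 0
    print(pool.admit("Triple-DES"))  # loads Triple-DES into pipeline 1
    print(pool.admit("null"))        # no idle pipeline: blocked
```

Loading an algorithm in case (b) is where the added call-setup delay would occur; the sketch treats it as instantaneous.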
Comparison with ATM Switch Design:
Algorithm-agile encryptors present different design problems than typical ATM switches. First, ATM switches are usually multiple-input, multiple-output devices. For each transmission direction, an algorithm-agile encryptor is only single-input, single-output (SISO). So, Call-Admission Control (CAC) should be simpler for the encryptor. Second, while both devices generate CDV, the mechanisms are different. ATM switches try to keep internal path-delays equal, since this reduces CDV generation. However, output-queueing still occurs in ATM switches since simultaneous cell-arrivals at different inputs may have the same output link. This output-queueing causes CDV in ATM switches. Third, cell-loss in ATM switches typically occurs because of finite output-buffer size. Conversely, like any SISO device, algorithm-agile encryptors can have zero cell-loss.
PERFORMANCE RESULTS
This section bounds the CDV generated by the parallel-pipeline architecture proposed in Section 2. It also explores the tradeoffs between CTD and CDV for various output-queueing algorithms (FIFO, priority and oldest-job-first).
Assumptions:
All subsections use the following seven assumptions. First, the ATM cell interarrival time is normalized to 1. Second, the CDV, CTD and the service times of each pipeline stage are then normalized to that cell interarrival-time. Third, the input link is full (i.e., no idle or unassigned cells). Fourth, each encryption pipeline, i, has a fixed execution-time, $T_i$, per ATM cell. Fifth, the output and input links are synchronous. Sixth, the analysis ignores the jitter caused by extracting/inserting ATM cells from/into the physical layer. Finally, the encryptor has zero cell-loss (i.e., each encryption pipeline can "keep up" with the full input cell-rate).
Assumptions one and two are for convenience. Assumptions three and four allow analytic tractability. In practice, a pipeline's per-cell execution-time may vary because of the context-switching between different VCs' encryption-contexts. Assumptions five and six are reasonable if the physical layer is SONET. In any case, the physical-layer jitter should statistically affect all VCs identically. Finally, Section 3.2 derives a relationship between execution speed and pipeline depth that guarantees zero cell-loss. Other architectures, where the individual encryption pipelines can't "keep up", may be studied in future work.
Execution Speed and Cell Storage vs. Pipeline Depth:
Consider the simplest "algorithm-agile" encryptor, which has one pipeline with one stage. That encryptor is a simple D/D/1 queue with service time $T_1$. Stability, and hence no cell-loss, of course requires $T_1 \le 1$.
A slightly more complex system is one pipeline with S stages. Let Now consider multiple pipelines, $P_i$. Let each pipeline have $S_i$ stages. Then stability requires $T_i \le S_i$, for all i. As a corollary, the overall encryptor cannot store more than $S_k$ cells in total, where $S_k$ is the largest of the $S_i$'s, in its pipelines and output queue. (For both claims, just assume various input cell-streams of all type i cells.) The next section uses these simple results to derive bounds on both the maximum CDV and the worst-case CDV probability-density.
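The stability condition $T_i \le S_i$ is easy to check mechanically. The sketch below is a minimal illustration with hypothetical function names; it also derives the smallest stage count that satisfies the condition for a given normalized execution time, assuming every pipeline needs at least one stage.

```python
import math
from typing import Sequence


def is_stable(exec_times: Sequence[float], stages: Sequence[int]) -> bool:
    """True if every pipeline i satisfies T_i <= S_i (normalized times)."""
    return all(t <= s for t, s in zip(exec_times, stages))


def min_stages(exec_time: float) -> int:
    """Smallest integer S with exec_time <= S, assuming at least one stage."""
    return max(1, math.ceil(exec_time))


if __name__ == "__main__":
    print(is_stable([2.0, 1.0], [2, 1]))  # both pipelines keep up
    print(is_stable([2.5], [2]))          # 2.5 > 2: cells would back up
    print(min_stages(2.5))
```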
CDV Bounds:
3.3.1 Conditions for No CDV Generation:
Consider the two-pipeline case. Let the two pipelines have execution times of $T_1$ and $T_2$, respectively. Let cells k and k+1 be of types 1 and 2, respectively. Cell-order is clearly maintained if $(T_1 - T_2) < 1$. Reversing the cell input-order shows that cell-order is maintained whenever $|T_1 - T_2| < 1$. For a general number of pipelines, consider pair-wise arrival patterns. This leads to the general rule: the proposed parallel-pipeline architecture does not generate CDV if $|T_i - T_j| < 1$, for all i, j.
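The pairwise condition $|T_i - T_j| < 1$ can be tested directly from the list of normalized pipeline execution times. This is a minimal sketch with an illustrative function name; it checks the condition only and does not simulate the encryptor.

```python
from itertools import combinations
from typing import Sequence


def satisfies_no_cdv_condition(exec_times: Sequence[float]) -> bool:
    """True if |T_i - T_j| < 1 for every pair of pipelines."""
    return all(abs(a - b) < 1 for a, b in combinations(exec_times, 2))


if __name__ == "__main__":
    print(satisfies_no_cdv_condition([2.0, 2.5]))  # difference 0.5
    print(satisfies_no_cdv_condition([2.0, 1.0]))  # difference exactly 1
```

The second call uses the execution times 2 and 1 from the output-queueing example in Section 2, where consecutive cells reach the output queue simultaneously; the condition fails there because the difference is not strictly less than 1.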
CDV Bounds for FIFO Output Queueing:
The previous equation will probably not hold for most algorithm-agile encryptors. So, consider a two-pipeline system where $T_1 \ge T_2 + 1$. Also assume FIFO (First-In, First-Out) output queueing. Let the arriving cell be type 1. From the stability results given in Section 3.2, that arriving type 1 cell can neither pass nor be delayed by previously arrived type 1 or type 2 cells. So, that type 1 cell's CDV depends on how many later-arriving type 2 cells pass it. If a type 1 cell arrives at time t = k, then it can be passed by type 2 cells that arrive in the interval $C = \{k+1, k+2, \ldots, k+\lfloor T_1 - T_2 \rfloor\}$. Letting $T_2$ go to zero yields another useful result. The CDV for type 1 cells is bounded by the largest integer $\le T_1$. (The CDV for type 2 cells is discussed below.) This result generalizes to multiple pipelines if $T_1$ denotes the execution time for the longest pipeline.
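The interval C and the resulting bound can be written out as a short calculation. The sketch below assumes the interval reads $\{k+1, \ldots, k+\lfloor T_1 - T_2 \rfloor\}$ and that the bound on type 1 CDV is $\lfloor T_1 \rfloor$, as the text states; the function names are illustrative.

```python
import math
from typing import List


def passing_interval(k: int, t1: float, t2: float) -> List[int]:
    """Arrival times of type 2 cells that can pass a type 1 cell arriving at k.

    Assumes T1 >= T2 + 1 and FIFO output queueing, as in the text.
    """
    return list(range(k + 1, k + math.floor(t1 - t2) + 1))


def max_cdv_type1(t1: float) -> int:
    """Upper bound on type 1 CDV: the largest integer not exceeding T1."""
    return math.floor(t1)


if __name__ == "__main__":
    print(passing_interval(0, 3, 1))  # [1, 2]
    print(max_cdv_type1(3.5))         # 3
```

The length of the returned list is the largest number of type 2 cells that can pass a given type 1 cell, which is why the bound follows from letting $T_2$ approach zero.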
For the two-pipeline case, the CDV probability-density for type 1 cells is the probability of x type 2 cell-arrivals in the interval C. The general case is straightforward, but messy. However, algorithm-agile encryptors probably have a small number of parallel pipelines (e.g., $\le 5$). So, the CDV probability-density equations remain tractable.
Priority Output Queueing:
Consider the two-pipeline case with $T_1 \ge T_2 + 1$.
Oldest-Job-First Output Queuing:
Priority queueing can yield unstable behavior if there are three, or more, priority classes. An alternative queueing discipline is Oldest-Job-First (OJF). This discipline transmits, from the output queue, whichever cell has been in the system longest. Hence, OJF gives preference to cells from the pipeline with the longest execution-time. Again, assume a full input-link. Let $T_1$ be the execution time for the longest-time pipeline. In that case, OJF yields the same CTD, namely $[T_1 + 1]$, and zero CDV for all cell-types for any number of pipelines.
architecture, that used multiple, parallel encryption-pipelines, was proposed. That algorithm-agile encryptor's effect on the ATM Quality of Service (QoS) metrics, such as Cell Transfer Delay (CTD) and Cell Delay Variation (CDV), was analyzed. Bounds on the maximum CDV and the CDV's probability density were derived. The key result was that pipelined algorithm-agile encryptors could cause CDV even if the constituent encryption-pipelines kept up with the input ATM cell-rate. One solution appends delay-lines to each encryption pipeline, such that the inequality in Section 3.3.1 holds. In that case, the encryptor can trade zero CDV for increased CTD. For example, an algorithm-agile encryptor that implements this delay-line technique for null-encryption, DES and Triple-DES [6] could have a CTD of 8 ATM cell interarrival-times, which is about 21 µs at the SONET OC-3 rate. However, this simple delay-line solution can complicate adding new algorithms to an existing algorithm-agile encryptor.
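One simple way to realize the delay-line idea is to pad every pipeline up to the execution time of the slowest one, so that all pairwise differences become zero and the inequality in Section 3.3.1 holds trivially. The sketch below illustrates that choice only; the paper requires just that the differences be less than 1, the execution times used in the example are hypothetical placeholders rather than measured values for any real algorithm, and the function name is an assumption.

```python
from typing import List, Sequence


def delay_line_padding(exec_times: Sequence[float]) -> List[float]:
    """Extra delay to append to each pipeline so all total delays are equal.

    Equalizing to the slowest pipeline is one sufficient choice; any padding
    that keeps all pairwise differences below 1 would also satisfy the
    no-CDV condition.
    """
    longest = max(exec_times)
    return [longest - t for t in exec_times]


if __name__ == "__main__":
    # Hypothetical normalized execution times for three pipelines.
    times = [0, 3, 7]
    padding = delay_line_padding(times)
    print(padding)                                   # [7, 4, 0]
    print([t + p for t, p in zip(times, padding)])   # [7, 7, 7]
```

The sketch also shows why adding a new, slower algorithm is awkward: a larger maximum changes the padding required on every existing pipeline.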
