In January 1977, the National Bureau of Standards (NBS) adopted an IBM-developed block cipher called the Data Encryption Standard (DES) [1]. Approximately four years later, in December 1980, the NBS published a document titled "DES Modes of Operation" [2] that describes four DES operating modes and some of their characteristics. This paper describes a new LSI device called the Digital Encryption Processor (DEP), developed by AT&T Bell Laboratories and manufactured by AT&T Technologies. The DEP has been validated by the NBS as complying with the DES. This is the first LSI device to incorporate all of the standard DES modes of operation into a single integrated circuit.
The DEP combines a fast hardware implementation of the DES with a 64-bit input shift register, a 64-bit output shift register, a set of multiplexers necessary to configure the operating modes, a data latch, and four sets of key and initial value registers. Control over this hardware is provided by a user programmed sequencer. This sequencer provides the flexibility necessary to program any of the four DES operating modes, and to tailor the encryption function to the system requirements. Additionally, the four different key and initial value registers may be used to program multiplexed ciphering operations.
The DEP is designed as a microprocessor peripheral and is packaged in a standard 40 pin dual-in-line package. Figure 1 shows a block diagram of the device. There are two separate parallel bidirectional eight-bit ports, two separate serial bidirectional data ports, and a serial key port. All of these ports may be read or written asynchronously with respect to the clock input. The separate data ports are provided to increase data throughput and security by allowing separate plaintext and ciphertext buses. There are seven possible data port configurations. The serial key input port would typically be used to load a key from external circuitry, for example a ROM, that the user keeps locked when not in use. Microprocessor polled or interrupt systems may be configured, since output flags may be read from the data buses or on independent output pins. The maximum data rate for the device in any of the standard operating modes is specified at an input clock of 2.5 MHz (worst case). The internal user programmed sequencer enables the device to accommodate special system requirements and reduces host processor overhead. The DEP should reduce the cost and simplify the process of encrypting digital data to a point where it is possible to include a ciphering function in modems, terminals, and work stations.
Architecture
The DEP architecture may be divided into two sections: the ciphering hardware and the user programmed sequencer.
The Ciphering Hardware
The DES specifies a cryptographic algorithm which is a nonlinear, 64-bit block cipher using a 56-bit key. The components of the algorithm are simple and individually weak. They consist of permutations, combinations ("exclusive-or" sums) of the data and internal key bits, and nonlinear substitutions. The key input of the DEP device is a 64-bit number in which the least significant bit of each byte (every eighth bit in a serial key load) is a parity bit. Odd parity is checked and a flag is set if parity fails. Device operation is not inhibited by a parity failure.
Figure 2 shows a block diagram of the DEP ciphering hardware, with the DES key schedule and enciphering circuitry enclosed in dotted lines. The algorithm specified in the DES was designed to be implemented in hardware. There are several permutation matrices specified in the standard which have no overhead in hardware, since the permutation matrix is simply a crisscross of wires. The DES section of the enciphering circuitry consists of a 2:1 multiplexer with 64 sections, two 32-bit L and R registers, 32 "exclusive-or" gates, and a cipher function (F) of the internal key and R register. Figure 3 illustrates the F function. The eight s-boxes shown are the nonlinear algorithm elements and are implemented as eight ROMs, each consisting of 64 four-bit words (six address lines and four outputs).
The result is the K-bit ciphertext, which is fed back and shifted into the K least significant bits of the DES input block to form the next DES input block.
Output Feedback (OFB): starting with an initial value, the DES is operated as a pseudo-random bit stream generator (the DES output is fed back as the input). Ciphertext is produced by adding the plaintext to the random bit stream modulo 2.
Figure 2 shows the sets of multiplexers, the "exclusive-or" gates, and the data latch necessary to configure the DES operating modes. MUX 6 and the latch register are used to shift the input data block for the CFB mode.
MUX 13 is used to select the input to initial value registers 0-3. The initial value registers may be used to hold temporary products in a multiple encryption operation, or to store the next DES input block for the current ciphering operation, before jumping to a different ciphering operation. The input and output shift register circuitry is clocked by the rising edge of the decoded data write and read strobes applied to the chip. When the input shift register is filled, an ISRFULL flag is set and the DEP can cipher the new data and clear the flag. When the output shift register is empty, an OSREMPTY flag is set and the DEP can reload this register and clear the flag. These two flags may be read by the user on either of the two eight-bit data ports, or on separate output pins. This structure allows the external read and write strobes to be independent of the DEP clock. To achieve maximum throughput a user would have to complete the reading and writing of data during the DEP ciphering operation.

The User Programmed Sequencer
The two eight-bit bidirectional data ports, master and slave, may be thought of as plaintext and ciphertext ports, respectively. All control registers must be written through the master port. Other than data, which may be read or written for ciphering, only three flag bits of the status register may be read from the slave port (see Tables I and II). Control over the ciphering hardware is provided by the user programmed sequencer. A block diagram is shown in Fig. 4. The sequencer executes a 22-bit instruction every two clock cycles. Depending on the address in the program counter, these instructions may come from either a RAM or ROM program memory.
The ROM contains three programs and one subroutine. The subroutine executes the DES algorithm using whatever key is currently in the C and D registers (Key Schedule, Fig. 2) to encipher whatever data is at the input to the initial permutation matrix (labeled IP, DES Enciphering circuitry). There are four pairs of key and initial value registers that may be externally loaded. These registers are loaded by writing the address (0-3) of the key/initial value pair to an internal status register. Then the appropriate ROM program is executed. The three programs are described below (see the ROM code). These are self-contained programs; they may not be called as subroutines from another program. (B6 is an unnecessary mnemonic in this code.)
2. A load key program waits for an eight-byte number to be written to the master port. When the ISRFULL flag is set, this number is clocked into the key register addressed by the status register. Odd parity of each byte is checked. The least significant bit in each byte is the parity bit.
3. A serial load key program waits for a 64-bit number to be clocked into the serial key data port using the serial key clock. When this program is executed, a hardware key request pin goes active. When the key is loaded into the input shift register, the sequencer clocks the number into the key register addressed by the status register. The key request flag is then cleared. Odd parity of every eight bits is checked. Each eighth bit input is the parity bit.
At the end of all three of these programs the sequencer goes into an endless loop (wait state) until a new program is executed.
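The parity rule used by the load key programs can be illustrated in software. The following is a minimal sketch, not the DEP logic itself: the function names are hypothetical, and the only facts taken from the text are that a key is eight bytes, that the least significant bit of each byte is the parity bit, and that odd parity is checked.

```python
def has_odd_parity(byte: int) -> bool:
    """Return True if the 8-bit value contains an odd number of 1 bits."""
    return bin(byte & 0xFF).count("1") % 2 == 1


def set_odd_parity(byte: int) -> int:
    """Set the least significant bit so that the whole byte has odd parity."""
    upper = byte & 0xFE  # the seven bits that carry key material
    return upper | (0 if has_odd_parity(upper) else 1)


def key_parity_ok(key: bytes) -> bool:
    """Check odd parity of every byte of an eight-byte key.

    Like the flag described in the text, this only reports a failure;
    the caller decides whether to continue.
    """
    if len(key) != 8:
        raise ValueError("a DES key is eight bytes")
    return all(has_odd_parity(b) for b in key)


# Hypothetical example key, adjusted so every byte has odd parity.
key = bytes(set_odd_parity(b) for b in bytes.fromhex("0123456789abcdef"))
```

A host would typically run such a check before writing the key to the master port, since the device sets a flag on a parity failure but does not inhibit operation.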
The RAM contains the ciphering program and must be written by the user prior to any ciphering operation. The RAM may hold up to 32 instructions, enough to program any standard DES mode. The user loads the RAM through the eight-bit master port. After writing the RAM address (20H to 3fH) to the mode control register, the user writes three bytes for each 22-bit program instruction. The two most significant bits, in one of the bytes, are not used. The RAM, or the ROM (address 00H to 11H), may be read in a similar manner.
To begin the execution of a program, the user writes the program memory starting address to the mode control register. Two clock cycles later, this address is loaded into the program counter (Fig. 4) and execution begins. Data flow through the ports and the associated assignments of the master and slave flags are controlled by the port configuration register (see Table IV). Normally, this register would be written before executing a program.
Micro-Code Instruction Set
Mnemonics, corresponding to actual signal names, were defined for the program instruction set. Table V defines a 22-bit instruction composed of three bytes: M1, M2, and M3.
Bit 4 of M2 controls the interpretation of M1 and the three most significant bits in M2. If bit 4 of M2 is low, the multiplexer select lines are latched. In the program convention used, the presence of a mnemonic, for example S1, indicates the control line is latched high. Conversely, the absence of a multiplexer mnemonic indicates the control line is latched low. If bit 4 of M2 is high, the specified signal is enabled only for the duration of the instruction period, two clock cycles. An enable and the associated clock signal, for example, LDDES and CKDES, must be programmed in the same instruction since none of these signals is latched.
Bits 0-3 of M2 are decoded and select one of 12 commands. With the exception of RET and CLEAR, all of these commands use all or some of the bits in M3 as an argument. The three commands SROL, ADD, and IO latch bits of M3 until overwritten, or a subsequent CLEAR command is issued.
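The field layout described above can be sketched as a small decoder. This is an illustration under stated assumptions: the `Instruction` type and `decode` function are hypothetical names, only the positions of the M2 fields come from the text, and the numeric codes of the 12 commands are not given here, so the command is left as a raw number.

```python
from typing import NamedTuple


class Instruction(NamedTuple):
    m1: int        # interpreted together with the three high bits of M2
    m2_high: int   # bits 5-7 of M2
    pulsed: bool   # bit 4 of M2: False = select lines latched, True = enabled for one instruction
    command: int   # bits 0-3 of M2 (raw code; the mapping to mnemonics is not assumed)
    m3: int        # argument used by most commands


def decode(m1: int, m2: int, m3: int) -> Instruction:
    """Split the three bytes of one program instruction into their fields."""
    for name, value in (("M1", m1), ("M2", m2), ("M3", m3)):
        if not 0 <= value <= 0xFF:
            raise ValueError(f"{name} must be an eight-bit value")
    return Instruction(
        m1=m1,
        m2_high=(m2 >> 5) & 0x07,
        pulsed=bool(m2 & 0x10),
        command=m2 & 0x0F,
        m3=m3,
    )


# Hypothetical byte values chosen only to exercise the decoder.
example = decode(0xA5, 0b10110011, 0x3C)
```

The sketch does not mask the two unused bits of the 22-bit instruction, because the text does not say which byte carries them.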
A C language* assembler was written to facilitate the development of ciphering programs. The output of that assembler is shown in Table III for the ROM code. Whenever bit 4 of M2 is set low, the ciphering multiplexers are set up, and the assembler program prints the inputs to the DES (DES INPUT), output shift register (OSR INPUT), initial value register (IV INPUT), and data latch (LATCH INPUT). This is useful in checking that the multiplexer configuration latched is correct. The six-instruction DES subroutine may then be explained as follows:
1. The input to the DES initial permutation matrix is clocked into the L and R registers (Fig. 2). Simultaneously, the key schedule C and D ...
5. This statement is executed six times as the next six DES iterations are clocked into the L and R registers. Simultaneously, the key is shifted two positions six times.
6. The fifteenth iteration of the DES is clocked into the L and R registers, and the key schedule C and D registers are again shifted one position.
*C is a general purpose programming language designed for and implemented on the UNIX (registered trademark ...
At this point the output of the DES enciphering circuitry, the inverse initial permutation matrix (IP⁻¹), will have the sixteenth DES iteration or the output block.
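The one-position and two-position key shifts mentioned in the subroutine steps follow the shift schedule of the DES standard. The sketch below uses that published schedule, not the DEP micro-code; `rotl28` and `cd_states` are hypothetical helper names, and the 28-bit C and D values are arbitrary.

```python
# Left-shift counts for the 16 DES iterations, as published in the DES standard.
SHIFTS = (1, 1, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 1)
MASK28 = (1 << 28) - 1


def rotl28(value: int, count: int) -> int:
    """Rotate a 28-bit register left by `count` positions."""
    value &= MASK28
    return ((value << count) | (value >> (28 - count))) & MASK28


def cd_states(c: int, d: int) -> list:
    """Return the (C, D) register contents after each of the 16 shifts."""
    states = []
    for shift in SHIFTS:
        c, d = rotl28(c, shift), rotl28(d, shift)
        states.append((c, d))
    return states


# Arbitrary 28-bit starting values, for illustration only.
c0, d0 = 0x0F0CCAA, 0x556678F
states = cd_states(c0, d0)
```

The shifts total 28 positions, so the C and D registers return to their starting contents after the sixteenth iteration. The run of six two-position shifts followed by a single one-position shift at the end of the schedule appears consistent with steps 5 and 6 above.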
A sample of the RAM micro-code for the standard ECB and CBC operating modes is given in Table VI. The code for the remaining standard operating modes is available and documented.
Applications
The following applications illustrate the unique capabilities of the DEP. In order to perform similar operations with available integrated DES devices, considerable processor overhead or multiple devices might be required.
Two-Way Encryption Application
The first application describes a two-way encryption system using separate receive and transmit keys. A drop-in box between a terminal (or computer) and a modem was built. Clearly, this system requires a character oriented protocol. The eight-bit cipher feedback mode was used. In a typical terminal-to-computer connection, the number of characters transmitted and received is unequal. The ciphering operation desired is shown in Fig. 5.
To transmit an encrypted character, the number in initial value register 0 is encrypted using key register 0. As this number is being clocked into the DES enciphering hardware, it is also clocked into the data latch. When a plaintext character is input, it is added modulo 2 to the eight most significant bits in the DES output block, DESOUT ^ ISR. (The symbol ^ is used to define the "exclusive-or" operator.) This byte of ciphertext output is clocked into the output shift register for transmission by the modem. It is also clocked into initial value register 0 as the least significant byte. The seven other bytes are simply the previous initial value shifted one byte to the left. The most significant byte of the previous initial value is discarded.
The receive operation is nearly identical. The number in initial value register 1 is encrypted using key register 1. As this number is being clocked into the DES enciphering hardware, it is also clocked into the data latch. When a ciphertext character is input, it is added modulo 2 to the eight most significant bits in the DES output block. This byte of plaintext output is clocked into the output shift register for reception by the local terminal (computer).
The ciphertext in the input shift register is clocked into initial value register 1 as the least significant byte. The seven other bytes are simply the previous initial value shifted one byte to the left. The most significant byte of the previous initial value is discarded.
There are two differences, then, between transmit and receive. One difference is the feedback to the initial value register. During transmit, ISR ^ DESOUT is fed back; during receive, the ciphertext in the ISR is fed back. The other difference is that key/initial value register pair 0 is used in transmit and pair 1 is used in receive.
Code for this ciphering mode is shown in Table VII. The entry "... ISR^DESOUT", taken from Table VII, should be read as follows: the input to the DES enciphering block equals the data latch output (Qn) shifted eight bits to the left and concatenated with the eight most significant bits in the "exclusive-or" sum of the input shift register and the DES enciphering block output.
After loading the RAM program memory with the hex data in Table VII, the program start address (2bH) is written to the mode control register and execution begins. The program will remain in a loop (2cH to 2fH) until the input shift register is filled. Depending on the most significant bit in the port configuration register, the DEP will either encrypt (transmit) using key/initial value register pair 0, or decrypt (receive) using key/initial value register pair 1. The mnemonic LT? is used to test the most significant port bit. A low is used for transmit (jump condition), and a high for receive (next instruction). The only timing requirement on the input to the DEP, when changing from transmit to receive, is that the data byte written be delayed from the port register write by three DEP program instructions.
This guarantees the LT? instruction (2dH) will be executed after the port register write and before data ciphering. With a 4 MHz DEP clock, this is 1.5 microseconds. After the data byte is written, the DEP program sequencer will detect an input shift register full condition and cipher the data. If the previous output data has been read, the new cipher byte will be written to the output shift register; the next initial value will be stored and the sequencer will again cycle waiting for the input shift register to be filled. If the previous output data has not been read, the sequencer will wait (31H) until the output shift register is emptied. At most, it will take 24 instructions from the time an input byte is written until the ciphertext is available to be read. For a 4 MHz clock, this is 12 microseconds.
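Both timing figures follow from the sequencer rate of one instruction every two clock cycles. A minimal sketch of the arithmetic, with a hypothetical function name:

```python
CLOCKS_PER_INSTRUCTION = 2  # the sequencer executes one instruction every two clock cycles


def instruction_time_us(instructions: int, clock_hz: int) -> float:
    """Time in microseconds taken by a number of sequencer instructions."""
    return instructions * CLOCKS_PER_INSTRUCTION * 1_000_000 / clock_hz


port_write_delay_us = instruction_time_us(3, 4_000_000)   # delay after the port register write
worst_case_cipher_us = instruction_time_us(24, 4_000_000)  # input byte written to ciphertext readable
```

With a 4 MHz clock each instruction takes 0.5 microseconds, which gives the 1.5 and 12 microsecond figures quoted above.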
In order for two stations to communicate properly, if k0 and k1 are input to key registers 0 and 1 (respectively) of a DEP device at station one, then k0 and k1 must be input to key registers 1 and 0 (respectively) of the DEP device at station two. The two stations need not have the same initial value, since a station will synchronize after eight characters have been received. This is a property of the eight-bit CFB mode. Therefore, to begin a session, the two stations only have to establish session keys. The protocol shown in Fig. 6 was used to exchange session keys. This protocol does not require either station to be a master or slave; both stations perform exactly the same operations. A master key is input to key register 2. A random number loaded into key register 0 is encrypted in the ECB mode under the master key. This ciphertext is then transmitted. The received ciphertext is decrypted and loaded into key register 1. After these three operations, the session key exchange is complete and two-way communications may begin.
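The self-synchronizing property of the eight-bit CFB mode mentioned above can be demonstrated in software. The sketch below is an assumption-laden illustration: `block_encrypt` is a stand-in keyed 64-bit function built from SHA-256, not the DES, and the key, initial values, and message are hypothetical. Only the register update (shift left one byte, feed the ciphertext byte into the least significant position) follows the description in the text.

```python
import hashlib


def block_encrypt(key: bytes, block: bytes) -> bytes:
    """Stand-in for the DES encipherment (NOT DES): a keyed 64-bit function."""
    return hashlib.sha256(key + block).digest()[:8]


def cfb8_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    """Transmit side: one character in, one ciphertext character out."""
    reg = bytes(iv)
    out = bytearray()
    for p in plaintext:
        c = p ^ block_encrypt(key, reg)[0]  # most significant byte of the output block
        out.append(c)
        reg = reg[1:] + bytes([c])          # shift left one byte, feed back the ciphertext
    return bytes(out)


def cfb8_decrypt(key: bytes, iv: bytes, ciphertext: bytes) -> bytes:
    """Receive side: same cipher direction, but the received ciphertext is fed back."""
    reg = bytes(iv)
    out = bytearray()
    for c in ciphertext:
        out.append(c ^ block_encrypt(key, reg)[0])
        reg = reg[1:] + bytes([c])
    return bytes(out)


key = b"session0"                                   # hypothetical session key
message = b"The quick brown fox jumps over the lazy dog"
ciphertext = cfb8_encrypt(key, b"\x00" * 8, message)
same_iv = cfb8_decrypt(key, b"\x00" * 8, ciphertext)   # matching initial value
other_iv = cfb8_decrypt(key, b"\xff" * 8, ciphertext)  # mismatched initial value
```

With a matching initial value the whole message is recovered. With a mismatched initial value the first eight characters may be wrong, but every later character is recovered, because by then the receiver's register holds only received ciphertext.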
Conclusions
A user programmed Digital Encryption Processor based on the National Bureau of Standards DES algorithm has been described. The DEP has been certified by the NBS as complying with the DES. All four of the NBS defined operating modes may be programmed. Multiplexed ciphering operations may be programmed, eliminating the need for more than one encryption device in some applications. The internal program sequencer allows the user to tailor the ciphering function for the specific system application. These features place the DEP beyond existing commercial devices. The data throughput rate of 0.59 megabytes per second, for the standard modes under worst case conditions, is comparable with the fastest commercial part now available.
For some of the unique modes, the data rate will be much faster since there is no host processor overhead.
The proliferation of smart terminals and computers is leading to distributed networks with access to large databases. These networks, along with the booming cable television market and satellite communications networks, are prime candidates for low-cost, secure encryption.
