The asynchronous transfer mode (ATM) is a promising technique for broadband switching that is capable of supporting high-bit-rate multimedia services. Progress in all-optical parallel processing shows that photonics may be used in the future in full-functionality ATM switching nodes. All-optical switching fabrics and buffers have already been demonstrated. Fewer studies have been dedicated to ATM header-processing functions. As an example of photonic header processing, the implementation of a header-error control (HEC) subsystem by means of an optical circuit is investigated. Although traditional electronic decoders perform HEC in a serial mode, the architecture chosen for the optical HEC subsystem is based on parallelism, and an appropriate parallel-decoding algorithm was used in designing it.
Introduction
The telecommunications infrastructure in the near future will have to be able to support multimedia services. These new kinds of services, which have already encountered significant success in experimental trials conducted recently, will require the establishment and the maintenance of multiple connections across networks with large bandwidths and differing qualities of service.
The asynchronous transfer mode 1,2 (ATM), based on the collection of digital data streams into fixed-size data packets (cells), is today's most promising switching technique for broadband networks because it was designed with the purpose of guaranteeing flexible and on-demand access to network resources for current and also future users. It will be supported in an ever-increasing number of installations of different kinds, from local-area networks to wide-area networks. 3 Electronics, the technology used to date in broadband switching nodes, could become a bottleneck in future networks. 4 The parallelism allowed by an optical implementation is considered to be a valid solution to overcoming this limitation. In optical systems there is no need for optoelectronic conversion: therefore not only do optical systems have a potentially larger bandwidth but they also can be transparent to the bit rate, the data format, and the transmission encoding. Optics also has many other advantages versus electronics, such as electromagnetic-interference immunity, low skew, low dissipation of transmission lines, absence of impedance-matching problems, and so on. 5, 6 Recent advances in high-speed all-optical processing devices show that photonics may also be used in the future in full-functionality packet-switching nodes. Although an all-optical ATM switching node probably is still far in the future, some of its subsystems have already been demonstrated. In optical ATM switching research most of the activity has focused so far on space-switching fabrics 7 (multiplex switching) and on all-optical buffering systems for output-contention resolution. 8 Fewer studies have been dedicated to ATM node subsystems that perform control and header-processing functions.
This paper explores the possibility of introducing optics into the header-processing functions of an ATM switching node. In particular, we describe as a first example of all-optical processing the implementation of a header-error control (HEC) subsystem. Our purpose is not to carry out an industrial design but rather to demonstrate the feasibility and the advantages of massive parallel optical processing in telecommunications switching.
In accord with ATM standards a node must perform operations that occur in two layers: an ATM layer (which includes label switching, flow control or policing, and congestion control) and an underlying physical layer. The physical layer is further divided into a transmission-convergence sublayer and a physical-medium-dependent sublayer (comprising bit synchronization and control of transmission). HEC, the operation considered in this paper, occurs in the transmission-convergence sublayer of the ATM protocol stack. Other important functions of this sublayer are cell synchronization and cell delineation.
HEC is a function that is intended to protect the information contained in the header of the cell from transmission errors: An error that affects the header would cause an invalid switching operation with potentially serious effects on all the network traffic. This is why the HEC operation is performed link by link in each ATM node of the network.
Two HEC operations are
• On the transmitted cell, HEC sequence generation: The HEC sequence is the result of a simple cyclic code applied to the first 4 bytes (32 bits) of the header. The 8-bit binary sequence thus obtained is added to the end of the header and becomes the last octet of the header itself.
• On the received cell, HEC sequence decoding (error detection or error correction): The decoding operation is performed on each received cell. The HEC sequence is used to detect or correct errors that occur during the transmission of the header bytes. The cyclic code chosen in the ATM standard permits the detection of two errors or the correction of a single error. Usually the node performs correction on incoming headers. Detection is performed only when a sequence of multiple consecutive cells that contain errors is found. This mechanism is used to protect the cell flow against bursts of errors more effectively.
HEC sequence decoding is a very important function in the ATM technique also because it is the basis for cell delineation and synchronization, as is better described in Section 5.
In the subsystem proposed in this study a parallel HEC sequence decoder is used to perform error detection on the incoming headers. The correction-mode HEC operation is not considered for this first design.
This paper is structured as follows. In Section 2 the basic choice of a parallel architecture for optical implementations is discussed. A parallel HEC decoding algorithm is then described and compared with the traditional serial algorithm. In Section 3 a logical architecture of the HEC subsystem is defined by analysis of the parallel algorithm. Section 4 describes a possible all-optical implementation of the HEC that is based on free-space optical interconnections. In Section 5, conclusions concerning the advantages of the proposed architecture are discussed, and future developments of the present research are considered.
Parallel-Decoding Algorithm
The architecture chosen for the all-optical HEC subsystem presented in this paper is based on parallelism, one of the most advantageous features of optics. 9 The concept of optical parallelism combines naturally with the use of spatial bandwidth: Each signal of a parallel data flow is transported through a spatial channel by a light beam. Many spatial channels can propagate together in the same device and cross each other with no cross talk. Their number is limited by only the space-bandwidth product of the device itself.
To design a parallel architecture for the HEC subsystem, one must use an appropriate parallel algorithm in the decoding operation of the HEC sequence. For introducing the parallel algorithm a brief review of its classical serial counterpart may be useful.
According to the standards of the International Telecommunication Union (ITU) Telecommunication Standardization Sector (ITU-T), 10 the code to be used in HEC operations is a shortened cyclic code. The information word to be coded is given by the first 32 bits (4 bytes) of the header of the ATM cell, and the code word is composed of the whole header (40 bits, or 5 bytes). The code used in ATM is systematic, i.e., a copy of the coded information is contained in the first part of the code word. The main advantage of this type of code is that the transmitted message can easily be accessed independently of the decoding operations. In this case redundancy is represented by the fifth byte of the header, which contains the HEC sequence.
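The systematic structure can be illustrated with a short Python sketch. It assumes the generator word g = 100000111 quoted later in this section and covers only the cyclic-code step described here; the helper names `poly_remainder` and `encode_header` and the sample header value are hypothetical.

```python
GENERATOR = 0b100000111  # g(D) = D^8 + D^2 + D + 1, as quoted later in this section


def poly_remainder(value: int, nbits: int) -> int:
    """Remainder of an nbits-bit binary polynomial divided by g(D), modulo 2."""
    for shift in range(nbits - 9, -1, -1):
        if (value >> (shift + 8)) & 1:      # leading term still present
            value ^= GENERATOR << shift     # subtract (XOR) the aligned divisor
    return value


def encode_header(info: bytes) -> bytes:
    """Append the 8-bit check sequence to the 4 information bytes."""
    assert len(info) == 4
    shifted = int.from_bytes(info, "big") << 8   # information word times D^8
    check = poly_remainder(shifted, 40)
    return info + bytes([check])                 # systematic: info copied unchanged


header = encode_header(bytes([0x12, 0x34, 0x56, 0x78]))  # arbitrary sample value
```

The first four bytes of `header` are the untouched information word, and dividing the full 40-bit code word by g(D) leaves a zero remainder.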
The theory of serial decoders for cyclic codes, like those usually implemented in ATM nodes, is well known. 11 This theory incorporates the most commonly used formalism for cyclic-code theory. In this formalism binary polynomials are used to represent binary words. Each block of bits corresponds to a polynomial in the indeterminate D with binary coefficients.
Consider the generic cyclic code Φ(n, k) with a generator polynomial g(D). The input of the decoder located in a receiving node is the coded word x(D) that is corrupted by transmission errors e(D):
$$y(D) = x(D) + e(D).$$
The decoding operation in detection mode is carried out by the computation of a polynomial known as the syndrome polynomial. This syndrome polynomial s(D) can have a maximum degree of n − k − 1. The syndrome polynomial is the remainder of the division of the received word by the generator polynomial:
$$y(D) = m(D)\,g(D) + s(D), \qquad (1)$$
where m(D) is the quotient. Within the limits of the correction capacity of the chosen code the received block does not contain errors if and only if the syndrome polynomial has all zero coefficients.
The serial-decoding algorithm is implemented easily by electronic digital hardware. The decoder is usually a circuit that uses a shift register as its fundamental device and is essentially capable of performing a binary-polynomial division. To divide two polynomials, it is necessary for the shift register to have a number of flip-flop cells that is equal to the degree of the divisor and a pattern of logical feedback loops that are configured according to the coefficients of the divisor (i.e., the generator polynomial). Each feedback loop is connected to a modulo-2 summing node (XOR) at the output of the proper flip-flop.
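A software model of such a divider can make the bit-by-bit nature of the serial algorithm concrete. The following Python sketch is only an illustration: it assumes the generator word g = 100000111 given later in this section, feeds the most significant bit first, and uses the hypothetical names `to_bits` and `serial_syndrome`. It models the arithmetic of the shift register, not a particular hardware layout.

```python
def to_bits(value: int, nbits: int) -> list:
    """Bits of value, most significant (first received) bit first."""
    return [(value >> i) & 1 for i in range(nbits - 1, -1, -1)]


def serial_syndrome(bits) -> int:
    """Clock the received bits one by one through an 8-stage feedback register."""
    reg = 0
    for bit in bits:
        feedback = (reg >> 7) & 1          # bit leaving the last stage
        reg = ((reg << 1) & 0xFF) | bit    # shift one stage and take in the new bit
        if feedback:
            reg ^= 0b00000111              # feedback taps for D^2 + D + 1
    return reg                             # remainder of the division by g(D)


codeword = 0x0000000107            # header 00 00 00 01 followed by check byte 07
corrupted = codeword ^ (1 << 20)   # same word with one flipped bit

clean_syndrome = serial_syndrome(to_bits(codeword, 40))
error_syndrome = serial_syndrome(to_bits(corrupted, 40))
```

The register needs one clock cycle per received bit, so 40 sequential steps are required before the syndrome is available.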
An alternative parallel-decoding algorithm that is more suitable for the optical decoding operation can be found if one resorts to matrix algebra. 11 The formalism, which is based on matrices, is usually used to treat noncyclic linear block codes; nevertheless, it must be noted that cyclic codes are just a particular case of block codes.
After the transmission of the code word x the received word is represented by the vector y:
$$y = x + e,$$
where the random vector e represents the errors that occur during transmission. The error-detection operation involves the same steps that were described for the case of the polynomial formalism: A vector that corresponds to the syndrome word is evaluated; if any bit belonging to this vector equals 1 an error is detected.
It can be shown that the syndrome vector s is obtained by the multiplication of y by the transpose of a matrix H, called the parity-check matrix:
$$s = y H^{T}. \qquad (2)$$
In the case of the generic code Φ(n, k), s is made up of n − k elements, whereas H is of size (n − k) × n. If the elements of H are known it is possible to build a decoder that detects errors by means of a single operation. This operation can be performed completely in parallel: It requires no flip-flops or memory, only the product of the received vector and the fixed, known transposed parity-check matrix H^T. The fact that H is fixed is significant because, as we see below, it greatly simplifies the architecture of the decoder.
The ATM HEC code is defined in official standard recommendations by the assignment of its generator polynomial. It has dimensions given by Φ(40, 32).
The generator polynomial assigned by the standards is
$$g(D) = D^{8} + D^{2} + D + 1,$$
which corresponds to a 9-bit word: g = 100000111. The parity-check matrix H can be obtained quite simply from the generator polynomial of the code.
The parity-check matrix in the case of an ATM HEC code has a size of 8 × 40 bits and is represented in Fig. 1.
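One way to obtain H from g(D) is sketched below in Python. It relies on a standard property of systematic cyclic codes: the column associated with a given bit position is the remainder of the corresponding power of D divided by g(D). The row and column ordering chosen here is an assumption and may differ from the layout of Fig. 1; the function names are hypothetical.

```python
N, R = 40, 8  # code-word length and number of check bits of the (40, 32) code


def parity_check_matrix():
    """Build H with R rows and N columns; column j holds D^(N-1-j) mod g(D)."""
    columns = []
    rem = 1                        # D^0
    for _ in range(N):
        columns.append(rem)
        rem <<= 1                  # multiply by D
        if rem & 0x100:            # degree 8 reached: reduce with g = 100000111
            rem ^= 0b100000111
    columns.reverse()              # column 0 now belongs to the first bit, D^39
    return [[(col >> i) & 1 for col in columns] for i in range(R)]


def syndrome(y, H):
    """s = y H^T in modulo-2 arithmetic: one dot product per row of H."""
    return [sum(a & b for a, b in zip(y, row)) % 2 for row in H]


H = parity_check_matrix()
y = [0] * 31 + [1] + [0, 0, 0, 0, 0, 1, 1, 1]   # header 00 00 00 01, check byte 07
s = syndrome(y, H)
```

For a valid header every element of `s` is zero, and a single flipped bit at position j yields a syndrome equal to column j of H.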
Parallel Asynchronous-Transfer-Mode Header-Error Control Subsystem's Logical and Physical Architectures
The logical architecture of the decoder can be designed from the decoding-matrix algorithm. This design can be achieved by the interpretation of the algebraic operations of the algorithm in terms of a sequence of steps involving Boolean logical operations. As we explain in the following, the sequence resulting from this decomposition entails four main steps that correspond to the four operators of the implemented device.
The main algebraic operation of the algorithm is the product of the vector y and the parity matrix H, which yields the syndrome vector s [Eq. (2)], s = yH^T.
The product of a vector and a matrix is the result of a sequence of scalar sums and products:
$$s(i) = \sum_{j=1}^{n} y(j)\,H(i,j). \qquad (3)$$
Because we are using modulo-2 arithmetic, there is a one-to-one correspondence between the scalar sums and products and the logical operators XOR and AND. Therefore Eq. (3) can be rewritten as
$$s(i) = \bigoplus_{j=1}^{n} \left[\, y(j) \;\mathrm{AND}\; H(i,j) \,\right]. \qquad (4)$$
The notation $\bigoplus_{j=1}^{n}$ indicates a multiple-input (n-input) XOR operation.
We chose a slightly different formulation that is useful for designing a more suitable architecture. In Eqs. (3) and (4) the information contained in y is used eight times (i.e., the number of elements of s), which is also equal to the number of columns of H.
A matrix Y with a size of 8 × 40 elements (the same as in H) can therefore be built by the placement of eight identical columns side by side with each column equal to y^T. The term y^T is the transpose of the row vector y (the first step):
$$Y = \left[\, y^{T} \;\; y^{T} \;\; \cdots \;\; y^{T} \,\right]. \qquad (5)$$
An AND operation is then performed element by element to obtain matrix C, again with a size of 8 × 40 elements (the second step):
$$C(i,j) = Y(i,j) \;\mathrm{AND}\; H(i,j). \qquad (6)$$
Finally, XOR operations must be performed along the columns of C (the third step):
$$s(i) = \bigoplus_{j=1}^{n} C(i,j). \qquad (7)$$
After the syndrome vector has been evaluated the values of its elements must be tested to verify if they are all zero. This testing can be carried out by a simple OR operation performed on all the syndrome-vector elements (the fourth step):
$$\mathrm{OUT} = s(1) \;\mathrm{OR}\; s(2) \;\mathrm{OR}\; \cdots \;\mathrm{OR}\; s(8). \qquad (8)$$
The result of this last ͑fourth͒ step is the binary number OUT, which is also the output of the whole HEC subsystem. When OUT is zero it can be stated, within the code-detection power limits, that the checked header does not contain errors. The physical architecture of the optical decoder is derived directly from the above structure of the decoding algorithm. The optical module can be divided into four functional operators, each performing one of the steps defined above:
• Operator 1: Replication of the received header y to obtain the matrix Y [Eq. (5)].
• Operator 2: An element-by-element AND operation [Eq. (6)].
• Operator 3: An XOR operation along the columns of C [Eq. (7)].
• Operator 4: An OR operation between the elements of the syndrome [Eq. (8)].
The architecture is represented in Fig. 2.
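The four operators can be mimicked in software to check that the decomposition behaves as a detector. The Python sketch below is a logical model only, under stated assumptions: H is built from the generator word g = 100000111 with a hypothetical column ordering, and Y, H, and C are stored as 8 lists of 40 bits, so the XOR of operator 3 runs along each list. The orientation used in the figures may be the transposed one.

```python
from functools import reduce
from operator import xor


def parity_check_matrix(n=40, r=8):
    """H as r lists of n bits; entry j of list i is bit i of D^(n-1-j) mod g(D)."""
    cols, rem = [], 1
    for _ in range(n):
        cols.append(rem)
        rem <<= 1
        if rem & 0x100:
            rem ^= 0b100000111
    cols.reverse()
    return [[(c >> i) & 1 for c in cols] for i in range(r)]


H = parity_check_matrix()


def hec_check(y):
    """Return OUT: 0 when no error is detected in the 40-bit header y, else 1."""
    Y = [list(y) for _ in H]                                        # operator 1: replicate y
    C = [[a & b for a, b in zip(yr, hr)] for yr, hr in zip(Y, H)]   # operator 2: AND with H
    s = [reduce(xor, line) for line in C]                           # operator 3: multi-input XOR
    return int(any(s))                                              # operator 4: OR of the syndrome


valid = [0] * 31 + [1, 0, 0, 0, 0, 0, 1, 1, 1]   # header 00 00 00 01, check byte 07
out_valid = hec_check(valid)
```

None of the four steps keeps state between bits, which is what allows all of them to be evaluated in a single parallel pass.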
A careful examination of the matrices and the logical operations of the decoder leads to a remarkable simplification of the hardware. In fact, in Eq. (7) only the elements of C that are nonzero should be taken into account. An element of C is always zero when the corresponding element in H is zero. Because H is known, we are able to eliminate a priori the nonsignificant elements of C. As can be seen from the representation of matrix H in Fig. 1, more than 60% of the elements initially taken into account can be discarded. This has a very favorable influence in reducing the complexity of the structure needed to implement the XORs along the columns.
All-Optical Free-Space Architecture of the Header-Error Control Subsystem
The increase in the transmission bit rate allowed by optical fiber communications means that the incoming data need to be processed at high speeds. Current systems perform a serial information treatment by use of electronic technology. The maturity of VLSI technology allows the realization of fully electronic systems with faster and faster performances. However, as Yamanaka 12 discusses in his paper, some strong limitations exist concerning power supply, cooling, and interconnection. For example, advanced ATM gigabit-per-second switching chips consume more than 10 W of power, which results in very difficult problems of cooling and power dissipation when one wants to use more than one switching element. On the other hand, electrical interconnections limit the system as a result of high losses, cross talk, and reflection.
A real breakthrough that is needed to overcome this bottleneck is the exploitation of the capabilities of the parallelism of optics. We now propose an optical implementation of the functional operators identified in Section 3 on the basis of a free-space propagation optical architecture. This method of interconnecting optical devices is the most suitable for exploiting parallelism and allows one to achieve a high processing density under totally cross-talk-free signal conditions. The presented optical implementation is just a proposal: Our aim is to show the features of free-space parallel processing in relation to traditional approaches for the case of a real ATM-layer logical function. Figure 3 illustrates the hypothesis of optical implementation that is described in the text.
In the following description, we refer to the functional operators (operator 1 through operator 4), as identified in Section 3. The input of the HEC decoding system must be interfaced with the optical transmission medium, i.e., the input fiber of the switching node. We do not consider in this feasibility study the operations or the format conversions necessary to deal with the overhead of the incoming transmission frame. To match with the proposed parallel architecture, it is necessary that this interface contain a device that is capable of converting data received in a serial form (as they arrive from a digital network link) into a parallel form (a two-dimensional image). In this case the bits of the received header must be transformed by a serial-to-parallel optical converter into a parallel array to be fed subsequently to the HEC subsystem. A free-space serial-to-parallel optical converter based on electro-optical (e-o) gates has already been studied and experimentally evaluated in the laboratory (although in a simplified version), as reported in preceding papers. 13 The HEC subsystem does not require special output interfaces because its output is a single binary optical signal that represents the validity of the decoding operation. The inner structure of the free-space architecture of the HEC subsystem comprises a sequence of processing stages connected by free-space optical interconnections. A processing stage can be either a single device or an array of optical devices operating in parallel. In a free-space architecture optical components of different kinds may be used together, both relational (which establish a fixed mapping between input and output optical beams, their state being independent of the data) and logical. 6 Because logic devices are generally more expensive and critical than relational devices, 14 they are adopted as building blocks of the HEC subsystem in only those cases in which a relational device cannot be used. Let us illustrate briefly each of the four operators into which the architecture has been subdivided.
Fig. 3. Scheme of the proposed optical HEC subsystem. The inset shows an experimental implementation, which exploits serial-to-parallel conversion that is based on e-o wafer gates (see text).
A. Operator 1
The first operator receives the optical array representing the bits of the header that were converted to a parallel form by the input interface. Operator 1 must perform the replication of the input data to create the matrix Y. Operator 1 can be composed of relational devices. To create a two-dimensional image of the matrix Y, one should divide each beam by 8. Actually, an improved version is possible. Observe Fig. 1; it can be seen that the maximum number of nonzero elements in a row of H is five. Therefore some elements of Y are not significant and can be eliminated. In this way the beams composing y can actually be divided by 5 instead of by 8, imposing less attenuation over the optical inputs of the second operator, while preserving equal power for all the replicas of y.
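The benefit of dividing by 5 rather than by 8 can be estimated with a few lines of Python. The sketch assumes ideal, lossless, equal-ratio splitters, so the loss per replica is simply 10 log10 of the number of ways, and it recomputes the number of syndrome bits fed by each header bit from the generator word g = 100000111. The function names are hypothetical.

```python
import math


def replicas_needed(n=40):
    """For each header bit, the number of syndrome bits it feeds (ones in D^p mod g)."""
    weights, rem = [], 1
    for _ in range(n):
        weights.append(bin(rem).count("1"))
        rem <<= 1
        if rem & 0x100:
            rem ^= 0b100000111
    return weights


def splitting_loss_db(ways: int) -> float:
    """Power loss per output of an ideal 1-to-ways equal splitter, in dB."""
    return 10 * math.log10(ways)


max_copies = max(replicas_needed())          # largest number of replicas any bit needs
loss_by_8 = splitting_loss_db(8)             # full replication
loss_reduced = splitting_loss_db(max_copies) # replication limited to the needed copies
saving_db = loss_by_8 - loss_reduced
```

Under these idealized assumptions the reduced splitter saves roughly 2 dB per beam; excess losses of real components are not modeled.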
B. Operator 2
Operator 2 performs the AND operation between Y and H. This operator can be implemented in an extremely simple and inexpensive way. As was pointed out above, H is a constant matrix that depends on only the code. Therefore the AND operation, rather than a real operation with two input variables (Y and H), can be considered to be a transformation of one variable matrix (Y). In other words, H can be built into the architecture in a way similar to the way in which the generator polynomial was built into the feedback-loop pattern in the serial decoder. Because of this property of the parallel detection algorithm, operator 2 can be implemented by relational devices.
We assume that the received data are encoded by intensity modulation of the optical carrier, which is by far the most common modulation technique in optical communication systems. In this case a simple spatial filter can be employed. The filter, which represents the parity-check matrix H, must absorb optical power in locations corresponding to the 0's in H and be transparent in coincidence with the 1's. This mask could easily be integrated on the output surface of the beam-splitter array of operator 1 or on the input surface of the subsequent logical stages of operator 3. It should be noted that the designer is free to choose the exact geometric shape of this stage, provided the geometry chosen matches that of the adjacent stages.
C. Operator 3
The third operator of the optical HEC system performs multiple-input XOR operations along the eight columns of C. As we noted, all the elements of C that are always 0 can be discarded, and in this way the number of inputs can be greatly reduced. With this simplification taken into account, the numbers of inputs of the eight parallel XOR circuits are, respectively, 14, 14, 15, 16, 16, 16, 20, and 15.
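These input counts can be cross-checked by counting the ones of H per syndrome bit. The Python sketch below rebuilds the nonzero pattern from the generator word g = 100000111; the ordering of the eight counts depends on how the syndrome bits are numbered, which is an assumption here (the list is indexed from the D^0 syndrome coefficient upward). The function name `xor_fan_in` is hypothetical.

```python
def xor_fan_in(n=40, r=8):
    """Count, for each syndrome bit, how many header bits feed its XOR."""
    counts = [0] * r
    rem = 1                              # D^0 mod g(D)
    for _ in range(n):
        for i in range(r):
            counts[i] += (rem >> i) & 1  # bit i set: this header bit feeds XOR i
        rem <<= 1
        if rem & 0x100:
            rem ^= 0b100000111
    return counts


counts = xor_fan_in()
kept = sum(counts)                        # significant elements of C
discarded_fraction = 1 - kept / (8 * 40)  # share of elements that are always zero
```

Read from the highest syndrome coefficient down to the lowest, the counts reproduce the sequence quoted above, and the share of discarded elements comes out slightly above 60%.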
For operator 3 relational devices cannot be used because an XOR operation cannot be reduced to simple spatial transformations. In fact, it cannot be performed by passive systems (1 ⊕ 1 = 0: two bright inputs must give a dark output). Implementing operator 3 therefore requires proper logical devices.
A multiple-input XOR operation is logically defined as a sequence of repeated elementary two-input XOR operations; for n inputs x_1, ..., x_n,
$$x_1 \oplus x_2 \oplus \cdots \oplus x_n = \left(\cdots\left(\left(x_1 \oplus x_2\right) \oplus x_3\right) \cdots\right) \oplus x_n.$$
The most usual scheme for implementing the multiple-input XOR operations used in electronic digital circuits stems from this decomposition. Two-input XOR gates are usually used as building blocks. By exploiting the associative property of the XOR operator, 15 it is nevertheless possible to design various implementations: With optics it may be more convenient to use XOR building blocks with more than two inputs, as is illustrated in Fig. 4.
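The freedom granted by associativity can be illustrated with a small Python sketch that reduces the inputs in layers of XOR blocks with a configurable number of inputs. It is a logical model under stated assumptions, not the arrangement of Fig. 4: the function `xor_tree`, its `fan_in` parameter, and the sample bit pattern are hypothetical.

```python
from functools import reduce
from operator import xor


def xor_tree(bits, fan_in=2):
    """Reduce bits with layers of fan_in-input XOR blocks; return (result, layers)."""
    layer, depth = list(bits), 0
    while len(layer) > 1:
        # Each block combines up to fan_in neighbouring signals of the current layer.
        layer = [reduce(xor, layer[k:k + fan_in]) for k in range(0, len(layer), fan_in)]
        depth += 1
    return layer[0], depth


# 20 arbitrary input bits, the size of the largest XOR circuit of operator 3
inputs = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1]

result_2, layers_2 = xor_tree(inputs, fan_in=2)
result_4, layers_4 = xor_tree(inputs, fan_in=4)
```

Both groupings give the same parity, but blocks with more inputs need fewer cascaded layers, which is the trade-off a designer can exploit.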
For operator 3 a solution based on smart-pixel matrices 14 can be used: The optical elements of the matrix C are detected by a matrix of photodiodes, and all the logic functions are assigned to electronics. Otherwise, in the case of the fast response times required for the logic processing an all-optical solution based, for example, on fiber-switching technologies 16 can be adopted.
D. Operator 4
The last operator, which performs an OR operation on the eight elements of the syndrome vector s, can be implemented as an optical gate array of logic devices, as for operator 3. Also, in this case there is the possibility of decomposing the eight-input OR operation in various ways, and thus many different logical designs can be carried out.
If the output signal is not bound to assume fixed power levels representing its logical states an optically wired OR can be used instead of a gate-array architecture. This solution is extremely simple: It can be obtained by the collection of the eight beams representing the outputs of the last stage of operator 3 in the same spatial location. The power of the output beam would be simply the sum of the powers of the eight beams.
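The behaviour of the optically wired OR can be modeled as a power sum followed by a threshold decision. The Python sketch below assumes ideal intensity levels (zero power for a logical 0 and a hypothetical unit power `P1` for a logical 1) and a hypothetical decision threshold of half that level; none of these values comes from the proposed design.

```python
def wired_or(beam_powers, threshold):
    """Superpose the beams on one spot; return (total power, logical output)."""
    total = sum(beam_powers)            # incoherent beams: powers add
    return total, int(total > threshold)


P1 = 1.0                                # hypothetical power of a logical-1 beam
syndrome = [0, 1, 0, 0, 1, 0, 0, 1]     # sample syndrome with three bits set

total, out = wired_or([P1 * s for s in syndrome], threshold=0.5 * P1)
```

The logical result is correct, but the output power ranges from one to eight times the single-beam level, which is why this solution suits only a stage that does not require fixed power levels.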
We would like to discuss in the following some relevant issues concerning the proposed architecture:
• Scalability and latency: Because the HEC protocol is cell-dimension dependent, the proposed architecture is fixed in terms of logic dimension. The only scalable parameter is the transmission bit rate: The use of a very high bit rate (for example, 10 Gbits/s) affects the choice of the serial-to-parallel converter only. On the other hand, optical gate-array requirements are not very strict, as the gate-array operation has to be repeated just once in a cell time (a 424-bit interval or 42.4 ns in the case of a 10-Gbit/s bit rate). In the case of serial treatment of the information less than 10 ps of response time would be required for the optical gate array. Furthermore, by use of the proposed parallel processing no additional limitations of latency are introduced because the header bits are processed simultaneously in space.
• Power budget: The use of an optical implementation to perform HEC functionality in a parallel way implies an obvious worsening in the final power budget. The double splitting necessary to obtain the space-coded image that is ready for parallel processing entails a net loss of approximately 23 dB for the single processed bit at its arrival in the output logic stage. These losses can be compensated by use of a standard optical erbium-doped fiber amplifier at the input.
• Physical size: We propose the use of the serial-to-parallel converter structure presented in Fig. 3. 13 The first input interface useful for producing the 40 delayed optical replicas of the cell header can be realized in fiber optics by means of a 1 × 40 splitter and suitable fiber delay lines (terminating with a lens array for collimation). A 40-spot image with a width of w = 2 mm and a height of h = 100 mm is obtained with each spot size being 2 mm in dimension. To produce the optical gating, one uses an array of electro-optic crystal wafers (each one sampling many optical replicas of the cell header simultaneously). A cascade of beam splitters and right-angle prisms (2.5 mm in dimension) constitutes the 1 × 5 splitter (with dimensions of l = 7.5 mm × w = 12.5 mm × h = 2.5 mm). The different paths covered by the optical bits during their travel through the single splitter introduce temporal delays (the largest one of approximately 37.5 ps), which are negligible for bit rates as high as 10 Gbits/s. The mask used for the AND operation of operator 2 has dimensions of w = 12.5 mm × h = 100 mm. The whole free-space implementation can be l = 50 mm × w = 20 mm × h = 100 mm in dimension. The short length of the free-space section is consistent with the collimation obtained by the lens array in the input. In spite of the large dimensions involved in the implemented system, the free-space solution is to be preferred because it allows the exploitation of massive parallelism with respect to architectures based on waveguiding optics. Adopting suitable integrated micro-optics technology allows dimension limitations to be overcome easily.
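The timing and power figures quoted in these points can be reproduced with simple arithmetic. The Python sketch below assumes that the double splitting consists of the 1 × 40 splitter of the input interface followed by the 1 × 5 splitter of operator 1, that both are ideal equal-ratio splitters, and that mask and excess losses are ignored; these assumptions are made here for illustration only.

```python
import math

BIT_RATE = 10e9      # bits per second, the example rate used above
CELL_BITS = 424      # bits in one ATM cell (53 bytes)

bit_period_ps = 1e12 / BIT_RATE             # time slot of one bit, in picoseconds
cell_time_ns = CELL_BITS * 1e9 / BIT_RATE   # time available per parallel operation


def split_loss_db(ways: int) -> float:
    """Loss per output of an ideal 1-to-ways equal splitter, in dB."""
    return 10 * math.log10(ways)


# Assumed cascade: 1 x 40 fan-out of the header bits, then 1 x 5 replication.
total_split_loss_db = split_loss_db(40) + split_loss_db(5)
```

The parallel gate array has a full cell time to respond, whereas a serial gate would have to settle well within a single bit period, and the ideal splitting loss of the assumed cascade is close to the 23 dB mentioned above.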
Conclusions
We have described a design example of an optical header-processing function. We have chosen to present an all-optical architecture of a HEC decoder and error-detection subsystem for an ATM switching node. This architecture is based on free-space optical interconnections.
This particular subsystem is very suitable for an optical implementation for four main reasons:
• RAMs are not required. In contrast, the other header-processing subsystems, i.e., label switching and policing, require large look-up tables. This requirement is a large obstacle to the use of optics because optical RAM technology currently is not mature enough to produce cheap and fast devices.
• A parallel-decoding algorithm is available. This allows the design of an entirely combinational logic architecture, eliminating the need for a shift register. Optical buffers and flip-flops would still be too expensive and too difficult to produce.
• HEC error detection does not modify the header itself but only produces as output a signal to be used inside the node. Thus no requirement must be taken into account with respect to the effects that this subsystem produces on the signals to be transmitted to the rest of the network.
• This subsystem is suited for operation in concurrence with operations of the other header-processing subsystems of the node.
The main advantages of the HEC decoder architecture and of its optical implementation based on free space can be summarized as follows:
• Because of the inner parallelism, the switching speed of each single logical gate required to operate the subsystem at a certain data bit rate is lower than that required in the serial version. In a system operating on a parallel data flow an equivalent bit rate can be defined as the bit rate of a single data channel of the flow multiplied by the degree of parallelism. In this case the equivalent bit rate is 40 times the bit rate of the corresponding serial counterpart because the header is a block of 40 bits.
• The architecture proposed above is not dependent on a particular technology. This independence is a desirable characteristic because it allows the architecture to maintain its validity, regardless of the evolution of photonic technology.
• In a free-space architecture, coupling between devices can be achieved easily, and the physical separation among components allows the mixing of different kinds of components (i.e., relational and logical devices). In this way each operator can be implemented by the most appropriate technology, according to the function it must perform.
We have so far dealt with the first function performed by the decoder of a HEC sequence: detection of transmission errors. As we mentioned in Section 1, HEC sequence decoding also is used in ATM for synchronization at the cell level. A possible future development of the research reported in this paper might be the development of an all-optical ATM cell-synchronization subsystem. In such a subsystem, all the properties of the optical HEC architecture proposed here could be exploited usefully, and an innovative and effective solution could be found for the problem of cell synchronization, which is still an open question in current all-optical networks.
