Abstract - This paper describes the design and prototyping of EMS, a telecommunication intellectual property soft-core developed in the scope of an industry-academia cooperation. EMS performs insertion (mapping) and extraction (demapping) of E1 channels into/from Synchronous Digital Hierarchy (SDH) frames. The basic SDH frame is transmitted at a rate of 155.52 Mbps, which allows packing up to sixty-three 2.048 Mbps E1 channels. E1 channels belong to the Plesiochronous Digital Hierarchy (PDH). The paper addresses the solution of several synchronization problems implied by the E1 channel mapping/demapping process. EMS was fully described in RTL VHDL. It was functionally validated by simulation and prototyped in FPGA platforms. Besides exploring the techniques involved in embedding PDH into SDH frames, another contribution of the work is the availability of a reusable and parameterizable telecom core with high performance, low latency, and small size.
1.
INTRODUCTION
Over the last three decades, communication networks have been evolving from analog to digital [1], which results in better transmission quality and larger bandwidth. In many areas, this technological change has significantly boosted telecommunication systems research. With the advent of the World Wide Web came an unprecedented increase in data traffic. The complexity of telecommunication systems is growing faster, due to: (i) media and protocol diversity; (ii) the variety of application natures; (iii) increased data communication speed. To cope with these fast-changing scenarios, new telecommunication technologies are necessary. The Synchronous Digital Hierarchy is one such technology, due to its high data communication speed and high capacity to pack telecom protocols of different natures, like E1, ATM and others.
This work introduces EMS, an E1 channel Mapper/Demapper into/from SDH frames. EMS is an intellectual property (IP) soft-core developed in the scope of an industry-academia research and development cooperation. EMS is a successor of an E1 IP soft-core design [2] developed within the scope of the same cooperation. An IP core is a pre-designed and pre-verified hardware module used in combination with others to compose larger circuits, typically custom VLSI integrated circuits or large programmable devices, such as multimillion-gate FPGAs. According to availability, IP cores can be classified into soft, firm or hard cores [3]. Soft cores are open-source code, enabling high flexibility. A soft-core design may not guarantee some characteristics, such as exact timing, in the final implementation. IP cores are increasingly important resources for complex digital systems design in general and particularly for telecom systems. According to the ITRS report [4], by 2012, 90% of the surface of complex silicon chips will be composed of cores. In addition, approximately 50% of the programmable devices sold today are used in telecom applications [5].
The goal of this paper is to present the design of the EMS core. The main contributions are: (i) the design of an IP soft-core compliant with specific ITU-T standards [6] [7] [8];
(ii) the prototyping of the system in hardware, using Xilinx Virtex FPGAs, to guarantee timing constraints; (iii) the presentation of results of an accurate buffering analysis for E1 mapping and demapping into/from SDH frames; (iv) the availability of a design with a small footprint, which enables straightforward scaling and IP reuse in larger circuits.
The rest of this paper is organized as follows. Section 2 summarizes some important aspects of the plesiochronous digital hierarchy, which defines the characteristics of E1
Revista da Sociedade Brasileira de Telecomunicações
Volume 20 Numero 02, Agosto de 2005
carriers. In Section 3, some basic concepts of SDH are explained. Section 4 reviews previous works related to SDH mappers. Section 5 introduces the EMS architecture. Section 6 describes the EMS validation and prototyping, while Section 7 presents some conclusions and future work.
2.
PLESIOCHRONOUS HIERARCHY
The term plesiochronous describes communication systems in which transmitted signals have the same nominal digital rate but are synchronized with different clocks. The Plesiochronous Digital Hierarchy (PDH) is a hierarchy of data and voice transmission systems that communicate using plesiochronous synchronization. PDH is a conventional multiplexing technology for network transmission systems.
European/South-American and North-American/Japanese versions of the PDH system differ slightly, but the operating principles are the same. North-American/Japanese basic data transfer rate is a stream transmitted at 1. PDH can combine multiple E1 channels, generating other data and voice streams, such as 8 Mbps (E2), 34 Mbps (E3), 140 Mbps (E4) and 565 Mbps (E5), which are used in several kinds of communication systems. For instance, 565 Mbps is typically used to transmit data over optical fiber systems for long-distance communication.
3.

SYNCHRONOUS HIERARCHY
The Synchronous Digital Hierarchy (SDH) and the Synchronous Optical NETwork (SONET) are hierarchies used in Europe/South-America and North-America/Japan, respectively. Both systems employ synchronous time division multiplexing techniques to transmit different tributaries (E1, Ethernet, ATM, etc.) through the same physical channel. A primary goal in the development of the SDH/SONET formats was to define a synchronous optical hierarchy with sufficient flexibility to carry payloads of different types. SONET and SDH are based on transmission at rates that are integer multiples of 51.840 Mbps. The SONET basic frame structure is called synchronous transport signal level one (STS-1). The SDH basic modular signal is called synchronous transport module level one (STM-1). The STM-1 rate is an extension of the basic STS-1 (for this reason also called STM-0) and operates at 155.52 Mbps, carrying three interleaved STS-1 frames.
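The rate relations quoted above can be checked with a few lines of arithmetic. The following is only an illustrative sketch: the names `STS1_MBPS`, `E1_MBPS` and `stm_rate_mbps` are hypothetical, and exact fractions are used so that no rounding is involved.

```python
from fractions import Fraction

# Rates taken from the text, expressed as exact fractions of Mbps.
STS1_MBPS = Fraction(5184, 100)   # 51.84 Mbps (STS-1, also called STM-0)
E1_MBPS = Fraction(2048, 1000)    # 2.048 Mbps (one E1 channel)


def stm_rate_mbps(n):
    """Line rate of STM-n, taken as 3*n interleaved STS-1 signals."""
    return 3 * n * STS1_MBPS


stm1 = stm_rate_mbps(1)
e1_total = 63 * E1_MBPS           # sixty-three E1 channels, as in the abstract

print(float(stm1))                # STM-1 line rate in Mbps
print(float(e1_total))            # aggregate E1 rate in Mbps
print(e1_total < stm1)            # the E1 tributaries fit inside STM-1
```

The difference between the two rates is taken up by overhead, pointers and stuffing, which the following paragraphs describe.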
Synchronous hierarchies differ from PDH in the exactness of the data transport rate. SDH systems are tightly synchronized to network base clocks, making the entire network operate synchronously. The structure of the SDH synchronization network (SSN) is based on a master-slave clock hierarchy. The highest level of the SSN hierarchy is a high-precision clock defined as the primary reference clock (PRC) [9], generally implemented by atomic frequency oscillators. The remainder of the hierarchy is organized as a tree. The reference timing signal generated by the PRC is distributed to the clocks of lower hierarchy levels, which are named slave clocks (SCs). An SC tracks the reference timing signal by means of phase-locked loop (PLL) systems.
SDH is a multiplexed structure. Different containers (C-11, C-12, C-2, C-3 and C-4) with different rates are mapped to virtual containers (VC-11, VC-12, VC-2, VC-3 and VC-4). Pointers implement virtual container alignment, generating tributary units (TU-11, TU-12, TU-2 and TU-3) or administrative units (AU-3 and AU-4). Tributary units are multiplexed in tributary unit groups (TUG-2 and TUG-3) according to container rate. TUG-2 can be multiplexed in VC-3 or TUG-3, and TUG-3 is multiplexed in VC-4. Administrative units are grouped in an administrative unit group (AUG). Finally, the AUG is multiplexed in one or more STM-1s.
This work focuses only on the path highlighted in Figure 1. The E1 channel is packed into C-12, implying the insertion of control and stuffing bytes. C-12 is mapped into VC-12. A pointer implements virtual container alignment, generating TU-12. Three TU-12 are multiplexed into TUG-2, seven TUG-2 are multiplexed into TUG-3, and three TUG-3 are multiplexed into VC-4. Pointers implement virtual container alignment, generating the AU-4. The AU-4 is grouped into the AUG, and the AUG is multiplexed into one or more STM-1s.
In each super-frame (SF), there are four different kinds of TU-12 PTR (V1, V2, V3, and V4). The V1 and V2 pointers are joined to compose the V5 address, which marks the beginning of C-12. When the PDH and SDH clock frequencies are synchronized, 1024 bits of useful E1 data are transmitted in each SF (four 32-byte blocks of the E1 channel). Otherwise, it is possible to add or subtract one bit from each SF. The C-12 structure of an SF merges two or three extra bytes with 32 or 31 E1 bytes. The first C-12 encloses two stuffing bytes and 32 bytes of the E1 channel. The second and the third C-12 of each SF contain one control byte, one stuffing byte and 32 bytes of the E1 channel. The last C-12 of each SF encloses one control byte, seven data bits and a justification opportunity bit, one stuffing byte and 31 bytes of the E1 channel.
The majority value of the C1 and C2 bits determines positive or negative frequency justification. When two or more Cj bits are 1, the Sj bits are valid data. This, in turn, means that up to 1025 bits of data are available per SF. On the other hand, if two or more Cj bits are 0, the Sj bits are stuffing bits, meaning that as few as 1023 bits of data are available per SF. The number of bits per SF implies slight changes of clock frequency. For instance, 1025 bits imply 2050 kHz and 1023 bits imply 2046 kHz. This frequency change allows mapping/demapping a PDH frame into/from an SDH frame without data loss.
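The relation between the justification control bits and the number of data bits per super-frame can be sketched as follows. This is a minimal behavioral model, not the authors' VHDL. It assumes three copies of each Cj bit per SF (consistent with the "two or more" wording) and an SF period of 500 µs, the value that makes 1024 bits correspond to 2.048 Mbps. All function names are hypothetical.

```python
SF_PERIOD_US = 500  # assumption: 1024 bits at 2.048 Mbps last 500 microseconds


def majority(bits):
    """Majority vote over the three copies of one justification control bit."""
    if len(bits) != 3:
        raise ValueError("expected three copies of the control bit")
    return 1 if sum(bits) >= 2 else 0


def sf_data_bits(c1_bits, c2_bits):
    """Data bits carried by one super-frame.

    1023 bits are always data; S1 and S2 each add one more bit when the
    majority of the corresponding Cj copies is 1 (convention of the text).
    """
    return 1023 + majority(c1_bits) + majority(c2_bits)


def equivalent_rate_khz(data_bits):
    """E1 rate implied by a constant number of data bits per super-frame."""
    return data_bits * 1000 / SF_PERIOD_US


for c1, c2 in [((1, 1, 0), (1, 1, 1)), ((1, 0, 1), (0, 0, 0)), ((0, 0, 1), (0, 0, 0))]:
    bits = sf_data_bits(c1, c2)
    print(bits, equivalent_rate_khz(bits))
```

The majority vote means that a single corrupted copy of a control bit does not change the justification decision.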
4.
RELATED WORK
The demand for telecommunication services leads to a significant offer of SDH systems in the market [10] [11] [12]. Each piece of equipment presents its own distinguishing features, prices, and use cases. In addition, the high complexity of SDH system aspects, such as asynchronous to synchronous mapping, design for testability, abstract modeling and others, drives the research in this field. Lin et al. (1994) propose a flexible architecture for implementing an SDH STM-1 Add-Drop multiplexer [13]. They implemented an internal Telecom Bus-like architecture to provide E1 and E3 communication services between internal and external circuits (adapters/converters). Yongming et al. (1996) considered three ways of mapping asynchronous 2.048 Mbps tributaries into SDH VC-12: asynchronous, bit synchronous and byte synchronous [14]. The authors focus on the asynchronous mapping, discussing positive/zero/negative justification to improve the capacity of the elastic buffer store. Fuqiang et al. (1996) provide a similar analysis, quantifying the best elastic buffer sizes for PDH to SDH and SDH to PDH conversions [15]. Xiaoru and Lieguang (1996) prototyped and verified an SDH STM-1 in eight Xilinx 3190/3090 FPGAs [16]. The final target implementation was a 0.5 µm three-layer CMOS gate array. Thalmann et al. (1999) report an architecture of an Add-Drop/Terminal Multiplexer for SDH that allows integrating all digital functions into one ASIC [17]. The idea is based on two approaches: buffer usage optimization and an embedded processor, which substitutes various large hardware blocks.
Clauberg et al. (1999) introduced a scalable modular architecture for SDH/SONET technology by exploiting the regular multiplexing principle inherent to this hierarchy [18]. They demonstrated the feasibility of their architecture with a framer chip able to handle 4 STM-1 and variations of STM-4.
Herkersdorf et al. (2000) complemented the work of Clauberg et al. by covering the mapping of ATM, IP and T1/T3 traffic streams into SDH/SONET ranging from OC-1 to OC-48/STM-16 [19].
Rower et al. (2000) implemented an SDH/SONET input block using a new paradigm called programmable intellectual property [20]. In it, modules can be reconfigured by downloading new software versions into IP embedded processors.
Peng, Depeng and Lieguang (2000) developed THXC, an SDH cross-connect ASIC with an embedded BIST circuit [21]. THXC is programmed and monitored by an external computer, and is designed to allow various switching rates and cascade connections among several identical chips.
Silveira and Van Noije (2000) presented the modeling of an E1 mapper for SDH systems, pointing out the difficulties of implementing it, due to the synchronization mechanisms and the nature of the PDH information carried in SDH frames [22].
Baechtold et al. (2001) implemented a single chip for 4xOC-3c, OC-12c SDH/SONET framing [23]. The chip features low power consumption, integrated clock recovery that fulfils the ITU-T, Bellcore and ANSI jitter requirements, and functions to enable low-cost digital cross-connect and add/drop multiplexing systems.
The multiplexing section overhead (MSOH) is a central part of SDH circuits, since it treats many frame errors. Torres et al. (2003) presented an MSOH processor for STM-0/STS-1 to STM-4/STS-12 [24]. The purpose of their work was to show the requirements specification, architecture and verification of such systems.
The present work introduces the EMS architecture, which is similar to the system of Peng et al. (2000), with the scalability introduced by Clauberg et al. (1999). In addition, the proposed approach is distinct from those of Yongming et al. (1996) and Fuqiang et al. (1996), achieving better elastic buffer sizing for PDH to SDH mapping and vice-versa.
5.
THE EMS ARCHITECTURE
EMS is a scalable architecture, allowing the mapping/demapping of 1 to 63 E1 channels into/from an SDH frame. The smallest functional EMS operates with one E1 channel and is called Basic EMS, or simply BEMS. When EMS operates at full capacity (63 E1 channels), it is capable of dealing with an entire SDH STM-1, being called in this case STM-1 EMS, or simply SEMS.
The BEMS external interface comprises three signal sets, as depicted in Figure 4 and described in Table 1. To implement E1 channel mapping into SDH frames, the EMS core adds data from an E1 channel (E1IN signal) to the Telecom Bus (DTBDATAOUT signal). E1 demapping from SDH is implemented by receiving data from the Telecom Bus (DTBDATA signal) and mapping these to an E1 channel (E1OUT signal). The selected E1 channel is addressed by the CHANNEL signal, which allows specifying one out of 63 channels. Five main modules and an auxiliary logic circuit (channel multiplexing, buffering, some control, and glue logic) compose the BEMS system (see Figure 5): ColumnAddress, Delay, V5Enable, VC12Drop and VC12Add.
Figure 6. SEMS internal module structure.
5.1
COLUMNADDRESS MODULE
The ColumnAddress module receives as inputs the Telecom Bus control signals and generates the J1 pointer (payload start), the V1 pointer (TU-12 start) and the number of each column present in the payload (through the columnAddress signal). The column number allows the system to locate specific data of a channel in the Telecom Bus. The J1 pointer corresponds to the first column number. The V1 pointer allows the system to locate the VC-12 internal pointers (e.g. the V5 pointer).
5.2
DELAY MODULE
The Delay module generates the replace control signal, responsible for defining the exact moment to insert data in a valid VC-4 column.
5.3
V5ENABLE MODULE
The V5Enable module searches for valid data (dataValid signal) in the Telecom Bus. It also indicates the super-frame start (superFrameStart signal). In addition, the V5Enable module stores data from the Telecom Bus and forwards them to the VC12Drop and VC12Add modules. The module also detects valid columns through a 3D table mapped into a ROM-like structure. The table contains the first column occupied by each TU-12 channel inside VC-4. For example, 9 ("0001001") is the channel 1 address (second position of the partial VHDL code in Figure 7). The first word of the ROM corresponds to channel 0 and is not used. The remaining 63 words correspond to each one of the 63 channels inside the VC-4 payload.
    constant channel_position: rom := (
        "1111111",  -- not used
        "0001001", "0011110", "0110011", "0001100", "0100001", "0110110", "0001111",
        "0100100", "0111001", "0010010", "0100111", "0111100", "0010101", "0101010",
        "0111111", "0011000", "0101101", "1000010", "0011011", "0110000", "1000101",
        "0001010", "0011111", "0110100", "0001101", "0100010", "0110111", "0010000",
        "0100101", "0111010", "0010011", "0101000", "0111101", "0010110", "0101011",
        "1000000", "0011001", "0101110", "1000011", "0011100", "0110001", "1000110",
        "0001011", "0100000", "0110101", "0001110", "0100011", "0111000", "0010001",
        "0100110", "0111011", "0010100", "0101001", "0111110", "0010111", "0101100",
        "1000001", "0011010", "0101111", "1000100", "0011101", "0110010", "1000111");
Figure 7. ROM-like structure containing TU-12 addresses.
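The 63 useful words of the ROM in Figure 7 follow a regular pattern, which the sketch below reproduces arithmetically and checks against the listed values. The closed-form expression is an observation made on the listed words, not a formula given in the paper. Its three factors (3, 7, 3) plausibly correspond to the TU-12, TUG-2 and TUG-3 levels of the multiplexing path. The names `ROM` and `first_column` are hypothetical.

```python
# Words of the ROM in Figure 7, in order; word 0 is the unused entry.
ROM = """
1111111
0001001 0011110 0110011 0001100 0100001 0110110 0001111
0100100 0111001 0010010 0100111 0111100 0010101 0101010
0111111 0011000 0101101 1000010 0011011 0110000 1000101
0001010 0011111 0110100 0001101 0100010 0110111 0010000
0100101 0111010 0010011 0101000 0111101 0010110 0101011
1000000 0011001 0101110 1000011 0011100 0110001 1000110
0001011 0100000 0110101 0001110 0100011 0111000 0010001
0100110 0111011 0010100 0101001 0111110 0010111 0101100
1000001 0011010 0101111 1000100 0011101 0110010 1000111
""".split()


def first_column(channel):
    """First VC-4 column of a TU-12 channel (1..63), by the observed pattern."""
    if not 1 <= channel <= 63:
        raise ValueError("channel must be in 1..63")
    i = channel - 1
    # Step of 21 columns over 3 values, step of 3 over 7 values, step of 1 over 3.
    return 9 + 21 * (i % 3) + 3 * ((i // 3) % 7) + (i // 21)


mismatches = [ch for ch in range(1, 64) if int(ROM[ch], 2) != first_column(ch)]
print(len(ROM), mismatches)
```

A lookup table remains the natural hardware choice, since it avoids the division and modulo operations of the arithmetic form.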
Based on the first occurrence of each channel, the V5Enable module computes the other three TU-12 occurrences in the VC-4 payload by adding 63, 126 and 189 to the first column value, as illustrated in the VHDL code of Figure 8.
To extract VC-12 from TU-12 it is necessary to remove the bytes corresponding to the V1, V2, V3 and V4 pointers, which are represented by TU-12 PTR in Figure 2. These 4 byte positions are also provided by the columnAddress signal. It is possible to observe in the partial VHDL code of Figure 9 that data is valid (dataValid = 1) only if its address is different from the pointer addresses (1, 36, 72 and 108, respectively). In this case, valid data is provided through the dataOut signal.
The V5 pointer address (represented by TU-12 POH in Figure 2) is computed by joining the two least significant bits of the V1 pointer and all eight bits of the V2 pointer. This results in an offset, shown in Figure 3. The VC-12 address is obtained starting at the V2 pointer. For example, if the address of the V5 pointer is 0, it means that this pointer is located in the first byte after the V2 pointer. The pointers V1, V2, V3 and V4 must be skipped. When the offset is between 0 and 34, it means that V5 starts after the V2 and before the V3 pointer, being necessary to add 37 to the offset to compose the V5 pointer address. When the offset is between 35 and 69, it is necessary to add 38 to the offset, skipping the bytes above the V3 pointer. The last case is when the offset is between 105 and 139. In this case, the offset must be decreased by 104. The resulting address will be between the V1 and V2 pointers.
To search the 144 bytes of TU-12, this module uses a counter called counter144 (see Figure 9). The counter144 counter is set to 0 when the V1 pointer is detected. When the value of this counter is equal to the V5 pointer address, the beginning of the super-frame is signaled through the superFrameStart signal.
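The column and pointer arithmetic described above can be summarized in a short behavioral sketch. It is not the VHDL of Figures 8 and 9, and all function names are hypothetical. The branch for offsets 70 to 104 is not stated in the text. It is filled in here by assumption, following the pattern of the stated cases (one more pointer byte skipped, hence +39).

```python
def tu12_columns(first_column):
    """The four VC-4 columns of a TU-12: the first one plus 63, 126 and 189."""
    return [first_column + 63 * k for k in range(4)]


def v5_offset(v1, v2):
    """10-bit offset: the two LSBs of V1 joined with the eight bits of V2."""
    return ((v1 & 0b11) << 8) | (v2 & 0xFF)


def v5_address(offset):
    """Value of counter144 at which V5 is found, for a given pointer offset."""
    if 0 <= offset <= 34:
        return offset + 37      # after V2, before V3
    if 35 <= offset <= 69:
        return offset + 38      # after V3
    if 70 <= offset <= 104:
        return offset + 39      # ASSUMPTION: not stated in the text
    if 105 <= offset <= 139:
        return offset - 104     # wraps to the bytes between V1 and V2
    raise ValueError("offset must be in 0..139")


print(tu12_columns(9))
print(v5_address(v5_offset(0b00, 0)), v5_address(v5_offset(0b00, 105)))
```

With these four branches the 140 legal offsets map to 140 distinct counter values, so every byte of the TU-12 other than the four pointer bytes can hold V5.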
5.4
VC12DROP MODULE
The VC12Drop module is responsible for extracting data from the Telecom Bus and sending these to the E1 output channel. This module has an internal 64-bit circular FIFO buffer. The write pointer starts pointing at position 0. To avoid data loss, the FIFO is read only when half of it is written. In other words, when the write pointer reaches position 32, the system starts the drop operation. After this, the FIFO reading operation is continuously active, and the reading clock is adjusted according to the difference between the write and read pointers, expressed by the DELTA signal.
This operation performs zero/negative/positive frequency justification. If DELTA is greater than the FIFO length, there is data loss. If DELTA is too small, the system can drop wrong data. The FIFO is dimensioned to avoid these problems.
A hysteresis mechanism was implemented to control the variation of the DELTA signal. It allows keeping minimal and maximal distances between the read and write pointers before executing positive or negative justifications. Figure 10 shows this behavior. When the DELTA signal is between the minimal and maximal values, the system operates at the nominal frequency (2.048 MHz).
Figure 10. Hysteresis behavior for VC12Drop.
As depicted in Figure 3, the justification control bits (C1 and C2) indicate whether S1 and/or S2 are valid data bits. When the justification bits indicate valid data, the bits contained in S1 and/or S2 must be written to the FIFO. At the nominal frequency, only one of them is valid data. When S1 and S2 are both valid data, the amount of data written to the FIFO is increased, and the DELTA signal also increases, reaching the maximal hysteresis value. In this case, the reference clock is divided by 31, to obtain a 2.114 MHz frequency. This higher frequency brings the DELTA signal back to the normal range, after which the nominal frequency is recovered. When S1 and S2 are not valid data bits, they are not written to the FIFO, making the DELTA signal reach the minimal hysteresis value. In this case, the CKE1OUT frequency must be decreased, dividing the reference clock by 33, to obtain a 1.986 MHz frequency. This lower frequency raises the DELTA signal back to the normal range and, as a result, the nominal frequency is recovered.
The FIFO and hysteresis limits have to be dimensioned to avoid data loss, increased latency and excessive memory usage. Short FIFOs can cause data loss, since bytes are written in bursts using the 19.44 MHz clock frequency, while reading is continuously performed at a 2.048 MHz clock frequency.
Large FIFOs can increase memory cost and latency, and lead to improper operation. This is due to the time that the output frequency stays different from the nominal value. Even with good FIFO dimensioning, the circuit may operate improperly due to the difference between the minimum and maximum hysteresis limits. If the limits are too close to or too far from each other, the output frequency will change too fast or too slowly, violating the standard. This may, in turn, damage the signal recovery by an external E1 processing module. Figure 11 shows the DELTA signal behavior for nominal clock operation, considering a FIFO with 64 bits, and maximum, medium and minimum hysteresis limits of 48, 32 and 16, respectively. Experimental data showed that the chosen hysteresis limits are adequate to respect ITU-T standards.
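A behavioral sketch of the output clock selection may help to visualize the hysteresis. It is not the authors' implementation. The 65.536 MHz reference clock is inferred from the divide-by-31/32/33 values and the frequencies quoted for VC12Drop. The rule of returning to the nominal divider when DELTA crosses the medium limit is an assumption, and the class name `DropClockControl` is hypothetical.

```python
REF_CLOCK_MHZ = 32 * 2.048                       # assumption: 65.536 MHz reference
MIN_LIMIT, MID_LIMIT, MAX_LIMIT = 16, 32, 48     # limits for the 64-bit FIFO


class DropClockControl:
    """Selects the CKE1OUT divider from the FIFO occupancy (DELTA)."""

    def __init__(self):
        self.divider = 32                        # nominal: 2.048 MHz

    def update(self, delta):
        if delta >= MAX_LIMIT:
            self.divider = 31                    # read faster, DELTA shrinks
        elif delta <= MIN_LIMIT:
            self.divider = 33                    # read slower, DELTA grows
        elif self.divider == 31 and delta <= MID_LIMIT:
            self.divider = 32                    # assumed return point
        elif self.divider == 33 and delta >= MID_LIMIT:
            self.divider = 32                    # assumed return point
        return self.divider

    def frequency_mhz(self):
        return REF_CLOCK_MHZ / self.divider


ctrl = DropClockControl()
for delta in (32, 48, 40, 32, 16, 20, 32):
    ctrl.update(delta)
    print(delta, ctrl.divider, round(ctrl.frequency_mhz(), 3))
```

Because the divider only returns to 32 at the medium limit, DELTA values between the limits do not toggle the output frequency, which is the purpose of the hysteresis.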
Figure 12 shows a partial simulation of the FIFO operation. The readCount signal (read pointer) is incremented at each CKE1OUT cycle, because the FIFO is always being read. The FIFO is written in bursts only when there is information to be dropped to the external E1 channel (i.e. when the flagwrite signal is equal to 1). Salient features in the simulation are numbered in Figure 12 and explained below.
1. DELTA is the difference between writeCont and readCount with regard to the FIFO size. Since the buffer is implemented as a circular FIFO with 64 bits, there are two formulae to compute DELTA: (i) if readCount > writeCont, then DELTA ← FIFO size − readCount + writeCont (e.g. 64 − 42 + 12 = 34); (ii) else DELTA ← writeCont − readCount (e.g. 20 − 0 = 20).
Figure 13 shows another example simulation of the VC12Drop module, highlighting the frequency justification operation. Mark 1 shows the reference clock divided by 32 (limitCounterClock is 31, meaning that the range of the counter is from 0 to 31). Mark 2 shows the reference clock being divided by 33 (limitCounterClock is 32). This new value of limitCounterClock implies an increase of the clock period, meaning that fewer bits will be consumed by the E1 output channel.
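The two formulae for DELTA translate directly into code. The sketch below is a behavioral model with hypothetical names; it reproduces the two numeric examples of the text.

```python
FIFO_SIZE = 64  # bits, circular FIFO of the VC12Drop module


def delta(write_cont, read_count, fifo_size=FIFO_SIZE):
    """Distance from the read pointer to the write pointer in a circular FIFO."""
    if read_count > write_cont:
        # The write pointer has wrapped around the end of the buffer.
        return fifo_size - read_count + write_cont
    return write_cont - read_count


print(delta(write_cont=12, read_count=42))
print(delta(write_cont=20, read_count=0))
```

For pointers in the range 0 to 63, both branches are equivalent to computing the difference modulo the FIFO size, which in hardware is obtained naturally from a 6-bit subtraction.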
5.5
VC12ADD MODULE
The function of the VC12Add module is to insert data from an external E1 channel into the Telecom Bus. Data are received at the CKE1IN rate, an operating frequency that may vary. According to this variation, the VC12Add module executes the frequency justification through the S1 and S2 justification opportunity bits, and the C1 and C2 justification control bits. The frequency justification is based on a hysteresis mechanism, analogous to the one used by the VC12Drop module. For data insertion, a 128-bit FIFO is needed, with minimum and maximum hysteresis limits of 32 and 96, respectively. The FIFO of the VC12Add module is bigger than the corresponding FIFO of the VC12Drop module. This is a result of the analysis of worst-case synchronization conditions between PDH and SDH. These results showed that during the Add operation the number of bits inserted may vary more widely than during the Drop operation.
When the PDH and SDH clocks operate at their respective nominal frequencies, only one of S1 or S2 is used as a valid data bit. When the CKE1IN frequency is higher than the nominal value, the amount of data written into the FIFO is increased and the DELTA signal reaches the maximum hysteresis value. This increases the rate at which E1 input data are read and the amount of data placed on the Telecom Bus, avoiding data loss. These extra Telecom Bus data must be inserted into the S1 and S2 bits, and the justification control bits must be set to 1. When the CKE1IN frequency is below the nominal value, the amount of data written to the FIFO is less than the amount of data read. As a result, the DELTA signal reaches the minimum hysteresis value. To avoid reading incorrect data, it is necessary to decrease the Telecom Bus reading speed. Thus, the S1 and S2 bits are not filled with valid data and the justification control bits are set to 0. Figure 14 illustrates the DELTA signal behavior for the VC12Add module, considering nominal (2.048 MHz), high (2.050 MHz) and low (2.046 MHz) E1 input clock values. With a low CKE1IN frequency, more data are processed on the Telecom Bus than are generated at the E1 input. To avoid reading incorrect data, the VC12Add module reduces the number of valid data bits in each super-frame to 1023. Exactly the opposite happens when the E1 input clock (CKE1IN) has a frequency higher than the nominal value. To avoid data loss, the VC12Add module increases the number of valid data bits in each super-frame to 1025.
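The justification decision of the Add direction can be sketched as a function of the FIFO occupancy. This is a simplified model under stated assumptions: it uses plain thresholds at the 32 and 96 limits and ignores the state kept by the hysteresis mechanism. The name `add_justification` is hypothetical.

```python
ADD_FIFO_SIZE = 128            # bits, FIFO of the VC12Add module
ADD_MIN, ADD_MAX = 32, 96      # hysteresis limits quoted in the text


def add_justification(delta):
    """Return (valid_s_bits, data_bits_per_sf) for the current DELTA.

    valid_s_bits is how many of the S1/S2 opportunity bits carry data in
    the next super-frame: 2 when the FIFO is filling up, 0 when it is
    draining, 1 otherwise (nominal operation).
    """
    if not 0 <= delta <= ADD_FIFO_SIZE:
        raise ValueError("DELTA outside the FIFO range")
    if delta >= ADD_MAX:
        valid_s_bits = 2       # CKE1IN above nominal: use both S1 and S2
    elif delta <= ADD_MIN:
        valid_s_bits = 0       # CKE1IN below nominal: both are stuffing
    else:
        valid_s_bits = 1       # nominal: only one of S1/S2 carries data
    return valid_s_bits, 1023 + valid_s_bits


for d in (20, 64, 100):
    print(d, add_justification(d))
```

The three outcomes correspond to the 1023, 1024 and 1025 data bits per super-frame associated in the text with the 2.046, 2.048 and 2.050 MHz input clocks.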
6.
VALIDATION AND PROTOTYPING
One major difficulty with the EMS functional validation step is the number of simulation cycles needed to verify each design aspect, and the huge amount of data produced and consumed during a simulation expected to provide even a moderate coverage of the design characteristics. To alleviate the problem, external software was written. A parameterizable test pattern generator creates test multiframes according to the parameters and the circuit under test. A test pattern analyzer compares simulation results (output files) against the input files and input parameters.
The validation process was conducted in three scenarios of increasing complexity. The first scenario is a loop verification, which considers the Telecom Bus and AddDrop circuits separately. The Telecom Bus circuit corresponds to the left side of Figure 5, comprising the buffer, column address and multiplexer. The second scenario, BEMS verification, evaluates the mapping/demapping of one E1 channel into/from an SDH frame. The last scenario, global verification, evaluates the effects of mapping/demapping multiple E1 channels into/from an SDH frame.
In the loop verification, depicted in Figure 15, the goal is to test the VC12 FIFO operations and the SDH bypass path. The testbench code in this scenario also has a set of VHDL assert conditions to detect exceptions and critical operations (e.g. DELTA errors).
Next, the BEMS verification is performed, as depicted in Figure 16. From the E1 input channel to the SDH frame, generated data is packed into SDH as detailed in Figure 2. The pattern analyzer program extracts E1 information from the SDH output file and compares it to the input file. At the same time, the opposite flow (from the SDH frame to the E1 output channel) is evaluated: the generated data is unpacked from the SDH frame to E1. The pattern analyzer then compares the unpacked data against the SDH input file. The results allow capturing data losses and timing.
The global verification is a generalization of the BEMS verification, as depicted in Figure 17. It evaluates the FIFO behavior due to the differences between the SDH production/consumption and the consumption/production of all E1 channels. Underflow and overflow FIFO conditions are also evaluated. In this case, pattern analyzers are able to evaluate each channel separately and the joint effect of all channels.
BEMS has been described in 2150 lines of RTL VHDL. The description is portable, except for the FIFO circuits, implemented using Xilinx FPGA Block SelectRAM primitives. Once the design was validated at the functional level, the EMS was prototyped and validated in hardware. The VCC VW-300 prototyping platform was employed. This board contains a 300,000-gate Virtex FPGA. The BEMS design occupies 314 of 3,072 slices, i.e. 10% of the FPGA device. The system is operational, fulfilling all design constraints of the original specification. Since the BEMS is relatively small and has small latency (9 clock cycles), several instances of it can be cascaded in a single system.
7.
CONCLUSIONS AND FUTURE WORK
The main contributions of this work are: (i) the development of the EMS soft-core, which performs mapping and demapping of E1 channels into/from SDH; (ii) the development of a buffering technique for enhanced frequency justification control; (iii) a validation technique that reduces design time.
Validating this otherwise small circuit proved to be a demanding task, which required the development of specific software tools. These tools made it possible to check the correctness of the generated outputs with a good degree of accuracy and coverage. This has been confirmed by running the circuit in a real-world environment.
Our approach takes 9 cycles of frame propagation latency for any number of VC-12 implemented in STM-1, since all VC-12 circuits operate concurrently. Besides the low propagation latency, the EMS core has a small size, enabling the use of low-cost programmable devices. The main difficulties faced during this work were: (i) understanding the ITU-T rules for SDH systems; and (ii) the amount of data that had to be generated and analyzed to guarantee a minimum degree of coverage during the validation of the core.
Several directions for future work are currently being considered. One of these is the encapsulation of other PDH hierarchy carriers in SDH, such as E2 and E3. Another is the prototyping of multiple BEMS modules in a single FPGA to confirm the area overhead needed for communication and control in practical circuits.
GLOSSARY OF ACRONYMS
