ABSTRACT
INTRODUCTION
A multi-bit bus with a global common clock has been widely used as the communication architecture for most VLSI designs. However, in a deep-submicron, high-performance SoC, long multi-bit bus lines suffer from several problems such as skew, crosstalk, wiring difficulty, and large area (Figure 1). Skew or jitter on multi-bit bus lines makes it difficult to increase the frequency of a global clock because all the data bits on the bus must be synchronized with the common clock. Moreover, crosstalk between adjacent bus lines causes data-dependent signal delay and noise, which ultimately makes the communication channel unreliable. The area cost of a wide bus in a multimedia SoC is also serious. Therefore, multi-bit bus communication with a global common clock will reach its limit and make further performance enhancement expensive. Source-synchronous serial communication is one of the key technologies for overcoming these problems. Serial communication occupies less area because it uses fewer communication lines [1]. Furthermore, the crosstalk problem can be mitigated by spacing the serial lines widely. Source-synchronous serial communication uses a sideband strobe signal alongside the serial data line, as shown in Figure 2. The strobe signal is a kind of clock signal, but it is active only when the serial data line is valid. It has the same load and wire delay as the data line, so the skew between clock and data is minimized. This technique is used in high-performance memory I/O [2] and, more recently, in on-chip interconnection networks [1, 3, 8].
As die sizes and the number of subsystems on a chip increase, the power consumed by the interconnection structures takes a significant portion of the overall power budget. There has been much research on low-power bus coding, but it mostly targets parallel buses [5] [6]. On the other hand, previous work on on-chip serial communication has not considered power efficiency [1, 3]. In this paper, a low-energy transmission coding method is proposed for on-chip serial communication. This technique reduces the number of transitions on the serial data line to cut down the energy consumption of the transmitter and wires.
The remainder of this paper is structured as follows. Section 2 describes the power consumption on the serial wire. Section 3 presents the low-energy transmission coding algorithm for on-chip serial communication. Section 4 shows the implementation of its codec circuits and their performance analysis. Section 5 presents the energy reduction achieved by the proposed technique in multimedia applications, together with chip implementation results. Section 6 concludes the paper.
POWER CONSUMPTION ON SERIAL WIRE
The most widely used form of signaling is the classical two-inverter configuration with rail-to-rail signal swing. The power consumption of an interconnect channel is given by:
$$P = \alpha \cdot f \cdot C \cdot V_{DD} \cdot V_{swing} \cdot N$$
where α is the switching activity factor of the channel, f is the signaling frequency, C is the physical wire capacitance switched during signal transitions, V_DD is the supply voltage, V_swing is the voltage swing across the wire, and N is the number of wires in the channel. In serial communication, the wire frequency is multiplied by the serialization ratio to support the same bandwidth as in parallel communication, but N is divided by the serialization ratio. Thus, the product of f and N is the same in serial and parallel communication channels. However, the switching activity factor of the serial wire, α, differs from that of the parallel wires, and the difference depends on the data patterns. Figure 3 compares the activity factors in parallel and serial communication. In this example, the 8bit parallel bus has 7 transitions. However, when the same data stream is serialized onto a single wire, the number of signal transitions on the wire increases to 31, as shown in Figure 3 (b). If there is correlation between adjacent data words, some bits of the parallel bus stay quiet without any transition. However, such correlation does not help in serial communication because the data bits are multiplexed onto a single wire. Therefore, the activity factor of the serial wire is statistically higher than that of the parallel bus. In common multimedia applications, the most significant bits tend to have high spatial and temporal correlation because of sign extension or the locality characteristics of multimedia streams [4]. In these applications, serial communication dissipates more energy than parallel communication. In the next section, we propose a new coding method to reduce the activity factor on the serial wire.
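The difference in activity between the two channel types can be illustrated with a short Python sketch. It is not taken from the paper: the function names, the MSB-first bit order, and the sample words are assumptions chosen for illustration, and the sample stream is not the one shown in Figure 3.

```python
def parallel_transitions(words):
    """Count bit toggles on a parallel bus: each wire carries one bit position,
    so the toggles between two successive words equal their Hamming distance."""
    return sum(bin(a ^ b).count("1") for a, b in zip(words, words[1:]))


def serialize(words, n_bits=8):
    """Flatten words into one bit stream, MSB first (assumed bit order)."""
    return [(w >> i) & 1 for w in words for i in range(n_bits - 1, -1, -1)]


def serial_transitions(words, n_bits=8):
    """Count toggles on a single serial wire carrying the flattened stream."""
    bits = serialize(words, n_bits)
    return sum(a != b for a, b in zip(bits, bits[1:]))


if __name__ == "__main__":
    # Hypothetical correlated stream: only the two LSBs change between words.
    words = [0b01010101, 0b01010110, 0b01010101, 0b01010110]
    print("parallel:", parallel_transitions(words))
    print("serial:  ", serial_transitions(words))
```

For this sample stream the parallel bus toggles 6 times while the serial wire toggles 28 times, because the upper bits that stay constant on the parallel bus still alternate when multiplexed onto one wire.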
LOW ENERGY TRANSMISSION CODING
Many parallel bus coding methods have been proposed to reduce the switching power on the address or data bus between a processor and memories [5] [6] . However, such conventional parallel bus coding methods cannot be employed in the serial bus. Therefore, we propose a serialized low-energy transmission (SILENT) coding technique to minimize the transmission energy on the serial wire. We first introduce the terminology and notation that will be used throughout this paper.
b(t)[n-1:0]: n-bit data word from a sender at time t
B(t)[n-1:0]: n-bit encoded data word at time t
The encoder works as follows:
$$B^{(t)}[i] = b^{(t)}[i] \oplus b^{(t-1)}[i], \quad i = 0, \ldots, n-1$$
The encoded words, B(t), are equivalent to the displacement or the difference between successive data words.
When the encoded data words are serialized, zeros appear more frequently on the wire because of the correlation between successive data words, b(t). Figure 4 shows an example of the advantage of this coding method. All bits from B[7] to B[3] become zeros after these data words are encoded because those bits do not change with time. Serializing these encoded words reduces the number of transitions on the serial wire, as shown in Figure 4 (d), and the wire looks silent. In this example, a conventional serial wire without the SILENT coding, shown in Figure 4 (c), has three times as many transitions from t+1 to t+4. By reducing the number of transitions on the serial wire, the transmission energy can be saved proportionally.
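The coding idea can be sketched in a few lines of Python. This is a minimal illustration rather than the circuit described in the paper: it assumes the encoded word is the bitwise XOR of successive data words, that the receiver recovers each word by XORing the received word with its previously decoded word, that both ends start from an all-zero reference word, and that bits are sent MSB first. The function names and sample words are hypothetical and do not reproduce Figure 4.

```python
def silent_encode(words):
    """Encode each word as the XOR difference from the previous data word."""
    prev = 0  # assumed initial reference word shared by sender and receiver
    encoded = []
    for w in words:
        encoded.append(w ^ prev)
        prev = w
    return encoded


def silent_decode(encoded):
    """Recover each word by XORing with the previously decoded word."""
    prev = 0  # must match the sender's assumed initial reference
    decoded = []
    for e in encoded:
        prev = e ^ prev
        decoded.append(prev)
    return decoded


def serial_transitions(words, n_bits=8):
    """Count toggles on one serial wire, sending each word MSB first."""
    bits = [(w >> i) & 1 for w in words for i in range(n_bits - 1, -1, -1)]
    return sum(a != b for a, b in zip(bits, bits[1:]))


if __name__ == "__main__":
    # Hypothetical stream whose five upper bits stay constant over time.
    words = [0b10110001, 0b10110010, 0b10110100, 0b10110111, 0b10110000]
    coded = silent_encode(words)
    assert silent_decode(coded) == words
    print("uncoded transitions:", serial_transitions(words))
    print("coded transitions:  ", serial_transitions(coded))
```

With these sample words the uncoded stream produces 23 transitions on the serial wire and the coded stream produces 12, since the constant upper bits turn into runs of zeros after encoding.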
After deserialization at the receiver end, the decoder works as follows:
$$b^{(t)}[i] = B^{(t)}[i] \oplus b^{(t-1)}[i], \quad i = 0, \ldots, n-1$$
The original data word from a sender unit, b(t), can be recovered by XORing the encoded word, B(t), and the previously decoded word, b(t-1). Figure 5 shows the circuit implementation of the SILENT codec; the bold line indicates the critical path in the circuits. The implementation of the SILENT encoder and decoder is so lightweight that the area, power, and latency overhead due to the codec is negligible. The power consumption for 32bit data [...]
PERFORMANCE ANALYSIS
In order to analyze the energy efficiency of this coding scheme, we evaluate the energy consumption with various data patterns over the whole communication channel, which includes encoders, transmitters, 8mm serial wires with repeaters, receivers, and decoders. The energy consumption of the communication depends on the data patterns to be sent, so we evaluate the power consumption with all possible variations from a random data word. Figure 6 compares the average power consumption of the serial communication with and without SILENT coding at a 100MHz operating frequency. The x-axis is the number of bits that differ between successive 32bit data words, b(t). A value of 0 on the x-axis means that b(t) is the same as b(t-1), and 16 means that an arbitrary 16 of the 32 bits of b(t) have changed from their previous values in b(t-1). As a result, when the number of changed bits between successive data words is less than 11, the encoded data words contain many zeros, and thus the encoded serial wire has fewer transitions. Likewise, when the number of changed bits is more than 22, the encoded data words contain many 1s, so the encoded serial wire also has fewer transitions. Therefore, the region below 11 or above 22 on the x-axis is the energy saving region of the SILENT coding. However, in the region from 11 to 22, which corresponds to random data transitions, there is some power overhead of at most 14%. As shown here, the energy saving range is twice as wide as the overhead range, and the power saving is much larger than the overhead. Therefore, the SILENT coding has ample opportunity to save energy for most data patterns.
In the next section we analyze the performance of the coding method in a real application.
ON-CHIP NETWORK FOR A MULTIMEDIA APPLICATION
To evaluate the performance of the proposed SILENT coding in a real application, we trace the transactions of the on-chip traffic between a RISC processor and system memories while a 3D graphics application is running [7]. Full 3D graphics pipelines of geometry and rendering operations are executed for 3D scenes with 5878 triangles. Figure 7 shows the distribution of the displacement of the memory address and data for successive memory accesses. The instruction memory address is so sequential that 99.5% of the 6 million transactions are within the energy saving region. Although the instruction codes are quite random, 60% of them are within the energy saving region. In the case of the data memory access, 79% and 70% of the 1.5 million data memory address and data transactions, respectively, are within the energy saving region.
With this memory access pattern, we evaluated the energy consumption for the serial communications in the environment. Figure 8 shows the resulting normalized average energy consumption on the serial wire with and without SILENT coding. The energy consumption with SILENT coding includes the energy dissipation in the codec circuits. The SILENT coding performs best for the instruction address, with about 77% energy saving. Even for random traffic, as in the case of the instruction codes, a 13% energy saving is achieved. It also saves 40~50% of the transmission energy for multimedia data traffic. In conclusion, the [...] where the communication architecture is a packet-switched on-chip interconnection network [8]. The on-chip network (OCN) serializes a fixed-size packet of 80 bits onto 8bit serial wires. Figure 9 shows the overall architecture of the SoC. We integrated two clusters as a prototype: a main cluster and a peripheral cluster. The main cluster contains a RISC processor, an application processor, an FPGA, two 64kb memory arrays, and an off-chip gateway for off-chip connection. We assumed that the peripheral cluster is far from the main cluster to emulate a large SoC; thus, the two clusters are connected to each other via 5mm small-swing differential serial wires. A PLL generates a 100MHz clock for the main cluster and a 1.6GHz network clock for the on-chip networks. The on-chip network supports 3.2GByte/s bandwidth for each processing unit.
The serializer, d[...] integrated in each network interface (NI). The area of the on-chip network is reduced significantly by the serialization technique [1]. In this implementation, the enable signal of the SILENT encoder is controlled by software, so the coding scheme can be turned on and off at runtime to optimize energy for each application. The receiver unit must then know whether the currently received packet is encoded. Therefore, this information is transferred to the receiver by embedding it in the header of each packet. The long error propagation caused by the differential encoding can be bounded by having software disable the coding periodically.
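The interplay between the software enable, the per-packet header information, and the periodic disabling can be sketched as follows. This is an illustrative model under stated assumptions, not the chip's packet format: the `Packet` class, the single `encoded` flag, the `refresh_period` parameter, and the all-zero initial reference are hypothetical, and the sender here counts packets itself instead of being driven by application software.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    encoded: bool  # hypothetical header flag: payload carries an XOR difference
    payload: int


class Sender:
    def __init__(self, n_bits=32, refresh_period=4):
        self.mask = (1 << n_bits) - 1
        self.prev = 0  # assumed initial reference word
        self.sent = 0
        self.refresh_period = refresh_period  # assumed: every Nth packet is uncoded

    def send(self, word, enable=True):
        word &= self.mask
        # Coding is off when disabled by software or on a periodic refresh packet.
        coded = enable and (self.sent % self.refresh_period != 0)
        payload = word ^ self.prev if coded else word
        self.prev = word
        self.sent += 1
        return Packet(coded, payload)


class Receiver:
    def __init__(self):
        self.prev = 0  # must match the sender's assumed initial reference

    def receive(self, packet):
        # The header flag tells the receiver whether to undo the XOR difference.
        word = packet.payload ^ self.prev if packet.encoded else packet.payload
        self.prev = word
        return word


if __name__ == "__main__":
    words = [0x1000, 0x1004, 0x1008, 0x100C, 0x1010, 0x1014, 0x1018, 0x101C]
    tx, rx = Sender(), Receiver()
    packets = [tx.send(w) for w in words]
    packets[1].payload ^= 1  # inject a single-bit error on the wire
    decoded = [rx.receive(p) for p in packets]
    print([d != w for d, w in zip(decoded, words)])
```

In this run the injected error corrupts words 1 to 3 and stops at word 4, the next uncoded packet, which shows how periodic disabling bounds the error propagation of the differential code.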
A die micrograph and a performance summary are shown in Figure 10. By using the SILENT coding method, about 13% energy saving was obtained on the overall on-chip network architecture.
CONCLUSION
We proposed a low-energy coding method applicable to a multimedia SoC incorporating on-chip serial communications. This coding technique reduces the number of transitions on serial wires by using the data correlation between successive data words. We showed that the coding method saves a significant amount of communication energy for 3D graphics applications, reducing energy by up to 77% for instruction memory access and by 40~50% for data memory access. We implemented a prototype SoC with various processing units [...] communication architecture using a 0.18µm CMOS process. By applying the coding method, about 13% power reduction was obtained on the overall on-chip networks. For further energy optimization in each application, the coding can be turned on and off by software.
