Abstract: Clock skew and clock distribution are increasingly becoming major design concerns in synchronous pipelined systems. We present a novel high-speed hybrid wave-pipelined linear feedback shift register that manages clock skew by permitting the clock to travel with its associated data through the pipeline. The wave-pipelined clock has a skew 8.34 times lower than that of a buffered clock and is 1.2 times faster.
I. INTRODUCTION
Technology scaling enables high clock frequencies and the integration of many devices on a single chip. Clocks, however, are usually subjected to larger loads, resulting in increased clock skew and longer delays in distributing these clocks and other global signals. In this paper, we discuss a high-performance hybrid wave-pipelined linear feedback shift register with a skew-tolerant wave-pipelined clock. The hybrid wave-pipelining scheme takes interconnect delays and data path delays into consideration to improve system performance and minimize clock skew [1]. In this scheme, the system clock, in conjunction with the pipeline stage delays, is used to generate wave-pipelined clocks that have short cycle times and accompany the data through the pipeline. Short clock cycle times are achieved by reducing the delay difference between the critical path and the shortest path of a system [2], [3]. In hybrid wave pipelining, the intermediate latches allow for delay balancing, so the system's delay variations are minimized per stage. The stage with the largest delay variation sets the system's clock cycle time [4].
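As a rough illustration of how the worst stage sets the cycle time, the sketch below applies a simplified form of the standard wave-pipelining constraint, $T_{clk} \ge (D_{max} - D_{min}) + T_{setup} + T_{hold} + 2\Delta_{clk}$, to each stage separately and takes the maximum. The exact form of the bound, the names `Stage` and `hybrid_cycle_time`, and all delay values are assumptions for illustration, not data from this design.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    """Delay summary for one hybrid wave-pipelined stage (hypothetical values, ps)."""
    d_max: float  # longest (critical) path delay through the stage
    d_min: float  # shortest path delay through the stage


def stage_bound(stage, t_setup, t_hold, clock_uncertainty):
    # Simplified wave-pipelining constraint (assumed, not taken from this paper):
    # the period must cover the stage's delay variation plus latch overheads.
    return (stage.d_max - stage.d_min) + t_setup + t_hold + 2 * clock_uncertainty


def hybrid_cycle_time(stages, t_setup, t_hold, clock_uncertainty):
    # Intermediate latches isolate the stages, so the worst stage alone sets the period.
    return max(stage_bound(s, t_setup, t_hold, clock_uncertainty) for s in stages)


if __name__ == "__main__":
    stages = [Stage(120.0, 90.0), Stage(150.0, 100.0), Stage(110.0, 95.0)]
    print(hybrid_cycle_time(stages, t_setup=10.0, t_hold=5.0, clock_uncertainty=2.5))
```

With these made-up numbers the second stage, whose delay spread is 50 ps, sets the period at 70 ps; narrowing that one stage's spread lowers the bound for the whole pipeline.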
Linear feedback shift registers (LFSRs) are generally constructed with D-type flip-flops in the forward path and linear XOR or XNOR logic in the feedback path. A maximal-length LFSR sequences through $2^n - 1$ states, where $n$ is the number of registers in the LFSR. Once the LFSR is loaded, the contents of the registers shift one place to the right at each clock edge. A 3-bit maximal-length LFSR therefore goes through seven clock cycles before the initially loaded pattern repeats, since there are $2^3 - 1 = 7$ states. The $Q_1 = Q_2 = Q_3 = 0$ state is illegal because it constitutes a lock-up state when XOR gates are used in the feedback path. To further reduce the delays associated with the feedback path, clock skew must be tightly controlled. For an LFSR with 16 registers and taps at nodes 4, 13, 15, and 16, it becomes imperative that the outputs at node 4 and node 16 be available at the same time when the clock makes a transition. We present a hybrid wave-pipelined LFSR that manages clock skew and reduces the delays associated with the feedback path.
0-7803-8294-3/04/$20.00 ©2004 IEEE
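The state counts quoted above can be checked with a small behavioral model. The sketch below is a Fibonacci-style XOR LFSR in Python: the tapped outputs are XORed and fed back into Q1 while the other bits shift one place to the right. The function name `lfsr_period` and the 3-bit tap choice (Q3 and Q2) are assumptions for illustration; the 16-bit tap set is the one named in the text.

```python
def lfsr_period(n, taps, seed):
    """Return the number of clock cycles before an n-bit XOR LFSR revisits `seed`.

    state[i] models flip-flop Q(i+1). On each clock edge the tapped outputs are
    XORed into Q1 and every other bit moves one place to the right (Qi -> Qi+1).
    Returns None if the seed is never revisited within 2**n cycles.
    """
    if len(seed) != n:
        raise ValueError("seed must have exactly n bits")
    start = list(seed)
    state = list(seed)
    for cycles in range(1, 2 ** n + 1):
        feedback = 0
        for t in taps:  # taps are 1-indexed node numbers, e.g. (16, 15, 13, 4)
            feedback ^= state[t - 1]
        state = [feedback] + state[:-1]
        if state == start:
            return cycles
    return None  # seed is not on a cycle (possible if Qn is not tapped)


if __name__ == "__main__":
    print(lfsr_period(3, (3, 2), (1, 0, 0)))  # 7 = 2**3 - 1 states
    print(lfsr_period(3, (3, 2), (0, 0, 0)))  # 1: the all-zero lock-up state
    print(lfsr_period(16, (16, 15, 13, 4), (1,) + (0,) * 15))  # 65535 = 2**16 - 1
```

Because every tapped output must be valid before the feedback XOR evaluates, the taps that lie farthest apart in the register (Q4 and Q16 here) are the ones whose relative arrival times matter most.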
III. HYBRID WAVE-PIPELINED LFSR
The high-performance hybrid wave-pipelined linear feedback shift register has been designed to be flexible. The user can choose the length of the sequence, select taps, disable the clock, and perform initialization within a single clock cycle. The design uses a six-transistor D-type flip-flop as the basic cell in the forward path. The basic cell is clocked on both levels of the clock to reduce idle time within the logic: new data is admitted into the flip-flop before the current data is latched at the output latch. This approach allows at least two unrelated data waves to exist within the flip-flop at once (a characteristic that distinguishes wave pipelining from conventional pipelining).
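The user-facing features listed above (selectable length and taps, clock disable, and single-cycle initialization) can be captured in a cycle-level behavioral model. The class below is a hypothetical Python sketch; the names `ConfigurableLFSR`, `load`, `clock`, and `enabled` are illustrative assumptions, and the model says nothing about the six-transistor cell or its two-level clocking.

```python
class ConfigurableLFSR:
    """Hypothetical cycle-level model of a user-configurable XOR LFSR.

    state[i] models flip-flop Q(i+1); the tapped outputs are XORed into Q1
    and the remaining bits shift one place to the right on each clock edge.
    """

    def __init__(self, length, taps):
        if not taps or not all(1 <= t <= length for t in taps):
            raise ValueError("taps must be node numbers between 1 and length")
        self.length = length
        self.taps = tuple(taps)
        self.state = [0] * length
        self.enabled = True  # clock enable; False models a disabled clock

    def load(self, pattern):
        """Initialize every register in a single cycle (parallel load)."""
        if len(pattern) != self.length:
            raise ValueError("pattern length must equal the register length")
        self.state = [int(b) & 1 for b in pattern]

    def clock(self):
        """Apply one clock edge and return the serial output Q(length)."""
        if self.enabled:
            feedback = 0
            for t in self.taps:
                feedback ^= self.state[t - 1]
            self.state = [feedback] + self.state[:-1]
        return self.state[-1]


if __name__ == "__main__":
    lfsr = ConfigurableLFSR(3, (3, 2))  # tap choice is an assumption
    lfsr.load([1, 0, 0])
    print([lfsr.clock() for _ in range(7)])  # [0, 1, 0, 1, 1, 1, 0]
    lfsr.enabled = False
    print(lfsr.clock(), lfsr.state)  # 0 [1, 0, 0]: state held while disabled
```

After seven enabled clocks the register returns to its loaded pattern, and with the clock disabled further edges leave the state untouched.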
The hybrid wave-pipelined LFSR is designed for high-speed applications such as built-in self-test, encryption/decryption, and direct-sequence spread spectrum. The LFSR design accommodates environmental parameter variations, and the reported maximum clock frequencies can be maintained over the range of temperatures supported by the technology. It is recognized that the basic register cell appearing in Figure 2 [...]. The wave-pipelined clock experiences the same delays as the data, so the design's operating frequency is limited by the data path delays, the register setup and hold times, and wire delays. The clock generation circuitry is designed to mimic the data path, setting the clock's amplitude and pulse width. Using logic to determine the amplitude of the clock allows the design to detect logic levels, possibly reducing current sourcing or sinking. Dynamic nodes can thus experience brief switching activity. The wave-pipelined clock is designed based on this data path to accommodate clock skew. In Figure 2, the data path is nothing more than two pass transistors and two inverters, all in series, and is therefore easy to mimic. Figure 3 shows the clock generation circuitry. The devices in this circuit are designed so that they determine the pulse width and amplitude of the clock, as described above. The clock frequency is not set from knowledge of the system's worst-case operation; instead, it is determined by the logic of the system's data path. Allowing the clock to accompany the data through the stages of a pipeline reduces clock skew, since the clock now experiences the same delays as the data.
The delayed system clock, labeled A in Figure 3, provides a path to ground through transistor N2 whenever it is high and places the wp_clk node at logic 1 through transistor P1 whenever it is low. The system clock (ref_clk) is input to transistor N1, preventing the wp_clk signal from being the exact inverse of the delayed clock and serving to charge up node B just before evaluation occurs. It must be pointed out that the wave-pipelined clock (wp_clk) node has a brief window during which it floats. This occurs when ref_clk goes to logic 0 while the signal driving pass transistor N2 is at logic 1. This is a potential problem, particularly for proper operation of the physical chip. The circuit therefore requires some minor modifications, as critical designs would not tolerate floating clocks. The wave-pipelined signal generated by the circuit above appears in Figure 4, together with the inverse of the wave-pipelined clock. The wave-pipelined clock has 2.5 V as its highest value, denoting logic 1, and its pulse width is much shorter than that of the reference clock (ref_clk). Permitting the logic to determine the signal pulse widths and amplitudes results in a clock pulse width reduction of 55 percent, which implies that the clock cycle time can be improved considerably. Having the logic determine the pulse widths and amplitudes of the clock also permits the feedback logic to receive its inputs early, reducing the delays associated with the feedback path to just the sum of the XOR gate delays in the path. This is a direct result of the hybrid wave-pipelining approach, in which the delay differences are reduced per stage and the intermediate latches are used to balance the delay paths. Wave-pipelining the clock allows the system to maintain high clock speeds even with added logic in the feedback path. The number of logic stages in the feedback path increases when the number of taps within the shift register is increased.
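The floating window described above can be made explicit with a switch-level truth table. The sketch below assumes, based only on the prose description, that P1 (gated by the delayed clock A) pulls wp_clk high when A is low, and that the pull-down path through N2 (gated by A) is in series with N1 (gated by ref_clk), so it conducts only when both are high. This topology, the function names, and the sampled waveforms are assumptions; Figure 3 remains the authority on the actual circuit.

```python
def wp_clk_level(a, ref_clk):
    """Return '1', '0', or 'Z' (floating) for wp_clk under the assumed topology.

    a: delayed system clock A (gates P1 and N2), 0 or 1
    ref_clk: reference system clock (gates N1), 0 or 1
    """
    if a == 0:
        return "1"  # P1 conducts and pulls wp_clk high
    if ref_clk == 1:
        return "0"  # N2 and N1 both conduct (assumed series pull-down)
    return "Z"  # A high but ref_clk low: neither network drives the node


def floating_samples(a_wave, ref_wave):
    """Indices of time samples at which wp_clk is left floating."""
    return [i for i, (a, r) in enumerate(zip(a_wave, ref_wave))
            if wp_clk_level(a, r) == "Z"]


if __name__ == "__main__":
    for a in (0, 1):
        for ref in (0, 1):
            print(f"A={a} ref_clk={ref} -> wp_clk={wp_clk_level(a, ref)}")
    # Hypothetical sampled waveforms: A is ref_clk delayed by one sample.
    ref = [1, 1, 1, 1, 0, 0, 0, 0]
    a = [0, 1, 1, 1, 1, 0, 0, 0]
    print(floating_samples(a, ref))  # [4]
```

Under these assumptions the node floats from the falling edge of ref_clk until A follows it low, so the length of the window tracks the delay between ref_clk and A.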
The Q4 and Q16 outputs were considered because they are the two tap points farthest apart and have larger parasitic capacitance than the outputs without taps. The flip-flop outputs (Q4 and Q16) for the system with a wave-pipelined clock (Figure 6) need to be magnified to make the skew visible, since &061ps is such a small duration. These measurements represent the worst-case skew for both cases. The hybrid wave-pipelined clock operates at a frequency 1.2 times higher than that of a buffered clock. It has also been shown in this paper that the clock cycle time can be reduced by 20 percent when the hybrid wave-pipelining scheme is employed. This paper considers a configuration of the LFSR's feedback logic that enables the reduction of delays associated with the feedback path: some logic gates in the feedback path are placed in parallel instead of in the typical series configuration. This contributes to the design's ability to maintain high-speed operation with only minor performance degradation even when more logic gates are added to the feedback path. The recorded results associated with the feedback delays would be more meaningful when compared with those of an implementation without logic in the feedback path. Further studies are underway to evaluate the scheme's potential for reducing the power associated with the clock networks.
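The benefit of placing feedback gates in parallel rather than in series can be estimated by counting two-input XOR levels. The sketch below compares a series chain with a balanced tree; it uses a single hypothetical per-gate delay and ignores wiring, so it illustrates the trend only and does not reproduce the paper's measurements. The function names and the 30 ps figure are assumptions.

```python
from functools import reduce
from operator import xor


def chain_levels(num_taps):
    """Gate levels when 2-input XORs are connected in series (a chain)."""
    return max(num_taps - 1, 0)


def tree_levels(num_taps):
    """Gate levels when 2-input XORs are arranged in parallel as a balanced tree.

    Equals ceil(log2(num_taps)) for num_taps >= 1.
    """
    return (num_taps - 1).bit_length() if num_taps > 0 else 0


def tree_xor(bits):
    """Evaluate a balanced XOR tree level by level."""
    if not bits:
        raise ValueError("need at least one tap")
    level = list(bits)
    while len(level) > 1:
        nxt = [level[i] ^ level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # an odd leftover input passes to the next level
        level = nxt
    return level[0]


if __name__ == "__main__":
    XOR_DELAY_PS = 30.0  # hypothetical 2-input XOR delay, not a measured value
    for k in (4, 8, 16):
        print(f"{k} taps: series {chain_levels(k) * XOR_DELAY_PS} ps, "
              f"parallel {tree_levels(k) * XOR_DELAY_PS} ps")
    sample = [1, 0, 1, 1]  # hypothetical values of Q4, Q13, Q15, Q16
    print(tree_xor(sample) == reduce(xor, sample))  # True: same feedback bit
```

For the four-tap 16-bit register the saving is one gate level (three in series versus two in parallel); the gap widens as taps are added, which is consistent with the observation that performance degrades only slightly when more feedback logic is added.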
