Abstract: This paper presents extremely low-complexity Boolean logic for the generation of coefficients suitable for filtering or correlation of scalable complete complementary sets of sequences (SCCSS). As the unique auto- and cross-correlation properties of SCCSS are of broad interest, the simplicity of the proposed coefficient generation technique allows arbitrarily long SCCSS to be used in resource-constrained applications.
I. INTRODUCTION
Spreading sequences are fundamental to numerous signal processing applications such as spread-spectrum communication systems and image processing. Although the quality of a given spreading sequence is largely determined by its auto-correlation and cross-correlation functions, the computational effort needed to generate the sequence is also an important consideration. For example, even though an evolutionary algorithm can find sequences that asymptotically meet almost any requirement [1], resource constraints force many applications to use general-purpose sequences such as Walsh-Hadamard, Gold or Kasami sequences.
Spreading sequences are usually applied via a digital correlator or filter. To minimize gate count, it is often desirable to pool hardware for these kinds of expensive operations. For example, a single bank of hardware multipliers may realize different filters at different times to avoid unnecessarily duplicating logic.
With a goal of minimizing the cost needed to parameterize a shared filter or correlator, this paper shows how coefficients for powerful scalable complete complementary sets of sequences (SCCSS) [2] can be generated using as few as one or two Boolean operations per filter tap. This is significant since low-complexity, low-power applications are often unable to store large codesets in read-only memory (ROM).
As well as having very attractive auto- and cross-correlation functions, as detailed below, SCCSS have three other distinguishing characteristics. First, scalability means that each set includes all sets of smaller size. This is of particular benefit to communication systems, as it can provide adaptable user-specific processing gains. Second, completeness makes SCCSS more efficient than randomly generated sequences. For example, an SCCSS-based spread-spectrum mobile telephony system could offer higher data rates and/or support more users. Third, the complementary property makes it possible to design novel synchronization algorithms that exploit the periodic zeros in the cross-correlation and auto-correlation functions for faster convergence. Fig. 1 and Fig. 2 show, respectively, examples of the auto-correlation and cross-correlation functions of an SCCSS relative to sequences derived from Walsh-Hadamard matrices and pseudorandom number generators (PRNGs).
Both examples use codesets composed of 8 sequences of 64 chips each. Worst-case results are obtained by taking the highest correlation for a given offset n over all 8 codes. While SCCSS and Walsh-Hadamard sequences are perfectly orthogonal at n = 0, the PRNG sequences are only ever approximately orthogonal. Further, when n > 0, the Walsh-Hadamard sequences have significant sidelobes that are much higher than those of SCCSS. The complementary nature of SCCSS, further discussed in Section II, is also apparent in how both the auto-correlation and the cross-correlation vanish at every nonzero offset that is an integer multiple of 8.
This paper is organized as follows. In Section II, we begin by reviewing the construction and properties of SCCSS. Then, in Section III, we derive novel combinatorial logic that can generate coefficients appropriate for use in reconfigurable finite impulse response (FIR) filters. Using a VHDL example, we evaluate the complexity of the proposed scheme in Section IV. Finally, Section V concludes the paper and identifies future work.
II. THEORY
A Hadamard matrix of order N is a square {−1, +1} matrix of dimension N × N that satisfies

$$ H_N H_N^{T} = N I_N \tag{1} $$

where $I_N$ denotes the N × N identity matrix and $H_N^{T}$ denotes the transpose of the Hadamard matrix $H_N$. Further, a Golay-paired Hadamard matrix is defined by [2] as

$$ H_{2N} = \begin{bmatrix} H_N & H'_N \\ H_N & -H'_N \end{bmatrix} \tag{2} $$
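When $H_1 = 1$ and $H'_N = H_N$, (2) defines the well-known Walsh-Hadamard family of matrices. Alternatively, when $H'_N$ is constructed by swapping the upper and lower halves of $H_N$, we have the basis to build sets of mutually orthogonal Golay-paired Hadamard matrices. Golay-paired Hadamard matrices can also be generated through a closed-form expression [2]. Specifically, the value at row i and column j of the k-th matrix in a set of order $N = 2^n$ is

$$ h^{(k)}_N(i,j) = (-1)^{f^{(k)}(i,j)} \tag{3} $$

where

$$ f^{(k)}(i,j) = \bigoplus_{r=0}^{n-2} \left(j_{r+1} \oplus i_r \oplus k_r\right) j_r \;\oplus\; \left(i_{n-1} \oplus k_{n-1}\right) j_{n-1} \tag{4} $$

and $k_r$ denotes the r-th bit in the radix-2 expression of k (all radix-2 notations are written MSB to LSB), as per

$$ k = \sum_{r=0}^{n-1} k_r 2^r = (k_{n-1} \cdots k_1 k_0)_2 \tag{5} $$

and likewise for $j_r$ and $i_r$. For example, the k = 0 Golay-paired Hadamard matrix in the set of order N = 4 is

$$ H^{(0)}_4 = \begin{bmatrix} 1 & 1 & 1 & -1 \\ 1 & -1 & 1 & 1 \\ 1 & 1 & -1 & 1 \\ 1 & -1 & -1 & -1 \end{bmatrix} \tag{6} $$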
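The closed form can be exercised directly. The following is a minimal Python sketch, not code from [2], assuming the exponent takes the form used throughout the Appendix: the XOR over r < n − 1 of $(j_{r+1} \oplus i_r \oplus k_r) j_r$, plus $(i_{n-1} \oplus k_{n-1}) j_{n-1}$, with row index i and column index j. All function names are hypothetical. It builds the k-th matrix, checks the Hadamard condition $H_N H_N^T = N I_N$, and checks that doubling the order with the halves-swapped block construction reproduces the closed form.

```python
def bit(x: int, r: int) -> int:
    """Bit r of x, with r = 0 the LSB."""
    return (x >> r) & 1


def exponent(i: int, j: int, k: int, n: int) -> int:
    """Assumed binary exponent f^(k)(i, j) for a matrix of order N = 2**n."""
    f = 0
    for r in range(n - 1):
        f ^= (bit(j, r + 1) ^ bit(i, r) ^ bit(k, r)) & bit(j, r)
    f ^= (bit(i, n - 1) ^ bit(k, n - 1)) & bit(j, n - 1)
    return f


def golay_hadamard(k: int, n: int) -> list[list[int]]:
    """k-th Golay-paired Hadamard matrix of order N = 2**n, entries +1/-1."""
    N = 1 << n
    return [[-1 if exponent(i, j, k, n) else 1 for j in range(N)] for i in range(N)]


def is_hadamard(H: list[list[int]]) -> bool:
    """True if H H^T = N I_N."""
    N = len(H)
    return all(
        sum(H[a][col] * H[b][col] for col in range(N)) == (N if a == b else 0)
        for a in range(N)
        for b in range(N)
    )


def swap_halves(H: list[list[int]]) -> list[list[int]]:
    """H'_N: the upper and lower halves of the rows of H exchanged."""
    half = len(H) // 2
    return H[half:] + H[:half]


def double_order(H: list[list[int]]) -> list[list[int]]:
    """Block construction [[H, H'], [H, -H']]."""
    Hp = swap_halves(H)
    top = [row + row_p for row, row_p in zip(H, Hp)]
    bottom = [row + [-x for x in row_p] for row, row_p in zip(H, Hp)]
    return top + bottom


if __name__ == "__main__":
    for row in golay_hadamard(0, 2):  # the k = 0, N = 4 example
        print(row)
```

For every k < N, `double_order(golay_hadamard(k, n))` matches `golay_hadamard(k, n + 1)`, which ties the closed form back to the recursive definition; a larger k additionally flips the sign of the right-hand blocks through bit $k_n$.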
A complete set of N mutually orthogonal Golay-paired Hadamard matrices of dimension N × N forms an SCCSS of order $N = 2^n$, with each sequence in the SCCSS $2^{2n}$ chips in length. For example, when n = 3, we can construct $2^n = 8$ mutually orthogonal Golay-paired Hadamard matrices of dimension 8 × 8 via (3), with each matrix forming a single sequence in the SCCSS. If we denote the k-th sequence of an SCCSS of order $2^n$ as $c^{(k)}_{2^n}(t)$, then we can denote each chip in the sequence as

$$ c^{(k)}_{2^n}(jN + i) = h^{(k)}_N(i,j) \tag{7} $$

where 0 ≤ i < N and 0 ≤ j < N. The complementary property that distinguishes SCCSS means that each sequence in the set is orthogonal to all other sequences in the set, and to shifted copies of itself, at shifts that are integer multiples of the set size. For example, if we choose two sequences $c^{(x)}_{2^n}(t)$ and $c^{(y)}_{2^n}(t)$ such that x ≠ y, we can state that the normalized auto-correlation and cross-correlation are respectively

$$ R_{xx}(\tau) = \begin{cases} 1, & \tau = 0 \\ 0, & \tau = mN,\ m \in K \end{cases} \tag{8} $$

and

$$ R_{xy}(\tau) = 0, \qquad \tau = mN,\ m \in \{0\} \cup K \tag{9} $$

where $K = \{1, 2, \ldots, 2^n - 1\}$.
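The zero-correlation property can be verified numerically. This Python sketch is an illustration under stated assumptions rather than the authors' method: it flattens each matrix column by column (chip t = jN + i), uses the exponent form from the Appendix derivation (repeated here so the block runs on its own), and evaluates the normalized correlation $\frac{1}{N^2}\sum_t c^{(x)}(t)\,c^{(y)}(t+\tau)$ in both aperiodic and cyclic form, since the same zeros appear in each. Function names are hypothetical.

```python
def bit(x: int, r: int) -> int:
    return (x >> r) & 1


def exponent(i: int, j: int, k: int, n: int) -> int:
    """Assumed binary exponent of h^(k)_N(i, j), N = 2**n."""
    f = 0
    for r in range(n - 1):
        f ^= (bit(j, r + 1) ^ bit(i, r) ^ bit(k, r)) & bit(j, r)
    f ^= (bit(i, n - 1) ^ bit(k, n - 1)) & bit(j, n - 1)
    return f


def sccss_sequence(k: int, n: int) -> list[int]:
    """Sequence k as +1/-1 chips, with chip t = j*N + i (column by column)."""
    N = 1 << n
    return [-1 if exponent(t % N, t // N, k, n) else 1 for t in range(N * N)]


def correlation(x: list[int], y: list[int], tau: int, cyclic: bool = False) -> float:
    """Normalized correlation sum_t x[t] * y[t + tau] / len(x), for tau >= 0."""
    L = len(x)
    if cyclic:
        total = sum(x[t] * y[(t + tau) % L] for t in range(L))
    else:
        total = sum(x[t] * y[t + tau] for t in range(L - tau))
    return total / L


if __name__ == "__main__":
    n, N = 3, 8
    seqs = [sccss_sequence(k, n) for k in range(N)]
    # Auto-correlation of code 0 at offsets 0, N, 2N, ..., (N - 1)N
    print([correlation(seqs[0], seqs[0], m * N) for m in range(N)])
    # [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```

The guarantee covers only offsets that are multiples of N; other offsets can have nonzero sidelobes.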
III. DIGITAL SEQUENCE GENERATION
As the realization of a digital filter can greatly increase a design's total gate count, it is often desirable to have mutually exclusive processes share hardware. In other words, rather than duplicate the filter logic, it can be more efficient to use a single digital filter with variable filter coefficients [3]. Fig. 3a shows a portion of a generic FIR filter in which incoming data samples d(0) through d(N − 1) are respectively multiplied by the filter coefficients c(0) through c(N − 1), with the results added together to create the output y. Digital implementations of this filter structure are well understood [4], [5], and it is easy to see that N multiplications and N − 1 additions are required for N taps. If the filter coefficients c(t) are limited to {+1, −1}, as is the case when correlating binary spreading codes like SCCSS, multiplication by c(t) reduces to addition for c(t) = +1 and to subtraction for c(t) = −1.
Since efficient pipelined digital adder/subtractors are limited to two inputs, as shown in Fig. 3b, the large N-port adder of Fig. 3a must be converted into an adder tree with $\log_2 N$ levels, as shown in Fig. 3c. Modified filter coefficients $\hat{c}(t) \in \{0, 1\}$ determine how each adder/subtractor is used: an adder/subtractor implements R = A + B if $\hat{c}(t) = 0$ and R = A − B if $\hat{c}(t) = 1$. Note that there is no loss of generality in limiting each adder/subtractor to A ± B rather than ±A ± B [6].
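The adder tree of Fig. 3c can be modelled recursively to see how the binary coefficients $\hat{c}(t)$ steer each two-input stage. This Python sketch assumes, consistently with the Appendix but without a transcription of the figure, that the stage merging the half-blocks starting at `lo` and `lo + half` is driven by $\hat{c}(lo + half)$, and that $\hat{c}(0)$ sets the sign of the final output; `adder_tree` is a hypothetical name.

```python
def adder_tree(d: list[int], c_hat: list[int]) -> int:
    """Evaluate a log2(N)-level tree of two-input A +/- B adder/subtractors.

    Stage index lo + half combines the half-blocks d[lo:lo+half] (input A)
    and d[lo+half:lo+2*half] (input B): R = A + B if c_hat is 0, else A - B.
    c_hat[0] (an assumption) negates the final sum when it is 1.
    """
    N = len(d)
    assert N > 0 and N & (N - 1) == 0, "tap count must be a power of two"

    def node(lo: int, size: int) -> int:
        if size == 1:
            return d[lo]
        half = size // 2
        a = node(lo, half)
        b = node(lo + half, half)
        return a - b if c_hat[lo + half] else a + b

    y = node(0, N)
    return -y if c_hat[0] else y


if __name__ == "__main__":
    d = [3, 1, 4, 1, 5, 9, 2, 6]
    print(adder_tree(d, [0] * 8))  # every stage adds: 31
```

With all $\hat{c}(t) = 0$ the tree reduces to a plain sum of the N taps, using N − 1 two-input adders.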
With the cost of the shared filter hardware amortized over multiple operations, we can further reduce the gate count by storing each set of filter coefficients in as few gates as possible. In addition to their other properties, SCCSS are extremely attractive in this regard, since it is possible to generate the modified 2-port adder/subtractor coefficients $\hat{c}(t)$ used in Fig. 3c through simple combinatorial logic.
With the full derivation provided in the Appendix, the modified 2-port adder/subtractor coefficients for an SCCSS of order $2^n$ can be written as

$$
\hat{c}^{(k)}_{2^n}(t) =
\begin{cases}
0, & t = 0 \\
t_{l+n}, & 0 \le l \le n-1 \\
t_{l+1} \oplus k_{l-n}, & n \le l \le 2n-2 \\
k_{n-1}, & l = 2n-1
\end{cases}
\tag{10}
$$

where l is the index of the least significant non-zero bit in the radix-2 representation of t. For example, consider the filter coefficient at t = 34 for code k = 5 in a set of order $2^n = 8$ (n = 3). Here, $t = 34 = (100010)_2$ and $k = 5 = (101)_2$. The least significant non-zero bit in t is second from the right, so substituting l = 1 into (10) gives $\hat{c}^{(5)}_{8}(34) = t_{l+n} = t_{1+3} = t_4 = 0$.

An example VHDL realization of an SCCSS generator appropriate for the FIR filter of Fig. 3c is provided in Listing 1.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity top is
  port (
    clk : in  std_logic;
    t   : in  std_logic_vector(5 downto 0);  -- chip index, n = 3
    k   : in  std_logic_vector(2 downto 0);  -- code index
    c   : out std_logic                      -- modified coefficient c_hat(t)
  );
end top;

architecture rtl of top is
begin
  process (clk)
  begin
    if rising_edge(clk) then
      -- priority on the least significant '1' bit of t (index l)
      if t(0) = '1' then
        c <= t(3);              -- l = 0: t(l+n)
      elsif t(1) = '1' then
        c <= t(4);              -- l = 1: t(l+n)
      elsif t(2) = '1' then
        c <= t(5);              -- l = 2 = n-1: t(l+n)
      elsif t(3) = '1' then
        c <= t(4) xor k(0);     -- l = 3: t(l+1) xor k(l-n)
      elsif t(4) = '1' then
        c <= t(5) xor k(1);     -- l = 4 = 2n-2: t(l+1) xor k(l-n)
      elsif t(5) = '1' then
        c <= k(2);              -- l = 5 = 2n-1: k(n-1)
      else
        c <= '0';               -- t = 0
      end if;
    end if;
  end process;
end rtl;
```

Listing 1. VHDL to generate filter coefficients for an SCCSS of length 64.

The complexity of this generator is very low: the modified filter coefficients are calculated via simple Boolean logic, and no internal counters are needed. Despite this low complexity, the component is functionally equivalent to a $2^{2n}$-bit wide ROM, $2^n$ words deep, that is preloaded with the modified SCCSS coefficients.
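Equation (10) and its equivalence to a preloaded ROM can be checked in software. The Python sketch below is an illustrative model, not the authors' tooling: `sccss_coeff` evaluates (10) with the same priority on the least significant '1' bit of t as Listing 1, while `rom_coeffs` builds the table a ROM would hold by applying (12) from the Appendix to the closed-form matrix entries (binary exponent, chip t = jN + i). The function names are hypothetical.

```python
def bit(x: int, r: int) -> int:
    return (x >> r) & 1


def sccss_coeff(t: int, k: int, n: int) -> int:
    """Modified coefficient c_hat(t) of code k in an SCCSS of order 2**n, per (10)."""
    if t == 0:
        return 0
    l = (t & -t).bit_length() - 1  # index of the least significant '1' in t
    if l <= n - 1:
        return bit(t, l + n)
    if l <= 2 * n - 2:
        return bit(t, l + 1) ^ bit(k, l - n)
    return bit(k, n - 1)  # l == 2n - 1


def exponent(i: int, j: int, k: int, n: int) -> int:
    """Binary form of h^(k)_N(i, j): 0 for +1, 1 for -1 (assumed exponent form)."""
    f = 0
    for r in range(n - 1):
        f ^= (bit(j, r + 1) ^ bit(i, r) ^ bit(k, r)) & bit(j, r)
    f ^= (bit(i, n - 1) ^ bit(k, n - 1)) & bit(j, n - 1)
    return f


def rom_coeffs(k: int, n: int) -> list[int]:
    """Reference ROM contents for code k: c_hat(t) = c(t) XOR c(m), c_hat(0) = c(0)."""
    N = 1 << n
    c = [exponent(t % N, t // N, k, n) for t in range(N * N)]
    return [c[0]] + [c[t] ^ c[t & (t - 1)] for t in range(1, N * N)]


if __name__ == "__main__":
    print(sccss_coeff(34, 5, 3))  # worked example from the text: 0
```

`sccss_coeff` needs only a priority encoder and at most one XOR per tap, whereas `rom_coeffs` materializes all $2^{2n}$ coefficients per code, which mirrors the logic-versus-ROM trade-off discussed in Section IV.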
IV. RESULTS AND ANALYSIS
In this section, we compare the complexity of the modified SCCSS-coefficient generator of Listing 1 to that of ROMs and PRNGs. In all comparisons, logic utilization figures were obtained from Synplify Pro, and the application-specific integrated circuit (ASIC) gate count estimates were calculated as per [7].
Pseudo-random sequences, as well as derivatives like Gold codes, can be implemented very efficiently in digital hardware [8], [9], since a linear-feedback shift register (LFSR) with only n flip-flops generates a maximal-length pseudo-random sequence of $2^n - 1$ chips. Although codes produced in this way are not perfectly orthogonal, the ease with which very long sequences can be generated makes this approach very common. Table I shows how a PRNG compares to the SCCSS generator. Although both techniques are low cost, the guaranteed orthogonality, scalability, and completeness of SCCSS justify its higher complexity, which remains small in absolute terms (at most 321 ASIC gates for the lengths considered). Another weakness of LFSR-based codeset generation is that the code chips must be generated recursively: it is not possible to obtain the n-th chip without first calculating the n − 1 preceding chips. The SCCSS generator does not have this limitation, since it uses a single counter, whose cost is included in the comparison, to identify the desired chip.

TABLE I
COMPARISON OF COMPLEXITY OF SCCSS AND PRNG GENERATION (column headings give the generator and the code length in chips)

| Resource | SCCSS, 64 | SCCSS, 256 | SCCSS, 1024 | PRNG, 64 | PRNG, 256 | PRNG, 1024 |
|---|---|---|---|---|---|---|
| Flip-flops | 10 | 13 | 16 | 6 | 8 | 10 |
| 4-input LUTs | 17 | 24 | 31 | 1 | 1 | 1 |
| ASIC gates | 182 | 248 | 321 | 54 | 70 | 120 |

Another alternative to dynamically generating the codeset is to pre-calculate the coefficients and store them in a ROM. Although this allows for sequences with arbitrary auto- and cross-correlation functions, the storage is grossly inefficient, since roughly $2^{3n}$ ASIC gates are required to store $2^n$ sequences of length $2^{2n}$. For example, an order $2^n = 32$ SCCSS, in which each sequence is 1024 chips long, would need a 32 kb ROM. This is roughly two orders of magnitude more gates than the 321 ASIC gates used by the SCCSS generator of the same length in Table I.
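For comparison, the sketch below models a 6-stage Fibonacci LFSR of the kind the PRNG columns of Table I refer to. The feedback polynomial $x^6 + x + 1$ is an assumption, since the taps of the synthesized PRNG are not given; any primitive degree-6 polynomial yields the same maximal period of $2^6 - 1 = 63$ chips. The hypothetical helper `lfsr_chip_at` makes the recursive dependence explicit: reaching chip `index` means stepping the register through every earlier chip.

```python
def lfsr_chips(num_chips: int, seed: int = 0b000001) -> list[int]:
    """Output bits of a 6-stage Fibonacci LFSR, feedback polynomial x^6 + x + 1.

    The state holds a[t] ... a[t+5] in bits 0 ... 5. Each step outputs a[t]
    and shifts in a[t+6] = a[t+1] XOR a[t], so only 6 flip-flops are needed.
    """
    state = seed & 0x3F
    if state == 0:
        raise ValueError("the all-zero state never leaves itself")
    chips = []
    for _ in range(num_chips):
        chips.append(state & 1)
        feedback = (state ^ (state >> 1)) & 1  # a[t] XOR a[t+1]
        state = (state >> 1) | (feedback << 5)
    return chips


def lfsr_chip_at(index: int, seed: int = 0b000001) -> int:
    """Chip `index` of the sequence; requires generating all earlier chips first."""
    return lfsr_chips(index + 1, seed)[index]


if __name__ == "__main__":
    period = lfsr_chips(63)
    print(sum(period))  # 32 ones (and 31 zeros) per period
```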
V. CONCLUSION
An extremely efficient Boolean expression to calculate FIR filter coefficients for SCCSS was derived. Since SCCSS have auto-correlation and cross-correlation functions that are superior in several ways to those of pseudo-random and Walsh-Hadamard sequences, devices such as spread-spectrum communication transceivers can exploit SCCSS for improved end-to-end performance with only a small increase in hardware complexity.
APPENDIX
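The relationship between the original coefficients c(t) and the modified 2-port adder/subtractor coefficients $\hat{c}(t)$ can be derived by equating the Y outputs in Fig. 3a and Fig. 3c. For the 8-tap filter of Fig. 3,

$$
\begin{aligned}
Y = \sum_{t=0}^{7} c(t)\, d(t) = {} & (-1)^{\hat{c}(0)} d(0) + (-1)^{\hat{c}(1) \oplus \hat{c}(0)} d(1) + (-1)^{\hat{c}(2) \oplus \hat{c}(0)} d(2) + (-1)^{\hat{c}(3) \oplus \hat{c}(2) \oplus \hat{c}(0)} d(3) \\
& + (-1)^{\hat{c}(4) \oplus \hat{c}(0)} d(4) + (-1)^{\hat{c}(5) \oplus \hat{c}(4) \oplus \hat{c}(0)} d(5) \\
& + (-1)^{\hat{c}(6) \oplus \hat{c}(4) \oplus \hat{c}(0)} d(6) + (-1)^{\hat{c}(7) \oplus \hat{c}(6) \oplus \hat{c}(4) \oplus \hat{c}(0)} d(7)
\end{aligned}
\tag{11}
$$

Note that this equivalence is valid only for binary sequences, i.e. for original filter coefficients $c(t) \in \{+1, -1\}$ and modified filter coefficients $\hat{c}(t) \in \{0, 1\}$.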
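The like terms of d(t) can be equated and the resulting simultaneous equations solved to yield the modified filter coefficients $\hat{c}(t)$. Writing each original coefficient in binary form (0 for +1 and 1 for −1) and expressing the index t in radix-2 notation, we can state that

$$
\begin{aligned}
\hat{c}((000)_2) &= c((000)_2) \\
\hat{c}((001)_2) &= c((001)_2) \oplus c((000)_2) \\
\hat{c}((010)_2) &= c((010)_2) \oplus c((000)_2) \\
\hat{c}((011)_2) &= c((011)_2) \oplus c((010)_2) \\
\hat{c}((100)_2) &= c((100)_2) \oplus c((000)_2) \\
\hat{c}((101)_2) &= c((101)_2) \oplus c((100)_2) \\
\hat{c}((110)_2) &= c((110)_2) \oplus c((100)_2) \\
\hat{c}((111)_2) &= c((111)_2) \oplus c((110)_2)
\end{aligned}
$$

Through induction, we can therefore denote the general case as

$$ \hat{c}(t) = \begin{cases} c(0), & t = 0 \\ c(t) \oplus c(m), & t > 0 \end{cases} \tag{12} $$

where m is t with its least significant non-zero bit cleared. When the filter coefficients denote an SCCSS, i.e. $c(t) = c^{(k)}_{2^n}(t)$, the binary form of c(t) is the exponent $f^{(k)}(i,j)$ of (3) and (7), and the modified filter coefficients can be calculated using simple Boolean logic. We begin this derivation by observing that the least significant n bits of the radix-2 expression of t denote the row i, while the most significant n bits denote the column j. In other words,

$$ i = \sum_{r=0}^{n-1} t_r 2^r, \qquad i_r = t_r \tag{13} $$

$$ j = \sum_{r=0}^{n-1} t_{r+n} 2^r, \qquad j_r = t_{r+n} \tag{14} $$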
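The mapping (12), its inverse, and the row/column split take only a few lines. In this illustrative Python sketch, coefficients are in binary form (0 for +1, 1 for −1), `t & (t - 1)` clears the least significant '1' bit of t to form m, and `row_col` splits t into its low and high n bits; all function names are hypothetical. The inverse rebuilds c(t) in increasing order of t, which is the prefix XOR that the adder tree applies implicitly.

```python
def to_modified(c: list[int]) -> list[int]:
    """c_hat(0) = c(0); c_hat(t) = c(t) XOR c(m) for t > 0, m = t with lowest '1' cleared."""
    return [c[0]] + [c[t] ^ c[t & (t - 1)] for t in range(1, len(c))]


def from_modified(c_hat: list[int]) -> list[int]:
    """Invert to_modified: c(t) = c_hat(t) XOR c(m), and m < t is already known."""
    c = [c_hat[0]]
    for t in range(1, len(c_hat)):
        c.append(c_hat[t] ^ c[t & (t - 1)])
    return c


def row_col(t: int, n: int) -> tuple[int, int]:
    """Low n bits of t give row i; high n bits give column j."""
    return t & ((1 << n) - 1), t >> n


if __name__ == "__main__":
    print(row_col(34, 3))  # (2, 4): t = 34 = (100 010)_2
```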
By substituting (3), (13) and (14) into (12) we obtain five distinct cases that are delineated by the index t.
A. Case 1: t=0
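The first modified filter coefficient of any sequence in an SCCSS, where t = i = j = 0, is

$$ \hat{c}(0) = c(0) = f^{(k)}(0,0) = 0 \tag{15} $$

since every term of (4) contains a factor $j_r = 0$. We conclude that the first modified filter coefficient for all SCCSS sequences is 0, corresponding to an original coefficient c(0) = +1.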
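For all chips t > 0, the modified filter coefficients can be written as

$$ \hat{c}(t) = c(t) \oplus c(m) = f^{(k)}(i, j) \oplus f^{(k)}(i', j') \tag{16} $$

where (i, j) and (i', j') are the row and column indices of t and m respectively, whose values depend on the location of the least significant '1' bit in the radix-2 representation of t. We now expand (16) into (22) by substituting (3):

$$
\begin{aligned}
\hat{c}(t) = {} & \bigoplus_{r=0}^{n-2} \Big[ (j_{r+1} \oplus i_r \oplus k_r)\, j_r \oplus (j'_{r+1} \oplus i'_r \oplus k_r)\, j'_r \Big] \\
& \oplus (i_{n-1} \oplus k_{n-1})\, j_{n-1} \oplus (i'_{n-1} \oplus k_{n-1})\, j'_{n-1}
\end{aligned}
\tag{22}
$$

B. Case 2: 0 ≤ l ≤ n − 2

By definition, this case requires the least significant '1' bit in t to be in the lower half of its radix-2 representation. This places the focus on the row index i, since i' differs from i in exactly one bit, whereas the column indices are equal, with j' = j. We denote the changed bit as $i_l = 1$ and $i'_l = 0$; since all other terms of (22) cancel, this allows us to simplify (22) to

$$ \hat{c}(t) = (j_{l+1} \oplus 1 \oplus k_l)\, j_l \oplus (j_{l+1} \oplus 0 \oplus k_l)\, j_l = j_l \tag{17} $$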
As the radix-2 representation of j is a subset of the radix-2 representation of t as per (14), with $j_l = t_{l+n}$, we conclude that

$$ \hat{c}(t) = t_{l+n} \tag{18} $$

C. Case 3: l = n − 1
Beginning again with (22), we note that this case is similar to the previous one, since $i_r = i'_r = 0$ for all 0 ≤ r < n − 1 and j' = j. The same logic also reveals that $i_{n-1} = 1$ and $i'_{n-1} = 0$, allowing us to conclude

$$ \hat{c}(t) = (i_{n-1} \oplus k_{n-1})\, j_{n-1} \oplus (i'_{n-1} \oplus k_{n-1})\, j'_{n-1} = (1 \oplus k_{n-1})\, j_{n-1} \oplus (0 \oplus k_{n-1})\, j_{n-1} = j_{n-1} = t_{l+n} \tag{19} $$
Since the simplified expression is unchanged from the previous case, we can combine them to yield a single expression, $\hat{c}(t) = t_{l+n}$, for all 0 ≤ l ≤ n − 1.
D. Case 4: n ≤ l ≤ 2n − 2
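We have now, by a process of elimination, established that the least significant '1' bit in t must be in the upper half of its radix-2 representation. This means that the row indices are equal, with i = i' = 0, and the column indices differ only in bit l − n, with $j_{l-n} = 1$ and $j'_{l-n} = 0$. All lower bits of j and j' are 0 and, since l − n ≤ n − 2, $j_{n-1} = j'_{n-1}$, so the last two terms of (22) cancel. We therefore simplify (22) to

$$
\begin{aligned}
\hat{c}(t) &= \bigoplus_{r=0}^{n-2} \Big[ (j_{r+1} \oplus k_r)\, j_r \oplus (j'_{r+1} \oplus k_r)\, j'_r \Big] \\
&= (j_{l-n+1} \oplus k_{l-n})\, j_{l-n} \oplus (j'_{l-n+1} \oplus k_{l-n})\, j'_{l-n} \\
&= j_{l-n+1} \oplus k_{l-n} = t_{l+1} \oplus k_{l-n}
\end{aligned}
\tag{20}
$$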
E. Case 5: l = 2n − 1
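In this final case, i = i' = 0 and $j_r = j'_r = 0$ for r < n − 1, so every term of the sum in (22) vanishes. With the least significant '1' bit forced to be the MSB, we can conclude that $j_{n-1} = 1$ and $j'_{n-1} = 0$. Eq. (22) is therefore simplified to

$$
\begin{aligned}
\hat{c}(t) &= \bigoplus_{r=0}^{n-2} \Big[ (j_{r+1} \oplus i_r \oplus k_r)\, j_r \oplus (j'_{r+1} \oplus i'_r \oplus k_r)\, j'_r \Big] \oplus (i_{n-1} \oplus k_{n-1})\, j_{n-1} \oplus (i'_{n-1} \oplus k_{n-1})\, j'_{n-1} \\
&= (0 \oplus k_{n-1}) \cdot 1 \oplus (0 \oplus k_{n-1}) \cdot 0 = k_{n-1}
\end{aligned}
\tag{21}
$$

Combining Cases 1 through 5 yields (10).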
