Design and Implementation of a Digital Secure Code-Shifted Reference UWB Transmitter and Receiver
Andrew Hennessy and Amirhossein Alimohammad
Abstract-This paper presents the first compact hardware implementation of a digital code-shifted reference (CSR) ultrawideband (UWB) transceiver. The security of the transmission is based on changing the physical properties of the transmission without the use of higher level security options. The software models of the designed transceiver are simulated and verified in both floating-point and fixed-point numerical representations. The synthesizable Verilog description of the transceiver architecture is simulated and verified against its fixed-point simulation model. The secure transceiver is implemented on our custom-developed field-programmable gate array (FPGA) board. The characteristic and implementation results of the secure transceiver architecture on the FPGA are presented. The bit error rate performance of the transceiver is measured in real time on the FPGA using an accurate on-chip Gaussian noise generator and is compared with that of the software simulation model. An ASIC architecture of the CSR-UWB transceiver is estimated to occupy 0.019 mm² and dissipate 0.63 mW from a 1.0 V supply while operating at 82 MHz in a standard 32-nm CMOS technology.
Index Terms-Code-shifted reference ultra-wideband (CSR-UWB), field-programmable gate array (FPGA), baseband architecture.
I. INTRODUCTION

For applications in which the transceiver must manage sensitive data, it is important for the transceiver to offer robust protection from both passive and active adversaries. A passive adversary is an eavesdropper that attempts to collect the transmitted information intended for the receiver. An active adversary attempts to mimic the transceiver and send its own messages to the receiver while posing as the transmitter. The security scheme is intended to prevent unwanted eavesdropping while also deterring mimicry of the legitimate transmitter.
Computational security conventionally uses a security key to encrypt data before it is transmitted. The strict size and power requirements of certain devices, such as brain-implantable transceivers, prevent the use of conventional cryptography algorithms. To drastically reduce the power consumed, the security strategy instead relies on changing the physical properties of the transmission scheme to mask the transmitted data. Ultra-wideband (UWB) is considered to provide some inherent security advantages because of its extremely large bandwidth. Because UWB pulses are extremely narrow and reside at the noise level of other transmission methods, they have a naturally low probability of detection and interception [1]. This inherent level of security is, however, not an adequate substitute for cryptography, and for potentially sensitive applications, such as biomedical devices, additional security is necessary.
The security of the transmitted reference (TR) transceiver in [1] is based on separating a data pulse and a reference pulse in time so that only a legitimate receiver is able to locate the data pulse. To interpret the received data, the receiver must know the separation and use an exact timing delay to correlate the data with its reference. A precise timing delay is difficult to implement in digital hardware, especially with strict area limitations. In the code-shifted reference (CSR) scheme [2], [3], the transmitter sends groups of data pulses simultaneously with a reference pulse, added together in the time domain. Orthogonal codes allow the data and reference pulses to be separated in the code domain in a way that is only discernible by the legitimate receiver. Fortunately, the physical properties of the CSR scheme can be manipulated to provide security resembling that of [1] without requiring the precise timing offset, which is impractical to implement.
For a compact and low complexity architecture, the CSR-UWB transceiver is designed as a non-coherent transceiver [5] using the energy collection method [6] . Choosing the CSR scheme provides the transmission with additional inherent security. The low probability of detection as well as the masking of the number of bits transmitted simultaneously are basic properties of CSR-UWB before the inclusion of a security key. Incorporating the security key into the CSR scheme bolsters security by allowing only the intended receiver to know the separation in the code domain between the data and reference. The transceiver is implemented to stay as compact as possible while introducing a way to vary the separation between transmitted data and reference pulses [1] .
We first modeled the secure IR-UWB transceiver in floating-point representation in Matlab. The fixed-point representation of the transceiver is then modeled using a custom-developed library of parameterizable fixed-point operations in MEX-C. The Verilog description of the transceiver is developed and the cycle-accurate bit-true implementation of the transceiver is analyzed and verified against its fixed-point model. The transceiver is implemented on a custom-developed field-programmable gate array (FPGA) board hosting a relatively small Xilinx Spartan-6 FPGA. The on-chip bit error rate performance measurement of the transceiver using our accurate Gaussian noise generator (GNG) [7] is compared with the fixed-point software simulation results. The secure transceiver architecture is synthesized in a standard 32-nm CMOS process and the silicon area, performance, and power consumption of the transceiver are presented.
The rest of this article is organized as follows. Section II presents the CSR modulation scheme and describes the use of orthogonal codes. Section III discusses the operation of the secure transmitter along with its hardware architecture. The receiver operation and its hardware architecture are described in Section IV. Section V presents the software and hardware simulation results of the designed secure CSR-UWB transceiver and its implementation characteristics on an FPGA and in a standard 32-nm CMOS process. Section VI makes some concluding remarks.
II. CSR TRANSMITTER ALGORITHM
Successfully separating the reference pulse from the data pulses is based on the use of orthogonal shifting codes on both sides of the transmission. To detect and decode transmitted data, the transmitter and receiver must agree on the shifting and reference codes used during the transmission. These codes are composed of a series of either positive or negative ones. To successfully detect transmitted data in the CSR scheme, the codes used to shift data orthogonally must satisfy the following three conditions [8]:
where $c_{il}$ and $c_{in}$ denote any possible shifting code, $c_{ik}$ denotes a detection code made by multiplying a shifting code with a reference code, and $c_{i0}$ denotes a reference code. $M$ and $N_f$ denote the number of bits grouped with a reference and the number of frames used, respectively. Changing the shifting codes and the reference code continuously will result in the creation of multiple detection codes that must satisfy the above three conditions. Shifting and reference codes are selected from a set of Walsh codes that are orthogonal [8]. For the fixed reference and shifting codes we have selected, all of the possible detection codes that can be created satisfy the above three conditions.
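As a minimal illustration of the orthogonality that the shifting and reference codes rely on, the sketch below generates eight length-8 codes with the Sylvester (Hadamard) construction and checks that every pair of distinct codes has a zero inner product. The construction, the row ordering, and the names `walsh_codes` and `dot` are assumptions made for illustration; the ordering of the Walsh codes actually used in the transceiver may differ.

```python
def walsh_codes(order):
    """Return the 2**order rows of a Sylvester-Hadamard matrix (assumed ordering)."""
    rows = [[1]]
    for _ in range(order):
        # Sylvester step: H -> [[H, H], [H, -H]]
        rows = [r + r for r in rows] + [r + [-x for x in r] for r in rows]
    return rows


def dot(a, b):
    """Inner product of two equal-length +/-1 codes."""
    return sum(x * y for x, y in zip(a, b))


codes = walsh_codes(3)          # eight codes, each N_f = 8 elements long
n_f = len(codes[0])

# Each code correlates to N_f with itself and to 0 with every other code.
is_orthogonal = all(
    dot(codes[i], codes[j]) == (n_f if i == j else 0)
    for i in range(len(codes))
    for j in range(len(codes))
)
print(is_orthogonal)
```

With eight mutually orthogonal codes available, four can serve as shifting codes and the remaining ones as candidate reference codes, which matches the N + 1 code requirement for N = 4 bits.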
Shifting the data from the reference in the code domain is accomplished by multiplying each of the bits transmitted simultaneously by a shifting code. The reference pulse is made orthogonal by multiplying the samples of a pulse by the reference code. Adding the reference and data yields pulses that are separated in the code domain and combined in the time domain. The data is shifted using codes from the set of orthogonal codes known as the Walsh codes [2]. Each code is orthogonal to every other code in both the shifting and reference matrices. Successful transmission and reception require one orthogonal code per bit in the group as well as one code for the reference. Each code must be the same length as the number of frames that make up one symbol period [2]. The symbol period is the time it takes to transmit one group of bits simultaneously. The simultaneous transmission of a group of N bits can be done using any N + 1 orthogonal codes. The following shifting and reference codes are selected because of their effect on the transmitter output. This specific and static transmitter output is used for the preamble to facilitate detection and synchronization at the symbol level.
Shifting Codes
It was shown in [2] that simultaneously transmitting the information of four bits in one symbol period gives an optimal performance at the receiver. In this scheme, four bits as well as the reference are transmitted simultaneously as pulses over one symbol period consisting of eight frames. These pulses are orthogonal in the code domain but are combined in the time domain. The basic pulse used for the reference and data pulses is the collection of samples that make the pulse train used in [6] . The pulse train is represented as 99 consecutive voltage samples. The CSR transmitter fills a symbol period using eight pulse trains separated by a buffer of no data between pulses over eight frame periods.
The CSR modulation scheme resembles pulse amplitude modulation (PAM) because it uses different pulse amplitudes to represent values. This is realized when the pulse train samples are multiplied by a scalar generated by the transmitter. The amplitude of the output pulse is determined by the addition of positive and negative scalar values created by applying the binary PAM scheme to the bits in the group. A negative product of a bit and its shifting code element adds a negative one to the scalar, while a positive product adds a positive one. Fig. 1 shows a group of frame periods, $T_f$, filled with second-order Gaussian pulses to demonstrate the modulation. Each frame period contains a pulse scaled to a specific amplitude by the transmitter. The output pulse distribution shown in Fig. 1 is the result of transmitting a group of four ones using the shifting codes shown in the Shifting Code matrix with reference $C_8$. During the transmission of data, the pulses in each frame period may change amplitude based on the scalar calculated from the input parameters. The modulation scheme combines eight frame periods to complete one symbol period, $T_s$. The scalars that create the waveform distribution shown in Fig. 1 are important during the preamble, where the transmitter repeatedly creates this pulse distribution. This pulse distribution fixes the largest pulse in the first frame of the symbol period, which facilitates detection of and synchronization to incoming transmissions, as shown in Section IV.
Algorithm 1 shows the process of transmitting data using the CSR scheme. Bits in a group, denoted by $B_i$, are multiplied by the $k$th element of their respective shifting code, where $k$ denotes the frame number. The results of the multiplications are added to the reference code element, which is scaled according to the number of bits in a group. The absolute value of the sum is then taken and used as the scalar given to the pulse generator. A scalar value is generated for each frame in a symbol period and applied to a pulse train, and the scaled pulse train of each frame is presented at the output of the transmission antenna.
Algorithm 1 CSR Transmission
The CSR transmitter architecture is shown in Fig. 2. This datapath multiplies the data by the shifting code before adding the results to the reference pulse [2]. $B_i$ denotes a bit from the group of bits transmitted simultaneously and $C_{si}$ denotes an element of the $i$th shifting code. When zeros are to be transmitted, the bit values are substituted with negative ones before they are multiplied by the shifting codes. The reference code $C_r$ is always multiplied by the square root of the number of data bits transmitted simultaneously, $M$. Unlike the data bits, the reference bit is always a constant value of 1, and hence a multiplier for the reference code value is not required. In this case, four bits are transmitted simultaneously, so the reference pulse is doubled before it is added to the data pulses. The result of the absolute value is a scalar that is multiplied with a sample value. The Waveform Samples block stores and passes the sample values of the pulse train, which are scaled before they appear at the output of the transmit antenna.
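The scalar generation described above can be sketched in a few lines. This is only an illustrative model, not the paper's Algorithm 1 listing: the Sylvester-ordered codes, the choice of the first four rows as shifting codes and the last four as candidate references, and the names `walsh_codes` and `csr_scalars` are all assumptions, so the numeric scalars need not match the matrices used in the paper.

```python
import math


def walsh_codes(order):
    """Rows of a Sylvester-Hadamard matrix (assumed code ordering)."""
    rows = [[1]]
    for _ in range(order):
        rows = [r + r for r in rows] + [r + [-x for x in r] for r in rows]
    return rows


def csr_scalars(bits, shifting, reference):
    """Return one pulse scalar per frame for a group of bits.

    bits      : list of 0/1 data bits (M of them, M a perfect square here)
    shifting  : M shifting codes, one per bit
    reference : the reference code for this symbol
    """
    ref_gain = math.isqrt(len(bits))          # sqrt(M); equals 2 for M = 4
    symbols = [1 if b else -1 for b in bits]  # zeros are sent as -1
    scalars = []
    for k in range(len(reference)):           # k is the frame number
        data = sum(s * code[k] for s, code in zip(symbols, shifting))
        scalars.append(abs(data + ref_gain * reference[k]))
    return scalars


codes = walsh_codes(3)
shifting = codes[0:4]     # assumed shifting codes
preamble = csr_scalars([1, 1, 1, 1], shifting, codes[7])
print(preamble)           # the largest scalar falls in frame 0
```

Under these assumed codes, a group of four ones produces its largest scalar in the first frame, which is the property the preamble relies on. Every scalar is one of 0, 2, 4, or 6, because the data sum is an even value between -4 and 4 and the reference contributes plus or minus 2.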
The result of applying Algorithm 1 using a data matrix h = [1 1 0 1] and reference $C_6$ is shown below.
The result of multiplying the reference $C_6$ by the square root of the length of the data array h is given by the array below.
Reference Vector
All of the values in one column of the Code Shifted Data Matrix are added together with the value in the same column of the Reference Vector. After the absolute value is taken, the columns of the resulting vector t represent the scalar values of the frames within the symbol period. The properties of orthogonal codes also significantly limit the possible waveforms seen at the output of the transmitter. Having a limited number of output waveforms is beneficial because these waveforms are repeated in the time domain for different separations of the inputs in the code domain. The transmitter outputs overlap considerably across different reference codes and different orderings of the shifting codes. The use of the security key allows the transceiver to continually change which codes are used, varying the separation between pulses in the code domain. For two different groups of four input bits with the same reference code and shifting codes, the same transmitter output can be created by changing the order of the shifting codes fed to the transmitter. This is shown in Fig. 4. The inclusion of the security key allows the transmitter and receiver to continually change the codes used, masking the data with overlapping transmitter outputs. Using the shared security key, only the intended receiver is able to continually generate the detection codes needed to decode the incoming pulses.
In addition to the transmitter overlap, CSR bolsters security by grouping pulses in the time domain. While the number of bits used to form one group of data is fixed, there are no distinguishable traits recoverable as a result of using a varied number of bits to transmit simultaneously. CSR transmission is based on the generation of scalar values to manipulate pulse amplitudes. The number of scalars available to generate pulses of different amplitudes is limited and does not change. The fixed number of possible scalars prevents the deduction of the number of pulses that make up a group.
A potential eavesdropper may employ brute force by using the pulse output from the transmitter to attempt to ascertain the data bits that were used to create transmitted pulses. In order to determine the data bits that were used, the eavesdropper must know the shifting codes and the reference code used. To employ the brute force approach, a potential adversary would use all combinations of possible code separations to create a list of possible transmitter outputs in the time domain. The potential adversary would use the output of the transmitter to compare with the comprehensive list and attempt to create a guess of what bits were used during the transmission. The list of possible overlapping outputs to the transmitter is relatively large considering the number of variations of available code separations.
The worst case scenario is considered to get an indication of the level of security provided to the transmission by the security scheme. In the worst possible scenario, the potential adversary has prior knowledge of information about system parameters that would not be readily available to an eavesdropper. The most relevant pieces of information to the discerning of transmitted data are the codes used as references, the shifting codes and their order, and the number of bits that make up one group of bits transmitted simultaneously. With access to all of this information, the adversary could conclude that the transmitted group of bits is one of two groups if the group contained an even number of zeros or one of four groups if it contained an odd number of zeros. Even with the limitation to only changing the reference code and an unreasonable amount of system knowledge, the brute force approach still does not provide certainty of the transmitted group of bits. The number of groups of four bits that could be used to make one of the transmitter output distributions rises outside of this worst case scenario.
III. CSR-UWB TRANSMITTER ARCHITECTURE
For a compact CSR implementation, the designed transceiver only uses the security key to change the reference pulse, while keeping the shifting codes fixed. By keeping the shifting codes constant, the control logic that manages the security key is simplified and a multiplexer is removed for each bit transmitted simultaneously. Fig. 5 shows the architecture of the secure CSR transmitter. The security key determines the reference code used to create the reference pulse orthogonal to all four of the bits transmitted simultaneously. This implementation uses four possible reference codes that can be specified using a two-bit section of the security key. Updating only the reference code reduces the size of the transceiver while maintaining the overlaps available at the output of the transmitter. In addition to the presence of the overlaps at the transmitter, changing the reference code impacts the detection codes used at the time of decoding.
The fixed location of the pulse train at the start of the frame period allows the transmitter to consist of a storage location for the waveform samples along with a scalar generator. The hardware in Fig. 5 prior to the final multiplier generates the scalar by which the pulse train is multiplied in every frame. The waveform scalar is generated using the group of transmitted bits, the shifting codes, and the reference code. The scalar is multiplied by the samples of the pulse train to determine the amplitude of the samples seen at the output of the transmitter. Key Rotation provides a portion of the security key that changes at every symbol boundary. Rotating which portion of the key is used provides the transmitter with a new reference code for the transmission. With a fixed four bits per group, the multiplier for the reference pulse is a constant value of two. The limitation of the orthogonal codes as well as the presence of the absolute value allow for additional simplifications of the transmitter.
Given that each reference code element is always a one or a negative one, the contribution of the reference to the final scalar can only ever be a two or a negative two. Because the contribution can take only two values, it is simpler to represent it using a single bit. The single bit of the reference code uses a one to indicate the addition of a negative two and a zero to indicate the addition of a positive two. This simplification is accounted for when the reference is passed to the addition with the data pulses. Fig. 6 shows a combinational logic tree made of XNOR and NOR gates that replaces all of the multipliers as well as the two adders required to sum the multiplication results in the transmitter datapath. To make the transmitter more compact, the shifting and reference codes are passed in as single bits, using a zero to represent the negative one. Each of the XNOR gates along the top row takes in one element of a shifting code and one of the bits in the group transmitted simultaneously. The possible inputs to the multipliers are either one or negative one, which limits the possible output to a one or a negative one. The addition of any two of the multiplication results can yield a two, a negative two, or a zero. Even though the range of these results spans from negative two to two, the presence of only three possible output values allows the result of the addition to be expressed using two bits.
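The replacement of the multipliers by XNOR gates follows from the single-bit encoding in which a zero stands for negative one. The short check below is a sketch of that equivalence only; the helper names `to_bit`, `xnor`, and `pair_sum` are hypothetical, and the two-bit encoding of the pair sum used in Fig. 6 is not reproduced here.

```python
def to_bit(value):
    """Map +1 -> 1 and -1 -> 0 (the single-bit code representation)."""
    return 1 if value > 0 else 0


def xnor(a, b):
    """XNOR of two single bits."""
    return 1 - (a ^ b)


def pair_sum(p, q):
    """Sum of two +/-1 products given as bits: yields -2, 0, or 2."""
    return 2 * (p + q - 1)


# XNOR on the bit encoding agrees with multiplication on the +/-1 values.
xnor_matches_product = all(
    xnor(to_bit(x), to_bit(y)) == to_bit(x * y)
    for x in (-1, 1)
    for y in (-1, 1)
)
print(xnor_matches_product)
```

Because each pair of products can only sum to one of three values, two bits are enough to carry the partial result into the next stage of the logic tree.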
The remainder of the scalar calculation adds the two two-bit results from the multiplier and adder replacement, a, b, c, and d, to the single bit e from the reference code, as shown in Fig. 7. The combinational logic in Fig. 7 replaces the two adders needed to combine the reference and the two multiplication results, where e denotes the impact of the reference and S1 and S2 represent the final scalar calculated for multiplication with the sample values. Because the absolute value block restricts the possible scalar to zero, two, four, or six, the final scalar can be represented using two bits.
The limited number of possible scalar values lends itself to the reduction of the final stage multiplier that applies the scalar to the pulse train from the pulse generator. Fig. 8 shows the datapath of the pulse train scaling without a multiplier. The pulse sample is shifted to the left once, denoted by , to get the sample scaled by two and shifted to the left again to get the sample scaled by four. The results of one shift and two shifts are provided as the input to an adder to create the pulse sample scaled by six. The replacement of the multiplier uses multiplexers to then select which scaled value is passed out of the transmitter. The multiplexers are provided with the two bits indicating the value of the scalar generated and pass an appropriately scaled sample value.
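The multiplier-free scaling can be modeled as two shifts, one addition, and a four-way selection. In the sketch below, the mapping of the two-bit select value to the scalars 0, 2, 4, and 6 is an assumed encoding chosen for illustration, and `scale_sample` is a hypothetical name; the actual bit assignment of S1 and S2 is not stated in the text.

```python
def scale_sample(sample, select):
    """Scale an integer sample by 0, 2, 4, or 6 without multiplying.

    select is a two-bit value; 0, 1, 2, 3 are assumed to mean
    scalars 0, 2, 4, 6 respectively.
    """
    times_two = sample << 1               # one left shift
    times_four = sample << 2              # two left shifts
    times_six = times_two + times_four    # adder combines both shifts
    # The tuple lookup plays the role of the output multiplexers.
    return (0, times_two, times_four, times_six)[select]


print([scale_sample(5, s) for s in range(4)])
```

Only one adder is needed because six is the sum of the two shifted values, while the other three scalars are available directly.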
IV. CSR-UWB RECEIVER ARCHITECTURE
The implementation of the CSR-UWB transceiver is based on our basic non-coherent transceiver presented in [6] . The performance of the design is improved using cutset retiming. Cutset retiming is a transformation technique used to add pipeline registers and/or change the location of the delay elements without affecting the input/output characteristics of the design [9] . A cutset intersects a set of edges of a graph such that if these edges are removed from the graph, the graph becomes disjoint.
The block diagram of our designed secure CSR receiver is shown in Fig. 9 . Incoming samples are first squared in the autocorrelator to increase the separation between signal and noise values. The detector integrates the incoming data to find the preamble and the frame boundaries. The synchronizer finds the symbol boundaries that determine which frames are used to constitute one group of four bits transmitted simultaneously. The receiver requires both frame level and symbol level synchronization in order to correctly decode the received samples. Symbol level synchronization is necessary to evaluate which security key section is used, while frame level synchronization determines which element of the detection codes to use. In order to adapt to secure CSR, the security key is now an input to a code generator in the decoder. The code generator uses a two-bit section of the security key to determine the reference code for the current incoming data. The reference code is then multiplied by each of the shifting codes to make the detection codes for the decoder. The detection codes created by the code generator allow the decoder to create the bit stream of output data. The decoder includes a module to identify the reception of the Barker code, allowing the decoder to differentiate between the preamble and transmission data.
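The code generator step can be illustrated as follows: a two-bit section of the key selects one of four reference codes, and each detection code is the element-wise product of that reference code with a shifting code. The sketch assumes Sylvester-ordered codes with the first four rows as shifting codes and the last four as references, and an assumed rotation that advances two key bits per symbol; `key_section` and `detection_codes` are hypothetical names.

```python
def walsh_codes(order):
    """Rows of a Sylvester-Hadamard matrix (assumed code ordering)."""
    rows = [[1]]
    for _ in range(order):
        rows = [r + r for r in rows] + [r + [-x for x in r] for r in rows]
    return rows


def key_section(key_bits, symbol_index):
    """Return the two-bit key section for a symbol (assumed rotation)."""
    start = (2 * symbol_index) % len(key_bits)
    return 2 * key_bits[start] + key_bits[(start + 1) % len(key_bits)]


def detection_codes(shifting, reference):
    """Element-wise product of each shifting code with the reference code."""
    return [[s * r for s, r in zip(code, reference)] for code in shifting]


codes = walsh_codes(3)
shifting, references = codes[0:4], codes[4:8]   # assumed code assignment
key = [1, 0, 1, 1, 0, 0]                        # example key, illustration only

for symbol in range(3):
    reference = references[key_section(key, symbol)]
    print(symbol, detection_codes(shifting, reference)[1])
```

Since the detection codes depend on the reference code, a receiver without the key cannot follow the sequence of detection codes from one symbol to the next.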
The finite state machine (FSM) controls the operations of components of the receiver. To better manage power consumption, the FSM is responsible for making some components of the receiver idle while waiting for samples to become available. The FSM for the CSR adaptation must also manage the rotation of the security key used to pass orthogonal reference codes to the decoder.
In addition to the shifting codes, the receiver and transmitter must agree on the packet structure used to transmit the data. The packet structure for the CSR transceiver starts with a preamble and a start frame delimiter (SFD) before the header and the payload data, as seen in [6]. This packet structure facilitates the detection and synchronization stages. However, it also affords a potential adversary the same opportunity for detection and synchronization as it does the legitimate receiver. The potential eavesdropper can use information from the transmitter to make assumptions about the data transmitted. For the detector to find the packet, the preamble must be constant and repetitive, which also allows a potential eavesdropper some insight into the transmission scheme. Similar to the intended receiver, a potential eavesdropper will be able to use the preamble to synchronize to both the frame and symbol periods of the data. The ability to find the data in the transmission does not, however, give the potential eavesdropper insight into the content of the transmitted data. This is because of the variation of pulse separations in the code domain that arises from continuously changing the codes used during transmission.
The repetitive nature of the packet structure also exposes the receiver to mimicry of the preamble by an active adversary. An active adversary can create its own messages that are picked up by the legitimate receiver. The active adversary thus poses a threat either by creating messages that give the intended receiver false instructions or by creating messages that prevent legitimate messages from being received. A continuous transmission of the preamble would severely hinder the receiver's ability to find legitimate transmissions, as it would spend most of the time in the decoding state looking for the Barker code [10]. This makes the detection of a legitimate message unlikely. A frequency shifted reference transceiver [11] can be used to prevent active preamble mimicry. The choice to use the CSR scheme over a frequency hopped scheme means that the transceiver is more susceptible to active jamming, but it provides eavesdropping and active impersonation protection at a considerably lower design complexity. The CSR scheme does help prevent interference from an active adversary by masking transmitted data, which hinders the creation of fake packets that the intended receiver would accept. A potential adversary attempting to create its own messages must find all of the input conditions, including the orthogonal codes and the number of bits in a group used by the transmitter and the receiver, to match the legitimate detection codes of the receiver.
A. Detection
The detection process establishes the presence of a transmission as well as the start of each frame within the symbol period. The architecture of the detector for the CSR transceiver is based on the detection process described in [6] . The preamble used to indicate the presence of a packet is a block of transmitted ones at the beginning of the transmission. The detection of the preamble is done using sweeping integration phases that integrate over the different sections of the frame space. Each section of the frame space is accumulated over one full symbol period. The detector is looking for the maximum integration phase over several symbol periods. The detector determines if it has found a preamble when one integration phase has the highest integration value in 6 out of 11 symbol periods. The detector resets if no phase has met this threshold after evaluating 11 symbol periods. The presence of multiple integration phases inside of a frame period gives the detector a reasonable approximation of the start of the frame period, within 22 samples. The detection process essentially synchronizes to the frame period prior to the start of the symbol synchronization stage.
The detector is made up of an integrator block, a rotating shift register, and phase comparison logic to determine if a preamble has been detected. The integrator block and rotating shift register manage the collection of overlapping phases for comparison. The architectures of the integrator block and rotating shift register are described in [6] and are updated for the bitwidths necessary for secure CSR. The datapath is adapted to use a (10,1) autocorrelator output and a (15,1) rotating shift register, where (WI, WF) denotes the number of integer bits WI and the number of fractional bits WF of a signal. The detector in this implementation sweeps 9 integration phases over each frame. The integration windows are separated by a phase space of 22 samples. Fig. 10 shows the datapath of the detector after the integration block and the rotating shift register. The detector continually updates the maximum integration phase as new phases are presented by the integrator block. The comparator is responsible for determining the maximum integration phase and indicating its index inside the rotating shift register. The rough synchronization to the frame boundary is done by evaluating which integration phase has been the maximum over a series of symbol periods and storing the index of that phase. When the preamble has been detected, the detector stores the value of the maximum phase index in a register, Waste Cycles. The detector informs the FSM that it has found the preamble only when the phase counter returns to the value of the maximum integration phase. This delay means that the starting point of the synchronizer will roughly match the frame boundary.
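The 6-out-of-11 decision rule of the detector can be sketched as a simple vote over the per-symbol maximum phase indices. This is a behavioral simplification that evaluates the whole window at once, whereas the hardware updates its comparison as phases arrive; the function name `preamble_phase` and its parameters are hypothetical.

```python
from collections import Counter


def preamble_phase(max_phase_per_symbol, needed=6, window=11):
    """Return the winning phase index, or None if the detector should reset.

    max_phase_per_symbol lists, for each symbol period, the index (0..8)
    of the integration phase with the largest accumulated value.
    """
    observed = max_phase_per_symbol[:window]
    if not observed:
        return None
    phase, count = Counter(observed).most_common(1)[0]
    return phase if count >= needed else None


# Phase 3 wins in 6 of the 11 symbol periods, so a preamble is declared.
print(preamble_phase([3, 3, 1, 3, 3, 0, 3, 2, 3, 5, 4]))
# No phase reaches the threshold, so the detector resets.
print(preamble_phase([0, 1, 2, 3, 4, 5, 6, 7, 8, 0, 1]))
```

Requiring a repeated maximum rather than a single large value makes the decision less sensitive to an isolated noisy symbol period.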
B. Synchronization
The main challenge of the CSR transceiver is synchronizing the transmitter and receiver so the same shifting and reference codes are used in the appropriate part of the transmission. The synchronization stage finds the symbol boundaries within the transmission data. For this implementation, the reference and shifting codes were selected to place a large pulse at the beginning of the symbol period. The process of synchronization is completed by integration over a range of 10 symbol periods of incoming squared samples. The synchronizer finds the beginning of the symbol period using sweeping integration phases over the symbol period. Each integration phase lasts for the duration of one frame. Fig. 11 shows the integration phases to evaluate the eight frames in one symbol period. A repeated maximum integration phase over several consecutive symbol periods allows the synchronizer to recognize this frame as the start of the symbol period. Because the synchronizer is responsible for selecting the frame with the largest integration total, the integration phases do not need to overlap. The removal of the overlapping integration phases is reflected in the implementation as the synchronizer does not use the integration block seen in the detector.
The datapath of the synchronizer is shown in Fig. 12 . The synchronization datapath uses an accumulator to integrate the samples of a frame, a rotating shift register, and comparison logic to determine which frame is largest. The rotating shift register allows the synchronizer to manage the integration of individual frames over several symbol periods. This allows the synchronizer to actively track which integration phase has been the largest over a set number of symbol periods. The newest value to the rotating shift register is passed to comparison logic responsible for identifying the highest integration phase. The rotating shift register has one storage location for every integration phase. In this implementation, there is an integration phase for every frame so the rotating shift register has eight indexes. This matches the rotating shift register for synchronization presented in [6] .
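A minimal software sketch of this datapath follows, assuming that the input is already frame-aligned and squared. The eight frames per symbol, the 10 symbol periods, and the 198-sample frame come from the text; the function name and the floating-point arithmetic are illustrative assumptions, not the fixed-point hardware.

```python
FRAMES_PER_SYMBOL = 8       # one integration phase per frame (from the text)
FRAME_LEN = 198             # samples per frame (from the text)
SYMBOLS_TO_INTEGRATE = 10   # integration range in symbol periods (from the text)


def find_symbol_start(squared_samples):
    """Return the frame index with the largest energy accumulated over time."""
    # One storage location per integration phase, like the rotating register.
    register = [0.0] * FRAMES_PER_SYMBOL
    best_index, best_total = 0, float("-inf")
    for symbol in range(SYMBOLS_TO_INTEGRATE):
        for frame in range(FRAMES_PER_SYMBOL):
            start = (symbol * FRAMES_PER_SYMBOL + frame) * FRAME_LEN
            # Accumulator: integrate all samples of the current frame.
            register[frame] += sum(squared_samples[start:start + FRAME_LEN])
            # Comparison logic: only the newest register value is checked.
            if register[frame] > best_total:
                best_index, best_total = frame, register[frame]
    return best_index
```

Because the register entries only grow, comparing just the newest entry against the running maximum is enough to end with the index of the largest total, provided that the maximum is unique.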
C. Decoder
To interpret a symbol period of incoming samples, the receiver must first generate detection codes by multiplying the shifting codes with the reference code. Because four bits are received simultaneously, the decoder generates a detection code for each bit in the group using its shifting code. The detection codes match the length of the shifting and reference codes. The detection codes are made of eight elements that are each applied to one frame of the symbol period. As shown in Algorithm 2, the detection codes are applied to the total of one accumulated frame. These frames are accumulated to create a total for the symbol period of each bit in the group. The sign of each symbol period accumulation is used to determine the transmitted bits.
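The decoding rule in this paragraph can be sketched in software as follows. The code selection is an assumption: length-8 Walsh codes are used, with index 0 as the reference and indices 1, 2, 4, and 7 as the shifting codes, chosen only because they make the cross terms cancel; they are not the codes or security-key-driven references of this design. The reference weight of 2 (the square root of the four bits per group) is likewise an assumption taken from the general CSR scheme.

```python
FRAMES = 8                     # frames per symbol period (from the text)
REF_INDEX = 0                  # ASSUMPTION: illustrative reference code
SHIFT_INDICES = (1, 2, 4, 7)   # ASSUMPTION: illustrative shifting codes
REF_WEIGHT = 2                 # ASSUMPTION: sqrt(4) weighting of the reference


def walsh(index, length=FRAMES):
    """Walsh code as +1/-1 values: element j is (-1)^popcount(index & j)."""
    return [1 - 2 * (bin(index & j).count("1") % 2) for j in range(length)]


REFERENCE = walsh(REF_INDEX)
SHIFTING = [walsh(i) for i in SHIFT_INDICES]
# Detection code of each bit: shifting code times reference code, per frame.
DETECTION = [[r * s for r, s in zip(REFERENCE, code)] for code in SHIFTING]


def encode(bits):
    """Pulse amplitude of each frame for one group of four bits."""
    signs = [1 if b else -1 for b in bits]
    return [REF_WEIGHT * REFERENCE[j]
            + sum(s * code[j] for s, code in zip(signs, SHIFTING))
            for j in range(FRAMES)]


def decode(frame_energies):
    """Apply each detection code to the accumulated frames and take the sign."""
    bits = []
    for code in DETECTION:
        total = sum(d * e for d, e in zip(code, frame_energies))
        bits.append(1 if total >= 0 else 0)
    return bits
```

In the noise-free case, squaring the frame amplitudes from `encode` and passing them to `decode` returns the original four bits, because only the product of the reference and the matching shifting code survives the accumulation over the eight frames.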
Algorithm 2 CSR-UWB Decoder Algorithm
for k = 0; k < 8;
To decode the pulses of the received vector t, the decoder uses detection codes generated by multiplying the shifting codes with the reference code used to create t. The detection codes for reference C6 are given as the following matrix. The energy collection method shown in [6] is applied, so the incoming samples are squared during reception. Squaring the incoming transmission t during autocorrelation yields the array r.

Fig. 13. Frame periods are accumulated before they are multiplied with the detection code generated by multiplying the shifting and reference codes. The length of the accumulated frames is equal to 198 samples, or twice the width of the pulse. The multiplied frame periods are accumulated for a full symbol period before the sign of the accumulation is evaluated to determine the bit added to the bit stream.
Detection Codes
The decoder begins immediately after synchronization is established and translates the incoming data pulses into a bit stream. Every addition to the output bit stream is a group of four bits transmitted simultaneously. The decoder is also responsible for finding the Barker code, determining the number of bits in the transmission data from the header, and decoding the transmission data. The architecture for identifying the Barker code, which is also used in our non-secure UWB transceiver, is described in [6]. So that the Barker code can be decoded, the first group of transmitted bits includes the last bit of the preamble. If the Barker code is successfully found, the decoder uses the set number of header bits to evaluate the number of bits in the transmission data. The decoder then uses that number to evaluate the transmitted data bits. The FSM tracks the number of samples accumulated as the decoding progresses in order to manage the security key, and it rotates the section of the security key provided to the decoder after every symbol period. So that the code generator can create detection codes, the security key is continually rotated, providing two-bit sections of the key to the code generator. Each two-bit section indicates which reference code is multiplied with the shifting codes. Hence, the security key provides a continually changing reference code from the start of the header until the end of the transmission data.
Fig. 14 shows the datapath of the implemented decoder that generates the output bit stream. The accumulator on the left is responsible for the integration of all the samples that make up one frame. Because the elements of the shifting and reference codes are such that each detection code element either keeps the incoming value unchanged or negates it, the multiplication by the detection code has been replaced with a multiplexer that selects either the integration value or its two's complement. The second accumulator is responsible for accumulating the frame values of one symbol period after they have been multiplied by their respective detection code elements. The final step before the four bits are added to the bit stream is to evaluate the sign of the accumulated symbol period. The bit added to the bit stream is obtained by inverting the most significant bit (MSB) of the signed accumulator result. The multiplexer, accumulator, and inverter make up the Bit Evaluation block responsible for generating a single bit. Because four bits are transmitted simultaneously, the decoder architecture requires three additional Bit Evaluation blocks to generate the group of four bits in parallel.
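The behavior of one Bit Evaluation block can be modeled bit-accurately in a few lines. This is a sketch under stated assumptions: the 24-bit accumulator width and the function name are hypothetical and are not taken from Fig. 14, and the frame totals are treated as plain integers.

```python
ACC_WIDTH = 24   # ASSUMPTION: accumulator width chosen only for illustration


def bit_evaluation(frame_totals, detection_code, width=ACC_WIDTH):
    """Model of multiplexer, symbol accumulator, and MSB inverter for one bit."""
    mask = (1 << width) - 1
    acc = 0
    for total, element in zip(frame_totals, detection_code):
        word = total & mask
        # Multiplexer: pass the frame total or its two's complement.
        selected = word if element > 0 else ((~word + 1) & mask)
        # Second accumulator: wraps modulo 2**width like a hardware register.
        acc = (acc + selected) & mask
    msb = (acc >> (width - 1)) & 1
    # Inverter: a non-negative accumulation (MSB = 0) decodes as bit 1.
    return msb ^ 1
```

For example, the frame totals `[16, 0, 16, 16, 0, 0, 16, 0]` accumulate to +32 under the code `[1, -1, 1, -1, 1, -1, 1, -1]`, giving bit 1, and to -32 under `[1, 1, -1, -1, 1, 1, -1, -1]`, giving bit 0.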
V. PERFORMANCE EVALUATION AND IMPLEMENTATION RESULTS
We modeled the CSR-UWB transceiver in both floating-point and fixed-point representations in Matlab/C as well as in a fully parameterizable, synthesizable Verilog hardware description. We used a library of custom-developed, parameterizable fixed-point operations. Fig. 15 shows the bit error rate (BER) performance of the secure transceiver over a range of signal-to-noise ratio (SNR) values modeled in software as well as the result of the FPGA simulation. The BER performance of the transceiver is measured over a white Gaussian noise channel. We used our custom-developed FPGA board hosting a Xilinx Spartan-6 FPGA. The BER performance is evaluated using our on-chip accurate and scalable GNG presented in [7]. The close BER performance of the double-precision software simulation and the fixed-point hardware simulation on an FPGA verifies the accuracy of our design parameters.

The hardware results are created using the fractional bitwidths of components as depicted in their respective figures, with no component ever keeping more than one fractional bit. It is important to note that the CSR scheme uses scaled-up pulse amplitude values for the transmitted data. A pulse value can be scaled up to three times the value of the smallest non-zero pulse. These larger values become considerably larger when the autocorrelation or squaring of the incoming samples is computed. Because of these large values, the fractional part of the amplitude values has a smaller impact on the BER. Our bit-true simulation results verify that keeping a single fractional bit results in a small degradation in the BER performance when compared to larger fractional lengths. Using four bits transmitted simultaneously over eight frames in a symbol period, the BER performance results are very similar to the performance of the design in [4].

Fig. 16 shows the BER performance of the basic (non-secure) UWB transceiver presented in [6], the secure CSR transceiver, and a secure CSR transceiver with a frame synchronizer utilizing a smaller phase space. The phase space was reduced from 22 samples to 6 samples using an additional frame synchronizer that sweeps 33 integration windows over the frame. The higher precision of the secure CSR transceiver with a smaller phase space in the frame synchronization stage allows for shorter integration windows in the decision statistic relative to the duration of the pulse. With increased precision, the decision statistic was reduced to cover 99 samples, or the length of the pulse. As a result, the BER decreases at the cost of more hardware resources. One can see that using a smaller phase space for frame synchronization brings the performance of the secure CSR transceiver closer to that of the non-secure UWB transceiver; however, the additional level of synchronization increases the size of the transceiver, as shown in Tables I and II.

Table I gives the characteristics and implementation results of the secure CSR-UWB transceiver on a Xilinx Spartan-6 FPGA. The results are based on using a single fractional bit datapath throughout the transceiver. Table II gives the characteristics and implementation results of the secure CSR transceiver with the addition of a synchronization unit for a better estimate of the frame boundaries and improved BER performance. The sub-components of the transceiver that were affected by the addition of a new synchronizer are listed in Table II. When a frame synchronizer with enhanced precision is used to improve the BER performance, the receiver requires an additional 3% of the available LUT resources and an additional 2% of the registers. Table III and Table IV give the power consumption and area utilization of the secure transceiver, respectively, synthesized in a 32-nm CMOS technology.
The power consumption was measured after place and route using Synopsys IC Compiler. The receiver is about three times the size of the transmitter and consumes about two and a half times more power at 82 MHz from a 1.0-V supply. Fig. 17 shows the chip layouts of the transmitter and receiver. Our synthesis results show that changing the orthogonal codes continuously would have almost no impact on the size of the transceiver, as this feature requires only one extra 4-to-1 multiplexer at the transmitter and four additional 4-to-1 multiplexers at the receiver while naturally enhancing the security of the communication.
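The (WI, WF) fixed-point notation used for the datapath in this section can be made concrete with a small quantization sketch. It is not the authors' fixed-point library: the function name, round-to-nearest behavior, saturation, and the convention that WI includes the sign bit are all assumptions made for illustration.

```python
import math


def quantize(value, wi, wf):
    """Round value to a signed fixed-point grid with wi integer bits
    (ASSUMPTION: sign bit included) and wf fractional bits, with saturation."""
    scale = 1 << wf                            # number of steps per unit
    lowest = -(1 << (wi - 1)) * scale          # most negative code
    highest = (1 << (wi - 1)) * scale - 1      # most positive code
    code = math.floor(value * scale + 0.5)     # round to the nearest step
    code = max(lowest, min(highest, code))     # saturate instead of wrapping
    return code / scale
```

With a single fractional bit the grid spacing is 0.5, so `quantize(5.3, 10, 1)` gives 5.5, while values outside the (10,1) range saturate at 511.5 or -512.0. For squared samples with large magnitudes, an error of at most 0.25 per value is small in relative terms, which is in line with the observation that one fractional bit costs little BER performance.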
VI. CONCLUSION
The design and implementation of a compact, secure code-shifted reference ultra-wideband (CSR-UWB) transceiver was presented. The use of CSR provided security by allowing the simple manipulation of the physical properties of the transmitted signal without severe design complexity limitations, such as precise timing or the need for multiple oscillators. The separation of the reference and data pulses in the code domain, while they are combined in the time domain, results in a limited number of output pulse distributions that can be produced from a wide array of possible inputs. Limitations on the possible transmitter outputs also allowed for a significant reduction in the size of the amplitude scalar generator in the transmitter. The reduction in the scalar generator completely removed the need for multipliers, making the transmitter considerably more compact. The real-time bit error rate performance of the secure transceiver implemented on our custom-developed field-programmable gate array (FPGA) board was evaluated using our scalable and accurate Gaussian noise generator. The transceiver was also simulated in Matlab/C in double-precision floating-point and fixed-point formats. The transmitter runs at 117 MHz while the receiver runs at 82 MHz. The transmitter and receiver used about 21% of the configurable resources while using 6% and 62% of the DSP blocks available on a Xilinx Spartan-6 FPGA, respectively. This is equivalent to a chip area of 0.019 mm 2 in a standard 32-nm CMOS process, as estimated from chip synthesis.
