Abstract: Turbo codes have recently been considered for energy-constrained wireless communication applications, since they facilitate a low transmission energy consumption. However, in order to reduce the overall energy consumption, lookup table-log-BCJR (LUT-Log-BCJR) architectures having a low processing energy consumption are required. In this paper, we decompose the LUT-Log-BCJR architecture into its most fundamental add-compare-select (ACS) operations and perform them using a novel low-complexity ACS unit. We demonstrate that our architecture employs an order of magnitude fewer gates than the most recent LUT-Log-BCJR architectures, facilitating a 71% energy consumption reduction. Compared to state-of-the-art maximum logarithmic Bahl-Cocke-Jelinek-Raviv (Max-Log-BCJR) implementations, our approach facilitates a 10% reduction in the overall energy consumption at ranges above 58 m.
I. INTRODUCTION

Wireless sensor networks (WSNs) can be considered to be energy-constrained wireless scenarios, since the sensors are operated for extended periods of time, while relying on batteries that are small, lightweight and inexpensive. In environmental monitoring WSNs for example, despite employing low transmission duty cycles and low average throughputs of less than 1 Mbit/s [1], [2], the sensors' energy consumption is dominated by the transmission energy (measured in J/bit), since they may be separated by up to 1 km. For this reason, turbo codes have recently found application in these scenarios [3], [4], since their near-capacity coding gain facilitates reliable communication when using a reduced transmission energy. Note however that this reduction in transmission energy is offset by the turbo decoder's processing energy consumption, as well as by the (typically negligible) energy consumption of the turbo encoder [4]. Therefore, turbo codes designed for energy-constrained scenarios have to minimize the overall energy consumption, which includes the energy consumed during both transmission and decoding. Recent application-specific integrated circuit (ASIC)-based turbo decoder architectures [5]-[7] have been designed for achieving a high transmission throughput, rather than for a low transmission energy. For example, turbo codes have facilitated transmission throughputs in excess of 50 Mbit/s in cellular standards, such as the 3rd Generation Partnership Project (3GPP) Long Term Evolution (LTE), and recent ASIC turbo decoder architectures have been designed for throughputs that are in excess of 100 Mbit/s [5], [6]. This has been achieved by employing the Max-Log-BCJR turbo decoding algorithm, which is a low-complexity approximation of the optimal Logarithmic Bahl-Cocke-Jelinek-Raviv (Log-BCJR) algorithm [8].
The Max-Log-BCJR algorithm appears to lend itself both to high-throughput scenarios and to the above-mentioned energy-constrained scenarios. This is because a low turbo decoder energy consumption is implied by the Max-Log-BCJR algorithm's low complexity. However, this is achieved at the cost of degrading the coding gain by 0.5 dB compared to the optimal Log-BCJR algorithm [9], increasing the required transmission energy by about 10%. As we shall demonstrate in Section IV, this disadvantage of the Max-Log-BCJR outweighs its attractively low complexity, when optimizing the overall energy consumption of sensor nodes that are separated by dozens of meters.
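The relationship between a coding gain expressed in decibels and the corresponding change in transmission energy can be checked with a few lines of arithmetic. The following is a minimal sketch; the function and variable names are illustrative and do not appear in the paper. It shows that a 0.5 dB penalty corresponds to roughly 12% more transmission energy or, viewed from the other side, to a saving of roughly 11% when the penalty is avoided, which is of the same order as the figure of 10% quoted above.

```python
def db_to_ratio(gain_db: float) -> float:
    """Convert an energy ratio expressed in decibels into a linear ratio."""
    return 10.0 ** (gain_db / 10.0)


# Coding gain degradation of the Max-Log-BCJR algorithm quoted in the text.
coding_gain_loss_db = 0.5

ratio = db_to_ratio(coding_gain_loss_db)

# Extra transmission energy needed to compensate for the degradation.
extra_energy_pct = (ratio - 1.0) * 100.0

# Transmission energy saved by a decoder that avoids the degradation.
saving_pct = (1.0 - 1.0 / ratio) * 100.0

print(f"{extra_energy_pct:.1f}% more energy, or {saving_pct:.1f}% less energy")
```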
This motivates the employment of the lookup-table-log-BCJR (LUT-Log-BCJR) algorithm [8] in energy-constrained scenarios, since it approximates the optimal Log-BCJR more closely than the Max-Log-BCJR and therefore does not suffer from the associated coding gain degradation. However, to the best of our knowledge, no LUT-Log-BCJR ASICs have been specifically designed for energy-constrained scenarios. Previous LUT-Log-BCJR turbo decoder designs [10] - [13] were developed as a part of the on-going drive for higher and higher processing throughputs, although their throughputs have since been eclipsed by the Max-Log-BCJR architectures. This opens the door for a new generation of LUT-Log-BCJR ASICs that exchange processing throughput for energy efficiency.
As we shall discuss in Section II, the energy consumption of conventional LUT-Log-BCJR architectures cannot be significantly reduced by simply reducing their clock frequency and throughput. This motivates our novel architecture of Section III, which is specifically designed to have a minimal hardware complexity and hence a low energy consumption. In Section IV, we validate our architecture in the context of an LTE turbo decoder and demonstrate that it has an order of magnitude lower chip area, hence reducing the energy consumption of the state-of-the-art LUT-Log-BCJR implementation by 71%. Compared to state-of-the-art Max-Log-BCJR implementations, our approach facilitates a 10% reduction in the overall energy consumption at transmission ranges above 58 m. Finally, Section V concludes this paper.
II. CONVENTIONAL LUT-LOG-BCJR ARCHITECTURE
As shown in Fig. 1, a turbo encoder [14] comprises a parallel concatenation of two convolutional encoders, each of which has a structure comprising a number of memory elements, where three memory elements are used in the LTE encoders, for example. Each encoder converts an uncoded bit sequence into the corresponding encoded bit sequence. Correspondingly, Fig. 1 depicts a turbo decoder [15], [16], which comprises a parallel concatenation of two decoders that employ the LUT-Log-BCJR algorithm. Rather than operating on bits, each LUT-Log-BCJR decoder processes Logarithmic Likelihood Ratios (LLRs) [14], where each LLR quantifies the decoder's confidence concerning its estimate of a bit from the uncoded or the encoded bit sequence. Each LUT-Log-BCJR decoder processes two a priori LLR sequences, one pertaining to each of these bit sequences, which are converted into an extrinsic LLR sequence. This extrinsic LLR sequence is iteratively exchanged with that generated by the other LUT-Log-BCJR decoder, which uses it as an a priori LLR sequence in the next iteration [17]. Fig. 2(a) depicts the conventional LUT-Log-BCJR architecture, which employs the sliding-window technique [18], [19] to generate the extrinsic LLR sequence as the concatenation of equal-length sub-sequences, referred to as windows. Each of these windows is generated separately, using a forward, a pre-backward and a backward recursion, as shown in Fig. 2. These three different recursions are performed concurrently for three different windows, as exemplified in Fig. 2(b). This schedule results in the completion of the windows in their natural order, starting with that containing the first LLR and ending with the one containing the last LLR. When the forward recursion is performed for a particular window, one pair of its corresponding a priori LLRs is read from Mem 1 of Fig. 2(a) and processed per clock cycle, in ascending order of the bit index.
The forward recursion of the LUT-Log-BCJR algorithm can be performed in two pipelined steps using the corresponding dedicated hardware components of Fig. 2(a).
1) First, the transition metrics [20, (2)] that correspond to the current window are generated. Here, each transition metric is set either equal to the corresponding a priori LLR or to zero, depending on the particular pair of states that the transition is between and on the Generator Polynomials (GPs) of the encoder.
2) Next, the forward state metrics [20, (3)] that correspond to the current window are generated. Here, each state metric is given by (1), which applies the max* operation across the set of all states that can transition into the state concerned, where this set depends on the GPs of the encoder.
Note that the forward recursion for the first window is initialized independently. By contrast, the forward recursion for the other windows is initialized using state metrics that were obtained during the forward recursion of the preceding window. It is for this reason that the windows must be processed in their natural order, as shown in Fig. 2. The max* operation is used to represent the Jacobian logarithm detailed in [21], which may be approximated for two parameters using a LUT [17], according to the piecewise definition of (2). Here, the maximum of the two parameters is corrected by adding a LUT entry, which is selected according to the absolute difference between the two parameters. The max* operation can be extended to three or more parameters using associativity. Here, we assume the employment of a twos complement fixed-point LLR representation, which includes a 5-bit integer part and a fraction part. As a result, the number of entries in the LUT depends on the number of fraction bits, where each entry has a value that is a multiple of the resolution of the fixed-point representation. As we will show in Fig. 7, this arrangement yields a near-ideal BER performance [22], provided that the integer parts of the LLR values are clipped to the range that can be represented using three bits. During the forward recursion, one set of state metrics is written to Mem 2 of Fig. 2(a) per clock cycle, in ascending order of the bit index.
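The two steps above can be illustrated using a short software model. The following is a minimal sketch rather than the paper's implementation: the function names, the toy two-state trellis and the LUT correction values (0.75, 0.5, 0.25 and 0, obtained here by rounding the exact Jacobian correction term to a resolution of 0.25) are assumptions made for illustration. Only the comparison thresholds of 0, 0.75 and 2 are taken from the discussion of Section III-C.

```python
import math
from functools import reduce


def max_star_exact(p: float, q: float) -> float:
    """Exact Jacobian logarithm, ln(exp(p) + exp(q))."""
    return max(p, q) + math.log1p(math.exp(-abs(p - q)))


def max_star_lut(p: float, q: float) -> float:
    """LUT approximation of max*, assuming two fraction bits (step 0.25).

    The correction values are assumed for illustration, not quoted from
    the paper.
    """
    d = abs(p - q)
    if d == 0:
        correction = 0.75
    elif d <= 0.75:
        correction = 0.5
    elif d <= 2.0:
        correction = 0.25
    else:
        correction = 0.0
    return max(p, q) + correction


def forward_step(alpha_prev, gamma, predecessors, max_star=max_star_lut):
    """One forward recursion step over all states.

    alpha_prev[s'] is the previous state metric of state s', gamma[(s', s)]
    is the transition metric and predecessors[s] lists the states that can
    transition into s. max* is extended to many inputs by associativity.
    """
    return [
        reduce(max_star, (alpha_prev[sp] + gamma[(sp, s)] for sp in predecessors[s]))
        for s in range(len(alpha_prev))
    ]


# Hypothetical two-state trellis, used only to exercise the functions.
predecessors = {0: [0, 1], 1: [0, 1]}
gamma = {(0, 0): 0.0, (1, 0): 1.5, (0, 1): 1.5, (1, 1): 0.0}
alpha = forward_step([0.0, 0.0], gamma, predecessors)
print(alpha)
```

In this toy example, each new state metric combines the values 0 and 1.5, giving 1.75 with the assumed LUT, compared with approximately 1.70 for the exact Jacobian logarithm.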
When the backward recursion is performed for a particular window, one pair of its corresponding a priori LLRs is read from Mem 1 of Fig. 2(a) and processed per clock cycle, in descending order of the bit index. Simultaneously, the corresponding set of forward state metrics is read from Mem 2 and processed per clock cycle. As a result, a particular window's backward recursion cannot be performed until after its forward recursion has been completed, as shown in Fig. 2(b). The backward recursion of the LUT-Log-BCJR algorithm can be performed in four pipelined steps using the corresponding dedicated hardware components of Fig. 2(a).
1) First, the transition metrics that correspond to the current window are regenerated, as described above.
2) Next, the backward state metrics [20, (4)] that correspond to the current window are generated. Here, each state metric is given by (3). Note that the backward recursion for the last window is initialized independently. By contrast, the backward recursion for the other windows is initialized using state metrics that were previously obtained during the pre-backward recursion of the next window. This is achieved using steps 1 and 2 of the backward recursion and initializing the latter independently. It is for this reason that the pre-backward recursions of Fig. 2(b) are performed before the backward recursions of the preceding windows.
3) Next, the transition metrics [20, (5)] that correspond to the current window are generated, according to (4).
4) Finally, the value of each extrinsic LLR in the current window of the sequence is generated according to (5), which separately considers the set of transitions that imply a binary value of one for the corresponding bit and the set of transitions that imply a binary value of zero. As shown in Fig. 2(b), one extrinsic LLR is output per clock cycle, in descending order of the bit index.
By pipelining the forward, pre-backward and backward recursions using separate dedicated hardware for implementing the operations of (1), (3)-(5), the conventional architecture generates one extrinsic LLR per clock cycle, as shown in Fig. 2. Therefore, it achieves a high throughput, provided that it can be operated at a high clock frequency. However, the recursions involve calculations that must be performed in series. Therefore, conventional architectures typically employ additional hardware during synthesis to achieve a short critical path, a high clock frequency and a high throughput [24]. A number of variants of the LUT-Log-BCJR architecture of Fig. 2 have been proposed for further increasing the decoding throughput. For example, [25] employs parallel repetitions of the blocks shown in Fig. 2(a) to "parallel-process" the schedule of Fig. 2(b). Alternatively, [12] employs a radix-4 variant, which processes two sets of forward or backward state metrics at a time. In summary, conventional LUT-Log-BCJR architectures achieve high throughputs by employing substantial hardware, which imposes a high chip area and consequently a high energy consumption, as quantified later in Section IV.
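The four steps of the backward recursion can be sketched in software as follows. This is a hedged illustration that uses the textbook form of the BCJR recursions with the exact Jacobian logarithm; the precise expressions of (3)-(5) are those of [20] and are not reproduced here. The function names and the two-state trellis are hypothetical. Note that whether the result is an extrinsic or an a posteriori LLR depends on whether the transition metrics passed in include the a priori LLR of the bit concerned.

```python
import math
from functools import reduce


def max_star(p: float, q: float) -> float:
    """Exact Jacobian logarithm, ln(exp(p) + exp(q))."""
    return max(p, q) + math.log1p(math.exp(-abs(p - q)))


def backward_step(beta_next, gamma, successors):
    """Backward state metrics for the previous bit index.

    beta_next[s] is the state metric of state s, gamma[(s', s)] is the
    transition metric and successors[s'] lists the states reachable from s'.
    """
    return [
        reduce(max_star, (beta_next[s] + gamma[(sp, s)] for s in successors[sp]))
        for sp in range(len(beta_next))
    ]


def llr_from_metrics(alpha_prev, beta, gamma, bit_of):
    """LLR of one bit: max* over bit-1 transitions minus max* over bit-0 ones.

    Each transition is first given a combined metric (forward metric of its
    start state, its transition metric, backward metric of its end state).
    """
    delta = {
        (sp, s): alpha_prev[sp] + g + beta[s] for (sp, s), g in gamma.items()
    }
    ones = reduce(max_star, (d for t, d in delta.items() if bit_of[t] == 1))
    zeros = reduce(max_star, (d for t, d in delta.items() if bit_of[t] == 0))
    return ones - zeros


# Hypothetical two-state trellis: the bit value labels each transition.
successors = {0: [0, 1], 1: [0, 1]}
bit_of = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
gamma = {(0, 0): 0.0, (0, 1): 2.0, (1, 0): 2.0, (1, 1): 0.0}

beta_prev = backward_step([0.0, 0.0], gamma, successors)
llr = llr_from_metrics([0.0, 0.0], [0.0, 0.0], gamma, bit_of)
print(beta_prev, llr)
```

With equal state metrics, the LLR in this example reduces to the difference between the transition metrics of the two bit values, namely 2.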
Note that the energy consumption of the conventional LUT-Log-BCJR architecture cannot be significantly reduced by simply reducing the clock frequency, in order to meet the lower throughput demands of energy-constrained scenarios. While this would allow voltage scaling and a corresponding reduction of energy consumption, this approach would waste energy by powering the additional hardware that was introduced to manage the critical path. On the other hand, if voltage scaling is not employed, the limit on the critical path is relaxed, allowing the removal of the additional hardware that was introduced to manage it. While this facilitates a corresponding reduction in the dynamic energy consumption, the reduced throughput implies an increased static energy consumption, particularly in the case of high-density technologies. Furthermore, the lengthening of the critical path implies a greater variety of path lengths, particularly since the backward recursion path of Fig. 2(a) is significantly longer than those of the other recursions. This in turn implies that a greater fraction of the static energy consumption can be considered to be wasted, by giving short data paths more time to settle than necessary. In summary, efforts to slow down the conventional LUT-Log-BCJR architecture result in energy wastage, which cannot be avoided without completely redesigning the architecture.
III. PROPOSED LUT-LOG-BCJR ARCHITECTURE
In this section, we propose a novel LUT-Log-BCJR architecture for energy-constrained scenarios, which avoids the wastage of energy that is inherent in the conventional architecture of Section II. Our philosophy is to redesign the timing of the conventional architecture in a manner that allows its components to be efficiently merged. This produces an architecture comprising only a low number of inherently low-complexity functional units, which are collectively capable of performing the entire LUT-Log-BCJR algorithm. Further wastage is avoided, since the critical paths of our functional units are naturally short and of equal length, eliminating the requirement for additional hardware to manage them. Furthermore, our approach naturally results in a low area and a high clock frequency, which implies a low static energy consumption. As we will show in Section III-A, the LUT-Log-BCJR algorithm is naturally suited to this philosophy, since it can be decomposed into classic ACS operations. In Section III-B, we tackle the challenge of devising an architecture that is sufficiently flexible for performing the entire LUT-Log-BCJR algorithm, using only a small number of functional units. Furthermore, Section III-C proposes a functional unit that is capable of performing ACS operations, while maintaining a short critical path and a low complexity. Finally, in Section III-D, we design a controller for our architecture, using the LUT-Log-BCJR decoder of the 3GPP LTE turbo decoder as an application example.
A. Decomposition of the LUT-Log-BCJR Algorithm
Observe that (1), (3)-(5) of the LUT-Log-BCJR algorithm comprise only additions, subtractions and the max* calculation of (2). While each addition and subtraction constitutes a single ACS operation, each max* calculation can be considered equivalent to four ACS operations, as shown in Table I. In the general case, the number of ACS operations required to carry out the max* calculation depends on the number of fraction bits that are employed in the twos complement fixed-point LLR representation. By contrast, only a single ACS operation is required when employing the Max-Log-BCJR algorithm, which approximates the max* operation by the max operation. Similarly, fewer ACS operations are required when employing the Constant-Log-BCJR [26] algorithm. These alternative algorithms reduce the hardware complexity and increase the throughput, therefore reducing the decoder's energy consumption. However, this is achieved at the cost of requiring a higher transmission energy to achieve the same BER performance. As a result, these transformations are typically detrimental to the overall energy consumption, as discussed in Section I.
B. Proposed Energy-Efficient LUT-Log-BCJR Architecture
Inspired by the analysis of Section III-A, the proposed energy-efficient LUT-Log-BCJR architecture is shown in Fig. 3. Unlike conventional architectures, it does not use separate dedicated hardware for the three recursions shown in Fig. 2. Instead, our architecture implements the entire algorithm using multiple ACS units in parallel, each of which performs one ACS operation per clock cycle. Furthermore, the proposed architecture employs a twin-level register structure to minimize the highly energy-consuming main-memory access operations. At the first register level, each ACS unit is paired with a set of general purpose registers R1, R2 and R3. These are used to store intermediate results that are required by the same ACS unit in consecutive clock cycles. For example, this allows the four ACS operations that are equivalent to a max* calculation to be performed in four consecutive clock cycles using a single ACS unit, as detailed in Section III-C. The second register level comprises REG bank 1 and REG bank 2 of Fig. 3, which are used to temporarily store the LUT-Log-BCJR variables between consecutive values of the bit index during the recursions of the decoding process. REG bank 1 comprises registers for the a priori LLRs and dummy registers for the required LUT constants of (2). Meanwhile, the sets of forward or backward state metrics are stored in REG bank 2 of Fig. 3. The main memory stores all the required a priori LLR sequences and extrinsic LLR sequences during the decoding process, as well as the state metrics from the previous window, which facilitates the processing of the entire LUT-Log-BCJR algorithm. Since the proposed architecture supports a fully parallel arrangement of an arbitrary number of the ACS units of Fig. 3, it may be readily applied to any LUT-Log-BCJR decoder, regardless of the specific convolutional encoder parameters employed. Note that in contrast to the different-length data paths of Fig. 2(a), the identical parallel data paths shown in Fig. 3 have equal lengths, which avoids energy wastage, as described above.
C. Novel ACS Unit
In this section, we propose the novel low-gate-count ACS unit of Fig. 4, which performs one ACS operation per clock cycle. The control signals of the ACS unit are provided by the operation code, which can be used to perform the functions listed in Table II. Note that one of these operation codes approximates the absolute difference between two operands, as required by (2). Its result is equal to the true absolute difference for one ordering of the two operands. However, for the opposite ordering, the result falls short of the true absolute difference by one least significant bit. In the two's complement operand representation, this is equivalent to decrementing the binary representation of the absolute difference, which is equivalent to subtracting the value of the least significant fraction bit. Note that a simpler ACS unit implementation is facilitated by this deliberately introduced inaccuracy, which can be trivially canceled out during the max* calculation. More specifically, a max* calculation can be performed with the following four operations, which store intermediate results in the registers R1, R2 and R3 of Fig. 3.
Op 1: In this clock cycle, the absolute difference calculation is activated by using the corresponding operation code of Table II and loading the two operands from the registers of Fig. 3. The result, which is the approximated absolute difference, is then stored in a register. This operation also determines which of the two operands is the maximum.
Op 2: The LUT comparison performed during the second ACS operation is activated by the corresponding operation code of Table II. One operand uses the constant decimal value 0.75, which is provided by the register bank 1 in the architecture of Fig. 3. The other operand takes the approximated absolute difference that was obtained in the previous clock cycle. In this clock cycle, the arithmetic result is not stored, while the flag stored in the ACS unit of Fig. 4 provides the outcome of the comparison with 0.75, as required by the second ACS operation described in Section III-A.
Op 3: Similarly to the previous clock cycle, the outcome of a comparison with either 0 or 2 is determined, depending on the outcome of the comparison with 0.75 that was obtained in the previous clock cycle. More specifically, we employ the corresponding operation code of Table II, use the stored approximated absolute difference for one of the ACS unit's operands and substitute the constant value of 0 or 2 for the other, as appropriate. As shown in (2), these constant values are the first and third entries of the LUT.
Op 4: The max* calculation of (2) is completed in the fourth clock cycle by using the corresponding operation code of Table II. Here, one operand is provided by the maximum of the two original operands, as identified by the flag of Fig. 4 that was set during Op 1. Meanwhile, a value for the other operand is selected from the set of LUT correction values, depending on the contents of the flags of Fig. 4 that were set during Op 2 and Op 3. As a result, we obtain the sum of (6), as required by (2).
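The four-operation sequence above can be modeled on integer fixed-point codes, as in the following sketch. Several details are assumptions made for illustration and are not taken from the paper: two fraction bits are assumed, the correction values 0.75, 0.5, 0.25 and 0 are assumed, the inexact ordering is assumed to be the one in which the first operand is the smaller, and the one-bit inaccuracy is canceled by simply adding one least significant bit back before the comparisons. The bitwise complement is used to model the inexact absolute difference, since complementing a two's complement number yields its negation minus one.

```python
FRAC_BITS = 2          # assumed number of fraction bits
ONE = 1 << FRAC_BITS   # integer code of the value 1.0, so 0.25 has code 1


def to_fixed(x: float) -> int:
    """Integer fixed-point code of a value that is a multiple of 2**-FRAC_BITS."""
    return int(round(x * ONE))


def max_star_four_ops(p: int, q: int) -> int:
    """Model of the four-operation max* on fixed-point codes p and q."""
    # Op 1: approximate absolute difference and identify the maximum.
    diff = p - q
    p_is_max = diff >= 0
    approx_abs = diff if p_is_max else ~diff   # ~diff equals |diff| - 1 LSB
    # Cancel the deliberate inaccuracy (one possible way, assumed here).
    abs_diff = approx_abs if p_is_max else approx_abs + 1

    # Op 2: compare the absolute difference with the constant 0.75.
    above_three_quarters = abs_diff > to_fixed(0.75)

    # Op 3: compare with 2 or with 0, depending on the outcome of Op 2.
    if above_three_quarters:
        above_second = abs_diff > to_fixed(2.0)
        correction = 0 if above_second else to_fixed(0.25)
    else:
        above_second = abs_diff > 0
        correction = to_fixed(0.5) if above_second else to_fixed(0.75)

    # Op 4: add the selected correction to the maximum of the operands.
    return (p if p_is_max else q) + correction


result = max_star_four_ops(to_fixed(0.0), to_fixed(1.5))
print(result / ONE)
```

Swapping the two operands gives the same result in this model, which illustrates why the one-bit inaccuracy of the first operation need not affect the final max* value.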
D. Example Controller Design
As described in Section III-B, the proposed architecture can be readily applied to any LUT-Log-BCJR decoder, regardless of the corresponding convolutional encoder parameters employed. This is achieved by specifically designing a controller for the LUT-Log-BCJR decoder. To exemplify this, we designed a controller for a sliding-window implementation of the LTE turbo code's LUT-Log-BCJR decoder, which corresponds to an encoder having three memory elements. Since the proposed architecture employs parallel ACS units, it facilitates the parallel processing of a set of forward or backward state metrics at a time. As a result, "just-in-time" processing of the forward and backward recursions may be achieved, dispensing with the need for additional registers. This facilitates a reasonable throughput and a low energy consumption, as shown later in Section IV.
Our controller meets the timing diagram of Fig. 5, which was designed to implement the sliding-window based LUT-Log-BCJR algorithm. To reduce the memory required for storing the state metrics of (1), the sliding-window implementation performs the forward and backward recursions of the LUT-Log-BCJR algorithm for windows of just 128 bit indices. For the pre-backward recursion, windows of 24 bit indices are employed, as advocated in [27]. As shown in the columns of Fig. 5, both the forward and pre-backward recursions require 7 clock cycles per bit index, while the backward recursion requires 24 clock cycles. Observe that a total of 7 × 128 + 7 × 24 + 24 × 128 = 4136 clock cycles are required for processing a window of 128 LLRs, which gives an average of 32.31 clock cycles per LLR. The activities of the ACS units and the two register banks are shown in the rows of Fig. 5, where both additions and subtractions require a single clock cycle, while the max* calculations require four clock cycles. The hardware inactivity during the extrinsic LLR calculation is caused by the data dependencies that are implied by (5), requiring an implementation using a binary tree structure of max* operations. As shown in Fig. 5, the proposed architecture performs the pre-backward recursion for just 24 of the 128 bit indices in each window. By contrast, the conventional architectures typically perform the pre-backward recursion for all bit indices in each window, as shown in Fig. 2(b). This therefore represents wastage, which is eliminated in the proposed architecture, giving an energy saving as discussed above. Moreover, the proposed architecture can be readily scaled to include either more or fewer ACS units, as well as reconfigured by adjusting the controller design. It can therefore be readily applied to other turbo code designs or decoding algorithms, such as the Viterbi algorithm or other variations of the Log-BCJR algorithm.
For example, for a turbo code employing convolutional encoders having an input bit sequence and an output bit sequence, but a different number of memory elements, the optimal number of ACS units to include in the architecture depends on the number of memory elements. Regardless of the number of memory elements, the calculation of the forward or backward state metrics will still require the same seven clock cycles as in Fig. 5, since the ACS units are capable of computing these in parallel, each employing one max* and two addition operations. Similarly, the calculation of the transition metrics will still require the same four clock cycles, as shown in Fig. 5, since each of the ACS units is capable of calculating a pair of transition metrics using three addition operations. Finally, the duration of the LLR calculation of Fig. 5 is that required for carrying out a series of max* operations, followed by one subtraction. Since the specific choice of the number of memory elements has little effect on the timing diagram of Fig. 5, it may be readily employed as the basis of the controller design for a wide variety of turbo code configurations.
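The clock cycle budget of Fig. 5 can be reproduced with simple arithmetic, as sketched below. The window lengths and the per-bit-index cycle counts are those quoted in the text, while the function names are illustrative. The scaling of the LLR calculation with the number of memory elements is an assumption of this sketch: it supposes a trellis having two to the power of the number of memory elements transitions per bit value, which are combined by a binary tree of max* operations, followed by a single subtraction.

```python
MAX_STAR_CYCLES = 4  # one max* calculation equals four ACS operations
SUB_CYCLES = 1       # one subtraction equals one ACS operation


def window_cycles(window_len=128, pre_backward_len=24,
                  forward=7, pre_backward=7, backward=24):
    """Clock cycles needed to process one window of LLRs."""
    return (forward * window_len
            + pre_backward * pre_backward_len
            + backward * window_len)


def llr_cycles(memory_elements: int) -> int:
    """Assumed duration of the LLR calculation for a given trellis size."""
    transitions_per_bit_value = 2 ** memory_elements
    # Depth of a binary tree that combines all transitions of one bit value.
    tree_depth = (transitions_per_bit_value - 1).bit_length()
    return tree_depth * MAX_STAR_CYCLES + SUB_CYCLES


total = window_cycles()
average = total / 128

# Backward recursion per bit index: state metrics, transition metrics, LLR.
backward_per_bit = 7 + 4 + llr_cycles(3)

print(total, round(average, 2), backward_per_bit)
```

Under these assumptions, three memory elements give 13 clock cycles for the LLR calculation, which together with the seven and four clock cycles quoted above matches the 24 clock cycles per bit index of the backward recursion.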
IV. TURBO DECODER COMPLEXITY AND ENERGY ANALYSIS
To analyze the complexity and the energy efficiency of the proposed LUT-Log-BCJR architecture, we implemented an LTE turbo decoder using Taiwan Semiconductor Manufacturing Company (TSMC) 90 nm technology. The turbo decoder comprises four parts, namely a LUT-Log-BCJR decoder, an interleaver, a controller and the memory. The interleaver was implemented according to the latest low-complexity LTE interleaver designs [28], [29]. The memory employs one (128 × 64)-bit on-chip single-port SRAM module for storing the state metrics. Similarly, it employs five (6144 × 6)-bit on-chip single-port SRAM modules for storing the two sets of a priori LLRs, the two sets of extrinsic LLRs and the single set of systematic LLRs. The layout of the decoder is provided in Fig. 6. As shown in Fig. 6, the hardware complexity of the proposed architecture is so low that the chip area is actually dominated by the memory module, which accounts for 40% of the overall energy consumption according to our post-layout simulation results. By contrast, the chip area of conventional LUT-Log-BCJR architectures is typically dominated by the decoder, despite employing similar amounts of memory.
In Table III, we compare the proposed architecture to the latest LUT-Log-BCJR and Max-Log-BCJR decoder architectures [5], [6], [10], [11], [13]. The area and energy consumptions are estimated based on post-layout simulations. The implementation results arising from different technologies are also scaled to give a fair comparison. As shown in Table III, the energy consumption of the proposed architecture is significantly lower than that of the conventional LUT-Log-BCJR architectures. Furthermore, our proposed architecture has a similar energy consumption to that of the recent Max-Log-BCJR decoders, but facilitates a 10% lower transmission energy, as discussed in Section I.
To analyze the overall energy consumption of the LUT-Log-BCJR and the Max-Log-BCJR decoders, the BER performance of the proposed architecture and the ideal performance of the two types of decoders are quantified in Fig. 7. Here, BPSK modulation is assumed, since it is widely adopted in existing wireless sensor networks [30]. Furthermore, we assumed transmissions over a non-dispersive uncorrelated worst-case Rayleigh fading channel. As shown in Fig. 7, the BER performance of the proposed LUT-Log-BCJR architecture is within a tiny fraction of a decibel from that achieved by the ideal Log-BCJR algorithm. Furthermore, as discussed in Section I, the low complexity of the Max-Log-BCJR is achieved at the cost of requiring a 0.5 dB higher transmission energy per bit to achieve the same BER, as shown in Fig. 7. As a result, the LUT-Log-BCJR algorithm facilitates an overall energy consumption (including the energy consumed during both transmission and decoding) that is 10% lower than that of the Max-Log-BCJR at long transmission ranges, where the energy consumption of the turbo decoder is negligible compared to the transmission energy required. Indeed, the analysis of [3], [31] reveals that a small difference in BER performance has a significant effect on the overall energy consumption. As a result, the proposed architecture offers the lowest overall energy consumption when the transmission distance is beyond 39 m, as shown in Table III. Compared to the most energy-efficient Max-Log-BCJR design [6] in Table III, which has an energy consumption of 0.16 nJ/bit/iteration, the proposed LUT-Log-BCJR decoder achieves more than 10% overall energy savings when the transmission distance reaches 58 m, as shown in Table III. Indeed, Fig. 8 shows the overall energy consumption difference between the Max-Log-BCJR decoder of [6] and the proposed architecture. As indicated by the negative values of this difference in Fig. 8, the Max-Log-BCJR decoder of [6] has a (slightly) lower overall energy consumption than the proposed decoder when transmitting across short ranges of less than 39 m. By contrast, the proposed architecture offers a significant overall energy saving that increases exponentially beyond a range of 39 m, relative to the state-of-the-art Max-Log-BCJR decoder [6].
As discussed in Section III, the proposed architecture achieves an energy saving, because it efficiently employs a novel low-complexity ACS unit having a short critical path, which avoids the energy wastage that occurs in conventional architectures. As discussed in Section III-D, this principle may be generally applied to any arbitrary turbo code configuration, for achieving similar energy savings to those demonstrated for our example of the topical LTE LUT-Log-BCJR turbo decoder.
V. CONCLUSION
In this paper, we demonstrated that upon aiming for a high throughput, conventional LUT-Log-BCJR architectures may have wasteful designs requiring high chip areas and hence high energy consumptions. However, in energy-constrained applications, achieving a low energy consumption has a higher priority than having a high throughput. This motivated our low-complexity energy-efficient architecture, which achieves a low area and hence a low energy consumption by decomposing the LUT-Log-BCJR algorithm into its most fundamental ACS operations. In addition, the proposed architecture may be readily reconfigured for different turbo codes or decoding algorithms. We validated the architecture by implementing an LTE turbo decoder, which was found, in Table III, to have an order-of-magnitude lower area than conventional LUT-Log-BCJR decoder implementations and an approximately 71% lower energy consumption of 0.4 nJ/bit/iteration. Compared to state-of-the-art Max-Log-BCJR implementations, our approach facilitates a 10% reduction in the overall energy consumption at transmission ranges above 58 m. Furthermore, we demonstrated that our implementation has a throughput of 1.03 Mb/s, which is appropriate for energy-constrained applications, such as environmental monitoring WSNs [2], [32].