Approved for public release; distribution unlimited.
Introduction
Body sensors show promise in healthcare and the greater biomedical sector. Mobility and portability are desirable in wearable sensors because they extend personal care in diagnosis, prevention, and response. A key challenge to this mobility lies in the lack of "real-world neuroimaging" tools that can accurately and appropriately collect low signal-to-noise ratio (SNR) data outside of controlled conditions such as specialized laboratories or clinical environments. 1 The development of such tools will enable optimization of brain-computer interactive technologies (BCIT) in a fieldable form factor. 2 The major barrier to achieving a device that can handle low SNR data and wirelessly transmit the information is power consumption. Current batteries add substantial bulk and weight, restrict the application space, rule out implantable devices, and introduce maintenance issues and lifetime costs. Energy harvesting from ambient energy sources presents a potentially indefinite solution, but requires a system design that consumes less power than is harvested.
Recent advances in ultra-low power chip design techniques have enabled a push in the development of small devices capable of complex computations but with long lifetime. These dramatically extended lifetimes are feasible without the need for attached batteries or are sustainable using energy harvesting techniques. For example, Zhang et al. 3 present a reconfigurable system-on-chip (SoC) that is capable of processing data, including wireless transmissions, in electrocardiography (ECG), electromyography (EMG), and electroencephalography (EEG) applications that operate using thermoelectrically generated energy.
While suitable for measuring ECG signals, the approaches used to date are not sufficient for use in EEG measurements. Brain-source signals are significantly smaller (1-10 µV as compared to millivolts) and have an extremely low SNR. EEG data acquisition systems will require high sensitivity, strong amplification of the target signal, and high bit-depth resolution of the analog-to-digital converter (ADC) in order to properly resolve the small signal from the large voltage fluctuations. In order to increase system/signal fidelity, the analog front-end (AFE) was redesigned as seen in Fig. 1, resulting in a 12-bit successive approximation register (SAR)/24-bit sigma-delta modulator option with digital hardware deciding the duty cycling, or on/off, of the devices based on power requirements. The challenge here is to design a low-power digital library that is robust across process corners while balancing area consumption with the speed and frequency of operation of the logic gates. We explore body biasing as an option to increase the drive strength. We also investigate the use of transmission gate logic to improve the area consumption and robustness across process corners as compared to static complementary metal-oxide semiconductor (CMOS) implementations. We validate these techniques through the design and simulation of inverters, full adders, and a five-stage, 26-bit cascaded integrator-comb (CIC) filter built from inverter, XOR, NAND, flip-flop, full adder, and ripple carry adder cells.
Circuit Topology/Gate Design/Inverter and Gate Design Trade-Offs
To reduce system power consumption, particularly during digital processing, and to make more power available for analog computations, the digital portion of this SoC was redesigned. This section provides a brief overview of the subthreshold characteristics of digital logic.
Inverter
A body biasing scheme, called the dynamic threshold metal-oxide semiconductor (DTMOS) inverter, is examined and compared to a standard (non-body biased) inverter. Transmission gate logic is compared to static CMOS gates. We compare delays, noise margin, and power consumption across process variation and with VDD scaling. The library is used to develop flip-flops and full adders. A CIC filter is completed and laid out.
Arbitrary Logic Gates
Subthreshold operation exhibits an imbalance between the pull-up network (PUN) and the pull-down network (PDN). Stacking transistors makes the problem worse, because the stacking factor is greater (and so causes more imbalance) in subthreshold than above threshold. We avoid stacking by using pass gate or transmission gate logic. This has the added benefit of a smaller and simpler implementation of XOR/XNOR, which makes the common logic gates more modular to implement.
Different 1-bit full adder topologies were studied to make trade-offs in area, power, and propagation delay. Transmission gate logic, also called complementary pair logic, is similar to pass gate logic in that the propagation delay is improved in exchange for increased dynamic power. While pass gate logic using n-type metal-oxide-semiconductor (NMOS) devices has difficulty passing a high signal (and p-type metal-oxide-semiconductor [PMOS] devices have difficulty passing a low signal), transmission gate logic uses a complementary pair to average the two device performances for better noise margins.
Generally, above threshold, the weakness of pass gate logic is a VT loss at the output when passing a high input through an NMOS or a low input through a PMOS. By operating subthreshold, the voltage transfer curve is improved because the drain-to-source voltage required to keep the device in saturation is reduced to ~3 kT/q, or ~78 mV at room temperature (300 K). Thus, the penalties in noise margin of transmission gate logic versus static CMOS are reduced.
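The 3 kT/q figure can be checked with a few lines of arithmetic. The sketch below is illustrative only; the function names are ours, and the only inputs are the SI values of the Boltzmann constant and the elementary charge.

```python
# Exact SI values of the physical constants.
BOLTZMANN_J_PER_K = 1.380649e-23
ELEMENTARY_CHARGE_C = 1.602176634e-19


def thermal_voltage(temperature_k: float) -> float:
    """Return the thermal voltage kT/q in volts."""
    return BOLTZMANN_J_PER_K * temperature_k / ELEMENTARY_CHARGE_C


def saturation_vds(temperature_k: float, multiples: float = 3.0) -> float:
    """Return the approximate subthreshold saturation V_DS (multiples of kT/q)."""
    return multiples * thermal_voltage(temperature_k)


if __name__ == "__main__":
    for t in (273.0, 300.0):
        print(f"{t:.0f} K: 3kT/q = {saturation_vds(t) * 1e3:.1f} mV")
```

At 300 K this gives about 77.6 mV, and at 273 K about 70.6 mV, so the ~78 mV value corresponds to a temperature near 300 K.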
Transmission gate logic also uses a complementary NMOS/PMOS pair to make up for the loss in drive strength.
The logic for a 1-bit full adder can vary, resulting in trade-offs in area, power, and propagation delay. Pass gate implementations of the 1-bit full adder show improved propagation delay and area at the cost of increased dynamic power. Transmission gate logic with buffered outputs uses more area than pass gates, but eliminates the undesirable threshold voltage effects. Because transmission gates are modular, they are easily used to build XOR/XNOR gates and multiplexers, which makes them very convenient for logic blocks like a 1-bit full adder.
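As a behavioral illustration of this modularity, the sketch below models a 1-bit full adder composed only of XOR and 2:1 multiplexer primitives, which is a common way to map a full adder onto transmission gates. It is a logic-level sketch under our own assumptions, not the transistor-level cell used in this work; the function names and the default 26-bit ripple carry width are chosen for illustration.

```python
def xor_tg(a: int, b: int) -> int:
    """XOR modeled as a transmission-gate pair: b selects a or NOT a."""
    return (1 - a) if b else a


def mux_tg(sel: int, in0: int, in1: int) -> int:
    """2:1 multiplexer modeled as two transmission gates sharing an output."""
    return in1 if sel else in0


def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """Return (sum, carry_out) using two XORs and one multiplexer."""
    p = xor_tg(a, b)          # propagate signal
    s = xor_tg(p, cin)        # sum = a XOR b XOR cin
    cout = mux_tg(p, a, cin)  # carry = cin if a != b, else a
    return s, cout


def ripple_carry_add(x: int, y: int, width: int = 26) -> tuple[int, int]:
    """Add two unsigned integers bit by bit; return (result mod 2**width, carry_out)."""
    carry = 0
    result = 0
    for i in range(width):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry


if __name__ == "__main__":
    print(ripple_carry_add(5, 9))
```

The carry multiplexer avoids any stacked series devices in the logical description, which is the property the text relies on.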
In a subthreshold mode of operation, the weak inversion current has exponential dependencies on VGS and VT. Consequences include sensitivity to supply, process, and temperature variation. Body biasing is one solution to this problem. Body biasing schemes are explored as options to reduce effects due to process variation in Kim et al. (2003). 5 This system explores the use of DTMOS inverters, where the gate and body of the metal-oxide-semiconductor field-effect transistors (MOSFETs) are connected together. DTMOS inverters use body biasing to take advantage of the exponential dependence on VT. The p-n junction is forward biased, but below its turn-on voltage, and the resulting increase in inversion charge leads to greater gate capacitance but also significantly higher current drive in DTMOS inverters.
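The sensitivity described above can be made concrete with the textbook weak-inversion model, I = I0 exp((VGS - VT) / (n kT/q)). The sketch below is a simplified illustration, not a device model from this work: the slope factor n = 1.5 and the prefactor I0 = 1 are assumed values, and the drain-voltage term is omitted.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23
ELEMENTARY_CHARGE_C = 1.602176634e-19


def thermal_voltage(temperature_k: float) -> float:
    """Return kT/q in volts."""
    return BOLTZMANN_J_PER_K * temperature_k / ELEMENTARY_CHARGE_C


def subthreshold_current(v_gs: float, v_t: float, temperature_k: float = 300.0,
                         slope_factor: float = 1.5, i_0: float = 1.0) -> float:
    """Weak-inversion drain current (arbitrary units; assumed n and I0)."""
    return i_0 * math.exp((v_gs - v_t) / (slope_factor * thermal_voltage(temperature_k)))


def current_ratio_for_vt_shift(delta_vt: float, temperature_k: float = 300.0,
                               slope_factor: float = 1.5) -> float:
    """Factor by which the current changes when VT shifts by delta_vt volts."""
    return math.exp(-delta_vt / (slope_factor * thermal_voltage(temperature_k)))


if __name__ == "__main__":
    for shift_mv in (-50, -25, 25, 50):
        ratio = current_ratio_for_vt_shift(shift_mv * 1e-3)
        print(f"VT shift {shift_mv:+d} mV -> current x{ratio:.2f}")
```

Under these assumptions, a threshold shift of a few tens of millivolts changes the current by a multiplicative factor rather than a small percentage, which is why both process variation and a deliberate body bias have such a large effect in subthreshold.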
One consequence of subthreshold operation is the reduction of ION/IOFF, particularly in the case of stacked transistors. When transistors are in series, their overall strength is lower than that of a single transistor. Transmission gate logic is one solution. In exchange for avoiding different strengths in the pull-up and pull-down networks, the circuit has to contend with leakage and node conflicts.
Simulation Results
SPICE simulations were done on the proposed topologies. A DTMOS inverter was compared to a CMOS inverter. The CMOS inverter was sized to have the same rise and fall time as the DTMOS inverter when loaded with the same capacitance. The DTMOS inverter used less active area, but overall consumed a much larger area. The active areas were as follows:
• DTMOS: 810 nm × 120 nm + 210 nm × 120 nm = 122,400 nm²
• Static: 1.48 µm × 120 nm + 480 nm × 120 nm = 235,200 nm²
XOR gates were constructed using transmission gates with outputs buffered by inverters. The voltage transfer characteristics (VTCs) as the output goes from low to high and from high to low are shown in Fig. 2. The inverters reduce the variation at the output by decreasing the range of input voltages over which the output is indeterminate. There are two distinct transition points, instead of a symmetrical curve, due to the different threshold values of the PMOS and NMOS devices in the transmission gate.
A sine-wave reference signal with a peak-to-peak voltage of 200 mV was processed by a sigma-delta modulator, and the resulting signal was input to the 26-bit CIC filter to reconstruct the signal. Fig. 3 is a graph of the transient power consumption and the total energy consumption of the CIC filter. The average power for the system is only 67.34 nW.
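For readers unfamiliar with the CIC structure, the following behavioral sketch shows a five-stage CIC decimator operating on a 1-bit stream with 26-bit wraparound registers. Several parameters are assumptions, since the text does not state them: the decimation ratio of 32 and differential delay of 1 are chosen because, by the standard register-growth rule (stages × log2(ratio) + input bits), they yield exactly 26 bits for a 1-bit input, and the input is modeled as unsigned bits in {0, 1}. This is not the hardware implementation described here.

```python
import math


def cic_register_width(stages: int, decimation: int, input_bits: int,
                       diff_delay: int = 1) -> int:
    """Register width needed to avoid overflow at the filter's DC gain."""
    return math.ceil(stages * math.log2(decimation * diff_delay)) + input_bits


def cic_decimate(bits, stages: int = 5, decimation: int = 32, width: int = 26):
    """Filter and decimate a stream of 0/1 samples; return unsigned outputs."""
    mask = (1 << width) - 1          # models fixed-width wraparound registers
    integrators = [0] * stages
    comb_delays = [0] * stages
    outputs = []
    for n, x in enumerate(bits):
        acc = x & mask
        # Integrator cascade runs at the input (modulator) rate.
        for i in range(stages):
            integrators[i] = (integrators[i] + acc) & mask
            acc = integrators[i]
        # Comb cascade runs at the decimated rate.
        if (n + 1) % decimation == 0:
            y = acc
            for i in range(stages):
                previous = comb_delays[i]
                comb_delays[i] = y
                y = (y - previous) & mask
            outputs.append(y)
    return outputs


if __name__ == "__main__":
    print(cic_register_width(stages=5, decimation=32, input_bits=1))
    settled = cic_decimate([1] * 320)[-1]
    print(settled, settled / 32**5)
```

Once the filter has settled, a constant all-ones input produces the DC gain 32^5 = 2^25, which fits in 26 unsigned bits; dividing outputs by this gain normalizes them to the range 0 to 1. The modular (wraparound) arithmetic in the integrators is intentional: intermediate overflow cancels in the comb stages as long as the final result fits in the register width.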
Conclusions
The CIC filter, at an average simulated power of 67.34 nW, demonstrates lower power consumption than other systems before it. However, while the active area consumed is lower than that of static CMOS equivalents, the triple wells required on a bulk CMOS process significantly increase the amount of wasted area. Implementing this method in a fully depleted silicon-on-insulator (SOI) process would eliminate the need for triple wells, as well as improve the performance of the transistors. Therefore, future work should be aimed at developing this digital logic library at smaller transistor nodes.
