ABSTRACT
In recent years, researchers throughout academia and industry have been advancing the theory, design, and applications of mobile service computing through the Internet of Things (IoT). Research interest in mobile service computing stems from its performance, security, reliability, and power consumption. Because IoT devices are tightly constrained in power consumption and autonomy during both computation and idle phases, ultra-low power integrated circuits are essential for mobile service computing. The energy-consuming and computationally intensive demands are imposed by the underlying processing and memory devices, so conventional ultra-low power integrated circuits can benefit substantially from innovative hardware designs. Logic-in-Memory (LIM) architectures are considered a promising approach to meeting area and energy constraints, starting with the lowest layers of the hardware stack. In this paper, we propose and implement the LIM asynchronous computing paradigm for energy-efficient mobile service computing. The results indicate that the proposed design achieves 38% leakage reduction and 30% accuracy improvement compared to state-of-the-art non-volatile asynchronous circuits. At the system level, we compare our designs with various commercial microprocessors. The experimental results show that the asynchronous processors attain a four-fold throughput increase relative to their synchronous counterparts under these operating constraints. Therefore, the proposed design offers an approach toward tangible benefits for battery-constrained embedded mobile service computing.
I. INTRODUCTION
The enabling infrastructure for the Internet of Things (IoT) is its network of interrelated computing devices, which can communicate with each other without significant human interaction [1]-[3]. By utilizing IoT devices and wireless technologies, service computing is attracting more and more attention as a way to provide rapid, low-energy, secure, and reliable applications [4]-[6]. Such IoT computing devices usually have access to a very limited source of power, which makes efficiency and power consumption essential concerns. Because these devices usually spend most of their time in power-off/sleep mode on a limited source of power [7]-[10], the circuit's leakage current is a critical issue not only during the computational interval but also during the idle phases. Therefore, an ultra-low power integrated circuit is required to meet constraints such as long idle duration, rapid wake-up operation, data retention during the circuit's sleep mode, and support for ultra-low voltage power supply operation. Although current IoT devices predominantly employ synchronous circuits, asynchronous counterparts offer potential advantages for all of the above IoT-specific operating characteristics. Compared to synchronous circuits, asynchronous circuits offer an intrinsic, automatically quiescent sleep mode during transition-free intervals. However, to attain the benefits of asynchronous design, the static power consumption due to transistor leakage must also be controlled, which has become an intensive focus of industry and academic research [11], [12]. Recently, non-volatile devices have been sought to attain these goals by reducing static power during the sleep intervals. In practical applications, numerous devices in the IoT domain such as wearable devices, home automation systems, smart grid networks, sensor clusters, and connected vehicles may benefit from asynchronous circuitry leveraging non-volatile devices to meet limited power budgets [13]. To date, some related works outlined in [12] propose hybrid CMOS/magnetic technologies for use in asynchronous circuits.
The associate editor coordinating the review of this manuscript and approving it for publication was Tie Qiu.
In this paper, we propose a non-volatile asynchronous circuit with ultra-low power consumption, low voltage operation, and high reliability. The proposed non-volatile asynchronous circuit exploits the physics of Differential Spin Hall MRAM (DSH-MRAM) together with asynchronous circuit logic to implement ultra-low power applications for future IoT devices. Relative to the related approaches, the major contributions herein include:
• A novel non-volatile asynchronous circuit using DSH-MRAM devices is proposed. The design exploits the intrinsic physical switching characteristics of the DSH-MRAM device to achieve lower energy, faster execution and higher accuracy.
• Development of non-volatile asynchronous circuits through the DSH-MRAM designs using an innovative in-memory approach to reduce data transfer between memory and logic elements.
• Experimental results show that the proposed non-volatile asynchronous circuit can achieve 10-fold lower energy consumption, 1.6-fold faster execution and 1.4-fold higher accuracy compared to the counterpart.
The remainder of this paper is organized as follows: Section II discusses the fundamentals of these technologies and how they benefit the new architecture. Section III presents the implementation details of the proposed C-element together with simulation results. Section IV presents the proposed non-volatile asynchronous circuit designs, their simulation results, and a comparison with other implementations. Section V applies the approach to IoT devices. Finally, Section VI gives the conclusion and future work.
II. RELATED WORKS
A. POST-MOORE DEVICES
As CMOS technology continues scaling downward, leakage current, which drives static power consumption at sub-65 nm technology nodes, becomes an increasing concern. Despite advancements in device fabrication technologies such as FinFET [14] by TSMC and Intel, and FD-SOI technologies [15] by the SOI industry consortium, the leakage current from OFF devices is still a challenge to achieving a low power design. Some other approaches have been proposed to reduce leakage power dynamically by scaling the supply voltage. This method requires monitoring the power supply through controller circuits which can dynamically decrease the supply voltage without degrading the performance of the system. However, the chip area is increased due to the extra control circuitry. Another effective approach reduces leakage power by partially or completely disconnecting the circuits from the power source. However, charge-based devices lose their information when the power is OFF, which affects memory elements such as SRAMs, flip-flops, and registers. This technique therefore requires an extra recovery phase on power-up, which degrades the wake-up time of the circuit. Recently, non-volatile memory (NVM), which maintains data values even without the supply voltage present, has been employed to overcome this issue. Several emerging NVM technologies have been proposed, such as Ferroelectric Random Access Memory [16], Phase Change Memory (PCM) [17], Resistive RAM (ReRAM) [18], and Magnetic RAM (MRAM) [19]. These NVM devices have various physical structures [20], which result in a range of physical characteristics. ReRAM and MRAM are considered the dominant devices among the emerging NVM devices. However, these devices face different drawbacks due to their physical structures. For example, ReRAM and MRAM demand high writing energy compared to volatile technologies [18].
B. ASYNCHRONOUS CIRCUITS
As explained earlier, asynchronous circuits can offer new tradeoffs for dynamic power compared to their synchronous counterparts. Foremost, these include reduced transition activity and the avoidance of worst-case design constraints. In addition, asynchronous logic automatically enters an idle state whenever computation ceases. Furthermore, dynamic power consumption is reduced due to the absence of a clock tree network.
C. LOGIC-IN-MEMORY COMPUTATION PLATFORM
NVM technology is applicable to many memory applications, such as cache, storage memory, and embedded memory cells such as registers. The traditional digital design shown in Fig. 1(a), which is the fundamental architecture for many computing devices, can be extended to a more advanced architecture such as the Logic-in-Memory architecture in Fig. 1(b). In this architecture, the distribution and connection of the memory cells lead to standby power savings and potential reductions of the memory access time. Also, stacking MRAM technology over the CMOS circuit can reduce chip area and fabrication costs.
III. PROPOSED NON-VOLATILE C-ELEMENT
The DSH-MRAM device [21] is composed of two magnetic tunnel junctions (MTJs) and a spin Hall metal (SHM), which is a non-magnetic conductor with spin-orbit interaction, as shown in Fig. 2 (a). The stored information is represented by the magnetizations of the free layers (m_1 and m_2), which are adjacent to the SHM. The pinned layer magnetization of the two MTJs is constant in one direction. The coupling between orbital motion and electron spin (spin-orbit coupling) in the SHM diverts the −X and +X aligned spins to the bottom and top sides of the SHM. The spin current is calculated from the equation shown below:
where A_MTJ and A_SHM are the cross-sectional areas of the MTJ and SHM respectively, θ_SHM is the spin Hall angle, λ_sf is the spin flip length (λ_sf = 1.5 nm), and I_e is the charge current flowing through the SHM (extractable from SPICE-based circuit simulation). According to equation (1), as t_SHM reduces, the spin Hall current density is also reduced [22]. The switching dynamics of m_1 and m_2 can be analyzed by using eq. (1) together with the generalized Landau-Lifshitz-Gilbert (LLG) equation [21], [23]. Since the two free layers are separated by the thin SHM, the dipolar coupling between m_1 and m_2 has to be considered. The magnetization dynamics of m_i (i = 1, 2) can be written using the macrospin approximation:
where the effective field (H_eff,i) includes the self-demagnetization field, which arises from shape anisotropy. The dipolar coupling field is given by H_Dip,i = −N_dip M_s m_j, where N_dip is the effective dipolar coupling factor obtained from micro-magnetic simulations [24]. Because the dipolar field inside a magnet is non-uniform, the dipolar field from m_j acting on m_i is averaged over the volume of m_i. The simulation parameters are listed in Table 1. As a result, the efficient magnetic mechanism allows complementary bits to be written with the same amount of energy. Also, since this technique does not require a tunneling current, the writing speed can be improved to less than 1 ns by increasing the writing current without reliability limitations. By utilizing the spin Hall effect for writing, the writing energy can be improved compared with standard 1T1R STT-MRAMs, as explained in the following. In standard STT-MRAM, spin-polarized electrons are injected into the free layer and each of them transfers only one quantum of angular momentum to the free layer. In contrast, when writing with the spin Hall effect, an electron can apply STT repeatedly by scattering at the interface between the free layer and the SHM [25]. Therefore, energy-efficient writing can be achieved through the injection of a spin current that is larger than the charge current.
The reading operation of the DSH-MRAM device is described in Fig. 2 (c), which shows that the SHM serves as the common terminal of the reading path. The differential resistance of MTJ_1 and MTJ_2, which represents the parallel (P) and anti-parallel (AP) states, depends on the orientation of the free layer relative to the fixed layer. Consequently, the difference between the two sensing currents that pass through MTJ_1 and MTJ_2 can be read out. Compared to standard STT-MRAM, the proposed device is self-referenced and needs no global reference cell.
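The dependence of the spin current on the SHM thickness can be illustrated numerically. The sketch below assumes a form of the spin-injection relation that is common in the spin Hall literature, I_s = θ_SHM (A_MTJ / A_SHM) [1 − sech(t_SHM / λ_sf)] I_e; this form is an assumption chosen to match the symbols defined above and may differ from the paper's equation (1). Only λ_sf = 1.5 nm comes from the text; the spin Hall angle, area ratio, and charge current are hypothetical values.

```python
import math

LAMBDA_SF = 1.5e-9  # spin flip length in metres (value quoted in the text)


def spin_current(i_e, theta_shm, a_mtj, a_shm, t_shm, lambda_sf=LAMBDA_SF):
    """Spin current (A) for a charge current i_e (A) through the SHM.

    Assumed relation (not confirmed against the paper's equation (1)):
        I_s = theta_SHM * (A_MTJ / A_SHM) * (1 - sech(t_SHM / lambda_sf)) * I_e
    """
    sech = 1.0 / math.cosh(t_shm / lambda_sf)
    return theta_shm * (a_mtj / a_shm) * (1.0 - sech) * i_e


# Hypothetical illustration values, not taken from the paper.
THETA = 0.3        # spin Hall angle
AREA_RATIO = 10.0  # A_MTJ / A_SHM, held fixed to isolate the thickness term
I_E = 40e-6        # charge current in amperes

thin = spin_current(I_E, THETA, AREA_RATIO, 1.0, t_shm=1e-9)
thick = spin_current(I_E, THETA, AREA_RATIO, 1.0, t_shm=3e-9)
print(f"t_SHM = 1 nm: I_s = {thin:.3e} A")
print(f"t_SHM = 3 nm: I_s = {thick:.3e} A")
```

With the area ratio held fixed, the thinner SHM yields the smaller spin current, and for these illustrative values the thicker case gives a spin current larger than the charge current, in line with the energy-efficiency argument above.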
B. DSH-MRAM SINGLE-BIT CELL
Writing and reading logic bits using the DSH-MRAM device are performed by applying proper voltages to the read word line (RWL), write word line (WWL), two bit-lines (BL/BLB), and source-line (SL). In the bit-cell shown in Fig. 3 (a), RWL and WWL, which control the access transistors, are asserted high during read and write operations, respectively. For example, to write '0', an appropriate write voltage (V_WRITE) is applied to BL/BLB, and SL is grounded to provide the path for the charge current to flow from BL/BLB to SL through the SHM. In contrast, to write '1', the write current has to flow in the opposite direction, which is done by reversing the voltage polarity of BL/BLB and SL. To read out the state of MTJ_1 and MTJ_2, SL is held at ground, RWL is asserted high, and the reading current is injected into BL/BLB. The resulting voltage difference between BL and BLB can be used to determine the data value stored in the MTJs. The DSH-MRAM shows three main advantages over the standard 1T1R bit-cell, which make it a good candidate for the C-element. Firstly, the DSH-MRAM can switch faster without the reliability issues facing two-terminal STT-MRAM. Secondly, the resistance in the DSH-MRAM is lower than in standard STT-MRAM, which results in a smaller transistor width for the same writing current. Lastly, the efficiency of the writing operation using the spin Hall effect (SHE) is improved. The biasing conditions for read and write operations are shown in Fig. 3 (b). Fig. 3 (c) shows the layout of the DSH-MRAM cell. The length and width are expressed in units of half of the minimum feature size of the transistor technology. Under CMOS design rules, when the width of the access transistor is sufficiently small, the minimum area of the DSH-MRAM cell is determined by the metal pitch [26].
The equivalent circuit during a read operation is shown in Fig. 4 (a), which illustrates the differential read operation utilized in the memory structure. In this differential read, the read currents go through the SHM, which behaves as their common terminal. MTJ_1 and MTJ_2 are in the parallel (P) and anti-parallel (AP) states, respectively. The alignment of the free layer with respect to the fixed layer determines the resistance of the MTJs. Therefore, the voltage difference produced by the two read currents can be used to assess the stored bit. During the writing operation, the equivalent circuit of the DSH-MRAM bit-cell is composed of a resistor serially connected with an access transistor, as shown in Fig. 4 (b).
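The differential read described above can be sketched as a comparison of two bit-line voltages. The following is a minimal behavioral model, not the sensing circuit of Fig. 4 (a): the parallel resistance `R_P`, the TMR value, and the mapping of the voltage sign to a logic value are hypothetical assumptions, while the 40 µA read current is the value the paper quotes for the restore operation. The relation R_AP = R_P (1 + TMR) follows from the standard definition of the TMR ratio.

```python
I_READ = 40e-6  # read current in amperes (value quoted for the restore operation)
R_P = 5e3       # hypothetical parallel-state MTJ resistance in ohms


def anti_parallel_resistance(r_p, tmr):
    """R_AP from the standard definition TMR = (R_AP - R_P) / R_P."""
    return r_p * (1.0 + tmr)


def read_bit(r_mtj1, r_mtj2, i_read=I_READ):
    """Return (bit, v_bl - v_blb) for the two MTJ resistances.

    Assumed convention (hypothetical): bit 0 when MTJ_1 is the
    low-resistance (P) device, bit 1 when it is the high-resistance (AP) one.
    """
    v_bl = i_read * r_mtj1   # voltage developed across MTJ_1
    v_blb = i_read * r_mtj2  # voltage developed across MTJ_2
    return (1 if v_bl > v_blb else 0), v_bl - v_blb


r_ap = anti_parallel_resistance(R_P, tmr=1.0)  # TMR = 100 %, hypothetical
bit, delta_v = read_bit(R_P, r_ap)             # MTJ_1 in P, MTJ_2 in AP
print(bit, delta_v)
```

Because the two MTJs always hold complementary states, the sign of the difference is enough to recover the bit, which is why no global reference cell is needed.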
C. NON-VOLATILE (NV) DSH-MRAM-BASED C-ELEMENT
A conventional circuit implementation of a static C-element with two inputs and its truth table are shown in Fig. 5 (a) and (b). Fig. 5 (a) depicts the truth table of the C-element, and Fig. 5 (b) shows the schematic of the conventional C-element, which consists of pull-up, pull-down, and latch networks. The proposed Differential Spin C-element follows the conventional C-element but employs a DSH-MRAM as non-volatile memory. Fig. 5 (d) presents the proposed Spin C-element with its writing and reading circuits. In this paper, the Spin-based C-element symbol shown in Fig. 5 (c) is defined and used at the logic level.
The proposed Spin C-element operates in three operational modes: holding, writing and restore. In the remainder of this section, we describe the three operational modes in detail.
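The behavior summarized by the C-element truth table can be sketched in a few lines. This is the standard two-input Muller C-element function (the output follows the inputs when they agree and holds its previous value otherwise), written as a behavioral model rather than the transistor-level circuit of Fig. 5 (b); the function name is illustrative.

```python
def c_element(a, b, q_prev):
    """Two-input Muller C-element: follow the inputs when they agree, else hold."""
    if a == b:
        return a
    return q_prev


# Enumerate the truth table for both possible previous output values.
for q_prev in (0, 1):
    for a in (0, 1):
        for b in (0, 1):
            print(f"a={a} b={b} q_prev={q_prev} -> q={c_element(a, b, q_prev)}")
```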
1) HOLDING AND WRITING OPERATION
To back up the data, the writing circuitry requires extra transistors to manage the writing operation, as shown in Fig. 5 (d). The control signal (V_wr) remains high while the other control signals are kept low. Consequently, the writing current is applied to the two terminals of the giant spin Hall effect metal to write the top and bottom MTJs. In Fig. 6 (a), the write current I_write, marked with red arrows, passes through Q and Q_n. The active transistors during the writing operation are also shown in Fig. 6 (a). Simulation results with all control signals are plotted in Fig. 6 (e). To simulate the proposed Spin C-element, a 45 nm FinFET transistor model [27] is used to construct the CMOS logic. Since the DSH-MRAM device requires a smaller writing current than conventional spintronic devices such as the MTJ, minimum-feature transistors in advanced technologies such as FinFET can generate sufficient writing current without a writing buffer. Similar to writing "1", the write "0" procedure requires extra transistors to generate the opposite current loop. In Fig. 6 (c), the active transistors in the write "0" process are presented. Fig. 6 (e) presents the simulation results for writing "0". In the holding mode, the DSH-MRAM device is not involved in the operation. Thus, the proposed Spintronic C-element behaves like a conventional C-element realized using CMOS logic.
2) RESTORE OPERATION
To restore the data, reading circuitry with several transistors is utilized to control the restore procedure, as shown in Fig. 5 (d). The restore operation turns on after power-up. First, an auto-zero procedure is applied to the proposed circuit in order to erase the previous data at the output. Fig. 6 (b) and Fig. 6 (f) show the active circuit and the corresponding simulation. Then, while the control signal (V_rd) remains high, a 40 µA current is injected into the bottom and top MTJs to read their resistances by measuring the voltages across the top and bottom MTJs. The two voltages read from the top and bottom MTJs (V_stored_n, V_stored) update Q(V) and Q_n(V), as shown in Fig. 5 (d) and (f).
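The three operational modes can be summarized with a small behavioral model. The sketch below is an illustration under simplifying assumptions, not the circuit of Fig. 5 (d): the class and method names are hypothetical, the DSH-MRAM is reduced to a single stored bit, and the auto-zero step is modeled as clearing the output before the stored value is loaded.

```python
class NVCElement:
    """Behavioral sketch of a non-volatile C-element (hypothetical model)."""

    def __init__(self):
        self.q = 0       # volatile CMOS output
        self.stored = 0  # non-volatile bit (stands in for the DSH-MRAM state)

    def hold(self, a, b):
        """Holding mode: behave like a conventional C-element."""
        if a == b:
            self.q = a
        return self.q

    def write(self):
        """Writing mode (V_wr high): back up the output into the NV device."""
        self.stored = self.q

    def power_off(self):
        """Volatile state is lost; the non-volatile bit is retained."""
        self.q = None

    def restore(self):
        """Restore mode (V_rd high): auto-zero, then reload the stored bit."""
        self.q = 0            # auto-zero erases the previous output
        self.q = self.stored  # value read back from the NV device
        return self.q


cell = NVCElement()
cell.hold(1, 1)    # both inputs high, output goes to 1
cell.write()       # back-up before power-down
cell.power_off()
restored = cell.restore()
print(restored)
```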
IV. NON-VOLATILE ASYNCHRONOUS CIRCUITS DESIGNS
To illustrate the benefits of NV functionality, a Quasi Delay Insensitive (QDI) asynchronous pipeline equipped with a dual-rail 4-phase handshake protocol is chosen (Fig. 7). This system can be implemented by using Weak-Conditioned Half-Buffers (WCHB) separated by logical blocks, as shown in Fig. 7 (b). The function of the half-buffers (HB) is to save the data between the stages of processing and to organize the communication protocol, as shown in Fig. 7 (a). The logical blocks perform the processing. In this figure, the NV functionality backs up the intermediate data stored in the HBs.
A. HALF BUFFER DESIGN
One of the fundamental building blocks of asynchronous circuits of this type is the Half-Buffer. The circuitry of the WCHB is composed of a NOR gate (for generation of the acknowledgment signal Ack) and two C-elements (Fig. 8). Converting a volatile HB into an NV version cannot be achieved simply by replacing the volatile C-elements with NV C-elements; extra circuitry is required. Looking closer at the waveforms shown in Fig. 5, we can notice an intermediate level during the restore phase. Since that level is very near to the logic high level, downstream circuits might misinterpret it. In addition, there would be high states on both the 0 and 1 rails of the HB output, which should always be prohibited under a NULL Convention Logic encoding [12]. As a result, gating the outputs during the restore phase prevents the propagation of this high state to the output of the logic cell.
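The half-buffer structure described above (two C-elements and a NOR gate, with output gating during restore) can be sketched behaviorally. The wiring below follows the usual textbook WCHB arrangement and is an assumption, since Fig. 8 is not reproduced here: each C-element combines one input rail with the enable from the next stage, and the NOR of the two output rails is the acknowledge sent upstream (1 when the buffer is empty). All names and the acknowledge polarity are hypothetical.

```python
def c_element(a, b, q_prev):
    """Two-input Muller C-element: follow agreeing inputs, else hold."""
    return a if a == b else q_prev


class WCHB:
    """Behavioral sketch of a weak-conditioned half-buffer (assumed wiring)."""

    def __init__(self):
        self.out0 = 0  # rail representing logic 0
        self.out1 = 0  # rail representing logic 1

    def step(self, in0, in1, enable_next, restoring=False):
        """Update both rails; return (out0, out1, ack) as seen downstream.

        enable_next is 1 when the next stage is empty and can accept data.
        While restoring is True the outputs are gated low, so an
        intermediate level cannot appear as a valid token downstream.
        """
        self.out0 = c_element(in0, enable_next, self.out0)
        self.out1 = c_element(in1, enable_next, self.out1)
        ack = int(not (self.out0 or self.out1))  # NOR of the output rails
        if restoring:
            return 0, 0, ack
        return self.out0, self.out1, ack


hb = WCHB()
captured = hb.step(0, 1, enable_next=1)      # logic 1 arrives, next stage empty
held = hb.step(0, 1, enable_next=0)          # next stage took the token
cleared = hb.step(0, 0, enable_next=0)       # input returns to NULL
print(captured, held, cleared)
```

The three calls trace one 4-phase cycle: the token is captured, held while the next stage acknowledges, and cleared once the input returns to NULL.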
B. PIPELINING ARRANGEMENTS
As shown in Fig. 9, the pipeline with Spin C-elements replaces the conventional half-buffers. However, this straightforward approach raises two issues. The first issue is the alignment of the asynchronous operations of the HB with the back-up/restore phases. The appropriate time to back up information to the NV memory is when all outputs have a stable logical level. The second issue arises from the writing time of the magnetic memory. As previously mentioned, compared to the other NV technologies, write access in STT-MRAM is faster and more reliable. However, it is still slower than a CMOS-based C-element by an order of magnitude. These two issues can be effectively solved by a mechanism which slows down the pipeline in the back-up phases and keeps the states in a well-defined status until the writing procedure finishes. The writing signal can be synchronized with the NV cells in either an asynchronous or a synchronous way.
VOLUME 7, 2019
Synchronization can be achieved by gating the acknowledge signals with WR (the global writing signal), whereas asynchronous operation can be achieved by integrating synchronizers into the design. For writing, the acknowledge signal is gated at the input of the pipeline for a time. In Fig. 10, the simulation of the designed NV pipeline is shown. The standard operation mode begins when tokens propagate from stage to stage. The back-up procedure is initiated by WR when the third token arrives. Until all stages of the pipeline stabilize their states, the acknowledgment signal ack0 is blocked and the token source keeps its state. When the writing in the DSH-MRAM is done, ack0 is released by the writing signal, and the pipeline carries on its operation in the standard mode. After some time, due to circumstances such as a limited power supply, the power supply is disconnected. As soon as the system becomes stable again, the recovery process is initiated by the global signals AZ and RD. The information is restored from the NV memory and the pipeline continues its operation from the previously saved state.
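The back-up and recovery sequence described above can be traced with a toy model. The sketch below abstracts the pipeline as a list of token slots that shift by one stage per step, which is a strong simplification of the handshake-driven behavior in Fig. 10; the class, its methods, and the token names are hypothetical. It only illustrates the ordering: gate ack0 with WR, copy the stable state into the NV cells, release, lose power, and restore.

```python
class NVPipeline:
    """Toy model of an NV asynchronous pipeline (hypothetical abstraction)."""

    def __init__(self, n_stages):
        self.stages = [None] * n_stages  # volatile token held by each stage
        self.nv = [None] * n_stages      # non-volatile copy (DSH-MRAM)
        self.wr = False                  # global writing signal WR

    def step(self, new_token=None):
        """Shift tokens one stage; return (token leaving, input accepted)."""
        leaving = self.stages[-1]
        self.stages = [None] + self.stages[:-1]
        accepted = False
        if new_token is not None and not self.wr:  # ack0 is gated by WR
            self.stages[0] = new_token
            accepted = True
        return leaving, accepted

    def backup(self):
        """Block ack0, write the stable state to NV memory, release ack0."""
        self.wr = True
        self.nv = list(self.stages)
        self.wr = False

    def power_off(self):
        """All volatile state is lost."""
        self.stages = [None] * len(self.stages)

    def restore(self):
        """Recovery (AZ then RD): reload every stage from its NV copy."""
        self.stages = list(self.nv)


pipe = NVPipeline(3)
for token in ("t1", "t2", "t3"):
    pipe.step(token)
pipe.backup()      # initiated when the third token has arrived
pipe.power_off()
pipe.restore()
print(pipe.stages)
```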
C. BODY BIASING AND SUPPLY VOLTAGE CONTROL
A benefit of using FD-SOI technology is that we can control both the supply voltage and the Body Biasing (BB). In this paper, simulation has been done for both normal operation (under nominal supply voltage and without BB) and an operation where the supply voltage and body bias are varied. The latter is applied during the idle, back-up and restore phases of the circuit. The reduction in supply voltage is limited by the need to reliably sustain the data. In the back-up phase, to reduce the switching time of the DSH-MRAM and increase the writing current in the NV cell, we can utilize forward biasing and increase the supply voltage. By considering these scenarios, four algorithms of BB and supply voltage control have been developed, as listed in Table 2. The benchmarks used for simulation are the conventional implementation with/without BB and the proposed NV architecture with/without BB.
D. MEASUREMENT RESULTS
The proposed Spin C-element and conventional designs have been evaluated on the following circuits: a 1-bit pipeline and 4-bit, 8-bit and 16-bit 4-stage adders. All of them have been simulated in terms of leakage, writing energy and reading energy. Fig. 12 (a) shows the leakage power of the conventional volatile circuits with and without body biasing, with Vdd under 0.5 V. According to the simulations of the test circuits, body biasing can reduce the leakage power by roughly 36%. The energy consumed by the back-up and recovery (E_WR + E_RD) with and without body biasing is shown in Fig. 12 (b). In the pipeline case, body biasing reduces the energy consumption by up to 29% on average across the adder designs. Fig. 12 (c) shows the threshold time. Although body biasing has a positive impact on the energy used in back-up/recovery, there is no noticeable reduction of the threshold time in the circuits using the body biasing technique. The reason for this behavior is that the effect of reverse body biasing on the reduction of the leakage current in conventional circuits outweighs the impact of forward body biasing on the energy reduction in the proposed method.
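The observation that body biasing does not shorten the threshold time can be checked with a short calculation. The sketch below assumes that the threshold time is the break-even idle duration at which the leakage energy saved by powering down equals the back-up plus recovery energy, T_th = (E_WR + E_RD) / P_leak; this definition is an assumption, since the text does not state it. The absolute energy and power values are hypothetical, and the quoted 29% and 36% reductions are applied as if they were exact.

```python
def threshold_time(e_backup, e_restore, p_leak):
    """Break-even idle time in seconds: (E_WR + E_RD) / P_leak (assumed definition)."""
    return (e_backup + e_restore) / p_leak


# Hypothetical baseline values, for illustration only.
E_WR = 4.0e-12    # back-up energy in joules
E_RD = 1.0e-12    # recovery energy in joules
P_LEAK = 1.0e-6   # leakage power of the volatile circuit in watts

ENERGY_SCALE = 1.0 - 0.29  # back-up/recovery energy reduced by 29 % with BB
LEAK_SCALE = 1.0 - 0.36    # leakage power reduced by 36 % with BB

t_without_bb = threshold_time(E_WR, E_RD, P_LEAK)
t_with_bb = threshold_time(ENERGY_SCALE * E_WR, ENERGY_SCALE * E_RD,
                           LEAK_SCALE * P_LEAK)
ratio = t_with_bb / t_without_bb
print(f"threshold time ratio with/without BB: {ratio:.3f}")
```

Under these assumptions the ratio depends only on the two percentages, not on the baseline values: because the leakage of the conventional circuit drops by more than the back-up/recovery energy, the break-even time does not shrink.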
There are two effects which prevent the pipeline circuit from achieving optimal total writing energy. The first effect is the nonlinear relationship between the power and the current of the MTJ. The second effect is the increased switching time when the writing current gets close to the critical current I_C0. As a result, to achieve optimal writing energy, proper techniques have to be employed before or after fabrication. However, it is preferable to consider process-voltage-temperature variations after the layout and fabrication. This calibration technique requires some extra on-chip circuitry which can control the energy behavior of the non-volatile memory.
An interesting idea based on activity detection is presented in [28]. As shown in Fig. 13, this idea can be extended to the pipeline stage. Herein, we can locally control the writing energy of the Spin C-element. When the global writing signal is high, the circuit locally detects and analyzes the status of the stages. Once a stage receives a token, it sets its local writing signal. Through this technique, unnecessary writing operations are avoided, which results in a considerable saving in writing energy.
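The local write control can be expressed as a simple gating rule: a stage writes only when the global writing signal is high and the stage currently holds a token. The sketch below is a logical illustration of that rule, not the detection circuit of Fig. 13; the function names and the example occupancy pattern are hypothetical.

```python
def local_write_signals(global_wr, stage_has_token):
    """Per-stage write enables: global WR gated by local token presence."""
    return [bool(global_wr and has_token) for has_token in stage_has_token]


def writes_avoided(global_wr, stage_has_token):
    """Number of write operations skipped relative to writing every stage."""
    if not global_wr:
        return 0
    return sum(1 for has_token in stage_has_token if not has_token)


occupancy = [True, False, True, False]  # hypothetical: two of four stages hold tokens
enables = local_write_signals(True, occupancy)
skipped = writes_avoided(True, occupancy)
print(enables, skipped)
```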
E. TUNNEL MAGNETORESISTANCE (TMR) IMPACT ANALYSIS
In order to analyze the impact of different tunnel magnetoresistance (TMR) ratios, 1000 Monte Carlo (MC) simulations were performed on a single C-element cell to quantify the effects of process variation (PV) on the proposed device structure. In this simulation, the CMOS threshold voltage (V_th) and the TMR ratio were varied in the SPICE netlist following a Gaussian distribution with the mean equal to the nominal model card, which is provided in [29]. Fig. 11 (a) shows the average bit-error rate (BER) over the 1000 MC simulations. If we assume 10% variation on TMR, the maximum BER is exhibited by both designs for TMR = 100%. As TMR increases, BER reduces and reliability is improved. Compared to the STT-MRAM-based C-element, the proposed Differential Spin C-element achieves improved BER results due to employing the DSH-MRAM device. Fig. 11 (b) and (c) present the results by considering the (W/L) ratio of PMOS and NMOS, respectively. If the PMOS W/L is 4 and the NMOS W/L is 2, the BER is improved with larger TMR in Fig. 11 (b). If we change the PMOS W/L to 4 and the NMOS W/L to 2, the enhanced reliability of the proposed architecture is obtained in Fig. 11 (c).
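The trend that BER falls as TMR increases can be reproduced with a much simpler statistical model than the SPICE-level study above. The sketch below is an illustration only: it replaces the circuit by a sense margin I_read · R_P · TMR competing against a Gaussian sensing offset that stands in for V_th mismatch. The parallel resistance, offset spread, and seed are hypothetical; the 1000 runs, the 10% TMR variation, and the 40 µA read current are the figures quoted in the text.

```python
import random


def bit_error_rate(tmr_nominal, n_runs=1000, tmr_sigma_frac=0.10,
                   r_p=5e3, i_read=40e-6, offset_sigma=0.08, seed=1):
    """Monte Carlo BER estimate for a differential MTJ read (toy model).

    r_p (ohms) and offset_sigma (volts) are hypothetical parameters.
    """
    rng = random.Random(seed)
    errors = 0
    for _ in range(n_runs):
        # Sample the TMR ratio around its nominal value (10 % sigma).
        tmr = rng.gauss(tmr_nominal, tmr_sigma_frac * tmr_nominal)
        margin = i_read * r_p * tmr            # V_AP - V_P at the bit lines
        offset = rng.gauss(0.0, offset_sigma)  # sensing offset from V_th mismatch
        if margin + offset <= 0.0:             # offset overwhelms the margin
            errors += 1
    return errors / n_runs


ber_low_tmr = bit_error_rate(0.5)   # TMR = 50 %
ber_high_tmr = bit_error_rate(2.0)  # TMR = 200 %
print(ber_low_tmr, ber_high_tmr)
```

A larger TMR widens the voltage margin between the two MTJs, so fewer samples are flipped by the offset, which is the same qualitative behavior reported in Fig. 11 (a).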
V. APPLICATION TO IoT DEVICES
In conventional commercial processors, the clock period is set according to the worst-case timing path, although this path is not always activated. In asynchronous circuits, however, the completion of each operation is signaled by the completion of the output signals. Thus, the system proceeds as soon as each operation finishes, which means the circuit achieves the average-case propagation delay. When creating an asynchronous design without a clock, the more robust design is one which is insensitive to the delays of the interconnect. In such a design, whenever the circuit starts from a valid state, it cannot become unstable and reach an invalid state. In small circuits this design is possible, and as circuits get larger and have more paths, there is more chance for the circuit to benefit from being self-timed.
The microprocessor introduced here uses a quasi-delay insensitive (QDI) system, which makes no assumption regarding the delays in operators and wires except for isochronic forks, where the delay difference between fanout branches is assumed to be negligible. In terms of delays, QDI circuits are considered the most conservative circuits. Their dependence on delay is negligible, which makes them robust to process, voltage, and temperature (PVT) variations. This tolerance also protects QDI circuits from timing errors, since there is no clock in the circuit.
To remove other timing assumptions, the information about a signal's validity is encoded in the signal itself: one bit of data is communicated using a request-acknowledge wire and one dual-rail channel consisting of two data wires (one representing 0; the other, 1) [30].
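The dual-rail encoding described above can be illustrated with a small encoder and decoder. The sketch assumes the common convention in which both wires low means "no data" (NULL) and both wires high is an illegal code; the function names and the ordering of the rails in the tuple are illustrative choices.

```python
NULL = (0, 0)  # neither rail asserted: no valid data on the channel


def encode(bit):
    """Encode one bit as (rail0, rail1); exactly one rail is asserted."""
    if bit not in (0, 1):
        raise ValueError("bit must be 0 or 1")
    return (1, 0) if bit == 0 else (0, 1)


def decode(rails):
    """Return the bit, or None while the channel carries NULL."""
    if rails == NULL:
        return None
    if rails == (1, 1):
        raise ValueError("both rails high is not a valid dual-rail code")
    return 0 if rails == (1, 0) else 1


def is_valid(rails):
    """Validity is carried by the data wires themselves."""
    return rails in ((1, 0), (0, 1))


print(encode(0), encode(1), decode(NULL))
```

Because validity is visible on the data wires, a receiver can detect the arrival of data without a clock and without assumptions about wire delay.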
To overcome these issues, several asynchronous microprocessors have been developed and fabricated [31]. To test our approach at the system level, we compare the proposed method with other commercial processors in Table 3. Since our proposed Spin-C-processor employs an architecture similar to MiniMIPS, the word length is not changed in the given simulation. Our SPICE simulation is based on HP's 0.6 µm CMOS process. The SPICE simulation shows that the proposed Spin-C-processor can achieve a throughput similar to that of MiniMIPS. However, the fabricated MiniMIPS has lower performance due to layout and timing tolerances. As shown in Table 3, asynchronous processors are four-fold faster than similar synchronous commercial microprocessors. In terms of energy and power measurements, the proposed Spin-C-processor consumes less energy due to its efficient logic-in-memory architecture, which reduces the transfer of data from memory to the ALU.
VI. CONCLUSION
This paper has presented a new implementation of non-volatile C-element logic gates and used them to build non-volatile Half-Buffers. The simulation results show that by using body biasing techniques, the supply voltage can be reduced to as low as 220 mV in volatile mode and 540 mV in non-volatile mode. In addition, a 38% reduction in leakage current can be achieved by utilizing reverse body biasing. However, non-volatility brings some drawbacks, such as degradation in the timing parameters of the cells and extra circuitry which increases the leakage current. At the system level, the simulations suggest that the proposed circuit can be used in modern processor designs to meet the power constraints of IoT devices.
YOONSUK CHOI received the bachelor's and master's degrees from Korea University, Seoul, South Korea, and the Ph.D. degree in electrical engineering from the University of Nevada at Las Vegas, with a focus on computer engineering. Prior to his doctoral studies, he was a Research Engineer/Scientist in the fields of nanotechnology and semiconductors with LIG Nex1, South Korea, the Korea Institute of Science and Technology, the Inter-University Semiconductor Research Center, Seoul National University, and the Institute of Quantum Information Processing and Systems, University of Seoul. He is currently an Assistant Professor of computer engineering with the College of Engineering and Computer Science, California State University at Fullerton. He has conducted extensive research on nano-devices, such as the nano-wire transistor, single-electron transport, the nano-in-plane-gate transistor, and the quantum dot transistor. His current research is focused on a novel multimodal image fusion method with applications in many aspects of human life, such as urban mapping, target detection, concealed weapon identification, natural disaster monitoring, security, surveillance, and early detection of medical symptoms such as cancer.
His current research interests include multimodal data fusion and compression, hyperspectral image fusion and analysis, remote sensing image fusion and target detection, and multidimensional signal processing.
YU BAI received the B.S. degree in electrical engineering from National Aviation University, Ukraine, in 2008, the M.S. degree in electrical and computer engineering from the University of Texas Pan American, in 2011, and the Ph.D. degree in electrical engineering from the University of Central Florida, in 2016. Prior to his academic career, he was with Siemens Energy, Inc. He is currently an Assistant Professor with the Computer Engineering Program, College of Engineering and Computer Science, California State University at Fullerton. His research interests include stochastic computing, neuromorphic computing, FPGA design, nano-scale computing system with novel silicon and post-silicon devices, and low-power digital and mixed-signal CMOS circuit design.
