278 research outputs found
MFPA: Mixed-Signal Field Programmable Array for Energy-Aware Compressive Signal Processing
Compressive Sensing (CS) is a signal processing technique which reduces the number of samples taken per frame to decrease energy, storage, and data transmission overheads, as well as reducing time taken for data acquisition in time-critical applications. The tradeoff in such an approach is increased complexity of signal reconstruction. While several algorithms have been developed for CS signal reconstruction, hardware implementation of these algorithms is still an area of active research. Prior work has sought to utilize parallelism available in reconstruction algorithms to minimize hardware overheads; however, such approaches are limited by the underlying limitations in CMOS technology. Herein, the MFPA (Mixed-signal Field Programmable Array) approach is presented as a hybrid spin-CMOS reconfigurable fabric specifically designed for implementation of CS data sampling and signal reconstruction. The resulting fabric consists of 1) slice-organized analog blocks providing amplifiers, transistors, capacitors, and Magnetic Tunnel Junctions (MTJs) which are configurable to achieving square/square root operations required for calculating vector norms, 2) digital functional blocks which feature 6-input clockless lookup tables for computation of matrix inverse, and 3) an MRAM-based nonvolatile crossbar array for carrying out low-energy matrix-vector multiplication operations. The various functional blocks are connected via a global interconnect and spin-based analog-to-digital converters. Simulation results demonstrate significant energy and area benefits compared to equivalent CMOS digital implementations for each of the functional blocks used: this includes an 80% reduction in energy and 97% reduction in transistor count for the nonvolatile crossbar array, 80% standby power reduction and 25% reduced area footprint for the clockless lookup tables, and roughly 97% reduction in transistor count for a multiplier built using components from the analog blocks. Moreover, the proposed fabric yields 77% energy reduction compared to CMOS when used to implement CS reconstruction, in addition to latency improvements
Accelerating Time Series Analysis via Processing using Non-Volatile Memories
Time Series Analysis (TSA) is a critical workload for consumer-facing
devices. Accelerating TSA is vital for many domains as it enables the
extraction of valuable information and predict future events. The
state-of-the-art algorithm in TSA is the subsequence Dynamic Time Warping
(sDTW) algorithm. However, sDTW's computation complexity increases
quadratically with the time series' length, resulting in two performance
implications. First, the amount of data parallelism available is significantly
higher than the small number of processing units enabled by commodity systems
(e.g., CPUs). Second, sDTW is bottlenecked by memory because it 1) has low
arithmetic intensity and 2) incurs a large memory footprint. To tackle these
two challenges, we leverage Processing-using-Memory (PuM) by performing in-situ
computation where data resides, using the memory cells. PuM provides a
promising solution to alleviate data movement bottlenecks and exposes immense
parallelism.
In this work, we present MATSA, the first MRAM-based Accelerator for Time
Series Analysis. The key idea is to exploit magneto-resistive memory crossbars
to enable energy-efficient and fast time series computation in memory. MATSA
provides the following key benefits: 1) it leverages high levels of parallelism
in the memory substrate by exploiting column-wise arithmetic operations, and 2)
it significantly reduces the data movement costs performing computation using
the memory cells. We evaluate three versions of MATSA to match the requirements
of different environments (e.g., embedded, desktop, or HPC computing) based on
MRAM technology trends. We perform a design space exploration and demonstrate
that our HPC version of MATSA can improve performance by 7.35x/6.15x/6.31x and
energy efficiency by 11.29x/4.21x/2.65x over server CPU, GPU and PNM
architectures, respectively
Recommended from our members
ENERGY EFFICIENCY EXPLORATION OF COARSE-GRAIN RECONFIGURABLE ARCHITECTURE WITH EMERGING NONVOLATILE MEMORY
With the rapid growth in consumer electronics, people expect thin, smart and powerful devices, e.g. Google Glass and other wearable devices. However, as portable electronic products become smaller, energy consumption becomes an issue that limits the development of portable systems due to battery lifetime. In general, simply reducing device size cannot fully address the energy issue.
To tackle this problem, we propose an on-chip interconnect infrastructure and pro- gram storage structure for a coarse-grained reconfigurable architecture (CGRA) with emerging non-volatile embedded memory (MRAM). The interconnect is composed of a matrix of time-multiplexed switchboxes which can be dynamically reconfigured with the goal of energy reduction. The number of processors performing computation can also be adapted. The use of MRAM provides access to high-density storage and lower memory energy consumption versus more standard SRAM technologies. The combination of CGRA, MRAM, and flexible on-chip interconnection is considered for signal processing. This application domain is of interest based on its time-varying computing demands.
To evaluate CGRA architectural features, prototype architectures have been pro- totyped in a field-programmable gate array (FPGA). Measurements of energy, power, instruction count, and execution time performance are considered for a scalable num- ber of processors. Applications such as adaptive Viterbi decoding and Reed Solomon coding are used for evaluation. To complete this thesis, a time-scheduled switchbox was integrated into our CGRA model. This model was prototyped on an FPGA. It is shown that energy consumption can be reduced by about 30% if dynamic design reconfiguration is performed
3次元型トランジスタを用いたLSIの設計法
Design technology of LSI such as system LSI ana memory using 3 dimensional transistors has been described. By using 3 dimensional transistors, FinFET, double gate transistor and stacked double gate transistor, pattern area of logic gate and full adder circuit can be reduced drastically compared with that with conventional planar transistor. By using double gate transistor and Carbon Nano Tube transistor the reconfigurable circuit with many logic functions can be realized with small pattern area. Furthermore, staked NAND MRAM with 3 dimensional spin transistor has been newly proposed. This stacked NAND MRAM is a promising candidate which replaces currently available DRAM and NAND flash memory.Design technology of LSI such as system LSI ana memory using 3 dimensional transistors has been described. By using 3 dimensional transistors, FinFET, double gate transistor and stacked double gate transistor, pattern area of logic gate and full adder circuit can be reduced drastically compared with that with conventional planar transistor. By using double gate transistor and Carbon Nano Tube transistor the reconfigurable circuit with many logic functions can be realized with small pattern area. Furthermore, staked NAND MRAM with 3 dimensional spin transistor has been newly proposed. This stacked NAND MRAM is a promising candidate which replaces currently available DRAM and NAND flash memory
Heterogeneous Reconfigurable Fabrics for In-circuit Training and Evaluation of Neuromorphic Architectures
A heterogeneous device technology reconfigurable logic fabric is proposed which leverages the cooperating advantages of distinct magnetic random access memory (MRAM)-based look-up tables (LUTs) to realize sequential logic circuits, along with conventional SRAM-based LUTs to realize combinational logic paths. The resulting Hybrid Spin/Charge FPGA (HSC-FPGA) using magnetic tunnel junction (MTJ) devices within this topology demonstrates commensurate reductions in area and power consumption over fabrics having LUTs constructed with either individual technology alone. Herein, a hierarchical top-down design approach is used to develop the HSCFPGA starting from the configurable logic block (CLB) and slice structures down to LUT circuits and the corresponding device fabrication paradigms. This facilitates a novel architectural approach to reduce leakage energy, minimize communication occurrence and energy cost by eliminating unnecessary data transfer, and support auto-tuning for resilience. Furthermore, HSC-FPGA enables new advantages of technology co-design which trades off alternative mappings between emerging devices and transistors at runtime by allowing dynamic remapping to adaptively leverage the intrinsic computing features of each device technology. HSC-FPGA offers a platform for fine-grained Logic-In-Memory architectures and runtime adaptive hardware. An orthogonal dimension of fabric heterogeneity is also non-determinism enabled by either low-voltage CMOS or probabilistic emerging devices. It can be realized using probabilistic devices within a reconfigurable network to blend deterministic and probabilistic computational models. Herein, consider the probabilistic spin logic p-bit device as a fabric element comprising a crossbar-structured weighted array. The Programmability of the resistive network interconnecting p-bit devices can be achieved by modifying the resistive states of the array\u27s weighted connections. Thus, the programmable weighted array forms a CLB-scale macro co-processing element with bitstream programmability. This allows field programmability for a wide range of classification problems and recognition tasks to allow fluid mappings of probabilistic and deterministic computing approaches. In particular, a Deep Belief Network (DBN) is implemented in the field using recurrent layers of co-processing elements to form an n x m1 x m2 x ::: x mi weighted array as a configurable hardware circuit with an n-input layer followed by i ≥ 1 hidden layers. As neuromorphic architectures using post-CMOS devices increase in capability and network size, the utility and benefits of reconfigurable fabrics of neuromorphic modules can be anticipated to continue to accelerate
Design of Adiabatic MTJ-CMOS Hybrid Circuits
Low-power designs are a necessity with the increasing demand of portable
devices which are battery operated. In many of such devices the operational
speed is not as important as battery life. Logic-in-memory structures using
nano-devices and adiabatic designs are two methods to reduce the static and
dynamic power consumption respectively. Magnetic tunnel junction (MTJ) is an
emerging technology which has many advantages when used in logic-in-memory
structures in conjunction with CMOS. In this paper, we introduce a novel
adiabatic hybrid MTJ/CMOS structure which is used to design AND/NAND, XOR/XNOR
and 1-bit full adder circuits. We simulate the designs using HSPICE with 32nm
CMOS technology and compared it with a non-adiabatic hybrid MTJ/CMOS circuits.
The proposed adiabatic MTJ/CMOS full adder design has more than 7 times lower
power consumtion compared to the previous MTJ/CMOS full adder
- …