924 research outputs found
펄스 기반 피드 포워드 이퀄라이저를 갖춘 고용량 DRAM을 위한 컨트롤러 PHY 설계
학위논문 (박사) -- 서울대학교 대학원 : 공과대학 전기·정보공학부, 2020. 8. 김수환.A controller PHY for managed DRAM solution, which is a new memory structure to maximize capacity while minimizing refresh power, is presented. Inter-symbol interference is critical in such a high-capacity DRAM interface in which many DRAM chips share a command/address (C/A) channel. A pulse-based feed-forward equalizer (PB-FFE) is introduced to reduce ISI on a C/A channel. The controller PHY supports all the training sequences specified in the DDR4 standard. A glitch-free DCDL is also adopted to perform link training efficiently and to reduce training time.
The DQ transmitter adopts quarter-rate architecture to reduce output latency. For the quarter-rate transmitters in DQ, we propose a quadrature error corrector (QEC), in which clock signal phase errors are corrected using two replicas of the 4:1 serializer of the output stage. Pulse shrinking is used to compare and equalize the outputs of these two replica serializers.
A controller PHY was fabricated in 55nm CMOS. The PB-FFE increases the timing margin from 0.23UI to 0.29UI at 1067Mbps. At 2133Mbps, the read timing and voltage margins are 0.53UI and 211mV after read training, and the write margins are 0.72UI and 230mV after write training.
To validate the QEC effectiveness, a prototype quarter-rate transmitter, including the QEC, was fabricated to another chip in 65nm CMOS. Adopting our QEC, the experimental results show that the output phase errors of the transmitter are reduced to a residual error of 0.8ps, and the output eye width and height are improved by 84% and 61%, respectively, at a data-rate of 12.8Gbps.본 연구에서 용량을 최대화하면서도 리프레시 전력을 최소화할 수 있는 새로운 메모리 구조인 관리형 DRAM 솔루션을 위한 컨트롤러 PHY를 제시하였다. 이와 같은 고용량 DRAM 인터페이스에서는 많은 DRAM 칩이 명령 / 주소 (C/A) 채널을 공유하고 있어서 심볼 간 간섭이 발생한다. 본 연구에서는 이러한 C/A 채널에서의 심볼 간 간섭을 줄이기 위해 펄스 기반 피드 포워드 이퀄라이저 (PB-FFE)를 채택하였다. 또한 본 연구의 컨트롤러 PHY는 DDR4 표준에 지정된 모든 트레이닝 시퀀스를 지원한다. 링크 트레이닝을 효율적으로 수행하고 트레이닝 시간을 줄이기 위해 글리치가 발생하지 않는 디지털 제어 지연 라인 (DCDL)을 채택하였다.
컨트롤러 PHY의 DQ 송신기는 출력 대기 시간을 줄이기 위해 쿼터 레이트 구조를 채택하였다. 쿼터 레이트 송신기의 경우에는 직교 클럭 간 위상 오류가 출력 신호의 무결성에 영향을 주게 된다. 이러한 영향을 최소화하기 위해 본 연구에서는 출력 단의 4 : 1 직렬 변환기의 두 복제본을 사용하여 클록 신호 위상 오류를 수정하는 QEC (Quadrature Error Corrector)를 제안하였다. 복제된 2개의 직렬 변환기의 출력을 비교하고 균등화하기 위해 펄스 수축 지연 라인이 사용되었다.
컨트롤러 PHY는 55nm CMOS 공정으로 제조되었다. PB-FFE는 1067Mbps에서 C/A 채널 타이밍 마진을 0.23UI에서 0.29UI로 증가시킨다. 읽기 트레이닝 후 읽기 타이밍 및 전압 마진은 2133Mbps에서 0.53UI 및 211mV이고, 쓰기 트레이닝 후 쓰기 마진은 0.72UI 및 230mV이다.
QEC의 효과를 검증하기 위해 QEC를 포함한 프로토 타입 쿼터 레이트 송신기를 65nm CMOS의 다른 칩으로 제작하였다. QEC를 적용한 실험 결과, 송신기의 출력 위상 오류가 0.8ps의 잔류 오류로 감소하고, 출력 데이터 눈의 폭과 높이가 12.8Gbps의 데이터 속도에서 각각 84 %와 61 % 개선되었음을 보여준다.CHAPTER 1 INTRODUCTION 1
1.1 MOTIVATION 1
1.1.1 HEAVY LOAD C/A CHANNEL 5
1.1.2 QUARTER-RATE ARCHITECTURE IN DQ TRANSMITTER 7
1.1.3 SUMMARY 8
1.2 THESIS ORGANIZATION 10
CHAPTER 2 ARCHITECTURE 11
2.1 MDS DIMM STRUCTURE 11
2.2 MDS CONTROLLER 15
2.3 MDS CONTROLLER PHY 17
2.3.1 INITIALIZATION SEQUENCE 20
2.3.2 LINK TRAINING FINITE-STATE MACHINE 23
2.3.3 POWER DOWN MODE 28
CHAPTER 3 PULSE-BASED FEED-FORWARD EQUALIZER 29
3.1 COMMAND/ADDRESS CHANNEL 29
3.2 COMMAND/ADDRESS TRANSMITTER 33
3.3 PULSE-BASED FEED-FORWARD EQUALIZER 35
CHAPTER 4 CIRCUIT IMPLEMENTATION 39
4.1 BUILDING BLOCKS 39
4.1.1 ALL-DIGITAL PHASE-LOCKED LOOP (ADPLL) 39
4.1.2 ALL-DIGITAL DELAY-LOCKED LOOP (ADDLL) 44
4.1.3 GLITCH-FREE DCDL CONTROL 47
4.1.4 DUTY-CYCLE CORRECTOR (DCC) 50
4.1.5 DQ/DQS TRANSMITTER 52
4.1.6 DQ/DQS RECEIVER 54
4.1.7 ZQ CALIBRATION 56
4.2 MODELING AND VERIFICATION OF LINK TRAINING 59
4.3 BUILT-IN SELF-TEST CIRCUITS 66
CHAPTER 5 QUADRATURE ERROR CORRECTOR USING REPLICA SERIALIZERS AND PULSE-SHRINKING DELAY LINES 69
5.1 PHASE CORRECTION USING REPLICA SERIALIZERS AND PULSE-SHRINKING UNITS 69
5.2 OVERALL QEC ARCHITECTURE AND ITS OPERATION 71
5.3 FINE DELAY UNIT IN THE PSDL 76
CHAPTER 6 EXPERIMENTAL RESULTS 78
6.1 CONTROLLER PHY 78
6.2 PROTOTYPE QEC 88
CHAPTER 7 CONCLUSION 94
BIBLIOGRAPHY 96Docto
Solutions and application areas of flip-flop metastability
PhD ThesisThe state space of every continuous multi-stable system is bound to contain one or more
metastable regions where the net attraction to the stable states can be infinitely-small.
Flip-flops are among these systems and can take an unbounded amount of time to decide
which logic state to settle to once they become metastable. This problematic behavior is
often prevented by placing the setup and hold time conditions on the flip-flop’s input.
However, in applications such as clock domain crossing where these constraints cannot
be placed flip-flops can become metastable and induce catastrophic failures. These
events are fundamentally impossible to prevent but their probability can be significantly
reduced by employing synchronizer circuits. The latter grant flip-flops longer decision
time at the expense of introducing latency in processing the synchronized input.
This thesis presents a collection of research work involving the phenomenon of
flip-flop metastability in digital systems. The main contributions include three novel
solutions for the problem of synchronization. Two of these solutions are speculative
methods that rely on duplicate state machines to pre-compute data-dependent states
ahead of the completion of synchronization. Speculation is a core theme of this thesis
and is investigated in terms of its functional correctness, cost efficacy and fitness for
being automated by electronic design automation tools. It is shown that speculation
can outperform conventional synchronization solutions in practical terms and is a viable
option for future technologies. The third solution attempts to address the problem of
synchronization in the more-specific context of variable supply voltages. Finally, the
thesis also identifies a novel application of metastability as a means of quantifying
intra-chip physical parameters. A digital sensor is proposed based on the sensitivity
of metastable flip-flops to changes in their environmental parameters and is shown to
have better precision while being more compact than conventional digital sensors
Driving the Network-on-Chip Revolution to Remove the Interconnect Bottleneck in Nanoscale Multi-Processor Systems-on-Chip
The sustained demand for faster, more powerful chips has been met by the
availability of chip manufacturing processes allowing for the integration of increasing
numbers of computation units onto a single die. The resulting outcome,
especially in the embedded domain, has often been called SYSTEM-ON-CHIP
(SoC) or MULTI-PROCESSOR SYSTEM-ON-CHIP (MP-SoC).
MPSoC design brings to the foreground a large number of challenges, one of
the most prominent of which is the design of the chip interconnection. With a
number of on-chip blocks presently ranging in the tens, and quickly approaching
the hundreds, the novel issue of how to best provide on-chip communication
resources is clearly felt.
NETWORKS-ON-CHIPS (NoCs) are the most comprehensive and scalable
answer to this design concern. By bringing large-scale networking concepts to
the on-chip domain, they guarantee a structured answer to present and future
communication requirements. The point-to-point connection and packet switching
paradigms they involve are also of great help in minimizing wiring overhead
and physical routing issues. However, as with any technology of recent inception,
NoC design is still an evolving discipline. Several main areas of interest
require deep investigation for NoCs to become viable solutions:
• The design of the NoC architecture needs to strike the best tradeoff among
performance, features and the tight area and power constraints of the onchip
domain.
• Simulation and verification infrastructure must be put in place to explore,
validate and optimize the NoC performance.
• NoCs offer a huge design space, thanks to their extreme customizability in
terms of topology and architectural parameters. Design tools are needed
to prune this space and pick the best solutions.
• Even more so given their global, distributed nature, it is essential to evaluate
the physical implementation of NoCs to evaluate their suitability for
next-generation designs and their area and power costs.
This dissertation performs a design space exploration of network-on-chip architectures,
in order to point-out the trade-offs associated with the design of
each individual network building blocks and with the design of network topology
overall. The design space exploration is preceded by a comparative analysis
of state-of-the-art interconnect fabrics with themselves and with early networkon-
chip prototypes. The ultimate objective is to point out the key advantages
that NoC realizations provide with respect to state-of-the-art communication
infrastructures and to point out the challenges that lie ahead in order to make
this new interconnect technology come true. Among these latter, technologyrelated
challenges are emerging that call for dedicated design techniques at all
levels of the design hierarchy. In particular, leakage power dissipation, containment
of process variations and of their effects. The achievement of the above
objectives was enabled by means of a NoC simulation environment for cycleaccurate
modelling and simulation and by means of a back-end facility for the
study of NoC physical implementation effects. Overall, all the results provided
by this work have been validated on actual silicon layout
Design of variation-tolerant synchronizers for multiple clock and voltage domains
PhD ThesisParametric variability increasingly affects the performance of electronic circuits as
the fabrication technology has reached the level of 32nm and beyond. These
parameters may include transistor Process parameters (such as threshold
voltage), supply Voltage and Temperature (PVT), all of which could have a
significant impact on the speed and power consumption of the circuit, particularly
if the variations exceed the design margins. As systems are designed with more
asynchronous protocols, there is a need for highly robust synchronizers and
arbiters. These components are often used as interfaces between communication
links of different timing domains as well as sampling devices for asynchronous
inputs coming from external components. These applications have created a need
for new robust designs of synchronizers and arbiters that can tolerate process,
voltage and temperature variations.
The aim of this study was to investigate how synchronizers and arbiters should be
designed to tolerate parametric variations. All investigations focused mainly on
circuit-level and transistor level designs and were modeled and simulated in the
UMC90nm CMOS technology process. Analog simulations were used to measure
timing parameters and power consumption along with a “Monte Carlo” statistical
analysis to account for process variations.
Two main components of synchronizers and arbiters were primarily investigated:
flip-flop and mutual-exclusion element (MUTEX). Both components can violate the
input timing conditions, setup and hold window times, which could cause
metastability inside their bistable elements and possibly end in failures. The
mean-time between failures is an important reliability feature of any synchronizer
delay through the synchronizer.
The MUTEX study focused on the classical circuit, in addition to a number of
tolerance, based on increasing internal gain by adding current sources, reducing
the capacitive loading, boosting the transconductance of the latch, compensating
the existing Miller capacitance, and adding asymmetry to maneuver the metastable
point. The results showed that some circuits had little or almost no improvements,
while five techniques showed significant improvements by reducing τ and
maintaining high tolerance.
Three design approaches are proposed to provide variation-tolerant
synchronizers. wagging synchronizer proposed to First, the is significantly
increase reliability over that of the conventional two flip-flop synchronizer. The
robustness of the wagging technique can be enhanced by using robust τ latches or
adding one more cycle of synchronization. The second approach is the
Metastability Auto-Detection and Correction (MADAC) latch which relies on swiftly
detecting a metastable event and correcting it by enforcing the previously stored
logic value. This technique significantly reduces the resolution time down from
uncertain
synchronization technique is proposed to transfer signals between Multiple-
Voltage Multiple-Clock Domains (MVD/MCD) that do not require conventional
level-shifters between the domains or multiple power supplies within each
domain. This interface circuit uses a synchronous set and feedback reset protocol
which provides level-shifting and synchronization of all signals between the
domains, from a wide range of voltage-supplies and clock frequencies.
Overall, synchronizer circuits can tolerate variations to a greater extent by
employing the wagging technique or using a MADAC latch, while MUTEX tolerance
can suffice with small circuit modifications. Communication between MVD/MCD
can be achieved by an asynchronous handshake
without a need for adding level-shifters.The Saudi Arabian Embassy in London,
Umm Al-Qura University, Saudi Arabi
Simulation Studies of Digital Filters for the Phase-II Upgrade of the Liquid-Argon Calorimeters of the ATLAS Detector at the High-Luminosity LHC
Am Large Hadron Collider und am ATLAS-Detektor werden umfangreiche Aufrüstungsarbeiten vorgenommen. Diese Arbeiten sind in mehrere Phasen gegliedert und umfassen unter Anderem Änderungen an der Ausleseelektronik der Flüssigargonkalorimeter; insbesondere ist es geplant, während der letzten Phase ihren Primärpfad vollständig auszutauschen. Die Elektronik besteht aus einem analogen und einem digitalen Teil: während ersterer die Signalpulse verstärkt und sie zur leichteren Abtastung verformt, führt letzterer einen Algorithmus zur Energierekonstruktion aus. Beide Teile müssen während der Aufrüstung verbessert werden, damit der Detektor interessante Kollisionsereignisse präzise rekonstruieren und uninteressante effizient verwerfen kann.
In dieser Dissertation werden Simulationsstudien präsentiert, die sowohl die analoge als auch die digitale Auslese der Flüssigargonkalorimeter optimieren. Die Korrektheit der Simulation wird mithilfe von Kalibrationsdaten geprüft, die im sog. Run 2 des ATLAS-Detektors aufgenommen worden sind. Der Einfluss verschiedener Parameter der Signalverformung auf die Energieauflösung wird analysiert und die Nützlichkeit einer erhöhten Abtastrate von 80 MHz untersucht. Des Weiteren gibt diese Arbeit eine Übersicht über lineare und nichtlineare Energierekonstruktionsalgorithmen. Schließlich wird eine Auswahl von ihnen hinsichtlich ihrer Leistungsfähigkeit miteinander verglichen.
Es wird gezeigt, dass ein Erhöhen der Ordnung des Optimalfilters, der gegenwärtig verwendete Algorithmus, die Energieauflösung um 2 bis 3 % verbessern kann, und zwar in allen Regionen des Detektors. Der Wiener Filter mit Vorwärtskorrektur, ein nichtlinearer Algorithmus, verbessert sie um bis zu 10 % in einigen Regionen, verschlechtert sie aber in anderen. Ein Zusammenhang dieses Verhaltens mit der Wahrscheinlichkeit fälschlich detektierter Kalorimetertreffer wird aufgezeigt und mögliche Lösungen werden diskutiert.:1 Introduction
2 An Overview of High-Energy Particle Physics
2.1 The Standard Model of Particle Physics
2.2 Verification of the Standard Model
2.3 Beyond the Standard Model
3 LHC, ATLAS, and the Liquid-Argon Calorimeters
3.1 The Large Hadron Collider
3.2 The ATLAS Detector
3.3 The ATLAS Liquid-Argon Calorimeters
4 Upgrades to the ATLAS Liquid-Argon Calorimeters
4.1 Physics Goals
4.2 Phase-I Upgrade
4.3 Phase-II Upgrade
5 Noise Suppression With Digital Filters
5.1 Terminology
5.2 Digital Filters
5.3 Wiener Filter
5.4 Matched Wiener Filter
5.5 Matched Wiener Filter Without Bias
5.6 Timing Reconstruction, Optimal Filtering, and Selection Criteria
5.7 Forward Correction
5.8 Sparse Signal Restoration
5.9 Artificial Neural Networks
6 Simulation of the ATLAS Liquid-Argon Calorimeter Readout Electronics
6.1 AREUS
6.2 Hit Generation and Sampling
6.3 Pulse Shapes
6.4 Thermal Noise
6.5 Quantization
6.6 Digital Filters
6.7 Statistical Analysis
7 Results of the Readout Electronics Simulation Studies
7.1 Statistical Treatment
7.2 Simulation Verification Using Run-2 Data
7.3 Dependence of the Noise on the Shaping Time
7.4 The Analog Readout Electronics and the ADC
7.5 The Optimal Filter (OF)
7.6 The Wiener Filter
7.7 The Wiener Filter with Forward Correction (WFFC)
7.8 Final Comparison and Conclusions
8 Conclusions and Outlook
AppendicesThe Large Hadron Collider and the ATLAS detector are undergoing a comprehensive upgrade split into multiple phases. This effort also affects the liquid-argon calorimeters, whose main readout electronics will be replaced completely during the final phase. The electronics consist of an analog and a digital portion: the former amplifies the signal and shapes it to facilitate sampling, the latter executes an energy reconstruction algorithm. Both must be improved during the upgrade so that the detector may accurately reconstruct interesting collision events and efficiently suppress uninteresting ones.
In this thesis, simulation studies are presented that optimize both the analog and the digital readout of the liquid-argon calorimeters. The simulation is verified using calibration data that has been measured during Run 2 of the ATLAS detector. The influence of several parameters of the analog shaping stage on the energy resolution is analyzed and the utility of an increased signal sampling rate of 80 MHz is investigated. Furthermore, a number of linear and non-linear energy reconstruction algorithms is reviewed and the performance of a selection of them is compared.
It is demonstrated that increasing the order of the Optimal Filter, the algorithm currently in use, improves energy resolution by 2 to 3 % in all detector regions. The Wiener filter with forward correction, a non-linear algorithm, gives an improvement of up to 10 % in some regions, but degrades the resolution in others. A link between this behavior and the probability of falsely detected calorimeter hits is shown and possible solutions are discussed.:1 Introduction
2 An Overview of High-Energy Particle Physics
2.1 The Standard Model of Particle Physics
2.2 Verification of the Standard Model
2.3 Beyond the Standard Model
3 LHC, ATLAS, and the Liquid-Argon Calorimeters
3.1 The Large Hadron Collider
3.2 The ATLAS Detector
3.3 The ATLAS Liquid-Argon Calorimeters
4 Upgrades to the ATLAS Liquid-Argon Calorimeters
4.1 Physics Goals
4.2 Phase-I Upgrade
4.3 Phase-II Upgrade
5 Noise Suppression With Digital Filters
5.1 Terminology
5.2 Digital Filters
5.3 Wiener Filter
5.4 Matched Wiener Filter
5.5 Matched Wiener Filter Without Bias
5.6 Timing Reconstruction, Optimal Filtering, and Selection Criteria
5.7 Forward Correction
5.8 Sparse Signal Restoration
5.9 Artificial Neural Networks
6 Simulation of the ATLAS Liquid-Argon Calorimeter Readout Electronics
6.1 AREUS
6.2 Hit Generation and Sampling
6.3 Pulse Shapes
6.4 Thermal Noise
6.5 Quantization
6.6 Digital Filters
6.7 Statistical Analysis
7 Results of the Readout Electronics Simulation Studies
7.1 Statistical Treatment
7.2 Simulation Verification Using Run-2 Data
7.3 Dependence of the Noise on the Shaping Time
7.4 The Analog Readout Electronics and the ADC
7.5 The Optimal Filter (OF)
7.6 The Wiener Filter
7.7 The Wiener Filter with Forward Correction (WFFC)
7.8 Final Comparison and Conclusions
8 Conclusions and Outlook
Appendice
Design and Validation of Network-on-Chip Architectures for the Next Generation of Multi-synchronous, Reliable, and Reconfigurable Embedded Systems
NETWORK-ON-CHIP (NoC) design is today at a crossroad. On one hand, the
design principles to efficiently implement interconnection networks in the
resource-constrained on-chip setting have stabilized. On the other hand,
the requirements on embedded system design are far from stabilizing. Embedded
systems are composed by assembling together heterogeneous components featuring
differentiated operating speeds and ad-hoc counter measures must be adopted
to bridge frequency domains. Moreover, an unmistakable trend toward enhanced
reconfigurability is clearly underway due to the increasing complexity of applications.
At the same time, the technology effect is manyfold since it provides unprecedented
levels of system integration but it also brings new severe constraints
to the forefront: power budget restrictions, overheating concerns, circuit delay and
power variability, permanent fault, increased probability of transient faults.
Supporting different degrees of reconfigurability and flexibility in the parallel
hardware platform cannot be however achieved with the incremental evolution of
current design techniques, but requires a disruptive approach and a major increase
in complexity. In addition, new reliability challenges cannot be solved by using
traditional fault tolerance techniques alone but the reliability approach must be
also part of the overall reconfiguration methodology.
In this thesis we take on the challenge of engineering a NoC architectures for
the next generation systems and we provide design methods able to overcome the
conventional way of implementing multi-synchronous, reliable and reconfigurable
NoC. Our analysis is not only limited to research novel approaches to the specific
challenges of the NoC architecture but we also co-design the solutions in a single
integrated framework. Interdependencies between different NoC features are
detected ahead of time and we finally avoid the engineering of highly optimized solutions
to specific problems that however coexist inefficiently together in the final
NoC architecture. To conclude, a silicon implementation by means of a testchip
tape-out and a prototype on a FPGA board validate the feasibility and effectivenes
Low power digital baseband core for wireless Micro-Neural-Interface using CMOS sub/near-threshold circuit
This thesis presents the work on designing and implementing a low power digital baseband core with custom-tailored protocol for wirelessly powered Micro-Neural-Interface (MNI) System-on-Chip (SoC) to be implanted within the skull to record cortical neural activities. The core, on the tag end of distributed sensors, is designed to control the operation of individual MNI and communicate and control MNI devices implanted across the brain using received downlink commands from external base station and store/dump targeted neural data uplink in an energy efficient manner. The application specific protocol defines three modes (Time Stamp Mode, Streaming Mode and Snippet Mode) to extract neural signals with on-chip signal conditioning and discrimination. In Time Stamp Mode, Streaming Mode and Snippet Mode, the core executes basic on-chip spike discrimination and compression, real-time monitoring and segment capturing of neural signals so single spike timing as well as inter-spike timing can be retrieved with high temporal and spatial resolution. To implement the core control logic using sub/near-threshold logic, a novel digital design methodology is proposed which considers INWE (Inverse-Narrow-Width-Effect), RSCE (Reverse-Short-Channel-Effect) and variation comprehensively to size the transistor width and length accordingly to achieve close-to-optimum digital circuits. Ultra-low-power cell library containing 67 cells including physical cells and decoupling capacitor cells using the optimum fingers is designed, laid-out, characterized, and abstracted. A robust on-chip sense-amp-less SRAM memory (8X32 size) for storing neural data is implemented using 8T topology and LVT fingers. The design is validated with silicon tapeout and measurement shows the digital baseband core works at 400mV and 1.28 MHz system clock with an average power consumption of 2.2 μW, resulting in highest reported communication power efficiency of 290Kbps/μW to date
Techniques of Energy-Efficient VLSI Chip Design for High-Performance Computing
How to implement quality computing with the limited power budget is the key factor to move very large scale integration (VLSI) chip design forward. This work introduces various techniques of low power VLSI design used for state of art computing. From the viewpoint of power supply, conventional in-chip voltage regulators based on analog blocks bring the large overhead of both power and area to computational chips. Motivated by this, a digital based switchable pin method to dynamically regulate power at low circuit cost has been proposed to make computing to be executed with a stable voltage supply. For one of the widely used and time consuming arithmetic units, multiplier, its operation in logarithmic domain shows an advantageous performance compared to that in binary domain considering computation latency, power and area. However, the introduced conversion error reduces the reliability of the following computation (e.g. multiplication and division.). In this work, a fast calibration method suppressing the conversion error and its VLSI implementation are proposed. The proposed logarithmic converter can be supplied by dc power to achieve fast conversion and clocked power to reduce the power dissipated during conversion. Going out of traditional computation methods and widely used static logic, neuron-like cell is also studied in this work. Using multiple input floating gate (MIFG) metal-oxide semiconductor field-effect transistor (MOSFET) based logic, a 32-bit, 16-operation arithmetic logic unit (ALU) with zipped decoding and a feedback loop is designed. The proposed ALU can reduce the switching power and has a strong driven-in capability due to coupling capacitors compared to static logic based ALU. Besides, recent neural computations bring serious challenges to digital VLSI implementation due to overload matrix multiplications and non-linear functions. An analog VLSI design which is compatible to external digital environment is proposed for the network of long short-term memory (LSTM). The entire analog based network computes much faster and has higher energy efficiency than the digital one
- …