Hybrid CMOS/memristor circuits
Abstract: This is a brief review of recent work on prospective hybrid CMOS/memristor circuits. Such hybrids combine the flexibility, reliability and high functionality of the CMOS subsystem with the very high density of nanoscale thin-film resistance switching devices operating on different physical principles. Simulation and initial experimental results demonstrate that the performance of CMOS/memristor circuits for several important applications is well beyond the scaling limits of the conventional VLSI paradigm.
A Phase Change Memory and DRAM Based Framework For Energy-Efficient and High-Speed In-Memory Stochastic Computing
Convolutional Neural Networks (CNNs) have proven highly effective in many fields related to Artificial Intelligence (AI) and Machine Learning (ML). However, their significant computational and memory requirements make CNN processing both compute- and memory-intensive. In particular, the multiply-accumulate (MAC) operation, a fundamental building block of CNNs, requires an enormous number of arithmetic operations. As the input dataset grows, the traditional processor-centric von Neumann architecture becomes ill-suited to CNN-based applications, resulting in exponentially higher latency and energy costs that make CNN processing highly challenging.
To overcome these challenges, researchers have explored Processing-in-Memory (PIM), which places the processing unit inside or near the memory unit. This approach shortens data-migration distances and exploits the internal bandwidth available at the memory-chip level. However, developing a reliable PIM-based system with minimal hardware modifications and design complexity remains a significant challenge.
The report proposes combining different memory technologies, such as dynamic RAM (DRAM) and phase-change memory (PCM), with stochastic arithmetic and minimal add-on logic. Stochastic computing represents a number as a random bit-stream in which the probability of a bit being 1 encodes the value, instead of using a conventional binary representation. This reduces the hardware required for CNNs' arithmetic operations (multiplying two such numbers, for example, needs only an AND gate), which makes it possible to implement them with minimal add-on logic.
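A minimal Python sketch of the unipolar stochastic encoding described above, assuming values in [0, 1] and independent streams generated in software with `random.Random`; the report's in-memory generator hardware is not modelled. The helper names `to_stream`, `from_stream` and `sc_multiply` are hypothetical. The point it shows is that a bitwise AND of two independent streams yields a stream whose fraction of 1s approximates the product.

```python
import random
from typing import List

def to_stream(p: float, n: int, rng: random.Random) -> List[int]:
    """Encode p in [0, 1] as a unipolar stochastic bit-stream: P(bit = 1) = p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("unipolar stochastic numbers must lie in [0, 1]")
    return [1 if rng.random() < p else 0 for _ in range(n)]

def from_stream(bits: List[int]) -> float:
    """Decode a unipolar stream: the value is the fraction of 1s."""
    return sum(bits) / len(bits)

def sc_multiply(a_bits: List[int], b_bits: List[int]) -> List[int]:
    """Multiply two independent unipolar streams with a bitwise AND (one gate per bit)."""
    return [a & b for a, b in zip(a_bits, b_bits)]

rng = random.Random(42)  # fixed seed so the run is repeatable
n = 4096                 # stream length: longer streams give lower variance
a = to_stream(0.5, n, rng)
b = to_stream(0.75, n, rng)
product = from_stream(sc_multiply(a, b))
print(f"stochastic 0.5 * 0.75 = {product:.3f} (exact 0.375)")
```

The result is only approximate: its error shrinks with the square root of the stream length, which is the usual precision-versus-latency trade-off of stochastic computing, and the streams must be uncorrelated for the AND to compute a product.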
The report details the workflow for performing the arithmetic operations used by CNNs, including MAC, activation, and floating-point functions. The proposed solution includes designs for a scalable Stochastic Number Generator (SNG), a DRAM-based CNN accelerator, a CNN accelerator based on PCRAM (a non-volatile memory, NVM, technology), and DRAM-based stochastic-to-binary conversion (StoB) for in-situ deep learning. These designs use stochastic computing to reduce the hardware required for CNNs' arithmetic operations and to enable energy- and time-efficient CNN processing.
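The SNG and StoB blocks named above can be illustrated with a small software model. It assumes an 8-bit linear-feedback shift register (LFSR) as the pseudo-random source and a comparator-based generator, a common textbook SNG structure; it is not the report's in-DRAM or in-PCRAM circuit, and `lfsr8`, `sng` and `stob` are hypothetical names. The LFSR below uses the feedback polynomial x^8 + x^4 + x^3 + x^2 + 1, which is primitive, so it visits every nonzero 8-bit state once per 255 cycles; counting the 1s over one full period therefore recovers the binary input exactly.

```python
from typing import Iterator, List

def lfsr8(seed: int) -> Iterator[int]:
    """8-bit Fibonacci LFSR with recurrence s[n+8] = s[n] ^ s[n+2] ^ s[n+3] ^ s[n+4]
    (polynomial x^8 + x^4 + x^3 + x^2 + 1, maximal length 255)."""
    state = seed & 0xFF
    if state == 0:
        raise ValueError("seed must be nonzero")  # the all-zero state is a fixed point
    while True:
        yield state
        bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 4)) & 1
        state = (state >> 1) | (bit << 7)

def sng(x: int, length: int = 255, seed: int = 0xA5) -> List[int]:
    """Comparator-based SNG: emit 1 whenever the pseudo-random value r (1..255) is <= x."""
    if not 0 <= x <= 255:
        raise ValueError("x must be an 8-bit value")
    rng = lfsr8(seed)
    return [1 if next(rng) <= x else 0 for _ in range(length)]

def stob(bits: List[int]) -> int:
    """Stochastic-to-binary conversion: count the 1s in the stream."""
    return sum(bits)

stream = sng(100)
print(len(stream), stob(stream))  # over one full LFSR period the count equals x
```

Shorter streams trade exactness for latency; the count then only estimates x * length / 255.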
The report also identifies future research directions for the proposed designs, including in-situ PCRAM-based SNG, ODIN (A Bit-Parallel Stochastic Arithmetic Based Accelerator for In-Situ Neural Network Processing in Phase Change RAM), ATRIA (Bit-Parallel Stochastic Arithmetic Based Accelerator for In-DRAM CNN Processing), and AGNI (In-Situ, Iso-Latency Stochastic-to-Binary Number Conversion for In-DRAM Deep Learning), and presents initial findings for these ideas.
In summary, the report offers a comprehensive approach to the challenges of processing CNNs, and the proposed designs have the potential to significantly improve the energy and time efficiency of CNN processing. Using stochastic computing and different memory technologies enables reliable PIM-based systems with minimal hardware modifications and design complexity, providing a promising path for future CNN-based applications.
Abusing Hardware Race Conditions for High Throughput Energy Efficient Computation
We propose a novel computing approach, called "Race Logic", which uses a new data representation to accelerate a broad class of optimization problems, such as those solved by dynamic programming algorithms. The core idea of Race Logic is to deliberately engineer race conditions in a circuit to perform useful computation. In Race Logic, information is represented as a timing delay instead of as logic levels (as in conventional logic). Computations can then be performed by observing the relative propagation times of signals injected into a configurable circuit (i.e., the outcome of races through the circuit). In this dissertation I will introduce Race-Based computation and discuss multiple VLSI implementations. We begin by considering a synchronous approach, which uses simple clocked delay elements. Although this synchronous implementation outperforms highly optimized conventional implementations of the well-studied DNA sequence alignment problem, its third-order energy scaling with problem size and the limited dynamic range of its timing delays are its major pitfalls. Next, in search of energy efficiency, we study asynchronous designs to understand the performance trade-offs and applicability of this new architecture. Finally, I will present results from a prototype asynchronous Race Logic chip and demonstrate that Race-Based computations can align up to 10 million 50-symbol-long DNA sequences per second, about 2 to 3 orders of magnitude faster than state-of-the-art general-purpose computing systems.
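Race Logic's min-and-delay principle can be sketched in software as an event-driven simulation of edges racing through an alignment grid. This assumes Levenshtein-style delays (0 for a match, 1 for a mismatch or a gap) chosen purely for illustration; the dissertation's actual cell design and delay values are not reproduced, and `race_align` and its parameters are hypothetical. Each cell behaves like an OR gate that fires on the first arriving edge (a minimum), and each delay element adds its latency (an addition), so the time at which the far corner fires equals the minimum alignment cost.

```python
import heapq
from typing import Dict, List, Tuple

def race_align(a: str, b: str, match: int = 0, mismatch: int = 1, gap: int = 1) -> int:
    """Event-driven model of a race through a (len(a)+1) x (len(b)+1) grid of cells.

    An edge is injected at cell (0, 0) at time 0. Each cell fires when the first
    incoming edge arrives (OR gate = min) and forwards it through delay elements
    (delay = add). The firing time of the far corner is the alignment cost.
    """
    rows, cols = len(a) + 1, len(b) + 1
    fired: Dict[Tuple[int, int], int] = {}
    events: List[Tuple[int, int, int]] = [(0, 0, 0)]  # (arrival time, i, j)
    while events:
        t, i, j = heapq.heappop(events)
        if (i, j) in fired:
            continue  # this edge lost the race: the cell has already fired
        fired[(i, j)] = t
        if (i, j) == (rows - 1, cols - 1):
            return t
        if i + 1 < rows:  # consume a[i] against a gap
            heapq.heappush(events, (t + gap, i + 1, j))
        if j + 1 < cols:  # consume b[j] against a gap
            heapq.heappush(events, (t + gap, i, j + 1))
        if i + 1 < rows and j + 1 < cols:  # match or mismatch on the diagonal
            delay = match if a[i] == b[j] else mismatch
            heapq.heappush(events, (t + delay, i + 1, j + 1))
    raise RuntimeError("the far corner is always reachable")

print(race_align("kitten", "sitting"))  # 3
```

In software the priority queue plays the role of physical time, which makes this Dijkstra's algorithm on the grid; the circuit obtains the same minimum by letting all edges propagate in parallel, with no explicit comparisons.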
Memristors
Master's in Electronics and Telecommunications Engineering
The memristor was proposed by Leon Chua in 1971 purely for the sake of mathematical completeness, an idea that was not widely accepted by the scientific community. Only decades later, after HP's announcement in 2008, did memristors begin to be seen as realizable elements rather than mere mathematical curiosities. These devices have characteristics distinct from those of other known electronic devices. Besides being passive, they are characterized by the following postulates: a characteristic voltage-current loop that exhibits hysteresis and is single-valued at the origin, a gradual decrease of the area enclosed by the loop as the frequency increases, and purely resistive behaviour at infinite frequency.
A memristive device's response depends strongly on the amplitude and frequency of the input signal and on its own internal characteristics. There is therefore a clear need for procedures and attributes that allow different memristive devices to be classified and categorized. These attributes, essentially similar to the figures of merit of devices such as diodes and transistors, will in the near future make it easier to choose memristive devices for specific applications. To obtain these attributes, a morphological analysis of the area and length of the voltage-current loops of several theoretical memristive device models was carried out in MATLAB, varying their internal characteristics over arrays of input-signal frequency and amplitude values. A memristor emulator was then built to corroborate the theoretical results. To this end, the voltage-current loops were measured for several input values, and their areas and lengths were calculated.
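The loop-area attribute can be illustrated with the HP linear ion-drift memristor model, which is an assumption here since the thesis compares several models in MATLAB. This Python sketch uses hypothetical parameter values (R_on = 100 Ω, R_off = 16 kΩ, initial state 0.1, drift constant 1e4 per coulomb), integrates one period of a sinusoidal drive with forward Euler, and computes the area of each lobe of the pinched loop with the shoelace formula. In this model the two lobes are traversed in opposite directions, so they are measured separately and summed rather than combined in one signed sum. `simulate_hp`, `shoelace` and `loop_area` are hypothetical names; loop length could be computed analogously from the same samples.

```python
import math
from typing import List, Tuple

def simulate_hp(freq: float, v0: float = 1.0, r_on: float = 100.0, r_off: float = 16e3,
                x0: float = 0.1, k: float = 1e4, steps: int = 20000) -> Tuple[List[float], List[float]]:
    """Forward-Euler integration of one period of the HP linear ion-drift model
    driven by v(t) = v0 * sin(2*pi*freq*t). The state x in [0, 1] obeys dx/dt = k * i."""
    dt = 1.0 / (freq * steps)
    x = x0
    vs: List[float] = []
    cs: List[float] = []
    for n in range(steps + 1):
        v = v0 * math.sin(2.0 * math.pi * freq * n * dt)
        m = r_on * x + r_off * (1.0 - x)  # memristance for the current state
        i = v / m
        vs.append(v)
        cs.append(i)
        x = min(1.0, max(0.0, x + k * i * dt))  # clip the state to its physical range
    return vs, cs

def shoelace(pts: List[Tuple[float, float]]) -> float:
    """Absolute area of a closed polygon given by its vertices."""
    s = sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]))
    return abs(s) / 2.0

def loop_area(vs: List[float], cs: List[float]) -> float:
    """Total area of a pinched v-i loop: the positive and negative half-cycle
    lobes are measured separately so their orientations cannot cancel."""
    half = (len(vs) - 1) // 2
    pos = list(zip(vs[: half + 1], cs[: half + 1]))
    neg = list(zip(vs[half:], cs[half:]))
    return shoelace(pos) + shoelace(neg)

for f in (1.0, 10.0, 100.0):
    print(f"f = {f:6.1f} Hz  loop area = {loop_area(*simulate_hp(f)):.3e} V*A")
```

With these parameters the area shrinks as the frequency rises, in line with the second postulate: at high frequency the state barely moves within a period, so the loop collapses towards a straight line, i.e. purely resistive behaviour.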
CMOL FPGA circuits
Abstract: This paper describes the architecture of an FPGA-like fabric for future hybrid "CMOL" circuits. Such circuits will combine a semiconductor-transistor (CMOS) stack with a two-level nanowire crossbar that has molecular-scale two-terminal nanodevices (programmable diodes) formed at each crosspoint. We have developed a custom set of tools for CMOL FPGA circuit design automation and used it to evaluate the performance of these circuits on the Toronto 20 benchmark set, so far without optimizing several parameters, including the power supply voltage, nanowire pitch and maximum NOR fan-in. The results show that even without such optimization, CMOL FPGA circuits may provide a density advantage of more than two orders of magnitude over a traditional CMOS FPGA with the same CMOS design rules, at comparable time delay, acceptable power consumption, and potentially high defect tolerance.
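A simplified logic-level model of how programmable crosspoint diodes can configure logic, assuming (as the mention of NOR fan-in suggests) that the ON diodes feeding an output nanowire form a wired-OR that a CMOS inverter then negates into a NOR. It ignores nanowire pitch, diode current-voltage behaviour, timing and defects, and `program_crossbar` and `evaluate` are hypothetical names rather than the paper's design-automation tools. The fan-in check mirrors the "maximum NOR fan-in" parameter.

```python
from typing import Dict, List, Sequence

def program_crossbar(n_inputs: int, n_outputs: int,
                     connections: Dict[int, Sequence[int]], max_fan_in: int) -> List[List[bool]]:
    """Return a crosspoint matrix: matrix[i][o] is True when the diode joining
    input nanowire i to output nanowire o is programmed ON."""
    matrix = [[False] * n_outputs for _ in range(n_inputs)]
    for out, ins in connections.items():
        if len(set(ins)) > max_fan_in:
            raise ValueError(f"output {out}: fan-in {len(set(ins))} exceeds {max_fan_in}")
        for i in ins:
            matrix[i][out] = True
    return matrix

def evaluate(matrix: List[List[bool]], inputs: Sequence[bool]) -> List[bool]:
    """Each output nanowire is a wired-OR of its ON inputs; the inverter that
    reads it turns the OR into a NOR."""
    n_outputs = len(matrix[0])
    return [not any(inputs[i] and matrix[i][o] for i in range(len(inputs)))
            for o in range(n_outputs)]

# Output 0 = NOR(a, b); output 1 = NOT a (a NOR with fan-in 1); input 2 is unused.
xbar = program_crossbar(3, 2, {0: [0, 1], 1: [0]}, max_fan_in=2)
for a in (False, True):
    for b in (False, True):
        print(a, b, evaluate(xbar, [a, b, False]))
```

Because NOR is functionally complete, cascading such cells through further crossbar connections can in principle realize any logic function, subject to the fan-in limit.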