978 research outputs found
Memory and information processing in neuromorphic systems
A striking difference between brain-inspired neuromorphic processors and
current von Neumann processors architectures is the way in which memory and
processing is organized. As Information and Communication Technologies continue
to address the need for increased computational power through the increase of
cores within a digital processor, neuromorphic engineers and scientists can
complement this need by building processor architectures where memory is
distributed with the processing. In this paper we present a survey of
brain-inspired processor architectures that support models of cortical networks
and deep neural networks. These architectures range from serial clocked
implementations of multi-neuron systems to massively parallel asynchronous ones
and from purely digital systems to mixed analog/digital systems which implement
more biological-like models of neurons and synapses together with a suite of
adaptation and learning mechanisms analogous to the ones found in biological
nervous systems. We describe the advantages of the different approaches being
pursued and present the challenges that need to be addressed for building
artificial neural processing systems that can display the richness of behaviors
seen in biological systems.Comment: Submitted to Proceedings of IEEE, review of recently proposed
neuromorphic computing platforms and system
Low-Power Heterogeneous Graphene Nanoribbon-CMOS Multistate Volatile Memory Circuit
Graphene is an emerging nanomaterial believed to be a potential candidate for post-Si nanoelectronics, due to its exotic properties. Recently, a new graphene nanoribbon crossbar (xGNR) device was proposed which exhibits negative differential resistance (NDR). In this paper, a multi-state memory design is presented that can store multiple bits in a single cell enabled by this xGNR device, called Graphene Nanoribbon Tunneling Random Access Memory (GNTRAM). An approach to increase the number of bits per cell is explored alternative to physical scaling to overcome CMOS SRAM limitations. A comprehensive design for quaternary GNTRAM is presented as a baseline, implemented with a heterogeneous integration between graphene and CMOS. Sources of leakage and approaches to mitigate them are investigated. This design is extensively benchmarked against 16nm CMOS SRAMs and 3T DRAM. The proposed quaternary cell shows up to 2.27x density benefit vs. 16nm CMOS SRAMs and 1.8x vs. 3T DRAM. It has comparable read performance and is power-efficient, up to 1.32x during active period and 818x during stand-by against high performance SRAMs. Multi-state GNTRAM has the potential to realize high-density low-power nanoscale embedded memories. Further improvements may be possible by using graphene more extensively, as graphene transistors become available in future
Accelerate & Actualize: Can 2D Materials Bridge the Gap Between Neuromorphic Hardware and the Human Brain?
Two-dimensional (2D) materials present an exciting opportunity for devices
and systems beyond the von Neumann computing architecture paradigm due to their
diversity of electronic structure, physical properties, and atomically-thin,
van der Waals structures that enable ease of integration with conventional
electronic materials and silicon-based hardware. All major classes of
non-volatile memory (NVM) devices have been demonstrated using 2D materials,
including their operation as synaptic devices for applications in neuromorphic
computing hardware. Their atomically-thin structure, superior physical
properties, i.e., mechanical strength, electrical and thermal conductivity, as
well as gate-tunable electronic properties provide performance advantages and
novel functionality in NVM devices and systems. However, device performance and
variability as compared to incumbent materials and technology remain major
concerns for real applications. Ultimately, the progress of 2D materials as a
novel class of electronic materials and specifically their application in the
area of neuromorphic electronics will depend on their scalable synthesis in
thin-film form with desired crystal quality, defect density, and phase purity.Comment: Neuromorphic Computing, 2D Materials, Heterostructures, Emerging
Memory Devices, Resistive, Phase-Change, Ferroelectric, Ferromagnetic,
Crossbar Array, Machine Learning, Deep Learning, Spiking Neural Network
A Survey on Low-Power Techniques with Emerging Technologies: From Devices to Systems
Nowadays, power consumption is one of the main limitations of electronic systems. In this context, novel and emerging devices provide us with new opportunities to keep the trend to low-power design. In this survey paper, we present a transversal survey on energy efficient techniques ranging from devices to architectures. The actual trends of device research, with fully-depleted planar devices, tri-gate geometries and gate-all-around structures, allows us to reach an increasingly higher level of performance while reducing the associated power. In addition, beyond the simple device properties enhancements, emerging devices also lead to innovations at circuit and architectural levels. In particular, devices whose properties can be tuned through additional terminals enable a fine and dynamic control of device threshold. They also enable designers to realize logic gates and to implement power-related techniques in a compact way unreachable to standard technologies. These innovations reduce the power consumption at the gate level and unlock new means of actuation in architectural solutions like adaptive voltage and frequency scaling
Memory Management for Emerging Memory Technologies
The Memory Wall, or the gap between CPU speed and main memory latency, is ever increasing. The latency of Dynamic Random-Access Memory (DRAM) is now of the order of hundreds of CPU cycles. Additionally, the DRAM main memory is experiencing power, performance and capacity constraints that limit process technology scaling. On the other hand, the workloads running on such systems are themselves changing due to virtualization and cloud computing demanding more performance of the data centers. Not only do these workloads have larger working set sizes, but they are also changing the way memory gets used, resulting in higher sharing and increased bandwidth demands. New Non-Volatile Memory technologies (NVM) are emerging as an answer to the current main memory issues.
This thesis looks at memory management issues as the emerging memory technologies get integrated into the memory hierarchy. We consider the problems at various levels in the memory hierarchy, including sharing of CPU LLC, traffic management to future non-volatile memories behind the LLC, and extending main memory through the employment of NVM.
The first solution we propose is “Adaptive Replacement and Insertion" (ARI), an adaptive approach to last-level CPU cache management, optimizing the cache miss rate and writeback rate simultaneously. Our specific focus is to reduce writebacks as much as possible while maintaining or improving miss rate relative to conventional LRU replacement policy, with minimal hardware overhead. ARI reduces writebacks on benchmarks from SPEC2006 suite on average by 32.9% while also decreasing misses on average by 4.7%. In a PCM based memory system, this decreases energy consumption by 23% compared to LRU and provides a 49% lifetime improvement beyond what is possible with randomized wear-leveling.
Our second proposal is “Variable-Timeslice Thread Scheduling" (VATS), an OS kernel-level approach to CPU cache sharing. With modern, large, last-level caches (LLC), the time to fill the LLC is greater than the OS scheduling window. As a result, when a thread aggressively thrashes the LLC by replacing much of the data in it, another thread may not be able to recover its working set before being rescheduled. We isolate the threads in time by increasing their allotted time quanta, and allowing larger periods of time between interfering threads. Our approach, compared to conventional scheduling, mitigates up to 100% of the performance loss caused by CPU LLC interference. The system throughput is boosted by up to 15%.
As an unconventional approach to utilizing emerging memory technologies, we present a Ternary Content-Addressable Memory (TCAM) design with Flash transistors. TCAM is successfully used in network routing but can also be utilized in the OS Virtual Memory applications. Based on our layout and circuit simulation experiments, we conclude that our FTCAM block achieves an area improvement of 7.9Ă— and a power improvement of 1.64Ă— compared to a CMOS approach.
In order to lower the cost of Main Memory in systems with huge memory demand, it is becoming practical to extend the DRAM in the system with the less-expensive NVMe Flash, for a much lower system cost. However, given the relatively high Flash devices access latency, naively using them as main memory leads to serious performance degradation. We propose OSVPP, a software-only, OS swap-based page prefetching scheme for managing such hybrid DRAM + NVM systems. We show that it is possible to gain about 50% of the lost performance due to swapping into the NVM and thus enable the utilization of such hybrid systems for memory-hungry applications, lowering the memory cost while keeping the performance comparable to the DRAM-only system
- …