    An Analog VLSI Deep Machine Learning Implementation

    Machine learning systems provide automated data processing and are used in a wide range of applications. Direct processing of raw high-dimensional data such as images and video by machine learning systems is impractical, both because of prohibitive power consumption and because of the “curse of dimensionality,” which makes learning tasks exponentially more difficult as dimension increases. Deep machine learning (DML) mimics the hierarchical representation of information in the human brain to achieve robust automated feature extraction, reducing the dimension of such data. However, the computational complexity of DML systems limits large-scale implementations on standard digital computers. Custom analog signal processing (ASP) can yield much higher energy efficiency than digital signal processing (DSP), offering a means of overcoming these limitations. The purpose of this work is to develop an analog implementation of a DML system. First, an analog memory is proposed as an essential component of the learning system. It uses the charge trapped on a floating gate to store an analog value in a non-volatile way. The memory is compatible with a standard digital CMOS process and allows random-access bi-directional updates without the need for an on-chip charge pump or high-voltage switch. Second, an architecture and circuits are developed to realize an online k-means clustering algorithm in analog signal processing. The system automatically recognizes underlying data patterns and extracts statistical parameters of the data online. This unsupervised learning system constitutes the computation node in the deep machine learning hierarchy. Third, a 3-layer, 7-node analog deep machine learning engine is designed, featuring online unsupervised trainability and non-volatile floating-gate analog storage. It uses a massively parallel, reconfigurable current-mode analog architecture to realize efficient computation, and algorithm-level feedback is leveraged to provide robustness to circuit imperfections in analog signal processing. At a processing speed of 8300 input vectors per second, it achieves a peak energy efficiency of 1×10¹² operations per second per watt. In addition, an ultra-low-power tunable bump circuit is presented to provide similarity measures in analog signal processing. It incorporates a novel wide-input-range tunable pseudo-differential transconductor. The circuit demonstrates tunability of bump center, width, and height, with power consumption significantly lower than that of previous work.
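
The online k-means step that each computation node performs can be illustrated in software. The sketch below is a generic sequential (MacQueen-style) k-means update written in NumPy, not a model of the analog circuits described above; the 1/count learning rate, the function name `online_kmeans_step`, the initial centroids, and the synthetic two-cluster data are all illustrative assumptions.

```python
import numpy as np

def online_kmeans_step(centroids, counts, x):
    """Assign x to its nearest centroid and move that centroid toward x.

    Hypothetical helper: uses a per-centroid learning rate of 1/count,
    so each centroid tracks the running mean of the inputs assigned to it.
    """
    # Squared Euclidean distance from x to every centroid
    dists = np.sum((centroids - x) ** 2, axis=1)
    winner = int(np.argmin(dists))
    counts[winner] += 1
    eta = 1.0 / counts[winner]
    # Move only the winning centroid a fraction eta of the way toward x
    centroids[winner] += eta * (x - centroids[winner])
    return winner

rng = np.random.default_rng(0)
# Synthetic data: two well-separated 2-D clusters around (0, 0) and (5, 5)
data = np.vstack([rng.normal(0.0, 0.3, size=(200, 2)),
                  rng.normal(5.0, 0.3, size=(200, 2))])
rng.shuffle(data)  # present the samples in random order, one at a time

centroids = np.array([[1.0, 1.0], [4.0, 4.0]])  # hypothetical initial guesses
counts = np.zeros(len(centroids))
for x in data:
    online_kmeans_step(centroids, counts, x)
print(centroids)
```

Because each input is processed once and then discarded, only the centroids and counts need to be stored, which is what makes the online variant attractive for hardware with limited memory.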

    Architecting Memory Systems for Emerging Technologies

    The advance of traditional dynamic random access memory (DRAM) technology has slowed, while the capacity and performance needs of memory systems have continued to increase. This is a result of growing data volumes from emerging applications such as machine learning and big data analytics. In addition, increasing energy consumption is becoming a major constraint on the capabilities of computer systems. As a result, emerging non-volatile memories, for example Spin Torque Transfer Magnetic RAM (STT-MRAM), and new memory interfaces, for example High Bandwidth Memory (HBM), have been developed as alternatives. Thus far, most previous studies have retained a DRAM-like memory architecture and management policy. This preserves compatibility but hides the true benefits of these new memory technologies. In this research, we proposed the co-design of memory architectures and their management policies for emerging technologies. First, we introduced a new memory architecture for an STT-MRAM main memory. In particular, we defined a new page mode operation for efficient activation and sensing. By fully exploiting the non-destructive nature of STT-MRAM, our design achieved higher performance, lower energy consumption, and a smaller area than traditional designs. Second, we developed a cost-effective technique to improve load balancing across HBM memory channels. We showed that the proposed technique efficiently redistributes memory requests across multiple memory channels to improve channel utilization, resulting in improved performance.
    PhD, Electrical Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies
    https://deepblue.lib.umich.edu/bitstream/2027.42/145988/1/bcoh_1.pd
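
The abstract does not say how the thesis redistributes requests. Purely as a point of reference, the sketch below shows one well-known, generic way to spread requests across channels: XOR-based address hashing, which folds higher address bits into the channel index so that strided access patterns do not pile onto a single channel. The channel count, line size, and function names are hypothetical and are not taken from the thesis or from the HBM specification.

```python
from collections import Counter

NUM_CHANNELS = 8                          # hypothetical; must be a power of two
LINE_BYTES = 64                           # hypothetical access granularity
CH_BITS = NUM_CHANNELS.bit_length() - 1   # log2(NUM_CHANNELS) = 3

def channel_modulo(addr):
    """Baseline mapping: channel = low-order bits of the line address."""
    return (addr // LINE_BYTES) % NUM_CHANNELS

def channel_xor(addr):
    """XOR-fold every CH_BITS-wide chunk of the line address into the channel."""
    line = addr // LINE_BYTES
    ch = 0
    while line:
        ch ^= line & (NUM_CHANNELS - 1)
        line >>= CH_BITS
    return ch

# A strided stream touching every 8th line: with the modulo mapping,
# every request lands on the same channel.
stride = LINE_BYTES * NUM_CHANNELS
addrs = [i * stride for i in range(1024)]

print(Counter(channel_modulo(a) for a in addrs))
print(Counter(channel_xor(a) for a in addrs))
```

For this stream, the modulo mapping sends all 1,024 requests to channel 0, while the XOR-folded mapping spreads them evenly, 128 per channel. Real controllers choose which bits to fold based on the device's address map; the thesis technique may work quite differently.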

    MRAM commercialization potential evaluation Research Based on the Chinese Market

    In the Chinese data storage industry, there is an urgent need for a national strategy and new investment as new technologies emerge in global markets. The commercialization of storage technology has become a widespread concern for the Chinese government and its strategic enterprises, and promoting it has become a common goal of both enterprise and national strategies. How the commercialization potential of a storage technology can be assessed, which evaluation indices should be used, and which factors affect it are important issues that must be addressed.

    MODELING AND BENCHMARKING OF SPINTRONIC DEVICES AND THEIR APPLICATIONS

    Spintronic devices are promising candidates for low-power applications such as logic and memory because of their non-volatility, scalability, and fast switching speed. To evaluate the array-level performance of various spintronic memory devices, we have benchmarked spin-transfer torque magnetoresistive random-access memory (STT-MRAM), spin-orbit torque MRAM (SOT-MRAM), voltage-controlled exchange coupling MRAM (VCEC-MRAM), and magnetoelectric MRAM (ME-MRAM). Among them, electric-field-driven devices such as the magnetoelectric (ME) device and VCEC-MRAM can eliminate Joule heating energy and are thus potentially more energy efficient than current-controlled devices. Bismuth ferrite (BiFeO3) is a multiferroic material that exhibits ferroelectricity, antiferromagnetism, and weak ferromagnetism at room temperature. By combining BiFeO3 with a ferromagnet such as CoFe to form a BiFeO3/CoFe heterojunction, one can manipulate the magnetic state of CoFe by applying an external electric field. However, the switching mechanisms of the ferroelectric and magnetic orders of BiFeO3 and CoFe are not well understood, which limits the estimation of the delay time and write energy of the ME device. To evaluate the potential performance of this voltage-controlled BFO/CoFe heterojunction device in memory or logic applications, we present a unified micromagnetic/ferroelectric simulation framework that models the transient response and switching behavior of both the BFO and CoFe layers. In addition, important material parameters such as the interface exchange coupling coefficient are extracted from experiments. Next, we build a physics-based compact model of the BFO/CoFe heterojunction to simulate the ME device at the circuit level. The results from our compact model closely match those from the micromagnetic models when simulating the magnetization dynamics of BFO and CoFe. Using this compact model, SPICE simulations show that ME-MRAM can potentially operate with lower write energy than STT-MRAM, SOT-MRAM, or even SRAM when the coercive voltage of the BFO layer is as small as 20 mV. Last, we model and benchmark the read and write performance of SOT-MRAM using various SOT materials, including heavy metals, alloys, Weyl semimetals, and topological insulators. Non-ideal factors such as the current-shunting effect, the current crowding effect, and variability are included. Our results indicate that spintronic memory devices are promising candidates for embedded memory applications because of their better energy efficiency and smaller layout area compared to SRAM.
    Ph.D.
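
The current-shunting effect can be estimated to first order with a parallel-resistor model: part of the in-plane write current bypasses the SOT layer through the ferromagnetic free layer, and the more resistive the SOT material, the larger the shunted share. The sketch below assumes uniform current density within each layer and identical length and width for both layers, so each layer's conductance scales as thickness divided by resistivity. The function name and the resistivity and thickness values are illustrative placeholders, not data from this work.

```python
def sot_current_fraction(rho_sot, t_sot, rho_fm, t_fm):
    """Fraction of the in-plane charge current that flows in the SOT layer.

    Hypothetical helper: treats the SOT layer and the ferromagnetic free
    layer as two resistors in parallel with equal length and width, so
    each conductance is proportional to thickness / resistivity.
    Any consistent units work because only the ratio matters.
    """
    g_sot = t_sot / rho_sot
    g_fm = t_fm / rho_fm
    return g_sot / (g_sot + g_fm)

# Placeholder values: resistivity in uOhm*cm, thickness in nm
FM = dict(rho_fm=170.0, t_fm=1.0)  # hypothetical free layer
low_rho = sot_current_fraction(rho_sot=200.0, t_sot=5.0, **FM)
high_rho = sot_current_fraction(rho_sot=1000.0, t_sot=5.0, **FM)
print(f"low-resistivity SOT layer:  {low_rho:.2f} of current in SOT layer")
print(f"high-resistivity SOT layer: {high_rho:.2f} of current in SOT layer")
```

With these placeholder numbers, about 81% of the current stays in the low-resistivity layer but only about 46% in the high-resistivity one, so reaching the same current density in the SOT layer would require roughly 1/0.46 ≈ 2.2 times the total write current. This is why a material's high spin Hall efficiency does not automatically translate into lower write energy.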