Search CORE

285 research outputs found

Towards the Teraflop CFD

Author: Schreiber Robert
Simon Horst D.
Publication venue
Publication date
Field of study

We are surveying current projects in the area of parallel supercomputers. The machines considered here will become commercially available in the 1990 - 1992 time frame. All are suitable for exploring the critical issues in applying parallel processors to large scale scientific computations, in particular CFD calculations. This chapter presents an overview of the surveyed machines, and a detailed analysis of the various architectural and technology approaches taken. Particular emphasis is placed on the feasibility of a Teraflops capability following the paths proposed by various developers

NASA Technical Reports Server

An Analog VLSI Deep Machine Learning Implementation

Author: Lu Junjie
Publication venue: TRACE: Tennessee Research and Creative Exchange
Publication date: 01/05/2014
Field of study

Machine learning systems provide automated data processing and see a wide range of applications. Direct processing of raw high-dimensional data such as images and video by machine learning systems is impractical both due to prohibitive power consumption and the “curse of dimensionality,” which makes learning tasks exponentially more difficult as dimension increases. Deep machine learning (DML) mimics the hierarchical presentation of information in the human brain to achieve robust automated feature extraction, reducing the dimension of such data. However, the computational complexity of DML systems limits large-scale implementations in standard digital computers. Custom analog signal processing (ASP) can yield much higher energy efficiency than digital signal processing (DSP), presenting means of overcoming these limitations. The purpose of this work is to develop an analog implementation of DML system. First, an analog memory is proposed as an essential component of the learning systems. It uses the charge trapped on the floating gate to store analog value in a non-volatile way. The memory is compatible with standard digital CMOS process and allows random-accessible bi-directional updates without the need for on-chip charge pump or high voltage switch. Second, architecture and circuits are developed to realize an online k-means clustering algorithm in analog signal processing. It achieves automatic recognition of underlying data pattern and online extraction of data statistical parameters. This unsupervised learning system constitutes the computation node in the deep machine learning hierarchy. Third, a 3-layer, 7-node analog deep machine learning engine is designed featuring online unsupervised trainability and non-volatile floating-gate analog storage. It utilizes massively parallel reconfigurable current-mode analog architecture to realize efficient computation. And algorithm-level feedback is leveraged to provide robustness to circuit imperfections in analog signal processing. At a processing speed of 8300 input vectors per second, it achieves 1×1012 operation per second per Watt of peak energy efficiency. In addition, an ultra-low-power tunable bump circuit is presented to provide similarity measures in analog signal processing. It incorporates a novel wide-input-range tunable pseudo-differential transconductor. The circuit demonstrates tunability of bump center, width and height with a power consumption significantly lower than previous works

University of Tennessee, Knoxville: Trace

Neuro-fuzzy chip to handle complex tasks with analog performance

Author: Navas-Gonzalez Rafael Jesus
Rodríguez Vázquez Angel
Vidal-Verdu Fernando
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2003
Field of study

This paper presents a mixed-signal neuro-fuzzy controller chip which, in terms of power consumption, input–output delay, and precision, performs as a fully analog implementation. However, it has much larger complexity than its purely analog counterparts. This combination of performance and complexity is achieved through the use of a mixed-signal architecture consisting of a programmable analog core of reduced complexity, and a strategy, and the associated mixed-signal circuitry, to cover the whole input space through the dynamic programming of this core. Since errors and delays are proportional to the reduced number of fuzzy rules included in the analog core, they are much smaller than in the case where the whole rule set is implemented by analog circuitry. Also, the area and the power consumption of the new architecture are smaller than those of its purely analog counterparts simply because most rules are implemented through programming. The Paper presents a set of building blocks associated to this architecture, and gives results for an exemplary prototype. This prototype, called multiplexing fuzzy controller (MFCON), has been realized in a CMOS 0.7 um standard technology. It has two inputs, implements 64 rules, and features 500 ns of input to output delay with 16-mW of power consumption. Results from the chip in a control application with a dc motor are also provided

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Repositorio Institucional Universidad de Málaga

Neuro-fuzzy chip to handle complex tasks with analog performance

Author: Navas González Rafael
Rodríguez Vázquez Ángel Benito
Vidal Verdú Fernando
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2003
Field of study

This Paper presents a mixed-signal neuro-fuzzy controller chip which, in terms of power consumption, input-output delay and precision performs as a fully analog implementation. However, it has much larger complexity than its purely analog counterparts. This combination of performance and complexity is achieved through the use of a mixed-signal architecture consisting of a programmable analog core of reduced complexity, and a strategy, and the associated mixed-signal circuitry, to cover the whole input space through the dynamic programming of this core [1]. Since errors and delays are proportional to the reduced number of fuzzy rules included in the analog core, they are much smaller than in the case where the whole rule set is implemented by analog circuitry. Also, the area and the power consumption of the new architecture are smaller than those of its purely analog counterparts simply because most rules are implemented through programming. The Paper presents a set of building blocks associated to this architecture, and gives results for an exemplary prototype. This prototype, called MFCON, has been realized in a CMOS 0.7μm standard technology. It has two inputs, implements 64 rules and features 500ns of input to output delay with 16mW of power consumption. Results from the chip in a control application with a DC motor are also provided

idUS. Depósito de Investigación Universidad de Sevilla

ENERGY-EFFICIENT AND SECURE HARDWARE FOR INTERNET OF THINGS (IoT) DEVICES

Author: Selvakumaran Dinesh Kumar
Publication venue: UKnowledge
Publication date: 01/01/2018
Field of study

Internet of Things (IoT) is a network of devices that are connected through the Internet to exchange the data for intelligent applications. Though IoT devices provide several advantages to improve the quality of life, they also present challenges related to security. The security issues related to IoT devices include leakage of information through Differential Power Analysis (DPA) based side channel attacks, authentication, piracy, etc. DPA is a type of side-channel attack where the attacker monitors the power consumption of the device to guess the secret key stored in it. There are several countermeasures to overcome DPA attacks. However, most of the existing countermeasures consume high power which makes them not suitable to implement in power constraint devices. IoT devices are battery operated, hence it is important to investigate the methods to design energy-efficient and secure IoT devices not susceptible to DPA attacks. In this research, we have explored the usefulness of a novel computing platform called adiabatic logic, low-leakage FinFET devices and Magnetic Tunnel Junction (MTJ) Logic-in-Memory (LiM) architecture to design energy-efficient and DPA secure hardware. Further, we have also explored the usefulness of adiabatic logic in the design of energy-efficient and reliable Physically Unclonable Function (PUF) circuits to overcome the authentication and piracy issues in IoT devices. Adiabatic logic is a low-power circuit design technique to design energy-efficient hardware. Adiabatic logic has reduced dynamic switching energy loss due to the recycling of charge to the power clock. As the first contribution of this dissertation, we have proposed a novel DPA-resistant adiabatic logic family called Energy-Efficient Secure Positive Feedback Adiabatic Logic (EE-SPFAL). EE-SPFAL based circuits are energy-efficient compared to the conventional CMOS based design because of recycling the charge after every clock cycle. Further, EE-SPFAL based circuits consume uniform power irrespective of input data transition which makes them resilience against DPA attacks. Scaling of CMOS transistors have served the industry for more than 50 years in providing integrated circuits that are denser, and cheaper along with its high performance, and low power. However, scaling of the transistors leads to increase in leakage current. Increase in leakage current reduces the energy-efficiency of the computing circuits,and increases their vulnerability to DPA attack. Hence, it is important to investigate the crypto circuits in low leakage devices such as FinFET to make them energy-efficient and DPA resistant. In this dissertation, we have proposed a novel FinFET based Secure Adiabatic Logic (FinSAL) family. FinSAL based designs utilize the low-leakage FinFET device along with adiabatic logic principles to improve energy-efficiency along with its resistance against DPA attack. Recently, Magnetic Tunnel Junction (MTJ)/CMOS based Logic-in-Memory (LiM) circuits have been explored to design low-power non-volatile hardware. Some of the advantages of MTJ device include non-volatility, near-zero leakage power, high integration density and easy compatibility with CMOS devices. However, the differences in power consumption between the switching of MTJ devices increase the vulnerability of Differential Power Analysis (DPA) based side-channel attack. Further, the MTJ/CMOS hybrid logic circuits which require frequent switching of MTJs are not very energy-efficient due to the significant energy required to switch the MTJ devices. In the third contribution of this dissertation, we have investigated a novel approach of building cryptographic hardware in MTJ/CMOS circuits using Look-Up Table (LUT) based method where the data stored in MTJs are constant during the entire encryption/decryption operation. Currently, high supply voltage is required in both writing and sensing operations of hybrid MTJ/CMOS based LiM circuits which consumes a considerable amount of energy. In order to meet the power budget in low-power devices, it is important to investigate the novel design techniques to design ultra-low-power MTJ/CMOS circuits. In the fourth contribution of this dissertation, we have proposed a novel energy-efficient Secure MTJ/CMOS Logic (SMCL) family. The proposed SMCL logic family consumes uniform power irrespective of data transition in MTJ and more energy-efficient compared to the state-of-art MTJ/ CMOS designs by using charge sharing technique. The other important contribution of this dissertation is the design of reliable Physical Unclonable Function (PUF). Physically Unclonable Function (PUF) are circuits which are used to generate secret keys to avoid the piracy and device authentication problems. However, existing PUFs consume high power and they suffer from the problem of generating unreliable bits. This dissertation have addressed this issue in PUFs by designing a novel adiabatic logic based PUF. The time ramp voltages in adiabatic PUF is utilized to improve the reliability of the PUF along with its energy-efficiency. Reliability of the adiabatic logic based PUF proposed in this dissertation is tested through simulation based temperature variations and supply voltage variations

University of Kentucky

Statistical characterization, analysis and modeling of speed performance in digital standard cell designs subject to process variations

Author: Mastrandrea Antonio
Publication venue
Publication date: 10/03/2014
Field of study

Pubblicazioni Aperte Digitali Interateneo Sapienza

Archivio della ricerca- Università di Roma La Sapienza

A Novel Boost Converter Based LED Driver Chip Targeting Mobile Applications

Author
Publication venue
Publication date: 01/01/2016
Field of study

abstract: A novel integrated constant current LED driver design on a single chip is developed in this dissertation. The entire design consists of two sections. The first section is a DC-DC switching regulator (boost regulator) as the frontend power supply; the second section is the constant current LED driver system. In the first section, a pulse width modulated (PWM) peak current mode boost regulator is utilized. The overall boost regulator system and its related sub-cells are explained. Among them, an original error amplifier design, a current sensing circuit and slope compensation circuit are presented. In the second section – the focus of this dissertation – a highly accurate constant current LED driver system design is unveiled. The detailed description of this highly accurate LED driver system and its related sub-cells are presented. A hybrid PWM and linear current modulation scheme to adjust the LED driver output currents is explained. The novel design ideas to improve the LED current accuracy and channel-to-channel output current mismatch are also explained in detail. These ideas include a novel LED driver system architecture utilizing 1) a dynamic current mirror structure and 2) a closed loop structure to keep the feedback loop of the LED driver active all the time during both PWM on-duty and PWM off-duty periods. Inside the LED driver structure, the driving amplifier with a novel slew rate enhancement circuit to dramatically accelerate its response time is also presented.Dissertation/ThesisDoctoral Dissertation Electrical Engineering 201

ASU Digital Repository

A Sub-nW 2.4 GHz Transmitter for Low Data-Rate Sensing Applications

Author: Bandyopadhyay Saurav
Chandrakasan Anantha P.
Lysaght Andrew Christopher
Mercier Patrick Philip
Stankovic Konstantina M.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/03/2014
Field of study

This paper presents the design of a narrowband transmitter and antenna system that achieves an average power consumption of 78 pW when operating at a duty-cycled data rate of 1 bps. Fabricated in a 0.18 μm CMOS process, the transmitter employs a direct-RF power oscillator topology where a loop antenna acts as a both a radiative and resonant element. The low-complexity single-stage architecture, in combination with aggressive power gating techniques and sizing optimizations, limited the standby power of the transmitter to only 39.7 pW at 0.8 V. Supporting both OOK and FSK modulations at 2.4 GHz, the transmitter consumed as low as 38 pJ/bit at an active-mode data rate of 5 Mbps. The loop antenna and integrated diodes were also used as part of a wireless power transfer receiver in order to kick-start the system power supply prior to energy harvesting operation.Semiconductor Research Corporation. Interconnect Focus CenterSemiconductor Research Corporation. C2S2 Focus CenterNational Institutes of Health (U.S.) (Grant K08 DC010419)National Institutes of Health (U.S.) (Grant T32 DC00038)Bertarelli Foundatio

DSpace@MIT

PubMed Central

eScholarship - University of California

Improving the Performance and Energy Efficiency of GPGPU Computing through Adaptive Cache and Memory Management Techniques

Author: Kim Kyu Yeun
Publication venue: Graduate School of UNIST
Publication date: 01/02/2020
Field of study

Department of Computer Science and EngineeringAs the performance and energy efficiency requirement of GPGPUs have risen, memory management techniques of GPGPUs have improved to meet the requirements by employing hardware caches and utilizing heterogeneous memory. These techniques can improve GPGPUs by providing lower latency and higher bandwidth of the memory. However, these methods do not always guarantee improved performance and energy efficiency due to the small cache size and heterogeneity of the memory nodes. While prior works have proposed various techniques to address this issue, relatively little work has been done to investigate holistic support for memory management techniques. In this dissertation, we analyze performance pathologies and propose various techniques to improve memory management techniques. First, we investigate the effectiveness of advanced cache indexing (ACI) for high-performance and energy-efficient GPGPU computing. Specifically, we discuss the designs of various static and adaptive cache indexing schemes and present implementation for GPGPUs. We then quantify and analyze the effectiveness of the ACI schemes based on a cycle-accurate GPGPU simulator. Our quantitative evaluation shows that ACI schemes achieve significant performance and energy-efficiency gains over baseline conventional indexing scheme. We also analyze the performance sensitivity of ACI to key architectural parameters (i.e., capacity, associativity, and ICN bandwidth) and the cache indexing latency. We also demonstrate that ACI continues to achieve high performance in various settings. Second, we propose IACM, integrated adaptive cache management for high-performance and energy-efficient GPGPU computing. Based on the performance pathology analysis of GPGPUs, we integrate state-of-the-art adaptive cache management techniques (i.e., cache indexing, bypassing, and warp limiting) in a unified architectural framework to eliminate performance pathologies. Our quantitative evaluation demonstrates that IACM significantly improves the performance and energy efficiency of various GPGPU workloads over the baseline architecture (i.e., 98.1% and 61.9% on average, respectively) and achieves considerably higher performance than the state-of-the-art technique (i.e., 361.4% at maximum and 7.7% on average). Furthermore, IACM delivers significant performance and energy efficiency gains over the baseline GPGPU architecture even when enhanced with advanced architectural technologies (e.g., higher capacity, associativity). Third, we propose bandwidth- and latency-aware page placement (BLPP) for GPGPUs with heterogeneous memory. BLPP analyzes the characteristics of a application and determines the optimal page allocation ratio between the GPU and CPU memory. Based on the optimal page allocation ratio, BLPP dynamically allocate pages across the heterogeneous memory nodes. Our experimental results show that BLPP considerably outperforms the baseline and state-of-the-art technique (i.e., 13.4% and 16.7%) and performs similar to the static-best version (i.e., 1.2% difference), which requires extensive offline profiling.clos

ScholarWorks@UNIST