12 research outputs found

    Full-System Simulation of Mobile CPU/GPU Platforms

    Get PDF
    Graphics Processing Units (GPUs) critically rely on a complex system software stack comprising kernel- and userspace drivers and Just-in-time (JIT) compilers. Yet, existing GPU simulators typically abstract away details of the software stack and GPU instruction set. Partly, this is because GPU vendors rarely release sufficient information about their latest GPU products. However, this is also due to the lack of an integrated CPU/GPU simulation framework, which is complete and powerful enough to drive the complex GPU software environment. This has led to a situation where research on GPU architectures and compilers is largely based on outdated or greatly simplified architectures and software stacks, undermining the validity of the generated results. In this paper we develop a full-system system simulation environment for a mobile platform, which enables users to run a complete and unmodified software stack for a state-of-the-art mobile Arm CPU and Mali-G71 GPU powered device. We validate our simulator against a hardware implementation and Arm’s stand-alone GPU simulator, achieving 100% architectural accuracy across all available toolchains. We demonstrate the capability of our GPU simulation framework by optimizing an advanced Computer Vision application using simulated statistics unavailable with other simulation approaches or physical GPU implementations. We demonstrate that performance optimizations for desktop GPUs trigger bottlenecks on mobile GPUs, and show the importance of efficient memory use.Postprin

    Design of a Programmable Passive SoC for Biomedical Applications Using RFID ISO 15693/NFC5 Interface

    Get PDF
    Low power, low cost inductively powered passive biotelemetry system involving fully customized RFID/NFC interface base SoC has gained popularity in the last decades. However, most of the SoCs developed are application specific and lacks either on-chip computational or sensor readout capability. In this paper, we present design details of a programmable passive SoC in compliance with ISO 15693/NFC5 standard for biomedical applications. The integrated system consists of a 32-bit microcontroller, a sensor readout circuit, a 12-bit SAR type ADC, 16 kB RAM, 16 kB ROM and other digital peripherals. The design is implemented in a 0.18 μ m CMOS technology and used a die area of 1.52 mm × 3.24 mm. The simulated maximum power consumption of the analog block is 592 μ W. The number of external components required by the SoC is limited to an external memory device, sensors, antenna and some passive components. The external memory device contains the application specific firmware. Based on the application, the firmware can be modified accordingly. The SoC design is suitable for medical implants to measure physiological parameters like temperature, pressure or ECG. As an application example, the authors have proposed a bioimplant to measure arterial blood pressure for patients suffering from Peripheral Artery Disease (PAD)

    Techniques of Energy-Efficient VLSI Chip Design for High-Performance Computing

    Get PDF
    How to implement quality computing with the limited power budget is the key factor to move very large scale integration (VLSI) chip design forward. This work introduces various techniques of low power VLSI design used for state of art computing. From the viewpoint of power supply, conventional in-chip voltage regulators based on analog blocks bring the large overhead of both power and area to computational chips. Motivated by this, a digital based switchable pin method to dynamically regulate power at low circuit cost has been proposed to make computing to be executed with a stable voltage supply. For one of the widely used and time consuming arithmetic units, multiplier, its operation in logarithmic domain shows an advantageous performance compared to that in binary domain considering computation latency, power and area. However, the introduced conversion error reduces the reliability of the following computation (e.g. multiplication and division.). In this work, a fast calibration method suppressing the conversion error and its VLSI implementation are proposed. The proposed logarithmic converter can be supplied by dc power to achieve fast conversion and clocked power to reduce the power dissipated during conversion. Going out of traditional computation methods and widely used static logic, neuron-like cell is also studied in this work. Using multiple input floating gate (MIFG) metal-oxide semiconductor field-effect transistor (MOSFET) based logic, a 32-bit, 16-operation arithmetic logic unit (ALU) with zipped decoding and a feedback loop is designed. The proposed ALU can reduce the switching power and has a strong driven-in capability due to coupling capacitors compared to static logic based ALU. Besides, recent neural computations bring serious challenges to digital VLSI implementation due to overload matrix multiplications and non-linear functions. An analog VLSI design which is compatible to external digital environment is proposed for the network of long short-term memory (LSTM). The entire analog based network computes much faster and has higher energy efficiency than the digital one

    A novel low-temperature growth method of silicon structures and application in flash memory.

    Get PDF
    Flash memories are solid-state non-volatile memories. They play a vital role especially in information storage in a wide range of consumer electronic devices and applications including smart phones, digital cameras, laptop computers, and satellite navigators. The demand for high density flash has surged as a result of the proliferation of these consumer electronic portable gadgets and the more features they offer – wireless internet, touch screen, video capabilities. The increase in the density of flash memory devices over the years has come as a result of continuous memory cell-size reduction. This size scaling is however approaching a dead end and it is widely agreed that further reduction beyond the 20 nm technological node is going to be very difficult, as it would result to challenges such as cross-talk or cell-to-cell interference, a high statistical variation in the number of stored electrons in the floating gate and high leakage currents due to thinner tunnel oxides. Because of these challenges a wide range of solutions in form of materials and device architectures are being investigated. Among them is three-dimensional (3-D) flash, which is widely acclaimed as the ideal solution, as they promise the integration of long-time retention and ultra-high density cells without compromising device reliability. However, current high temperature (>600 °C) growth techniques of the Polycrystalline silicon floating gate material are incompatible with 3-D flash memory; with vertically stacked memory layers, which require process temperatures to be ≤ 400 °C. There already exist some low temperature techniques for producing polycrystalline silicon such as laser annealing, solid-phase crystallization of amorphous silicon and metal-induced crystallization. However, these have some short-comings which make them not suitable for use in 3-D flash memory, e.g. the high furnace annealing temperatures (700 °C) in solid-phase crystallization of amorphous silicon which could potentially damage underlying memory layers in 3-D flash, and the metal contaminants in metal-induced crystallization which is a potential source of high leakage currents. There is therefore a need for alternative low temperature techniques that would be most suitable for flash memory purposes. With reference to the above, the main objective of this research was to develop a novel low temperature method for growing silicon structures at ≤ 400 °C. This thesis thus describes the development of a low-temperature method for polycrystalline silicon growth and the application of the technique in a capacitor-like flash memory device. It has been demonstrated that silicon structures with polycrystalline silicon-like properties can be grown at ≤ 400 °C in a 13.56 MHz radio frequency (RF) plasma-enhanced chemical vapour deposition (PECVD) reactor with the aid of Nickel Formate Dihydrate (NFD). It is also shown that the NFD coated on the substrates, thermally decomposes in-situ during the deposition process forming Ni particles that act as nucleation and growth sites of polycrystalline silicon. Silicon films grown by this technique and without annealing, have exhibited optical band gaps of ~ 1.2 eV compared to 1.78 eV for films grown under identical conditions but without the substrate being coated. These values were determined from UV-Vis spectroscopy and Tauc plots. These optical band gaps correspond to polycrystalline silicon and amorphous silicon respectively, meaning that the films grown on NFD-coated substrates are polycrystalline silicon while those grown on uncoated substrates remain amorphous. Moreover, this novel technique has been used to fabricate a capacitor-like flash memory that has exhibited hysteresis width corresponding to charge storage density in the order of 1012 cm-2 with a retention time well above 20 days for a device with silicon films grown at 300 °C. Films grown on uncoated films have not exhibit any significant hysteresis, and thus no flash memory-like behaviour. Given that all process temperatures throughout the fabrication of the devices are less than 400 °C and that no annealing of any sort was done on the material and devices, this growth method is thermal budget efficient and meets the crucial process temperature requirements of 3-D flash memory. Furthermore, the technique is glass compatible, which could prove a major step towards the acquisition of flash memory-integrated systems on glass, as well as other applications requiring low temperature polycrystalline silicon

    Recent Advances in Embedded Computing, Intelligence and Applications

    Get PDF
    The latest proliferation of Internet of Things deployments and edge computing combined with artificial intelligence has led to new exciting application scenarios, where embedded digital devices are essential enablers. Moreover, new powerful and efficient devices are appearing to cope with workloads formerly reserved for the cloud, such as deep learning. These devices allow processing close to where data are generated, avoiding bottlenecks due to communication limitations. The efficient integration of hardware, software and artificial intelligence capabilities deployed in real sensing contexts empowers the edge intelligence paradigm, which will ultimately contribute to the fostering of the offloading processing functionalities to the edge. In this Special Issue, researchers have contributed nine peer-reviewed papers covering a wide range of topics in the area of edge intelligence. Among them are hardware-accelerated implementations of deep neural networks, IoT platforms for extreme edge computing, neuro-evolvable and neuromorphic machine learning, and embedded recommender systems

    Real-Time High-Resolution Multiple-Camera Depth Map Estimation Hardware and Its Applications

    Get PDF
    Depth information is used in a variety of 3D based signal processing applications such as autonomous navigation of robots and driving systems, object detection and tracking, computer games, 3D television, and free view-point synthesis. These applications require high accuracy and speed performances for depth estimation. Depth maps can be generated using disparity estimation methods, which are obtained from stereo matching between multiple images. The computational complexity of disparity estimation algorithms and the need of large size and bandwidth for the external and internal memory make the real-time processing of disparity estimation challenging, especially for high resolution images. This thesis proposes a high-resolution high-quality multiple-camera depth map estimation hardware. The proposed hardware is verified in real-time with a complete system from the initial image capture to the display and applications. The details of the complete system are presented. The proposed binocular and trinocular adaptive window size disparity estimation algorithms are carefully designed to be suitable to real-time hardware implementation by allowing efficient parallel and local processing while providing high-quality results. The proposed binocular and trinocular disparity estimation hardware implementations can process 55 frames per second on a Virtex-7 FPGA at a 1024 x 768 XGA video resolution for a 128 pixel disparity range. The proposed binocular disparity estimation hardware provides best quality compared to existing real-time high-resolution disparity estimation hardware implementations. A novel compressed-look up table based rectification algorithm and its real-time hardware implementation are presented. The low-complexity decompression process of the rectification hardware utilizes a negligible amount of LUT and DFF resources of the FPGA while it does not require the existence of external memory. The first real-time high-resolution free viewpoint synthesis hardware utilizing three-camera disparity estimation is presented. The proposed hardware generates high-quality free viewpoint video in real-time for any horizontally aligned arbitrary camera positioned between the leftmost and rightmost physical cameras. The full embedded system of the depth estimation is explained. The presented embedded system transfers disparity results together with synchronized RGB pixels to the PC for application development. Several real-time applications are developed on a PC using the obtained RGB+D results. The implemented depth estimation based real-time software applications are: depth based image thresholding, speed and distance measurement, head-hands-shoulders tracking, virtual mouse using hand tracking and face tracking integrated with free viewpoint synthesis. The proposed binocular disparity estimation hardware is implemented in an ASIC. The ASIC implementation of disparity estimation imposes additional constraints with respect to the FPGA implementation. These restrictions, their implemented efficient solutions and the ASIC implementation results are presented. In addition, a very high-resolution (82.3 MP) 360°x90° omnidirectional multiple camera system is proposed. The hemispherical camera system is able to view the target locations close to horizontal plane with more than two cameras. Therefore, it can be used in high-resolution 360° depth map estimation and its applications in the future

    ENERGY-AWARE OPTIMIZATION FOR EMBEDDED SYSTEMS WITH CHIP MULTIPROCESSOR AND PHASE-CHANGE MEMORY

    Get PDF
    Over the last two decades, functions of the embedded systems have evolved from simple real-time control and monitoring to more complicated services. Embedded systems equipped with powerful chips can provide the performance that computationally demanding information processing applications need. However, due to the power issue, the easy way to gain increasing performance by scaling up chip frequencies is no longer feasible. Recently, low-power architecture designs have been the main trend in embedded system designs. In this dissertation, we present our approaches to attack the energy-related issues in embedded system designs, such as thermal issues in the 3D chip multiprocessor (CMP), the endurance issue in the phase-change memory(PCM), the battery issue in the embedded system designs, the impact of inaccurate information in embedded system, and the cloud computing to move the workload to remote cloud computing facilities. We propose a real-time constrained task scheduling method to reduce peak temperature on a 3D CMP, including an online 3D CMP temperature prediction model and a set of algorithm for scheduling tasks to different cores in order to minimize the peak temperature on chip. To address the challenging issues in applying PCM in embedded systems, we propose a PCM main memory optimization mechanism through the utilization of the scratch pad memory (SPM). Furthermore, we propose an MLC/SLC configuration optimization algorithm to enhance the efficiency of the hybrid DRAM + PCM memory. We also propose an energy-aware task scheduling algorithm for parallel computing in mobile systems powered by batteries. When scheduling tasks in embedded systems, we make the scheduling decisions based on information, such as estimated execution time of tasks. Therefore, we design an evaluation method for impacts of inaccurate information on the resource allocation in embedded systems. Finally, in order to move workload from embedded systems to remote cloud computing facility, we present a resource optimization mechanism in heterogeneous federated multi-cloud systems. And we also propose two online dynamic algorithms for resource allocation and task scheduling. We consider the resource contention in the task scheduling

    Safe Intelligent Driver Assistance System in V2X Communication Environments based on IoT

    Get PDF
    In the modern world, power and speed of cars have increased steadily, as traffic continued to increase. At the same time highway-related fatalities and injuries due to road incidents are constantly growing and safety problems come first. Therefore, the development of Driver Assistance Systems (DAS) has become a major issue. Numerous innovations, systems and technologies have been developed in order to improve road transportation and safety. Modern computer vision algorithms enable cars to understand the road environment with low miss rates. A number of Intelligent Transportation Systems (ITSs), Vehicle Ad-Hoc Networks (VANETs) have been applied in the different cities over the world. Recently, a new global paradigm, known as the Internet of Things (IoT) brings new idea to update the existing solutions. Vehicle-to-Infrastructure communication based on IoT technologies would be a next step in intelligent transportation for the future Internet-of-Vehicles (IoV). The overall purpose of this research was to come up with a scalable IoT solution for driver assistance, which allows to combine safety relevant information for a driver from different types of in-vehicle sensors, in-vehicle DAS, vehicle networks and driver`s gadgets. This study brushed up on the evolution and state-of-the-art of Vehicle Systems. Existing ITSs, VANETs and DASs were evaluated in the research. The study proposed a design approach for the future development of transport systems applying IoT paradigm to the transport safety applications in order to enable driver assistance become part of Internet of Vehicles (IoV). The research proposed the architecture of the Safe Intelligent DAS (SiDAS) based on IoT V2X communications in order to combine different types of data from different available devices and vehicle systems. The research proposed IoT ARM structure for SiDAS, data flow diagrams, protocols. The study proposes several IoT system structures for the vehicle-pedestrian and vehicle-vehicle collision prediction as case studies for the flexible SiDAS framework architecture. The research has demonstrated the significant increase in driver situation awareness by using IoT SiDAS, especially in NLOS conditions. Moreover, the time analysis, taking into account IoT, Cloud, LTE and DSRS latency, has been provided for different collision scenarios, in order to evaluate the overall system latency and ensure applicability for real-time driver emergency notification. Experimental results demonstrate that the proposed SiDAS improves traffic safety

    Social Control Experience Design:A Cross-Domain Investigation on Media

    Get PDF

    Social Control Experience Design:A Cross-Domain Investigation on Media

    Get PDF
    corecore