1,462 research outputs found

    A Survey of Techniques For Improving Energy Efficiency in Embedded Computing Systems

    Full text link
    Recent technological advances have greatly improved the performance and features of embedded systems. With the number of just mobile devices now reaching nearly equal to the population of earth, embedded systems have truly become ubiquitous. These trends, however, have also made the task of managing their power consumption extremely challenging. In recent years, several techniques have been proposed to address this issue. In this paper, we survey the techniques for managing power consumption of embedded systems. We discuss the need of power management and provide a classification of the techniques on several important parameters to highlight their similarities and differences. This paper is intended to help the researchers and application-developers in gaining insights into the working of power management techniques and designing even more efficient high-performance embedded systems of tomorrow

    The "MIND" Scalable PIM Architecture

    Get PDF
    MIND (Memory, Intelligence, and Network Device) is an advanced parallel computer architecture for high performance computing and scalable embedded processing. It is a Processor-in-Memory (PIM) architecture integrating both DRAM bit cells and CMOS logic devices on the same silicon die. MIND is multicore with multiple memory/processor nodes on each chip and supports global shared memory across systems of MIND components. MIND is distinguished from other PIM architectures in that it incorporates mechanisms for efficient support of a global parallel execution model based on the semantics of message-driven multithreaded split-transaction processing. MIND is designed to operate either in conjunction with other conventional microprocessors or in standalone arrays of like devices. It also incorporates mechanisms for fault tolerance, real time execution, and active power management. This paper describes the major elements and operational methods of the MIND architecture

    Time-Shared Execution of Realtime Computer Vision Pipelines by Dynamic Partial Reconfiguration

    Full text link
    This paper presents an FPGA runtime framework that demonstrates the feasibility of using dynamic partial reconfiguration (DPR) for time-sharing an FPGA by multiple realtime computer vision pipelines. The presented time-sharing runtime framework manages an FPGA fabric that can be round-robin time-shared by different pipelines at the time scale of individual frames. In this new use-case, the challenge is to achieve useful performance despite high reconfiguration time. The paper describes the basic runtime support as well as four optimizations necessary to achieve realtime performance given the limitations of DPR on today's FPGAs. The paper provides a characterization of a working runtime framework prototype on a Xilinx ZC706 development board. The paper also reports the performance of realtime computer vision pipelines when time-shared

    Design Of Neural Network Circuit Inside High Speed Camera Using Analog CMOS 0.35 ¼m Technology

    Get PDF
    Analog VLSI on-chip learning Neural Networks represent a mature technology for a large number of applications involving industrial as well as consumer appliances. This is particularly the case when low power consumption, small size and/or very high speed are required. This approach exploits the computational features of Neural Networks, the implementation efficiency of analog VLSI circuits and the adaptation capabilities of the on-chip learning feedback schema. High-speed video cameras are powerful tools for investigating for instance the biomechanics analysis or the movements of mechanical parts in manufacturing processes. In the past years, the use of CMOS sensors instead of CCDs has enabled the development of high-speed video cameras offering digital outputs , readout flexibility, and lower manufacturing costs. In this paper, we propose a high-speed smart camera based on a CMOS sensor with embedded Analog Neural Network

    High-Efficient Parallel CAVLC Encoders on Heterogeneous Multicore Architectures

    Get PDF
    This article presents two high-efficient parallel realizations of the context-based adaptive variable length coding (CAVLC) based on heterogeneous multicore processors. By optimizing the architecture of the CAVLC encoder, three kinds of dependences are eliminated or weaken, including the context-based data dependence, the memory accessing dependence and the control dependence. The CAVLC pipeline is divided into three stages: two scans, coding, and lag packing, and be implemented on two typical heterogeneous multicore architectures. One is a block-based SIMD parallel CAVLC encoder on multicore stream processor STORM. The other is a component-oriented SIMT parallel encoder on massively parallel architecture GPU. Both of them exploited rich data-level parallelism. Experiments results show that compared with the CPU version, more than 70 times of speedup can be obtained for STORM and over 50 times for GPU. The implementation of encoder on STORM can make a real-time processing for 1080p @30fps and GPU-based version can satisfy the requirements for 720p real-time encoding. The throughput of the presented CAVLC encoders is more than 10 times higher than that of published software encoders on DSP and multicore platforms

    Design and Implementation of an Embedded Vision System for Industrial Robots

    Get PDF
    Enhanced vision system attached to any industrial robot undoubtedly increases its accuracy and precision. Today, vision in industrial robots is facilitated almost exclusively via external or fixed cameras which may cause rather inconvenience due to obstacles in the line of sight. In this master thesis project, an Embedded Vision System is designed and developed to stream live video of a robot's focus point while being attached to its arm/actuator. Within the scope of this work, a working prototype has been achieved which is capable of producing live video stream on 640 X 480 resolution with a frame rate slightly above 9 FPS, having 256 colors for each pixel, displayable on a regular LCD display monitor. The system has been realized in a Spartan 6 platform and an Aptina image sensor has been used to acquire pixel information. I2C interfacing has been used to program the image sensor, data transfers have been facilitated by DMA cores, an off-chip DDR2 memory has been used for frame buffer and HDMI has been used as video out. Feasibility of adding Ethernet transmission capability has also been investigated
    corecore