    EbbRT: a framework for building per-application library operating systems

    Efficient use of high-speed hardware requires that operating system components be customized to the application workload. Our general-purpose operating systems are ill-suited for this task. We present EbbRT, a framework for constructing per-application library operating systems for cloud applications. The primary objective of EbbRT is to enable high performance in a tractable and maintainable fashion. This paper describes the design and implementation of EbbRT and evaluates its ability to improve the performance of common cloud applications. The evaluation of the EbbRT prototype demonstrates that memcached, run within a VM, can outperform memcached run on unvirtualized Linux. The prototype evaluation also demonstrates a 14% performance improvement on a V8 JavaScript engine benchmark, and a node.js webserver that achieves a 50% reduction in 99th percentile latency compared to running it on Linux.

    Fusion and Perspective Correction of Multiple Networked Video Sensors

    A network of adaptive processing elements has been developed that transforms and fuses video captured from multiple sensors. Unlike systems that rely on end-systems to process data, this system distributes the computation throughout the network in order to reduce overall network bandwidth. The network architecture is scalable because it uses a hierarchy of processing engines to perform signal processing. Nodes within the network can be dynamically reprogrammed in order to compose video from multiple sources, digitally transform camera perspectives, and adapt the video format to meet the needs of specific applications. A prototype has been developed using reconfigurable hardware that collects and processes real-time, streaming video of an urban environment. Multiple video cameras gather data from different perspectives and fuse that data into a unified, top-down view. The hardware exploits both the spatial and temporal parallelism of the video streams and the regular processing when applying the transforms. Reconfigurable hardware allows for the functions at nodes to be reprogrammed for dynamic changes in topology. Hardware-based video processors also consume less power than high-frequency software-based solutions. Performance and scalability are compared to a distributed software-based implementation. The reconfigurable hardware design is coded in VHDL and prototyped using Washington University's Field Programmable Port Extender (FPX) platform. The transform engine circuit utilizes approximately 34 percent of the resources of a Xilinx Virtex 2000E FPGA, and can be clocked at frequencies up to 48 MHz. The composition engine circuit utilizes approximately 39 percent of the resources of a Xilinx Virtex 2000E FPGA, and can be clocked at frequencies up to 45 MHz.
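
    Perspective correction of the kind described above is conventionally modelled as applying a 3x3 homography to pixel coordinates, followed by fusing the corrected views. The following is a minimal software sketch under stated assumptions, not a description of the FPX circuits: single-channel images are stored as nested lists, each camera's view is warped by inverse mapping with nearest-neighbour sampling, and views are fused with a hypothetical "first camera with a valid pixel wins" rule. Pixels equal to `fill` are treated as empty, which is a simplification. The helper names `apply_homography`, `warp_nearest`, and `fuse` are illustrative only.

```python
import math
from typing import List, Sequence, Tuple

# Assumed representation: a 3x3 homography in row-major order.
Matrix3 = Sequence[Sequence[float]]
Image = List[List[int]]


def apply_homography(h: Matrix3, x: float, y: float) -> Tuple[float, float]:
    """Map point (x, y) through homography h using homogeneous coordinates."""
    # [u', v', w']^T = H [x, y, 1]^T
    up = h[0][0] * x + h[0][1] * y + h[0][2]
    vp = h[1][0] * x + h[1][1] * y + h[1][2]
    wp = h[2][0] * x + h[2][1] * y + h[2][2]
    if wp == 0:
        raise ValueError("point maps to infinity under this homography")
    # The perspective divide returns to Cartesian coordinates.
    return up / wp, vp / wp


def warp_nearest(src: Image, h_inv: Matrix3, out_w: int, out_h: int,
                 fill: int = 0) -> Image:
    """Warp src into an out_w x out_h top-down view by inverse mapping.

    h_inv maps output (top-down) coordinates back to source pixel coordinates.
    Output pixels whose source falls outside src are left as `fill`.
    """
    src_h, src_w = len(src), len(src[0])
    out = [[fill] * out_w for _ in range(out_h)]
    for v in range(out_h):
        for u in range(out_w):
            try:
                x, y = apply_homography(h_inv, u, v)
            except ValueError:
                continue  # no finite source pixel for this output location
            # Nearest-neighbour sampling (round half up).
            xi, yi = math.floor(x + 0.5), math.floor(y + 0.5)
            if 0 <= xi < src_w and 0 <= yi < src_h:
                out[v][u] = src[yi][xi]
    return out


def fuse(views: List[Image], fill: int = 0) -> Image:
    """Combine same-sized warped views: the first view with a valid pixel wins."""
    rows, cols = len(views[0]), len(views[0][0])
    out = [[fill] * cols for _ in range(rows)]
    for view in views:
        for r in range(rows):
            for c in range(cols):
                if out[r][c] == fill and view[r][c] != fill:
                    out[r][c] = view[r][c]
    return out


if __name__ == "__main__":
    # Output pixel (u, v) samples source pixel (u + 1, v).
    shift_left = [[1, 0, 1], [0, 1, 0], [0, 0, 1]]
    print(warp_nearest([[1, 2], [3, 4]], shift_left, 2, 2))  # [[2, 0], [4, 0]]
```

    Inverse mapping iterates over output pixels and asks where each one came from. This avoids holes in the top-down view and gives the regular, per-pixel arithmetic that the abstract says the hardware exploits. The abstract does not say how the actual transform and composition engines sample or combine pixels.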

    A case study for NoC based homogeneous MPSoC architectures

    The many-core design paradigm requires flexible and modular hardware and software components to provide the scalability required by next-generation on-chip multiprocessor architectures. A multidisciplinary approach is necessary to consider all the interactions between the different components of the design. In this paper, a complete design methodology that simultaneously addresses system-level modeling, hardware architecture, and the programming model has been used to implement a multiprocessor network-on-chip (NoC)-based system, the NoCRay graphics accelerator. The design, based on 16 processors, was prototyped on a field-programmable gate array (FPGA) and then laid out in 90-nm technology. Post-layout results show very low power consumption and area, together with a clock frequency of 500 MHz. Results show that an array of small, simple processors outperforms a single high-end general-purpose processor.

    The "MIND" Scalable PIM Architecture

    MIND (Memory, Intelligence, and Network Device) is an advanced parallel computer architecture for high-performance computing and scalable embedded processing. It is a Processor-in-Memory (PIM) architecture that integrates both DRAM bit cells and CMOS logic devices on the same silicon die. MIND is multicore, with multiple memory/processor nodes on each chip, and supports global shared memory across systems of MIND components. MIND is distinguished from other PIM architectures in that it incorporates mechanisms for efficient support of a global parallel execution model based on the semantics of message-driven multithreaded split-transaction processing. MIND is designed to operate either in conjunction with conventional microprocessors or in standalone arrays of like devices. It also incorporates mechanisms for fault tolerance, real-time execution, and active power management. This paper describes the major elements and operational methods of the MIND architecture.