Search CORE

2,581 research outputs found

EbbRT: a framework for building per-application library operating systems

Author: Appavoo Jonathan
Cadden James
Dong Han
Krieger Orran
Schatzberg Dan
Publication venue: Computer Science Department, Boston University
Publication date: 23/02/2016
Field of study

Efficient use of high speed hardware requires operating system components be customized to the application work- load. Our general purpose operating systems are ill-suited for this task. We present EbbRT, a framework for constructing per-application library operating systems for cloud applications. The primary objective of EbbRT is to enable high-performance in a tractable and maintainable fashion. This paper describes the design and implementation of EbbRT, and evaluates its ability to improve the performance of common cloud applications. The evaluation of the EbbRT prototype demonstrates memcached, run within a VM, can outperform memcached run on an unvirtualized Linux. The prototype evaluation also demonstrates an 14% performance improvement of a V8 JavaScript engine benchmark, and a node.js webserver that achieves a 50% reduction in 99th percentile latency compared to it run on Linux

Boston University Institutional Repository (OpenBU)

Fusion and Perspective Correction of Multiple Networked Video Sensors

Author: Lookwood John W.
Neely Christopher E.
Publication venue: Washington University Open Scholarship
Publication date: 01/05/2004
Field of study

A network of adaptive processing elements has been developed that transforms and fuses video captured from multiple sensors. Unlike systems that rely on end-systems to process data, this system distributes the computation throughout the network in order to reduce overall network bandwidth. The network architecture is scalable because it uses a hierarchy of processing engines to perform signal processing. Nodes within the network can be dynamically reprogrammed in order to compose video from multiple sources, digitally transform camera perspectives, and adapt the video format to meet the needs of speciﬁc applications. A prototype has been developed using reconﬁgurable hardware that collects and processes real-time, streaming video of an urban environment. Multiple video cameras gather data from diﬀerent perspectives and fuse that data into a uniﬁed, top-down view. The hardware exploits both the spatial and temporal parallelism of the video streams and the regular processing when applying the transforms. Recon-ﬁgurable hardware allows for the functions at nodes to be reprogrammed for dynamic changes in topology. Hardware-based video processors also consume less power than high frequency software-based solutions. Performance and scalability are compared to a distributed software-based implementation. The reconﬁgurable hardware design is coded in VHDL and prototyped using Washington University’s Field Programmable Port Extender (FPX) platform. The transform engine circuit utilizes approximately 34 percent of the resources of a Xilinx Virtex 2000E FPGA, and can be clocked at frequencies up to 48 MHz. The com-position engine circuit utilizes approximately 39 percent of the resources of a Xilinx Virtex 2000E FPGA, and can be clocked at frequencies up to 45 MHz

Washington University St. Louis: Open Scholarship

A case study for NoC based homogeneous MPSoC architectures

Author: Casu Mario Roberto
Macchiarulo Luca
Ruo Roch Massimo
Tota Sergio Vincenzo
Zamboni Maurizio
Publication venue: IEEE
Publication date: 01/01/2009
Field of study

The many-core design paradigm requires flexible and modular hardware and software components to provide the required scalability to next-generation on-chip multiprocessor architectures. A multidisciplinary approach is necessary to consider all the interactions between the different components of the design. In this paper, a complete design methodology that tackles at once the aspects of system level modeling, hardware architecture, and programming model has been successfully used for the implementation of a multiprocessor network-on-chip (NoC)-based system, the NoCRay graphic accelerator. The design, based on 16 processors, after prototyping with field-programmable gate array (FPGA), has been laid out in 90-nm technology. Post-layout results show very low power, area, as well as 500 MHz of clock frequency. Results show that an array of small and simple processors outperform a single high-end general purpose processo

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino

The "MIND" Scalable PIM Architecture

Author: Brodowicz Maciej
Sterling Thomas
Publication venue
Publication date: 01/01/2005
Field of study

MIND (Memory, Intelligence, and Network Device) is an advanced parallel computer architecture for high performance computing and scalable embedded processing. It is a Processor-in-Memory (PIM) architecture integrating both DRAM bit cells and CMOS logic devices on the same silicon die. MIND is multicore with multiple memory/processor nodes on each chip and supports global shared memory across systems of MIND components. MIND is distinguished from other PIM architectures in that it incorporates mechanisms for efficient support of a global parallel execution model based on the semantics of message-driven multithreaded split-transaction processing. MIND is designed to operate either in conjunction with other conventional microprocessors or in standalone arrays of like devices. It also incorporates mechanisms for fault tolerance, real time execution, and active power management. This paper describes the major elements and operational methods of the MIND architecture

Caltech Authors