Search CORE

564 research outputs found

A Compiler and Runtime Infrastructure for Automatic Program Distribution

Author: Chu Matt
Diaconescu Roxana E.
Mouri Zachary
Wang Lei
Publication venue
Publication date: 01/04/2005
Field of study

This paper presents the design and the implementation of a compiler and runtime infrastructure for automatic program distribution. We are building a research infrastructure that enables experimentation with various program partitioning and mapping strategies and the study of automatic distribution's effect on resource consumption (e.g., CPU, memory, communication). Since many optimization techniques are faced with conflicting optimization targets (e.g., memory and communication), we believe that it is important to be able to study their interaction. We present a set of techniques that enable flexible resource modeling and program distribution. These are: dependence analysis, weighted graph partitioning, code and communication generation, and profiling. We have developed these ideas in the context of the Java language. We present in detail the design and implementation of each of the techniques as part of our compiler and runtime infrastructure. Then, we evaluate our design and present preliminary experimental data for each component, as well as for the entire system

Caltech Authors

Network Virtual Machine (NetVM): A New Architecture for Efficient and Portable Packet Processing Applications

Author: Baldi Mario
Buffa D.
Degioanni L.
Risso Fulvio Giovanni Ottavio
Stirano F.
Varenni G.
Publication venue: IEEE
Publication date: 01/01/2005
Field of study

A challenge facing network device designers, besides increasing the speed of network gear, is improving its programmability in order to simplify the implementation of new applications (see for example, active networks, content networking, etc). This paper presents our work on designing and implementing a virtual network processor, called NetVM, which has an instruction set optimized for packet processing applications, i.e., for handling network traffic. Similarly to a Java Virtual Machine that virtualizes a CPU, a NetVM virtualizes a network processor. The NetVM is expected to provide a compatibility layer for networking tasks (e.g., packet filtering, packet counting, string matching) performed by various packet processing applications (firewalls, network monitors, intrusion detectors) so that they can be executed on any network device, ranging from expensive routers to small appliances (e.g. smart phones). Moreover, the NetVM will provide efficient mapping of the elementary functionalities used to realize the above mentioned networking tasks upon specific hardware functional units (e.g., ASICs, FPGAs, and network processing elements) included in special purpose hardware systems possibly deployed to implement network devices

Crossref

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino

A Dynamically Configurable Discrete Event Simulation Framework for Many-Core Chip Multiprocessors

Author: Christopher Barnes
Jaehwan Lee
Publication venue: 'IntechOpen'
Publication date: 18/08/2010
Field of study

IntechOpen

Crossref

cphVB: A System for Automated Runtime Optimization and Parallelization of Vectorized Applications

Author: Blum Troels
Kristensen Mads Ruben Burgdorff
Lund Simon Andreas Frimann
Vinter Brian
Publication venue
Publication date: 01/01/2012
Field of study

Modern processor architectures, in addition to having still more cores, also require still more consideration to memory-layout in order to run at full capacity. The usefulness of most languages is deprecating as their abstractions, structures or objects are hard to map onto modern processor architectures efficiently. The work in this paper introduces a new abstract machine framework, cphVB, that enables vector oriented high-level programming languages to map onto a broad range of architectures efficiently. The idea is to close the gap between high-level languages and hardware optimized low-level implementations. By translating high-level vector operations into an intermediate vector bytecode, cphVB enables specialized vector engines to efficiently execute the vector operations. The primary success parameters are to maintain a complete abstraction from low-level details and to provide efficient code execution across different, modern, processors. We evaluate the presented design through a setup that targets multi-core CPU architectures. We evaluate the performance of the implementation using Python implementations of well-known algorithms: a jacobi solver, a kNN search, a shallow water simulation and a synthetic stencil simulation. All demonstrate good performance

arXiv.org e-Print Archive

Crossref

Copenhagen University Research Information System

t|ket> : A retargetable compiler for NISQ devices

Author: Cowtan Alexander
Dilkes Silas
Duncan Ross
Edgington Alec
Simmons Will
Sivarajah Seyon
Publication venue: 'IOP Publishing'
Publication date: 24/04/2020
Field of study

We present t|ket>, a quantum software development platform produced by Cambridge Quantum Computing Ltd. The heart of t|ket> is a language-agnostic optimising compiler designed to generate code for a variety of NISQ devices, which has several features designed to minimise the influence of device error. The compiler has been extensively benchmarked and outperforms most competitors in terms of circuit optimisation and qubit routing

arXiv.org e-Print Archive

University of Strathclyde Institutional Repository

Using Rapid Prototyping in Computer Architecture Design Laboratories

Author: Binh Dao
Henry Owen
James Hamblen
Sudhakar Yalamanchili
Publication venue
Publication date: 01/01/1996
Field of study

This paper describes the undergraduate computer architecture courses and laboratories introduced at Georgia Tech during the past two years. A core sequence of six required courses for computer engineering students has been developed. In this paper, emphasis is placed upon the new core laboratories which utilize commercial CAD tools, FPGAs, hardware emulators, and a VHDL based rapid prototyping approach to simulate, synthesize, and implement prototype computer hardware

CiteSeerX

Crossref

Optimized Fast Fourier Transform Architecture Using Instruction Set Architecture Extension In Low-End Digital Signal Controller

Author: Salim Sani Irwan
Publication venue
Publication date: 01/01/2018
Field of study

Smart microgrids have emerged as a viable solution in case of emergency situations occurred at the main electricity grid. The main concern of a smart microgrid is the degradation of the power quality caused by harmonic distortion originated from the non-linear equipment. With the rapid development of power electronic technology, the increased of harmonic-producing loads in the smart microgrids necessitating a new digital signal controller architecture for the harmonic measurement system. While the current system configurations are directed towards the 32-bit architecture, it shows higher requirements in area footprint and multi-core setup. This thesis presents the design of a low-end digital signal controller architecture using instruction set architecture (ISA) extension for the implementation of the harmonic measurement system in a smart microgrid. A new architecture, called UTeMRISC, is developed from the baseline 8-bit microcontroller with the capability to perform signal processing applications such as Fast Fourier Transform (FFT). The architecture is improved using the Application-Specific Instruction Set Processor (ASIP) approach by extending the instruction set architecture to 16-bit length. Instruction set customization is implemented to enable the execution of computationally intensive tasks. The entire architecture is described in Verilog Hardware Description Language (HDL) and implemented on the Virtex-6 FPGA board. From the test programs, UTeMRISC has demonstrated faster execution times and higher maximum operating frequency while not significantly increased the core’s resource utilization. Compared to the initial processor architecture, the support of extended ISA has increased the UTeMRISC core by 21.8% but at the same time allows to execute Fast Fourier Transform algorithm up to 5× faster. The combine effort of ISA extension and optimized instruction set generation results in up to 1 Mega sample per second, which translated to 66.8% increase of data throughput in the FFT algorithm when compared to a 32-bit architecture. This research proves that with comprehensive ASIP methodology and ISA extension, a low-end digital signal controller architecture is feasible and effective to be implemented in a harmonic measurement system for a smart microgrid

Universiti Teknikal Malaysia Melaka (UTeM) Repository

pony - The occam-pi Network Environment

Author: Sampson Adam T.
Schweigler Mario
Publication venue: 'IOS Press'
Publication date: 01/09/2006
Field of study

Although concurrency is generally perceived to be a `hard' subject, it can in fact be very simple --- provided that the underlying model is simple. The occam-pi parallel processing language provides such a simple yet powerful concurrency model that is based on CSP and the pi-calculus. This paper presents pony, the occam-pi Network Environment. occam-pi and pony provide a new, unified, concurrency model that bridges inter- and intra-processor concurrency. This enables the development of distributed applications in a transparent, dynamic and highly scalable way. The first part of this paper discusses the philosophy behind pony, explains how it is used, and gives a brief overview of its implementation. The second part evaluates pony's performance by presenting a number of benchmarks

Kent Academic Repository

A distributed programming environment for Ada

Author: Brennan Peter
Litke John D.
Mcdonnell Tom
Mcfarland Gregory
Timmins Lawrence J.
Publication venue
Publication date
Field of study

Despite considerable commercial exploitation of fault tolerance systems, significant and difficult research problems remain in such areas as fault detection and correction. A research project is described which constructs a distributed computing test bed for loosely coupled computers. The project is constructing a tool kit to support research into distributed control algorithms, including a distributed Ada compiler, distributed debugger, test harnesses, and environment monitors. The Ada compiler is being written in Ada and will implement distributed computing at the subsystem level. The design goal is to provide a variety of control mechanics for distributed programming while retaining total transparency at the code level

NASA Technical Reports Server