563 research outputs found

    A Compiler and Runtime Infrastructure for Automatic Program Distribution

    Get PDF
    This paper presents the design and the implementation of a compiler and runtime infrastructure for automatic program distribution. We are building a research infrastructure that enables experimentation with various program partitioning and mapping strategies and the study of automatic distribution's effect on resource consumption (e.g., CPU, memory, communication). Since many optimization techniques are faced with conflicting optimization targets (e.g., memory and communication), we believe that it is important to be able to study their interaction. We present a set of techniques that enable flexible resource modeling and program distribution. These are: dependence analysis, weighted graph partitioning, code and communication generation, and profiling. We have developed these ideas in the context of the Java language. We present in detail the design and implementation of each of the techniques as part of our compiler and runtime infrastructure. Then, we evaluate our design and present preliminary experimental data for each component, as well as for the entire system

    Network Virtual Machine (NetVM): A New Architecture for Efficient and Portable Packet Processing Applications

    Get PDF
    A challenge facing network device designers, besides increasing the speed of network gear, is improving its programmability in order to simplify the implementation of new applications (see for example, active networks, content networking, etc). This paper presents our work on designing and implementing a virtual network processor, called NetVM, which has an instruction set optimized for packet processing applications, i.e., for handling network traffic. Similarly to a Java Virtual Machine that virtualizes a CPU, a NetVM virtualizes a network processor. The NetVM is expected to provide a compatibility layer for networking tasks (e.g., packet filtering, packet counting, string matching) performed by various packet processing applications (firewalls, network monitors, intrusion detectors) so that they can be executed on any network device, ranging from expensive routers to small appliances (e.g. smart phones). Moreover, the NetVM will provide efficient mapping of the elementary functionalities used to realize the above mentioned networking tasks upon specific hardware functional units (e.g., ASICs, FPGAs, and network processing elements) included in special purpose hardware systems possibly deployed to implement network devices

    cphVB: A System for Automated Runtime Optimization and Parallelization of Vectorized Applications

    Full text link
    Modern processor architectures, in addition to having still more cores, also require still more consideration to memory-layout in order to run at full capacity. The usefulness of most languages is deprecating as their abstractions, structures or objects are hard to map onto modern processor architectures efficiently. The work in this paper introduces a new abstract machine framework, cphVB, that enables vector oriented high-level programming languages to map onto a broad range of architectures efficiently. The idea is to close the gap between high-level languages and hardware optimized low-level implementations. By translating high-level vector operations into an intermediate vector bytecode, cphVB enables specialized vector engines to efficiently execute the vector operations. The primary success parameters are to maintain a complete abstraction from low-level details and to provide efficient code execution across different, modern, processors. We evaluate the presented design through a setup that targets multi-core CPU architectures. We evaluate the performance of the implementation using Python implementations of well-known algorithms: a jacobi solver, a kNN search, a shallow water simulation and a synthetic stencil simulation. All demonstrate good performance

    t|ket> : A retargetable compiler for NISQ devices

    Get PDF
    We present t|ket>, a quantum software development platform produced by Cambridge Quantum Computing Ltd. The heart of t|ket> is a language-agnostic optimising compiler designed to generate code for a variety of NISQ devices, which has several features designed to minimise the influence of device error. The compiler has been extensively benchmarked and outperforms most competitors in terms of circuit optimisation and qubit routing

    Using Rapid Prototyping in Computer Architecture Design Laboratories

    Get PDF
    This paper describes the undergraduate computer architecture courses and laboratories introduced at Georgia Tech during the past two years. A core sequence of six required courses for computer engineering students has been developed. In this paper, emphasis is placed upon the new core laboratories which utilize commercial CAD tools, FPGAs, hardware emulators, and a VHDL based rapid prototyping approach to simulate, synthesize, and implement prototype computer hardware

    Optimized Fast Fourier Transform Architecture Using Instruction Set Architecture Extension In Low-End Digital Signal Controller

    Get PDF
    Smart microgrids have emerged as a viable solution in case of emergency situations occurred at the main electricity grid. The main concern of a smart microgrid is the degradation of the power quality caused by harmonic distortion originated from the non-linear equipment. With the rapid development of power electronic technology, the increased of harmonic-producing loads in the smart microgrids necessitating a new digital signal controller architecture for the harmonic measurement system. While the current system configurations are directed towards the 32-bit architecture, it shows higher requirements in area footprint and multi-core setup. This thesis presents the design of a low-end digital signal controller architecture using instruction set architecture (ISA) extension for the implementation of the harmonic measurement system in a smart microgrid. A new architecture, called UTeMRISC, is developed from the baseline 8-bit microcontroller with the capability to perform signal processing applications such as Fast Fourier Transform (FFT). The architecture is improved using the Application-Specific Instruction Set Processor (ASIP) approach by extending the instruction set architecture to 16-bit length. Instruction set customization is implemented to enable the execution of computationally intensive tasks. The entire architecture is described in Verilog Hardware Description Language (HDL) and implemented on the Virtex-6 FPGA board. From the test programs, UTeMRISC has demonstrated faster execution times and higher maximum operating frequency while not significantly increased the core’s resource utilization. Compared to the initial processor architecture, the support of extended ISA has increased the UTeMRISC core by 21.8% but at the same time allows to execute Fast Fourier Transform algorithm up to 5× faster. The combine effort of ISA extension and optimized instruction set generation results in up to 1 Mega sample per second, which translated to 66.8% increase of data throughput in the FFT algorithm when compared to a 32-bit architecture. This research proves that with comprehensive ASIP methodology and ISA extension, a low-end digital signal controller architecture is feasible and effective to be implemented in a harmonic measurement system for a smart microgrid

    pony - The occam-pi Network Environment

    Get PDF
    Although concurrency is generally perceived to be a `hard' subject, it can in fact be very simple --- provided that the underlying model is simple. The occam-pi parallel processing language provides such a simple yet powerful concurrency model that is based on CSP and the pi-calculus. This paper presents pony, the occam-pi Network Environment. occam-pi and pony provide a new, unified, concurrency model that bridges inter- and intra-processor concurrency. This enables the development of distributed applications in a transparent, dynamic and highly scalable way. The first part of this paper discusses the philosophy behind pony, explains how it is used, and gives a brief overview of its implementation. The second part evaluates pony's performance by presenting a number of benchmarks

    A distributed programming environment for Ada

    Get PDF
    Despite considerable commercial exploitation of fault tolerance systems, significant and difficult research problems remain in such areas as fault detection and correction. A research project is described which constructs a distributed computing test bed for loosely coupled computers. The project is constructing a tool kit to support research into distributed control algorithms, including a distributed Ada compiler, distributed debugger, test harnesses, and environment monitors. The Ada compiler is being written in Ada and will implement distributed computing at the subsystem level. The design goal is to provide a variety of control mechanics for distributed programming while retaining total transparency at the code level
    corecore