1,292 research outputs found

    Benchmark Test of CP-PACS for Lattice QCD

    Full text link
    The CP-PACS is a massively parallel computer dedicated for calculations in computational physics and will be in operation in the spring of 1996 at Center for Computational Physics, University of Tsukuba. In this article, we describe the architecture of the CP-PACS and report the results of the estimate of the performance of the CP-PACS for typical lattice QCD calculations.Comment: 12 pages (5 figures), Postscript file, talk presented at "QCD on Massively Parallel Computers" (Yamagata, Japan, March 16-18,1995

    Reconfigurable microarchitectures at the programmable logic interface

    Get PDF

    Automated design of domain-specific custom instructions

    Get PDF

    Hierarchical architecture design and simulation environment

    Get PDF
    The Hierarchical Architectural design and Simulation Environment (HASE)is intended as a flexible tool for computer architects who wish to experiment with alternative architectural configurations and design parameters. HASE is both a design environment and a simulator. Architecture components are described by a hierarchical library of objects defined in terms of an object oriented simulation language. HASE instantiates these objects to simulate and animate the execution of a computer architecture. An event trace generated by the simulator therefore describes the interaction between architecture components, for example, fetch stages, address and data buses, sequencers, instruction buffers and register files. The objects can model physical components at different abstraction levels, eg. PMS (processor memory switch), ISP (instruction set processor) and RTL (register transfer level). HASE applies the concepts of inheritance, encapsulation and polymorphism associated with object orientation, to simplify the design and implementation of an architecture simulation that models component operations at different abstraction levels. For example, HASE can probe the performance of a processor's floating point unit, executing a multiplication operation, at a lower level of abstraction, i.e. the RTL, whilst simulating remaining architecture components at a PMS level of abstraction. By adopting this approach, HASE returns a more meaningful and relevant event trace from an architecture simulation. Furthermore, an animator visualises the simulation's event trace to clarify the collaborations and interactions between architecture components. The prototype version of HASE is based on GSS (Graphical Support System), and DEMOS (Discrete Event Modelling On Simula)

    The Cost of Application-Class Processing: Energy and Performance Analysis of a Linux-Ready 1.7-GHz 64-Bit RISC-V Core in 22-nm FDSOI Technology

    Get PDF
    The open-source RISC-V instruction set architecture (ISA) is gaining traction, both in industry and academia. The ISA is designed to scale from microcontrollers to server-class processors. Furthermore, openness promotes the availability of various open-source and commercial implementations. Our main contribution in this paper is a thorough power, performance, and efficiency analysis of the RISC-V ISA targeting baseline "application class" functionality, i.e., supporting the Linux OS and its application environment based on our open-source single-issue in-order implementation of the 64-bit ISA variant (RV64GC) called Ariane. Our analysis is based on a detailed power and efficiency analysis of the RISC-V ISA extracted from silicon measurements and calibrated simulation of an Ariane instance (RV64IMC) taped-out in GlobalFoundries 22FDX technology. Ariane runs at up to 1.7-GHz, achieves up to 40-Gop/sW energy efficiency, which is superior to similar cores presented in the literature. We provide insight into the interplay between functionality required for the application-class execution (e.g., virtual memory, caches, and multiple modes of privileged operation) and energy cost. We also compare Ariane with RISCY, a simpler and a slower microcontroller-class core. Our analysis confirms that supporting application-class execution implies a nonnegligible energy-efficiency loss and that compute performance is more cost-effectively boosted by instruction extensions (e.g., packed SIMD) rather than the high-frequency operation

    A new parallelisation technique for heterogeneous CPUs

    Get PDF
    Parallelization has moved in recent years into the mainstream compilers, and the demand for parallelizing tools that can do a better job of automatic parallelization is higher than ever. During the last decade considerable attention has been focused on developing programming tools that support both explicit and implicit parallelism to keep up with the power of the new multiple core technology. Yet the success to develop automatic parallelising compilers has been limited mainly due to the complexity of the analytic process required to exploit available parallelism and manage other parallelisation measures such as data partitioning, alignment and synchronization. This dissertation investigates developing a programming tool that automatically parallelises large data structures on a heterogeneous architecture and whether a high-level programming language compiler can use this tool to exploit implicit parallelism and make use of the performance potential of the modern multicore technology. The work involved the development of a fully automatic parallelisation tool, called VSM, that completely hides the underlying details of general purpose heterogeneous architectures. The VSM implementation provides direct and simple access for users to parallelise array operations on the Cell’s accelerators without the need for any annotations or process directives. This work also involved the extension of the Glasgow Vector Pascal compiler to work with the VSM implementation as a one compiler system. The developed compiler system, which is called VP-Cell, takes a single source code and parallelises array expressions automatically. Several experiments were conducted using Vector Pascal benchmarks to show the validity of the VSM approach. The VP-Cell system achieved significant runtime performance on one accelerator as compared to the master processor’s performance and near-linear speedups over code runs on the Cell’s accelerators. Though VSM was mainly designed for developing parallelising compilers it also showed a considerable performance by running C code over the Cell’s accelerators

    Towards Porting Operating Systems with Program Synthesis

    Full text link
    The end of Moore's Law has ushered in a diversity of hardware not seen in decades. Operating system (and system software) portability is accordingly becoming increasingly critical. Simultaneously, there has been tremendous progress in program synthesis. We set out to explore the feasibility of using modern program synthesis to generate the machine-dependent parts of an operating system. Our ultimate goal is to generate new ports automatically from descriptions of new machines. One of the issues involved is writing specifications, both for machine-dependent operating system functionality and for instruction set architectures. We designed two domain-specific languages: Alewife for machine-independent specifications of machine-dependent operating system functionality and Cassiopea for describing instruction set architecture semantics. Automated porting also requires an implementation. We developed a toolchain that, given an Alewife specification and a Cassiopea machine description, specializes the machine-independent specification to the target instruction set architecture and synthesizes an implementation in assembly language with a customized symbolic execution engine. Using this approach, we demonstrate successful synthesis of a total of 140 OS components from two pre-existing OSes for four real hardware platforms. We also developed several optimization methods for OS-related assembly synthesis to improve scalability. The effectiveness of our languages and ability to synthesize code for all 140 specifications is evidence of the feasibility of program synthesis for machine-dependent OS code. However, many research challenges remain; we also discuss the benefits and limitations of our synthesis-based approach to automated OS porting.Comment: ACM Transactions on Programming Languages and Systems. Accepted on August 202
    • 

    corecore