1,292 research outputs found
Benchmark Test of CP-PACS for Lattice QCD
The CP-PACS is a massively parallel computer dedicated for calculations in
computational physics and will be in operation in the spring of 1996 at Center
for Computational Physics, University of Tsukuba. In this article, we describe
the architecture of the CP-PACS and report the results of the estimate of the
performance of the CP-PACS for typical lattice QCD calculations.Comment: 12 pages (5 figures), Postscript file, talk presented at "QCD on
Massively Parallel Computers" (Yamagata, Japan, March 16-18,1995
Hierarchical architecture design and simulation environment
The Hierarchical Architectural design and Simulation Environment (HASE)is
intended as a flexible tool for computer architects who wish to experiment with
alternative architectural configurations and design parameters. HASE is both
a design environment and a simulator. Architecture components are described
by a hierarchical library of objects defined in terms of an object oriented simulation language. HASE instantiates these objects to simulate and animate the
execution of a computer architecture. An event trace generated by the simulator
therefore describes the interaction between architecture components, for example,
fetch stages, address and data buses, sequencers, instruction buffers and register
files. The objects can model physical components at different abstraction levels,
eg. PMS (processor memory switch), ISP (instruction set processor) and RTL
(register transfer level). HASE applies the concepts of inheritance, encapsulation
and polymorphism associated with object orientation, to simplify the design and
implementation of an architecture simulation that models component operations
at different abstraction levels. For example, HASE can probe the performance
of a processor's floating point unit, executing a multiplication operation, at a
lower level of abstraction, i.e. the RTL, whilst simulating remaining architecture
components at a PMS level of abstraction. By adopting this approach, HASE
returns a more meaningful and relevant event trace from an architecture simulation. Furthermore, an animator visualises the simulation's event trace to clarify
the collaborations and interactions between architecture components. The prototype version of HASE is based on GSS (Graphical Support System), and DEMOS
(Discrete Event Modelling On Simula)
The Cost of Application-Class Processing: Energy and Performance Analysis of a Linux-Ready 1.7-GHz 64-Bit RISC-V Core in 22-nm FDSOI Technology
The open-source RISC-V instruction set architecture (ISA) is gaining traction, both in industry and academia. The ISA is designed to scale from microcontrollers to server-class processors. Furthermore, openness promotes the availability of various open-source and commercial implementations. Our main contribution in this paper is a thorough power, performance, and efficiency analysis of the RISC-V ISA targeting baseline "application class" functionality, i.e., supporting the Linux OS and its application environment based on our open-source single-issue in-order implementation of the 64-bit ISA variant (RV64GC) called Ariane. Our analysis is based on a detailed power and efficiency analysis of the RISC-V ISA extracted from silicon measurements and calibrated simulation of an Ariane instance (RV64IMC) taped-out in GlobalFoundries 22FDX technology. Ariane runs at up to 1.7-GHz, achieves up to 40-Gop/sW energy efficiency, which is superior to similar cores presented in the literature. We provide insight into the interplay between functionality required for the application-class execution (e.g., virtual memory, caches, and multiple modes of privileged operation) and energy cost. We also compare Ariane with RISCY, a simpler and a slower microcontroller-class core. Our analysis confirms that supporting application-class execution implies a nonnegligible energy-efficiency loss and that compute performance is more cost-effectively boosted by instruction extensions (e.g., packed SIMD) rather than the high-frequency operation
A new parallelisation technique for heterogeneous CPUs
Parallelization has moved in recent years into the mainstream compilers, and the demand
for parallelizing tools that can do a better job of automatic parallelization is higher than
ever. During the last decade considerable attention has been focused on developing programming
tools that support both explicit and implicit parallelism to keep up with the
power of the new multiple core technology. Yet the success to develop automatic parallelising
compilers has been limited mainly due to the complexity of the analytic process
required to exploit available parallelism and manage other parallelisation measures such
as data partitioning, alignment and synchronization.
This dissertation investigates developing a programming tool that automatically parallelises
large data structures on a heterogeneous architecture and whether a high-level programming
language compiler can use this tool to exploit implicit parallelism and make use
of the performance potential of the modern multicore technology. The work involved the
development of a fully automatic parallelisation tool, called VSM, that completely hides
the underlying details of general purpose heterogeneous architectures. The VSM implementation
provides direct and simple access for users to parallelise array operations on the
Cellâs accelerators without the need for any annotations or process directives. This work
also involved the extension of the Glasgow Vector Pascal compiler to work with the VSM
implementation as a one compiler system. The developed compiler system, which is called
VP-Cell, takes a single source code and parallelises array expressions automatically.
Several experiments were conducted using Vector Pascal benchmarks to show the validity
of the VSM approach. The VP-Cell system achieved significant runtime performance
on one accelerator as compared to the master processorâs performance and near-linear
speedups over code runs on the Cellâs accelerators. Though VSM was mainly designed for
developing parallelising compilers it also showed a considerable performance by running
C code over the Cellâs accelerators
Towards Porting Operating Systems with Program Synthesis
The end of Moore's Law has ushered in a diversity of hardware not seen in
decades. Operating system (and system software) portability is accordingly
becoming increasingly critical. Simultaneously, there has been tremendous
progress in program synthesis. We set out to explore the feasibility of using
modern program synthesis to generate the machine-dependent parts of an
operating system. Our ultimate goal is to generate new ports automatically from
descriptions of new machines. One of the issues involved is writing
specifications, both for machine-dependent operating system functionality and
for instruction set architectures. We designed two domain-specific languages:
Alewife for machine-independent specifications of machine-dependent operating
system functionality and Cassiopea for describing instruction set architecture
semantics. Automated porting also requires an implementation. We developed a
toolchain that, given an Alewife specification and a Cassiopea machine
description, specializes the machine-independent specification to the target
instruction set architecture and synthesizes an implementation in assembly
language with a customized symbolic execution engine. Using this approach, we
demonstrate successful synthesis of a total of 140 OS components from two
pre-existing OSes for four real hardware platforms. We also developed several
optimization methods for OS-related assembly synthesis to improve scalability.
The effectiveness of our languages and ability to synthesize code for all 140
specifications is evidence of the feasibility of program synthesis for
machine-dependent OS code. However, many research challenges remain; we also
discuss the benefits and limitations of our synthesis-based approach to
automated OS porting.Comment: ACM Transactions on Programming Languages and Systems. Accepted on
August 202
- âŠ