11 research outputs found

OMICRON : a parallel computer architecture for declarative languages

Author: Lioupis Dimitris
Publication venue: Department of Computing, Imperial College London
Publication date: 01/01/1988

Imperial Users only

Spiral - Imperial College Digital Repository

Exploring Cache Performance in Multithreaded Processors

Author: Dimitris Lioupis
Sotiris Milios
Publication date: 01/01/1997

Multithreading is a well-known technique for hiding latency in a nonblocking cache architecture. By switching execution from one thread to another, the CPU can perform useful work while waiting for pending requests to be processed by the main memory. In this paper we examine the effects of varying the associativity and block size on cache performance in the reduced locality of reference environment that multithreading creates. We find that for associativity equal to the number of threads, the cache produces a very low miss rate even for small sizes. Also, taking into account the increase in cycle time due to larger cache size or associativity, we find that the optimum cache configuration for best processor performance is 16 Kbytes direct mapped. Finally, with a constant main memory bandwidth, increasing the block size to more than 32 bytes reduces the miss rate but degrades processor performance. 1. Introduction Multithreading is an important technique proposed to tolerate memory latency in co..

CiteSeerX
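
The abstract above relates cache associativity to the number of threads. The following is a minimal sketch of that intuition only, not the simulator or workload used in the paper. The class SetAssociativeCache, the function interleaved_trace, the 4 Kbyte cache size, and the address layout (each thread's region placed exactly one cache size apart, so that all threads compete for the same sets) are assumptions chosen to produce a worst case. The sketch counts misses only and ignores the cycle-time cost of higher associativity that the abstract also weighs.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """LRU set-associative cache model that only counts hits and misses."""

    def __init__(self, size_bytes, block_bytes, ways):
        self.block_bytes = block_bytes
        self.ways = ways
        self.num_sets = size_bytes // (block_bytes * ways)
        # One ordered dict per set; insertion order doubles as LRU order.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, address):
        block = address // self.block_bytes
        index = block % self.num_sets
        tag = block // self.num_sets
        lines = self.sets[index]
        if tag in lines:
            lines.move_to_end(tag)      # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(lines) >= self.ways:
            lines.popitem(last=False)   # evict the least recently used line
        lines[tag] = True
        return False

    def miss_rate(self):
        total = self.hits + self.misses
        return self.misses / total if total else 0.0


def interleaved_trace(num_threads, region_bytes, stride, passes, spacing):
    """Yield addresses as if the CPU switched threads on every reference.

    Hypothetical workload: each thread sweeps its own region, and regions
    start `spacing` bytes apart.
    """
    for _ in range(passes):
        for offset in range(0, region_bytes, stride):
            for thread in range(num_threads):
                yield thread * spacing + offset


def run(ways, num_threads=4):
    cache = SetAssociativeCache(size_bytes=4096, block_bytes=32, ways=ways)
    # spacing == cache size, so every thread maps onto the same sets.
    for address in interleaved_trace(num_threads, 1024, 4, 4, 4096):
        cache.access(address)
    return cache.miss_rate()


if __name__ == "__main__":
    for ways in (1, 2, 4):
        print(ways, run(ways))
```

In this contrived layout the four threads evict each other's lines on every reference until the associativity reaches the thread count, at which point only the cold misses remain. Real workloads are far less adversarial, so the numbers illustrate the mechanism rather than any result from the paper.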

Real Time Behavior Of Multithreaded Processors

Author: Dimitris Lioupis

We find that the processor response greatly depends on the cache configuration and main memory throughput. For a simple cache design, conflict misses reduce real-time (RT) response below 55%. The only way to guarantee RT performance in multithreaded processors is to increase the memory bandwidth by employing pipelining. This way misses are serviced faster and near 100% performance can be achieved. Introduction Multithreading has been proposed as a technique for tolerating latency in computer systems. In uniprocessor systems, multithreading has been proposed by Hirata [4], Gupta [13], Eggers [11] and others, to tolerate the latency caused by a cache miss. Multithreading has also been studied both in multiprocessor systems such as the APRIL [5], the Tera Computer [6], and the HEP [14], as well as in data-flow machines such as the *T [3], the Monsoon [15], and others, to tolerate the latency caused by long memory access through interconnection networks. It will not be long before multithreaded p..

CiteSeerX

Exploring cache performance in multithreaded processors

Author: Dimitris Lioupis
Sotiris Milios
Publication venue: 'Elsevier BV'

Crossref

Exploring the Cache Design Space in a Multithreaded Processor

Author: Dimitris Lioupis
Sotiris Milios

Multithreading can be used to hide latency in a non-blocking cache architecture. By switching execution from one thread to another, the CPU can perform useful work while waiting for pending requests to be processed by the main memory. This frequent context switching, however, produces a very irregular memory referencing pattern. In this paper we examine the effects of associativity and block size on cache performance in such a hostile environment. We find that for associativity equal to the number of threads, the cache produces a very low miss rate even for small sizes, which helps improve the performance of the processor. Increasing the block size to more than 32 bytes reduces the miss rate, but the processor performance degrades for a given memory bandwidth. 1. Introduction Multithreading is an important technique for tolerating latency in computer systems. It has been shown to work in multiprocessor systems [1], [5], [6], [14], as well as in data-flow machines [3], [4], ..

CiteSeerX

Efficient On-Line Trace Driven Simulation of Parallel Computer Architectures

Author: Dimitris Lioupis
Georgios Theodoropoulos
Sotiris Milios

This paper presents an approach for the synchronization of the trace generator and the target architectural and memory models in online, trace-driven simulations of parallel computer architectures. This approach eliminates the need for special concurrency control mechanisms conventionally used to achieve this synchronization, thus providing for simplicity and portability. The validity of the proposed approach is illustrated by employing it for the simulation of a multithreaded processor. INTRODUCTION Technological and architectural advances have dramatically increased the size and complexity of computer system designs. The need to cope with this complexity and the requirement for shorter development times and reduced design costs have assigned key roles to modeling and simulation in computer architecture research. Modeling and simulation are essential tools for experimenting with alternative ways of using the available silicon area, verifying the timing behaviour and functional corre..

CiteSeerX
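
The abstract above concerns coupling a trace generator to an architectural model without special concurrency control. The snippet does not say how the authors achieve this, so the following is not their method. It is only a sketch of one lock-free coupling that Python offers: a generator that produces references on demand, so producer and consumer run in a single thread of control. The names trace_generator and simulate, the two-thread workload, and the cold-miss counter are hypothetical.

```python
def trace_generator(num_refs, stride=4):
    """Hypothetical on-line trace source: yields (thread_id, address) lazily."""
    for i in range(num_refs):
        yield (i % 2, i * stride)


def simulate(trace, block_bytes=32):
    """Toy target model: pulls one reference at a time, counts cold misses."""
    seen_blocks = set()
    cold_misses = 0
    for _thread_id, address in trace:
        block = address // block_bytes
        if block not in seen_blocks:
            seen_blocks.add(block)
            cold_misses += 1
    return cold_misses


if __name__ == "__main__":
    # The generator only advances when the model asks for the next
    # reference, so no trace file, buffer, or lock is needed.
    print(simulate(trace_generator(64)))
```

Because the model pulls each reference when it is ready for it, the trace is never stored and the two sides cannot get out of step.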

PiSMA

Author: Andreas Pipis
Dimitris Lioupis
Kuehn J.T.
Lioupis D.
Maria Smirli
Michael Stefanidakis
Warren D. H. D.
Publication venue: 'Association for Computing Machinery (ACM)'

Crossref

Efficient Modeling And Simulation Of A Virtually Shared Memory Architecture

Author: Andreas Pipis
Dimitris Lioupis
Georgios K. Theodoropoulos
Michael Stefanidakis

Modeling and simulation have been assigned crucial roles in the design, development, analysis and evaluation of computer architectures. The design of parallel architectures in particular is a complex and difficult endeavor that makes modeling and simulation essential tools. In this case, high simulation performance is a prerequisite, since large workloads need to be simulated for a realistic analysis of the parallel system. This paper presents the approach that has been followed for the modeling and simulation of PiSMA, a virtual shared memory parallel architecture, scalable to hundreds of processors. 1. INTRODUCTION Technological and architectural advances have dramatically increased the size and complexity of computer system designs. The need to cope with this complexity and the requirement for shorter development times and reduced design costs have assigned key roles to discrete event simulation modeling [15] [19] in computer architecture research. Discrete event simulation modeli..

CiteSeerX
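
The abstract above refers to discrete event simulation modeling. As a generic illustration of that technique, and not of the PiSMA simulator itself, the following sketch keeps pending events in a priority queue ordered by timestamp and advances the clock from event to event. The Simulator class, the issue_request and complete_request actions, and the latencies are all hypothetical.

```python
import heapq
import itertools


class Simulator:
    """Minimal discrete event core: a clock plus a time-ordered event queue."""

    def __init__(self):
        self.now = 0
        self._queue = []
        self._seq = itertools.count()   # tie-breaker for equal timestamps
        self.log = []

    def schedule(self, delay, action, *args):
        event = (self.now + delay, next(self._seq), action, args)
        heapq.heappush(self._queue, event)

    def run(self):
        while self._queue:
            time, _seq, action, args = heapq.heappop(self._queue)
            self.now = time             # jump straight to the next event
            action(self, *args)


def issue_request(sim, cpu, latency):
    """A processor issues a memory request that completes after `latency`."""
    sim.log.append((sim.now, cpu, "request"))
    sim.schedule(latency, complete_request, cpu)


def complete_request(sim, cpu):
    sim.log.append((sim.now, cpu, "reply"))


def demo():
    sim = Simulator()
    sim.schedule(0, issue_request, 0, 10)   # CPU 0, slow remote access
    sim.schedule(2, issue_request, 1, 3)    # CPU 1, fast local access
    sim.run()
    return sim.log


if __name__ == "__main__":
    for entry in demo():
        print(entry)
```

The sequence counter ensures that two events with the same timestamp are ordered by insertion and that the heap never has to compare the action functions themselves.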

Architecture of a VLSI instruction cache for a RISC

Author: Chris Nyberg
David A. Patterson
Dimitris Lioupis
Hennessy J.
Korbin Van Dyke
Mark Hill
Phil Garrison
Smith J.E.
Tim Sippel
Publication venue: 'Association for Computing Machinery (ACM)'

Crossref