Modeling and Analysis of Dual Block Multithreading

Author: Zuberek W. M.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2004

Instruction level multithreading is a technique for tolerating long-latency operations (e.g., cache misses) by switching the processor to another thread instead of waiting for the completion of a lengthy operation. In block multithreading, context switching occurs for each initiated long-latency operation. However, processor cycles during pipeline stalls as well as during context switching are not used in typical block multithreading, reducing the performance of a processor. Dual block multithreading introduces a second active thread which is used for instruction issuing whenever the original (main) thread becomes inactive. Dual block multithreading can be regarded as a simple and specialized case of simultaneous multithreading when two (simultaneous) threads are used to issue instructions for a single pipeline. The paper develops a simple timed Petri net model of dual block multithreading and uses this model to estimate the performance improvements of the proposed dual block multithreading.

Memorial University Research Repository
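
The cost of context switching described in the abstract above can be illustrated with a first-order utilization estimate. This is only a sketch under stated assumptions, not the timed Petri net model developed in the paper: the function name, its parameters, and the closed-form estimate are assumptions introduced here, and dual block multithreading itself is not modeled.

```python
def block_mt_utilization(threads, run_length, latency, switch_cost):
    """First-order estimate of pipeline utilization under block multithreading.

    All names are hypothetical and all threads are assumed identical:
      threads     -- number of hardware thread contexts
      run_length  -- average cycles a thread runs before a long-latency operation
      latency     -- cycles needed to complete that long-latency operation
      switch_cost -- cycles lost on every context switch
    """
    if threads < 1 or run_length <= 0 or latency < 0 or switch_cost < 0:
        raise ValueError("invalid model parameters")
    # Too few threads: the pipeline idles while every thread is waiting.
    linear = threads * run_length / (run_length + switch_cost + latency)
    # Enough threads: latency is hidden, but switch cycles are still lost.
    saturated = run_length / (run_length + switch_cost)
    return min(linear, saturated)


# Utilization grows with the thread count until the switch cost caps it.
for n in (1, 2, 4, 8):
    print(n, round(block_mt_utilization(n, run_length=10, latency=40, switch_cost=2), 3))
```

Under these assumptions the estimate saturates below 1.0 because of the switch cost, which is the kind of unused cycle that the abstract says a second active thread is meant to recover.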

Analysis of Multi-Threading and Cache Memory Latency Masking on Processor Performance Using Thread Synchronization Technique

Author: Ehis Akhigbe-mudu Thursday
Publication venue: Brazilian Journal of Science Publications
Publication date: 25/09/2023

Multithreading is a technique in which a single processor executes multiple threads concurrently. This enables the processor to divide tasks into separate threads and run them simultaneously, thereby increasing the utilization of available system resources and enhancing performance. When multiple threads share an object and one or more of them modify it, unpredictable outcomes may occur. Threads that exhibit poor locality of memory reference, such as database applications, often experience delays while waiting for a response from the memory hierarchy. This observation suggests how to better manage pipeline contention. To assess the impact of memory latency on processor performance, a dual-core MT machine with four thread contexts per core is utilized. The benchmarks are chosen so that the workload includes programs with both favorable and unfavorable cache locality. To avoid wasting wake-up signals, this work proposes an approach that stores all the wake-up calls: the wake-up calls to the consumer and the producer are stored in a variable. A semaphore is a value in operating system (or kernel) storage that each process can check. It is a variable whose read and update operations are performed atomically in bit mode. It cannot be implemented in user mode, since a race condition may arise whenever two or more processors attempt to access the variable at the same time. This study includes code to measure the time taken to execute both functions and plot the graph. It should be noted that sending multiple requests to a website simultaneously could trigger a flag, ultimately blocking access to the data. This necessitates some computation on the collected statistics. The execution time is reduced to one third when using threads compared to executing the functions sequentially. This exemplifies the power of multithreading.

Brazilian Journal of Science
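
The idea of storing wake-up calls in a counter, as described in the abstract above, can be sketched with counting semaphores from Python's standard `threading` module. This is a minimal illustration, not the code from the study: the function name, the buffer capacity, and the single producer and single consumer are assumptions made here.

```python
import threading
from collections import deque


def run_producer_consumer(n_items, capacity=4):
    """Pass n_items integers from one producer to one consumer (hypothetical helper)."""
    buffer = deque()
    mutex = threading.Lock()                     # protects the shared buffer
    empty_slots = threading.Semaphore(capacity)  # counts free buffer slots
    filled_slots = threading.Semaphore(0)        # counts pending wake-ups for the consumer
    consumed = []

    def producer():
        for item in range(n_items):
            empty_slots.acquire()                # wait for a free slot
            with mutex:
                buffer.append(item)
            # The wake-up is recorded in the semaphore's counter, so it is
            # not lost even if the consumer is not waiting yet.
            filled_slots.release()

    def consumer():
        for _ in range(n_items):
            filled_slots.acquire()               # consume one stored wake-up
            with mutex:
                item = buffer.popleft()
            empty_slots.release()                # signal a free slot to the producer
            consumed.append(item)

    workers = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return consumed


print(run_producer_consumer(10))
```

With one producer and one consumer the items arrive in the order they were produced; with several of either, only the count of wake-ups is guaranteed, not the ordering.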

Safe and Verifiable Design of Concurrent Java Programs

Author: Bakkers A.W.P.
Hilderink G.H.
Stiles G.S.
Welch P.H.
Publication venue: Acta Press
Publication date: 01/01/2001

The design of concurrent programs has a reputation for being difficult, and thus potentially dangerous in safety-critical real-time and embedded systems. The recent appearance of Java, whilst cleaning up many insecure aspects of OO programming endemic in C++, suffers from a deceptively simple threads model that is an insecure variant of ideas that are over 25 years old [1]. Consequently, we cannot directly exploit a range of new CASE tools -- based upon modern developments in parallel computing theory -- that can verify and check the design of concurrent systems for a variety of dangers such as deadlock and livelock that otherwise plague us during testing and maintenance and, more seriously, cause catastrophic failure in service. Our approach uses recently developed Java class libraries based on Hoare's Communicating Sequential Processes (CSP); the use of CSP greatly simplifies the design of concurrent systems and, in many cases, a parallel approach often significantly simplifies systems originally approached sequentially. New CSP CASE tools permit designs to be verified against formal specifications and checked for deadlock and livelock. Below we introduce CSP and its implementation in Java and develop a small concurrent application. The formal CSP description of the application is provided, as well as that of an equivalent sequential version. FDR is used to verify the correctness of both implementations, their equivalence, and their freedom from deadlock and livelock.

University of Twente Research Information
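
The CSP style referred to in the abstract above, in which processes share no state and interact only through channels, can be approximated in a few lines. This sketch is not the Java class library from the paper: it uses Python's standard `queue.Queue` with one slot as a stand-in for a channel (true CSP channels are unbuffered and synchronous), and the process names and the `DONE` marker are assumptions made here.

```python
import queue
import threading

DONE = object()  # hypothetical end-of-stream marker passed along the channels


def numbers(out_channel, count):
    """Process 1: emit 0 .. count-1, then signal the end of the stream."""
    for value in range(count):
        out_channel.put(value)
    out_channel.put(DONE)


def square(in_channel, out_channel):
    """Process 2: square each incoming value and forward it."""
    while True:
        value = in_channel.get()
        if value is DONE:
            out_channel.put(DONE)
            return
        out_channel.put(value * value)


def run_pipeline(count):
    """Wire the two processes together and collect the results."""
    a = queue.Queue(maxsize=1)  # one-slot buffer approximating a channel
    b = queue.Queue(maxsize=1)
    processes = [
        threading.Thread(target=numbers, args=(a, count)),
        threading.Thread(target=square, args=(a, b)),
    ]
    for process in processes:
        process.start()
    results = []
    while True:
        value = b.get()
        if value is DONE:
            break
        results.append(value)
    for process in processes:
        process.join()
    return results


print(run_pipeline(5))
```

Because each process touches only its own channels, no locks appear in the process bodies; this separation is what makes such designs amenable to the kind of formal checking the abstract mentions.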

Fetch unit design for scalable simultaneous multithreading (ScSMT)

Author: Luque Fadón Emilio
Moure Juan Carlos
Rexachs del Rosario Dolores
Publication venue
Publication date: 29/03/2004

Continuous IC process enhancements make it possible to integrate on a single chip the resources required for simultaneously executing multiple control flows or threads, exploiting different levels of thread-level parallelism: application-, function-, and loop-level. Scalable simultaneous multithreading combines static and dynamic mechanisms to assemble a complexity-effective design that provides high instructions-per-cycle rates without sacrificing cycle time or single-thread performance. This paper addresses the design of the fetch unit for a high-performance, scalable, simultaneous multithreaded processor. We present the detailed microarchitecture of a clustered and reconfigurable fetch unit based on an existing single-thread fetch unit. In order to minimize the occurrence of fetch hazards, the fetch unit dynamically adapts to the available thread-level parallelism and to the fetch characteristics of the active threads, working as a single shared unit or as two separate clusters. It combines static and dynamic methods in a complexity-efficient way. The design is supported by a simulation-based analysis of different instruction cache and branch target buffer configurations in the context of a multithreaded execution workload. Average reductions in the miss rates between 30% and 60% and peak reductions greater than 200% are obtained.

Servicio de Difusión de la Creación Intelectual

Fetch unit design for scalable simultaneous multithreading (ScSMT)

Author: Luque Fadón Emilio
Moure Juan Carlos
Rexachs del Rosario Dolores
Publication venue
Publication date: 01/01/2001

Multithreading for high performance finance risk analysis

Author: Wei Wenqian
Publication venue: Brunel University School of Engineering and Design PhD Theses
Publication date: 01/01/2013

This thesis was submitted for the degree of Master of Philosophy and awarded by Brunel University. With increasing risks in the financial market, risk management models are developing quickly, and the standards for the accuracy and effectiveness of these models are rising continuously. This thesis investigates Value at Risk (VaR), which is an important method for measuring market risk. It reviews the three methods which can be used to quantify VaR: the parameter method, the historical data processing method, and the Monte Carlo simulation method. Monte Carlo simulation has been widely employed for finance risk analysis. One challenge in Monte Carlo simulation is its computational complexity. To address it, this thesis investigates multithreading techniques for high performance

Brunel University Research Archive
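
As a rough illustration of the Monte Carlo approach to VaR named in the abstract above, the sketch below splits the simulation across worker threads. It is not the thesis's implementation: the single-asset, normally distributed one-period return, the function names, and every parameter value are assumptions chosen only to keep the example small.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def simulate_losses(seed, n_paths, value, mu, sigma):
    """Simulate one-period losses for a position, assuming normal returns."""
    rng = random.Random(seed)  # private generator per task, so runs are reproducible
    return [-value * rng.gauss(mu, sigma) for _ in range(n_paths)]


def monte_carlo_var(value, mu, sigma, confidence=0.95,
                    n_paths=200_000, workers=4, seed=12345):
    """Estimate VaR as the `confidence` quantile of the simulated losses."""
    chunk = n_paths // workers
    sizes = [chunk] * workers
    sizes[-1] += n_paths - chunk * workers  # the last task takes the remainder
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(simulate_losses, seed + k, size, value, mu, sigma)
                   for k, size in enumerate(sizes)]
        losses = [loss for future in futures for loss in future.result()]
    losses.sort()
    index = min(len(losses) - 1, int(confidence * len(losses)))
    return losses[index]


# Hypothetical position: 1,000,000 units, zero mean return, 2% volatility.
print(round(monte_carlo_var(1_000_000, mu=0.0, sigma=0.02)))
```

Note that in CPython the global interpreter lock limits the speed-up threads can give to pure-Python arithmetic like this; the example shows how the work is partitioned, and swapping in `ProcessPoolExecutor` from the same module is one way to obtain real parallelism.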

A Non-blocking Interconnection Network-Shared Cache Organization for Multi-core Processors

Author: Allam Rebhi Mohammad AbuMwais
علام ربحي محمد ابومويس
Publication venue: جامعة القدس
Publication date: 28/09/2013

Al-Quds University Digital Repository

Experiences with porting and modelling wavefront algorithms on many-core architectures

Author: Hammond Simon D.
Jarvis Stephen A.
Mudalige Gihan R.
Pennycook Simon J.
Publication venue
Publication date: 01/09/2010

We are currently investigating the viability of many-core architectures for the acceleration of wavefront applications and this report focuses on graphics processing units (GPUs) in particular. To this end, we have implemented NASA’s LU benchmark – a real world production-grade application – on GPUs employing NVIDIA’s Compute Unified Device Architecture (CUDA). This GPU implementation of the benchmark has been used to investigate the performance of a selection of GPUs, ranging from workstation-grade commodity GPUs to the HPC “Tesla” and “Fermi” GPUs. We have also compared the performance of the GPU solution at scale to that of traditional high performance computing (HPC) clusters based on a range of multi-core CPUs from a number of major vendors, including Intel (Nehalem), AMD (Opteron) and IBM (PowerPC). In previous work we have developed a predictive “plug-and-play” performance model of this class of application running on such clusters, in which CPUs communicate via the Message Passing Interface (MPI). By extending this model to also capture the performance behaviour of GPUs, we are able to: (1) comment on the effects that architectural changes will have on the performance of single-GPU solutions, and (2) make projections regarding the performance of multi-GPU solutions at larger scale.

Warwick Research Archives Portal Repository
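
The dependency pattern behind the wavefront applications mentioned in the abstract above can be shown on a toy two-dimensional grid. This is only a sketch: the LU benchmark sweeps a three-dimensional domain, and the function name and the recurrence used here (each cell is the sum of its upper and left neighbours) are assumptions chosen for illustration.

```python
def wavefront_sweep(rows, cols):
    """Fill a grid where cell (i, j) depends on (i-1, j) and (i, j-1).

    Cells on the same anti-diagonal (i + j == d) do not depend on each
    other, so each diagonal could be processed in parallel; here the
    inner loop simply visits them one after another.
    """
    grid = [[0] * cols for _ in range(rows)]
    for d in range(rows + cols - 1):       # one wavefront step per anti-diagonal
        i_min = max(0, d - cols + 1)
        i_max = min(rows - 1, d)
        for i in range(i_min, i_max + 1):  # independent cells of this wavefront
            j = d - i
            if i == 0 or j == 0:
                grid[i][j] = 1             # boundary condition
            else:
                grid[i][j] = grid[i - 1][j] + grid[i][j - 1]
    return grid


for row in wavefront_sweep(4, 5):
    print(row)
```

The amount of available parallelism grows and then shrinks as the wavefront crosses the grid, which is one reason the performance of such codes is sensitive to the architecture they run on.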

CellSim: a validated modular heterogeneous multiprocessor simulator

Author: Ayguadé Parra Eduard
Cabarcas Jaramillo Felipe
Martorell Bofill Xavier
Ramírez Bellido Alejandro
Rico Carro Alejandro
Ródenas Picó David
Publication venue: Thomson Editores Spain
Publication date: 01/01/2007

As the number of transistors on a chip continues to increase, power consumption has become the most important constraint in processor design. Therefore, to increase performance, computer architects have decided to use multiprocessors. Moreover, recent studies have shown that heterogeneous chip multiprocessors have greater potential than homogeneous ones. We have built a modular simulator for heterogeneous multiprocessors that can be configured to model IBM's Cell Processor. The simulator has been validated against the real machine to be used as a research tool.

UPCommons. Portal del coneixement obert de la UPC