1,939 research outputs found
Modeling and Analysis of Dual Block Multithreading
Instruction level multithreading is a technique for tolerating long–
latency operations (e.g., cache misses) by switching the processor to another
thread instead of waiting for the completion of a lengthy operation. In block mul-
tithreading, context switching occurs for each initiated long–latency operation.
However, processor cycles during pipeline stalls as well as during context switch-
ing are not used in typical block multithreading, reducing the performance of
a processor. Dual block multithreading introduces a second active thread which
is used for instruction issuing whenever the original (main) thread becomes in-
active. Dual block multithreading can be regarded as a simple and specialized
case of simultaneous multithreading when two (simultaneous) threads are used
to issue instructions for a single pipeline. The paper develops a simple timed
Petri net model of a dual block multithreading and uses this model to estimate
the performance improvements of the proposed dual block multithreading
Analysis of Multi-Threading and Cache Memory Latency Masking on Processor Performance Using Thread Synchronization Technique
Multithreading is a process in which a single processor executes multiple threads concurrently. This enables the processor to divide tasks into separate threads and run them simultaneously, thereby increasing the utilization of available system resources and enhancing performance. When multiple threads share an object and one or more of them modify it, unpredictable outcomes may occur. Threads that exhibit poor locality of memory reference, such as database applications, often experience delays while waiting for a response from the memory hierarchy. This observation suggests how to better manage pipeline contention. To assess the impact of memory latency on processor performance, a dual-core MT machine with four thread contexts per core is utilized. These specific benchmarks are chosen to allow the workload to include programs with both favorable and unfavorable cache locality. To eliminate the issue of wasting the wake-up signals, this work proposes an approach that involves storing all the wake-up calls. It asserts the wake-up calls to the consumer and the producer can store the wake-up call in a variable. An assigned value in working system (or kernel) storage that each process can check is a semaphore. Semaphore is a variable that reads, and update operations automatically in bit mode. It cannot be actualized in client mode since a race condition may persistently develop when two or more processors endeavor to induce to the variable at the same time.
This study includes code to measure the time taken to execute both functions and plot the graph. It should be noted that sending multiple requests to a website simultaneously could trigger a flag, ultimately blocking access to the data. This necessitates some computation on the collected statistics. The execution time is reduced to one third when using threads compared to executing the functions sequentially. This exemplifies the power of multithreading
Safe and Verifiable Design of Concurrent Java Programs
The design of concurrent programs has a reputation for being difficult, and thus potentially dangerous in safetycritical real-time and embedded systems. The recent appearance of Java, whilst cleaning up many insecure aspects of OO programming endemic in C++, suffers from a deceptively simple threads model that is an insecure variant of ideas that are over 25 years old [1]. Consequently, we cannot directly exploit a range of new CASE tools -- based upon modern developments in parallel computing theory -- that can verify and check the design of concurrent systems for a variety of dangers\ud
such as deadlock and livelock that otherwise plague us during testing and maintenance and, more seriously, cause catastrophic failure in service. \ud
Our approach uses recently developed Java class\ud
libraries based on Hoare's Communicating Sequential Processes (CSP); the use of CSP greatly simplifies the design of concurrent systems and, in many cases, a parallel approach often significantly simplifies systems originally approached sequentially. New CSP CASE tools permit designs to be verified against formal specifications\ud
and checked for deadlock and livelock. Below we introduce CSP and its implementation in Java and develop a small concurrent application. The formal CSP description of the application is provided, as well as that of an equivalent sequential version. FDR is used to verify the correctness of both implementations, their\ud
equivalence, and their freedom from deadlock and livelock
Fetch unit design for scalable simultaneous multithreading (ScSMT)
Continuous IC process enhancements make possible to integrate on a single chip the re-sources required for simultaneously executing multiple control flows or threads, exploiting different levels of thread-level parallelism: application-, function-, and loop-level. Scalable simultaneous multi-threading combines static and dynamic mechanisms to assemble a complexity-effective design that provides high instruction per cycle rates without sacrificing cycle time nor single-thread performance. This paper addresses the design of the fetch unit for a high-performance, scalable, simultaneous multithreaded processor. We present the detailed microarchitecture of a clustered and reconfigurable fetch unit based on an existing single-thread fetch unit. In order to minimize the occurrence of fetch hazards, the fetch unit dynamically adapts to the available thread-level parallelism and to the fetch characteristics of the active threads, working as a single shared unit or as two separate clusters. It combines static and dynamic methods in a complexity-efficient way. The design is supported by a simulation- based analysis of different instruction cache and branch target buffer configurations on the context of a multithreaded execution workload. Average reductions on the miss rates between 30% and 60% and peak reductions greater than 200% are obtained.Facultad de Informátic
Fetch unit design for scalable simultaneous multithreading (ScSMT)
Continuous IC process enhancements make possible to integrate on a single chip the re-sources required for simultaneously executing multiple control flows or threads, exploiting different levels of thread-level parallelism: application-, function-, and loop-level. Scalable simultaneous multi-threading combines static and dynamic mechanisms to assemble a complexity-effective design that provides high instruction per cycle rates without sacrificing cycle time nor single-thread performance. This paper addresses the design of the fetch unit for a high-performance, scalable, simultaneous multithreaded processor. We present the detailed microarchitecture of a clustered and reconfigurable fetch unit based on an existing single-thread fetch unit. In order to minimize the occurrence of fetch hazards, the fetch unit dynamically adapts to the available thread-level parallelism and to the fetch characteristics of the active threads, working as a single shared unit or as two separate clusters. It combines static and dynamic methods in a complexity-efficient way. The design is supported by a simulation- based analysis of different instruction cache and branch target buffer configurations on the context of a multithreaded execution workload. Average reductions on the miss rates between 30% and 60% and peak reductions greater than 200% are obtained.Facultad de Informátic
Recommended from our members
Multithreading for high performance finance risk analysis
This thesis was submitted for the degree of Master of Philosophy and awarded by Brunel UniversityWith the increasing of risks in the financial market, the models of risk management are developing quickly. The standard of the accuracy and effect of the models is improved continuously. This thesis investigates Value at Risk (VaR) which is an important method for measuring the market risk. It reviews the three methods which can be used to quantify VaR. These methods are parameter method, historical data processing method and Monte Carlo simulation method.
Monte Carlo simulation has been widely employed for finance risk analysis. One challenge in Monte Carlo simulation is its computation complexity. For this purpose, this thesis researches into multithreading technique for high performance
Experiences with porting and modelling wavefront algorithms on many-core architectures
We are currently investigating the viability of many-core architectures for the acceleration of wavefront applications and this report focuses on graphics processing units (GPUs) in particular. To this end, we have implemented NASA’s LU benchmark – a real world production-grade application – on GPUs employing NVIDIA’s Compute Unified Device Architecture (CUDA).
This GPU implementation of the benchmark has been used to investigate the performance of a selection of GPUs, ranging from workstation-grade commodity GPUs to the HPC "Tesla” and "Fermi” GPUs. We have also compared the performance of the GPU solution at scale to that of traditional high perfor- mance computing (HPC) clusters based on a range of multi- core CPUs from a number of major vendors, including Intel (Nehalem), AMD (Opteron) and IBM (PowerPC).
In previous work we have developed a predictive “plug-and-play” performance model of this class of application running on such clusters, in which CPUs communicate via the Message Passing Interface (MPI). By extending this model to also capture the performance behaviour of GPUs, we are able to: (1) comment on the effects that architectural changes will have on the performance of single-GPU solutions, and (2) make projections regarding the performance of multi-GPU solutions at larger scale
CellSim: a validated modular heterogeneous multiprocessor simulator
As the number of transistors on a chip continues increasing the power consumption has become the most important constraint in processors design. Therefore, to increase performance, computer architects have decided to use multiprocessors. Moreover, recent studies have shown that heterogeneous chip multiprocessors have greater potential than homogeneous ones. We have built a modular simulator for heterogeneous multiprocessors that can be configure to model IBM's Cell Processor. The simulator has been validated against the real
machine to be used as a research tool.Peer ReviewedPostprint (published version
- …