Search CORE

6 research outputs found

Breaking serialization in lock-free multicore synchronization

Author: Gangwani Tanmay
Publication venue
Publication date: 01/08/2016
Field of study

In multicores, performance-critical synchronization is increasingly performed in a lock-free manner using atomic instructions such as CAS or LL/SC. However, when many processors synchronize on the same variable, performance can still degrade significantly. Contending writes get serialized, creating a non-scalable condition. Past proposals that build hardware queues of synchronizing processors do not fundamentally solve this problem. At best, they help to efficiently serialize the contending writes. We propose a novel architecture that breaks the serialization of hardware queues and enables the queued processors to perform lock-free synchronization in parallel. The architecture, called Caspar, is able to (1) execute the CASes in the queued-up processors in parallel through eager forwarding of expected values, and (2) validate the CASes in parallel and dequeue groups of processors at a time. The result is highly scalable synchronization. We evaluate Caspar with simulations of a 64-core chip. Compared to existing proposals with hardware queues, Caspar improves the throughput of kernels by 32% on average and reduces the execution time of the sections considered in lock-free versions of applications by 47% on average. This makes these sections 2.5x faster than in the original applications

Illinois Digital Environment for Access to Learning and Scholarship Repository

ILP and TLP in Shared Memory Applications: A Limit Study

Author: Fatehi Ehsan
Publication venue
Publication date: 21/09/2015
Field of study

The work in this dissertation explores the limits of Chip-multiprocessors (CMPs) with respect to shared-memory, multi-threaded benchmarks, which will help aid in identifying microarchitectural bottlenecks. This, in turn, will lead to more efficient CMP design. In the first part we introduce DotSim, a trace-driven toolkit designed to explore the limits of instruction and thread-level scaling and identify microarchitectural bottlenecks in multi-threaded applications. DotSim constructs an instruction-level Data Flow Graph (DFG) from each thread in multi-threaded applications, adjusting for inter-thread dependencies. The DFGs dynamically change depending on the microarchitectural constraints applied. Exploiting these DFGs allows for the easy extraction of the performance upper bound. We perform a case study on modeling the upper-bound performance limits of a processor microarchitecture modeled off a AMD Opteron. In the second part, we conduct a limit study simultaneously analyzing the two dominant forms of parallelism exploited by modern computer architectures: Instruction Level Parallelism (ILP) and Thread Level Parallelism (TLP). This study gives insight into the upper bounds of performance that future architectures can achieve. Furthermore, it identifies the bottlenecks of emerging workloads. To the best of our knowledge, our work is the first study that combines the two forms of parallelism into one study with modern applications. We evaluate the PARSEC multithreaded benchmark suite using DotSim. We make several contributions describing the high-level behavior of next-generation applications. For example, we show that these applications contain up to a factor of 929X more ILP than what is currently being extracted from real machines. We then show the effects of breaking the application into increasing numbers of threads (exploiting TLP), instruction window size, realistic branch prediction, realistic memory latency, and thread dependencies on exploitable ILP. Our examination shows that theses benchmarks differ vastly from one another. As a result, we expect that no single, homogeneous, micro-architecture will work optimally for all, arguing for reconfigurable, heterogeneous designs. In the third part of this thesis, we use our novel simulator DotSim to study the benefits of prefetching shared memory within critical sections. In this chapter we calculate the upper bound of performance under our given constraints. Our intent is to provide motivation for new techniques to exploit the potential benefits of reducing latency of shared memory among threads. We conduct an idealized workload characterization study focusing on the data that is truly shared among threads, using a simplified memory model. We explore the degree of shared memory criticality, and characterize the benefits of being able to use latency reducing techniques to reduce execution time and increase ILP. We find that on average true sharing among benchmarks is quite low compared to overall memory accesses on the critical path and overall program. We also find that truly shared memory between threads does not affect the critical path for the majority of benchmarks, and when it does the impact is less than 1%. Therefore, we conclude that it is not worth exploring latency reducing techniques of truly shared memory within critical sections

Texas A&M Repository

The role of the basal ganglia in the selection and control of sequential action

Author: Britain Alfred Alexander
Publication venue
Publication date: 01/08/1996
Field of study

Bangor University Research Portal

Proceedings of the 11th international Conference on Cognitive Modeling : ICCM 2012

Author
Publication venue: Berlin
Publication date: 13/04/2012
Field of study

The International Conference on Cognitive Modeling (ICCM) is the premier conference for research on computational models and computation-based theories of human behavior. ICCM is a forum for presenting, discussing, and evaluating the complete spectrum of cognitive modeling approaches, including connectionism, symbolic modeling, dynamical systems, Bayesian modeling, and cognitive architectures. ICCM includes basic and applied research, across a wide variety of domains, ranging from low-level perception and attention to higher-level problem-solving and learning. Online-Version published by Universitätsverlag der TU Berlin (www.univerlag.tu-berlin.de

DepositOnce

Performance analysis for wireless G (IEEE 802.11G) and wireless N (IEEE 802.11N) in outdoor environment

Author: Abal Abas Zuraida
Abd Wahab Mohd Helmy
Fadilah Suri Iryani
Hashim Wan Nur Wahidah
Shibghatullah Abdul Samad
Publication venue
Publication date: 24/06/2014
Field of study

This paper described an analysis the different capabilities and limitation of both IEEE technologies that has been utilized for data transmission directed to mobile device. In this work, we have compared an IEEE 802.11/g/n outdoor environment to know what technology is better. The comparison consider on coverage area (mobility), throughput and measuring the interferences. The work presented here is to help the researchers to select the best technology depending of their deploying case, and investigate the best variant for outdoor. The tool used is Iperf software which is to measure the data transmission performance of IEEE 802.11n and IEEE 802.11g

Universiti Teknikal Malaysia Melaka (UTeM) Repository

Performance Analysis For Wireless G (IEEE 802.11 G) And Wireless N (IEEE 802.11 N) In Outdoor Environment

Author: - Suzi Iryanti Fadilah
Abal Abas Zuraida
Abd Wahab Mohd Helmy
Hashim Wan Nur Wahidah
Shibghatullah Abdul Samad
Publication venue
Publication date: 01/01/2014
Field of study

This paper described an analysis the different capabilities and limitation of both IEEE technologies that has been utilized for data transmission directed to mobile device. In this work, we have compared an IEEE 802.11/g/n outdoor environment to know what technology is better. the comparison consider on coverage area (mobility), through put and measuring the interferences. The work presented here is to help the researchers to select the best technology depending of their deploying case, and investigate the best variant for outdoor. The tool used is Iperf software which is to measure the data transmission performance of IEEE 802.11n and IEEE 802.11g

Universiti Teknikal Malaysia Melaka (UTeM) Repository