Search CORE

1,145 research outputs found

New hardware support transactional memory and parallel debugging in multicore processors

Author: Orosa Nogueira Lois
Publication venue
Publication date: 01/01/2013
Field of study

This thesis contributes to the area of hardware support for parallel programming by introducing new hardware elements in multicore processors, with the aim of improving the performance and optimize new tools, abstractions and applications related with parallel programming, such as transactional memory and data race detectors. Specifically, we configure a hardware transactional memory system with signatures as part of the hardware support, and we develop a new hardware filter for reducing the signature size. We also develop the first hardware asymmetric data race detector (which is also able to tolerate them), based also in hardware signatures. Finally, we propose a new module of hardware signatures that solves some of the problems that we found in the previous tools related with the lack of flexibility in hardware signatures

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Repositorio Institucional da Universidade de Santiago de Compostela

HeTM: Transactional Memory for Heterogeneous Systems

Author: Castro Daniel
Ilic Aleksandar
Khan Amin M.
Romano Paolo
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 02/09/2019
Field of study

Modern heterogeneous computing architectures, which couple multi-core CPUs with discrete many-core GPUs (or other specialized hardware accelerators), enable unprecedented peak performance and energy efficiency levels. Unfortunately, though, developing applications that can take full advantage of the potential of heterogeneous systems is a notoriously hard task. This work takes a step towards reducing the complexity of programming heterogeneous systems by introducing the abstraction of Heterogeneous Transactional Memory (HeTM). HeTM provides programmers with the illusion of a single memory region, shared among the CPUs and the (discrete) GPU(s) of a heterogeneous system, with support for atomic transactions. Besides introducing the abstract semantics and programming model of HeTM, we present the design and evaluation of a concrete implementation of the proposed abstraction, which we named Speculative HeTM (SHeTM). SHeTM makes use of a novel design that leverages on speculative techniques and aims at hiding the inherently large communication latency between CPUs and discrete GPUs and at minimizing inter-device synchronization overhead. SHeTM is based on a modular and extensible design that allows for easily integrating alternative TM implementations on the CPU's and GPU's sides, which allows the flexibility to adopt, on either side, the TM implementation (e.g., in hardware or software) that best fits the applications' workload and the architectural characteristics of the processing unit. We demonstrate the efficiency of the SHeTM via an extensive quantitative study based both on synthetic benchmarks and on a porting of a popular object caching system.Comment: The current work was accepted in the 28th International Conference on Parallel Architectures and Compilation Techniques (PACT'19

arXiv.org e-Print Archive

Crossref

Optimization of thread affinity and memory affinity for remote core locking synchronization in multithreaded programs for multicore computer systems

Author: Alexey Paznikov
Publication venue: 'JVE International Ltd.'
Publication date: 30/06/2017
Field of study

This paper proposes the algorithms for optimization of Remote Core Locking (RCL) synchronization method in multithreaded programs. The algorithm of initialization of RCL-locks and the algorithms for threads affinity optimization are developed. The algorithms consider the structures of hierarchical computer systems and non-uniform memory access (NUMA) to minimize execution time of RCL-programs. The experimental results on multi-core computer systems represented in the paper shows the reduction of RCL-programs execution time

Measurement, Modeling, and Characterization for Power-Aware Computing

Author: Goel Bhavishya
Publication venue
Publication date: 01/01/2014
Field of study

Society’s increasing dependence on information technology has resulted in the deployment of vast compute resources. The energy costs of operating these resources coupled with environmental concerns have made power-aware computingone of the primary challenges for the IT sector. Making energy-efficient computing a rule rather than an exception requires that researchers and system designers use the right set of techniques and tools. These involve measuring,modeling, and characterizing the energy consumption of computers at varying degrees of granularity.In this thesis, we present techniques to measure power consumption of computer systems at various levels. We compare them for accuracy and sensitivityand discuss their effectiveness. We test Intel’s hardware power model for estimation accuracy and show that it is fairly accurate for estimating energy consumption when sampled at the temporal granularity of more than tens ofmilliseconds.We present a methodology to estimate per-core processor power consumption using performance counter and temperature-based power modeling and validate it across multiple platforms. We show our model exhibits negligible computationoverhead, and the median estimation errors ranges from 0.3% to 10.1% for applications from SPEC2006, SPEC-OMP and NAS benchmarks. We test the usefulness of the model in a meta-scheduler to enforce power constraint on a system.Finally, we perform a detailed performance and energy characterization of Intel’s Restricted Transactional Memory (RTM). We use TinySTM software transactional memory (STM) system to benchmark RTM’s performance against competing STM alternatives. We use microbenchmarks and STAMP benchmarksuite to compare RTM versus STM performance and energy behavior. We quantify the RTM hardware limitations that affect its success rate. We show that RTM performs better than TinySTM when working-set fits inside the cache and that RTM is better at handling high contention workloads

Chalmers Research

Trusted Computing and Secure Virtualization in Cloud Computing

Author: Paladi Nicolae
Publication venue
Publication date: 01/01/2012
Field of study

Large-scale deployment and use of cloud computing in industry is accompanied and in the same time hampered by concerns regarding protection of data handled by cloud computing providers. One of the consequences of moving data processing and storage off company premises is that organizations have less control over their infrastructure. As a result, cloud service (CS) clients must trust that the CS provider is able to protect their data and infrastructure from both external and internal attacks. Currently however, such trust can only rely on organizational processes declared by the CS provider and can not be remotely verified and validated by an external party. Enabling the CS client to verify the integrity of the host where the virtual machine instance will run, as well as to ensure that the virtual machine image has not been tampered with, are some steps towards building trust in the CS provider. Having the tools to perform such verifications prior to the launch of the VM instance allows the CS clients to decide in runtime whether certain data should be stored- or calculations should be made on the VM instance offered by the CS provider. This thesis combines three components -- trusted computing, virtualization technology and cloud computing platforms -- to address issues of trust and security in public cloud computing environments. Of the three components, virtualization technology has had the longest evolution and is a cornerstone for the realization of cloud computing. Trusted computing is a recent industry initiative that aims to implement the root of trust in a hardware component, the trusted platform module. The initiative has been formalized in a set of specifications and is currently at version 1.2. Cloud computing platforms pool virtualized computing, storage and network resources in order to serve a large number of customers customers that use a multi-tenant multiplexing model to offer on-demand self-service over broad network. Open source cloud computing platforms are, similar to trusted computing, a fairly recent technology in active development. The issue of trust in public cloud environments is addressed by examining the state of the art within cloud computing security and subsequently addressing the issues of establishing trust in the launch of a generic virtual machine in a public cloud environment. As a result, the thesis proposes a trusted launch protocol that allows CS clients to verify and ensure the integrity of the VM instance at launch time, as well as the integrity of the host where the VM instance is launched. The protocol relies on the use of Trusted Platform Module (TPM) for key generation and data protection. The TPM also plays an essential part in the integrity attestation of the VM instance host. Along with a theoretical, platform-agnostic protocol, the thesis also describes a detailed implementation design of the protocol using the OpenStack cloud computing platform. In order the verify the implementability of the proposed protocol, a prototype implementation has built using a distributed deployment of OpenStack. While the protocol covers only the trusted launch procedure using generic virtual machine images, it presents a step aimed to contribute towards the creation of a secure and trusted public cloud computing environment

RISE – Research Institutes of Sweden

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Swedish Institute of Computer Science Publications Database

Luleå University of Technology Publications

Software institutes' Online Digital Archive

Scalability Analysis of Signatures in Transactional Memory Systems

Author: Eladio Gutiérrez
Quislant Ricardo
Óscar Plata
Publication venue
Publication date: 29/10/2014
Field of study

Signatures have been proposed in transactional memory systems to represent read and write sets and to decouple transaction conflict detection from private caches or to accelerate it. Generally, signatures are implemented as Bloom filters that allow unbounded read/write sets to be summarized in bounded space at the cost of false conflict detection. It is known that this behavior has great impact in parallel performance. In this work, a scalability study of state-of-the-art signature designs is presented, for different orthogonal transactional characteristics, including contention, length, concurrency and spatial locality. This study was accomplished using the Stanford EigenBench benchmark. This benchmark was modified to support spatial locality analysis using a Zipf address distribution. Experimental evaluation on a hardware transactional memory simulator shows the impact of those parameters in the behavior of state-of-the-art signatures.Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech

Crossref

Repositorio Institucional Universidad de Málaga

Recommended from our members

Software lock elision for x86 machine code

Author: Roy Amitabha
Publication venue: University of Cambridge
Publication date: 12/07/2011
Field of study

More than a decade after becoming a topic of intense research there is no transactional memory hardware nor any examples of software transactional memory use outside the research community. Using software transactional memory in large pieces of software needs copious source code annotations and often means that standard compilers and debuggers can no longer be used. At the same time, overheads associated with software transactional memory fail to motivate programmers to expend the needed effort to use software transactional memory. The only way around the overheads in the case of general unmanaged code is the anticipated availability of hardware support. On the other hand, architects are unwilling to devote power and area budgets in mainstream microprocessors to hardware transactional memory, pointing to transactional memory being a "niche" programming construct. A deadlock has thus ensued that is blocking transactional memory use and experimentation in the mainstream. This dissertation covers the design and construction of a software transactional memory runtime system called SLE_x86 that can potentially break this deadlock by decoupling transactional memory from programs using it. Unlike most other STM designs, the core design principle is transparency rather than performance. SLE_x86 operates at the level of x86 machine code, thereby becoming immediately applicable to binaries for the popular x86 architecture. The only requirement is that the binary synchronise using known locking constructs or calls such as those in Pthreads or OpenMP libraries. SLE_x86 provides speculative lock elision (SLE) entirely in software, executing critical sections in the binary using transactional memory. Optionally, the critical sections can also be executed without using transactions by acquiring the protecting lock. The dissertation makes a careful analysis of the impact on performance due to the demands of the x86 memory consistency model and the need to transparently instrument x86 machine code. It shows that both of these problems can be overcome to reach a reasonable level of performance, where transparent software transactional memory can perform better than a lock. SLE_x86 can ensure that programs are ready for transactional memory in any form, without being explicitly written for it

Apollo (Cambridge)