Search CORE

3,971 research outputs found

Clock gate on abort: Towards energy-efficient hardware transactional memory

Author: Cristal Kestelman Adrián
Roy Sourav
Sanyal Sutirtha
Unsal Osman Sabri
Valero Cortés Mateo
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2009
Field of study

Transactional Memory (TM) is an emerging technology which promises to make parallel programming easier compared to earlier lock based approaches. However, as with any form of speculation, Transactional Memory too wastes a considerable amount of energy when the speculation goes wrong and transaction aborts. For Transactional Memory this wastage will typically be quite high because programmer will often mark a large portion of the code to be executed transactionally. We are proposing to turn-off a processor dynamically by gating all its clocks, whenever any transaction running in it is aborted. We have described a novel protocol which can be used in the Scalable-TCC like Hardware Transactional Memory systems. Also in the protocol we are proposing a gating-aware contention management policy to set the duration of the clock gating period precisely so that both performance and energy can be improved. With our proposal we got an average 19% savings in the total consumed energy and even an average speed-up of 4%.Peer ReviewedPostprint (published version

The moderating influence of device characteristics and usage on user acceptance of smart mobile devices

Author: Ally Mustafa
Gardiner Michael
Publication venue: Australasian Conference on Information Systems (ACIS)
Publication date: 01/12/2012
Field of study

This study seeks to develop a comprehensive model of consumer acceptance in the context of Smart Mobile Device (SMDs). This paper proposes an adaptation of the Technology Acceptance Model (TAM) and the Unified Theory of Acceptance and Use of Technology (UTAUT2) model that can be employed to explain and predict the acceptance of SMDs. Also included in the model are a number of external and new moderating variables that can be used to explain user intentions and subsequent usage behaviour. The model holds that Activity-based Usage and Device Characteristics are posited to moderate the impact of the constructs empirically validated in the UTAUT2 model. Through an important cluster of antecedents the proposed model aims to enhance our understanding of consumer motivations for using SMDs and aid efforts to promote the adoption and diffusion of these devices

Tracing a paradigm for externalization: Avatars and the GPII Nexus

Author: Basman Antranig
Clark Colin B.D.
Publication venue: 'American College of Medical Physics (ACMP)'
Publication date: 01/01/2017
Field of study

We will situate the concept of an avatar (a working simulacrum of part of a system separated from it in space or time) with respect to traditional concepts of programming language and systems design. Whilst much theory and practice argues in favour of insulation (the creation of architectural boundaries prohibiting the leakage of information) we will find that many successful systems take a diametrically opposed approach. We name this family of systems as those based on externalised state transfer. Rather than hiding implementation details behind APIs, object interfaces or similar, these systems actively advertise their internal structure and its coordinates via data and metadata. Examples of these systems include RESTful web applications, MIDI devices, and the DWARF debugging file format. We discuss such systems and how we can purposefully design new systems embodying such virtues in a more distilled form

A Memory Scheduling Infrastructure for Multi-Core Systems with Re-Programmable Logic

Author: Hoornaert Denis
Mancuso Renato
Roozkhosh Shahin
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 33rd Euromicro Conference on Real-Time Systems (ECRTS 2021)
Publication date: 01/01/2021
Field of study

The sharp increase in demand for performance has prompted an explosion in the complexity of modern multi-core embedded systems. This has lead to unprecedented temporal unpredictability concerns in Cyber-Physical Systems (CPS). On-chip integration of programmable logic (PL) alongside a conventional Processing System (PS) in modern Systems-on-Chip (SoC) establishes a genuine compromise between specialization, performance, and reconfigurability. In addition to typical use-cases, it has been shown that the PL can be used to observe, manipulate, and ultimately manage memory traffic generated by a traditional multi-core processor. This paper explores the possibility of PL-aided memory scheduling by proposing a Scheduler In-the-Middle (SchIM). We demonstrate that the SchIM enables transaction-level control over the main memory traffic generated by a set of embedded cores. Focusing on extensibility and reconfigurability, we put forward a SchIM design covering two main objectives. First, to provide a safe playground to test innovative memory scheduling mechanisms; and second, to establish a transition path from software-based memory regulation to provably correct hardware-enforced memory scheduling. We evaluate our design through a full-system implementation on a commercial PS-PL platform using synthetic and real-world benchmarks

Boston University Institutional Repository (OpenBU)

Dagstuhl Research Online Publication Server

Software caching techniques and hardware optimizations for on-chip local memories

Author: Vujic Nikola
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/01/2012
Field of study

Despite the fact that the most viable L1 memories in processors are caches, on-chip local memories have been a great topic of consideration lately. Local memories are an interesting design option due to their many benefits: less area occupancy, reduced energy consumption and fast and constant access time. These benefits are especially interesting for the design of modern multicore processors since power and latency are important assets in computer architecture today. Also, local memories do not generate coherency traffic which is important for the scalability of the multicore systems. Unfortunately, local memories have not been well accepted in modern processors yet, mainly due to their poor programmability. Systems with on-chip local memories do not have hardware support for transparent data transfers between local and global memories, and thus ease of programming is one of the main impediments for the broad acceptance of those systems. This thesis addresses software and hardware optimizations regarding the programmability, and the usage of the on-chip local memories in the context of both single-core and multicore systems. Software optimizations are related to the software caching techniques. Software cache is a robust approach to provide the user with a transparent view of the memory architecture; but this software approach can suffer from poor performance. In this thesis, we start optimizing traditional software cache by proposing a hierarchical, hybrid software-cache architecture. Afterwards, we develop few optimizations in order to speedup our hybrid software cache as much as possible. As the result of the software optimizations we obtain that our hybrid software cache performs from 4 to 10 times faster than traditional software cache on a set of NAS parallel benchmarks. We do not stop with software caching. We cover some other aspects of the architectures with on-chip local memories, such as the quality of the generated code and its correspondence with the quality of the buffer management in local memories, in order to improve performance of these architectures. Therefore, we run our research till we reach the limit in software and start proposing optimizations on the hardware level. Two hardware proposals are presented in this thesis. One is about relaxing alignment constraints imposed in the architectures with on-chip local memories and the other proposal is about accelerating the management of local memories by providing hardware support for the majority of actions performed in our software cache.Malgrat les memòries cau encara son el component basic pel disseny del subsistema de memòria, les memòries locals han esdevingut una alternativa degut a les seves característiques pel que fa a l’ocupació d’àrea, el seu consum energètic i el seu rendiment amb un temps d’accés ràpid i constant. Aquestes característiques son d’especial interès quan les properes arquitectures multi-nucli estan limitades pel consum de potencia i la latència del subsistema de memòria.Les memòries locals pateixen de limitacions respecte la complexitat en la seva programació, fet que dificulta la seva introducció en arquitectures multi-nucli, tot i els avantatges esmentats anteriorment. Aquesta tesi presenta un seguit de solucions basades en programari i maquinari específicament dissenyat per resoldre aquestes limitacions.Les optimitzacions del programari estan basades amb tècniques d'emmagatzematge de memòria cau suportades per llibreries especifiques. La memòria cau per programari és un sòlid mètode per proporcionar a l'usuari una visió transparent de l'arquitectura, però aquest enfocament pot patir d'un rendiment deficient. En aquesta tesi, es proposa una estructura jeràrquica i híbrida. Posteriorment, desenvolupem optimitzacions per tal d'accelerar l’execució del programari que suporta el disseny de la memòria cau. Com a resultat de les optimitzacions realitzades, obtenim que el nostre disseny híbrid es comporta de 4 a 10 vegades més ràpid que una implementació tradicional de memòria cau sobre un conjunt d’aplicacions de referencia, com son els “NAS parallel benchmarks”.El treball de tesi inclou altres aspectes de les arquitectures amb memòries locals, com ara la qualitat del codi generat i la seva correspondència amb la qualitat de la gestió de memòria intermèdia en les memòries locals, per tal de millorar el rendiment d'aquestes arquitectures. La tesi desenvolupa propostes basades estrictament en el disseny de nou maquinari per tal de millorar el rendiment de les memòries locals quan ja no es possible realitzar mes optimitzacions en el programari. En particular, la tesi presenta dues propostes de maquinari: una relaxa les restriccions imposades per les memòries locals respecte l’alineament de dades, l’altra introdueix maquinari específic per accelerar les operacions mes usuals sobre les memòries locals

Automatic Software Repair: a Bibliography

Author: Monperrus Martin
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 06/01/2016
Field of study

This article presents a survey on automatic software repair. Automatic software repair consists of automatically finding a solution to software bugs without human intervention. This article considers all kinds of repairs. First, it discusses behavioral repair where test suites, contracts, models, and crashing inputs are taken as oracle. Second, it discusses state repair, also known as runtime repair or runtime recovery, with techniques such as checkpoint and restart, reconfiguration, and invariant restoration. The uniqueness of this article is that it spans the research communities that contribute to this body of knowledge: software engineering, dependability, operating systems, programming languages, and security. It provides a novel and structured overview of the diversity of bug oracles and repair operators used in the literature

arXiv.org e-Print Archive

HAL - Lille 3

CiteSeerX

INRIA a CCSD electronic archive server

Hal-Diderot

Deploying active objects onto multicore

Author: Nobakht B. (Behrooz)
Publication venue
Publication date: 01/05/2011
Field of study

The performance of a program on multicore platform crucially depends on the scheduling of its tasks; existing high-level programming languages, however, offer limited control over scheduling. In this thesis, we develop Cacoj as an extensible tool set to transform Creol’s active concurrent objects into Java to be deployed on multicore through standard Java Runtime Environment. The concurrent object paradigm is a promising trend for multicore programming because each object may conceptually encapsulate a processor. Cacoj introduces a higher-level abstraction of concurrency API and a Creol compiler in which the translated object in Java takes control over the scheduling of the incoming messages through a per-object approach in contrast with current mainstream trend. Cacoj brings about the required grounds to extend Creol syntax to additionally specify different levels of priority and integrate them into the notion of active concurrent objects