55 research outputs found

    Recognition of objects to grasp and Neuro-Prosthesis control

    Get PDF

    Energy Aware Runtime Systems for Elastic Stream Processing Platforms

    Get PDF
    Following an invariant growth in the required computational performance of processors, the multicore revolution started around 20 years ago. This revolution was mainly an answer to power dissipation constraints restricting the increase of clock frequency in single-core processors. The multicore revolution not only brought in the challenge of parallel programming, i.e. being able to develop software exploiting the entire capabilities of manycore architectures, but also the challenge of programming heterogeneous platforms. The question of “on which processing element to map a specific computational unit?”, is well known in the embedded community. With the introduction of general-purpose graphics processing units (GPGPUs), digital signal processors (DSPs) along with many-core processors on different system-on-chip platforms, heterogeneous parallel platforms are nowadays widespread over several domains, from consumer devices to media processing platforms for telecom operators. Finding mapping together with a suitable hardware architecture is a process called design-space exploration. This process is very challenging in heterogeneous many-core architectures, which promise to offer benefits in terms of energy efficiency. The main problem is the exponential explosion of space exploration. With the recent trend of increasing levels of heterogeneity in the chip, selecting the parameters to take into account when mapping software to hardware is still an open research topic in the embedded area. For example, the current Linux scheduler has poor performance when mapping tasks to computing elements available in hardware. The only metric considered is CPU workload, which as was shown in recent work does not match true performance demands from the applications. Doing so may produce an incorrect allocation of resources, resulting in a waste of energy. The origin of this research work comes from the observation that these approaches do not provide full support for the dynamic behavior of stream processing applications, especially if these behaviors are established only at runtime. This research will contribute to the general goal of developing energy-efficient solutions to design streaming applications on heterogeneous and parallel hardware platforms. Streaming applications are nowadays widely spread in the software domain. Their distinctive characiteristic is the retrieving of multiple streams of data and the need to process them in real time. The proposed work will develop new approaches to address the challenging problem of efficient runtime coordination of dynamic applications, focusing on energy and performance management.Efter en oförĂ€nderlig tillvĂ€xt i prestandakrav hos processorer, började den flerkĂ€rniga processor-revolutionen för ungefĂ€r 20 Ă„r sedan. Denna revolution skedde till största del som en lösning till begrĂ€nsningar i energieffekten allt eftersom klockfrekvensen kontinuerligt höjdes i en-kĂ€rniga processorer. Den flerkĂ€rniga processor-revolutionen medförde inte enbart utmaningen gĂ€llande parallellprogrammering, m.a.o. förmĂ„gan att utveckla mjukvara som anvĂ€nder sig av alla delelement i de flerkĂ€rniga processorerna, men ocksĂ„ utmaningen med programmering av heterogena plattformar. FrĂ„gestĂ€llningen ”pĂ„ vilken processorelement skall en viss berĂ€kning utföras?” Ă€r vĂ€l kĂ€nt inom ramen för inbyggda datorsystem. Efter introduktionen av grafikprocessorer för allmĂ€nna berĂ€kningar (GPGPU), signalprocesserings-processorer (DSP) samt flerkĂ€rniga processorer pĂ„ olika system-on-chip plattformar, Ă€r heterogena parallella plattformar idag omfattande inom mĂ„nga domĂ€ner, frĂ„n konsumtionsartiklar till mediaprocesseringsplattformar för telekommunikationsoperatörer. Processen att placera berĂ€kningarna pĂ„ en passande hĂ„rdvaruplattform kallas för utforskning av en designrymd (design-space exploration). Denna process Ă€r mycket utmanande för heterogena flerkĂ€rniga arkitekturer, och kan medföra fördelar nĂ€r det gĂ€ller energieffektivitet. Det största problemet Ă€r att de olika valmöjligheterna i designrymden kan vĂ€xa exponentiellt. Enligt den nuvarande trenden som förespĂ„r ökad heterogeniska aspekter i processorerna Ă€r utmaningen att hitta den mest passande placeringen av berĂ€kningarna pĂ„ hĂ„rdvaran Ă€nnu en forskningsfrĂ„ga inom ramen för inbyggda datorsystem. Till exempel, den nuvarande schemalĂ€ggaren i Linux operativsystemet Ă€r inkapabel att hitta en effektiv placering av berĂ€kningarna pĂ„ den underliggande hĂ„rdvaran. Det enda mĂ€tsĂ€ttet som anvĂ€nds Ă€r processorns belastning vilket, som visats i tidigare forskning, inte motsvarar den verkliga prestandan i applikationen. AnvĂ€ndning av detta mĂ€tsĂ€tt vid resursallokering resulterar i slöseri med energi. Denna forskning hĂ€rstammar frĂ„n observationerna att dessa tillvĂ€gagĂ„ngssĂ€tt inte stöder det dynamiska beteendet hos ström-processeringsapplikationer (stream processing applications), speciellt om beteendena bara etableras vid körtid. Denna forskning kontribuerar till det allmĂ€nna mĂ„let att utveckla energieffektiva lösningar för ström-applikationer (streaming applications) pĂ„ heterogena flerkĂ€rniga hĂ„rdvaruplattformar. Ström-applikationer Ă€r numera mycket vanliga i mjukvarudomĂ€n. Deras distinkta karaktĂ€r Ă€r inlĂ€sning av flertalet dataströmmar, och behov av att processera dem i realtid. Arbetet i denna forskning understöder utvecklingen av nya sĂ€tt för att lösa det utmanade problemet att effektivt koordinera dynamiska applikationer i realtid och fokus pĂ„ energi- och prestandahantering

    Enabling Independent Communication for FPGAs in High Performance Computing

    Get PDF

    Coordinated management of the processor and memory for optimizing energy efficiency

    Get PDF
    Energy efficiency is a key design goal for future computing systems. With diverse components interacting with each other on the System-on-Chip (SoC), dynamically managing performance, energy and temperature is a challenge in 2D architectures and more so in a 3D stacked environment. Temperature has emerged as the parameter of primary concern. Heuristics based schemes have been employed so far to address these issues. Looking ahead into the future, complex multiphysics interactions between performance, energy and temperature reveal the limitations of such approaches. Therefore in this thesis, first, a comprehensive characterization of existing methods is carried out to identify causes for their inefficiency. Managing different components in an independent and isolated fashion using heuristics is seen to be the primary drawback. Following this, techniques based on feedback control theory to optimize the energy efficiency of the processor and memory in a coordinated fashion are developed. They are evaluated on a real physical system and a cycle-level simulator demonstrating significant improvements over prior schemes. The two main messages of this thesis are, (i) coordination between multiple components is paramount for next generation computing systems and (ii) temperature ought to be treated as a resource like compute or memory cycles.Ph.D

    Build framework and runtime abstraction for partial reconfiguration on FPGA SoCs

    Get PDF
    Growth in edge computing has increased the requirement for edge systems to process larger volumes of real-time data, such as with image processing and machine learning; which are increasingly demanding of computing resources. Offloading tasks to the cloud provides some relief but is network dependant, high latency and expensive. Alternative architectures such as GPUs provide higher performance acceleration for this type of data processing but trade processing performance for an increase in power consumption. Another option is the Field Programmable Gate Array; a flexible matrix of logic that can be configured by a designer to provide a highly optimised computation path for incoming data. There are drawbacks; the FPGA design process is complex, the domain is dissimilar to software and the tools require bespoke expertise. A designer must manage the hardware to software paradigm introduced when tightly-coupled with general purpose processor. Advanced features, such as the ability to partially reconfigure (PR) specific regions of the FPGA, further increase this complexity. This thesis presents theory and demonstration of custom frameworks and tools for increasing abstraction and simplifying control over PR applications. We present mechanisms for networked PR; a mechanism for bypassing the traditional software networking stack to trigger PR with reduced latency and increased determinism. We developed a build framework for automating the end-to-end PR design process for Linux based systems as well as an abstracted runtime for managing the resulting applications. Finally, we take expand on this work and present a high level abstraction for PR on cyber physical systems, with a demonstration using the Robot Operating System. This work is released as open source contributions, designed to enable future PR research

    Principled Elimination of Microarchitectural Timing Channels through Operating-System Enforced Time Protection

    Full text link
    Microarchitectural timing channels exploit resource contentions on a shared hardware platform to cause information leakage through timing variance. These channels threaten system security by providing unauthorised information flow in violation of the system’s security policy. Present operating systems lack the means for systematic prevention of such channels. To address this problem, we propose time protection as an operating system (OS) abstraction, which provides mandatory temporal isolation analogous to the spatial isolation provided by the established memory protection abstraction. In order to fully understand microarchitectural timing channels, we first study all published microarchitectural timing attacks, their countermeasures and analyse the underlying causes. Then we define two application scenarios, a confinement scenario and a cloud scenario, which between them represent a large class of security-critical use cases, and aim to develop a solution that supports both. Our study identifies competition for limited hardware resources as the underlying cause for microarchitectural timing channels. From this we derive the requirement that proper isolation requires that all shared resources must be partitioned, either spatially or temporally (time-shared). We then analyse a number of recent processors across two instruction-set architectures (ISAs), x86 and Arm, for their support for such partitioning. We discover that all examined processors exhibit hardware state that cannot be partitioned by architected means, meaning that they all have uncloseable channels.We define the requirements hardware must satisfy for timing-channel prevention, and propose an augmented ISA as a new, security-oriented hardware-software contract. Assuming conforming hardware, we then define the requirements that OS-provided time protection must satisfy. We propose a concrete design of time protection, consisting of a set of policy-free mechanisms, and present an implementation in the seL4 microkernel. We evaluate the efficacy and efficiency of the implementation, and show that it is highly effective at closing timing channels, to the degree supported by the underlying hardware. We also find that the performance overheads are small to negligible. We can conclude that principled prevention of timing channels is possible though mandatory, black-box enforcement by the OS, subject to hardware manufacturers providing mechanisms for scrubbing all shared microarchitectural state

    On the use of intelligent models towards meeting the challenges of the edge mesh

    Get PDF
    Nowadays, we are witnessing the advent of the Internet of Things (IoT) with numerous devices performing interactions between them or with their environment. The huge number of devices leads to huge volumes of data that demand the appropriate processing. The “legacy” approach is to rely on Cloud where increased computational resources can realize any desired processing. However, the need for supporting real-time applications requires a reduced latency in the provision of outcomes. Edge Computing (EC) comes as the “solver” of the latency problem. Various processing activities can be performed at EC nodes having direct connection with IoT devices. A number of challenges should be met before we conclude a fully automated ecosystem where nodes can cooperate or understand their status to efficiently serve applications. In this article, we perform a survey of the relevant research activities towards the vision of Edge Mesh (EM), i.e., a “cover” of intelligence upon the EC. We present the necessary hardware and discuss research outcomes in every aspect of EC/EM nodes functioning. We present technologies and theories adopted for data, tasks, and resource management while discussing how machine learning and optimization can be adopted in the domain

    From software to hardware: making dynamic multi-core processors practical

    Get PDF
    Heterogeneous processors such as Arm’s big.LITTLE have become popular as they offer a choice between performance and energy efficiency. However, the core configurations are fixed at design time which offers a limited amount of adaptation. Dynamic Multi-core Processors (DMPs) bridge the gap between homogeneous and fully reconfigurable systems. They present a new way of improving single-threaded performance by running a thread on groups of cores (compositions) and with the ability of changing the processor topology on the fly, they can better adapt themselves to any task at hand. However, these potential performance improvements are made difficult due to two main challenges: the difficulty of determining a processor configuration that leads to the optimal performance and knowing how to tackle hardware bottlenecks that may impede the performance of composition. This thesis first demonstrates that ahead-of-time thread and core partitioning used to improve the performance of multi-threaded programs can be automated. This is done by analysing static code features to generate a machine-learning model that determines a processor configuration that leads to good performance for an application. The machine learning model is able to predict a configuration that is within 16% of the performance of the best configuration from the search space. This is followed by studying how dynamically adapting the size of a composition at runtime can be used to reduce energy consumption whilst maintaining the same speedup as the fastest static core composition. An analysis of how much energy can be saved by adapting the size of the composition at runtime is conducted, showing that dynamic reconfiguration can reduce energy consumption by 42% on average. A model is then built using linear regression which analyses the features of basic blocks being executed to determine if the current composition should be reconfigured; on average it reduces energy consumption by 37%. Finally the hardware mechanisms that drive core composition are explored. A new fetching mechanism for core composition is proposed, where cores fetch code in a round-robin fashion. The use of value prediction is also motivated, as large core compositions are more susceptible to data-dependencies. This new hardware setup shows massive potential. By exploring a perfect value predictor with perfect branch prediction and the new fetching scheme, the performance of a large core composition can be improved by a factor of up to 3x, and 1.88x on average. Furthermore, this thesis shows that state-of-the-art value prediction with a normal branch predictor still leads to good performance improvements, with an average of 1.33x to up to 2.7x speedup
    • 

    corecore