59 research outputs found

    Implementation of Asymmetric Multiprocessing Support in a Real-Time Operating System

    Get PDF
    The semiconductor industry can no longer afford to rely on decreasing the size of the die, and increasing the frequency of operation to achieve higher performance. An alternative that has been proven to increase performance is multiprocessing. Multiprocessing refers to the concept of running more than one application or task on more than one central processor. Multi-core processors are the main engine of multiprocessing. In asymmetric multiprocessing, each core in a multi-core systems is independent and has its own code that determines its execution. These cores must be able to communicate and synchronize access to resources

    From plasma to beefarm: Design experience of an FPGA-based multicore prototype

    Get PDF
    In this paper, we take a MIPS-based open-source uniprocessor soft core, Plasma, and extend it to obtain the Beefarm infrastructure for FPGA-based multiprocessor emulation, a popular research topic of the last few years both in the FPGA and the computer architecture communities. We discuss various design tradeoffs and we demonstrate superior scalability through experimental results compared to traditional software instruction set simulators. Based on our experience of designing and building a complete FPGA-based multiprocessor emulation system that supports run-time and compiler infrastructure and on the actual executions of our experiments running Software Transactional Memory (STM) benchmarks, we comment on the pros, cons and future trends of using hardware-based emulation for research.Peer ReviewedPostprint (author's final draft

    Exploiting Hardware Abstraction for Parallel Programming Framework: Platform and Multitasking

    Get PDF
    With the help of the parallelism provided by the fine-grained architecture, hardware accelerators on Field Programmable Gate Arrays (FPGAs) can significantly improve the performance of many applications. However, designers are required to have excellent hardware programming skills and unique optimization techniques to explore the potential of FPGA resources fully. Intermediate frameworks above hardware circuits are proposed to improve either performance or productivity by leveraging parallel programming models beyond the multi-core era. In this work, we propose the PolyPC (Polymorphic Parallel Computing) framework, which targets enhancing productivity without losing performance. It helps designers develop parallelized applications and implement them on FPGAs. The PolyPC framework implements a custom hardware platform, on which programs written in an OpenCL-like programming model can launch. Additionally, the PolyPC framework extends vendor-provided tools to provide a complete development environment including intermediate software framework, and automatic system builders. Designers\u27 programs can be either synthesized as hardware processing elements (PEs) or compiled to executable files running on software PEs. Benefiting from nontrivial features of re-loadable PEs, and independent group-level schedulers, the multitasking is enabled for both software and hardware PEs to improve the efficiency of utilizing hardware resources. The PolyPC framework is evaluated regarding performance, area efficiency, and multitasking. The results show a maximum 66 times speedup over a dual-core ARM processor and 1043 times speedup over a high-performance MicroBlaze with 125 times of area efficiency. It delivers a significant improvement in response time to high-priority tasks with the priority-aware scheduling. Overheads of multitasking are evaluated to analyze trade-offs. With the help of the design flow, the OpenCL application programs are converted into executables through the front-end source-to-source transformation and back-end synthesis/compilation to run on PEs, and the framework is generated from users\u27 specifications

    Satisfying hard real-time constraints using COTS components

    Get PDF
    L'utilizzo di componenti COTS (Commercial-Off-The-Shelf) è sempre più comune nella produzione di sistemi embedded real-time. Prodotti commerciali, come periferiche di Input/Output e bus di sistema, vengono utilizzati in sistemi real-time al fine di ridurre i costi, il tempo di produzione, ed aumentare le performance. Sfortunatamente, hardware e sistemi operativi COTS sono progettati principalmente per ottimizzare le performance, ma con poca attenzione verso determinismo, predicibilità ed affidabilità. Per questa ragione, molte problematiche devono ancora essere affrontate prima di un loro impiego in sistemi real-time ad alta criticita'. In questa tesi abbiamo centrato la nostra attenzione su alcune delle piu' importanti sorgenti di impredicibilita' che devono essere rimosse al fine di integrare hardware e software COTS in sistemi hard real-time. Come prima cosa abbiamo sviluppato ASMP-Linux, una variante di Linux che minimizza overhead e latenza del sistema operativo. Successivamente abbiamo progettato ed implementato un nuovo sistema di gestione dell'I/O, basato sul Real-Time Bridge, un nuovo componente hardware che fornisce isolamento temporale sui bus COTS e rimuove le interferenze fra periferiche di I/O. E' stato anche sviluppato un Multi-Flow Real-Time Bridge per assicurare predicibilita' nel caso di periferiche condivise. Infine abbiamo proposto PREM, un nuovo modello di esecuzione per sistemi real-time che elimina le interferenze fra periferiche e CPU, e quelle fra processi ad alta criticita' ed interruzioni hardware. Per ognuna delle nostre soluzioni saranno descritti in dettaglio gli aspetti teorici, l'implementazione dei prototipi ed i risultati sperimentali.Real-time embedded systems are increasingly being built using Commercial Off-The-Shelf (COTS) components such as mass-produced peripherals and buses to reduce costs, time-to-market, and increase performance. Unfortunately, COTS hardware and operating systems are typically designed to optimize average performance, instead of determinism, predictability, and reliability, hence their employment in high criticality real-time systems is still a daunting task. In this thesis, we addressed some of the most important sources of unpredictability which must be removed in order to integrate COTS hardware and software into hard real-time systems. We first developed ASMP-Linux, a variant of Linux, capable of minimizing both operating system overhead and latency. Next, we designed and implemented a new I/O management system, based on real-time bridges, a novel hardware component that provides temporal isolation on the COTS bus and removes the interference among I/O peripherals. A multi-flow real-time bridge has been also developed to address interperipheral interference, allowing predictable device sharing. Finally, we propose PREM, a new execution model for real-time systems which eliminates interference between peripherals and the CPU, as well as interference between a critical task and driver interrupts. For each of our solutions, we will describe in detail theory aspects, as well as prototype implementations and experimental measurements

    Cycle-accurate modeling of multicore processors on FPGAs

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2013.Cataloged from PDF version of thesis.Includes bibliographical references (pages 169-176).We present a novel modeling methodology which enables the generation of a high-performance, cycle-accurate simulator from a cycle-level specification of the target design. We describe Arete, a full-system multicore processor simulator, developed using our modeling methodology. We provide details on Arete's resource-efficient and high-performance implementation on multiple FPGA platforms, and the architectural experiments performed using it. We present clear evidence that the use of simplified models in architectural studies can lead to wrong conclusions. Through two experiments performed using both cycle-accurate and simplified models, we show that on one hand there are substantial quantitative and qualitative differences in results, and on the other, the results match quite well.by Asif Imtiaz Khan.Ph.D

    Programming models for many-core architectures: a co-design approach

    Get PDF
    Common many-core processors contain tens of cores and distributed memory. Compared to a multicore system, which only has a few tightly coupled cores sharing a single bus and memory, several complex problems arise. Notably, many cores require many parallel tasks to fully utilize the cores, and communication happens in a distributed and decentralized way. Therefore, programming such a processor requires the application to exhibit concurrency. In contrast to a single-core application, a concurrent application has to deal with memory state changes with an observable (non-deterministic) intermediate state. The complexity introduced by these problems makes programming a many-core system with a single-core-based programming approach notoriously hard.\ud \ud The central concept of this thesis is that abstractions, which are related to (many-core) programming, are structured in a single platform model. A platform is a layered view of the hardware, a memory model, a concurrency model, a model of computation, and compile-time and run-time tooling. Then, a programming model is a specific view on this platform, which is used by a programmer. In this view, some details can be hidden from the programmer's perspective, some details cannot. For example, an operating system presents an infinite number of parallel virtual execution units to the application whilst it hides details regarding scheduling. On the other hand, a programmer usually has balance workload among threads by hand.\ud \ud This thesis presents modifications to different abstraction layers of a many-core architecture, in order to make the system as a whole more efficient, and to reduce the programming complexity. These modifications influence other abstractions in the platform, and especially the programming model. Therefore, this thesis applies co-design on all models. Notably, co-design of the memory model, concurrency model, and model of computation is required for a scalable implementation of lambda-calculus. Moreover, only the combination of requirements of the many-core hardware from one side and the concurrency model from the other leads to a memory model abstraction. Hence, this thesis shows that to cope with the current trends in many-core architectures from a programming perspective, it is essential and feasible to inspect and adapt all abstractions collectively
    • …
    corecore