20 research outputs found

    A Multiprocessor Platform Based on FPGA Technology Targeted for a Driver Vigilance Monitoring Device

    Get PDF
    Medical devices processing images or audio or executing complex AI algorithms are able to run more efficiently and meet real time requirements if the parallelism in those algorithms is exploited. In this research a methodology is proposed to exploit the flexibility and short design cycle of FPGAs (Field Programmable Gate Arrays) in order to achieve this target. Hardware/software co-design and dynamic partitioning allow the optimization of the multiprocessor platform design parameters and software code targeting each core to meet real time constraints. This is practically demonstrated by building a real life driver vigilance monitoring system based on visual cues extraction and evaluation. The application drives the whole design process to prove its effectiveness. An algorithm was built to achieve the goal of detecting the eye state of the driver (open or closed) and it is applied on captured consecutive frames to evaluate the vigilance state of the driver. Vigilance state is measured depending on duration of eye closure. This video processing application is then targeted to run on a multi-core FPGA based processing platform using the proposed methodology. Results obtained were very good using the Grimace Face Database and when operating the system on one’s face. On operating the device, a false positive of eye closure must take place two consecutive times in order to get an alarm, which decreases the probability of failure. The timing analysis applied proved the importance of using the concept of parallelism to achieve performance constraints. FPGA technology proved to be a very powerful prototyping tool for complex multiprocessor systems design. The flexible FPGA technology coupled with hardware/software co-design provided means to explore the design space and reach decisions that satisfy the design constraints with minimum time investment and cost

    A service based estimation method for MPSoC performance modelling

    Get PDF
    This paper presents an abstract service based estimation method for MPSoC performance modelling which allows fast, cycle accurate design space exploration of complex architectures including multi processor configurations at a very early stage in the design phase. The modelling method uses a service oriented model of computation based on Hierarchical Colored Petri Nets and allows the modelling of both software and hardware in one unified model. To illustrate the potential of the method, a small MPSoC system, developed at Bang & Olufsen ICEpower a/s, is modelled and performance estimates are produced for various configurations of the system in order to explore the best possible implementation

    A Model-based Design Framework for Application-specific Heterogeneous Systems

    Get PDF
    The increasing heterogeneity of computing systems enables higher performance and power efficiency. However, these improvements come at the cost of increasing the overall complexity of designing such systems. These complexities include constructing implementations for various types of processors, setting up and configuring communication protocols, and efficiently scheduling the computational work. The process for developing such systems is iterative and time consuming, with no well-defined performance goal. Current performance estimation approaches use source code implementations that require experienced developers and time to produce. We present a framework to aid in the design of heterogeneous systems and the performance tuning of applications. Our framework supports system construction: integrating custom hardware accelerators with existing cores into processors, integrating processors into cohesive systems, and mapping computations to processors to achieve overall application performance and efficient hardware usage. It also facilitates effective design space exploration using processor models (for both existing and future processors) that do not require source code implementations to estimate performance. We evaluate our framework using a variety of applications and implement them in systems ranging from low power embedded systems-on-chip (SoC) to high performance systems consisting of commercial-off-the-shelf (COTS) components. We show how the design process is improved, reducing the number of design iterations and unnecessary source code development ultimately leading to higher performing efficient systems

    Design Space Exploration Using Arithmetic-Level Hardware–Software Cosimulation for Configurable Multiprocessor Platforms

    No full text
    Configurable multiprocessor platforms consist of multiple soft processors configured on FPGA devices. They have become an attractive choice for implementing many computing applications. In addition to the various ways of distributing software execution among the multiple soft processors, the application designer can customize soft processors and the connections between them in order to improve the performance of the applications running on the multiprocessor platform. State-of-the-art design tools rely on low-level simulation to explore the various design trade-offs offered by configurable multiprocessor platforms. These low-level simulation based exploration techniques are too time-consuming and can be a major bottleneck to efficient design space exploration on these platforms. We propose a design space exploration technique for configurable multiprocessor platforms using arithmetic-level cycle-accurate hardware–software cosimulation. Arithmetic-level abstractions of the hardware and software execution platforms are created within the proposed cosimulation environment. The configurable multiprocessor platforms are described using these arithmetic-level abstractions. Hardware and software simulators are tightly integrated to concurrently simulate the arithmetic behavior of the multiprocessor platform. The simulation within the integrated simulators are synchronized to provide cycle-accurate simulation result

    System-on-Chip: Reuse and Integration

    Full text link

    매니코어 NoC 아키텍처에 대한 고속 사이클-근사 시뮬레이션 기법

    Get PDF
    학위논문 (박사)-- 서울대학교 대학원 : 전기·컴퓨터공학부, 2017. 2. 하순회.Simulation is a software technique that uses the current available architecture to prototype a future architecture. In computer architecture research, simulation techniques are one of the most important skills. Simulation techniques enable us to obtain important performance indicators of new architectures and to perform the design space exploration using these metrics. Furthermore, the simulator enables rapid software development and optimization on the architecture that does not exist. Despite various known problems, such as slow speed or coverage issue, the reliance on simulation technology in computer architecture research continues to increase. As the density of transistor increases and the performance improvement of the single core hits the ceiling, the newly constructed architectures usually consist of multi/many cores with the network-on-chip, which enables scalable communications. In addition, the implementation of the application itself has also been complicated to effectively utilize these parallel architectures. Thus, simulators for parallel architectures and parallel applications have become extremely complex, and existing sequential simulators no longer simulate these systems at a realistic time. While many of parallel simulation techniques are being developed to solve these problems, they suffer from poor simulation performance or accuracy. In this thesis, we propose and evaluate a novel many-core simulation technique that can obtain the best simulation performance at the cost of minimum simulation error. The proposed parallel many-core simulator is divided into three parts: 1) core simulator, 2) network-on-chip simulator, and 3) simulation backplane. Each core is executed by a core simulator, which communicates with the external simulation backplane via the Interprocess Communication (IPC). Each core simulation is performed individually in a separate host processor. The simulation backplane arranges messages from each core into chronological order, passes them to destination modules, and simulates hardware components other than cores. If the simulation backplane generates a request requiring NoC communication, this request is forwarded to the network simulator and is simulated at the most accurate accuracy level. In this thesis, we proposed a novel core simulation model, which combined analytical and sampled simulations. The core simulator presents 11.36 to 44.31 MIPS performance, while the simulation error is approximately 8 percent. The standalone core simulator is released as an open-source. We confirmed that NoC simulation has a great effect on the reliability of outputs generated from many-core simulation. First, existing flit-level NoC simulators were analyzed at source-code level. Based on the observations, various implementations were evaluated and various software optimizations was applied to improve the network simulation performance. The proposed NoC simulator presents more than 100KCycles/s performance unless the packet injection rate exceeds 0.00625, which is two times faster than state-of-the-arts NoC simulator at least. The speed of the simulation backplane depends greatly on the IPC overhead and SystemC scheduling overhead. To reduce the IPC overhead, the trace-driven co-simulation technique is used, faster IPC is introduced, and the segmented L1 data cache is embedded in a core simulator. In addition, to reduce SystemC scheduling overhead, it is important to reduce the number of modules that are simultaneously awakened. To this end, slave modules are redesigned to be activated only based on an event. A new scheduler parallelization technique is also studied. Although the newly developed SystemC parallel scheduler showed good performance under limited conditions, we also confirmed that no performance improvement was found in the TLM level many-core simulator developed in this thesis. While the proposed many-core simulator uses the conservative synchronization technique which is free from causality errors and performs an accurate flit-level NoC simulation, the simulation performance is still acceptable, thanks to parallelism and optimizations. Additionally, the simulator is highly scalable to add other modules because the simulation backplane is developed to be compatible with SystemC TLM 2.0 standard. Although extensive experiments on accuracy are not conducted, it will be complemented when a detailed specification of the target architecture is given. This dissertation can be a reference to the development of a many-core simulator, which will be more essential in the future.Chapter 1 Introduction 1 1.1 Motivation 1 1.2 Contribution 4 1.3 Dissertation Organization 5 Chapter 2 Background and Existing Research 6 2.1 Terminologies 6 2.1.1 Simulation Host / Simulation Target 6 2.1.2 Simulated Time / Simulation Time 2.1.3 User-level Simulation / Full-system Simulation 7 2.1.4 Execution-driven Simulation / Trace-driven Simulation 7 2.2 State-of-the-arts Many-core Simulators 8 2.2.1 Gem5 8 2.2.2 Marss 9 2.2.3 Sniper 9 2.2.4 Zsim 9 2.2.5 Manifold 10 2.2.6 Hornet 10 2.2.7 Summary 11 2.3 Host and Target Architecture 12 Chapter 3 Core Simulation 14 3.1 Overview 14 3.2 Related Works 16 3.2.1 Timing Models 16 3.2.2 Analytical Model: Interval Simulation 19 3.3 Sampling Mechanism 23 3.3.1 Sampling Configuration 24 3.3.2 Parameter Extraction 24 3.4 Trace Analyzer 27 3.4.1 Dependency Analysis 29 3.4.2 Life Cycle of An Instruction 31 3.5 Experimental Results 32 3.5.1 Time-accuracy Trade-off 34 3.5.2 Simulation Accuracy 37 3.5.3 Simulation Performance 41 3.6 Discussion 42 Chapter 4 NoC Simulation 45 4.1 Network-on-chip 45 4.2 Motivation 46 4.3 Related Works 48 4.3.1 Noxim 49 4.3.2 Booksim2 50 4.3.3 Garnet 51 4.4 Proposed Approach 51 4.4.1 Implementations 51 4.4.2 Optimizations 54 4.5 Experimental Results 56 4.5.1 Impact of Implementations and Optimizations 56 4.5.2 Comparison with Other State-Of-The-Arts 58 4.5.3 Performance Evaluation For Various Configurations 59 4.5.4 Full-System Simulation Accuracy Impact 59 4.5.5 Accuracy 61 4.6 Discussion 61 Chapter 5 Simulation Backplane 63 5.1 Overview 63 5.2 Background 65 5.2.1 SystemC 65 5.2.2 OSCI Transaction Level Modeling Standard 2.0 66 5.2.3 Synchronization Techniques 67 5.3 SystemC Models for the Target Architecture 69 5.4 Reducing the Cost of Interprocess Communications 71 5.4.1 Trace-driven Co-simulation 71 5.4.2 Better Interprocess Communication 73 5.4.3 Virtually embedding modules to core simulator 74 5.5 Reducing SystemC Scheduling Overhead 76 5.5.1 Event-based Slave Module Activation 76 5.5.2 SystemC Scheduler Parallelization 78 5.6 Evaluation 79 5.6.1 Scalability Test 79 5.6.2 Simulation Performance 79 5.6.3 Simulation Accuracy 80 Chapter 6 Simulation Backplane Parallelization 81 6.1 Background: OSCI SystemC Scheduler 81 6.2 Related Work: SystemC Parallelization Techniques 82 6.2.1 Fully-synchronous Approach 82 6.2.2 Parallel Distributed Event Scheduling (PDES) Approach 82 6.2.3 Out-of-order Execution with Dependency Analysis 83 6.2.4 Dynamic Offloading Approach 84 6.3 Proposed Technique 84 6.3.1 Basic Synchronization 85 6.3.2 Relaxed Synchronization 86 6.3.3 Modeling Restrictions 88 6.4 Experimental Results 89 6.4.1 Performance 90 6.4.2 Accuracy 92 6.5 Discussion and Limitation 93 Chapter 7 Conclusion 95 Bibliography 97 요약 107Docto

    GPU Based Acceleration of SystemC and Transaction Level Models for MPSOC Simulation

    Get PDF
    With increasing number of cores on a chip, the complexity of modeling hardware using virtual prototype is increasing rapidly. Typical SOCs today have multipro-cessors connected through a bus or NOC architecture which can be modeled using SystemC framework. SystemC is a popular language used for early design exploration and performance analysis of complex embedded systems. TLM2.0, an extension of SystemC, is increasingly used in MPSOC designs for simulating loosely and approxi-mately timed transaction level models. The OSCI reference kernel which implements SystemC library runs on a single thread, slowing up the simulation speed to a large extent. Previous works have used the computational power of multi-core systems and GPUs which can run multiple threads simultaneously, speeding up the simu-lation. Multi-core simulations are not as effective in cases where thread runtime is low, because synchronization overhead becomes comparable to thread runtime. Modern GPUs can run thousands of threads at a time and have shown good results for synthesizable designs in recent efforts. However, development in these works are limited to synthesizable subsets of SystemC models, not supporting timed events for process communication. In this research work, a methodology is proposed for accelerating timed event based SystemC TLM2.0 model to GPU based kernel, which maps SystemC processes to CUDA threads in GPU, providing high data level par-allelism. This work aims to provide a scalable solution for simulating large MPSOC designs, facilitating early design exploration and performance analysis. Experiments have shown that the proposed technique provides a speed-up of the order of 100x for typical MPSOC designs

    Methoden und Beschreibungssprachen zur Modellierung und Verifikation vonSchaltungen und Systemen: MBMV 2015 - Tagungsband, Chemnitz, 03. - 04. März 2015

    Get PDF
    Der Workshop Methoden und Beschreibungssprachen zur Modellierung und Verifikation von Schaltungen und Systemen (MBMV 2015) findet nun schon zum 18. mal statt. Ausrichter sind in diesem Jahr die Professur Schaltkreis- und Systementwurf der Technischen Universität Chemnitz und das Steinbeis-Forschungszentrum Systementwurf und Test. Der Workshop hat es sich zum Ziel gesetzt, neueste Trends, Ergebnisse und aktuelle Probleme auf dem Gebiet der Methoden zur Modellierung und Verifikation sowie der Beschreibungssprachen digitaler, analoger und Mixed-Signal-Schaltungen zu diskutieren. Er soll somit ein Forum zum Ideenaustausch sein. Weiterhin bietet der Workshop eine Plattform für den Austausch zwischen Forschung und Industrie sowie zur Pflege bestehender und zur Knüpfung neuer Kontakte. Jungen Wissenschaftlern erlaubt er, ihre Ideen und Ansätze einem breiten Publikum aus Wissenschaft und Wirtschaft zu präsentieren und im Rahmen der Veranstaltung auch fundiert zu diskutieren. Sein langjähriges Bestehen hat ihn zu einer festen Größe in vielen Veranstaltungskalendern gemacht. Traditionell sind auch die Treffen der ITGFachgruppen an den Workshop angegliedert. In diesem Jahr nutzen zwei im Rahmen der InnoProfile-Transfer-Initiative durch das Bundesministerium für Bildung und Forschung geförderte Projekte den Workshop, um in zwei eigenen Tracks ihre Forschungsergebnisse einem breiten Publikum zu präsentieren. Vertreter der Projekte Generische Plattform für Systemzuverlässigkeit und Verifikation (GPZV) und GINKO - Generische Infrastruktur zur nahtlosen energetischen Kopplung von Elektrofahrzeugen stellen Teile ihrer gegenwärtigen Arbeiten vor. Dies bereichert denWorkshop durch zusätzliche Themenschwerpunkte und bietet eine wertvolle Ergänzung zu den Beiträgen der Autoren. [... aus dem Vorwort
    corecore