56 research outputs found

    Using an FPGA for Fast Bit Accurate SoC Simulation

    Get PDF
    In this paper we describe a sequential simulation method to simulate large parallel homo- and heterogeneous systems on a single FPGA. The method is applicable for parallel systems were lengthy cycle and bit accurate simulations are required. It is particularly designed for systems that do not fit completely on the simulation platform (i.e. FPGA). As a case study, we use a Network-on-Chip (NoC) that is simulated in SystemC and on the described FPGA simulator. This enables us to observe the NoC behavior under a large variety of traffic patterns. Compared with the SystemC simulation we achieved a factor 80-300 of speed improvement, without compromising the cycle and bit level accuracy

    Fast, Accurate and Detailed NoC Simulations

    Get PDF
    Network-on-Chip (NoC) architectures have a wide variety of parameters that can be adapted to the designer's requirements. Fast exploration of this parameter space is only possible at a high-level and several methods have been proposed. Cycle and bit accurate simulation is necessary when the actual router's RTL description needs to be evaluated and verified. However, extensive simulation of the NoC architecture with cycle and bit accuracy is prohibitively time consuming. In this paper we describe a simulation method to simulate large parallel homogeneous and heterogeneous network-on-chips on a single FPGA. The method is especially suitable for parallel systems where lengthy cycle and bit accurate simulations are required. As a case study, we use a NoC that was modelled and simulated in SystemC. We simulate the same NoC on the described FPGA simulator. This enables us to observe the NoC behavior under a large variety of traffic patterns. Compared with the SystemC simulation we achieved a speed-up of 80-300, without compromising the cycle and bit level accuracy

    Demonstration of Run-time Spatial Mapping of Streaming Applications to a Heterogeneous Multi-Processor System-on-Chip (MPSoC)

    Get PDF
    In this paper, the problem of spatial mapping is defined. Reasons are presented to show why performing spatial mappings at run-time is both necessary and desirable and criteria for the qualitative comparison of spatial mappings are introduced. An algorithm is described that implements a preliminary spatial mapper. The methods used in the algorithm are demonstrated with an illustrative example

    An Approximate Maximum Common Subgraph Algorithm for Large Digital Circuits

    Get PDF
    This paper presents an approximate Maximum Common Subgraph (MCS) algorithm, specifically for directed, cyclic graphs representing digital circuits. \ud Because of the application domain, the graphs have nice properties: they are very sparse; have many different labels; and most vertices have only one predecessor. The algorithm iterates over all vertices once and uses heuristics to find the MCS. It is linear in computational complexity with respect to the size of the graph. Experiments show that very large common subgraphs were found in graphs of up to 200,000 vertices within a few minutes, when a quarter or less of the graphs differ. The variation in run-time and quality of the result is low

    Statistical performance analysis with dynamic workload using S-NET

    Get PDF
    Volkmar Wieser, Philip K. F. Hölzenspies, Michael Roßbory, and Raimund Kirner, 'Statistical performance analysis with dynamic workload using S-NET'. Paper presented at the Workshop on Feedback-Directed Compiler Optimization for Multi-Core Architectures. Paris, France 23-25 January 2012In this paper the ADVANCE approach for engineering con- current software systems with well-balanced hardware ef- ficiency is adressed using the stream processing language S-Net. To obtain the cost information in the concurrent system the metrics throughput, latency, and jitter are evalu- ated by analyzing generated synthetical data as well as using an industrial related application in the future. As fall-out an Eclipse plugin for S-Net has been developed to provide sup- port for syntax highlighting, content assistance, hover help, and more, for easier and faster development. The presented results of the current work are on the one hand an indicator for the status quo of the ADVANCE vision and on the other hand used to improve the applied statistical analysis tech- niques within ADVANCE. Like the ADVANCE project, this work is still under development, but further improvements and speedups are expected in the near future

    Run-time Spatial Mapping of Streaming Applications to Heterogeneous Multi-Processor Systems

    Get PDF
    In this paper, we define the problem of spatial mapping. We present reasons why performing spatial mappings at run-time is both necessary and desirable. We propose what is—to our knowledge—the first attempt at a formal description of spatial mappings for the embedded real-time streaming application domain. Thereby, we introduce criteria for a qualitative comparison of these spatial mappings. As an illustration of how our formalization relates to practice, we relate our own spatial mapping algorithm to the formal model

    The Chameleon Architecture for Streaming DSP Applications

    Get PDF
    We focus on architectures for streaming DSP applications such as wireless baseband processing and image processing. We aim at a single generic architecture that is capable of dealing with different DSP applications. This architecture has to be energy efficient and fault tolerant. We introduce a heterogeneous tiled architecture and present the details of a domain-specific reconfigurable tile processor called Montium. This reconfigurable processor has a small footprint (1.8 mm2^2 in a 130 nm process), is power efficient and exploits the locality of reference principle. Reconfiguring the device is very fast, for example, loading the coefficients for a 200 tap FIR filter is done within 80 clock cycles. The tiles on the tiled architecture are connected to a Network-on-Chip (NoC) via a network interface (NI). Two NoCs have been developed: a packet-switched and a circuit-switched version. Both provide two types of services: guaranteed throughput (GT) and best effort (BE). For both NoCs estimates of power consumption are presented. The NI synchronizes data transfers, configures and starts/stops the tile processor. For dynamically mapping applications onto the tiled architecture, we introduce a run-time mapping tool

    A survey of offline algorithms for energy minimization under deadline constraints

    Get PDF
    Modern computers allow software to adjust power management settings like speed and sleep modes to decrease the power consumption, possibly at the price of a decreased performance. The impact of these techniques mainly depends on the schedule of the tasks. In this article, a survey on underlying theoretical results on power management, as well as offline scheduling algorithms that aim at minimizing the energy consumption under real-time constraints, is given

    Statistical Performance Analysis of an Ant-Colony Optimisation Application in S-NET

    Get PDF
    Kenneth MacKenzie, Philip K. F. Hölzenspies, Kevin Hammond, Raimund Kirner, Vu Thien Nga Nguyen, Iraneus te Boekhorst, Clemens Grelck, Raphael Poss, Merijn Verstraaten, 'Statistical Performance Analysis of an Ant-Colony Optimisation Application in S-NET'. Paper presented at the 2nd Workshop on Feedback-Directed Compiler Optimization for Multi-Core Architectures. Berlin, Germany, 12 January 2013.We consider an ant-colony optimsation problem implemented on a multicore system as a collection of asynchronous stream- processing components under the control of the S-NET coordina- tion language. Statistical analysis and visualisation techniques are used to study the behaviour of the application, and this enables us to discover and correct problems in both the application program and the run-time system underlying S-NET

    Green computing: power optimisation of VFI-based real-time multiprocessor dataflow applications (extended version)

    Get PDF
    Execution time is no longer the only performance metric for computer systems. In fact, a trend is emerging to trade raw performance for energy savings. Techniques like Dynamic Power Management (DPM, switching to low power state) and Dynamic Voltage and Frequency Scaling (DVFS, throttling processor frequency) help modern systems to reduce their power consumption while adhering to performance requirements. To balance flexibility and design complexity, the concept of Voltage and Frequency Islands (VFIs) was recently introduced for power optimisation. It achieves fine-grained system-level power management, by operating all processors in the same VFI at a common frequency/voltage.This paper presents a novel approach to compute a power management strategy combining DPM and DVFS. In our approach, applications (modelled in full synchronous dataflow, SDF) are mapped on heterogeneous multiprocessor platforms (partitioned in voltage and frequency islands). We compute an energy-optimal schedule, meeting minimal throughput requirements. We demonstrate that the combination of DPM and DVFS provides an energy reduction beyond considering DVFS or DMP separately. Moreover, we show that by clustering processors in VFIs, DPM can be combined with any granularity of DVFS. Our approach uses model checking, by encoding the optimisation problem as a query over priced timed automata. The model-checker Uppaal Cora extracts a cost minimal trace, representing a power minimal schedule. We illustrate our approach with several case studies on commercially available hardware
    corecore