448 research outputs found

    Programming MPSoC platforms: Road works ahead

    Get PDF
    This paper summarizes a special session on multicore/multi-processor system-on-chip (MPSoC) programming challenges. The current trend towards MPSoC platforms in most computing domains does not only mean a radical change in computer architecture. Even more important from a SW developer´s viewpoint, at the same time the classical sequential von Neumann programming model needs to be overcome. Efficient utilization of the MPSoC HW resources demands for radically new models and corresponding SW development tools, capable of exploiting the available parallelism and guaranteeing bug-free parallel SW. While several standards are established in the high-performance computing domain (e.g. OpenMP), it is clear that more innovations are required for successful\ud deployment of heterogeneous embedded MPSoC. On the other hand, at least for coming years, the freedom for disruptive programming technologies is limited by the huge amount of certified sequential code that demands for a more pragmatic, gradual tool and code replacement strategy

    Generic Pipelined Processor Modeling and High Performance Cycle-Accurate Simulator Generation

    Full text link
    Detailed modeling of processors and high performance cycle-accurate simulators are essential for today's hardware and software design. These problems are challenging enough by themselves and have seen many previous research efforts. Addressing both simultaneously is even more challenging, with many existing approaches focusing on one over another. In this paper, we propose the Reduced Colored Petri Net (RCPN) model that has two advantages: first, it offers a very simple and intuitive way of modeling pipelined processors; second, it can generate high performance cycle-accurate simulators. RCPN benefits from all the useful features of Colored Petri Nets without suffering from their exponential growth in complexity. RCPN processor models are very intuitive since they are a mirror image of the processor pipeline block diagram. Furthermore, in our experiments on the generated cycle-accurate simulators for XScale and StrongArm processor models, we achieved an order of magnitude (~15 times) speedup over the popular SimpleScalar ARM simulator.Comment: Submitted on behalf of EDAA (http://www.edaa.com/

    Hardware/Software Codesign

    Get PDF
    The current state of the art technology in integrated circuits allows the incorporation of multiple processor cores and memory arrays, in addition to application specific hardware, on a single substrate. As silicon technology has become more advanced, allowing the implementation of more complex designs, systems have begun to incorporate considerable amounts of embedded software [3]. Thus it becomes increasingly necessary for the system designers to have knowledge on both hardware and software to make efficient design tradeoffs. This is where hardware/software codesign comes into existence

    Custom Integrated Circuits

    Get PDF
    Contains table of contents for Part III, table of contents for Section 1 and reports on eleven research projects.IBM CorporationMIT School of EngineeringNational Science Foundation Grant MIP 94-23221Defense Advanced Research Projects Agency/U.S. Army Intelligence Center Contract DABT63-94-C-0053Mitsubishi CorporationNational Science Foundation Young Investigator Award Fellowship MIP 92-58376Joint Industry Program on Offshore Structure AnalysisAnalog DevicesDefense Advanced Research Projects AgencyCadence Design SystemsMAFET ConsortiumConsortium for Superconducting ElectronicsNational Defense Science and Engineering Graduate FellowshipDigital Equipment CorporationMIT Lincoln LaboratorySemiconductor Research CorporationMultiuniversity Research IntiativeNational Science Foundatio

    Designing a CPU model: from a pseudo-formal document to fast code

    Get PDF
    For validating low level embedded software, engineers use simulators that take the real binary as input. Like the real hardware, these full-system simulators are organized as a set of components. The main component is the CPU simulator (ISS), because it is the usual bottleneck for the simulation speed, and its development is a long and repetitive task. Previous work showed that an ISS can be generated from an Architecture Description Language (ADL). In the work reported in this paper, we generate a CPU simulator directly from the pseudo-formal descriptions of the reference manual. For each instruction, we extract the information describing its behavior, its binary encoding, and its assembly syntax. Next, after automatically applying many optimizations on the extracted information, we generate a SystemC/TLM ISS. We also generate tests for the decoder and a formal specification in Coq. Experiments show that the generated ISS is as fast and stable as our previous hand-written ISS.Comment: 3rd Workshop on: Rapid Simulation and Performance Evaluation: Methods and Tools (2011

    t|ket> : A retargetable compiler for NISQ devices

    Get PDF
    We present t|ket>, a quantum software development platform produced by Cambridge Quantum Computing Ltd. The heart of t|ket> is a language-agnostic optimising compiler designed to generate code for a variety of NISQ devices, which has several features designed to minimise the influence of device error. The compiler has been extensively benchmarked and outperforms most competitors in terms of circuit optimisation and qubit routing

    Formal Architecture Specification for Time Analysis

    Get PDF
    International audienceWCET calculus is nowadays a must for safety critical systems. As a matter of fact, basic real-time properties rely on accurate timings. Although over the last years, substantial progress has been made in order to get a more precise WCET, we believe that the design of the underlying frameworks deserve more attention. In this paper, we are concerned mainly with two aspects which deal with the modularity of these frameworks. First, we enhance the existing language Sim-nML for describing processors at the instruction level in order to capture modern architecture aspects. Second, we propose a light DSL in order to describe, in a formal prose, architectural aspects related to both the structural aspects as well as to the behavioral aspects

    GCC-Plugin for Automated Accelerator Generation and Integration on Hybrid FPGA-SoCs

    Full text link
    In recent years, architectures combining a reconfigurable fabric and a general purpose processor on a single chip became increasingly popular. Such hybrid architectures allow extending embedded software with application specific hardware accelerators to improve performance and/or energy efficiency. Aiding system designers and programmers at handling the complexity of the required process of hardware/software (HW/SW) partitioning is an important issue. Current methods are often restricted, either to bare-metal systems, to subsets of mainstream programming languages, or require special coding guidelines, e.g., via annotations. These restrictions still represent a high entry barrier for the wider community of programmers that new hybrid architectures are intended for. In this paper we revisit HW/SW partitioning and present a seamless programming flow for unrestricted, legacy C code. It consists of a retargetable GCC plugin that automatically identifies code sections for hardware acceleration and generates code accordingly. The proposed workflow was evaluated on the Xilinx Zynq platform using unmodified code from an embedded benchmark suite.Comment: Presented at Second International Workshop on FPGAs for Software Programmers (FSP 2015) (arXiv:1508.06320

    Fast approximately timed simulation

    Get PDF
    International audienceIn this paper we present a technique for fast approximately timed simulation of software within a virtual prototyping framework. Our method performs a static analysis of the program control flow graph to construct annotations of the simulated program, combined with dynamic performance information. The static analysis estimates execution time based on a target architecture model. The delays introduced by instruction fetch and data cache misses are evaluated dynamically. At the end of each block, static and dynamic information are combined with branch target prediction to compute the total execution time of the blocks. As a result, we can provide approximate performance estimates with a high simulation speed that is still usable for software developers

    Optimized Fast Fourier Transform Architecture Using Instruction Set Architecture Extension In Low-End Digital Signal Controller

    Get PDF
    Smart microgrids have emerged as a viable solution in case of emergency situations occurred at the main electricity grid. The main concern of a smart microgrid is the degradation of the power quality caused by harmonic distortion originated from the non-linear equipment. With the rapid development of power electronic technology, the increased of harmonic-producing loads in the smart microgrids necessitating a new digital signal controller architecture for the harmonic measurement system. While the current system configurations are directed towards the 32-bit architecture, it shows higher requirements in area footprint and multi-core setup. This thesis presents the design of a low-end digital signal controller architecture using instruction set architecture (ISA) extension for the implementation of the harmonic measurement system in a smart microgrid. A new architecture, called UTeMRISC, is developed from the baseline 8-bit microcontroller with the capability to perform signal processing applications such as Fast Fourier Transform (FFT). The architecture is improved using the Application-Specific Instruction Set Processor (ASIP) approach by extending the instruction set architecture to 16-bit length. Instruction set customization is implemented to enable the execution of computationally intensive tasks. The entire architecture is described in Verilog Hardware Description Language (HDL) and implemented on the Virtex-6 FPGA board. From the test programs, UTeMRISC has demonstrated faster execution times and higher maximum operating frequency while not significantly increased the core’s resource utilization. Compared to the initial processor architecture, the support of extended ISA has increased the UTeMRISC core by 21.8% but at the same time allows to execute Fast Fourier Transform algorithm up to 5× faster. The combine effort of ISA extension and optimized instruction set generation results in up to 1 Mega sample per second, which translated to 66.8% increase of data throughput in the FFT algorithm when compared to a 32-bit architecture. This research proves that with comprehensive ASIP methodology and ISA extension, a low-end digital signal controller architecture is feasible and effective to be implemented in a harmonic measurement system for a smart microgrid
    • …
    corecore