53 research outputs found

    Real-time linux and hardware accelerated systems on QEMU

    Get PDF
    Dissertação de mestrado integrado em Industrial Electronics Engineering and ComputersSoftware application acceleration, using parallelization techniques and dedicated hardware components, is often an optimization compromise in a cost-benefit relationship during the migration of software processes to hardware Intellectual Property (IP) dedicated cores or accelerators. In real-time applications extra care is needed when dealing with these issues, so that the real-time requirements of the application are not compromised. An isolated validation, as far as application domains are concerned, does not guarantee integral system functionality. Using an integrated co-simulation environment, chances of early system problem detection before moving to the physical implementation phase are improved. By adopting a design flow aided by co-simulation, not only is the development process sped up, but also resource independent, since the system can be developed in its entirety in a host platform without being bound to a physical target platform. This dissertation aims to adopt a methodology of hardware-software co-design aided by co-simulation and extend embedded system simulation techniques to hardware IP co-simulation and integral validation, improving the design process of hardware accelerated embedded systems in their various development phases. Using Quick EMUlator (QEMU) as a tool for emulating embedded software platforms in a Linux-based environment, modifications were idealized and developed to enable QEMU to extend its embedded software platform emulating capabilities for custom hardware co-processor development purposes. Two QEMU extensions were developed, enabling easy integration of behavioral devices and co-simulation with external Register-Transfer Level (RTL) models in QEMU’s target platforms. A Verilog PLI library was also developed to allow Verilog simulators that support PLI to perform co-simulation with QEMU. To demonstrate the capabilities of following a hardware-software embedded co-design using the developed simulation environment, a demonstration application scenario was developed following a design flow that takes advantage of said simulation environment possibilities.A aceleração de aplicações de software, utilizando técnicas de paralelização e componentes de hardware dedicados, é frequentemente um compromisso de optimização numa relação de custo-benefício durante a migração de processos de software para aceleradores ou cores hardware IP dedicados. Em aplicações real-time, cuidados extra são necessários ao lidar com estas problemáticas, de forma a que os requisitos real-time da aplicação não sejam comprometidos. Uma validação isolada, no que respeitam os vários domínios de aplicação, não garante uma funcionalidade integral do sistema. Utilizando um ambiente de co-simulação integrado, falhas no sistema podem ser detectadas numa fase inicial do projecto, antes de ser atingida uma fase de implementação física. Ao adoptar um design flow auxiliado por cosimulação, não só é o processo de desenvolvimento agilizado, mas também isento de dependências a nível da plataforma target, uma vez que o sistema pode ser desenvolvido inteiramente na plataforma host sem estar dependente dos recursos físicos associados uma plataforma target. Esta dissertação surge no âmbito da validação de uma metodologia de hardware-software co-design auxiliada por co-simulação, no extender de técnicas de simulação de sistemas embebidos, com ou sem aceleração de processos em hardware RTL, e na validação integral, aperfeiçoando o processo de design dos mesmos ao longo das várias fases de desenvolvimento. Utilizando o QEMU como ferramenta para emulação de ambientes baseados em Linux para plataformas de CPU+FPGA, alterações foram idealizadas e desenvolvidas para permitir extender as capacidades de emulação das mesmas no QEMU, para propósitos de desenvolvimento de aceleradores em hardware customizados, possibilitando a integração de devices comportamentais e co-simulação com modelos RTL externos nas plataformas target do QEMU. Para demonstrar as capacidades de seguir um co-design de hardware-software embebido utilizando o ambiente de simulação desenvolvido, um cenário de aplicação demonstrador foi desenvolvido seguindo um design flow que toma partido das possibilidades do referido ambiente de simulação

    MPSoCBench : um framework para avaliação de ferramentas e metodologias para sistemas multiprocessados em chip

    Get PDF
    Orientador: Rodolfo Jardim de AzevedoTese (doutorado) - Universidade Estadual de Campinas, Instituto de ComputaçãoResumo: Recentes metodologias e ferramentas de projetos de sistemas multiprocessados em chip (MPSoC) aumentam a produtividade por meio da utilização de plataformas baseadas em simuladores, antes de definir os últimos detalhes da arquitetura. No entanto, a simulação só é eficiente quando utiliza ferramentas de modelagem que suportem a descrição do comportamento do sistema em um elevado nível de abstração. A escassez de plataformas virtuais de MPSoCs que integrem hardware e software escaláveis nos motivou a desenvolver o MPSoCBench, que consiste de um conjunto escalável de MPSoCs incluindo quatro modelos de processadores (PowerPC, MIPS, SPARC e ARM), organizado em plataformas com 1, 2, 4, 8, 16, 32 e 64 núcleos, cross-compiladores, IPs, interconexões, 17 aplicações paralelas e estimativa de consumo de energia para os principais componentes (processadores, roteadores, memória principal e caches). Uma importante demanda em projetos MPSoC é atender às restrições de consumo de energia o mais cedo possível. Considerando que o desempenho do processador está diretamente relacionado ao consumo, há um crescente interesse em explorar o trade-off entre consumo de energia e desempenho, tendo em conta o domínio da aplicação alvo. Técnicas de escalabilidade dinâmica de freqüência e voltagem fundamentam-se em gerenciar o nível de tensão e frequência da CPU, permitindo que o sistema alcance apenas o desempenho suficiente para processar a carga de trabalho, reduzindo, consequentemente, o consumo de energia. Para explorar a eficiência energética e desempenho, foram adicionados recursos ao MPSoCBench, visando explorar escalabilidade dinâmica de voltaegem e frequência (DVFS) e foram validados três mecanismos com base na estimativa dinâmica de energia e taxa de uso de CPUAbstract: Recent design methodologies and tools aim at enhancing the design productivity by providing a software development platform before the definition of the final Multiprocessor System on Chip (MPSoC) architecture details. However, simulation can only be efficiently performed when using a modeling and simulation engine that supports system behavior description at a high abstraction level. The lack of MPSoC virtual platform prototyping integrating both scalable hardware and software in order to create and evaluate new methodologies and tools motivated us to develop the MPSoCBench, a scalable set of MPSoCs including four different ISAs (PowerPC, MIPS, SPARC, and ARM) organized in platforms with 1, 2, 4, 8, 16, 32, and 64 cores, cross-compilers, IPs, interconnections, 17 parallel version of software from well-known benchmarks, and power consumption estimation for main components (processors, routers, memory, and caches). An important demand in MPSoC designs is the addressing of energy consumption constraints as early as possible. Whereas processor performance comes with a high power cost, there is an increasing interest in exploring the trade-off between power and performance, taking into account the target application domain. Dynamic Voltage and Frequency Scaling techniques adaptively scale the voltage and frequency levels of the CPU allowing it to reach just enough performance to process the system workload while meeting throughput constraints, and thereby, reducing the energy consumption. To explore this wide design space for energy efficiency and performance, both for hardware and software components, we provided MPSoCBench features to explore dynamic voltage and frequency scalability (DVFS) and evaluated three mechanisms based on energy estimation and CPU usage rateDoutoradoCiência da ComputaçãoDoutora em Ciência da Computaçã

    System-on-Chip Design and Test with Embedded Debug Capabilities

    Get PDF
    In this project, I started with a System-on-Chip platform with embedded test structures. The baseline platform consisted of a Leon2 CPU, AMBA on-chip bus, and an Advanced Encryption Standard decryption module. The basic objective of this thesis was to use the embedded reconfigurable logic blocks for post-silicon debug and verification. The System-on-Chip platform was designed at the register transistor level and implemented in a 180-nm IBM process. Test logic instrumentation was done with DAFCA (Design Automation for Flexible Chip Architecture) Inc. pre-silicon tools. The design was then synthesized using the Synopsys Design Compiler and placed and routed using Cadence SOC Encounter. Total transistor count is about 3 million, including 1400K transistors for the debug module serving as on chip logic analyzer. Core size of the design is about 4.8mm x 4.8mm and the system is working at 151MHz. Design verification was done with Cadence NCSim. The controllability and observability of internal signals of the design is greatly increased with the help of pre-silicon tools which helps locate bugs and later fix them with the help of post-silicon tools. This helps prevent re-spins on several occasions thus saving millions of dollars. Post-silicon tools have been used to program assertions and triggers and inject numerous personalities into the reconfigurable fabric which has greatly increased the versatility of the circuit

    Application development process for GNAT, a SOC networked system

    Get PDF
    The market for smart devices was identified years ago, and yet commercial progress into this field has not made significant progress. The reason such devices are so painfully slow to market is that the gap between the technologically possible and the market capitalizable is too vast. In order for inventions to succeed commercially, they must bridge the gap to tomorrow\u27s technology with marketability today. This thesis demonstrates a design methodology that enables such commercial success for one variety of smart device, the Ambient Intelligence Node (AIN). Commercial Off-The Shelf (COTS) design tools allowing a Model-Driven Architecture (MDA) approach are combined via custom middleware to form an end-to-end design flow for rapid prototyping and commercialization. A walkthrough of this design methodology demonstrates its effectiveness in the creation of Global Network Academic Test (GNAT), a sample AIN. It is shown how designers are given the flexibility to incorporate IP Blocks available in the Global Economy to reduce Time-To-Market and cost. Finally, new kinds of products and solutions built on the higher levels of design abstraction permitted by MDA design methods are explored

    A Multi-layer Fpga Framework Supporting Autonomous Runtime Partial Reconfiguration

    Get PDF
    Partial reconfiguration is a unique capability provided by several Field Programmable Gate Array (FPGA) vendors recently, which involves altering part of the programmed design within an SRAM-based FPGA at run-time. In this dissertation, a Multilayer Runtime Reconfiguration Architecture (MRRA) is developed, evaluated, and refined for Autonomous Runtime Partial Reconfiguration of FPGA devices. Under the proposed MRRA paradigm, FPGA configurations can be manipulated at runtime using on-chip resources. Operations are partitioned into Logic, Translation, and Reconfiguration layers along with a standardized set of Application Programming Interfaces (APIs). At each level, resource details are encapsulated and managed for efficiency and portability during operation. An MRRA mapping theory is developed to link the general logic function and area allocation information to the device related physical configuration level data by using mathematical data structure and physical constraints. In certain scenarios, configuration bit stream data can be read and modified directly for fast operations, relying on the use of similar logic functions and common interconnection resources for communication. A corresponding logic control flow is also developed to make the entire process autonomous. Several prototype MRRA systems are developed on a Xilinx Virtex II Pro platform. The Virtex II Pro on-chip PowerPC core and block RAM are employed to manage control operations while multiple physical interfaces establish and supplement autonomous reconfiguration capabilities. Area, speed and power optimization techniques are developed based on the developed Xilinx prototype. Evaluations and analysis of these prototype and techniques are performed on a number of benchmark and hashing algorithm case studies. The results indicate that based on a variety of test benches, up to 70% reduction in the resource utilization, up to 50% improvement in power consumption, and up to 10 times increase in run-time performance are achieved using the developed architecture and approaches compared with Xilinx baseline reconfiguration flow. Finally, a Genetic Algorithm (GA) for a FPGA fault tolerance case study is evaluated as a ultimate high-level application running on this architecture. It demonstrated that this is a hardware and software infrastructure that enables an FPGA to dynamically reconfigure itself efficiently under the control of a soft microprocessor core that is instantiated within the FPGA fabric. Such a system contributes to the observed benefits of intelligent control, fast reconfiguration, and low overhead

    Caracterización y optimización térmica de sistemas en chip mediante emulación con FPGAs

    Get PDF
    Tesis inédita de la Universidad Complutense de Madrid, Facultad de Informática, Departamento de Arquitectura de Computadores y Automática, leída el 15/06/2012Tablets and smartphones are some of the many intelligent devices that dominate the consumer electronics market. These systems are complex to design as they must execute multiple applications (e.g.: real-time video processing, 3D games, or wireless communications), while meeting additional design constraints, such as low energy consumption, reduced implementation size and, of course, a short time-to-market. Internally, they rely on Multi-processor Systems on Chip (MPSoCs) as their main processing cores, to meet the tight design constraints: performance, size, power consumption, etc. In a bad design, the high logic density may generate hotspots that compromise the chip reliability. This thesis introduces a FPGA-based emulation framework for easy exploration of SoC design alternatives. It provides fast and accurate estimations of performance, power, temperature, and reliability in one unified flow, to help designers tune their system architecture before going to silicon.El estado del arte, en lo que a diseño de chips para empotrados se refiere, se encuentra dominado por los multi-procesadores en chip, o MPSoCs. Son complejos de diseñar y presentan problemas de disipación de potencia, de temperatura, y de fiabilidad. En este contexto, esta tesis propone una nueva plataforma de emulación para facilitar la exploración del enorme espacio de diseño. La plataforma utiliza una FPGA de propósito general para acelerar la emulación, lo cual le da una ventaja competitiva frente a los simuladores arquitectónicos software, que son mucho más lentos. Los datos obtenidos de la ejecución en la FPGA son enviados a un PC que contiene bibliotecas (modelos) SW para calcular el comportamiento (e.g.: la temperatura, el rendimiento, etc...) que tendría el chip final. La parte experimental está enfocada a dos puntos: por un lado, a verificar que el sistema funciona correctamente y, por otro, a demostrar la utilidad del entorno para realizar exploraciones que muestren los efectos a largo plazo que suceden dentro del chip, como puede ser la evolución de la temperatura, que es un fenómeno lento que normalmente requiere de costosas simulaciones software.Depto. de Arquitectura de Computadores y AutomáticaFac. de InformáticaTRUEunpu

    apeNEXT: A Multi-Tflops LQCD Computing Project

    Get PDF
    This paper is a slightly modified and reduced version of the proposal of the {\bf apeNEXT} project, which was submitted to DESY and INFN in spring 2000. .It presents the basic motivations and ideas of a next generation lattice QCD (LQCD) computing project, whose goal is the construction and operation of several large scale Multi-TFlops LQCD engines, providing an integrated peak performance of tens of TFlops, and a sustained (double precision) performance on key LQCD kernels of about 50% of peak speed

    Élaboration d'un modèle d'abstraction des communications point-à-point pour une plateforme (SOC) multiprocesseur hétérogène

    Get PDF
    Les systèmes embarqués modernes -- Problématique -- Les communications dans un système-sur-puce -- Les modèles de communication pour MPSoC -- Les architectures de communication -- Abstraction des communications à haut niveau -- Génération des interfaces logiciel-matériel -- Une plateforme virtuelle hétérogène et extensible pour SPACE -- La plateforme CoreConnect d'IBM implémentée par Xilinx -- Le PowerPC405FX -- Intégration de l'ISS du PowerPC à SpaceLib -- DirectLink : Abstraction des communications point-à -point dans la plateforme virtuelle SPACE -- Paradigme du DirectLink -- Méthodologie -- Spécification des interfaces -- Connections module/module HW-HW -- Connexions module/module HW/SW ou SW/HW -- Connexions module/module SW/SW -- Design des composants SpaceLib -- Implications au niveau de la pile logicielle -- Abstraction du DirectLink dans SPACE -- Analyse, performances et discussion -- Validation du paradigme DirectLink -- Technique d'analyse des performances -- Performances du DirectLink -- Impact sur l'utilisation des ressources matérielles -- Accélération d'une application dans SPACE avec le DirectLink -- Extensibilité du paradigme à d'autres plateformes -- Comparaison avec d'autres travaux -- Améliorations suggérées à l'architecture de communication SPACE

    FPGA Implementation for Real-Time Background Subtraction Based on Horprasert Model

    Get PDF
    Background subtraction is considered the first processing stage in video surveillance systems, and consists of determining objects in movement in a scene captured by a static camera. It is an intensive task with a high computational cost. This work proposes an embedded novel architecture on FPGA which is able to extract the background on resource-limited environments and offers low degradation (produced because of the hardware-friendly model modification). In addition, the original model is extended in order to detect shadows and improve the quality of the segmentation of the moving objects. We have analyzed the resource consumption and performance in Spartan3 Xilinx FPGAs and compared to others works available on the literature, showing that the current architecture is a good trade-off in terms of accuracy, performance and resources utilization. With less than a 65% of the resources utilization of a XC3SD3400 Spartan-3A low-cost family FPGA, the system achieves a frequency of 66.5 MHz reaching 32.8 fps with resolution 1,024 × 1,024 pixels, and an estimated power consumption of 5.76 W
    corecore