1,128 research outputs found

    Automatic generation and testing of application specific hardware accelerators on a new reconfigurable OpenSPARC platform

    Get PDF
    Specific hardware customization for scientific applications has shown a big potential to address the current holy grail in computer architecture: reducing power consumption while increasing performance. In particular, the automatic generation of domain-specific accelerators for General- Purpose Processors (GPPs) is an active field of research to the point that different leading hardware design companies (e.g. Intel, ARM) are announcing commercial platforms that integrate GPPs and FPGAs. In this paper we present a new framework with a holistic approach that addresses the challenge of design exploration of specific application accelerators. Our work focuses on a target platform consisting of a GPP with a reconfigurable functional unit. The framework includes a reconfigurable 1-core 1-thread OpenSPARC with a new programmable specific purpose unit (SPU) inside the OpenSPARC core. In order to program the SPU we have developed an automatic toolchain that profiles an application and discovers its main computing bottlenecks. With that information our toolchain is able to both design hardware specific accelerators that can be automatically mapped in the aforementioned SPU, and generate the binary code necessary to run the application using those accelerators. The OpenSPARC with the new specific application accelerators, defined in a Hardware Description Language, can then be executed and measured. Still awaiting further development, nowadays our framework is a proof-of-concept that shows that this kind of systems can be developed and programmed as easily as a GPP. In a near future it would be the source of very interesting information about the capabilities and drawbacks of those mixed GPP-FPGA systems.Postprint (published version

    Metodologí­a para la generación y evaluación automática de hardware específico

    Get PDF
    En el área de la bioinformática podemos encontrar aplicaciones que suponen un reto para el diseño de nuevas arquitecturas de procesadores en términos de rendimiento, ya que sus características difieren de las de las aplicaciones de propósito general. Por ello proponemos una nueva arquitectura con unidades funcionales reconfigurables para un dominio específico de aplicaciones. Así, el primer paso para definir la nueva arquitectura será la creación de la nueva ISA del procesador, que se compondrá de extensiones de la ISA original. Para conseguir dicho objetivo, presentamos una metodología para identificar automáticamente patrones de instrucciones y generar prototipos de las unidades funcionales que las ejecutan. Hemos implementado la metodología de manera experimental con el soporte de la infraestructura Trimaran para la identificación de extensiones de la ISA, la herramienta DWARV para la generación de código VHDL, y la plataforma MOLEN para la evaluación de los prototipos hardware específicos generados automáticamente. En las evaluaciones iniciales de los prototipos generados para una aplicación de estudio, ClustalW, se ha obtenido hasta un 8.54x de speed-up para un único acelerador, mientras que el speed-up de toda la aplicación está por encima de 2x.Postprint (published version

    MPSoCBench : um framework para avaliação de ferramentas e metodologias para sistemas multiprocessados em chip

    Get PDF
    Orientador: Rodolfo Jardim de AzevedoTese (doutorado) - Universidade Estadual de Campinas, Instituto de ComputaçãoResumo: Recentes metodologias e ferramentas de projetos de sistemas multiprocessados em chip (MPSoC) aumentam a produtividade por meio da utilização de plataformas baseadas em simuladores, antes de definir os últimos detalhes da arquitetura. No entanto, a simulação só é eficiente quando utiliza ferramentas de modelagem que suportem a descrição do comportamento do sistema em um elevado nível de abstração. A escassez de plataformas virtuais de MPSoCs que integrem hardware e software escaláveis nos motivou a desenvolver o MPSoCBench, que consiste de um conjunto escalável de MPSoCs incluindo quatro modelos de processadores (PowerPC, MIPS, SPARC e ARM), organizado em plataformas com 1, 2, 4, 8, 16, 32 e 64 núcleos, cross-compiladores, IPs, interconexões, 17 aplicações paralelas e estimativa de consumo de energia para os principais componentes (processadores, roteadores, memória principal e caches). Uma importante demanda em projetos MPSoC é atender às restrições de consumo de energia o mais cedo possível. Considerando que o desempenho do processador está diretamente relacionado ao consumo, há um crescente interesse em explorar o trade-off entre consumo de energia e desempenho, tendo em conta o domínio da aplicação alvo. Técnicas de escalabilidade dinâmica de freqüência e voltagem fundamentam-se em gerenciar o nível de tensão e frequência da CPU, permitindo que o sistema alcance apenas o desempenho suficiente para processar a carga de trabalho, reduzindo, consequentemente, o consumo de energia. Para explorar a eficiência energética e desempenho, foram adicionados recursos ao MPSoCBench, visando explorar escalabilidade dinâmica de voltaegem e frequência (DVFS) e foram validados três mecanismos com base na estimativa dinâmica de energia e taxa de uso de CPUAbstract: Recent design methodologies and tools aim at enhancing the design productivity by providing a software development platform before the definition of the final Multiprocessor System on Chip (MPSoC) architecture details. However, simulation can only be efficiently performed when using a modeling and simulation engine that supports system behavior description at a high abstraction level. The lack of MPSoC virtual platform prototyping integrating both scalable hardware and software in order to create and evaluate new methodologies and tools motivated us to develop the MPSoCBench, a scalable set of MPSoCs including four different ISAs (PowerPC, MIPS, SPARC, and ARM) organized in platforms with 1, 2, 4, 8, 16, 32, and 64 cores, cross-compilers, IPs, interconnections, 17 parallel version of software from well-known benchmarks, and power consumption estimation for main components (processors, routers, memory, and caches). An important demand in MPSoC designs is the addressing of energy consumption constraints as early as possible. Whereas processor performance comes with a high power cost, there is an increasing interest in exploring the trade-off between power and performance, taking into account the target application domain. Dynamic Voltage and Frequency Scaling techniques adaptively scale the voltage and frequency levels of the CPU allowing it to reach just enough performance to process the system workload while meeting throughput constraints, and thereby, reducing the energy consumption. To explore this wide design space for energy efficiency and performance, both for hardware and software components, we provided MPSoCBench features to explore dynamic voltage and frequency scalability (DVFS) and evaluated three mechanisms based on energy estimation and CPU usage rateDoutoradoCiência da ComputaçãoDoutora em Ciência da Computaçã

    Compiler-Driven Leakage Energy Reduction

    Get PDF
    Abstract. Tomorrow's embedded devices need to run high-resolution multimedia applications which need an enormous computational complexity with a very low energy consumption constraint. In this context, the register file is one of the key sources of power consumption and its inappropriate design and management can severely affect the performance of the system. In this paper, we present a new approach to reduce the energy of the shared register file in upcoming embedded VLIW architectures with several processing units. Energy savings up to a 60% can be obtained in the register file without any performance penalty. It is based on a set of hardware extensions and a compiler-based energy-aware register assignment algorithm that enable the de/activation of parts of the register file (i.e. sub-banks) in an independent way at run-time, which can be easily included in these embedded architectures

    Compiler-Driven Leakage Energy Reduction in Banked Register Files

    Get PDF
    Tomorrow’s embedded devices need to run high-resolution multimedia applications which need an enormous computational complexity with a very low energy consumption constraint. In this context, the register file is one of the key sources of power consumption and its inappropriate design and management can severely affect the performance of the system. In this paper, we present a new approach to reduce the energy of the shared register file in upcoming embedded VLIW architectures with several processing units. Energy savings up to a 60% can be obtained in the register file without any performance penalty. It is based on a set of hardware extensions and a compiler-based energy-aware reg- ister assignment algorithm that enable the de/activation of parts of the register file (i.e. sub-banks) in an independent way at run-time, which can be easily included in these embedded architectures

    Tracing Hardware Monitors in the GR712RC Multicore Platform: Challenges and Lessons Learnt from a Space Case Study

    Get PDF
    The demand for increased computing performance is driving industry in critical-embedded systems (CES) domains, e.g. space, towards the use of multicores processors. Multicores, however, pose several challenges that must be addressed before their safe adoption in critical embedded domains. One of the prominent challenges is software timing analysis, a fundamental step in the verification and validation process. Monitoring and profiling solutions, traditionally used for debugging and optimization, are increasingly exploited for software timing in multicores. In particular, hardware event monitors related to requests to shared hardware resources are building block to assess and restraining multicore interference. Modern timing analysis techniques build on event monitors to track and control the contention tasks can generate each other in a multicore platform. In this paper we look into the hardware profiling problem from an industrial perspective and address both methodological and practical problems when monitoring a multicore application. We assess pros and cons of several profiling and tracing solutions, showing that several aspects need to be taken into account while considering the appropriate mechanism to collect and extract the profiling information from a multicore COTS platform. We address the profiling problem on a representative COTS platform for the aerospace domain to find that the availability of directly-accessible hardware counters is not a given, and it may be necessary to the develop specific tools that capture the needs of both the user’s and the timing analysis technique requirements. We report challenges in developing an event monitor tracing tool that works for bare-metal and RTEMS configurations and show the accuracy of the developed tool-set in profiling a real aerospace application. We also show how the profiling tools can be exploited, together with handcrafted benchmarks, to characterize the application behavior in terms of multicore timing interference.This work has been partially supported by a collaboration agreement between Thales Research and the Barcelona Supercomputing Center, and the European Research Council (ERC) under the EU’s Horizon 2020 research and innovation programme (grant agreement No. 772773). MINECO partially supported Jaume Abella under Ramon y Cajal postdoctoral fellowship (RYC2013-14717).Peer ReviewedPostprint (published version

    Energy-Aware Compilation and Hardware Design for VLIW Embedded Systems

    Get PDF
    Tomorrow's embedded devices need to run multimedia applications demanding high computational power with low energy consumption constraints. In this context, the register file is a key source of power consumption and its inappropriate design and management severely affects system power. In this paper, we present a new approach to reduce the energy of shared register files in forthcoming embedded VLIW processors running real-life applications up to 60% without performance penalty. This approach relies on limited hardware extensions and a compiler-based energy-aware register assignment algorithm to deactivate at run-time parts of the register file (i.e., sub-banks) in an independent way
    corecore