17 research outputs found

    MPSoCBench : um framework para avaliação de ferramentas e metodologias para sistemas multiprocessados em chip

    Get PDF
    Orientador: Rodolfo Jardim de AzevedoTese (doutorado) - Universidade Estadual de Campinas, Instituto de ComputaçãoResumo: Recentes metodologias e ferramentas de projetos de sistemas multiprocessados em chip (MPSoC) aumentam a produtividade por meio da utilização de plataformas baseadas em simuladores, antes de definir os últimos detalhes da arquitetura. No entanto, a simulação só é eficiente quando utiliza ferramentas de modelagem que suportem a descrição do comportamento do sistema em um elevado nível de abstração. A escassez de plataformas virtuais de MPSoCs que integrem hardware e software escaláveis nos motivou a desenvolver o MPSoCBench, que consiste de um conjunto escalável de MPSoCs incluindo quatro modelos de processadores (PowerPC, MIPS, SPARC e ARM), organizado em plataformas com 1, 2, 4, 8, 16, 32 e 64 núcleos, cross-compiladores, IPs, interconexões, 17 aplicações paralelas e estimativa de consumo de energia para os principais componentes (processadores, roteadores, memória principal e caches). Uma importante demanda em projetos MPSoC é atender às restrições de consumo de energia o mais cedo possível. Considerando que o desempenho do processador está diretamente relacionado ao consumo, há um crescente interesse em explorar o trade-off entre consumo de energia e desempenho, tendo em conta o domínio da aplicação alvo. Técnicas de escalabilidade dinâmica de freqüência e voltagem fundamentam-se em gerenciar o nível de tensão e frequência da CPU, permitindo que o sistema alcance apenas o desempenho suficiente para processar a carga de trabalho, reduzindo, consequentemente, o consumo de energia. Para explorar a eficiência energética e desempenho, foram adicionados recursos ao MPSoCBench, visando explorar escalabilidade dinâmica de voltaegem e frequência (DVFS) e foram validados três mecanismos com base na estimativa dinâmica de energia e taxa de uso de CPUAbstract: Recent design methodologies and tools aim at enhancing the design productivity by providing a software development platform before the definition of the final Multiprocessor System on Chip (MPSoC) architecture details. However, simulation can only be efficiently performed when using a modeling and simulation engine that supports system behavior description at a high abstraction level. The lack of MPSoC virtual platform prototyping integrating both scalable hardware and software in order to create and evaluate new methodologies and tools motivated us to develop the MPSoCBench, a scalable set of MPSoCs including four different ISAs (PowerPC, MIPS, SPARC, and ARM) organized in platforms with 1, 2, 4, 8, 16, 32, and 64 cores, cross-compilers, IPs, interconnections, 17 parallel version of software from well-known benchmarks, and power consumption estimation for main components (processors, routers, memory, and caches). An important demand in MPSoC designs is the addressing of energy consumption constraints as early as possible. Whereas processor performance comes with a high power cost, there is an increasing interest in exploring the trade-off between power and performance, taking into account the target application domain. Dynamic Voltage and Frequency Scaling techniques adaptively scale the voltage and frequency levels of the CPU allowing it to reach just enough performance to process the system workload while meeting throughput constraints, and thereby, reducing the energy consumption. To explore this wide design space for energy efficiency and performance, both for hardware and software components, we provided MPSoCBench features to explore dynamic voltage and frequency scalability (DVFS) and evaluated three mechanisms based on energy estimation and CPU usage rateDoutoradoCiência da ComputaçãoDoutora em Ciência da Computaçã

    Uma ferramenta para modelagem e simulação de computação aproximada em hardware

    Get PDF
    Orientador: Lucas Francisco WannerDissertação (mestrado) - Universidade Estadual de Campinas, Instituto de ComputaçãoResumo: Pesquisas recentes têm introduzido unidades de hardware que produzem resultados incorretos de maneira determinística ou probabilística para um pequeno conjunto de entradas. Por outro lado, permitem um maior desempenho ou um consumo de energia significativamente menor em comparação com versões precisas das mesmas unidades. Como integrar, validar e avaliar essas alternativas em uma arquitetura ou processador, porém, permanece um desafio. A falta de ferramentas para representar e avaliar hardware aproximado leva desenvolvedores a verificar suas soluções de maneira independente, sem considerar interações com outros componentes, exigindo um grande esforço em modelagem e simulação. Neste trabalho, introduzimos ADeLe, uma linguagem de alto nível para descrever, configurar e integrar unidades de hardware aproximado em um processador. ADeLe reduz o esforço de desenvolvimento de hardware aproximado por modelar aproximações em um alto nível de abstração e injetá-las automaticamente em um modelo de processador para simulação arquitetural. Na ferramenta relacionada a ADeLe, aproximações podem modificar ou substituir completamente o comportamento de instruções de hardware através de políticas definidas pelo usuário. As instruções podem ser modificadas deterministicamente ou probabilisticamente (por exemplo, baseado em tensão de operação e frequência). Para proporcionar um ambiente de teste controlado, as aproximações podem ser ligadas e desligadas a partir do software em execução. O consumo de energia é automaticamente computado com base em modelos customizáveis no sistema. Assim, a ferramenta proporciona um método de verificação genérico e flexível, permitindo uma fácil avaliação da troca entre energia e qualidade de aplicações sujeitadas a hardware aproximado. Demonstramos a ferramenta pela introdução de variadas técnicas de aproximação em um modelo de processador, com o qual aplicações selecionadas foram executadas. Ao modelar módulos de hardware aproximado dedicados, mostramos como ADeLe representa unidades aritméticas aproximadas e unidades funcionais de precisão reduzida executando 4 aplicações de processamento de imagens e 2 de computação de ponto flutuante. Com outro método de aproximação, também mostramos como a ferramenta é utilizada para estudar o impacto de memórias alimentadas por tensão ajustável sobre 9 aplicações. Nossos experimentos demonstram as capacidades da ferramenta e como ela pode ser utilizada para gerar processadores virtuais aproximados e avaliar o equilíbrio entre energia e qualidade para diferentes aplicações com esforço reduzidoAbstract: Recent research has introduced approximate hardware units that produce incorrect outputs deterministically or probabilistically for some small subset of inputs. On the other hand, they allow significantly higher throughput or lower power than their error-free counterparts. The integration, validation, and evaluation of these approximate units in architectures and processors, however, remains challenging. The lack of tools to represent and evaluate approximate hardware leads designers to verify their solutions independently, not considering interactions with other components, demanding high-effort modeling and simulation. In this work, we introduce ADeLe, a high-level language for the description, configuration, and integration of approximate hardware units into processors. ADeLe reduces the design effort for approximate hardware by modeling approximations at a high level of abstraction and automatically injecting them into a processor model for architectural simulation. In the ADeLe framework, approximations may modify or completely replace the functional behavior of instructions according to user-defined policies. Instructions may be approximated deterministically or probabilistically (e.g., based on operating voltage and frequency). To allow for controlled testing, approximations may be enabled and disabled from software. Energy is automatically accounted for based on customizable models that consider the potential power savings of the approximations that are enabled in the system. Thus, the framework provides a generic and flexible verification method, allowing for easy evaluation of the energy-quality trade-off of applications subjected to approximate hardware. We demonstrate the framework by introducing different approximation techniques into a processor model, on top of which we run selected applications. Modeling dedicated hardware modules, we show how ADeLe can represent approximate arithmetic and reduced precision computation units executing 4 image processing and 2 floating point applications. Using a different method of approximation, we also show how the framework is used to study the impact of voltage-overscaled memories over 9 applications. Our experiments show the framework capabilities and how it may be used to generate approximate virtual CPUs and to evaluate energy-quality trade-offs for different applications with reduced effortMestradoCiência da ComputaçãoMestre em Ciência da Computação2017/08015-8  FAPES

    OpenISA, um conjunto de instruções híbrido

    Get PDF
    Orientador: Edson BorinTese (doutorado) - Universidade Estadual de Campinas, Instituto de ComputaçãoResumo: OpenISA é concebido como a interface de processadores que pretendem ser altamente flexíveis. Isto é conseguido por meio de três estratégias: em primeiro lugar, o ISA é empiricamente escolhido para ser facilmente traduzido para outros, possibilitando flexibilidade do software no caso de um processador OpenISA físico não estar disponível. Neste caso, não há nenhuma necessidade de aplicar um processador virtual OpenISA em software. O ISA está preparado para ser estaticamente traduzido para outros ISAs. Segundo, o ISA não é um ISA concreto nem um ISA virtual, mas um híbrido com a capacidade de admitir modificações nos opcodes sem afetar a compatibilidade retroativa. Este mecanismo permite que as futuras versões do ISA possam sofrer modificações em vez de extensões simples das versões anteriores, um problema comum com ISA concretos, como o x86. Em terceiro lugar, a utilização de uma licença permissiva permite o ISA ser usado livremente por qualquer parte interessada no projeto. Nesta tese de doutorado, concentramo-nos nas instruções de nível de usuário do OpenISA. A tese discute (1) alternativas para ISAs, alternativas para distribuição de programas e o impacto de cada opção, (2) características importantes de OpenISA para atingir seus objetivos e (3) fornece uma completa avaliação do ISA escolhido com respeito a emulação de desempenho em duas CPUs populares, uma projetada pela Intel e outra pela ARM. Concluímos que a versão do OpenISA apresentada aqui pode preservar desempenho próximo do nativo quando traduzida para outros hospedeiros, funcionando como um modelo promissor para ISAs flexíveis da próxima geração que podem ser facilmente estendidos preservando a compatibilidade. Ainda, também mostramos como isso pode ser usado como um formato de distribuição de programas no nível de usuárioAbstract: OpenISA is designed as the interface of processors that aim to be highly flexible. This is achieved by means of three strategies: first, the ISA is empirically chosen to be easily translated to others, providing software flexibility in case a physical OpenISA processor is not available. Second, the ISA is not a concrete ISA nor a virtual ISA, but a hybrid one with the capability of admitting modifications to opcodes without impacting backwards compatibility. This mechanism allows future versions of the ISA to have real changes instead of simple extensions of previous versions, a common problem with concrete ISAs such as the x86. Third, the use of a permissive license allows the ISA to be freely used by any party interested in the project. In this PhD. thesis, we focus on the user-level instructions of OpenISA. The thesis discusses (1) ISA alternatives, program distribution alternatives and the impact of each choice, (2) important features of OpenISA to achieve its goals and (3) provides a thorough evaluation of the chosen ISA with respect to emulation performance on two popular host CPUs, one from Intel and another from ARM. We conclude that the version of OpenISA presented here can preserve close-to-native performance when translated to other hosts, working as a promising model for next-generation, flexible ISAs that can be easily extended while preserving backwards compatibility. Furthermore, we show how this can also be a program distribution format at user-levelDoutoradoCiência da ComputaçãoDoutor em Ciência da Computação2011/09630-1FAPES

    A Benchmarking Platform For Network-On-Chip (NOC) Multiprocessor System-On- Chips

    Get PDF
    Network-on-Chip (NOC) based designs have garnered significant attention from both researchers and industry over the past several years. The analysis of these designs has focused on broad topics such as NOC component micro-architecture, fault-tolerant communication, and system memory architecture. Nonetheless, the design of lowlatency, high-bandwidth, low-power and area-efficient NOC is extremely complex due to the conflicting nature of these design objectives. Benchmarks are an indispensable tool in the design process; providing thorough measurement and fair comparison between designs in order to achieve optimal results (i.e performance, cost, quality of service). This research proposes a benchmarking platform called NoCBench for evaluating the performance of Network-on-chip. Although previous research has proposed standard guidelines to develop benchmarks for Network-on-Chip, this work moves forward and proposes a System-C based simulation platform for system-level design exploration. It will provide an initial set of synthetic benchmarks for on-chip network interconnection validation along with an initial set of standardized processing cores, NOC components, and system-wide services. The benchmarks were constructed using synthetic applications described by Task Graphs For Free (TGFF) task graphs extracted from the E3S benchmark suite. Two benchmarks were used for characterization: Consumer and Networking. They are characterized based on throughput and latency. Case studies show how they can be used to evaluate metrics beyond throughput and latency (i.e. traffic distribution). The contribution of this work is two-fold: 1) This study provides a methodology for benchmark creation and characterization using NoCBench that evaluates important metrics in NOC design (i.e. end-to-end packet delay, throughput). 2) The developed full-system simulation platform provides a complete environment for further benchmark characterization on NOC based MpSoC as well as system-level design space exploration

    Instruction-set architecture synthesis for VLIW processors

    Get PDF

    Reliable Software for Unreliable Hardware - A Cross-Layer Approach

    Get PDF
    A novel cross-layer reliability analysis, modeling, and optimization approach is proposed in this thesis that leverages multiple layers in the system design abstraction (i.e. hardware, compiler, system software, and application program) to exploit the available reliability enhancing potential at each system layer and to exchange this information across multiple system layers

    GPU-based Architecture Modeling and Instruction Set Extension for Signal Processing Applications

    Get PDF
    The modeling of embedded systems attempts to estimate the performance and costs prior to the implementation. The early stage predictions for performance and power dissipation reduces the more costly late stage design modifications. Workload modeling is an approach where an abstract application is evaluated against an abstract architecture. The challenge in modeling is the balance between fidelity and simplicity, where fidelity refers to the correctness of the predictions and the simplicity relates to the simulation time of the model and its ease of comprehension for the developer. A model named GSLA for performance and power modeling is presented, which extends existing architecture modeling by including GPUs as parallel processing elements. The performance model showed an average fidelity of 93% and the power model demonstrated an average fidelity of 84% between the models and several application measurements. The GSLA model is very simple: only 2 parameters that can be obtained by automated scripts. Besides the modeling, this thesis addresses lower level signal processing system improvements by proposing Instruction Set Architecture (ISA) extensions for RISC-V processors. A vehicle classifier neural network model was used as a case study, in which the benefit of Bit Manipulation Instructions (BMI) is shown. The result is a new PopCount instruction extension that is verified in ETISS simulator. The PopCount extension of RISC-V ISA showed a performance improvement of more than double for the vehicle classifier application. In addition, the design flow for adding a new instruction extension for a re-configurable platform is presented. The GPU modeling and the RISC-V ISA extension added new features to the state of the art. They improve the modeling features as well as reduce the execution costs in signal processing platforms

    Fault-tolerant satellite computing with modern semiconductors

    Get PDF
    Miniaturized satellites enable a variety space missions which were in the past infeasible, impractical or uneconomical with traditionally-designed heavier spacecraft. Especially CubeSats can be launched and manufactured rapidly at low cost from commercial components, even in academic environments. However, due to their low reliability and brief lifetime, they are usually not considered suitable for life- and safety-critical services, complex multi-phased solar-system-exploration missions, and missions with a longer duration. Commercial electronics are key to satellite miniaturization, but also responsible for their low reliability: Until 2019, there existed no reliable or fault-tolerant computer architectures suitable for very small satellites. To overcome this deficit, a novel on-board-computer architecture is described in this thesis.Robustness is assured without resorting to radiation hardening, but through software measures implemented within a robust-by-design multiprocessor-system-on-chip. This fault-tolerant architecture is component-wise simple and can dynamically adapt to changing performance requirements throughout a mission. It can support graceful aging by exploiting FPGA-reconfiguration and mixed-criticality.  Experimentally, we achieve 1.94W power consumption at 300Mhz with a Xilinx Kintex Ultrascale+ proof-of-concept, which is well within the powerbudget range of current 2U CubeSats. To our knowledge, this is the first COTS-based, reproducible on-board-computer architecture that can offer strong fault coverage even for small CubeSats.European Space AgencyComputer Systems, Imagery and Medi

    Seleção de padrões de codigo para sintese de datapaths especializados

    Get PDF
    Orientadores: Rodolfo Jardim de Azevedo, Guido Costa Souza de AraujoDissertação (mestrado) - Universidade Estadual de Campinas, Instituto de Matematica, Estatistica e Computação CientificaMestradoMestre em Matemátic

    Caractérisation automatisée de la consommation de puissance des processeurs pour l'estimation au niveau système

    Get PDF
    RÉSUMÉ De nos jours, la consommation de puissance est une contrainte clé et une métrique de performance essentielle lors du design des systèmes numériques. La dissipation de chaleur excessive sur les circuits intégrés diminue relativement leurs performances. Également, plus que jamais, nous avons le besoin d’augmenter le temps de vie des batteries de nouvelles électroniques portables. Avec les techniques de design classiques, RTL « Register Transfer Level », une estimation de puissance précise est possible seulement aux dernières étapes du processus de développement. Pour remédier à cette problématique, on a récemment proposé dans la littérature de hausser le niveau d’abstraction de la conception de systèmes embarqués à l’aide de la méthodologie de niveau système « Electronic System Level » (ESL). Dans cette perspective, ce travail propose une méthodologie capable de caractériser automatiquement la consommation de puissance des processeurs configurable de type « soft-processors » et de générer un modèle efficace pour l’estimation de l’énergie consommée au niveau système. À l'aide de ce modèle, une étude comparative entre trois techniques d’estimation est donc présentée. Les résultats de cinq programmes tests montrent une estimation de puissance huit mille fois plus rapide que les techniques d’estimation conventionnelles et une erreur moyenne de seulement ±3.98 % pour le processeur LEON3 et de ±10.70 % pour le processeur Microblaze.----------ABSTRACT Nowadays, power consumption is a key constraint and a digital system design essential metric of performance. Excessive heat dissipation of integrated circuits relatively decreases the performance of the system. Also, more than ever, we need to increase the battery lifetime of new portable electronics. With classical design techniques as RTL « Register Transfer Level », precise power estimation is only possible in the final stages of the development process. To solve this problem, the literature recently proposed to raise the abstraction level of embedded systems design, using ESL « Electronic System Level » methodology. In this context, this project proposes a methodology to automatically characterize configurable soft-processors power consumption and generate an effective power model for energy consumption estimation at system level. Using this model, a comparative study between three estimation techniques is also presented. The results of five benchmarks show that our power estimation is eight thousand times faster than conventional estimation techniques and an average error of only ±3.98 % for the LEON3 processor and ±10.70 % for the Microblaze processor
    corecore