
88 research outputs found

Verificação de consistência e coerência de memória compartilhada para multiprocessamento em chip [Verification of shared-memory consistency and coherence for chip multiprocessing]

Author: Henschel Olav Philipp
Publication date: 01/01/2014

Master's dissertation - Universidade Federal de Santa Catarina, Centro Tecnológico, Programa de Pós-Graduação em Ciência da Computação, Florianópolis, 2014.
Abstract: Under the growing demand for performance, chip multiprocessing leads to a growing number of processing cores, which interact through a complex shared-memory hierarchy that must satisfy coherence and consistency requirements, captured as a memory model at the hardware-software interface. Given an execution of a parallel program, verifying whether the hierarchy complies with those requirements is an intractable problem when system observability is limited to one memory trace per processor, as in dynamic post-silicon checkers. Those checkers (based on inferences over traces) require backtracking to avoid false negatives. Pre-silicon checkers, on the other hand, can exploit the unlimited observability of design representations to induce a verification problem that can be solved in polynomial time (without backtracking) and with full verification guarantees (i.e. neither false negatives nor false positives). This dissertation provides a comparative experimental evaluation of dynamic checkers based on different mechanisms (inferences, bipartite graph matching, a single scoreboard, and multiple scoreboards). The checkers are compared on exactly the same set of test cases: 200 non-synchronized parallel programs, generated pseudo-randomly by varying the instruction mix (4 mixes), the number of shared addresses (between 2 and 32), and the total number of memory operations (between 250 and 64K). From a single pre-validated system representation, eight derived representations were built, each containing a distinct design error.
To reproduce conditions compatible with architectural trends, the checkers were compared while verifying a memory model with maximal relaxation of program order (quite similar to those used, for example, in the Alpha and ARMv7 architectures) on systems containing 2 to 32 processing cores. To the best of the author's knowledge, no experimental evaluation this broad is available in the literature. The results show that inference-based checkers are impractical at design time: they require the highest computational effort and show the highest growth rate as the number of processors increases. The evaluation indicates that the most efficient way of building a pre-silicon checker combines three monitoring points per processor, on-the-fly verification (instead of post-mortem analysis), and multiple engines that check, separately and in parallel, the verification subspaces defined by the scope of each processor, while the inter-processor subspaces are checked globally. As a spin-off of the experimental evaluation, the dissertation identifies a deficiency common to all analyzed checkers: they are unsuitable for memory models with weak write atomicity, which are precisely those identified as the trend and already present in recent architectures (e.g. ARMv8). In view of this, the dissertation proposes generalized algorithms capable of verifying such models.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Repositório Institucional da UFSC

RCAAP - Repositório Científico de Acesso Aberto de Portugal

Verificação de consistência de memória para sistemas integrados multiprocessados [Memory consistency verification for multiprocessor integrated systems]

Author: Rambo Eberle Andrey
Publication venue: Florianópolis, SC
Publication date: 01/01/2011

Master's dissertation - Universidade Federal de Santa Catarina, Centro Tecnológico, Programa de Pós-Graduação em Ciência da Computação.
Abstract: Chip multiprocessing (CMP) changed the architectural landscape of servers and personal computers and is now changing the way personal mobile devices are designed. CMP requires access to shared variables in sophisticated multilevel hierarchies where private and shared caches coexist. It relies on hardware support to implicitly manage relaxed program order and write atomicity so as to provide, at the hardware-software interface, a well-defined shared-memory semantics, which is captured by the axioms of a memory consistency model (MCM). This dissertation addresses the problem of checking whether an executable representation of the memory subsystem implements a specified MCM. Conventional verification techniques encode the axioms as edges of a single directed graph, infer extra edges from memory traces, and report an error when a cycle is detected. Taking a different approach, this dissertation proposes a novel technique that decomposes the verification problem into multiple instances of an extended bipartite graph matching problem. Since the decomposition was judiciously designed to induce independent instances, the target problem can be solved by a parallel verification algorithm. To stimulate the memory system under verification, the dissertation also proposes a generator of random instruction sequences distributed over multiple threads. The generator is independent of the MCM under verification and can therefore be used by most checkers. The proposed technique, which is proven to be complete for several MCMs, outperformed a conventional checker on a suite of 2400 randomly generated use cases. On average, it found a higher percentage of faults (90%) than the conventional checker (69%) and was 272 times faster.
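The conventional approach mentioned in this abstract (ordering constraints encoded as edges of one directed graph, with a cycle signalling a violation) can be illustrated with a minimal sketch. This is not the dissertation's checker: the function name `has_cycle`, the event labels, and the hand-written edge lists are assumptions made for illustration, and a real checker would infer the edges from memory traces.

```python
from collections import defaultdict

def has_cycle(edges):
    """Return True if the directed graph given as (src, dst) pairs has a cycle."""
    graph = defaultdict(list)
    nodes = set()
    for src, dst in edges:
        graph[src].append(dst)
        nodes.update((src, dst))

    WHITE, GREY, BLACK = 0, 1, 2          # unvisited, on current path, finished
    color = {n: WHITE for n in nodes}

    for root in nodes:
        if color[root] != WHITE:
            continue
        color[root] = GREY
        stack = [(root, iter(graph[root]))]  # iterative depth-first search
        while stack:
            node, successors = stack[-1]
            for nxt in successors:
                if color[nxt] == GREY:       # back edge: a cycle exists
                    return True
                if color[nxt] == WHITE:
                    color[nxt] = GREY
                    stack.append((nxt, iter(graph[nxt])))
                    break
            else:                            # all successors explored
                color[node] = BLACK
                stack.pop()
    return False

# Hypothetical message-passing test: P0 writes x then y; P1 reads y == 1, then x == 0.
program_order = [("W x=1", "W y=1"), ("R y=1", "R x=0")]
observed = [("W y=1", "R y=1"),   # P1 read the value written to y
            ("R x=0", "W x=1")]   # P1 read x before the write to x took effect

# If the model preserves program order, the constraints are contradictory.
print(has_cycle(program_order + observed))
# If program order is fully relaxed, those edges are dropped and no cycle remains.
print(has_cycle(observed))
```

The two calls show why the memory model matters: the same observed outcome is an error under a model that keeps program order and is legal once those edges are relaxed.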

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Repositório Institucional da UFSC

RCAAP - Repositório Científico de Acesso Aberto de Portugal

Exploiting canonical dependence chains and address biasing constraints to improve random test generation for shared-memory verification

Author: Andrade Gabriel Arthur Gerber
Publication date: 01/01/2017

Master's dissertation - Universidade Federal de Santa Catarina, Centro Tecnológico, Programa de Pós-Graduação em Ciência da Computação, Florianópolis, 2017.
Introduction: The functional verification of a chip multiprocessing (CMP) design is becoming increasingly challenging because of the growing complexity of supporting the coherent shared-memory abstraction, which is likely to keep its crucial role in chip multiprocessing even at the scale of hundreds of processors. Functional verification relies mainly on the generation of random test programs.
Related work and proposed generator: Although general functional processor verification frameworks that rely on solving constraint satisfaction problems offer a unified approach for generating random stimuli to verify the whole system, they are not designed to exploit non-determinism, which is an important mechanism for exposing shared-memory errors. This dissertation reports new techniques that build upon lessons learned from both the general verification frameworks and the approaches that specifically target memory-model verification. They exploit address biasing constraints and canonical dependence chains to improve random test generation while keeping the crucial role of non-determinism as a key mechanism for error exposure.
Sequence generation: Instead of selecting instructions at random, as a conventional technique does, the proposed generator selects instructions according to predefined dependence chains that are provably significant for preserving the memory model under verification. The dissertation exploits the canonical chains defined by Gharachorloo to avoid inducing instructions that, being unnecessary for preserving the memory model under verification, would result in ineffective tests.
Address assignment: Instead of randomly selecting binary patterns to serve as effective memory addresses, as a conventional generator does, the proposed generator accepts constraints on how those addresses are formed, so as to force the alignment of objects in memory, avoid false sharing between variables, and specify the degree to which addresses compete for the same cache line.
Experimental evaluation: A new generator, built with the proposed techniques, was compared to a conventional random test generator. Both were evaluated for 8-, 16-, and 32-core architectures by synthesizing 1200 distinct test programs to verify 5 derivative designs, each containing a different type of error (6000 use cases per architecture). The synthesized tests explored a wide variety of generation parameters (5 program sizes, 4 shared-location counts, 4 instruction mixes, and 15 random seeds). The experimental results show that, compared to the conventional generator, the new one tends to expose errors for a larger number of parameter settings: it increased the potential for exposing design errors by 38%. Analysis of the verification outcomes over the full parameter ranges showed that the generators require quite distinct numbers of shared locations to reach their best exposure. They were therefore compared when each generator used the location count leading to its best exposure. Under those conditions, when targeting 32-core designs over the whole range of test lengths, the new generator exposed one type of error as often as the conventional technique, two types 11% more often, one type twice as often, and one type 4 times as often. With the longest tests (64000 operations), both generators were able to expose all types of errors, but the new generator required 1.5 to 15 times less effort to expose each error, except for one (for which a degradation of 19% was observed).
Conclusions: Based on this evaluation, when a sufficiently large number of shared variables is chosen as a parameter, the proposed generator requires shorter test programs to expose design errors and therefore results in less effort than a conventional generator.
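The address constraints named in this abstract (alignment, avoidance of false sharing, and a controlled degree of competition for cache lines) can be sketched as follows. This is only an illustration under stated assumptions, not the generator from the dissertation: the function `biased_addresses`, its parameters, and the set-indexed cache geometry (64-byte lines, 128 sets) are all hypothetical.

```python
import random

def biased_addresses(count, line_size=64, total_sets=128,
                     competing_sets=2, align=8, seed=0):
    """Pick `count` shared addresses under three illustrative constraints.

    Assumed cache model: set index = (address // line_size) % total_sets.
    - every address is a multiple of `align` (object alignment);
    - every address lies in its own cache line (no false sharing);
    - all lines map to only `competing_sets` sets (forced competition).
    """
    tags = 1 << 10                      # hypothetical number of tag values
    if count > tags * competing_sets:
        raise ValueError("not enough distinct cache lines for this request")

    rng = random.Random(seed)
    chosen_sets = rng.sample(range(total_sets), competing_sets)
    used_lines = set()
    addresses = []
    while len(addresses) < count:
        line = rng.randrange(tags) * total_sets + rng.choice(chosen_sets)
        if line in used_lines:
            continue                    # keep one variable per cache line
        used_lines.add(line)
        offset = rng.randrange(line_size // align) * align
        addresses.append(line * line_size + offset)
    return addresses

addrs = biased_addresses(16)
print(sorted(hex(a) for a in addrs))
```

Lowering `competing_sets` concentrates the shared variables on fewer cache sets, which is one plausible way to raise contention without giving up randomness in the tag and offset bits.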

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Repositório Institucional da UFSC

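Aceleradores e multiprocessadores em chip: o impacto da execução fora de ordem na verificação de funcionalidade e de consistência [Accelerators and chip multiprocessors: the impact of out-of-order execution on functional and consistency verification]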

Author: Freitas Leandro da Silva
Publication date: 01/01/2012

Master's dissertation - Universidade Federal de Santa Catarina, Centro Tecnológico, Programa de Pós-Graduação em Ciência da Computação, Florianópolis, 2012.
Abstract: This work addresses two classes of problems faced when verifying designs that exhibit out-of-order behaviors, namely the functional verification of hardware accelerators and the verification of consistency in shared-memory systems. Out-of-order behaviors arise when precedence constraints are relaxed to increase the usage rate of concurrent hardware components and thereby improve performance. However, the design of a system with out-of-order behaviors is error prone, since order relaxation requires sophisticated control. This work compares the verification guarantees of three classes of dynamic checkers for modules that support out-of-order events. Provably, relaxed scoreboards can be built with full verification guarantees, provided that they employ an update rule based on the removal of dominators. Experimental results show that a relaxed scoreboard designed this way needs approximately 1/2 of the effort required by a conventional one. Verifying hardware compliance with a consistency model is a relevant problem whose complexity depends on the observability of memory events. This work also describes a novel on-the-fly technique for verifying memory consistency from an executable representation of a multi-core system.
To increase efficiency without hampering verification guarantees, three points are monitored per core, instead of one or two as proposed in previous related work. The three points were selected to be largely independent of the core's microarchitecture. The technique relies on concurrent relaxed scoreboards to detect consistency violations in each core. To detect global violations, it employs the linear order of events induced by a given test case. Provably, the technique yields neither false positives nor false negatives when the test case exposes an error that affects the monitored sequences, making it the first on-the-fly checker with full verification guarantees. Experimental results show that it needs approximately 1/4 to 3/4 of the overall verification effort required by a post-mortem checker that monitors two sequences per processor. The technique is at least 100 times faster than a checker that monitors a single sequence per processor.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Repositório Institucional da UFSC

A configurable vector processor for accelerating speech coding algorithms

Author: Konstantia Koutsomyti (7201031)
Publication date: 01/01/2007

The growing demand for voice-over-packet (VoIP) services and multimedia-rich applications has made the efficient, real-time implementation of low-bit-rate speech coders on embedded VLSI platforms increasingly important. Such speech coders are designed to substantially reduce bandwidth requirements, thus enabling dense multichannel gateways in a small form factor. This, however, comes at a high computational cost, which mandates the use of very high-performance embedded processors. This thesis investigates the potential acceleration of two major ITU-T speech coding algorithms, namely G.729A and G.723.1, through their efficient implementation on a configurable, extensible vector embedded CPU architecture. New scalar and vector ISAs were introduced, which resulted in up to 80% reduction in the dynamic instruction count of both workloads. These instructions were subsequently encapsulated into a parametric, hybrid SISD (scalar processor)–SIMD (vector) processor. This work presents the research and implementation of the vector datapath of this vector coprocessor, which is tightly coupled to a SPARC-V8 compliant CPU, the optimization and simulation methodologies employed, and the use of Electronic System Level (ESL) techniques to rapidly design SIMD datapaths.

Loughborough University Institutional Repository

Performance and area evaluations of processor-based benchmarks on FPGA devices

Author: Jiunn-Tyng Kao (7215932)
Publication date: 01/01/2014

Computing systems on SoCs have been a long-standing research topic since FPGA technology emerged, owing to its reprogrammable fabric, reconfigurable computing, and fast time to market. During the last decade, a single processor in an SoC has no longer been able to cope with the fast-growing market for complex applications such as mobile-phone audio and video encoding, and image and network processing. Because the number of transistors on a silicon wafer keeps increasing, recent FPGAs and embedded systems are moving toward multiprocessor-based designs to meet large performance demands, ushering in the age of the MPSoC. In addition, most embedded processors are soft cores, because they are flexible, can be reconfigured for specific software functions, and make it easy to build homogeneous multiprocessor systems for parallel programming. Moreover, behavioural synthesis tools are becoming far more powerful: they can create datapaths of logic units from high-level algorithms (e.g. C to HDL) and support partitioning in a concurrent HW/SW methodology. A range of embedded processors can be implemented on an FPGA-based prototype to integrate the CPUs on a programmable device. This research first presents the different types of computer architecture found in modern embedded processors and the types of software application they serve (e.g. multi-threaded operations or complex functions) on FPGA-based SoCs; second, it investigates their capability by executing a wide range of multimedia software codes (integer arithmetic only) on different processor-system models (uniprocessor, multiprocessor, or co-design); finally, it compares the results in terms of benchmark performance and resource utilization within FPGAs. All the examined programs were written in standard C and executed on varying numbers of soft-core processors or hardware units to obtain the execution times.
However, the number of processors, their customizable configurations, and the generated hardware datapaths are limited by the resources of the target FPGA, and designers need to understand the FPGA trade-off considered here: speed versus area. For this experimental purpose, I divided the benchmarks into DLP and HLS categories, which are "data" intensive and "function" intensive respectively. The DLP programs are executed on the LEON3 MP and LE1 CMP multiprocessor systems, and the HLS programs on the LegUp co-design system, on target FPGAs. As a preliminary step, the performance of the soft-core processors is examined by executing all the benchmarks. The thesis centres on the execution times (or speed-up) and the area breakdown on FPGA devices for the different programs.

Loughborough University Institutional Repository

Decompose and Conquer: Addressing Evasive Errors in Systems on Chip

Author: Lee Doowon

Modern computer chips comprise many components, including microprocessor cores, memory modules, on-chip networks, and accelerators. Such system-on-chip (SoC) designs are deployed in a variety of computing devices: from internet-of-things, to smartphones, to personal computers, to data centers. In this dissertation, we discuss evasive errors in SoC designs and how these errors can be addressed efficiently. In particular, we focus on two types of errors: design bugs and permanent faults. Design bugs originate from the limited amount of time allowed for design verification and validation. Thus, they are often found in functional features that are rarely activated. Complete functional verification, which can eliminate design bugs, is extremely time-consuming, thus impractical in modern complex SoC designs. Permanent faults are caused by failures of fragile transistors in nano-scale semiconductor manufacturing processes. Indeed, weak transistors may wear out unexpectedly within the lifespan of the design. Hardware structures that reduce the occurrence of permanent faults incur significant silicon area or performance overheads, thus they are infeasible for most cost-sensitive SoC designs. To tackle and overcome these evasive errors efficiently, we propose to leverage the principle of decomposition to lower the complexity of the software analysis or the hardware structures involved. To this end, we present several decomposition techniques, specific to major SoC components. We first focus on microprocessor cores, by presenting a lightweight bug-masking analysis that decomposes a program into individual instructions to identify if a design bug would be masked by the program's execution. We then move to memory subsystems: there, we offer an efficient memory consistency testing framework to detect buggy memory-ordering behaviors, which decomposes the memory-ordering graph into small components based on incremental differences. 
We also propose a microarchitectural patching solution for memory subsystem bugs, which augments each core node with a small distributed programmable logic, instead of including a global patching module. In the context of on-chip networks, we propose two routing reconfiguration algorithms that bypass faulty network resources. The first computes short-term routes in a distributed fashion, localized to the fault region. The second decomposes application-aware routing computation into simple routing rules so as to quickly find deadlock-free, application-optimized routes in a fault-ridden network. Finally, we consider general accelerator modules in SoC designs. When a system includes many accelerators, there are a variety of interactions among them that must be verified to catch buggy interactions. To this end, we decompose such inter-module communication into basic interaction elements, which can be reassembled into new, interesting tests. Overall, we show that the decomposition of complex software algorithms and hardware structures can significantly reduce overheads: up to three orders of magnitude in the bug-masking analysis and the application-aware routing, approximately 50 times in the routing reconfiguration latency, and 5 times on average in the memory-ordering graph checking. These overhead reductions come with losses in error coverage: 23% undetected bug-masking incidents, 39% non-patchable memory bugs, and occasionally we overlook rare patterns of multiple faults. In this dissertation, we discuss the ideas and their trade-offs, and present future research directions.
PhD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies
https://deepblue.lib.umich.edu/bitstream/2027.42/147637/1/doowon_1.pd

Deep Blue Documents at the University of Michigan

Exploring resource/performance trade-offs for streaming applications on embedded multiprocessors

Author: Yang Yang
Publication venue: Technische Universiteit Eindhoven
Publication date: 01/01/2012

Embedded system design is challenged by the gap between the ever-increasing customer demands and the limited resource budgets. The tough competition demands ever-shortening time-to-market and product lifecycles. To solve or, at least to alleviate, the aforementioned issues, designers and manufacturers need model-based quantitative analysis techniques for early design-space exploration to study trade-offs of different implementation candidates. Moreover, modern embedded applications, especially the streaming applications addressed in this thesis, face more and more dynamic input contents, and the platforms that they are running on are more flexible and allow runtime configuration. Quantitative analysis techniques for embedded system design have to be able to handle such dynamic adaptable systems. This thesis has the following contributions:
- A resource-aware extension to the Synchronous Dataflow (SDF) model of computation.
- Trade-off analysis techniques, both in the time-domain and in the iteration-domain (i.e., on an SDF iteration basis), with support for resource sharing.
- Bottleneck-driven design-space exploration techniques for resource-aware SDF.
- A game-theoretic approach to controller synthesis, guaranteeing performance under dynamic input.
As a first contribution, we propose a new model, as an extension of static synchronous dataflow graphs (SDF) that allows the explicit modeling of resources with consistency checking. The model is called resource-aware SDF (RASDF). The extension enables us to investigate resource sharing and to explore different scheduling options (ways to allocate the resources to the different tasks) using state-space exploration techniques. Consistent SDF and RASDF graphs have the property that an execution occurs in so-called iterations. An iteration typically corresponds to the processing of a meaningful piece of data, and it returns the graph to its initial state.
On multiprocessor platforms, iterations may be executed in a pipelined fashion, which makes performance analysis challenging. As the second contribution, this thesis develops trade-off analysis techniques for RASDF, both in the time-domain and in the iteration-domain (i.e., on an SDF iteration basis), to dimension resources on platforms. The time-domain analysis allows interleaving of different iterations, but the size of the explored state space grows quickly. The iteration-based technique gives up the potential to interleave iterations in exchange for a compact iteration state space. An efficient bottleneck-driven design-space exploration technique for streaming applications, the third main contribution of this thesis, is derived from analysis of the critical cycle of the state space, which reveals the bottleneck resources that limit throughput. All techniques are based on state-space exploration. They enable system designers to tailor their platform to the required applications, based on their own specific performance requirements. Pruning techniques for efficient exploration of the state space have been developed. Pareto dominance in terms of performance and resource usage is used for exact pruning, and approximation techniques are used for heuristic pruning. Finally, the thesis investigates dynamic scheduling techniques to respond to dynamic changes in input streams. The fourth contribution of this thesis is a game-theoretic approach to controller synthesis that selects the appropriate schedules in response to dynamic inputs from the environment. The approach transforms the explored iteration state space of a scenario- and resource-aware SDF (SARA SDF) graph into a bipartite game graph, and maps the controller synthesis problem to the problem of finding a winning positional strategy in a classical mean payoff game.
A winning strategy of the game can be used to synthesize a schedule controller for the system that is guaranteed to satisfy the throughput requirement given by the designer.
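The exact pruning by Pareto dominance mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the thesis's implementation: each design point is assumed to be a hypothetical `(throughput, resource_usage)` pair, with throughput to be maximised and resource usage to be minimised, and the sample values are made up.

```python
def dominates(a, b):
    """True if point a is no worse than b in both objectives and strictly better in one.

    Points are (throughput, resource_usage): higher throughput is better,
    lower resource usage is better. This encoding is an assumption.
    """
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better


def pareto_front(points):
    """Keep only the points that no other point dominates (quadratic-time sketch)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical design points: (throughput, resource_usage)
candidates = [(10, 4), (8, 2), (10, 5), (6, 2), (12, 7)]
print(pareto_front(candidates))
```

Here `(10, 5)` is dropped because `(10, 4)` reaches the same throughput with fewer resources, and `(6, 2)` is dropped because `(8, 2)` is faster at the same cost. Pruning of this kind is exact in the sense that no discarded point could be a better trade-off than one that is kept.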

Repository TU/e

Pure OAI Repository

Co-simulation techniques based on virtual platforms for SoC design and verification in power electronics applications

Author: Díaz Llerena Edel
Publication venue
Publication date: 01/01/2022
Field of study

In recent decades, investment in the energy sector has increased considerably. Many companies are now developing equipment such as power converters or electrical machines with state-of-the-art control systems. The current trend is to use systems-on-chip (SoCs) and field-programmable gate arrays (FPGAs) to implement the entire control system. These devices make it easier to use more complex and efficient control algorithms, improving equipment efficiency and enabling the integration of renewable systems into the power grid. However, the complexity of control systems has also increased considerably, and with it the difficulty of verifying them. Hardware-in-the-loop (HIL) systems have been put forward as a solution for the non-destructive verification of power equipment, avoiding accidents and costly tests on test benches. HIL systems simulate in real time the behaviour of the power plant and its interface, so that tests can be run with the control board in a safe environment. This thesis focuses on improving the verification process for control systems in power electronics applications. The overall contribution is to provide an alternative to HIL for verifying the hardware/software of the control board. The alternative is based on the software-in-the-loop (SIL) technique and seeks to overcome or address the limitations found in SIL to date. To improve the capabilities of SIL, a software tool called COSIL has been developed that co-simulates the final implementation and integration of the control system, whether software (CPU), hardware (FPGA) or a mix of software and hardware, together with its interaction with the power plant.
The platform can work at multiple levels of abstraction and supports mixed co-simulation in different languages such as C or VHDL. Throughout the thesis, emphasis is placed on improving one of the limitations of SIL, its low simulation speed. Several solutions are proposed, such as the use of software emulators, different abstraction levels for software and hardware, or local clocks in the FPGA modules. In particular, an external synchronization mechanism is contributed for the QEMU software emulator, enabling its multi-core emulation. This contribution enables the use of QEMU in co-simulation virtual platforms such as COSIL. The whole COSIL platform, including the use of QEMU, has been analysed under different types of applications and in a real industrial project. Its use has been critical for developing and verifying the software and hardware of the control system of a 400 kVA converter.

e_Buah - Biblioteca Digital de la Universidad de Alcalá