8 research outputs found

    FPU designs with NEM relays

    Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2013. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Includes bibliographical references (pages 71-74). Nano-electromechanical (NEM) relays are an alternative to CMOS transistors as the fabric of digital circuits. Circuits with NEM relays offer energy-efficiency benefits over CMOS, since they have zero leakage power and can be designed to maintain throughput that is competitive with CMOS despite their slow actuation times. The floating-point unit (FPU) is the most complex arithmetic unit in a computational system. This thesis investigates whether the energy-efficiency promise of NEM relays, demonstrated previously on smaller circuit blocks, holds for complex computational structures such as the FPU. The energy, performance, and area trade-offs of FPU designs with NEM relays are examined and compared with those of state-of-the-art CMOS designs in an equivalent scaled process. Circuits that are critical-path bottlenecks, primarily the leading zero detector (LZD) and leading zero anticipator (LZA) blocks, are carefully identified and optimized for low latency and device count. We reduce the NEM relay FPU latency from 71 mechanical delays in a CMOS-style implementation to 16 mechanical delays in a NEM relay pass-logic style implementation. The FPU designed with NEM relays achieves 15x lower energy per operation than CMOS. by Sumit Dutta. S.M.
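
    To make the LZD's role concrete, the following is a minimal software sketch of the divide-and-conquer leading-zero count that hardware LZD blocks typically implement as a log-depth halving tree; the function name and the 32-bit width are illustrative and are not taken from the thesis.

        def leading_zeros(x: int, width: int = 32) -> int:
            """Hierarchical leading-zero detector (LZD): split the word in
            half and recurse into the half that contains the first 1 bit,
            mirroring the log-depth structure used in hardware."""
            assert 0 <= x < (1 << width)
            if x == 0:
                return width
            if width == 1:
                return 0
            half = width // 2
            hi = x >> half
            if hi:  # first 1 bit lies in the upper half
                return leading_zeros(hi, half)
            return half + leading_zeros(x & ((1 << half) - 1), half)

        # Example: 0x0000F000 has 16 leading zeros in a 32-bit word.
        assert leading_zeros(0x0000F000) == 16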

    Solving Constraints on the Intermediate Result of Decimal Floating-Point Operations

    The draft revision of the IEEE Standard for Floating-Point Arithmetic (IEEE P754) includes a definition for decimal floating-point (FP) in addition to the widely used binary FP specification. The decimal standard raises new concerns with regard to the verification of hardware- and software-based designs. The verification process normally emphasizes intricate corner cases and uncommon events. The decimal format introduces several new classes of such events in addition to those characteristic of binary FP. Our work addresses the following problem: Given a decimal floating-point operation, a constraint on the intermediate result, and a constraint on the representation selected for the result, find random inputs for the operation that yield an intermediate result compatible with these specifications. The paper supplies efficient analytic solutions for addition and for some cases of multiplication and division. We provide probabilistic algorithms for the remaining cases. These algorithms prove to be efficient in the actual implementation.
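
    The analytic case for addition can be illustrated with a short sketch: fix a random intermediate result that satisfies the constraint, then solve backwards for one addend. This is a simplified illustration built on Python's decimal module, not the paper's algorithm; the particular constraint shown (an intermediate result with a given number of significant digits) is an invented example.

        import random
        from decimal import Decimal, getcontext

        # Decimal128-like working precision so the back-solve stays exact.
        getcontext().prec = 34

        def random_decimal(digits: int) -> Decimal:
            """Random decimal with exactly `digits` significant digits."""
            lead = random.randint(1, 9)
            rest = [random.randint(0, 9) for _ in range(digits - 1)]
            coeff = int("".join(map(str, [lead] + rest)))
            return Decimal(coeff).scaleb(random.randint(-5, 5))

        def solve_addition(target_digits: int):
            """Analytic solution for addition: choose the constrained
            intermediate result r first, choose one operand x at random,
            then solve y = r - x so that x + y hits r exactly."""
            r = random_decimal(target_digits)  # constrained intermediate result
            x = random_decimal(target_digits)
            y = r - x                          # exact at this working precision
            return x, y, r

        x, y, r = solve_addition(8)
        assert x + y == r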

    Symbolic crosschecking of data-parallel floating-point code


    A Functional Verification Methodology for an Improved Coverage of System-on-Chips

    The increasing popularity of System-on-Chip (SoC) circuits results in many new design challenges. One major challenge is to ensure the functional correctness of such complicated circuits. Functional verification is a verification technique used to verify the functional correctness of SoCs. Coverage Directed Test Generation (CDTG) is an essential part of functional verification, where the objective is to generate input stimuli that maximize the coverage of a design. Coverage helps to determine how well the input stimuli verified the design under verification. CDTG techniques analyze coverage results and adapt the input stimulus generation process to improve the coverage. One of the important components of CDTG-based tools is the constraint solver, and the time efficiency of the verification process depends on the efficiency of the solver. However, the constraint solvers associated with CDTG tools require large amounts of memory and time to generate input stimuli for SoCs, and they cannot generate solutions that are evenly distributed in the search space, as required to attain the desired coverage. The aim of this thesis is to provide a practical framework that enables the generation of evenly distributed input stimuli. A basic feature of the search space (data set) is that it contains k subpopulations, or clusters. Partitioning the search space into clusters and generating solutions from the partitions can improve the evenness of the solutions generated by the solver. Hence one of our main contributions is a novel domain partitioning algorithm, which relies on solutions generated by a consistency search algorithm developed for this purpose. The number of partitions (required by the domain partitioning algorithm) is determined by an algorithm that can find the optimal number of clusters present in a data set. To demonstrate the effectiveness of our approach, we apply our methodology to Constraint Satisfaction Problems (CSPs) and some real-life applications.
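
    The partitioning idea can be sketched in a few lines: cluster the feasible region, then draw the same number of stimuli from every cluster, so the result set is spread over the search space. The 1-D k-means below is a deliberately simplified stand-in for the thesis's domain partitioning and consistency search algorithms, on a toy-sized integer domain.

        import random

        def kmeans_1d(points, k, iters=30):
            """Plain 1-D k-means; returns one (min, max) interval per
            non-empty cluster of the feasible points."""
            centers = random.sample(points, k)
            for _ in range(iters):
                buckets = [[] for _ in range(k)]
                for p in points:
                    buckets[min(range(k), key=lambda i: abs(p - centers[i]))].append(p)
                centers = [sum(b) / len(b) if b else c
                           for b, c in zip(buckets, centers)]
            return [(min(b), max(b)) for b in buckets if b]

        def even_stimuli(constraint, lo, hi, k=2, per_cluster=10):
            """Enumerate the feasible region, partition it, then sample each
            partition equally instead of sampling the whole domain at once."""
            feasible = [x for x in range(lo, hi) if constraint(x)]
            samples = []
            for a, b in kmeans_1d(feasible, k):
                samples += [random.randint(a, b) for _ in range(per_cluster)]
            return [x for x in samples if constraint(x)]

        # Toy constraint with two widely separated feasible clusters: a global
        # sampler piles solutions wherever density is high; per-cluster
        # sampling covers both regions evenly.
        stimuli = even_stimuli(lambda x: x < 500 or x > 9500, 0, 10_000, k=2)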

    Verification of the Performance Properties of Embedded Streaming Applications via Constraint-Based Scheduling

    The abilities and, accordingly, the design complexity of embedded systems have expanded enormously in recent years, riding the wave of Moore's law. In contrast, time to market has shrunk, forcing challenges onto designers, who in turn seek to adopt new design methods to increase their productivity. In response to these pressures, modern systems have moved towards on-chip multiprocessing technologies. New on-chip multiprocessing architectures have emerged in order to exploit the tremendous advances of fabrication technology, and Multiprocessor Systems on a Chip (MPSoCs) have been adopted as suitable platforms for executing complex embedded applications. To reduce the cost of the hardware platform, applications share resources, which may result in inter-application timing interference due to resource request conflicts. The features of a typical SoC impose great challenges on SoC verification in two respects. First, the large scale of hardware integration leads to sophisticated hardware-hardware interactions: since an SoC has multiple components, the interactions between them can give rise to emergent properties that are not present in any single component. Second, the introduction of software into hardware behaviour leads to sophisticated hardware-software interactions: since an SoC has at least one processor, software forms a new dimension of the SoC's behaviour and hence brings a new dimension to verification. This makes verification a challenging task, in particular for communication and multimedia applications, because of the non-functional constraints of hardware and software modules, such as processor speed, buffer size, energy budget, and scheduling policy, and because of the combination of multiple applications. This thesis advocates Constraint Programming (CP) as a powerful tool for the verification of performance metrics of MPSoCs. In this work, we model the mapping of streaming applications onto a target Multi-Processor System-on-Chip (MPSoC) architecture as a constraint-based scheduling problem, tested separately and in interaction with other application types. The idea is to create a system-level scenario that takes into account the system-level workflow with respect to system resources and performance requirements, namely task deadlines, response time, CPU and memory usage, and buffer size. Specifically, we investigate whether the behaviour of the different interactions among system components executing different tasks can be effectively expressed as a constraint-based scheduling problem over the space of possible inputs to the system, in order to determine whether we can address similar cases of failure using this model. Solving this problem means finding a better way to inspect the system under verification at a very early design stage and in a much more reasonable time.
    Our proposed approach was tested with various applications, different input streams, and different architectures. We built our model around existing architectures on the market running the chosen applications, and compared our model's results with the results of running the actual applications on the system. The results show that the methodology can identify system failure conditions in a fraction of the time needed by simulation-based verification. It gives the test engineer the ability to explore the design space and deduce the best policy, and it helps choose a proper architecture for the applications to be run.
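
    As an illustration of how such a performance property becomes a constraint-based scheduling problem, the sketch below checks a deadline property for non-preemptive tasks sharing one processor. The use of Google's OR-Tools CP-SAT solver and the two-field task model are assumptions made for the example; the thesis's actual model (response times, CPU and memory usage, buffer sizes) is far richer.

        from ortools.sat.python import cp_model

        def schedulable(tasks, horizon):
            """tasks: list of (duration, deadline) pairs. Returns True if
            every task can run non-preemptively on a single shared CPU and
            still meet its deadline; infeasibility is the solver's proof
            that the configuration fails its performance requirement."""
            model = cp_model.CpModel()
            intervals = []
            for i, (dur, deadline) in enumerate(tasks):
                start = model.NewIntVar(0, horizon, f"s{i}")
                end = model.NewIntVar(0, horizon, f"e{i}")
                intervals.append(model.NewIntervalVar(start, dur, end, f"t{i}"))
                model.Add(end <= deadline)
            model.AddNoOverlap(intervals)  # one shared processor
            solver = cp_model.CpSolver()
            return solver.Solve(model) in (cp_model.OPTIMAL, cp_model.FEASIBLE)

        # Two 4-cycle tasks cannot both finish by cycle 6 on one CPU.
        assert schedulable([(4, 10), (4, 10)], 20)
        assert not schedulable([(4, 6), (4, 6)], 20)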

    Symbolic execution of verification languages and floating-point code

    The focus of this thesis is a program analysis technique named symbolic execution. We present three main contributions to this field. First, an investigation into comparing several state-of-the-art program analysis tools at the level of an intermediate verification language over a large set of benchmarks, and improvements to the state-of-the-art of symbolic execution for this language. This is explored via a new tool, Symbooglix, that operates on the Boogie intermediate verification language. Second, an investigation into performing symbolic execution of floating-point programs via a standardised theory of floating-point arithmetic that is supported by several existing constraint solvers. This is investigated via two independent extensions of the KLEE symbolic execution engine to support reasoning about floating-point operations (with one tool developed by the thesis author). Third, an investigation into the use of coverage-guided fuzzing as a means for solving constraints over finite data types, inspired by the difficulties associated with solving floating-point constraints. The associated prototype tool, JFS, which builds on the LibFuzzer project, can at present be applied to a wide range of SMT queries over bit-vector and floating-point variables, and shows promise on floating-point constraints.
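
    The JFS idea, treating a constraint as a program that a fuzzer tries to steer into the satisfying branch, can be caricatured with plain random search over bit patterns. This sketch drops the coverage guidance that makes JFS effective and is purely illustrative.

        import random, struct

        def random_double() -> float:
            """Sample uniformly over 64-bit patterns, not over values, so
            NaNs, infinities and subnormals all occur, much as a fuzzer's
            raw byte stream would supply them."""
            return struct.unpack("<d", struct.pack("<Q", random.getrandbits(64)))[0]

        def fuzz_solve(constraint, tries=1_000_000):
            """'Solve' an SMT-style query by searching for an input that
            reaches the branch on which the constraint holds."""
            for _ in range(tries):
                x = random_double()
                if constraint(x):
                    return x
            return None

        # A query satisfiable only by NaN: no ordinary double compares
        # unequal to itself. Roughly 1 in 2048 random bit patterns is a
        # NaN, so the search succeeds almost surely within the budget.
        witness = fuzz_solve(lambda x: x != x)
        assert witness is not None and witness != witness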

    Symbolic Crosschecking of Data-Parallel Floating Point Code

    In this thesis we present a symbolic execution-based technique for cross-checking programs accelerated using SIMD or OpenCL against an unaccelerated version, as well as a technique for detecting data races in OpenCL programs. Our techniques are implemented in KLEE-CL, a symbolic execution engine based on KLEE that supports symbolic reasoning on the equivalence between expressions involving both integer and floating-point operations. While the current generation of constraint solvers provide good support for integer arithmetic, there is little support available for floating-point arithmetic, due to the complexity inherent in such computations. The key insight behind our approach is that floating-point values are only reliably equal if they are essentially built by the same operations. This allows us to use an algorithm based on symbolic expression matching augmented with canonicalisation rules to determine path equivalence. Under symbolic execution, we have to verify equivalence along every feasible control-flow path. We reduce the branching factor of this process by aggressively merging conditionals, if-converting branches into select operations via a phi-node folding transformation. To support the Intel Streaming SIMD Extension (SSE) instruction set, we lower SSE instructions to equivalent generic vector operations, which in turn are interpreted in terms of primitive integer and floating-point operations. To support OpenCL programs, we symbolically model the OpenCL environment using an OpenCL runtime library targeted to symbolic execution. We detect data races by keeping track of all memory accesses using a memory log, and reporting a race whenever we detect that two accesses conflict. By representing the memory log symbolically, we are also able to detect races associated with symbolically indexed accesses of memory objects. We used KLEE-CL to find a number of issues in a variety of open source projects that use SSE and OpenCL, including mismatches between implementations, memory errors, race conditions, and compiler bugs.
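
    The key insight, that floating-point values are reliably equal only when built by the same operations, can be sketched as structural matching of expression trees after canonicalisation. The single rule below (sorting the operands of commutative operators) stands in for KLEE-CL's rule set, and the expression representation is invented for illustration.

        from dataclasses import dataclass
        from typing import Union

        @dataclass(frozen=True)
        class Op:
            """A floating-point expression node: operator plus operands."""
            name: str
            args: tuple

        Expr = Union[Op, str, float]  # leaves are symbols or constants

        COMMUTATIVE = {"fadd", "fmul"}

        def canon(e: Expr) -> Expr:
            """Canonicalise: recurse, then sort the operands of commutative
            ops so fadd(a, b) and fadd(b, a) take the same shape. We do NOT
            reassociate: (a+b)+c and a+(b+c) may round differently."""
            if not isinstance(e, Op):
                return e
            args = tuple(canon(a) for a in e.args)
            if e.name in COMMUTATIVE:
                args = tuple(sorted(args, key=repr))
            return Op(e.name, args)

        def fp_equivalent(e1: Expr, e2: Expr) -> bool:
            """Equal only if the canonical forms match structurally."""
            return canon(e1) == canon(e2)

        assert fp_equivalent(Op("fadd", ("x", "y")), Op("fadd", ("y", "x")))
        assert not fp_equivalent(Op("fadd", (Op("fadd", ("x", "y")), "z")),
                                 Op("fadd", ("x", Op("fadd", ("y", "z")))))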

    FPgen - A Test Generation Framework for Datapath Floating-Point Verification

    FPgen is a new test generation framework targeted toward the verification of the floating-point (FP) datapath, through the generation of test cases. This framework provides the capacity to define virtually any architectural FP coverage model, consisting of verification tasks. The tool supplies strong constraint solving capabilities, allowing the generation of random tests that target these tasks. We present an overview of FPgen's functionality, describe the results of its use for the verification of several FP units, and compare its efficiency with existing test generators.
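
    The flavour of an architectural coverage task, and of the kind of analytic solving a generator like FPgen applies to it, can be conveyed with a toy generator: produce random operand pairs whose product lands in the subnormal range. Both the task and the back-solving trick are invented for illustration and are far simpler than FPgen's constraint solver.

        import math, random, struct

        SMALLEST_NORMAL = 2.2250738585072014e-308  # subnormals lie below this

        def random_normal_double() -> float:
            """Random positive, finite, normal double built from its bit
            pattern; illustrative sampling, not FPgen's method."""
            exp = random.randint(1, 2046)   # biased exponent, normals only
            frac = random.getrandbits(52)
            bits = (exp << 52) | frac
            return struct.unpack("<d", struct.pack("<Q", bits))[0]

        def gen_subnormal_product():
            """Coverage task: find x, y whose product rounds into the
            subnormal range. Back-solve analytically: pick a subnormal
            target r and an operand x, set y = r / x; the division's
            rounding keeps x * y within a few ulps of r."""
            while True:
                r = random.uniform(0, SMALLEST_NORMAL)  # target result
                x = random_normal_double()
                y = r / x
                p = x * y
                if 0.0 < p < SMALLEST_NORMAL and math.isfinite(y):
                    return x, y

        x, y = gen_subnormal_product()
        assert 0.0 < x * y < SMALLEST_NORMAL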