312 research outputs found

    AutoModel: Automatic Synthesis of Models from Communication Traces of SoC Designs

    Full text link
    Modeling system-level behaviors of intricate System-on-Chip (SoC) designs is crucial for design analysis, testing, and validation. However, the complexity and volume of SoC traces pose significant challenges in this task. This paper proposes an approach to automatically infer concise and abstract models from SoC communication traces, capturing the system-level protocols that govern message exchange and coordination between design blocks for various system functions. This approach, referred to as model synthesis, constructs a causality graph with annotations obtained from the SoC traces. The annotated causality graph represents all potential causality relations among messages under consideration. Next, a constraint satisfaction problem is formulated from the causality graph, which is then solved by a satisfiability modulo theories (SMT) solver to find satisfying solutions. Finally, finite state models are extracted from the generated solutions, which can be used to explain and understand the input traces. The proposed approach is validated through experiments using synthetic traces obtained from simulating a transaction-level model of a multicore SoC design and traces collected from running real programs on a realistic multicore SoC modeled with gem5.
    Comment: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 202
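
    To make the constraint-satisfaction step concrete, the following sketch encodes a toy causality graph with the z3-solver Python package: boolean variables select causality edges, integer ranks keep the selection acyclic, and an objective prefers the most concise model. The message names and the exact encoding are illustrative assumptions, not the paper's formulation.

```python
# Toy encoding of the constraint-satisfaction step (hypothetical message
# names; the paper's actual formulation is richer than this sketch).
from z3 import Bool, Int, Optimize, Implies, Or, Sum, If, is_true, sat

messages = ["req", "gnt", "data", "ack"]
# Candidate causality edges taken from the annotated causality graph.
candidates = [("req", "gnt"), ("gnt", "data"), ("data", "ack"), ("req", "data")]

pick = {e: Bool(f"pick_{e[0]}_{e[1]}") for e in candidates}
rank = {m: Int(f"rank_{m}") for m in messages}

opt = Optimize()
# A selected edge orders its endpoints, which keeps the model acyclic.
for (a, b), v in pick.items():
    opt.add(Implies(v, rank[a] < rank[b]))
# Every non-initial message must have at least one selected cause.
for m in messages[1:]:
    opt.add(Or([pick[e] for e in candidates if e[1] == m]))
# Prefer concise models: keep as few causality edges as possible.
opt.minimize(Sum([If(v, 1, 0) for v in pick.values()]))

if opt.check() == sat:
    sol = opt.model()
    kept = [e for e, v in pick.items() if is_true(sol[v])]
    print("edges retained for the state model:", kept)
```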

    Validation and verification of the interconnection of hardware intellectual property blocks for FPGA-based packet processing systems

    Get PDF
    As networks become more versatile, the computational requirement for supporting additional functionality increases. The increasing demands of these networks can be met by Field Programmable Gate Arrays (FPGA), which are an increasingly popular technology for implementing packet processing systems. The fine-grained parallelism and density of these devices can be exploited to meet the computational requirements and implement complex systems on a single chip. However, the increasing complexity of FPGA-based systems makes them susceptible to errors and difficult to test and debug. To tackle the complexity of modern designs, system-level languages have been developed to provide abstractions suited to the domain of the target system. Unfortunately, the lack of formality in these languages can give rise to errors that are not caught until late in the design cycle. This thesis presents three techniques for verifying and validating FPGA-based packet processing systems described in a system-level description language. First, a type system is applied to the system description language to detect errors before implementation. Second, system-level transaction monitoring is used to observe high-level events on-chip following implementation. Third, the high-level information embodied in the system description language is exploited to allow the system to be automatically instrumented for on-chip monitoring. This thesis demonstrates that these techniques catch errors that go undetected by traditional verification and validation tools. The locations of faults are pinpointed and errors are caught earlier in the design flow, which saves time by reducing synthesis iterations.
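
    The first technique lends itself to a small illustration. The sketch below shows the kind of interconnection type check a system-level type system can perform before implementation; the port attributes, protocol names, and API are hypothetical stand-ins for the thesis' actual description language.

```python
# A minimal sketch of a pre-implementation interconnection type check
# (hypothetical port and protocol names, not the actual language).
from dataclasses import dataclass

@dataclass(frozen=True)
class PortType:
    width: int       # data bus width in bits
    protocol: str    # e.g. "axi-stream"

@dataclass
class Connection:
    src_block: str
    dst_block: str
    src: PortType
    dst: PortType

def check(connections):
    """Reject mismatched block interconnections before synthesis."""
    errors = []
    for c in connections:
        if c.src.width != c.dst.width:
            errors.append(f"{c.src_block}->{c.dst_block}: "
                          f"width {c.src.width} != {c.dst.width}")
        if c.src.protocol != c.dst.protocol:
            errors.append(f"{c.src_block}->{c.dst_block}: "
                          f"protocol {c.src.protocol} != {c.dst.protocol}")
    return errors

system = [Connection("parser", "classifier",
                     PortType(64, "axi-stream"), PortType(32, "axi-stream"))]
for err in check(system):
    print("type error:", err)   # caught at design time, not on-chip
```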

    Doctor of Philosophy

    Get PDF
    Over the last decade, cyber-physical systems (CPSs) have seen significant applications in many safety-critical areas, such as autonomous automotive systems, automatic pilot avionics, wireless sensor networks, etc. A CPS uses networked embedded computers to monitor and control physical processes. The motivating example for this dissertation is the use of a fault-tolerant routing protocol for a Network-on-Chip (NoC) architecture that connects electronic control units (ECUs) to regulate sensors and actuators in a vehicle. With a network allowing ECUs to communicate with each other, it is possible for them to share processing power to improve performance. In addition, networked ECUs enable flexible mapping to physical processes (e.g., sensors, actuators), which increases resilience to ECU failures by reassigning physical processes to spare ECUs. For the on-chip routing protocol, the ability to tolerate network faults is important for hardware reconfiguration to maintain the normal operation of a system. Adding a fault-tolerance feature to a routing protocol, however, increases its design complexity, making it prone to many functional problems. Formal verification techniques are therefore needed to verify its correctness. This dissertation proposes a link-fault-tolerant, multiflit wormhole routing algorithm, and its formal modeling and verification using two different methodologies. Improving upon previously published fault-tolerant routing algorithms, the proposed link-fault routing algorithm relaxes their unrealistic node-fault assumptions, while avoiding deadlock conservatively by appropriately dropping network packets. This routing algorithm, together with its routing architecture, is then modeled in the process-algebra language LNT, and compositional verification techniques are used to verify its key functional properties. As a comparison, it is also modeled in channel-level VHDL, which is compiled to labeled Petri nets (LPNs). Algorithms for a partial order reduction method on LPNs are given. An optimal result is obtained from heuristics that trace back on LPNs to find causally related enabled predecessor transitions. Key observations are made from the comparison between these two verification methodologies.
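
    The following sketch conveys the flavour of link-fault-tolerant routing with conservative packet dropping on a 2D mesh. It is a deliberately simplified stand-in: the dissertation's multiflit wormhole algorithm and its deadlock-freedom argument are substantially more involved.

```python
# Greatly simplified sketch of link-fault-tolerant routing with
# conservative packet dropping (the dissertation's algorithm and its
# deadlock-freedom proof are more involved than this toy version).
def route(cur, dst, faulty_links):
    """Return the next hop for a packet at `cur` headed to `dst` on a 2D
    mesh, or None to drop the packet rather than risk deadlock.
    Nodes are (x, y) pairs; `faulty_links` is a set of (src, dst) pairs."""
    (cx, cy), (dx, dy) = cur, dst
    # Dimension-ordered (XY) preference: resolve X first, then Y.
    preferred = []
    if dx != cx:
        preferred.append((cx + (1 if dx > cx else -1), cy))
    if dy != cy:
        preferred.append((cx, cy + (1 if dy > cy else -1)))
    for nxt in preferred:
        if (cur, nxt) not in faulty_links:
            return nxt
    # All productive links are faulty: drop instead of misrouting,
    # conservatively preserving deadlock freedom.
    return None

faults = {((1, 0), (2, 0))}
print(route((1, 0), (3, 2), faults))  # detours via (1, 1)
```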

    Towards hardware as a reconfigurable, elastic, and specialized service

    Get PDF
    As modern Data Center workloads become increasingly complex, constrained, and critical, mainstream CPU-centric computing has had ever more difficulty in keeping pace. Future data centers are moving towards a more fluid and heterogeneous model, with computation and communication no longer localized to commodity CPUs and routers. Next generation data-centric Data Centers will compute everywhere, whether data is stationary (e.g. in memory) or on the move (e.g. in network). While deploying FPGAs in NICs, as co-processors, in the router, and in Bump-in-the-Wire configurations is a step towards implementing the data-centric model, it is only part of the overall solution. The other part is actually leveraging this reconfigurable hardware. For this to happen, two problems must be addressed: code generation and deployment generation. By code generation we mean transforming abstract representations of an algorithm into equivalent hardware. Deployment generation refers to the runtime support needed to facilitate the execution of this hardware on an FPGA. Efforts at creating supporting tools in these two areas have thus far provided limited benefits. This is because the efforts are limited in one or more of the following ways: they i) do not provide fundamental solutions to a number of challenges, which makes them useful only to a limited group of (mostly) hardware developers, ii) are constrained in their scope, or iii) are ad hoc, i.e., specific to a single usage context, FPGA vendor, or Data Center configuration. Moreover, efforts in these areas have largely been mutually exclusive, which results in incompatibility across development layers; this requires wrappers to be designed to make interfaces compatible. As a result, there is significant complexity and effort required to code and deploy efficient custom hardware for FPGAs; effort that may be orders of magnitude greater than for analogous software environments. The goal of this dissertation is to create a framework that enables reconfigurable logic in Data Centers to be targeted with the same level of effort as a single CPU core. The underlying mechanism for this is a framework, which we refer to as Hardware as a Reconfigurable, Elastic and Specialized Service, or HaaRNESS. In this dissertation, we address two of the core challenges of HaaRNESS: reducing the complexity of code generation by constraining High Level Synthesis (HLS) toolflows, and replacing ad hoc models of deployment generation by generalizing and formalizing what is needed for a hardware Operating System. These parts are unified by the back-end of HLS toolflows, which links generated compute pipelines with the operating system and provides appropriate APIs, wrappers, and software runtimes. The contributions of this dissertation are the following: i) an empirically guided set of systematic transformations for generating high-quality HLS code; ii) a framework for instrumenting HLS compilers to identify and remove optimization blockers; iii) a framework for RTL simulation and IP generation of HLS kernels for rapid turnaround; and iv) a framework for the generalization and formalization of hardware operating systems, to address the ad hoc nature of existing deployment generation and ensure uniform structure and APIs.
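
    As a rough illustration of what a generalized hardware operating system might standardize on the deployment side, the sketch below wraps an FPGA-resident kernel behind a uniform software-style interface. Every name here (class, methods, bitstream file) is a hypothetical placeholder; none of it is drawn from HaaRNESS itself.

```python
# Hypothetical sketch of a uniform deployment API that a hardware
# "operating system" layer could standardize. All names are invented
# for illustration; this is not the HaaRNESS API.
class HardwareService:
    """Wraps one FPGA-resident kernel behind a software-like interface."""
    def __init__(self, bitstream_path, region):
        self.bitstream_path = bitstream_path
        self.region = region          # partial-reconfiguration slot
        self.loaded = False

    def deploy(self):
        # A real runtime would program the PR region here.
        print(f"programming region {self.region} with {self.bitstream_path}")
        self.loaded = True

    def call(self, payload: bytes) -> bytes:
        if not self.loaded:
            raise RuntimeError("kernel not deployed")
        # Stand-in for a DMA round trip through the compute pipeline.
        return payload[::-1]

svc = HardwareService("fir_filter.bit", region=0)
svc.deploy()
print(svc.call(b"abc"))   # one invocation, same effort as a function call
```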

    Automated Debugging Methodology for FPGA-based Systems

    Get PDF
    Electronic devices make up a vital part of our lives, ranging from mobile phones, laptops, and computers to home automation systems, to name a few. Modern designs comprise billions of transistors. With this evolution, however, ensuring that devices fulfill the designer's expectations under variable conditions has become a great challenge, demanding considerable design time and effort. Whenever an error is encountered, the process is re-started. Hence, it is desirable to minimize the number of spins required to achieve an error-free product, as each spin results in a loss of time and effort. Software-based simulation is the main technique for verifying a design before fabrication. However, a few design errors (bugs) are likely to escape the simulation process. Such bugs subsequently appear during the post-silicon phase. Finding such bugs is time-consuming due to the inherent lack of visibility into the hardware. Instead of software simulation of the design in the pre-silicon phase, post-silicon techniques permit designers to verify the functionality through physical implementations of the design. The main benefit of this methodology is that the implemented design in the post-silicon phase runs many orders of magnitude faster than its pre-silicon counterpart, allowing designers to validate their design more exhaustively. This thesis presents five main contributions towards a fast and automated debugging solution for reconfigurable hardware. Throughout the research, an obstacle avoidance system for robotic vehicles is used as a case study to illustrate how to apply the proposed debugging solution in practical environments. The first contribution presents a debugging system capable of providing a lossless trace of debugging data, which permits cycle-accurate replay. This methodology ensures capturing permanent as well as intermittent errors in the implemented design. The contribution also describes a solution to enhance hardware observability: it is proposed to utilize processor-configurable concentration networks, employ debug data compression to transmit the data more efficiently, and partially reconfigure the debugging system at run-time to save the time required for design re-compilation as well as preserve timing closure. The second contribution presents a solution for communication-centric designs; solutions for designs with multiple clock domains are also discussed. The third contribution presents a priority-based signal selection methodology to identify the signals that are most helpful during the debugging process. A connectivity generation tool is also presented, which can map the identified signals to the debugging system. The fourth contribution presents an automated error detection solution which can capture permanent as well as intermittent errors without continuous monitoring of debugging data. The proposed solution works even for designs lacking a golden reference. The fifth contribution proposes the use of artificial intelligence for post-silicon debugging. We present a novel idea of using a recurrent neural network for debugging when a golden reference is available for training the network, and extend the idea to designs where a golden reference is not available.
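
    The third contribution can be illustrated with a toy version of priority-based signal selection: rank candidate signals by a priority metric and connect only the highest-ranked ones to the debugging system. Plain fan-out is used as the metric below purely for illustration; the thesis' methodology is more refined.

```python
# Toy sketch of priority-based signal selection: rank candidate signals
# by how much of the design they can help observe (here, plain fan-out;
# the thesis uses a more refined priority metric).
def select_signals(fanout, budget):
    """fanout: signal name -> number of downstream loads.
    Returns the `budget` highest-priority signals to connect to the
    debugging system."""
    ranked = sorted(fanout, key=fanout.get, reverse=True)
    return ranked[:budget]

fanout = {"fsm_state": 14, "fifo_full": 9, "data_bus[7]": 2, "irq": 6}
print(select_signals(fanout, budget=2))   # ['fsm_state', 'fifo_full']
```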

    Secure and Energy-Efficient Processors

    Full text link
    Security has become an essential part of digital information storage and processing. Both high-end and low-end applications, such as data centers and the Internet of Things (IoT), rely on robust security to ensure proper operation. Encryption of information is the primary means of enabling security. Among encryption standards, the Advanced Encryption Standard (AES) is a widely adopted cryptographic algorithm, due to its simplicity and high security. Although encryption standards in general are extremely difficult to break mathematically, they are vulnerable to so-called side-channel attacks, which exploit electrical signatures of operating chips, such as power traces or magnetic field radiation, to crack the encryption. The Differential Power Analysis (DPA) attack is a representative and powerful side-channel attack method, which has demonstrated high effectiveness in cracking secure chips. This dissertation explores circuits and architectures that offer protection against DPA attacks in high-performance security applications and in low-end IoT applications, and evaluates the effectiveness of the proposed technologies. First, a 128-bit AES core for high-performance security applications is designed, fabricated, and evaluated in a 65nm CMOS technology. A novel charge-recovery logic family, called Bridge Boost Logic (BBL), is introduced in this design to achieve switching-independent energy dissipation and provide intrinsically high resistance against DPA attacks. Based on measurements, the AES core achieves a throughput of 16.90Gbps and power consumption of 98mW, exhibiting 720x higher DPA resistance and 30% lower power than a conventional CMOS counterpart implemented on the same die and operated at the same clock frequency. Second, an AES core for low-cost and energy-efficient IoT security applications is designed and fabricated in a 65nm CMOS technology. A novel Dual-Rail Flush Logic (DRFL) with a switching-independent power profile is used to yield intrinsic resistance against DPA attacks with minimum area and energy consumption. Measurement results show that this 0.048mm2 core achieves energy consumption as low as 1.25pJ/bit, while providing at least 2604x higher DPA resistance than its conventional CMOS counterpart on the same die, marking the smallest, most energy-efficient, and most secure full-datapath AES core published to date.
    PhD, Electrical Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies
    https://deepblue.lib.umich.edu/bitstream/2027.42/138791/1/luss_1.pd
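
    For readers unfamiliar with DPA, the sketch below runs the textbook difference-of-means test on synthetic power traces: partition traces by a predicted intermediate bit and look for a peak in the mean difference. This is the generic attack idea only, not the measurement flow used to evaluate the fabricated cores.

```python
# Illustrative difference-of-means DPA check on synthetic traces
# (textbook method, not the dissertation's measurement setup).
import numpy as np

rng = np.random.default_rng(0)
n_traces, n_samples = 1000, 50
traces = rng.normal(0.0, 1.0, (n_traces, n_samples))   # measured power
bit = rng.integers(0, 2, n_traces)                      # predicted S-box bit

# An unprotected core leaks: power correlates with the handled bit.
leak_at = 17
traces[:, leak_at] += 0.5 * bit

diff = traces[bit == 1].mean(axis=0) - traces[bit == 0].mean(axis=0)
print("peak |difference of means| at sample", int(np.abs(diff).argmax()))
# A flat difference curve is what DPA-resistant logic styles such as
# charge-recovery (BBL) or dual-rail (DRFL) families aim to produce.
```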

    Reconfigurable Instruction Cell Architecture Reconfiguration and Interconnects

    Get PDF

    Methodologies and Toolflows for the Predictable Design of Reliable and Low-Power NoCs

    Get PDF
    There is today an unmistakable need to evolve design methodologies and tool flows for Network-on-Chip based embedded systems. In particular, the quest for low power is a more urgent dilemma than ever. Modern circuits feature billions of transistors, and neither power management techniques nor battery capacity can keep up with the ever higher integration capability of digital devices. Moreover, power concerns come together with modern nanoscale silicon technology design issues. On one hand, system failure rates are expected to increase exponentially at every technology node if integrated circuit wear-out failure mechanisms are not compensated for. However, error detection and/or correction mechanisms have a non-negligible impact on network power. On the other hand, to meet stringent time-to-market deadlines, the design cycle of such a distributed and heterogeneous architecture must not be prolonged by unnecessary design iterations. Overall, there is a clear need to better discriminate among reliability strategies and interconnect topology solutions upfront, by ranking designs based on a power metric. In this thesis, we tackle this challenge by proposing power-aware design technologies. Finally, we consider the most aggressive and disruptive methodology for embedded systems with ultra-low power constraints: migrating the basic building blocks of a NoC to an asynchronous (or clockless) design style. We address this challenge by delivering a standard-cell design methodology and mainstream CAD tool flows, in this way partially relaxing the requirement of using asynchronous blocks only as hard macros.
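
    The up-front ranking argued for above can be illustrated with a trivial sketch: score each (topology, reliability scheme) design point by an estimated power figure before committing to lengthy design iterations. All numbers below are invented for illustration.

```python
# Sketch of up-front power-aware design-point ranking. The design
# points and power figures are made up for illustration only.
design_points = [
    # (name, baseline link power in mW, error-protection overhead in mW)
    ("mesh + retransmission", 120.0, 18.0),
    ("mesh + ECC per flit",   120.0, 31.0),
    ("torus + ECC per flit",  105.0, 34.0),
]

# Rank by total estimated power before any detailed design iteration.
ranked = sorted(design_points, key=lambda d: d[1] + d[2])
for name, base, overhead in ranked:
    print(f"{name}: {base + overhead:.1f} mW total")
```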

    An FPGA implementation of an investigative many-core processor, Fynbos : in support of a Fortran autoparallelising software pipeline

    Get PDF
    Includes bibliographical references.
    In light of the power, memory, ILP, and utilisation walls facing the computing industry, this work examines the hypothetical many-core approach to finding greater compute performance and efficiency. In order to achieve greater efficiency in an environment in which Moore's law continues but TDP has been capped, a means of deriving performance from dark and dim silicon is needed. The many-core hypothesis is one approach to exploiting these available transistors efficiently. As understood in this work, it involves trading hardware control complexity for hundreds to thousands of parallel simple processing elements, and operating at a clock speed sufficiently low as to allow the efficiency gains of near-threshold-voltage operation. Performance is therefore dependent on exploiting a new degree of fine-grained parallelism such as is currently found only in GPGPUs, but in a manner that is not as restrictive in its range of application domains. While removing the complex control hardware of traditional CPUs provides space for more arithmetic hardware, a basic level of control is still required. For a number of reasons this work chooses to replace this control largely with static scheduling. This pushes the burden of control primarily to the software, and specifically the compiler, rather than to the programmer or to an application-specific means of control simplification. A legacy tool chain already exists that is capable of autoparallelising sequential Fortran code to the degree of parallelism necessary for many-core, and this work implements a many-core architecture to match it. By prototyping the design on an FPGA, it is possible to examine the real-world performance of the compiler-architecture system to a greater degree than simulation alone would allow. Comparing theoretical peak performance and real performance in a case study application, the system is found to be more efficient than any other reviewed, but also to significantly underperform relative to current competing architectures. This failing is attributed to taking the need for simple hardware too far, and to an inability to implement static-scheduling mitigation tactics due to a lack of support for them in the compiler.
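
    A toy example of the static scheduling that replaces hardware control: a list scheduler assigns ready operations of a dataflow graph to parallel processing elements cycle by cycle. This is a minimal sketch, not the Fortran tool chain's actual scheduler.

```python
# Minimal static list-scheduling sketch in the spirit of pushing control
# into the compiler (toy example, not the actual tool chain's scheduler).
def schedule(ops, deps, n_pes):
    """ops: list of op names; deps: op -> set of prerequisite ops.
    Returns a list of cycles, each a list of at most n_pes ops."""
    done, cycles = set(), []
    remaining = list(ops)
    while remaining:
        ready = [o for o in remaining if deps.get(o, set()) <= done]
        if not ready:
            raise ValueError("cyclic dependency")
        issue = ready[:n_pes]           # fill the parallel PEs this cycle
        cycles.append(issue)
        done.update(issue)
        remaining = [o for o in remaining if o not in done]
    return cycles

deps = {"c": {"a", "b"}, "d": {"c"}, "e": {"c"}}
print(schedule(["a", "b", "c", "d", "e"], deps, n_pes=2))
# [['a', 'b'], ['c'], ['d', 'e']]
```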