13 research outputs found

    Achieving Functional Correctness in Large Interconnect Systems.

    Full text link
    In today's semi-conductor industry, large chip-multiprocessors and systems-on-chip are being developed, integrating a large number of components on a single chip. The sheer size of these designs and the intricacy of the communication patterns they exhibit have propelled the development of network-on-chip (NoC) interconnects as the basis for the communication infrastructure in these systems. Faced with the interconnect's growing size and complexity, several challenges hinder its effective validation. During the interconnect's development, the functional verification process relies heavily on the use of emulation and post-silicon validation platforms. However, detecting and debugging errors on these platforms is a difficult endeavour due to the limited observability, and in turn the low verification capabilities, they provide. Additionally, with the inherent incompleteness of design-time validation efforts, the potential of design bugs escaping into the interconnect of a released product is also a concern, as these bugs can threaten the viability of the entire system. This dissertation provides solutions to enable the development of functionally correct interconnect designs. We first address the challenges encountered during design-time verification efforts, by providing two complementary mechanisms that allow emulation and post-silicon verification frameworks to capture a detailed overview of the functional behaviour of the interconnect. Our first solution re-purposes the contents of in-flight traffic to log debug data from the interconnect's execution. This approach enables the validation of the interconnect using synthetic traffic workloads, while attaining over 80% observability of the routes followed by packets and capturing valuable debugging information. We also develop an alternative mechanism that boosts observability by taking periodic snapshots of execution, thus extending the verification capabilities to run both synthetic traffic and real-application workloads. The collected snapshots enhance detection and debugging support, and they provide observability of over 50% of packets and reconstructs at least half of each of their routes. Moreover, we also develop error detection and recovery solutions to address the threat of design bugs escaping into the interconnect's runtime operation. Our runtime techniques can overcome communication errors without needing to store replicate copies of all in-flight packets, thereby achieving correctness at minimal area costsPhDComputer Science and EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/116741/1/rawanak_1.pd

    Protocol-directed trace signal selection for post-silicon validation

    Get PDF
    Due to the increasing complexity of modern digital designs using NoC (network-on-chip) communication, post-silicon validation has become an arduous task that consumes much of the development time of the product. The process of finding the root cause of bugs during post-silicon validation is very difficult because of the lack of observability of all signals on the chip. To increase observability for post-silicon validation, an effective silicon debug technique is to use an on-chip trace buffer to monitor and capture the circuit response of certain selected signals during its post-silicon operation. However, because of area limitations for debug structures on chip and routing concerns, the signals that are selected to be traced are a very small subset of all available signals. Traditionally, these trace signals were chosen manually by system designers who determined what signals may be needed for debug once the design reaches post-silicon. However, because modern digital designs have become very complex with many concurrent processes, this method is no longer reliable. Recent work has concentrated on automating the selection of low-level signals from a gate-level analysis. But none of them has ever been able to interpret the trace signals as high-level meaningful debugging information. In this work, we present an automated protocol-directed trace selection where the guiding force is the set of system-level protocols. We use a probabilistic formulation to select messages for tracing and then further analyze these solutions. This method produces traces that allow a debugger to observe when behavior has deviated from the correct path of execution and localize this incorrect behavior for further analysis. Most importantly, unlike the previous gate-level analysis based methods, this method can be applied during the chip design phase when most of the debug features are also designed. In addition, this method drastically reduces the time needed to select signals, as we automate a currently manual process

    Network-on-Multi-Chip (NoMC) with Monitoring and Debugging Support, Journal of Telecommunications and Information Technology, 2011, nr 3

    Get PDF
    This paper summarizes recent research on network-on-multi-chip (NoMC) at Poznań University of Technology. The proposed network architecture supports hierarchical addressing and multicast transition mode. Such an approach provides new debugging functionality hardly attainable in classical hardware testing methodology. A multicast transmission also enables real-time packet monitoring. The introduced features of NoC network allow to elaborate a model of hardware video codec that utilizes distributed processing on many FPGAs. Final performance of the designed network was assessed using a model of AVC coder and multi-FPGA platforms. In such a system, the introduced multicast transmission mode yields overall gain of bandwidth up to 30%. Moreover, synthesis results show that the basic network components designed in Verilog language are suitable and easily synthesizable for FPGA devices

    Zoom Out and See Better: Scalable Message Tracing for Post-Silicon SoC Debug

    Get PDF
    We present a method for selecting trace messages for post-silicon validation of System-on-Chip (SoC). Our message selection is guided by specifications of interacting flows in common user applications. In current practice, such messages are selected based on designer expertise. We formulate the problem as an optimization of mutual information gain and trace buffer utilization. Our approach scales to systems far beyond the capacity of current signal selection techniques. We achieve an average trace buffer utilization of 98.96% with an average flow specification coverage of 94.3% and an average bug localization to only 21.11% of the potential root causes in our large-scale debugging effort. We present efficacy of our selected messages in debugging and root cause analysis using five realistic case studies consisting of complex and subtle bugs from the OpenSPARC T2 processor.IBMOpe

    Methodologies and Toolflows for the Predictable Design of Reliable and Low-Power NoCs

    Get PDF
    There is today the unmistakable need to evolve design methodologies and tool ows for Network-on-Chip based embedded systems. In particular, the quest for low-power requirements is nowadays a more-than-ever urgent dilemma. Modern circuits feature billion of transistors, and neither power management techniques nor batteries capacity are able to endure the increasingly higher integration capability of digital devices. Besides, power concerns come together with modern nanoscale silicon technology design issues. On one hand, system failure rates are expected to increase exponentially at every technology node when integrated circuit wear-out failure mechanisms are not compensated for. However, error detection and/or correction mechanisms have a non-negligible impact on the network power. On the other hand, to meet the stringent time-to-market deadlines, the design cycle of such a distributed and heterogeneous architecture must not be prolonged by unnecessary design iterations. Overall, there is a clear need to better discriminate reliability strategies and interconnect topology solutions upfront, by ranking designs based on power metric. In this thesis, we tackle this challenge by proposing power-aware design technologies. Finally, we take into account the most aggressive and disruptive methodology for embedded systems with ultra-low power constraints, by migrating NoC basic building blocks to asynchronous (or clockless) design style. We deal with this challenge delivering a standard cell design methodology and mainstream CAD tool ows, in this way partially relaxing the requirement of using asynchronous blocks only as hard macros

    Harnessing Simulation Acceleration to Solve the Digital Design Verification Challenge.

    Full text link
    Today, design verification is by far the most resource and time-consuming activity of any new digital integrated circuit development. Within this area, the vast majority of the verification effort in industry relies on simulation platforms, which are implemented either in hardware or software. A "simulator" includes a model of each component of a design and has the capability of simulating its behavior under any input scenario provided by an engineer. Thus, simulators are deployed to evaluate the behavior of a design under as many input scenarios as possible and to identify and debug all incorrect functionality. Two features are critical in simulators for the validation effort to be effective: performance and checking/debugging capabilities. A wide range of simulator platforms are available today: on one end of the spectrum there are software-based simulators, providing a very rich software infrastructure for checking and debugging the design's functionality, but executing only at 1-10 simulation cycles per second (while actual chips operate at GHz speeds). At the other end of the spectrum, there are hardware-based platforms, such as accelerators, emulators and even prototype silicon chips, providing higher performances by 4 to 9 orders of magnitude, at the cost of very limited or non-existent checking/debugging capabilities. As a result, today, simulation-based validation is crippled: one can either have satisfactory performance on hardware-accelerated platforms or critical infrastructures for checking/debugging on software simulators, but not both. This dissertation brings together these two ends of the spectrum by presenting solutions that offer high-performance simulation with effective checking and debugging capabilities. Specifically, it addresses the performance challenge of software simulators by leveraging inexpensive off-the-shelf graphics processors as massively parallel execution substrates, and then exposing the parallelism inherent in the design model to that architecture. For hardware-based platforms, the dissertation provides solutions that offer enhanced checking and debugging capabilities by abstracting the relevant data to be logged during simulation so to minimize the cost of collection, transfer and processing. Altogether, the contribution of this dissertation has the potential to solve the challenge of digital design verification by enabling effective high-performance simulation-based validation.PHDComputer Science and EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/99781/1/dchatt_1.pd

    Vorhersagbares und zur Laufzeit adaptierbares On-Chip Netzwerk für gemischt kritische Echtzeitsysteme

    Get PDF
    The industry of safety-critical and dependable embedded systems calls for even cheaper, high performance platforms that allow flexibility and an efficient verification of safety and real-time requirements. To cope with the increasing complexity of interconnected functions and to reduce the cost and power consumption of the system, multicore systems are used to efficiently integrate different processing units in the same chip. Networks-on-chip (NoCs), as a modular interconnect, are used as a promising solution for such multiprocessor systems on chip (MPSoCs), due to their scalability and performance. For safety-critical systems, a major goal is the avoidance of hazards. For this, safety-critical systems are qualified or even certified to prove the correctness of the functioning under all possible cases. A predictable behaviour of the NoC can help to ease the qualification process of the system. To achieve the required predictability, designers have two classes of solutions: quality of service mechanisms and (formal) analysis. For mixed-criticality systems, isolation and analysis approaches must be combined to efficiently achieve the desired predictability. Traditional NoC analysis and architecture concepts tackle only a subpart of the challenges: they focus on either performance or predictability. Existing, predictable NoCs are deemed too expensive and inflexible to host a variety of applications with opposing constraints. And state-of-the-art analyses neglect certain platform properties to verify the behaviour. Together this leads to a high over-provisioning of the hardware resources as well as adverse impacts on system performance, and on the flexibility of the system. In this work we tackle these challenges and develop a predictable and runtime-adaptable NoC architecture that efficiently integrates mixed-critical applications with opposing constraints. Additionally, we present a modelling and analysis framework for NoCs that accounts for backpressure. This framework enables to evaluate the performance and reliability early at design time. Hence, the designer can assess multiple design decisions by using abstract models and formal approaches.Die Industrie der sicherheitskritischen und zuverlässigen eingebetteten Systeme verlangt nach noch günstigeren, leistungsfähigeren Plattformen, welche Flexibilität und eine effiziente Überprüfung der Sicherheits- und Echtzeitanforderungen ermöglichen. Um der zunehmenden Komplexität der zunehmend vernetzten Funktionen gerecht zu werden und die Kosten und den Stromverbrauch eines Systems zu reduzieren, werden Mehrkern-Systeme eingesetzt. On-Chip Netzwerke werden aufgrund ihrer Skalierbarkeit und Leistung als vielversprechende Lösung für solch Mehrkern-Systeme eingesetzt. Bei sicherheitskritischen Systemen ist die Vermeidung von Gefahren ein wesentliches Ziel. Dazu werden sicherheitskritische Systeme qualifiziert oder zertifiziert, um die Funktionsfähigkeit in allen möglichen Fällen nachzuweisen. Ein vorhersehbares Verhalten des on-Chip Netzwerks kann dabei helfen, den Qualifizierungsprozess des Systems zu erleichtern. Um die erforderliche Vorhersagbarkeit zu erreichen, gibt es zwei Klassen von Lösungen: Quality of Service Mechanismen und (formale) Analyse. Für Systeme mit gemischter Relevanz müssen Isolationsmechanismen und Analyseansätze kombiniert werden, um die gewünschte Vorhersagbarkeit effizient zu erreichen. Traditionelle Analyse- und Architekturkonzepte für on-Chip Netzwerke lösen nur einen Teil dieser Herausforderungen: sie konzentrieren sich entweder auf Leistung oder Vorhersagbarkeit. Existierende vorhersagbare on-Chip Netzwerke werden als zu teuer und unflexibel erachtet, um eine Vielzahl von Anwendungen mit gegensätzlichen Anforderungen zu integrieren. Und state-of-the-art Analysen vernachlässigen bzw. vereinfachen bestimmte Plattformeigenschaften, um das Verhalten überprüfen zu können. Dies führt zu einer hohen Überbereitstellung der Hardware-Ressourcen als auch zu negativen Auswirkungen auf die Systemleistung und auf die Flexibilität des Systems. In dieser Arbeit gehen wir auf diese Herausforderungen ein und entwickeln eine vorhersehbare und zur Laufzeit anpassbare Architektur für on-Chip Netzwerke, welche gemischt-kritische Anwendungen effizient integriert. Zusätzlich stellen wir ein Modellierungs- und Analyseframework für on-Chip Netzwerke vor, das den Paketrückstau berücksichtigt. Dieses Framework ermöglicht es, Designentscheidungen anhand abstrakter Modelle und formaler Ansätze frühzeitig beurteilen
    corecore