8 research outputs found

    Design, Implementation, and Test of a Multi-Model Systolic Neural-Network Accelerator

    Get PDF

    Studies of inspection algorithms and associated microprogrammable hardware implementations

    Get PDF
    This work is concerned with the design and development of real-time algorithms for industrial inspection applications. Rather than implement algorithms in dedicated hardware, microprogrammable machines were considered essential in order to maintain flexibility. After a survey of image pattern recognition where algorithms applicable to real-time use are cited, this thesis presents industrial inspection algorithms that locate and scrutinise actual manufactured products. These are fast and robust - a necessary requirement in industrial environments. The National Physical Laboratory have developed a Linear Array Processor (LAP) specifically designed for industrial recognition work. As with most array processors, the LAP has a greater performance than conventional processors, yet is strictly limited to parallel algorithms for optimum performance. It was therefore necessary to incorporate sequentialism into the design of a multiprocessor system. A microcoded bit-slice Sequential Image Processor (SIP) has been designed and built at RHBNC in conjunction with the NPL. This was primarily intended as a post-processor for the LAP based on the VMEbus but in fact has proved its usefulness as a stand-alone processor. This is described along with an assembler written for SIP which translates assembly language mnemonics to microcode. This work, which includes a review of current architectures, leads to the specification of a hybrid (SIMD/NIMD) architecture consisting of multiple autonomous sequential processors. This involves an analysis of various configurations and entails an investigation of the source of bottlenecks within each design. Such systems require a significant amount of interprocessor communication: methods for achieving this are discussed, some of which have only become practical with the decrease incost of electronic components. This eventually leads to a system for which algorithm execution speed increases approximately linearly with the number of processors. The algorithms described in earlier chapters are examined on the system and the practicalities of such a design are analysed in detail. Overall, this thesis has arrived at designs of programmable real-time inspection systems, and has obtained guidelines which will help with the implementation of future inspection systems.<p

    Decompose and Conquer: Addressing Evasive Errors in Systems on Chip

    Full text link
    Modern computer chips comprise many components, including microprocessor cores, memory modules, on-chip networks, and accelerators. Such system-on-chip (SoC) designs are deployed in a variety of computing devices: from internet-of-things, to smartphones, to personal computers, to data centers. In this dissertation, we discuss evasive errors in SoC designs and how these errors can be addressed efficiently. In particular, we focus on two types of errors: design bugs and permanent faults. Design bugs originate from the limited amount of time allowed for design verification and validation. Thus, they are often found in functional features that are rarely activated. Complete functional verification, which can eliminate design bugs, is extremely time-consuming, thus impractical in modern complex SoC designs. Permanent faults are caused by failures of fragile transistors in nano-scale semiconductor manufacturing processes. Indeed, weak transistors may wear out unexpectedly within the lifespan of the design. Hardware structures that reduce the occurrence of permanent faults incur significant silicon area or performance overheads, thus they are infeasible for most cost-sensitive SoC designs. To tackle and overcome these evasive errors efficiently, we propose to leverage the principle of decomposition to lower the complexity of the software analysis or the hardware structures involved. To this end, we present several decomposition techniques, specific to major SoC components. We first focus on microprocessor cores, by presenting a lightweight bug-masking analysis that decomposes a program into individual instructions to identify if a design bug would be masked by the program's execution. We then move to memory subsystems: there, we offer an efficient memory consistency testing framework to detect buggy memory-ordering behaviors, which decomposes the memory-ordering graph into small components based on incremental differences. We also propose a microarchitectural patching solution for memory subsystem bugs, which augments each core node with a small distributed programmable logic, instead of including a global patching module. In the context of on-chip networks, we propose two routing reconfiguration algorithms that bypass faulty network resources. The first computes short-term routes in a distributed fashion, localized to the fault region. The second decomposes application-aware routing computation into simple routing rules so to quickly find deadlock-free, application-optimized routes in a fault-ridden network. Finally, we consider general accelerator modules in SoC designs. When a system includes many accelerators, there are a variety of interactions among them that must be verified to catch buggy interactions. To this end, we decompose such inter-module communication into basic interaction elements, which can be reassembled into new, interesting tests. Overall, we show that the decomposition of complex software algorithms and hardware structures can significantly reduce overheads: up to three orders of magnitude in the bug-masking analysis and the application-aware routing, approximately 50 times in the routing reconfiguration latency, and 5 times on average in the memory-ordering graph checking. These overhead reductions come with losses in error coverage: 23% undetected bug-masking incidents, 39% non-patchable memory bugs, and occasionally we overlook rare patterns of multiple faults. In this dissertation, we discuss the ideas and their trade-offs, and present future research directions.PHDComputer Science & EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttps://deepblue.lib.umich.edu/bitstream/2027.42/147637/1/doowon_1.pd

    Algorithm to layout (ATL) systems for VLSI design

    Get PDF
    PhD ThesisThe complexities involved in custom VLSI design together with the failure of CAD techniques to keep pace with advances in the fabrication technology have resulted in a design bottleneck. Powerful tools are required to exploit the processing potential offered by the densities now available. Describing a system in a high level algorithmic notation makes writing, understanding, modification, and verification of a design description easier. It also removes some of the emphasis on the physical issues of VLSI design, and focus attention on formulating a correct and well structured design. This thesis examines how current trends in CAD techniques might influence the evolution of advanced Algorithm To Layout (ATL) systems. The envisaged features of an example system are specified. Particular attention is given to the implementation of one its features COPTS (Compilation Of Occam Programs To Schematics). COPTS is capable of generating schematic diagrams from which an actual layout can be derived. It takes a description written in a subset of Occam and generates a high level schematic diagram depicting its realisation as a VLSI system. This diagram provides the designer with feedback on the relative placement and interconnection of the operators used in the source code. It also gives a visual representation of the parallelism defined in the Occam description. Such diagrams are a valuable aid in documenting the implementation of a design. Occam has also been selected as the input to the design system that COPTS is a feature of. The choice of Occam was made on the assumption that the most appropriate algorithmic notation for such a design system will be a suitable high level programming language. This is in contrast to current automated VLSI design systems, which typically use a hardware des~ription language for input. These special purpose languages currently concentrate on handling structural/behavioural information and have limited ability to express algorithms. Using a language such as Occam allows a designer to write a behavioural description which can be compiled and executed as a simulator, or prototype, of the system. The programmability introduced into the design process enables designers to concentrate on a design's underlying algorithm. The choice of this algorithm is the most crucial decision since it determines the performance and area of the silicon implementation. The thesis is divided into four sections, each of several chapters. The first section considers VLSI design complexity, compares the expert systems and silicon compilation approaches to tackling it, and examines its parallels with software complexity. The second section reviews the advantages of using a conventional programming language for VLSI system descriptions. A number of alternative high level programming languages are considered for application in VLSI design. The third section defines the overall ATL system COPTS is envisaged to be part of, and considers the schematic representation of Occam programs. The final section presents a summary of the overall project and suggestions for future work on realising the full ATL system

    MULTI-OBJECTIVE DESIGN AUTOMATION FOR RECONFIGURABLE MULTI-PROCESSOR SYSTEMS

    Get PDF
    Ph.DDOCTOR OF PHILOSOPH
    corecore