291 research outputs found

    Methodology and Ecosystem for the Design of a Complex Network ASIC

    Full text link
    Performance of HPC systems has risen steadily. While the 10 Petaflop/s barrier has been breached in the year 2011 the next large step into the exascale era is expected sometime between the years 2018 and 2020. The EXTOLL project will be an integral part in this venture. Originally designed as a research project on FPGA basis it will make the transition to an ASIC to improve its already excelling performance even further. This transition poses many challenges that will be presented in this thesis. Nowadays, it is not enough to look only at single components in a system. EXTOLL is part of complex ecosystem which must be optimized overall since everything is tightly interwoven and disregarding some aspects can cause the whole system either to work with limited performance or even to fail. This thesis examines four different aspects in the design hierarchy and proposes efficient solutions or improvements for each of them. At first it takes a look at the design implementation and the differences between FPGA and ASIC design. It introduces a methodology to equip all on-chip memory with ECC logic automatically without the user’s input and in a transparent way so that the underlying code that uses the memory does not have to be changed. In the next step the floorplanning process is analyzed and an iterative solution is worked out based on physical and logical constraints of the EXTOLL design. Besides, a work flow for collaborative design is presented that allows multiple users to work on the design concurrently. The third part concentrates on the high-speed signal path from the chip to the connector and how it is affected by technological limitations. All constraints are analyzed and a package layout for the EXTOLL chip is proposed that is seen as the optimal solution. The last part develops a cost model for wafer and package level test and raises technological concerns that will affect the testing methodology. In order to run testing internally it proposes the development of a stand-alone test platform that is able to test packaged EXTOLL chips in every aspect

    Optimal use of computing equipment in an automated industrial inspection context

    Get PDF
    This thesis deals with automatic defect detection. The objective was to develop the techniques required by a small manufacturing business to make cost-efficient use of inspection technology. In our work on inspection techniques we discuss image acquisition and the choice between custom and general-purpose processing hardware. We examine the classes of general-purpose computer available and study popular operating systems in detail. We highlight the advantages of a hybrid system interconnected via a local area network and develop a sophisticated suite of image-processing software based on it. We quantitatively study the performance of elements of the TCP/IP networking protocol suite and comment on appropriate protocol selection for parallel distributed applications. We implement our own distributed application based on these findings. In our work on inspection algorithms we investigate the potential uses of iterated function series and Fourier transform operators when preprocessing images of defects in aluminium plate acquired using a linescan camera. We employ a multi-layer perceptron neural network trained by backpropagation as a classifier. We examine the effect on the training process of the number of nodes in the hidden layer and the ability of the network to identify faults in images of aluminium plate. We investigate techniques for introducing positional independence into the network's behaviour. We analyse the pattern of weights induced in the network after training in order to gain insight into the logic of its internal representation. We conclude that the backpropagation training process is sufficiently computationally intensive so as to present a real barrier to further development in practical neural network techniques and seek ways to achieve a speed-up. Weconsider the training process as a search problem and arrive at a process involving multiple, parallel search "vectors" and aspects of genetic algorithms. We implement the system as the mentioned distributed application and comment on its performance
    • …
    corecore