808 research outputs found

    Radiation safety based on the sky shine effect in reactor

    During reactor operation, neutrons and gamma rays are the dominant forms of radiation. For protection, lead and concrete shields are built around the reactor, but radiation can still escape upward through the water shielding of the reactor pool. This gives rise to sky shine, a phenomenon in which radiation emitted skyward from a nuclear source is scattered by the atmosphere back down toward the surrounding area. The scattered radiation increases the radiation dose around the facility, and high doses of exposure can cause stochastic or deterministic effects in an exposed person. Therefore, this study was conducted to measure the radiation dose from the sky shine effect scattered around the reactor at different distances and at different heights above the reactor platform. In this paper, the radiation dose of the sky shine effect was measured using the experimental method…
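    (Illustrative note, not from the paper: the study measures sky shine doses experimentally, but a toy point-source model gives a feel for how direct dose falls off with distance. The source term and the air attenuation coefficient mu_air below are assumed placeholders, not values from the study.)

    import math

    def dose_rate(source_term, distance_m, mu_air=0.006):
        # Toy estimate: inverse-square falloff with exponential attenuation
        # in air. Purely illustrative; NOT the paper's measurement method,
        # and mu_air (1/m) is an assumed effective coefficient.
        return source_term * math.exp(-mu_air * distance_m) / (4 * math.pi * distance_m ** 2)

    # Example: relative dose at 10 m versus 50 m from a unit source.
    print(dose_rate(1.0, 10.0) / dose_rate(1.0, 50.0))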

    Hardware Acceleration of Deep Convolutional Neural Networks on FPGA

    The rapid improvement in computation capability has made deep convolutional neural networks (CNNs) a great success in recent years on many computer vision tasks, with significantly improved accuracy. During the inference phase, many applications demand low-latency processing of one image under strict power-consumption requirements, which reduces the efficiency of GPUs and other general-purpose platforms and brings opportunities for specialized acceleration hardware, e.g. FPGAs, by customizing the digital circuit specifically for deep-learning inference. However, deploying CNNs on portable and embedded systems is still challenging due to the large data volume, intensive computation, varying algorithm structures, and frequent memory accesses. This dissertation proposes a complete design methodology and framework to accelerate the inference process of various CNN algorithms on FPGA hardware with high performance, efficiency, and flexibility.

    As convolution contributes most of the operations in CNNs, the convolution acceleration scheme significantly affects the efficiency and performance of a hardware CNN accelerator. Convolution involves multiply-and-accumulate (MAC) operations with four levels of loops. Without fully studying the convolution loop optimization before the hardware design phase, the resulting accelerator can hardly exploit data reuse or manage data movement efficiently. This work overcomes these barriers by quantitatively analyzing and optimizing the design objectives (e.g. memory access) of the CNN accelerator with respect to multiple design variables, and proposes an efficient dataflow and hardware architecture for CNN acceleration that minimizes data communication while maximizing resource utilization to achieve high performance.

    Although great performance and efficiency can be achieved by customizing the FPGA hardware for each CNN model, significant effort and expertise are required, leading to long development times that make it difficult to keep up with the rapid development of CNN algorithms. This work therefore presents an RTL-level CNN compiler that automatically generates customized FPGA hardware for the inference tasks of various CNNs, enabling fast high-level prototyping of CNNs from software to FPGA while keeping the benefits of low-level hardware optimization. First, a general-purpose library of RTL modules is developed to model the different operations at each layer. The integration and dataflow of the physical modules are predefined in a top-level system template and reconfigured during compilation for a given CNN algorithm. The runtime control of layer-by-layer sequential computation is managed by the proposed execution schedule, so that even highly irregular and complex network topologies, e.g. GoogLeNet and ResNet, can be compiled. The proposed methodology is demonstrated with various CNN algorithms, e.g. NiN, VGG, GoogLeNet, and ResNet, on two different standalone FPGAs, achieving state-of-the-art performance.

    Beyond the optimized acceleration strategy, many design options remain, e.g. the degree and dimension of computation parallelism, the size of on-chip buffers, and the external memory bandwidth, which affect the utilization of computation resources and the efficiency of data communication, and ultimately the performance and energy consumption of the accelerator. The large design space of the accelerator makes it impractical to explore the optimal design choice during the real implementation phase. Therefore, a performance model is proposed in this work to quantitatively estimate the accelerator performance and resource utilization. By this means, the performance bottleneck and design bounds can be identified, and the optimal design option can be explored early in the design phase.
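    (To make the "four levels of loops" concrete, here is a minimal sketch of direct convolution as a nested MAC loop; the array shapes and names are my assumptions, not the dissertation's code. The order, tiling, and unrolling of exactly these loops are the design variables such an accelerator explores.)

    import numpy as np

    def conv_layer(ifm, weights):
        # Direct convolution as a four-level MAC loop nest.
        # ifm:     (M, H, W)     input feature maps
        # weights: (N, M, K, K)  N output channels, K x K kernels
        # returns: (N, H-K+1, W-K+1) output feature maps (stride 1, no padding)
        N, M, K, _ = weights.shape
        _, H, W = ifm.shape
        ofm = np.zeros((N, H - K + 1, W - K + 1))
        for n in range(N):                  # Loop-4: across output feature maps
            for y in range(H - K + 1):      # Loop-3: across one feature map (rows)
                for x in range(W - K + 1):  #         and columns
                    for m in range(M):      # Loop-2: across input feature maps
                        for ky in range(K):         # Loop-1: within the
                            for kx in range(K):     #         kernel window
                                ofm[n, y, x] += (weights[n, m, ky, kx]
                                                 * ifm[m, y + ky, x + kx])
        return ofm

    Reordering or tiling any of these loops changes which operands can be reused from on-chip buffers, which is why loop optimization must precede the hardware design.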

    Design and implementation of high-performance memory systems for future packet buffers

    In this paper, we address the design of a future high-speed router that supports line rates as high as OC-3072 (160 Gb/s), around one hundred ports, and several service classes. Building such a high-speed router raises many technological problems, one of them being the packet buffer design, mainly because router design must provide worst-case bandwidth guarantees rather than just average-case optimizations. A previous packet buffer design provides worst-case bandwidth guarantees by using a hybrid SRAM/DRAM approach. Next-generation routers need to support hundreds of interfaces (i.e., ports and service classes). Unfortunately, providing high bandwidth for hundreds of interfaces requires the previous design to use large SRAMs, which become a bandwidth bottleneck. The key observation we make is that the SRAM size is proportional to the DRAM access time, and that we can reduce the effective DRAM access time by overlapping multiple accesses to different banks, allowing us to reduce the SRAM size. The key challenge is that, to keep the worst-case bandwidth guarantees, we must guarantee that there are no bank conflicts while the accesses are in flight. We guarantee the absence of bank conflicts by reordering the DRAM requests using an issue-queue-like mechanism. Because our design may lead to fragmentation of memory across packet buffer queues, we propose to share the DRAM space among multiple queues by renaming the queue slots. To the best of our knowledge, the design proposed in this paper is the fastest buffer design using commodity DRAM published to date.
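    (A rough illustration of issue-queue-like reordering, under my own assumptions rather than the paper's mechanism: a single request issues per cycle and T_BANK_BUSY is a placeholder bank cycle time.)

    T_BANK_BUSY = 4  # assumed DRAM bank busy time, in controller cycles

    def reorder_issue(requests, num_banks):
        # Each cycle, issue the oldest pending request whose DRAM bank is
        # free, so accesses to different banks overlap in time.
        # requests: bank index of each DRAM request, in arrival order.
        pending = list(enumerate(requests))   # (arrival_order, bank)
        busy_until = [0] * num_banks          # cycle at which each bank frees up
        issue_order, cycle = [], 0
        while pending:
            for pos, (idx, bank) in enumerate(pending):
                if busy_until[bank] <= cycle:  # oldest conflict-free request wins
                    busy_until[bank] = cycle + T_BANK_BUSY
                    issue_order.append(idx)
                    del pending[pos]
                    break
            cycle += 1
        return issue_order

    # Example: requests to banks [0, 0, 1, 2]. The two bank-0 requests are
    # separated while the bank-1 and bank-2 requests fill the gap.
    print(reorder_issue([0, 0, 1, 2], num_banks=4))  # -> [0, 2, 3, 1]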

    A DRAM/SRAM memory scheme for fast packet buffers

    We address the design of high-speed packet buffers for Internet routers. We use a general DRAM/SRAM architecture of which previous proposals can be seen as particular cases. For this architecture, large SRAMs are needed to sustain high line rates and a large number of interfaces. A novel algorithm for DRAM bank allocation is presented that reduces the SRAM size requirements of previously proposed schemes by almost an order of magnitude, without introducing memory fragmentation problems. A technological evaluation shows that our design can support thousands of queues at line rates of up to 160 Gb/s.
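    (A minimal sketch of conflict-avoiding bank allocation in this spirit; the avoidance-window policy and the HISTORY constant below are my assumptions, not the paper's algorithm.)

    HISTORY = 3  # assumed: how many recent banks a queue must avoid reusing

    class BankAllocator:
        # Hypothetical allocator: consecutive slots of one queue are steered
        # to different DRAM banks, so a burst of reads from that queue never
        # hits the same bank twice within the overlap window.
        # Assumes num_banks > HISTORY so a free bank always exists.
        def __init__(self, num_banks):
            self.num_banks = num_banks
            self.recent = {}  # queue id -> banks used, oldest first

        def allocate(self, queue_id):
            used = self.recent.setdefault(queue_id, [])
            bank = next(b for b in range(self.num_banks)
                        if b not in used[-HISTORY:])
            used.append(bank)
            return bank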

    Performance Improvement in Packet Buffers for High Bandwidth Routers

    High-speed routers rely on well-designed packet buffers that support multiple queues, provide large capacity, and offer short response times. Some researchers have suggested combined SRAM/DRAM hierarchical buffer architectures to meet these challenges. However, these architectures suffer from either a large SRAM requirement or high time complexity in memory management. In this paper, we present a scalable, efficient, and novel distributed packet buffer architecture. Two fundamental issues need to be addressed to make this architecture feasible: 1) how to minimize the overhead of an individual packet buffer; and 2) how to design scalable packet buffers using independent buffer subsystems. We address these issues by first designing an efficient compact buffer that reduces the SRAM size requirement by (k - 1)/k. Then, we introduce a feasible way of coordinating multiple subsystems with a load-balancing algorithm that maximizes the overall system performance. Both theoretical analysis and experimental results demonstrate that our load-balancing algorithm and the distributed packet buffer architecture can easily scale to meet the buffering needs of high-bandwidth links and satisfy the requirements of scale and support for multiple queues.
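    (A toy illustration of the coordination idea; the join-the-least-loaded policy shown here is an assumption of mine, and the paper's actual load-balancing algorithm is more elaborate.)

    def dispatch(packet_len, loads):
        # Send the packet to the least-loaded buffer subsystem and update
        # its occupancy; a simple stand-in for the balancing algorithm.
        target = min(range(len(loads)), key=loads.__getitem__)
        loads[target] += packet_len
        return target

    # Example: three subsystems with occupancies of 10, 4, and 7 bytes.
    loads = [10, 4, 7]
    print(dispatch(64, loads), loads)  # -> 1 [10, 68, 7]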