
    Evaluation of Single-Chip, Real-Time Tomographic Data Processing on FPGA - SoC Devices

    A novel approach to tomographic data processing has been developed and evaluated using the Jagiellonian PET (J-PET) scanner as an example. We propose a system that does not need a powerful processing facility local to the scanner and capable of reconstructing images on the fly. Instead, we introduce a Field Programmable Gate Array (FPGA) System-on-Chip (SoC) platform connected directly to the data streams coming from the scanner. The programmable logic performs event building, filtering, coincidence search and Region-Of-Response (ROR) reconstruction, and the integrated processors handle visualization. The platform significantly reduces data volume by converting raw data to a list-mode representation, while generating visualization on the fly. Comment: IEEE Transactions on Medical Imaging, 17 May 201
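    The coincidence search named in this abstract can be illustrated in software. The following is a minimal sketch, not the J-PET firmware: it assumes hits arrive as hypothetical `(time_ns, detector_id)` tuples and pairs time-adjacent hits from different detectors that fall within a fixed window. The function name, data layout and window value are all assumptions made for illustration.

```python
def find_coincidences(hits, window_ns):
    """Pair time-adjacent hits from different detectors within a time window.

    hits: iterable of (time_ns, detector_id) tuples (hypothetical layout).
    window_ns: maximum time difference for two hits to count as coincident.
    """
    ordered = sorted(hits)  # sort by timestamp (ties broken by detector id)
    pairs = []
    i = 0
    while i + 1 < len(ordered):
        first, second = ordered[i], ordered[i + 1]
        close_in_time = second[0] - first[0] <= window_ns
        different_detectors = first[1] != second[1]
        if close_in_time and different_detectors:
            pairs.append((first, second))
            i += 2  # both hits are consumed by this pair
        else:
            i += 1  # drop the earlier hit and try the next neighbour
    return pairs


if __name__ == "__main__":
    hits = [(0.0, 1), (1.0, 5), (50.0, 2), (100.0, 3), (101.5, 7)]
    print(find_coincidences(hits, window_ns=3.0))
```

    A hardware implementation would apply the same window test to streaming data rather than a sorted list, but the pairing rule is the same idea.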

    Computer hardware and software for robotic control

    The KSC has implemented an integrated system that coordinates state-of-the-art robotic subsystems. It is a sensor-based, real-time robotic control system that performs operations beyond the capability of an off-the-shelf robot. The integrated system provides real-time, closed-loop adaptive path control of the position and orientation of all six axes of a large robot; enables the implementation of a highly configurable, expandable testbed for sensor system development; and makes several smart distributed control subsystems (robot arm controller, process controller, graphics display, and vision tracking) appear as intelligent peripherals to a supervisory computer coordinating the overall system.

    An improved artificial dendrite cell algorithm for abnormal signal detection

    In the dendrite cell algorithm (DCA), the abnormality of a data point is determined by comparing the multi-context antigen value (MCAV) with an anomaly threshold. The limitation of the existing threshold is that its value must be determined before mining, based on previous information, and the existing MCAV is inefficient when exposed to extreme values. As a result, the DCA fails to detect new data points whose pattern behaves differently from previous information, which reduces detection accuracy. This paper proposes an improved anomaly threshold solution for the DCA using the statistical cumulative sum (CUSUM), with the aim of improving its detection capability. In the proposed approach, the MCAVs are normalized with the upper CUSUM, and the new anomaly threshold is calculated at run time by considering the acceptance value and the minimum MCAV. In experiments on 12 benchmark and two outbreak datasets, the improved DCA showed better detection results than its previous version in terms of sensitivity, specificity, false detection rate and accuracy.
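    The abstract does not give the paper's formulas, so the following is only a sketch of the standard one-sided (upper) CUSUM statistic applied to a sequence of MCAV-like scores. The `target`, `slack` and `threshold` parameters are hypothetical names and values chosen for illustration; how the paper combines the acceptance value and the minimum MCAV into its threshold is not reproduced here.

```python
def upper_cusum(values, target, slack):
    """Standard upper CUSUM: accumulate deviations above target + slack.

    The running sum is reset to zero whenever it would go negative, so it
    only grows while values stay persistently above the target.
    """
    sums = []
    running = 0.0
    for v in values:
        running = max(0.0, running + (v - target - slack))
        sums.append(running)
    return sums


def flag_anomalies(values, target, slack, threshold):
    """Flag each point whose upper CUSUM exceeds a (hypothetical) threshold."""
    return [s > threshold for s in upper_cusum(values, target, slack)]


if __name__ == "__main__":
    mcav = [0.1, 0.1, 0.9, 0.9]  # made-up scores, not data from the paper
    print(upper_cusum(mcav, target=0.1, slack=0.05))
    print(flag_anomalies(mcav, target=0.1, slack=0.05, threshold=0.5))
```

    Because the statistic accumulates, a sustained shift in the scores crosses the threshold even when no single value is extreme, which is the property that makes CUSUM attractive for run-time thresholds.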

    Performance evaluation over HW/SW co-design SoC memory transfers for a CNN accelerator

    Many FPGA vendors have recently included embedded processors in their devices, such as Xilinx with ARM Cortex-A cores, together with programmable logic cells. These devices are known as Programmable System on Chip (PSoC). Their ARM cores (embedded in the processing system, or PS) communicate with the programmable logic cells (PL) using ARM-standard AXI buses. In this paper we analyse the performance of exhaustive data transfers between PS and PL for a Xilinx Zynq FPGA in a real co-design scenario for a Convolutional Neural Network (CNN) accelerator, which processes, in dedicated hardware, a stream of visual information from a neuromorphic visual sensor for classification. On the PS side, a Linux operating system is running, which collects visual events from the neuromorphic sensor into a normalized frame, transfers these frames to the multi-layer CNN accelerator, and reads back the results, using an AXI-DMA bus on a per-layer basis. As these kinds of accelerators try to process information as quickly as possible, data bandwidth becomes critical, and maintaining a well-balanced data throughput rate requires some considerations. We present and evaluate several data partitioning techniques to improve the balance between RX and TX transfers, and two different ways of managing transfers: through a polling routine at the user level of the OS, and through a dedicated interrupt-based kernel-level driver. We demonstrate that, for sufficiently long packets, the kernel-level driver solution achieves better timing when computing a CNN classification example. The main advantage of using a kernel-level driver is that it gives safer solutions and lets the OS schedule other tasks important to our application, such as frame collection from sensors and their normalization. Ministerio de Economía y Competitividad TEC2016-77785-

    FPGA Implementation of Convolutional Neural Networks with Fixed-Point Calculations

    Neural network-based methods for image processing are becoming widely used in practical applications. Modern neural networks are computationally expensive and require specialized hardware, such as graphics processing units. Since such hardware is not always available in real-life applications, there is a compelling need for the design of neural networks for mobile devices. Mobile neural networks typically have a reduced number of parameters and require a relatively small number of arithmetic operations. However, they are usually still executed at the software level and use floating-point calculations. The use of mobile networks without further optimization may not provide sufficient performance when high processing speed is required, for example, in real-time video processing (30 frames per second). In this study, we suggest optimizations to speed up computations in order to efficiently use already trained neural networks on a mobile device. Specifically, we propose an approach for speeding up neural networks by moving computation from software to hardware and by using fixed-point calculations instead of floating-point. We propose a number of methods for neural network architecture design to improve the performance with fixed-point calculations. We also show an example of how existing datasets can be modified and adapted for the recognition task at hand. Finally, we present the design and the implementation of a field-programmable gate array (FPGA)-based device to solve the practical problem of real-time handwritten digit classification from a mobile camera video feed.
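    To make the fixed-point idea concrete, here is a minimal sketch of signed fixed-point arithmetic as it might be modelled in software before moving to hardware. It is not the paper's design: the 16-bit word with 8 fractional bits, the saturation behaviour and all function names are assumptions chosen for illustration.

```python
def to_fixed(x, frac_bits=8, total_bits=16):
    """Quantize a float to a signed fixed-point integer with saturation."""
    scaled = round(x * (1 << frac_bits))  # nearest integer (ties go to even)
    lowest = -(1 << (total_bits - 1))
    highest = (1 << (total_bits - 1)) - 1
    return max(lowest, min(highest, scaled))


def to_float(q, frac_bits=8):
    """Convert a fixed-point integer back to a float for inspection."""
    return q / (1 << frac_bits)


def fixed_mul(a, b, frac_bits=8):
    """Multiply two fixed-point values and rescale with an arithmetic shift."""
    return (a * b) >> frac_bits


def fixed_dot(weights, inputs, frac_bits=8):
    """Dot product with a wide accumulator and a single final rescale."""
    acc = 0
    for w, x in zip(weights, inputs):
        acc += w * x  # products carry 2 * frac_bits fractional bits
    return acc >> frac_bits


if __name__ == "__main__":
    w = [to_fixed(0.5), to_fixed(0.25)]
    x = [to_fixed(1.0), to_fixed(2.0)]
    print(to_float(fixed_dot(w, x)))  # 0.5 * 1.0 + 0.25 * 2.0
```

    Accumulating the raw products and shifting once at the end loses less precision than rescaling after every multiplication, which is why the sketch keeps a wide accumulator.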