Efficient Incremental Training for Deep Convolutional Neural Networks
While deep convolutional neural networks (DCNNs) have shown excellent performance in various applications such as image classification, training a DCNN model from scratch is computationally expensive and time consuming. In recent years, many studies have sought to accelerate DCNN training, but most of them operate in a one-time manner. Human learning offers a useful analogy: people typically learn more comfortably in an incremental way and may be overwhelmed when absorbing a large amount of new information at once. Motivated by this, we propose an efficient DCNN training framework that splits the whole training process into several sub-training steps and learns new classes of concepts incrementally. Experiments are conducted on CIFAR-100 with VGG-19 as the backbone network. The proposed framework achieves accuracy comparable to a model trained from scratch while delivering a 1.42x training speedup.
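The idea of splitting training into sub-steps that each introduce new classes can be sketched as follows. This is a minimal illustration, not the paper's exact schedule: the number of steps (5) and the sequential class ordering are assumptions for demonstration.

```python
# Sketch: partition CIFAR-100's 100 classes into incremental sub-training
# steps. Step count and class ordering are illustrative assumptions.

def incremental_class_schedule(num_classes, num_steps):
    """Partition class IDs into consecutive groups, one per sub-training step."""
    per_step, remainder = divmod(num_classes, num_steps)
    schedule, start = [], 0
    for step in range(num_steps):
        size = per_step + (1 if step < remainder else 0)
        schedule.append(list(range(start, start + size)))
        start += size
    return schedule

steps = incremental_class_schedule(100, 5)

# Each step introduces new classes; the model is then fine-tuned on the
# union of all classes seen so far (training call is hypothetical).
seen = []
for step_classes in steps:
    seen.extend(step_classes)
    # train_on(model, classes=seen)  # hypothetical training call
```

Each sub-training step starts from the previous step's weights, which is what makes the overall process cheaper than repeated training from scratch.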
A Power Efficient Neural Network Implementation on Heterogeneous FPGA and GPU Devices
Deep neural networks (DNNs) have seen tremendous industrial success in various applications, including image recognition, machine translation, and audio processing. However, they require massive amounts of computation and take a long time to run. This quickly becomes a problem on mobile and handheld devices, where real-time multimedia applications such as face detection, disaster management, and CCTV require lightweight, fast, and effective computing solutions. The objective of this project is to utilize specialized devices such as Field Programmable Gate Arrays (FPGAs) and Graphics Processing Units (GPUs) in a heterogeneous computing environment to accelerate deep learning computations under power-efficiency constraints. We investigate an efficient DNN implementation that uses the FPGA for fully-connected layers and the GPU for floating-point operations. This requires the DNN architecture to be implemented as a model-parallel system in which the model is broken down and processed in a distributed fashion. The proposed heterogeneous framework is implemented using an Nvidia TX2 GPU and a Xilinx Artix-7 FPGA. Experimental results indicate that the proposed framework achieves faster computation and much lower power consumption.
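The model-parallel split described above — fully-connected layers on the FPGA, floating-point-heavy layers on the GPU — amounts to a placement plan over the network's layers. A minimal sketch of that dispatch idea, assuming a simple two-kind layer taxonomy and hypothetical device labels:

```python
# Sketch of the model-parallel placement idea: route each layer kind to a
# device class. The layer list and device labels are illustrative
# assumptions, not the project's actual partitioning code.

LAYER_PLACEMENT = {
    "conv": "gpu",   # floating-point-heavy convolutions stay on the GPU
    "fc": "fpga",    # fully-connected layers are offloaded to the FPGA
}

def place_layers(layers):
    """Return (layer_name, device) pairs for a simple DNN pipeline."""
    return [(name, LAYER_PLACEMENT[kind]) for name, kind in layers]

plan = place_layers([
    ("conv1", "conv"), ("conv2", "conv"),
    ("fc1", "fc"), ("fc2", "fc"),
])
```

In a real heterogeneous deployment, each boundary between devices in this plan also implies a data transfer, so the partition point is chosen to balance compute savings against transfer overhead.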
Unconstrained Flood Event Detection Using Adversarial Data Augmentation
The world today faces extreme climate change, resulting in an increase in natural disaster events and their severity. Under these conditions, disaster information management systems have become increasingly imperative. Specifically, this paper addresses the problem of flood event detection from images under real-world conditions: images may be taken at day or night and may be blurry, clear, foggy, rainy, or captured under varying lighting conditions. All of these abnormal scenarios significantly reduce the performance of learning algorithms. In addition, many existing image classification methods use datasets of high-resolution images without considering real-world noise. In this paper, we propose a new image classification framework based on adversarial data augmentation and deep learning algorithms to address these problems. We validate the performance of the flood event detection framework on a real-world noisy visual dataset collected from social networks.
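A common primitive for adversarial data augmentation is a small gradient-sign perturbation of the input (FGSM-style); the paper's exact augmentation method may differ, so the sketch below is a generic illustration of the idea, with a random array standing in for a real image and loss gradient.

```python
import numpy as np

# Minimal FGSM-style perturbation sketch: nudge each pixel in the direction
# of the loss gradient's sign, then clip back to the valid [0, 1] range.
# This is a generic adversarial-augmentation primitive, not necessarily the
# paper's exact method.

def fgsm_augment(image, grad, eps=0.03):
    """Return an adversarially perturbed copy of `image`."""
    adv = image + eps * np.sign(grad)
    return np.clip(adv, 0.0, 1.0)

rng = np.random.default_rng(0)
img = rng.uniform(0.0, 1.0, size=(32, 32, 3))
grad = rng.normal(size=img.shape)  # stand-in for a real loss gradient
aug = fgsm_augment(img, grad)
```

Training on such perturbed copies alongside the originals is one way to make a classifier more robust to the noisy, degraded imagery found on social networks.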
A preclinical positron emission tomography (PET) and electron paramagnetic resonance imaging (EPRI) hybrid system: PET detector module
We report a PET detector module (DM) designed for developing a preclinical positron emission tomography (PET)-electron paramagnetic resonance imaging (EPRI) hybrid system. The DM consists of a linear array of eight detector units (DUs), each of which is a 12×12 array of lutetium–yttrium oxyorthosilicate (LYSO) crystals read by a 4×4 silicon photomultiplier (SiPM) array. The crystal size is approximately 1.0×1.0×10 mm3. All surfaces of the crystals are polished; except for the surface coupled to the SiPMs, they are also covered with BaSO4 to reduce light loss. The pitches of the LYSO and SiPM arrays are about 1.05 mm and 3.2 mm, respectively. The front face of the resulting DM is about 1.28×10.24 cm2 in extent and its thickness is approximately 1.8 cm. A highly multiplexed readout is devised to produce only six outputs per DM: two outputs derived from the SiPM cathode signals for determining the event time and the active DU, and four outputs derived from the SiPM anode signals for determining the event energy and the active crystal within the active DU. At present, these outputs are acquired by waveform sampling and analyzed offline. We have successfully developed two DMs, both showing well-discriminated DUs and crystals and an average energy resolution of about 15%. Although time-of-flight (ToF) capability is not needed for the proposed system, our data show that the DM can potentially achieve a 300-400 ps ToF resolution.