
    NullHop: A Flexible Convolutional Neural Network Accelerator Based on Sparse Representations of Feature Maps

    Convolutional neural networks (CNNs) have become the dominant neural network architecture for solving many state-of-the-art (SOA) visual processing tasks. Even though Graphics Processing Units (GPUs) are most often used in training and deploying CNNs, their power efficiency is less than 10 GOp/s/W for single-frame runtime inference. We propose a flexible and efficient CNN accelerator architecture called NullHop that implements SOA CNNs useful for low-power and low-latency application scenarios. NullHop exploits the sparsity of neuron activations in CNNs to accelerate the computation and reduce memory requirements. The flexible architecture allows high utilization of available computing resources across kernel sizes ranging from 1x1 to 7x7. NullHop can process up to 128 input and 128 output feature maps per layer in a single pass. We implemented the proposed architecture on a Xilinx Zynq FPGA platform and present results showing how our implementation reduces external memory transfers and compute time in five different CNNs, ranging from small ones up to the widely known large VGG16 and VGG19 CNNs. Post-synthesis simulations using Mentor Modelsim in a 28 nm process with a clock frequency of 500 MHz show that the VGG19 network achieves over 450 GOp/s. By exploiting sparsity, NullHop achieves an efficiency of 368%, maintains over 98% utilization of the MAC units, and achieves a power efficiency of over 3 TOp/s/W in a core area of 6.3 mm². As further proof of NullHop's usability, we interfaced its FPGA implementation with a neuromorphic event camera for real-time interactive demonstrations.
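    The central trick, never storing or multiplying zero activations, can be illustrated with a minimal Python sketch of sparsity-map compression: a binary mask marks non-zero pixels and only the non-zero values are kept. This is an illustration in the spirit of the abstract, not NullHop's exact hardware format; all names below are hypothetical.

```python
import numpy as np

def encode_sparse(fmap):
    """Encode a feature map as (binary sparsity map, non-zero values).

    One bit per pixel records whether the value is non-zero,
    and only the non-zero values are stored.
    """
    mask = fmap != 0
    return mask, fmap[mask]

def decode_sparse(mask, values):
    """Inverse of encode_sparse: scatter values back into a dense map."""
    fmap = np.zeros(mask.shape, dtype=values.dtype)
    fmap[mask] = values
    return fmap

# ReLU feature maps are typically very sparse, so the encoding is compact.
fmap = np.maximum(np.random.randn(64, 64).astype(np.float32), 0.0)
mask, values = encode_sparse(fmap)
ratio = (mask.size / 8 + values.nbytes) / fmap.nbytes  # bitmask bytes + value bytes
print(f"compressed to {ratio:.0%} of dense size")
assert np.array_equal(decode_sparse(mask, values), fmap)
```

    A MAC array fed from such a representation only ever sees the non-zero entries, which is where the reported >98% MAC utilization comes from.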

    Real-Time Deep Learning-Based Face Recognition System

    This research proposes real-time deep learning-based face recognition algorithms using MATLAB and Python. Face recognition is the process through which people are identified from facial images, a technology applied broadly in biometrics, information security, access control, etc. A facial recognition system is built in two steps: first, facial features are extracted; second, the features are classified. Deep learning, specifically the convolutional neural network (CNN), has recently driven major progress in face recognition technology. CNNs have shown excellent performance in many fields when trained on large datasets such as ImageNet; however, hardware limitations and insufficient training data often prevent such performance in practice. Therefore, in this work, transfer learning is used to improve the performance of the face recognition system even with a small number of training images. Two pre-trained models are used: GoogLeNet CNN (in MATLAB) and FaceNet (in Python). Transfer learning fine-tunes the last layer of each CNN for the new classification task. FaceNet presents a unified system for face verification (is this the same person?), recognition (who is this person?), and clustering (find common people among these faces), based on learning a Euclidean embedding per image with a deep convolutional network.
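    A minimal PyTorch sketch of the last-layer fine-tuning described above, using a GoogLeNet pre-trained on ImageNet (the MATLAB workflow is analogous; the identity count, learning rate, and dummy batch are placeholders):

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_IDENTITIES = 10  # placeholder: number of people in the new dataset

# Load an ImageNet-pretrained GoogLeNet and freeze its backbone.
model = models.googlenet(weights=models.GoogLeNet_Weights.IMAGENET1K_V1)
for param in model.parameters():
    param.requires_grad = False

# Replace the final classification layer for the new task;
# only this layer is trained (transfer learning via last-layer fine-tuning).
model.fc = nn.Linear(model.fc.in_features, NUM_IDENTITIES)

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# One training step on a dummy batch (replace with a real face dataset).
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, NUM_IDENTITIES, (8,))
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
```

    Freezing the backbone keeps the general visual features learned on ImageNet while the small face dataset only has to supply enough signal for one linear layer, which is why this works with few images.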

    YOLO-Based Panoptic Segmentation (Segmentation Panoptique basée sur YOLO)

    Given the recent challenge of panoptic segmentation, where every pixel in an image must be given a label, as in semantic segmentation, and an instance id, a new YOLO-based architecture is proposed here for this computer vision task. This network uses the YOLOv3 architecture plus parallel semantic and instance segmentation heads to perform full scene parsing. A set of solutions for each of these two segmentation tasks is proposed and evaluated, where a Pyramid Pooling Module is found to be the best semantic feature extractor given a set of feature maps from the Darknet-53 backbone network. The network gives good segmentation results for both stuff and thing classes when trained with a frozen backbone: boundaries between background classes are consistent with the ground truth, and the instance masks closely match the true shapes of the objects present in a scene.
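    A minimal PyTorch sketch of a Pyramid Pooling Module in the PSPNet style, which is the kind of semantic feature extractor the abstract selects (pooling scales, channel counts, and names are conventional choices, not taken from the paper):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidPoolingModule(nn.Module):
    """PSPNet-style PPM: pool the feature map at several scales,
    project each pooled map with a 1x1 conv, upsample back to the
    input resolution, and concatenate with the input features."""

    def __init__(self, in_channels, branch_channels=128, scales=(1, 2, 3, 6)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.AdaptiveAvgPool2d(scale),
                nn.Conv2d(in_channels, branch_channels, kernel_size=1, bias=False),
                nn.BatchNorm2d(branch_channels),
                nn.ReLU(inplace=True),
            )
            for scale in scales
        )

    def forward(self, x):
        h, w = x.shape[2:]
        pooled = [
            F.interpolate(branch(x), size=(h, w), mode="bilinear", align_corners=False)
            for branch in self.branches
        ]
        return torch.cat([x] + pooled, dim=1)

# Example: features from a backbone stage (batch 1, 512 channels, 16x16).
ppm = PyramidPoolingModule(512).eval()
with torch.no_grad():
    out = ppm(torch.randn(1, 512, 16, 16))
print(out.shape)  # torch.Size([1, 1024, 16, 16]) = 512 + 4 * 128
```

    The multi-scale pooling gives the segmentation head both global context (the 1x1 branch) and finer regional context, which helps keep stuff-class boundaries consistent.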

    Naval Mine Detection and Seabed Segmentation in Sonar Images with Deep Learning

    Underwater mines are a cost-effective method in asymmetric warfare and are commonly used to block shipping lanes and restrict naval operations. Consequently, they threaten commercial and military vessels, disrupt humanitarian aid, and damage sea environments. There is strong international interest in using sonars and AI for mine countermeasures and undersea surveillance. High-resolution imaging sonars are well suited for detecting underwater mines and other targets; compared to other sensors, sonars are more effective in undersea environments with low visibility. This project investigates deep learning algorithms for two important tasks in undersea surveillance: naval mine detection and seabed terrain segmentation. Our goal is to automatically classify the composition of the seabed and localise naval mines. This research uses real sonar data provided by the Defence Science and Technology Group (DSTG). To conduct the experiments, we annotated 150 sonar images for semantic segmentation, with annotation guided by experts from the DSTG. We also used 152 sonar images with mine detection annotations prepared by members of the Centre for Signal and Information Processing at the University of Wollongong. Our results show Faster R-CNN achieving the highest performance in object detection. We evaluated transfer learning and data augmentation for object detection; these methods improved our detection model's mAP by 11.9% and 16.9% and its mAR by 17.8% and 21.1%, respectively. Furthermore, we developed a data augmentation algorithm called Evolutionary Cut-Paste, which yielded a 20.2% increase in performance. For segmentation, we found that highly tuned DeepLabV3 and UNet++ models perform best. We evaluated various configurations of optimisers, learning rate schedules, and encoder networks for each model architecture, and tuned model hyper-parameters prior to training. Finally, we applied Median Frequency Balancing to mitigate model bias towards frequently occurring classes. We favour DeepLabV3 for its reliable detection of underrepresented classes, as opposed to the more accurate boundaries produced by UNet++. All of the models satisfied the constraint of real-time operation when running on an NVIDIA GTX 1070.
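    Median frequency balancing, used above against class imbalance, weights each class by the ratio of the median class frequency to that class's own frequency, so rare seabed classes get weights above 1. A short sketch, assuming integer-labelled segmentation masks (function and variable names are mine):

```python
import numpy as np

def median_frequency_weights(masks, num_classes):
    """Per-class loss weights: w_c = median_freq / freq_c, where freq_c is
    the fraction of pixels of class c, averaged over images containing c.
    Rare classes get weights > 1, frequent classes get weights < 1."""
    freq = np.zeros(num_classes)
    for c in range(num_classes):
        per_image = [(m == c).mean() for m in masks if (m == c).any()]
        freq[c] = np.mean(per_image) if per_image else np.nan
    present = ~np.isnan(freq)
    weights = np.zeros(num_classes)
    weights[present] = np.median(freq[present]) / freq[present]
    return weights

# Example: two 4x4 masks with three classes; class 0 dominates both images.
masks = [
    np.array([[0] * 4] * 3 + [[1] * 4]),
    np.array([[0] * 4] * 3 + [[2] * 4]),
]
print(median_frequency_weights(masks, num_classes=3))  # class 0 weighted < 1
```

    The resulting weight vector is typically passed as the class-weight argument of the segmentation cross-entropy loss.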

    WordFences: Text localization and recognition

    In collaboration with the Universitat de Barcelona (UB) and the Universitat Rovira i Virgili (URV). In recent years, text recognition has achieved remarkable success on scanned document text. However, word recognition in natural images is still an open problem, which generally requires time-consuming post-processing steps. We present a novel architecture for individual word detection in scene images based on semantic segmentation. Our contributions are twofold: the concept of WordFence, which detects border areas surrounding each individual word, and a unique pixelwise weighted softmax loss function which penalizes background and emphasizes small text regions. WordFence ensures that each word is detected individually, and the new loss function provides a strong training signal for both text and word border localization. The proposed technique avoids intensive post-processing by combining semantic word segmentation with a voting scheme for merging segmentations at multiple scales, producing an end-to-end word detection system. We achieve superior localization recall on common benchmark datasets: 92% recall on ICDAR11 and ICDAR13, and 63% recall on SVT. Furthermore, end-to-end word recognition achieves a state-of-the-art 86% F-score on ICDAR13.
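    The abstract does not give the exact weighting scheme, but the general shape of a pixelwise weighted softmax (cross-entropy) loss is easy to sketch in PyTorch: a per-pixel weight map down-weights background and up-weights pixels of small words. The weighting rule below is an illustrative assumption, not the paper's formula.

```python
import torch
import torch.nn.functional as F

def weighted_pixelwise_ce(logits, targets, pixel_weights):
    """Cross-entropy per pixel, scaled by a per-pixel weight map.

    logits:        (N, C, H, W) raw class scores
    targets:       (N, H, W) integer class labels
    pixel_weights: (N, H, W) per-pixel loss weights
    """
    per_pixel = F.cross_entropy(logits, targets, reduction="none")  # (N, H, W)
    return (per_pixel * pixel_weights).sum() / pixel_weights.sum()

# Illustrative weight map: background pixels get 0.1; text pixels get a
# weight inversely proportional to the area of the word they belong to,
# so small words contribute a stronger training signal.
logits = torch.randn(2, 3, 32, 32, requires_grad=True)  # classes: bg, text, border
targets = torch.randint(0, 3, (2, 32, 32))
word_area = torch.full((2, 32, 32), 200.0)  # hypothetical per-pixel word areas
weights = torch.where(targets == 0,
                      torch.full_like(word_area, 0.1),
                      1000.0 / word_area)
loss = weighted_pixelwise_ce(logits, targets, weights)
loss.backward()
```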

    RNNs Implicitly Implement Tensor Product Representations

    Recurrent neural networks (RNNs) can learn continuous vector representations of symbolic structures such as sequences and sentences; these representations often exhibit linear regularities (analogies). Such regularities motivate our hypothesis that RNNs that show such regularities implicitly compile symbolic structures into tensor product representations (TPRs; Smolensky, 1990), which additively combine tensor products of vectors representing roles (e.g., sequence positions) and vectors representing fillers (e.g., particular words). To test this hypothesis, we introduce Tensor Product Decomposition Networks (TPDNs), which use TPRs to approximate existing vector representations. We demonstrate using synthetic data that TPDNs can successfully approximate linear and tree-based RNN autoencoder representations, suggesting that these representations exhibit interpretable compositional structure; we explore the settings that lead RNNs to induce such structure-sensitive representations. By contrast, further TPDN experiments show that the representations of four models trained to encode naturally occurring sentences can be largely approximated with a bag of words, with only marginal improvements from more sophisticated structures. We conclude that TPDNs provide a powerful method for interpreting vector representations, and that standard RNNs can induce compositional sequence representations that are remarkably well approximated by TPRs; at the same time, existing training tasks for sentence representation learning may not be sufficient for inducing robust structural representations. (Accepted to ICLR 2019.)
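    A tensor product representation encodes a structure as the sum of outer products of filler and role vectors, TPR = Σᵢ fᵢ ⊗ rᵢ. A small NumPy sketch of encoding a three-symbol sequence with positional roles (dimensions and vectors are arbitrary illustrations):

```python
import numpy as np

rng = np.random.default_rng(0)
d_filler, d_role, seq_len = 8, 4, 3

# Filler vectors: one embedding per symbol (e.g., per word).
fillers = rng.standard_normal((seq_len, d_filler))
# Role vectors: one per sequence position; here one-hot, so linearly independent.
roles = np.eye(d_role)[:seq_len]

# TPR: sum of outer products filler_i (x) role_i, flattened to one vector.
tpr = sum(np.outer(f, r) for f, r in zip(fillers, roles)).ravel()

# With orthonormal roles, each filler can be recovered ("unbinding")
# by multiplying the TPR matrix with the corresponding role vector.
recovered = tpr.reshape(d_filler, d_role) @ roles[1]
assert np.allclose(recovered, fillers[1])
```

    A TPDN learns the role vectors (and an encoding of fillers) so that this sum best approximates a trained RNN's actual representation of the same sequence.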

    Developing an Efficient Real-Time Terrestrial Infrastructure Inspection System Using Autonomous Drones and Deep Learning

    Unmanned aerial vehicles (UAVs), commonly referred to as drones, show promise for deploying regular, automated structural inspections remotely. Deep learning, through convolutional neural networks (CNNs), has shown great potential for robustly detecting structural faults in collected images. However, running computationally demanding tasks such as deep learning algorithms on board a drone is difficult due to on-board memory and processing constraints. Moreover, the potential for fully automating drone navigation for structural data collection, while optimizing deep learning models deployed to computationally constrained on-board processing units, has yet to be realized for infrastructure inspection. Thus, an efficient, fully autonomous drone infrastructure inspection system is introduced. Using inertial sensors together with mounted time-of-flight (ToF) and optical sensors to calculate distance readings for obstacle avoidance, the drone can autonomously track around structures. It can localize and extract faults in real time on low-power processing units through pixel-wise segmentation of faults in structural images collected by an on-board digital camera. Furthermore, proposed modifications to a CNN-based U-Net architecture show notable improvements over the baseline U-Net in pixel-wise segmentation accuracy and efficiency on computationally constrained on-board devices. After fault segmentation, the fault points corresponding to the predicted fault pixels are passed to a custom fault tracking algorithm based on a robust line estimation technique, with proposed modifications using a quadtree data structure and a smart sampling approach. Using this approach, the drone can follow along faults robustly and efficiently during inspection to better gauge the extent of their spread.
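    The fault tracker above builds on robust line estimation; a textbook RANSAC-style line fit over predicted fault-pixel coordinates illustrates that building block (this is a generic sketch, not the thesis's quadtree-accelerated algorithm, and all names are mine):

```python
import numpy as np

def ransac_line(points, n_iters=200, inlier_tol=2.0, rng=None):
    """Fit a 2D line to noisy fault-pixel coordinates with RANSAC.

    Returns ((point_on_line, unit_direction), inlier_mask) for the
    candidate line supported by the most inliers.
    """
    rng = rng or np.random.default_rng()
    best_model, best_inliers = None, None
    for _ in range(n_iters):
        p, q = points[rng.choice(len(points), size=2, replace=False)]
        d = q - p
        norm = np.linalg.norm(d)
        if norm == 0:
            continue
        d = d / norm
        normal = np.array([-d[1], d[0]])        # unit normal to the line
        dist = np.abs((points - p) @ normal)    # perpendicular distances
        inliers = dist < inlier_tol
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_model, best_inliers = (p, d), inliers
    return best_model, best_inliers

# Example: pixels along a crack, plus scattered false detections.
t = np.linspace(0, 100, 200)
crack = np.stack([t, 0.5 * t + 10 + np.random.randn(200)], axis=1)
outliers = np.random.uniform(0, 100, size=(40, 2))
model, inliers = ransac_line(np.vstack([crack, outliers]))
print(f"{inliers.sum()} inliers found")
```

    Random sampling makes the fit robust to segmentation false positives; the thesis's quadtree and smart-sampling modifications would replace the uniform pair sampling step.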