
    Low Power MobileNets Acceleration In Cuda And OpenCL

    Get PDF
    Convolutional Neural Networks (CNNs) are widely used for object recognition and facial recognition because of their remarkable results on these common visual tasks. Evaluating the performance of CNNs on embedded devices effectively requires a comprehensive benchmark evaluation environment. Although many benchmark suites are available, they require the installation of various packages and proprietary libraries, which makes them impractical for resource-constrained devices such as embedded systems. In this paper, we propose an evaluation platform that can be used on any platform supporting CUDA and OpenCL. The platform was executed on an Nvidia Jetson TX2 embedded board and on commodity hardware without requiring any extra proprietary libraries. We achieved a 4.5-fold gain in execution speed with the CUDA and OpenCL models, whose predictions match those of the Python-based model with 100% agreement. We also provide in-depth statistics about the CNN's execution pattern, gathered by running the model on embedded devices and commodity hardware.
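    A minimal sketch of the kind of check the abstract describes: timing a native (CUDA/OpenCL) build of the model against the Python reference and verifying that the predicted labels agree on every image. The ./cnn_infer binary and the predict_python reference are hypothetical placeholders, not artifacts from the paper.

        # Hypothetical harness: times a native CUDA/OpenCL build of the model
        # against the Python reference and checks per-image label agreement.
        import subprocess
        import time
        from pathlib import Path

        def predict_native(image_path: str) -> str:
            """Run the compiled model (hypothetical './cnn_infer' binary) and
            return the predicted label it prints to stdout."""
            result = subprocess.run(["./cnn_infer", image_path],
                                    capture_output=True, text=True, check=True)
            return result.stdout.strip()

        def predict_python(image_path: str) -> str:
            """Placeholder for the Python reference implementation."""
            raise NotImplementedError

        def benchmark(images):
            agree, t_native, t_python = 0, 0.0, 0.0
            for img in images:
                t0 = time.perf_counter()
                label_native = predict_native(img)
                t1 = time.perf_counter()
                label_python = predict_python(img)
                t2 = time.perf_counter()
                t_native += t1 - t0
                t_python += t2 - t1
                agree += label_native == label_python
            print(f"agreement: {agree}/{len(images)}")
            print(f"speedup:   {t_python / t_native:.1f}x")

        if __name__ == "__main__":
            benchmark(sorted(str(p) for p in Path("test_images").glob("*.jpg")))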

    Visual Analysis Algorithms for Embedded Systems

    Get PDF
    The main contribution of this thesis is the design and development of an optimized framework to realize deep neural classifiers on embedded platforms. Deep convolutional networks exhibit unmatched performance in image classification, but these deep classifiers demand huge computational power and memory storage, which is an issue on embedded devices with limited onboard resources. The computational demand of neural networks stems mainly from the convolutional layers, so a significant improvement in performance can be obtained by reducing the computational complexity of these layers, making them realizable on embedded platforms. In this thesis, we propose a CUDA (Compute Unified Device Architecture)-based accelerated scheme to realize deep architectures on embedded platforms by exploiting already-trained networks. All functions and layers required to replicate the trained neural networks were implemented and accelerated using the concurrent resources of the embedded GPU. The performance of our CUDA-based scheme was significantly improved by performing convolutions in the transform domain; this matrix-multiplication-based convolution was also compared with the traditional approach to analyze the improvement in inference performance. The second part of this thesis focuses on the optimization of the proposed framework. The flow of our CUDA-based framework was optimized using a unified memory scheme and hardware-dependent utilization of computational resources. The proposed flow was evaluated on three different image classification networks on a Jetson TX1 embedded board and an Nvidia Shield K1 tablet, and the performance of the proposed GPU-only flow was compared with its sequential and heterogeneous versions. The results showed that the proposed scheme delivered higher performance and enabled real-time image classification on embedded platforms with lower storage requirements. These results motivated us towards the realization of useful real-time classification and recognition problems on embedded platforms. Finally, we utilized the proposed framework to realize a neural network-based automatic license plate recognition (ALPR) system on a mobile platform. This highly precise and computationally demanding system was deployed by simplifying the flow of a trained deep architecture developed for powerful desktop and server environments. A comparative analysis of computational complexity, recognition accuracy, and inference performance was performed.
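    To make the "matrix multiplication based convolution" concrete, here is a minimal NumPy sketch of the standard im2col-plus-GEMM transform that maps a convolutional layer onto a single matrix multiply; the thesis's actual CUDA implementation and memory-layout choices may differ.

        # im2col + GEMM convolution sketch (valid padding, stride 1).
        import numpy as np

        def im2col(x, kh, kw):
            """Unroll all (kh x kw) patches of x with shape (C, H, W) into columns."""
            c, h, w = x.shape
            oh, ow = h - kh + 1, w - kw + 1
            cols = np.empty((c * kh * kw, oh * ow), dtype=x.dtype)
            idx = 0
            for ci in range(c):
                for i in range(kh):
                    for j in range(kw):
                        cols[idx] = x[ci, i:i + oh, j:j + ow].reshape(-1)
                        idx += 1
            return cols

        def conv2d_gemm(x, weights):
            """weights: (F, C, kh, kw); returns (F, oh, ow) via a single GEMM."""
            f, c, kh, kw = weights.shape
            cols = im2col(x, kh, kw)             # (C*kh*kw, oh*ow)
            out = weights.reshape(f, -1) @ cols  # the GEMM: (F, oh*ow)
            oh, ow = x.shape[1] - kh + 1, x.shape[2] - kw + 1
            return out.reshape(f, oh, ow)

    On a GPU, the unrolled matrix lets the convolution reuse a highly tuned matrix-multiply kernel instead of a custom sliding-window loop, which is the source of the speedup the abstract reports.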

    System for automatic detection and classification of cars in traffic

    Get PDF
    Objective: To develop a system for automatic detection and classification of cars in traffic, in the form of a device for autonomous, real-time car detection, license plate recognition, and identification of car color, model, and make from video. Methods: Cars were detected using the You Only Look Once (YOLO) v4 detector. The YOLO output was then used for classification in the next step. Colors were classified using the k-Nearest Neighbors (kNN) algorithm, whereas car models and makes were identified with a single-shot detector (SSD). Finally, license plates were detected using the OpenCV library and read with Tesseract-based optical character recognition. For the sake of simplicity and speed, the subsystems were run on an embedded Raspberry Pi computer. Results: A camera was mounted on the inside of the windshield to monitor cars in front of it. The system processed the camera's video feed and provided information on the color, license plate, make, and model of the observed car. Knowing the license plate number provides access to details about the car owner and roadworthiness, reveals whether the car or license plate has been reported missing, and allows checking whether the license plate matches the car. Car details were saved to a file and displayed on the screen. The system was tested on real-time images and videos. The accuracies of car detection and car model classification (using 8 classes) in images were 88.5% and 78.5%, respectively. The accuracies of color detection and full license plate recognition were 71.5% and 51.5%, respectively. The system operated at 1 frame per second (1 fps). Conclusion: These results show that running standard machine learning algorithms on low-cost hardware may enable the automatic detection and classification of cars in traffic. However, there is significant room for improvement, primarily in license plate recognition. Accordingly, potential improvements for the future development of the system are proposed.
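    As an illustration of the plate-reading step (OpenCV localization followed by Tesseract OCR), here is a simplified Python sketch; the contour-based localization heuristic and the thresholds are illustrative assumptions, not the paper's exact pipeline.

        # Simplified plate reader: OpenCV preprocessing and contour search,
        # then Tesseract OCR on the most plate-like rectangular region.
        import cv2
        import pytesseract

        def read_plate(bgr_image):
            gray = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2GRAY)
            gray = cv2.bilateralFilter(gray, 11, 17, 17)   # denoise, keep edges
            edges = cv2.Canny(gray, 30, 200)
            contours, _ = cv2.findContours(edges, cv2.RETR_LIST,
                                           cv2.CHAIN_APPROX_SIMPLE)
            for c in sorted(contours, key=cv2.contourArea, reverse=True)[:10]:
                approx = cv2.approxPolyDP(c, 0.02 * cv2.arcLength(c, True), True)
                if len(approx) == 4:                        # roughly rectangular
                    x, y, w, h = cv2.boundingRect(approx)
                    if h > 0 and 2.0 < w / h < 6.0:         # plate-like aspect ratio
                        roi = gray[y:y + h, x:x + w]
                        # --psm 7: treat the crop as a single line of text
                        return pytesseract.image_to_string(
                            roi, config="--psm 7").strip()
            return None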

    AON: Towards Arbitrarily-Oriented Text Recognition

    Full text link
    Recognizing text in natural images is a hot research topic in computer vision due to its various applications. Despite several decades of research on optical character recognition (OCR), recognizing text in natural images is still a challenging task, because scene text often appears in irregular (e.g. curved, arbitrarily oriented, or seriously distorted) arrangements, which have not yet been well addressed in the literature. Existing text recognition methods mainly work with regular (horizontal and frontal) text and cannot be trivially generalized to handle irregular text. In this paper, we develop the arbitrary orientation network (AON) to directly capture the deep features of irregular text, which are then combined by an attention-based decoder to generate the character sequence. The whole network can be trained end-to-end using only images and word-level annotations. Extensive experiments on various benchmarks, including the CUTE80, SVT-Perspective, IIIT5k, SVT, and ICDAR datasets, show that the proposed AON-based method achieves state-of-the-art performance on irregular datasets and is comparable to major existing methods on regular datasets.
    Comment: Accepted by CVPR 2018
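    For readers unfamiliar with attention-based decoding, a toy NumPy sketch of a single decoding step follows: score each encoded feature column against the decoder state, softmax the scores, and form a weighted "glimpse" for character prediction. The additive scoring form and all shapes here are illustrative assumptions, not AON's exact formulation.

        # One step of additive (content-based) attention over encoder features.
        import numpy as np

        def attention_step(features, h_prev, Wf, Wh, v):
            """features: (T, D) encoder columns; h_prev: (H,) decoder state;
            Wf: (D, A), Wh: (H, A), v: (A,) learned projections."""
            scores = np.tanh(features @ Wf + h_prev @ Wh) @ v  # (T,)
            alpha = np.exp(scores - scores.max())
            alpha /= alpha.sum()                               # attention weights
            glimpse = alpha @ features                         # (D,) weighted context
            return glimpse, alpha

    The glimpse is then fed, together with the previous character embedding, into the recurrent decoder to predict the next character; repeating the step yields the output sequence.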

    Edge-Computing Deep Learning-Based Computer Vision Systems

    Get PDF
    Computer vision has become ubiquitous in today's society, with applications ranging from medical imaging to visual diagnostics, aerial monitoring, self-driving vehicles, and many more. Common to many of these applications are visual perception systems consisting of classification, localization, detection, and segmentation components, to name a few. Recently, the development of deep neural networks (DNNs) has led to great advancements in pushing state-of-the-art performance in each of these areas. Unlike traditional computer vision algorithms, DNNs have the ability to generalize features that were previously hand-crafted by engineers for a specific application; this models the human visual system's ability to generalize its surroundings. Moreover, convolutional neural networks (CNNs) have been shown to not only match but exceed the performance of traditional computer vision algorithms, as the filters of the network are able to learn important features present in the data. In this research we aim to develop numerous applications, including visual warehouse diagnostics and shipping yard management systems, aerial monitoring and tracking from the perspective of a drone, a perception system model for an autonomous vehicle, and vehicle re-identification for surveillance and security. The deep learning models developed for each application attempt to match or exceed state-of-the-art performance in both accuracy and inference time; however, these two are typically traded off against each other when designing a network, where one or the other can be maximized. We investigate numerous object-detection architectures, including Faster R-CNN, SSD, YOLO, and a few other variations, in an attempt to determine the best architecture for each application. We constrain our performance metrics to inference times rather than training times, as none of the optimizations performed in this research affect training time. Further, we also investigate re-identification of vehicles as a separate application added onto the object-detection pipeline. Re-identification allows for a more robust representation of the data while leveraging techniques for security and surveillance. We also investigate comparisons between architectures that could lead to the development of new architectures able not only to perform inference quickly (or in close to real time) but also to match state-of-the-art accuracy. New architecture development, however, depends on the application and its requirements; some applications need to run on edge-computing (EC) devices, while others have slightly larger inference windows which allow for cloud computing with powerful accelerators.