
    Fast and accurate object detection in high resolution 4K and 8K video using GPUs

    Machine learning has celebrated many achievements on computer vision tasks such as object detection, but the traditionally used models work with relatively low resolution images. The resolution of recording devices is gradually increasing, and there is a rising need for new methods of processing high resolution data. We propose an attention pipeline method which uses a two-staged evaluation of each image or video frame, under rough and then refined resolution, to limit the total number of necessary evaluations. For both stages, we make use of the fast object detection model YOLO v2. We have implemented our model in code that distributes the work across GPUs. We maintain high accuracy while reaching an average performance of 3-6 fps on 4K video and 2 fps on 8K video. Comment: 6 pages, 12 figures, Best Paper Finalist at IEEE High Performance Extreme Computing Conference (HPEC) 2018; copyright 2018 IEEE; (DOI will be filled in when known).
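
    The two-stage idea lends itself to a short illustration. Below is a minimal Python sketch, assuming a generic run_yolo detector function (a stand-in, not the authors' implementation), of evaluating a downscaled frame first and then re-running detection only on full-resolution crops around the rough hits.

```python
# Hypothetical sketch of a two-stage (rough -> refined) attention pipeline for
# high-resolution frames, in the spirit of the approach described above.
# `run_yolo` stands in for any off-the-shelf detector (e.g. YOLO v2) and is an
# assumption, not the authors' actual implementation.
import cv2


def run_yolo(image):
    """Placeholder detector: should return a list of (x, y, w, h, score) boxes."""
    raise NotImplementedError("plug in a real detector here")


def detect_high_res(frame, coarse_width=1024, crop_size=1024):
    h, w = frame.shape[:2]

    # Stage 1: rough pass on a downscaled copy to find regions worth a closer look.
    scale = coarse_width / float(w)
    coarse = cv2.resize(frame, (coarse_width, int(h * scale)))
    rough_boxes = run_yolo(coarse)

    # Stage 2: refined pass on full-resolution crops around the rough detections.
    detections = []
    for (x, y, bw, bh, score) in rough_boxes:
        # Map the rough box centre back to full-resolution coordinates.
        cx, cy = int((x + bw / 2) / scale), int((y + bh / 2) / scale)
        x0 = max(0, cx - crop_size // 2)
        y0 = max(0, cy - crop_size // 2)
        crop = frame[y0:y0 + crop_size, x0:x0 + crop_size]
        for (rx, ry, rw, rh, rscore) in run_yolo(crop):
            detections.append((rx + x0, ry + y0, rw, rh, rscore))
    return detections
```

    In a multi-GPU setting, the crops from stage 2 could be batched and dispatched to different devices, which is how the total number of full-resolution evaluations stays bounded.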

    Adapting On Orbit: Conclusions of the STP-H6 Spacecraft Supercomputing for Image and Video Processing Experiment

    Spacecraft Supercomputing for Image and Video Processing (SSIVP) was a payload aboard the Department of Defense Space Test Program – Houston 6 pallet deployed on the International Space Station. SSIVP was designed and constructed by graduate students at the NSF Center for Space, High-Performance, and Resilient Computing (SHREC) at the University of Pittsburgh. The primary objective of this experiment was to evaluate resilient- and parallel-computing capabilities in a small-satellite form factor. Five flight computers, each combining radiation-tolerant and commercial-off-the-shelf technologies, were networked by high-speed interconnects, enabling a reliable space-supercomputing paradigm. Image-processing and computer-vision experiments were conducted on Earth-observation imagery acquired from two five-megapixel cameras. The system operated for 30 months, serving as an adaptable and reconfigurable platform to host academic and industry research. Despite on-orbit challenges with thermal constraints and operations, all mission objectives were completed successfully. SSIVP resulted in a dataset of nearly 20,000 images, radiation-effects data, and an increase in the technology-readiness level for two SHREC flight computers. Its designers and operators hope that SSIVP serves as a model for future reconfigurable and adaptable space computing platforms.

    Auto-Adaptive Multi-Sensor Architecture

    To overcome luminosity problems, modern embedded vision systems often integrate technologically heterogeneous sensors. Such a system also has to provide different functionalities, such as photo or video mode, image improvement or data fusion, according to the user environment. Therefore, today's vision systems should be context-aware and adapt their performance parameters automatically. In this context, we propose a novel auto-adaptive architecture enabling on-the-fly and automatic frame rate and resolution adaptation by a frequency tuning method. This method also intends to reduce power consumption as an alternative to existing power gating methods. Performance evaluation on an FPGA implementation demonstrates an inter-frame adaptation capability with a relatively low area overhead.

    I. INTRODUCTION: For decades, the capability of computer vision systems has increased thanks to the multiplication of integrated sensors. Multi-sensor systems enable many high-level vision applications such as stereo vision, data fusion [1] or 3D stereo view [2]. Smart camera networks also take advantage of the multi-sensor concept for large-scale surveillance applications [3]. More and more vision systems involve several heterogeneous sensors, such as color, infrared or intensified low-light sensors [4], to overcome variable luminosity conditions or to improve application robustness. Frequently, the considered vision system accomplishes various tasks such as video streaming, photo capture or high-level processing (e.g. face detection, object tracking, ...). Each of these tasks imposes different performance requirements on the hardware resources, according to the application context and the sensor used. That is why today's vision systems have to be context-aware and able to adapt their performance to the user environment [5]. Fig. 1 illustrates the differences between video and photo user mode parameters: latency, frame rate, resolution, image quality and power consumption. While a video mode needs a high frame rate and low latency, a photo mode rather expects a higher resolution and higher image quality. In this context, we expect the system architecture to adapt itself on-the-fly to the required frame rate or resolution while minimizing the use-case transition time when the user mode changes. In addition, the frame rate and the resolution of the involved sensors are not assumed to be known in advance. Although numerous adaptable architectures exist for high-performance image processing [6]–[8], and even for energy-aware heterogeneous vision systems [2], they do not enable such dynamic adaptation of the frame rate or the resolution. In this paper, we propose a novel pixel frequency tuning approach for heterogeneous multi-sensor vision systems.
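
    As a rough illustration of the frequency-tuning idea (not the authors' FPGA design), the sketch below picks the lowest pixel clock that can sustain a requested resolution and frame rate; the blanking-overhead factor and the list of available clocks are assumptions.

```python
# Minimal sketch of frequency tuning as an alternative to power gating: choose
# the smallest pixel-clock frequency that still sustains the requested sensor
# resolution and frame rate. The 10% blanking overhead is an assumed value.
def required_pixel_clock(width, height, fps, blanking_overhead=1.1):
    """Pixel clock in Hz needed for a given resolution and frame rate."""
    return width * height * fps * blanking_overhead


def tune_frequency(width, height, fps, available_clocks_hz):
    """Return the smallest available clock that meets the requirement, or None."""
    needed = required_pixel_clock(width, height, fps)
    candidates = [f for f in sorted(available_clocks_hz) if f >= needed]
    return candidates[0] if candidates else None


# Example: switching between photo mode (high resolution, low rate)
# and video mode (lower resolution, high rate).
clocks = [25e6, 50e6, 75e6, 100e6, 150e6]
print(tune_frequency(2592, 1944, 5, clocks))   # photo mode -> 50 MHz
print(tune_frequency(1280, 720, 60, clocks))   # video mode -> 75 MHz
```

    Running at the lowest sufficient clock, rather than gating whole blocks on and off, is what keeps the use-case transition time short when the user mode changes.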

    Learning in AI Processor

    An AI processor, which can run artificial intelligence algorithms, is in essence a state-of-the-art accelerator for performing specialized algorithms in various applications. In particular, four AI application areas stand out: VR/AR smartphone games, high-performance computing, Advanced Driver Assistance Systems, and IoT. Deep learning using convolutional neural networks (CNNs) embeds intelligence into applications to perform tasks and has achieved unprecedented accuracy [1]. Powerful multi-core processors and an on-chip tensor processing accelerator unit are usually the prominent hardware features of a deep learning AI processor. After data is collected by sensors, techniques such as image processing, voice recognition and autonomous drone navigation are adopted to pre-process and analyze the data. In recent years, many technologies associated with deep learning AI processors, including cognitive spectrum sensing, computer vision and semantic reasoning, have become a focus of current research.

    Accelerating Deep Learning Applications in Space

    Computing at the edge offers intriguing possibilities for the development of autonomy and artificial intelligence. The advancements in autonomous technologies and the resurgence of computer vision have led to a rise in demand for fast and reliable deep learning applications. In recent years, the industry has introduced devices with impressive processing power to perform various object detection tasks. However, for real-time detection, devices are constrained in memory, computational capacity, and power, which may compromise the overall performance. This could be solved either by optimizing the object detector or by modifying the images. In this paper, we investigate the performance of CNN-based object detectors on constrained devices when applying different image compression techniques. We examine the capabilities of an NVIDIA Jetson Nano, a low-power, high-performance computer with an integrated GPU, small enough to fit on board a CubeSat. We take a closer look at the Single Shot MultiBox Detector (SSD) and the Region-based Fully Convolutional Network (R-FCN), which are pre-trained on DOTA – a Large-Scale Dataset for Object Detection in Aerial Images. The performance is measured in terms of inference time, memory consumption, and accuracy. By applying image compression techniques, we are able to optimize performance. The two techniques applied, lossless compression and image scaling, improve speed and memory consumption with little or no change in accuracy. The image scaling technique achieves a 100% runnable dataset, and we suggest combining both techniques in order to optimize the speed/memory/accuracy trade-off.
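
    The image-scaling technique can be illustrated with a short sketch. The code below, assuming OpenCV and a generic detector callable (placeholders, not the paper's models), downscales an image before inference and maps the detections back to original coordinates.

```python
# A minimal sketch, assuming OpenCV is available, of scaling aerial images before
# detection so that memory use and inference time drop on a constrained device
# such as a Jetson Nano. `detector` is a stand-in for a pre-trained SSD or R-FCN
# model, not the paper's code.
import cv2


def scale_for_inference(image, max_side=1024):
    """Downscale so the longest side is at most `max_side`, keeping aspect ratio."""
    h, w = image.shape[:2]
    scale = max_side / float(max(h, w))
    if scale >= 1.0:
        return image, 1.0
    resized = cv2.resize(image, (int(w * scale), int(h * scale)),
                         interpolation=cv2.INTER_AREA)
    return resized, scale


def detect_scaled(image, detector, max_side=1024):
    small, scale = scale_for_inference(image, max_side)
    boxes = detector(small)  # list of (x, y, w, h, score) in the scaled image
    # Map boxes back to the original image coordinates.
    return [(x / scale, y / scale, w / scale, h / scale, s)
            for (x, y, w, h, s) in boxes]
```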

    Scalable Neural Network Architecture Search Applied to Super-Resolution Networks

    Based on attention mechanisms, the transformer architecture has been widely adopted in machine learning applications, including natural language processing and computer vision. A common strategy for improving transformer models has been to greatly increase the number of parameters. However, training large transformer models requires prohibitively large computation for consumer hardware. High-performance computing clusters equipped with massively parallel hardware such as graphics processing units (GPUs) are the perfect environment for improving model performance by automatically searching possible transformer model architectures, thus removing the requirement of manually tuning the architecture. This project aims to demonstrate the potential of HPC clusters in training large vision transformer models through Neural Architecture Search (NAS) methods. We chose image super-resolution as our low-level vision task and the Swin Transformer as our vision transformer backbone. This work demonstrates how transformer networks can be improved by conducting NAS on HPC clusters, followed by distributed deep learning training. Our results show that HPC clusters drastically reduce the searching and training time in NAS while at the same time producing more fine-tuned and accurate super-resolution transformer models.
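
    As a hedged illustration of what an architecture search could look like, the sketch below runs a random search over a Swin-style hyperparameter space; the search space, the scoring function and the random-search strategy are assumptions, not the project's actual NAS method.

```python
# Illustrative-only random architecture search: sample candidate configurations
# and score each one. On an HPC cluster, each call to `train_and_evaluate` would
# typically be submitted as an independent job on its own node/GPU.
import random

SEARCH_SPACE = {
    "embed_dim":   [60, 96, 128],
    "depths":      [(4, 4, 4, 4), (6, 6, 6, 6), (6, 6, 6, 6, 6, 6)],
    "num_heads":   [4, 6, 8],
    "window_size": [4, 8],
}


def sample_architecture(rng):
    return {k: rng.choice(v) for k, v in SEARCH_SPACE.items()}


def train_and_evaluate(arch):
    """Hypothetical worker job: briefly train the candidate, return validation PSNR."""
    raise NotImplementedError("replace with a real training/evaluation job")


def random_search(num_candidates=16, seed=0):
    rng = random.Random(seed)
    candidates = [sample_architecture(rng) for _ in range(num_candidates)]
    scores = [(train_and_evaluate(a), a) for a in candidates]
    return max(scores, key=lambda s: s[0])  # best (score, architecture) pair
```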

    Real-Time Human Detection Using Deep Learning on Embedded Platforms: A Review

    The detection of an object such as a human is very important for image understanding in the field of computer vision. Human detection in images can provide essential information for a wide variety of applications in intelligent systems. In this paper, human detection is carried out using deep learning, which has developed rapidly and achieved extraordinary success in various object detection implementations. Recently, several embedded systems have emerged as powerful computing boards that provide high processing capability using the graphics processing unit (GPU). This paper aims to provide a comprehensive survey of the latest achievements in this field brought about by deep learning techniques on embedded platforms. NVIDIA Jetson was chosen as a low-power system designed to accelerate deep learning applications. This review highlights the performance of human detection models such as PedNet, multiped, SSD MobileNet V1, SSD MobileNet V2, and SSD Inception V2 on edge computing devices. This survey aims to provide an overview of these methods and compare their performance in accuracy and computation time for real-time applications. The experimental results show that the SSD MobileNet V2 model provides the highest accuracy with the fastest computation time compared to the other models on our video datasets with several scenarios.
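
    A minimal sketch of the kind of per-frame latency measurement such a comparison relies on is shown below, using OpenCV's DNN module; the model and video file names are placeholders, and this is not the survey's benchmarking code.

```python
# Time per-frame inference of an SSD MobileNet V2 export with OpenCV's DNN module.
# The file names below are placeholders: any TensorFlow SSD frozen graph with a
# matching .pbtxt graph description and any test video would do.
import time
import cv2

net = cv2.dnn.readNetFromTensorflow("ssd_mobilenet_v2.pb",    # placeholder paths
                                    "ssd_mobilenet_v2.pbtxt")

cap = cv2.VideoCapture("pedestrians.mp4")                     # placeholder video
times = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    blob = cv2.dnn.blobFromImage(frame, size=(300, 300), swapRB=True)
    net.setInput(blob)
    start = time.perf_counter()
    detections = net.forward()
    times.append(time.perf_counter() - start)

cap.release()
if times:
    print("mean inference time: %.1f ms" % (1000 * sum(times) / len(times)))
```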

    A study of smart device-based mobile imaging and implementation for engineering applications

    Title from PDF of title page, viewed on June 12, 2013. Thesis advisor: ZhiQiang Chen. Vita. Includes bibliographic references (pages 76-82). Thesis (M.S.)--School of Computing and Engineering, University of Missouri--Kansas City, 2013.
    Mobile imaging has become a very active research topic in recent years thanks to the rapid development of the computing and sensing capabilities of mobile devices. The area features multi-disciplinary studies of mobile hardware, imaging sensors, imaging and vision algorithms, wireless networking and human-machine interface problems. Due to the limited computing capacity of early mobile devices, researchers proposed a client-server model, which pushes the data to more powerful computing platforms through a wireless network and lets the cloud or standalone servers carry out all the computing and processing work. This thesis reviews the development of mobile hardware and software platforms and the related research done on mobile imaging over the past 20 years. There are several studies on mobile imaging, but few aim at building a framework that helps engineers solve problems by using mobile imaging. With higher-resolution imaging and high-performance computing power built into smart mobile devices, more and more image processing tasks can be carried out on the device rather than through the client-server model. Based on this fact, a framework of collaborative mobile imaging is introduced for civil infrastructure condition assessment to help engineers solve technical challenges. Another contribution of this thesis is applying mobile imaging to home automation. E-SAVE is a research project focusing on the extensive use of automation in conserving and using energy wisely in the home. Mobile users can view critical information, such as energy data of the appliances, with the help of mobile imaging. OpenCV is an image processing and computer vision library; the applications in this thesis use OpenCV functions including camera calibration, template matching, image stitching and Canny edge detection. The application that aims to help field engineers is interactive crack detection; the other uses template matching to recognize appliances in the home automation system.
    Contents: Introduction -- Background and related work -- Basic imaging processing methods for mobile applications -- Collaborative and interactive mobile imaging -- Mobile imaging for smart energy -- Conclusion and recommendation
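
    The OpenCV building blocks named above are standard; the sketch below shows Canny edge detection (as used for interactive crack detection) and template matching (as used for appliance recognition), with illustrative file names and thresholds that are not taken from the thesis.

```python
# Two OpenCV building blocks used by the thesis applications: Canny edge
# detection for crack candidates and template matching for appliance recognition.
# File names and thresholds are illustrative placeholders.
import cv2

# Canny edge detection: edges in a surface photo are candidate crack pixels.
surface = cv2.imread("concrete_surface.jpg", cv2.IMREAD_GRAYSCALE)
edges = cv2.Canny(surface, threshold1=50, threshold2=150)
cv2.imwrite("crack_edges.png", edges)

# Template matching: locate a known appliance in a room photo.
scene = cv2.imread("living_room.jpg", cv2.IMREAD_GRAYSCALE)
template = cv2.imread("thermostat_template.jpg", cv2.IMREAD_GRAYSCALE)
result = cv2.matchTemplate(scene, template, cv2.TM_CCOEFF_NORMED)
_, max_val, _, max_loc = cv2.minMaxLoc(result)
if max_val > 0.8:  # match confidence threshold (assumed)
    h, w = template.shape
    cv2.rectangle(scene, max_loc, (max_loc[0] + w, max_loc[1] + h), 255, 2)
    cv2.imwrite("appliance_match.png", scene)
```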