3,944 research outputs found
Eye Semantic Segmentation with a Lightweight Model
In this paper, we present a multi-class eye segmentation method that can run
within hardware limitations for real-time inference. Our approach includes three
major stages: obtaining a grayscale image from the input, segmenting three distinct
eye regions with a deep network, and removing incorrect areas with heuristic
filters. Our model is based on an encoder-decoder structure, with depthwise
convolution as the key operation for reducing the computational cost. We experiment
on OpenEDS, a large-scale dataset of eye images captured by a head-mounted display
with two synchronized eye-facing cameras. We achieved a mean intersection over
union (mIoU) of 94.85% with a model of size 0.4 megabytes. The source code is
available at https://github.com/th2l/Eye_VR_Segmentation.
Comment: To appear in ICCVW 2019. Pre-trained models and source code are
available at https://github.com/th2l/Eye_VR_Segmentation
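As a rough illustration of the design this abstract describes, below is a minimal PyTorch sketch of an encoder-decoder built from depthwise-separable convolutions. The layer widths, input resolution, and 4-class output (assumed: background plus three eye regions) are illustrative assumptions, not the authors' released model.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """3x3 depthwise conv followed by a 1x1 pointwise conv (cheap vs. full conv)."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, stride=stride,
                                   padding=1, groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))

class TinyEyeSegNet(nn.Module):
    """Grayscale input -> per-pixel logits over 4 assumed classes."""
    def __init__(self, num_classes=4):
        super().__init__()
        self.encoder = nn.Sequential(
            DepthwiseSeparableConv(1, 16, stride=2),
            DepthwiseSeparableConv(16, 32, stride=2),
            DepthwiseSeparableConv(32, 64, stride=2),
        )
        self.decoder = nn.Sequential(
            nn.Upsample(scale_factor=8, mode='bilinear', align_corners=False),
            DepthwiseSeparableConv(64, 32),
            nn.Conv2d(32, num_classes, 1),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

logits = TinyEyeSegNet()(torch.randn(1, 1, 160, 160))
print(logits.shape)  # torch.Size([1, 4, 160, 160])
```

The argmax over the class dimension would then be cleaned up by heuristic filters, per the abstract's third stage.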
Deep Learning for LiDAR Point Clouds in Autonomous Driving: A Review
Recently, the advancement of deep learning in discriminative feature learning
from 3D LiDAR data has led to rapid development in the field of autonomous
driving. However, the automated processing of uneven, unstructured, noisy, and
massive 3D point clouds is a challenging and tedious task. In this paper, we
provide a systematic review of existing compelling deep learning architectures
applied to LiDAR point clouds, detailing specific tasks in autonomous driving
such as segmentation, detection, and classification. Although several published
research papers focus on specific topics in computer vision for autonomous
vehicles, to date, no general survey on deep learning applied to LiDAR point
clouds for autonomous vehicles exists. Thus, the goal of this paper is to
narrow that gap. More than 140 key contributions from the past five years are
summarized in this survey, including milestone 3D deep architectures; remarkable
deep learning applications in 3D semantic segmentation, object detection, and
classification; specific datasets; evaluation metrics; and state-of-the-art
performance. Finally, we conclude with the remaining challenges and directions
for future research.
Comment: 21 pages, submitted to IEEE Transactions on Neural Networks and
Learning Systems
VarGNet: Variable Group Convolutional Neural Network for Efficient Embedded Computing
In this paper, we propose a novel network design mechanism for efficient
embedded computing. Inspired by the limited computing patterns of embedded
hardware, we propose to fix the number of channels per group in a group
convolution, instead of the existing practice of fixing the total number of
groups. The resulting network, named Variable Group Convolutional Network
(VarGNet), can be optimized more easily on the hardware side, thanks to the
more unified computing schemes among its layers. Extensive experiments on
various vision tasks, including classification, detection, pixel-wise parsing,
and face recognition, have demonstrated the practical value of our VarGNet.
Comment: Technical report
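The core idea of the abstract can be sketched in a few lines: keep the number of channels per group constant, so the group count grows with layer width and every group always sees the same-sized input slice. The constant of 8 channels per group below is an illustrative assumption, not necessarily the paper's value.

```python
import torch
import torch.nn as nn

CHANNELS_PER_GROUP = 8  # fixed per-group width; illustrative assumption

def variable_group_conv(in_ch, out_ch, kernel_size=3, stride=1):
    # The group count varies with layer width, instead of being fixed.
    groups = in_ch // CHANNELS_PER_GROUP
    return nn.Conv2d(in_ch, out_ch, kernel_size, stride=stride,
                     padding=kernel_size // 2, groups=groups, bias=False)

# A 64-channel layer gets 8 groups and a 128-channel layer gets 16 groups,
# but each group always processes an 8-channel slice, giving the hardware
# one uniform computing pattern across all layers.
y = variable_group_conv(64, 128)(torch.randn(1, 64, 32, 32))
print(y.shape)  # torch.Size([1, 128, 32, 32])
```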
Towards an Embodied Semantic Fovea: Semantic 3D scene reconstruction from ego-centric eye-tracker videos
Incorporating the physical environment is essential for a complete
understanding of human behavior in unconstrained everyday tasks. This is
especially important in ego-centric tasks, where obtaining three-dimensional
information is challenging and current 2D video analysis methods prove
insufficient. Here we demonstrate a proof-of-concept system that provides
real-time 3D mapping and semantic labeling of the local environment from an
ego-centric RGB-D video stream, combined with 3D gaze point estimation from
head-mounted eye-tracking glasses. We augment existing work in Semantic
Simultaneous Localization And Mapping (Semantic SLAM) with collected gaze
vectors. Our system can then find and track objects both inside and outside
the user's field of view in 3D, from multiple perspectives, with reasonable
accuracy. We validate our concept by producing a semantic map from images of
the NYUv2 dataset while simultaneously estimating gaze position and gaze
classes from recorded gaze data for the dataset images.
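One ingredient the abstract implies, lifting a 2D gaze estimate into the 3D map, can be sketched as a pinhole back-projection through the aligned depth frame. This is a minimal sketch under assumed intrinsics (the commonly used NYUv2 Kinect defaults); the identity pose is a placeholder, not the authors' calibration or pipeline.

```python
import numpy as np

def gaze_to_3d(u, v, depth_m, fx=525.0, fy=525.0, cx=319.5, cy=239.5):
    """Back-project a pixel gaze point (u, v) with metric depth into the
    camera frame using a pinhole model."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return np.array([x, y, depth_m])

# Example: gaze estimated at pixel (400, 260); the depth map reads 1.8 m there.
point_cam = gaze_to_3d(400, 260, 1.8)

# Register against the SLAM map with the current camera pose (R, t);
# an identity pose is used here purely as a placeholder.
R, t = np.eye(3), np.zeros(3)
point_world = R @ point_cam + t
print(point_world)
```

The resulting 3D gaze points can then be associated with labeled surfaces in the semantic map to decide which object is being fixated.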
Sensor Fusion for Joint 3D Object Detection and Semantic Segmentation
In this paper, we present an extension to LaserNet, an efficient and
state-of-the-art LiDAR-based 3D object detector. We propose a method for fusing
image data with the LiDAR data and show that this sensor fusion method improves
the detection performance of the model, especially at long ranges. The addition
of image data is straightforward and does not require image labels.
Furthermore, we expand the capabilities of the model to perform 3D semantic
segmentation in addition to 3D object detection. On a large benchmark dataset,
we demonstrate that our approach achieves state-of-the-art performance on both
object detection and semantic segmentation while maintaining a low runtime.
Comment: Accepted for publication at the CVPR Workshop on Autonomous Driving 2019
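A minimal sketch of the general fusion pattern the abstract describes: project each LiDAR point into the camera image, sample a CNN feature at that pixel, and concatenate it with the point's LiDAR features. The function name, projection matrix, and feature sizes are assumptions, not LaserNet's actual interface.

```python
import torch

def fuse_lidar_with_image(points_xyz, lidar_feats, image_feats, P):
    """points_xyz: (N, 3) LiDAR points; lidar_feats: (N, C_l);
    image_feats: (C_i, H, W) CNN feature map; P: (3, 4) projection matrix."""
    n = points_xyz.shape[0]
    homog = torch.cat([points_xyz, torch.ones(n, 1)], dim=1)  # (N, 4)
    uvw = homog @ P.T                                          # (N, 3)
    # Perspective divide, then clamp to the feature map bounds.
    u = (uvw[:, 0] / uvw[:, 2]).long().clamp(0, image_feats.shape[2] - 1)
    v = (uvw[:, 1] / uvw[:, 2]).long().clamp(0, image_feats.shape[1] - 1)
    sampled = image_feats[:, v, u].T                           # (N, C_i)
    return torch.cat([lidar_feats, sampled], dim=1)            # (N, C_l + C_i)

fused = fuse_lidar_with_image(
    torch.rand(1000, 3) * 10, torch.rand(1000, 4),
    torch.rand(64, 48, 160), torch.rand(3, 4))
print(fused.shape)  # torch.Size([1000, 68])
```

Because the image branch can be trained end-to-end through the 3D losses, this style of fusion needs no 2D image labels, consistent with the abstract's claim.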
Security of Facial Forensics Models Against Adversarial Attacks
Deep neural networks (DNNs) have been used in digital forensics to identify
fake facial images. We investigated several DNN-based forgery forensics models
(FFMs) to examine whether they are secure against adversarial attacks. We
experimentally demonstrated the existence of individual adversarial
perturbations (IAPs) and universal adversarial perturbations (UAPs) that can
lead a well-performing FFM to misbehave. Based on an iterative procedure,
gradient information is used to generate two kinds of IAPs that can be used to fabricate
classification and segmentation outputs. In contrast, UAPs are generated on the
basis of over-firing. We designed a new objective function that encourages
neurons to over-fire, which makes UAP generation feasible even without using
training data. Experiments demonstrated the transferability of UAPs across
unseen datasets and unseen FFMs. Moreover, we conducted a subjective assessment
of the imperceptibility of the adversarial perturbations, revealing that the
crafted UAPs are visually negligible. These findings provide a baseline for
evaluating the adversarial security of FFMs.
Comment: Accepted by ICIP 2020
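A minimal sketch of the data-free "over-firing" idea: a single perturbation is optimized so that, by itself, it drives feature activations high, with no training images involved. The surrogate network (a randomly initialized ResNet-18 stand-in for an FFM), layer choice, step size, and perturbation budget are all illustrative assumptions, not the paper's objective function.

```python
import torch
import torchvision.models as models

model = models.resnet18(weights=None).eval()  # stand-in for a forensics model
feature_extractor = torch.nn.Sequential(*list(model.children())[:-2])

epsilon = 10 / 255  # assumed L_inf budget keeping the UAP visually negligible
delta = ((torch.rand(1, 3, 224, 224) * 2 - 1) * epsilon).requires_grad_()
opt = torch.optim.Adam([delta], lr=5e-3)

for _ in range(100):
    opt.zero_grad()
    acts = feature_extractor(delta)
    loss = -acts.pow(2).mean()  # maximize activation energy ("over-firing")
    loss.backward()
    opt.step()
    with torch.no_grad():
        delta.clamp_(-epsilon, epsilon)  # project back onto the L_inf ball

uap = delta.detach()  # added to any input image at attack time
```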
Looking Fast and Slow: Memory-Guided Mobile Video Object Detection
With a single eye fixation lasting a fraction of a second, the human visual
system is capable of forming a rich representation of a complex environment,
reaching a holistic understanding which facilitates object recognition and
detection. This phenomenon is known as recognizing the "gist" of the scene and
is accomplished by relying on relevant prior knowledge. This paper addresses
the analogous question of whether using memory in computer vision systems can
not only improve the accuracy of object detection in video streams, but also
reduce the computation time. By interleaving conventional feature extractors
with extremely lightweight ones which only need to recognize the gist of the
scene, we show that minimal computation is required to produce accurate
detections when temporal memory is present. In addition, we show that the
memory contains enough information for deploying reinforcement learning
algorithms to learn an adaptive inference policy. Our model achieves
state-of-the-art performance among mobile methods on the ImageNet VID 2015
dataset, while running at speeds of up to 70+ FPS on a Pixel 3 phone.
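A minimal sketch of the interleaving idea: a large extractor runs only every K-th frame, a tiny "gist" extractor fills the gaps, and a recurrent memory carries detail across frames. All module sizes are assumptions, the 1x1 convolution stands in for the paper's recurrent memory, and the fixed period stands in for the adaptive policy learned with reinforcement learning.

```python
import torch
import torch.nn as nn

class InterleavedFeatures(nn.Module):
    def __init__(self, channels=64, period=4):
        super().__init__()
        self.period = period
        self.large = nn.Sequential(  # accurate but slow extractor
            nn.Conv2d(3, channels, 3, 2, 1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, 2, 1), nn.ReLU())
        self.small = nn.Sequential(  # cheap "gist" extractor
            nn.Conv2d(3, channels, 3, 4, 1), nn.ReLU())
        self.memory = nn.Conv2d(2 * channels, channels, 1)  # memory stand-in

    def forward(self, frames):
        state, outputs = None, []
        for t, frame in enumerate(frames):
            # Heavy features every `period` frames, gist features otherwise.
            feat = self.large(frame) if t % self.period == 0 else self.small(frame)
            state = feat if state is None else state
            state = self.memory(torch.cat([feat, state], dim=1))
            outputs.append(state)  # a detection head would consume this
        return outputs

video = [torch.randn(1, 3, 128, 128) for _ in range(8)]
feats = InterleavedFeatures()(video)
print(feats[0].shape)  # torch.Size([1, 64, 32, 32])
```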
Shallow and Deep Convolutional Networks for Saliency Prediction
The prediction of salient areas in images has been traditionally addressed
with hand-crafted features based on neuroscience principles. This paper,
however, addresses the problem with a completely data-driven approach by
training a convolutional neural network (convnet). The learning process is
formulated as the minimization of a loss function that measures the Euclidean
distance between the predicted saliency map and the provided ground truth. The
recent publication of large datasets for saliency prediction has provided enough
data to train end-to-end architectures that are both fast and accurate. Two
designs are proposed: a shallow convnet trained from scratch, and another,
deeper solution whose first three layers are adapted from a network trained
for classification. To the authors' knowledge, these are the first end-to-end
CNNs trained and tested for the purpose of saliency prediction.
Comment: Preprint of the paper accepted at the 2016 IEEE Conference on Computer
Vision and Pattern Recognition (CVPR). Source code and models available at
https://github.com/imatge-upc/saliency-2016-cvpr. Junting Pan and Kevin
McGuinness contributed equally to this work.
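A minimal sketch of the training setup the abstract describes: a small convnet regresses a saliency map and is trained by minimizing the Euclidean (MSE) distance to the ground-truth map. The architecture below is an illustrative stand-in, not the paper's exact shallow or deep design.

```python
import torch
import torch.nn as nn

saliency_net = nn.Sequential(
    nn.Conv2d(3, 32, 5, padding=2), nn.ReLU(),
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 1, 1), nn.Sigmoid())  # single-channel saliency map in [0, 1]

criterion = nn.MSELoss()  # the Euclidean-distance loss from the abstract
optimizer = torch.optim.SGD(saliency_net.parameters(), lr=1e-3, momentum=0.9)

image = torch.rand(8, 3, 96, 96)         # dummy batch of RGB images
ground_truth = torch.rand(8, 1, 96, 96)  # dummy fixation/saliency maps

optimizer.zero_grad()
loss = criterion(saliency_net(image), ground_truth)
loss.backward()
optimizer.step()
```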
Reverse Attention for Salient Object Detection
Benefiting from the rapid development of deep learning techniques, salient
object detection has achieved remarkable progress recently. However, two major
challenges still hinder its application in embedded devices: low-resolution
output and heavy model weight. To this end, this paper presents an accurate yet
compact deep network for efficient salient object detection. More specifically,
given a coarse saliency prediction from the deepest layer, we first employ
residual learning to learn side-output residual features for saliency
refinement, which can be achieved with very limited convolutional parameters
while maintaining accuracy. Second, we further propose reverse attention to
guide this side-output residual learning in a top-down manner. By erasing the
currently predicted salient regions from the side-output features, the network
can eventually explore the missing object parts and details, which results in
high-resolution, accurate predictions. Experiments on six benchmark datasets
demonstrate that the proposed approach compares favorably against
state-of-the-art methods, with advantages in terms of simplicity,
efficiency (45 FPS), and model size (81 MB).
Comment: ECCV 2018
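A minimal sketch of the reverse-attention refinement step: the coarse prediction from a deeper stage is upsampled, inverted, and used to erase already-detected salient regions from a side-output feature, so the residual branch focuses on the missing parts. Tensor shapes and the residual block are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def reverse_attention_refine(side_feat, coarse_pred, residual_conv):
    """side_feat: (B, C, H, W) side-output feature; coarse_pred: (B, 1, h, w)
    saliency logits from a deeper stage; residual_conv: small conv block
    predicting a 1-channel residual."""
    up = F.interpolate(coarse_pred, size=side_feat.shape[2:],
                       mode='bilinear', align_corners=False)
    reverse = 1.0 - torch.sigmoid(up)      # attend to the NON-salient regions
    residual = residual_conv(side_feat * reverse)
    return up + residual                   # refined saliency logits

residual_conv = torch.nn.Conv2d(64, 1, 3, padding=1)
refined = reverse_attention_refine(torch.randn(2, 64, 56, 56),
                                   torch.randn(2, 1, 14, 14), residual_conv)
print(refined.shape)  # torch.Size([2, 1, 56, 56])
```

Applied stage by stage from the deepest layer upward, this erase-and-refine loop is what lets a very small decoder recover high-resolution detail.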
SS3D: Single Shot 3D Object Detector
The single-stage deep learning approach to 2D object detection was made popular
by the Single Shot MultiBox Detector (SSD) and has been heavily adopted in
several embedded applications. PointPillars is a state-of-the-art 3D object
detection algorithm that uses a Single Shot Detector adapted for 3D object
detection. The main downside of PointPillars is its two-stage approach: a
learned input representation based on fully connected layers, followed by the
Single Shot Detector for 3D detection. In this paper we present Single Shot 3D
Object Detection (SS3D), a single-stage 3D object detection algorithm that
combines a straightforward, statistically computed input representation with a
Single Shot Detector (based on PointPillars). Computing the input representation
is straightforward, does not involve learning, and does not incur much
computational cost. We also extend our method to stereo input and show that,
aided by additional semantic segmentation input, our method produces accuracy
similar to that of state-of-the-art stereo-based detectors. Achieving the
accuracy of two-stage detectors with a single-stage approach is important, as
single-stage approaches are simpler to implement in embedded, real-time
applications. With LiDAR as well as stereo input, our method outperforms
PointPillars. When using LiDAR input, our input representation improves the
AP3D of the Car class in the moderate category from 74.99 to 76.84. When using
stereo input, it improves the AP3D of the Car class in the moderate category
from 38.13 to 45.13. Our results are also better than those of other popular
3D object detectors such as AVOD and F-PointNet.
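A minimal sketch of a statistically computed, learning-free input representation of the kind the abstract contrasts with PointPillars' learned one: points are binned into a ground-plane grid and each pillar is filled with cheap statistics. The specific statistics, ranges, and grid size are assumptions for illustration, not the paper's exact representation.

```python
import numpy as np

def pillar_statistics(points, x_range=(0.0, 70.4), y_range=(-40.0, 40.0),
                      grid=(176, 200)):
    """points: (N, 4) array of (x, y, z, intensity) LiDAR returns.
    Returns a (3, H, W) bird's-eye-view tensor of per-pillar statistics."""
    H, W = grid
    xi = ((points[:, 0] - x_range[0]) / (x_range[1] - x_range[0]) * W).astype(int)
    yi = ((points[:, 1] - y_range[0]) / (y_range[1] - y_range[0]) * H).astype(int)
    keep = (xi >= 0) & (xi < W) & (yi >= 0) & (yi < H)
    xi, yi, pts = xi[keep], yi[keep], points[keep]

    count = np.zeros((H, W))
    max_z = np.full((H, W), -np.inf)
    sum_int = np.zeros((H, W))
    for r, c, p in zip(yi, xi, pts):
        count[r, c] += 1
        max_z[r, c] = max(max_z[r, c], p[2])
        sum_int[r, c] += p[3]
    max_z[count == 0] = 0.0
    mean_int = np.divide(sum_int, count,
                         out=np.zeros_like(sum_int), where=count > 0)
    return np.stack([count, max_z, mean_int])  # no learned layers involved

pts = np.random.rand(5000, 4) * [70.0, 80.0, 3.0, 1.0] - [0.0, 40.0, 1.0, 0.0]
print(pillar_statistics(pts).shape)  # (3, 176, 200)
```

Since nothing in this stage is learned, it adds almost no compute before the single-shot detector, which is the efficiency argument the abstract makes.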