17,994 research outputs found
Rotation invariance has been an important topic in computer vision tasks such as face detection [1], texture classification [2] and character recognition [3], to name a few. Rotation-invariant properties remain important for recent DNN-based approaches. In general, DNNs require many more parameters, trained with rotation-based data augmentation, to yield rotation-invariant outputs. Max pooling helps alleviate this issue, but since the pooling window is usually 2x2 [4], it only provides invariance to very small rotation angles. Recently, there have been several works on rotation-invariant neural networks, such as rotating weights [5, 6], enlarged receptive fields using dilated convolutional neural networks (CNNs) [7] or a pyramid pooling layer [8], rotation region proposals for recognizing arbitrarily oriented texts [9], and a polar transform network to extract rotation-invariant features [10].
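The limited rotation tolerance of small pooling windows can be illustrated with a toy example (a hypothetical sketch, not code from any of the cited works): 2x2 max pooling commutes with exact 90-degree rotations, but the pooled feature map itself is not rotation invariant; only a global reduction such as the image-wide maximum is.

```python
def maxpool2x2(img):
    """2x2 max pooling with stride 2 on a square 2D list."""
    n = len(img)
    return [[max(img[2*i][2*j], img[2*i][2*j+1],
                 img[2*i+1][2*j], img[2*i+1][2*j+1])
             for j in range(n // 2)]
            for i in range(n // 2)]

def rot90(img):
    """Rotate a square 2D list by 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

img = [[1, 2, 3, 4],
       [5, 6, 7, 8],
       [9, 10, 11, 12],
       [13, 14, 15, 16]]

# Pooling commutes with a 90-degree rotation of the input ...
assert maxpool2x2(rot90(img)) == rot90(maxpool2x2(img))
# ... but the pooled map itself changes under rotation, so the
# per-cell features are not rotation invariant.
assert maxpool2x2(rot90(img)) != maxpool2x2(img)
# Only a global reduction (here, the image-wide max) is invariant.
assert max(map(max, maxpool2x2(rot90(img)))) == max(map(max, maxpool2x2(img)))
```

This is why pooling alone only absorbs sub-cell displacements: any rotation large enough to move a maximum across a 2x2 cell boundary changes the pooled map.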
Applications of deep neural network (DNN) based object and grasp detection could be expanded significantly when the network output is processed by high-level reasoning over the relationships among objects. Recently, robotic grasp detection and object detection with reasoning have been investigated using DNNs. There have been efforts to combine these multiple tasks using separate networks so that robots can deal with grasping specific target objects in cluttered, stacked, complex piles of novel objects
from a single RGB-D camera. We propose a single multi-task DNN that yields accurate detection of objects, grasp positions, and relationship reasoning among objects. Our proposed method yields state-of-the-art performance, with accuracies of 98.6% and 74.2% at computation speeds of 33 and 62 frames per second on the VMRD and Cornell datasets, respectively. Our method also yielded a 95.3% grasp success rate for novel-object grasping tasks with a 4-axis robot arm and an 86.7% grasp success rate on cluttered novel objects with a humanoid robot.
Review of Face Detection Systems Based Artificial Neural Networks Algorithms
Face detection is one of the most relevant applications of image processing
and biometric systems. Artificial neural networks (ANN) have been used in the
field of image processing and pattern recognition. There is a lack of
literature surveys giving an overview of the studies and research related to
the use of ANN in face detection. Therefore, this research includes a general
review of face detection studies and systems which are based on different ANN
approaches and algorithms. The strengths and limitations of these studies and
systems are also included.
Comment: 16 pages, 12 figures, 1 table, IJMA Journal
Polar Fusion Technique Analysis for Evaluating the Performances of Image Fusion of Thermal and Visual Images for Human Face Recognition
This paper presents a comparative study of two different methods, which are
based on fusion and polar transformation of visual and thermal images. Here,
investigation is done to handle the challenges of face recognition, which
include pose variations, changes in facial expression, partial occlusions,
variations in illumination, rotation through different angles, change in scale
etc. To overcome these obstacles, we have implemented and thoroughly examined
two different fusion techniques through rigorous experimentation. In the first
method, the log-polar transformation is applied to the fused image obtained
after fusion of the visual and thermal images, whereas in the second method,
fusion is applied to the log-polar transforms of the individual visual and
thermal images. After this step, in either case, Principal Component Analysis
(PCA) is applied to reduce the dimensionality of the fused images. Log-polar
transformed images are capable of handling the complications introduced by
scaling and rotation.
The main objective of employing fusion is to produce a fused image that
provides more detailed and reliable information and is capable of overcoming
the drawbacks present in the individual visual and thermal face images.
Finally, those reduced fused images are classified using a multilayer
perceptron neural network. The database used for the experiments conducted here
is the Object Tracking and Classification Beyond Visible Spectrum (OTCBVS)
benchmark database of thermal and visual face images. The second method showed
better performance, with a correct recognition rate of 95.71% at maximum and
93.81% on average.
Comment: Proceedings of the IEEE Workshop on Computational Intelligence in
Biometrics and Identity Management (IEEE CIBIM 2011), Paris, France, April
11-15, 2011
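The rotation-to-shift property of the log-polar transform that this approach relies on can be sketched as follows (a hypothetical illustration on a continuous image function, not the authors' implementation): sampling an image on a log-polar grid about its center turns a rotation into a circular shift along the angular axis, which is what makes the transformed images easier to handle under rotation.

```python
import math

def log_polar_sample(f, n_theta=16, n_rho=8, r_min=0.1, r_max=1.0):
    """Sample a continuous image f(x, y) on a log-polar grid.

    Rows index the angle theta; columns index log-spaced radii, so a
    rotation of f about the origin becomes a circular row shift."""
    grid = []
    for t in range(n_theta):
        theta = 2.0 * math.pi * t / n_theta
        row = []
        for k in range(n_rho):
            # log-spaced radii between r_min and r_max
            r = r_min * (r_max / r_min) ** (k / (n_rho - 1))
            row.append(f(r * math.cos(theta), r * math.sin(theta)))
        grid.append(row)
    return grid

def rotated(f, angle):
    """Return the image f rotated by `angle` radians about the origin."""
    c, s = math.cos(angle), math.sin(angle)
    return lambda x, y: f(c * x + s * y, -s * x + c * y)

f = lambda x, y: x + 2.0 * y          # a simple test "image"
orig = log_polar_sample(f)
rot = log_polar_sample(rotated(f, 2.0 * math.pi * 4 / 16))  # 90 degrees

# A 90-degree rotation = a circular shift by 4 of the 16 angular rows.
for t in range(16):
    for k in range(8):
        assert abs(rot[t][k] - orig[(t - 4) % 16][k]) < 1e-9
```

Scaling similarly becomes a shift along the radial (log) axis, since radii are log-spaced; that is the property that lets the subsequent PCA and classifier work on rotation- and scale-normalized inputs.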
Multi-view Face Detection Using Deep Convolutional Neural Networks
In this paper we consider the problem of multi-view face detection. While
there has been significant research on this problem, current state-of-the-art
approaches for this task require annotation of facial landmarks, e.g. TSM [25],
or annotation of face poses [28, 22]. They also require training dozens of
models to fully capture faces in all orientations, e.g. 22 models in HeadHunter
method [22]. In this paper we propose Deep Dense Face Detector (DDFD), a method
that does not require pose/landmark annotation and is able to detect faces in a
wide range of orientations using a single model based on deep convolutional
neural networks. The proposed method has minimal complexity; unlike other
recent deep learning object detection methods [9], it does not require
additional components such as segmentation, bounding-box regression, or SVM
classifiers. Furthermore, we analyzed the scores of the proposed face detector
for faces in different orientations and found that 1) the proposed method is
able to detect faces from different angles and can handle occlusion to some
extent, and 2) there seems to be a correlation between the distribution of
positive examples in the training set and the scores of the proposed face
detector. The latter suggests that the proposed method's performance can be
further improved by using better sampling strategies and more sophisticated
data augmentation techniques.
Evaluations on popular face detection benchmark datasets show that our
single-model face detector algorithm has similar or better performance compared
to the previous methods, which are more complex and require annotations of
either different poses or facial landmarks.
Comment: in International Conference on Multimedia Retrieval 2015 (ICMR)
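Single-model detectors in this style typically produce many overlapping scored candidate boxes that are merged by non-maximum suppression. The following generic sketch (standard greedy NMS, not the DDFD code) shows the usual procedure: keep the highest-scoring box, discard boxes that overlap it too much, and repeat.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0

def nms(boxes, scores, iou_thresh=0.3):
    """Greedy non-maximum suppression; returns indices of kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # keep box i only if it does not overlap any already-kept box
        if all(iou(boxes[i], boxes[j]) <= iou_thresh for j in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
assert nms(boxes, scores) == [0, 2]  # the near-duplicate box 1 is suppressed
```

Because the suppression step is score-driven, a single fully convolutional scoring model plus NMS can cover faces at all orientations without per-pose models or bounding-box regression heads.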
Oriented Response Networks
Deep Convolutional Neural Networks (DCNNs) are capable of learning
unprecedentedly effective image representations. However, their ability in
handling significant local and global image rotations remains limited. In this
paper, we propose Active Rotating Filters (ARFs) that actively rotate during
convolution and produce feature maps with location and orientation explicitly
encoded. An ARF acts as a virtual filter bank containing the filter itself and
its multiple unmaterialised rotated versions. During back-propagation, an ARF
is collectively updated using errors from all its rotated versions. DCNNs using
ARFs, referred to as Oriented Response Networks (ORNs), can produce
within-class rotation-invariant deep features while maintaining inter-class
discrimination for classification tasks. The oriented response produced by ORNs
can also be used for image and object orientation estimation tasks. Over
multiple state-of-the-art DCNN architectures, such as VGG, ResNet, and STN, we
consistently observe that replacing regular filters with the proposed ARFs
leads to significant reduction in the number of network parameters and
improvement in classification performance. We report the best results on
several commonly used benchmarks.
Comment: Accepted in CVPR 2017. Source code available at http://yzhou.work/OR
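The core idea of applying a filter together with its rotated copies can be illustrated with a toy example (a hypothetical sketch restricted to 90-degree steps, not the authors' ARF implementation, which uses finer orientation channels): taking the maximum response over the rotated copies yields a feature that is invariant to 90-degree rotations of the input patch, while the index of the winning copy encodes the orientation.

```python
def rot90(m):
    """Rotate a square 2D list by 90 degrees clockwise."""
    return [list(row) for row in zip(*m[::-1])]

def response(filt, patch):
    """Correlation of a filter with an equally sized patch."""
    return sum(f * p for frow, prow in zip(filt, patch)
                     for f, p in zip(frow, prow))

def oriented_response(filt, patch, n_orientations=4):
    """Max response over the filter and its rotated copies
    (a crude stand-in for a virtual filter bank of rotated filters)."""
    best, f = None, filt
    for _ in range(n_orientations):
        r = response(f, patch)
        best = r if best is None or r > best else best
        f = rot90(f)
    return best

filt = [[1, 0, -1],
        [2, 0, -2],
        [1, 0, -1]]   # a vertical-edge filter
patch = [[3, 1, 0],
         [4, 1, 0],
         [3, 1, 0]]

# The max over rotated copies is unchanged when the patch is rotated,
# because rotating the patch merely permutes which copy wins.
assert oriented_response(filt, rot90(patch)) == oriented_response(filt, patch)
```

Learning one canonical filter and materialising its rotations on the fly is also what drives the parameter reduction reported above: one set of weights serves every orientation channel.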