99,268 research outputs found

Model based methods for locating, enhancing and recognising low resolution objects in video

Author: Kramer Annika
Publication venue: Curtin University
Publication date: 01/01/2009

Visual perception is our most important sense, enabling us to detect and recognise objects even in low-detail video scenes. While humans perform such detection and recognition tasks reliably, most computer vision algorithms struggle with wide-angle surveillance video, where low resolution and poorly detailed objects make automatic processing difficult. Additional problems arise from varying pose and lighting conditions as well as non-cooperative subjects. All these constraints pose problems for automatic scene interpretation of surveillance video, including object detection, tracking and object recognition. Therefore, the aim of this thesis is to detect, enhance and recognise objects by incorporating a priori information and by using model based approaches. Motivated by the increasing demand for automatic methods for object detection, enhancement and recognition in video surveillance, different aspects of the video processing task are investigated with a focus on human faces. In particular, the challenge of fully automatic face pose and shape estimation by fitting a deformable 3D generic face model under varying pose and lighting conditions is tackled. Principal Component Analysis (PCA) is utilised to build an appearance model that is then used within a particle filter based approach to fit the 3D face mask to the image. This recovers face pose and person-specific shape information simultaneously. Experiments demonstrate its use at different resolutions and under varying pose and lighting conditions. Following that, a combined tracking and super resolution approach enhances the quality of poor-detail video objects. A 3D object mask is subdivided such that every mask triangle is smaller than a pixel when projected into the image, and the mask is then used for model based tracking. The mask subdivision then allows for super resolution of the object by combining several video frames. This approach achieves better results than traditional super resolution methods without the use of interpolation or deblurring. Lastly, object recognition is performed in two different ways. The first recognition method is applied to characters and used for license plate recognition. A novel character model is proposed to create different appearances, which are then matched with the image of unknown characters for recognition. This allows for simultaneous character segmentation and recognition, and high recognition rates are achieved for low resolution characters down to only five pixels in size. While this approach is only feasible for objects with a limited number of different appearances, like characters, the second recognition method is applicable to any object, including human faces. For this method, a generic 3D face model is automatically fitted to an image of a human face and recognition is performed on a mask level rather than image level. This approach requires neither an initial pose estimate nor the selection of feature points; face alignment is provided implicitly by the mask fitting process.

espace@Curtin

Query Based Face Retrieval From Automatic Reconstructed Images based on 3D Frontal View - Using EICA

Author: Dr. A. Govardhan
Prof. Y. Vijaya Lata
Publication venue: Global Journals Inc. (US)
Publication date: 10/05/2011

Face recognition systems have played a vital role for several decades, and various face recognition algorithms have been developed for applications such as ‘person identification’, ‘human computer interaction’ and ‘security systems’. This paper proposes a framework for face recognition across different poses through face reconstruction. In the present work, the system is trained with only a single frontal face with normal illumination and expression. Instead of capturing images of a person in different poses using a camera or video, different views of the 3D face are reconstructed with the help of a 3D face shape model. This automatically increases the size of the training set. This approach outperforms present 2D techniques, achieving a higher recognition rate. The face detection and recognition approach primarily focuses on Enhanced Independent Component Analysis (EICA) for query based face retrieval, and the implementation is done in Scilab. The method detects a static face (a cropped photo as input) as well as faces in a group picture, and these faces are reconstructed using the 3D face shape model. Image preprocessing is used in order to reduce the error rate for images affected by illumination. Scilab’s SIVP toolbox is used for image analysis.

Global Journal of Computer Science and Technology (GJCST)

CMS-RCNN: Contextual Multi-Scale Region-based CNN for Unconstrained Face Detection

Author: C. Lawrence Zitnick
Dong Chen
M Everingham
Markus Mathias
Matthew D. Zeiler
P Felzenszwalb
R Girshick
Tsung-Yi Lin
Publication date: 16/06/2016

Robust face detection in the wild is one of the essential components supporting various face-related problems, such as unconstrained face recognition, facial periocular recognition, facial landmarking and pose estimation, facial expression recognition and 3D facial model construction. Although the face detection problem has been intensely studied for decades with various commercial applications, it still meets problems in some real-world scenarios due to numerous challenges, e.g. heavy facial occlusions, extremely low resolutions, strong illumination, exceptional pose variations, and image or video compression artifacts. In this paper, we present a face detection approach named Contextual Multi-Scale Region-based Convolution Neural Network (CMS-RCNN) to robustly solve the problems mentioned above. Similar to the region-based CNNs, our proposed network consists of the region proposal component and the region-of-interest (RoI) detection component. However, unlike those networks, our proposed network makes two main contributions that play a significant role in achieving state-of-the-art performance in face detection. Firstly, multi-scale information is grouped both in region proposal and RoI detection to deal with tiny face regions. Secondly, our proposed network allows explicit body contextual reasoning, inspired by the intuition of the human vision system. The proposed approach is benchmarked on two recent challenging face detection databases, i.e. the WIDER FACE Dataset, which contains a high degree of variability, and the Face Detection Dataset and Benchmark (FDDB). The experimental results show that our proposed approach trained on the WIDER FACE Dataset outperforms strong baselines on the WIDER FACE Dataset by a large margin, and consistently achieves competitive results on FDDB against recent state-of-the-art face detection methods.

arXiv.org e-Print Archive

Crossref

3d Face Reconstruction And Emotion Analytics With Part-Based Morphable Models

Author: Jin Hai
Publication venue: DigitalCommons@WayneState
Publication date: 01/01/2018

3D face reconstruction and facial expression analytics using 3D facial data are new and active research topics in computer graphics and computer vision. In this proposal, we first review the background knowledge for emotion analytics using 3D morphable face models, including geometry feature-based methods, statistical model-based methods and more advanced deep learning-based methods. Then, we introduce a novel 3D face modeling and reconstruction solution that robustly and accurately acquires 3D face models from a couple of images captured by a single smartphone camera. Two selfie photos of a subject taken from the front and side are used to guide our Non-Negative Matrix Factorization (NMF) induced part-based face model to iteratively reconstruct an initial 3D face of the subject. Then, an iterative detail updating method is applied to the initial generated 3D face to reconstruct facial details by optimizing lighting parameters and local depths. Our iterative 3D face reconstruction method permits fully automatic registration of a part-based face representation to the acquired face data and the detailed 2D/3D features to build a high-quality 3D face model. The NMF part-based face representation learned from a 3D face database facilitates effective global fitting and adaptive local detail fitting in alternation. Our system is flexible and allows users to conduct the capture in any uncontrolled environment. We demonstrate the capability of our method by allowing users to capture and reconstruct their 3D faces by themselves. Based on the 3D face model reconstruction, we can analyze the facial expression and the related emotion in 3D space. We present a novel approach to analyzing facial expressions from images and a quantitative information visualization scheme for exploring this type of visual data. From the result reconstructed using the NMF part-based morphable 3D face model, basis parameters and a displacement map are extracted as features for facial emotion analysis and visualization. Based upon these features, two Support Vector Regressions (SVRs) are trained to determine the fuzzy Valence-Arousal (VA) values that quantify the emotions. The continuously changing emotion status can be intuitively analyzed by visualizing the VA values in VA-space. Our emotion analysis and visualization system, based on the 3D NMF morphable face model, detects expressions robustly across various head poses, face sizes and lighting conditions, and computes the VA values fully automatically from images or a video sequence with various facial expressions. To evaluate our novel method, we test our system on publicly available databases and evaluate the emotion analysis and visualization results. We also apply our method to quantifying emotion changes during motivational interviews. These experiments and applications demonstrate the effectiveness and accuracy of our method. In order to improve the expression recognition accuracy, we present a facial expression recognition approach with a 3D Mesh Convolutional Neural Network (3DMCNN) and a visual analytics guided 3DMCNN design and optimization scheme. The geometric properties of the surface are computed using the 3D face model of a subject with facial expressions. Instead of using a regular Convolutional Neural Network (CNN) to learn intensities of the facial images, we convolve the geometric properties on the surface of the 3D model using the 3DMCNN. We design a geodesic distance-based convolution method to overcome the difficulties arising from the irregular sampling of the face surface mesh. We further present interactive visual analytics for designing and modifying the networks, used to analyze the learned features and cluster similar nodes in the 3DMCNN. By removing low activity nodes in the network, the performance of the network is greatly improved. We compare our method with the regular CNN-based method by interactively visualizing each layer of the networks and analyze the effectiveness of our method by studying representative cases. Tested on public datasets, our method achieves a higher recognition accuracy than traditional image-based CNNs and other 3D CNNs. The presented framework, including the 3DMCNN and interactive visual analytics of the CNN, can be extended to other applications.

Digital Commons@Wayne State University

GTAutoAct: An Automatic Datasets Generation Framework Based on Game Engine Redevelopment for Action Recognition

Author: Chen Shi
Demachi Kazuyuki
Li Zhan
Song Xingyu
Publication date: 24/01/2024

Current datasets for action recognition tasks face limitations stemming from traditional collection and generation methods, including a constrained range of action classes, absence of multi-viewpoint recordings, limited diversity, poor video quality, and labor-intensive manual collection. To address these challenges, we introduce GTAutoAct, an innovative dataset generation framework leveraging game engine technology to facilitate advancements in action recognition. GTAutoAct excels in automatically creating large-scale, well-annotated datasets with extensive action classes and superior video quality. Our framework's distinctive contributions are as follows: (1) it transforms readily available coordinate-based 3D human motion into a rotation-orientated representation that is better suited to multiple viewpoints; (2) it employs dynamic segmentation and interpolation of rotation sequences to create smooth and realistic animations of actions; (3) it offers extensively customizable animation scenes; (4) it implements an autonomous video capture and processing pipeline, featuring a randomly navigating camera, with auto-trimming and labeling functionalities. Experimental results underscore the framework's robustness and highlight its potential to significantly improve action recognition model training.
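
Contribution (2) mentions interpolation of rotation sequences without stating which scheme is used. Purely as an illustration, the sketch below assumes rotations are stored as unit quaternions and uses spherical linear interpolation (slerp), a common choice for smooth in-betweening. The function name `slerp` and the `(w, x, y, z)` layout are assumptions for this example and are not taken from GTAutoAct.

```python
import math


def slerp(q0, q1, t):
    """Interpolate between unit quaternions q0 and q1, each given as (w, x, y, z).

    t = 0 returns q0, t = 1 returns q1 (up to sign), and values in between
    rotate at constant angular speed along the shorter arc.
    """
    dot = sum(a * b for a, b in zip(q0, q1))
    # q and -q encode the same rotation; flip one so we follow the shorter arc.
    if dot < 0.0:
        q1 = tuple(-c for c in q1)
        dot = -dot
    dot = min(1.0, dot)  # guard acos against rounding slightly above 1

    if dot > 0.9995:
        # Nearly identical rotations: linear interpolation avoids dividing by ~0.
        out = tuple(a + t * (b - a) for a, b in zip(q0, q1))
    else:
        theta = math.acos(dot)
        s0 = math.sin((1.0 - t) * theta) / math.sin(theta)
        s1 = math.sin(t * theta) / math.sin(theta)
        out = tuple(s0 * a + s1 * b for a, b in zip(q0, q1))

    # Renormalise so the result stays a valid rotation despite rounding.
    norm = math.sqrt(sum(c * c for c in out))
    return tuple(c / norm for c in out)


# Halfway between "no rotation" and a 90 degree turn about the z axis
# is a 45 degree turn about the z axis.
identity = (1.0, 0.0, 0.0, 0.0)
quarter_turn_z = (math.cos(math.pi / 4), 0.0, 0.0, math.sin(math.pi / 4))
halfway = slerp(identity, quarter_turn_z, 0.5)
```

Applying such a function per joint between keyframes is one way an animation could be smoothed; whether the framework does this per joint, or with a different rotation representation, is not stated in the abstract.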

arXiv.org e-Print Archive

3D Human Face Reconstruction and 2D Appearance Synthesis

Author: Zhao Yajie
Publication venue: UKnowledge
Publication date: 01/01/2018

3D human face reconstruction has been researched extensively for decades due to its wide range of applications, such as animation, recognition and 3D-driven appearance synthesis. Although commodity depth sensors have become widely available in recent years, image-based face reconstruction remains valuable because images are much easier to access and store. In this dissertation, we first propose three image-based face reconstruction approaches according to different assumptions about the inputs. In the first approach, face geometry is extracted from multiple key frames of a video sequence with different head poses. The camera should be calibrated under this assumption. As the first approach is limited to videos, we propose a second approach that focuses on a single image. This approach also improves the geometry by adding fine detail using shading cues. We propose a novel albedo estimation and linear optimization algorithm in this approach. In the third approach, we further loosen the constraint on the input image to arbitrary in-the-wild images. Our proposed approach can robustly reconstruct a high quality model even with extreme expressions and large poses. We then explore the applicability of our face reconstructions to four applications: video face beautification, generating personalized facial blendshapes from image sequences, face video stylizing and video face replacement. We demonstrate the great potential of our reconstruction approaches in these real-world applications. In particular, with the recent surge of interest in VR/AR, it is increasingly common to see people wearing head-mounted displays (HMDs). However, the large occlusion of the face is a big obstacle to communicating in a face-to-face manner. In a further application, we explore hardware/software solutions for synthesizing the face image in the presence of HMDs. We design two setups (experimental and mobile) which integrate two near-IR cameras and one color camera to solve this problem. With our algorithm and prototype, we can achieve photo-realistic results. We further propose a deep neural network to solve the HMD removal problem by treating it as a face inpainting problem. This approach doesn't need special hardware and runs in real time with satisfying results.

University of Kentucky

Fingers micro-gesture recognition based on holoscopic 3D imaging system

Author: Liu Yi
Publication venue: Brunel University London
Publication date: 01/01/2020

This thesis was submitted for the award of Doctor of Philosophy and was awarded by Brunel University London. Micro-gesture recognition has been widely researched in recent years, with a particular focus on 3D micro-gesture recognition, which consists of classifying the micro-gesture movements of the fingers for touch-less control applications. A holoscopic 3D imaging system mimics the fly’s eye technique to capture a true 3D scene that is rich in both texture and motion information. As a result, a holoscopic 3D imaging system should be a suitable approach for robust recognition applications. This PhD research focuses on innovative 3D micro-gesture recognition based on a holoscopic 3D system, which delivers robust, reliable and precise performance for 3D micro-gestures. It can also be applied to a wide range of other applications such as the Internet of Things (IoT), AR/VR, robotics and other touch-less interaction. Owing to the lack of holoscopic 3D datasets, a comprehensive 3D micro-gesture dataset (HoMG) that includes both holoscopic 3D images and videos is prepared. It is a reasonably sized holoscopic 3D dataset captured with different camera settings and conditions from 40 participants. An innovative 3D micro-gesture recognition method is proposed based on 2D feature extraction methods with basic classification methods; its recognition accuracy reaches around 50.9%. For video-based data, 3D feature extraction methods achieve 66.7% recognition accuracy, compared with 50.9% for micro-gesture images in the initial investigation. The HoMG database was the basis of a challenge at the IEEE International Conference on Automatic Face and Gesture Recognition 2018; four groups from international research institutes joined the challenge and contributed many new methods as further development, and the proposed method was published there. Building on the holoscopic 3D dataset, a further innovative 3D micro-gesture recognition system is proposed, and its performance is evaluated through like-for-like comparison with state-of-the-art methods. In addition, a fast and efficient pre-processing algorithm for extracting the element images from H3D images and a simplified viewpoint image extraction method are presented. A pre-trained CNN model with an attention mechanism is applied to viewpoint (VP) images to predict gesture probabilities. The proposed approach is further improved using a voting strategy. It achieves 87% accuracy, which outperforms all existing state-of-the-art methods on the image-based database. Advanced 3D micro-gesture recognition is then investigated on the video sequence database, using an end-to-end model as an effective H3D-based micro-gesture recognition system. For the front-end network, two methods are used and evaluated: traditional viewpoint image extraction and novel pseudo viewpoint image extraction. The pseudo viewpoint (PVP) front-end is created to help deep learning networks understand the implied 3D information of the H3D imaging system. The viewpoint (VP) front-end follows the traditional H3D image method to extract and reconstruct the multi-viewpoint images. Both front-ends are fed into four popular advanced deep networks for learning and classification. These experiments evaluate the performance of 2D and 3D convolutional, mixed 2D and 3D convolutional, and LSTM networks on the HoMG video database, which benefits the use of deep learning networks with H3D imaging systems. Finally, in order to obtain high accuracy, majority voting is applied as a further improvement. The final results show that the performance is not only better than that of the traditional methods but also superior to existing deep learning based approaches, which clearly demonstrates the effectiveness of the proposed approach.
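
The abstract says majority voting is applied to improve accuracy but does not say what is being voted over. As a minimal sketch only, the code below assumes each vote is one predicted gesture label (for example, one per network or per viewpoint image) and returns the most frequent label. The function name `majority_vote`, the tie-breaking rule and the example labels are assumptions for illustration, not details from the thesis.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label that occurs most often in `predictions`.

    Ties are broken in favour of the label that appears first in the input,
    which is an arbitrary but deterministic choice made for this sketch.
    """
    if not predictions:
        raise ValueError("majority_vote needs at least one prediction")
    counts = Counter(predictions)
    top = max(counts.values())
    for label in predictions:
        if counts[label] == top:
            return label


# Hypothetical labels from three classifiers for the same gesture sample.
votes = ["slider", "button", "slider"]
winner = majority_vote(votes)
```

With an odd number of voters and two candidate labels a tie cannot occur; with more labels or an even number of voters, the tie-breaking rule matters and would need to be chosen deliberately.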

Brunel University Research Archive