
    Depth sensor based object detection using surface curvature

    An object detection system finds objects in an image or video sequence of the real world. Good performance in object detection has largely been driven by the development of well-established, robust feature sets. Using conventional color images as input, researchers have achieved major success. Recent dramatic advances in depth imaging technology have triggered significant attention to revisiting object detection using depth images as input. Using depth information, we propose a feature, the Histogram of Oriented Curvature (HOC), designed specifically to capture local surface shape for object detection with depth sensors. We form the HOC feature as a concatenation of local histograms of Gaussian curvature and mean curvature. A linear Support Vector Machine (SVM) is employed for the detection task. We evaluate the proposed HOC feature on two widely used datasets and compare the results with other well-known object detection methods applied to both RGB and depth images. Our experimental results show that the proposed HOC feature generally outperforms the HOG and HOGD features in the object detection task, and can achieve similar or higher results than the state-of-the-art depth descriptor HONV on some object categories.
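    A minimal sketch of how such a feature could be computed (illustrative only, not the authors' code; the cell size, bin count, and curvature clipping range are assumptions): Gaussian and mean curvature follow from the first and second derivatives of the depth map, and are then histogrammed per cell and concatenated.

```python
import numpy as np
from sklearn.svm import LinearSVC

def curvatures(depth):
    """Gaussian and mean curvature of a depth map z(x, y) via finite differences."""
    zy, zx = np.gradient(depth)          # first derivatives (rows = y, cols = x)
    zxy, zxx = np.gradient(zx)           # second derivatives
    zyy, _ = np.gradient(zy)
    denom = 1.0 + zx**2 + zy**2
    K = (zxx * zyy - zxy**2) / denom**2                       # Gaussian curvature
    H = ((1 + zx**2) * zyy - 2 * zx * zy * zxy
         + (1 + zy**2) * zxx) / (2 * denom**1.5)              # mean curvature
    return K, H

def hoc_feature(depth, cell=16, bins=8, clip=0.05):
    """Concatenate per-cell histograms of Gaussian and mean curvature."""
    K, H = curvatures(depth)
    feats = []
    for y in range(0, depth.shape[0] - cell + 1, cell):
        for x in range(0, depth.shape[1] - cell + 1, cell):
            for C in (K, H):
                patch = np.clip(C[y:y+cell, x:x+cell], -clip, clip)
                hist, _ = np.histogram(patch, bins=bins, range=(-clip, clip))
                feats.append(hist / (hist.sum() + 1e-6))      # normalize cell
    return np.concatenate(feats)

# usage: X = np.stack([hoc_feature(d) for d in depth_windows])
#        clf = LinearSVC().fit(X, labels)
```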

    Merging Live and pre-Captured Data to support Full 3D Head Reconstruction for Telepresence

    This paper proposes a 3D head reconstruction method for low-cost 3D telepresence systems that uses only a single consumer-level hybrid sensor (color + depth) located in front of the user. Our method fuses the real-time, noisy, and incomplete output of the hybrid sensor with a set of static, high-resolution textured models acquired in a calibration phase. A complete and fully textured 3D model of the user's head can thus be reconstructed in real time, accurately preserving the facial expression of the user. The main features of our method are a mesh interpolation and a fusion of static and dynamic textures that combine, respectively, better resolution and the dynamic features of the face.
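    The texture fusion step might look something like the following per-texel alpha blend, where the live (dynamic) texture dominates wherever the sensor currently sees the face and the pre-captured (static) texture fills the rest. The mask, blending weight, and function names below are assumptions, not the paper's actual scheme.

```python
import numpy as np

def fuse_textures(static_tex, dynamic_tex, dynamic_mask, alpha=0.7):
    """Blend a high-resolution static texture with the live sensor texture.

    Where the live view covers the model (mask == 1), favor the dynamic
    texture so the facial expression is preserved; fall back to the static
    texture in occluded or unseen regions.
    """
    w = alpha * dynamic_mask[..., None].astype(np.float32)   # per-texel weight
    return (w * dynamic_tex + (1.0 - w) * static_tex).astype(static_tex.dtype)
```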

    Accessible options for deaf people in e-Learning platforms: technology solutions for sign language translation

    This paper presents a study on potential technology solutions for enhancing the communication process for deaf people on e-learning platforms through translation of Sign Language (SL). Considering SL in its global scope as a spatial-visual language that is not limited to gestures or hand/forearm movement but also includes non-manual markers such as facial expressions, it is necessary to ascertain whether existing technology solutions can be effective options for integrating SL into e-learning platforms. We therefore present a list of potential technology options for the recognition, translation, and presentation of SL (and their potential problems) through an analysis of assistive technologies, methods, and techniques, and ultimately aim to contribute to the development of the state of the art and to ensure digital inclusion of deaf people in e-learning platforms. The analysis shows that some interesting technology solutions are under research and development for digital platforms in general, but several critical challenges must still be solved, and an effective integration of these technologies in e-learning platforms in particular is still missing.

    Accurate and Robust 3D Facial Capture Using a Single RGBD Camera

    This paper presents an automatic and robust approach that accurately captures high-quality 3D facial performances using a single RGBD camera. The key to our approach is to combine the power of automatic facial feature detection and image-based 3D nonrigid registration techniques for 3D facial reconstruction. In particular, we develop a robust and accurate image-based nonrigid registration algorithm that incrementally deforms a 3D template mesh model to best match observed depth image data and important facial features detected from single RGBD images. The whole process is fully automatic and robust because it is based on a single-frame facial registration framework. The system is flexible because it does not require any strong 3D facial priors such as blendshape models. We demonstrate the power of our approach by capturing a wide range of 3D facial expressions using a single RGBD camera and achieve state-of-the-art accuracy in comparisons against alternative methods.
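    One way to picture such a registration is as descent on an energy that balances a depth data term, a facial feature term, and a smoothness term. The sketch below is a simplified illustration under that assumption (closest-point correspondences precomputed, plain gradient descent rather than the paper's actual solver; all names and weights are hypothetical).

```python
import numpy as np

def nonrigid_step(verts, depth_targets, feat_idx, feat_targets,
                  neighbors, w_depth=1.0, w_feat=10.0, w_smooth=0.5, lr=0.1):
    """One gradient step of an illustrative registration energy:

       E = w_depth  * sum_i ||v_i - p_i||^2              (closest depth points)
         + w_feat   * sum_k ||v_k - f_k||^2              (detected facial features)
         + w_smooth * sum_i ||v_i - mean(neighbors_i)||^2 (Laplacian smoothness)
    """
    grad = 2 * w_depth * (verts - depth_targets)              # depth data term
    grad[feat_idx] += 2 * w_feat * (verts[feat_idx] - feat_targets)
    lap = verts - np.stack([verts[n].mean(axis=0) for n in neighbors])
    grad += 2 * w_smooth * lap                                # keep mesh smooth
    return verts - lr * grad
```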

    Face recognition with the RGB-D sensor

    Face recognition in unconstrained environments is still a challenge because of the many variations in facial appearance due to changes in head pose, lighting conditions, facial expression, age, etc. This work addresses the problem of face recognition in the presence of 2D facial appearance variations caused by 3D head rotations. It explores the advantages of recently developed consumer-level RGB-D cameras (e.g., the Kinect). These cameras provide color and depth images at the same rate. They are affordable and easy to use, but their depth images are noisy and low-resolution, unlike laser-scanned depth images. The proposed approach to face recognition is able to deal with large head pose variations using RGB-D face images. The method uses the depth information to correct the pose of the face. It does not need to learn a generic face model or perform complex 3D-2D registrations. It is simple and fast, yet able to deal with large pose variations and perform pose-invariant face recognition. Experiments on a public database show that the presented approach is effective and efficient under significant pose changes. The idea is also used to develop face recognition software that achieves real-time face recognition in the presence of large yaw rotations using the Kinect sensor, showing in real time how the method improves recognition accuracy and confidence. This study demonstrates that RGB-D sensors are a promising tool that can lead to the development of robust pose-invariant face recognition systems under large pose variations.
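    A minimal sketch of depth-based pose correction of this kind, assuming a pinhole camera model and an externally estimated head rotation R (the function and parameter names are illustrative, not the paper's API): back-project each RGB-D pixel into 3D, undo the rotation, and re-project.

```python
import numpy as np

def frontalize(depth, rgb, R, fx, fy, cx, cy):
    """Back-project RGB-D pixels into 3D, undo the estimated head
    rotation R, and re-project, giving a roughly frontal face image."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.reshape(-1).astype(np.float32)
    x = (u.reshape(-1) - cx) * z / fx            # pinhole back-projection
    y = (v.reshape(-1) - cy) * z / fy
    pts = np.stack([x, y, z], axis=1)
    colors = rgb.reshape(-1, 3)
    keep = z > 0                                  # drop invalid depth pixels
    pts, colors = pts[keep] @ R.T, colors[keep]   # rotate cloud to frontal pose
    keep = pts[:, 2] > 1e-6                       # keep points in front of camera
    pts, colors = pts[keep], colors[keep]
    uu = np.round(pts[:, 0] * fx / pts[:, 2] + cx).astype(int)
    vv = np.round(pts[:, 1] * fy / pts[:, 2] + cy).astype(int)
    ok = (uu >= 0) & (uu < w) & (vv >= 0) & (vv < h)
    out = np.zeros_like(rgb)
    out[vv[ok], uu[ok]] = colors[ok]              # splat colors into frontal view
    return out
```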

    4D Unconstrained Real-time Face Recognition Using a Commodity Depth Camera

    Robust, unconstrained, real-time face recognition remains a challenge today. The recent arrival of lightweight commodity depth sensors on the market brings new possibilities for human-machine interaction and therefore for face recognition. This article takes the reader through a succinct survey of the current literature on face recognition in general and on 3D face recognition using depth sensors in particular. From experiments performed with implementations of the most established algorithms, it can be concluded that the majority are biased towards qualitative performance and lack speed. A novel method is proposed that uses noisy data from such a commodity sensor to build dynamic internal representations of faces. Distances to a surface normal to the face are measured in real time and used as input to a specific type of recurrent neural network, namely long short-term memory (LSTM). This enables the prediction of facial structure in linear time and also increases robustness to partial occlusions.
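    An illustrative and heavily simplified version of the idea, assuming the per-frame distance measurements are already extracted as fixed-length vectors; the layer sizes, identity count, and PyTorch framing are assumptions, not the article's architecture.

```python
import torch
import torch.nn as nn

class FaceLSTM(nn.Module):
    """Toy version of the idea: per-frame distance measurements feed an
    LSTM that accumulates a temporal representation of facial structure,
    which a linear head maps to identity scores."""
    def __init__(self, n_points=64, hidden=128, n_identities=20):
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_points, hidden_size=hidden,
                            batch_first=True)
        self.head = nn.Linear(hidden, n_identities)

    def forward(self, dist_seq):                 # (batch, frames, n_points)
        _, (h, _) = self.lstm(dist_seq)          # final hidden state summarizes clip
        return self.head(h[-1])                  # identity logits

# usage: logits = FaceLSTM()(torch.randn(4, 30, 64))  # 4 clips, 30 frames each
```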

    Facial Expression Recognition Utilizing Local Direction-Based Robust Features and Deep Belief Network

    Emotional health plays a vital role in improving people's quality of life, especially for the elderly, and negative emotional states can lead to social or mental health problems. To help cope with such problems in daily life, we propose an efficient facial expression recognition system as a contribution to emotional healthcare. Facial expressions play a key role in daily communication, and recent years have witnessed a great amount of research on reliable facial expression recognition (FER) systems. Evaluating facial expressions from video is challenging, and accuracy depends on the extraction of robust features. In this paper, a unique feature extraction method is presented to extract distinguishing features from the human face. For person-independent expression recognition, depth video data is used as input to the system, where in each frame pixel intensities are distributed based on the distances to the camera. A novel, robust feature extraction process named the local directional position pattern (LDPP) is applied in this work. In LDPP, after extracting local directional strengths for each pixel as in the typical local directional pattern (LDP), the positions of the top directional strengths are encoded in binary along with their strength sign bits. Considering top directional strength positions together with strength signs allows LDPP to differentiate edge pixels that have bright and dark regions on opposite sides, since they generate different patterns, whereas typical LDP only considers the directions with the top strengths irrespective of their signs and position orders (i.e., directions with top strengths are set to 1 and the rest to 0), which can sometimes generate identical patterns. Hence LDP fails to distinguish edge pixels with opposite bright and dark regions in some cases, which LDPP overcomes. Moreover, the LDPP features are extended through principal component analysis (PCA) and generalized discriminant analysis (GDA) for a better characterization of the face in expression. The proposed features are finally used with a deep belief network (DBN) for expression training and recognition.
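    A rough sketch of an LDPP-style encoding, assuming Kirsch compass masks for the directional strengths (a common choice in LDP work; the paper's exact masks, top-k value, and bit layout may differ): the code marks the positions of the strongest directional responses and appends their sign bits, so edges with swapped bright/dark sides get different codes.

```python
import numpy as np
from scipy.ndimage import convolve

# Eight Kirsch compass masks (E, NE, N, NW, W, SW, S, SE)
KIRSCH = [np.array(m) for m in (
    [[-3, -3,  5], [-3, 0,  5], [-3, -3,  5]],
    [[-3,  5,  5], [-3, 0,  5], [-3, -3, -3]],
    [[ 5,  5,  5], [-3, 0, -3], [-3, -3, -3]],
    [[ 5,  5, -3], [ 5, 0, -3], [-3, -3, -3]],
    [[ 5, -3, -3], [ 5, 0, -3], [ 5, -3, -3]],
    [[-3, -3, -3], [ 5, 0, -3], [ 5,  5, -3]],
    [[-3, -3, -3], [-3, 0, -3], [ 5,  5,  5]],
    [[-3, -3, -3], [-3, 0,  5], [-3,  5,  5]],
)]

def ldpp_codes(img, k=3):
    """Illustrative LDPP-style code: for each pixel, set one bit per
    position of the k strongest directional responses (by magnitude)
    and append the sign bit of each of those responses."""
    resp = np.stack([convolve(img.astype(np.float32), m) for m in KIRSCH])
    order = np.argsort(-np.abs(resp), axis=0)[:k]      # top-k direction indices
    codes = np.zeros(img.shape, dtype=np.uint16)
    for r in range(k):
        d = order[r]                                   # r-th strongest direction
        s = np.take_along_axis(resp, d[None], 0)[0] >= 0
        codes |= (1 << d).astype(np.uint16)            # position bit (bits 0..7)
        codes |= s.astype(np.uint16) << (8 + r)        # sign bit (bits 8..8+k-1)
    return codes
```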

    Lip syncing method for realistic expressive three-dimensional face model

    Lip synchronization of 3D face models is now used in a multitude of important fields. It brings a more human and dramatic reality to computer games, films, and interactive multimedia, and is growing in use and importance. A high level of realism is required in demanding applications such as computer games and cinema. Authoring lip syncing with complex and subtle expressions is still difficult and fraught with problems in terms of realism. Thus, this study proposes a lip syncing method for a realistic, expressive 3D face model. Animated lips require a 3D face model capable of representing the movement of face muscles during speech and a method to produce the correct lip shape at the correct time. The 3D face model is designed based on the MPEG-4 facial animation standard to support lip syncing aligned with an input audio file. It deforms using a Raised Cosine Deformation function grafted onto the input facial geometry. This study also proposes a method to animate the 3D face model over time, creating animated lip syncing using a canonical set of visemes for all pairwise combinations of a reduced phoneme set called ProPhone. Finally, this study integrates emotions by considering both the Ekman model and Plutchik's wheel, with emotive eye movements implemented via the Emotional Eye Movements Markup Language, to produce a realistic 3D face model. The experimental results show that the proposed model can generate visually satisfactory animations, with a Mean Square Error of 0.0020 for the neutral expression, 0.0024 for happy, 0.0020 for angry, 0.0030 for fear, 0.0026 for surprise, 0.0010 for disgust, and 0.0030 for sad.
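    A minimal sketch of a raised-cosine deformation of this flavor, where a control point displaces nearby mesh vertices with a smoothly decaying weight (the signature and falloff shape are assumptions, not the study's exact formulation): moving a lip control point per viseme keyframe and interpolating the amplitude over time yields the animated lip shapes.

```python
import numpy as np

def raised_cosine_deform(verts, center, direction, amplitude, radius):
    """Displace mesh vertices near `center` along `direction`, with the
    magnitude falling off as a raised cosine of distance so the deformation
    blends smoothly into the surrounding face geometry."""
    d = np.linalg.norm(verts - center, axis=1)            # distance to control point
    w = np.where(d < radius,
                 0.5 * (1.0 + np.cos(np.pi * d / radius)),  # 1 at center, 0 at radius
                 0.0)
    return verts + amplitude * w[:, None] * direction
```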

    Facial feature point fitting with combined color and depth information for interactive displays

    Interactive displays are driven by natural interaction with the user, necessitating a computer system that recognizes body gestures and facial expressions. User inputs are not easily or reliably recognized well enough for a satisfying user experience, as the complexities of human communication are difficult to interpret in real time. Recognizing facial expressions in particular is a problem that requires high accuracy and efficiency for stable interaction environments. The recent availability of the Kinect, a low-cost, low-resolution sensor that supplies simultaneous color and depth images, provides a breakthrough opportunity to enhance the interactive capabilities of displays and the overall user experience. This RGBD (RGB + depth) sensor generates an additional channel of depth information that can be used to improve the performance of existing state-of-the-art technology and to develop new techniques. The Active Shape Model (ASM) is a well-known deformable model that has been extensively studied for facial feature point placement. Previous shape model techniques have applied 3D reconstruction using multiple cameras or other statistical methods to produce 3D information from 2D color images. These methods showed improved results compared with using only color data, but required an additional deformable model or expensive imaging equipment. In this thesis, an ASM is trained on the RGBD images produced by the Kinect. The real-time information from the depth sensor is registered to the color image to create a pixel-for-pixel match. To improve the quality of the depth image, a temporal median filter is applied to reduce random noise produced by the sensor. The resulting combined model is designed to produce more robust fitting of facial feature points than a purely color-based active shape model.
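    The temporal median filtering step can be sketched as follows, assuming a sliding window over the most recent depth frames (the window length and class name are assumptions): each output pixel is the per-pixel median of the buffered frames, which suppresses the sensor's random noise before the depth is registered to the color image.

```python
import numpy as np
from collections import deque

class TemporalMedianFilter:
    """Per-pixel median over the last n depth frames, to suppress
    random sensor noise before depth-to-color registration."""
    def __init__(self, n=5):
        self.frames = deque(maxlen=n)    # sliding window of recent frames

    def __call__(self, depth):
        self.frames.append(depth)
        return np.median(np.stack(self.frames), axis=0)
```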