64,627 research outputs found

    Efficient Human Facial Pose Estimation

    Pose estimation has become an increasingly important area in computer vision, particularly in human facial recognition and activity recognition for surveillance applications. Pose estimation is the process of determining the roll, pitch, or yaw of a human head. Numerous methods already exist to determine the angular change of a face; however, these methods vary in accuracy, and their computational requirements tend to be too high for real-time applications. The objective of this thesis is to develop a method for pose estimation that is computationally efficient while still maintaining a reasonable degree of accuracy. In this thesis, a feature-based method is presented to determine the yaw angle of a human facial pose using a combination of artificial neural networks and template matching. The artificial neural networks are used for the feature detection portion of the algorithm, along with skin detection and other image enhancement algorithms. The first head model, referred to as the Frontal Position Model, determines the pose of the face using the two eyes and the mouth. The second model, referred to as the Side Position Model, is used when only one eye can be viewed and determines pose based on a single eye, the nose tip, and the mouth. The two models are presented to demonstrate the position change of facial features due to pose and to provide the means to determine the pose as these features deviate from the frontal position. The effectiveness of this pose estimation method is examined using both manual and automatic feature detection. Analysis is further performed on how errors in feature detection affect the resulting pose determination. With correct feature detection, the method detected facial pose from -30 to 30 degrees with an average error of 4.28 degrees for the Frontal Position Model and 5.79 degrees for the Side Position Model.
The Intel(R) Streaming SIMD Extensions (SSE) technology was employed to enhance the performance of floating-point operations. The neural networks used in the feature detection process require a large number of floating-point calculations, due to the computation of the image data with weights and biases. With SSE optimization, the algorithm becomes suitable for processing images in a real-time environment. The method is capable of determining features and estimating the pose at a rate of seven frames per second on a 1.8 GHz Pentium 4 computer.
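The abstract does not give the geometry of the Frontal Position Model, but the idea behind feature-based yaw estimation can be sketched. The following is a minimal illustration, not the thesis's actual model: it assumes yaw shifts the mouth horizontally relative to the eye midpoint, normalized by the interocular distance, and the function name and scaling are invented for this sketch.

```python
import math

def estimate_yaw(left_eye, right_eye, mouth):
    """Rough yaw estimate (degrees) from 2D feature positions.

    Illustrative only: assumes yaw displaces the mouth horizontally
    relative to the eye midline, normalized by interocular distance.
    """
    eye_mid_x = (left_eye[0] + right_eye[0]) / 2.0
    interocular = math.hypot(right_eye[0] - left_eye[0],
                             right_eye[1] - left_eye[1])
    # Normalized horizontal offset of the mouth from the eye midline.
    offset = (mouth[0] - eye_mid_x) / interocular
    # Clamp to a valid arcsin argument, then convert to degrees.
    offset = max(-1.0, min(1.0, offset))
    return math.degrees(math.asin(offset))

# A frontal face: mouth centred under the eyes -> 0 degrees yaw.
print(estimate_yaw((40, 50), (80, 50), (60, 90)))  # 0.0
```

A real system would, as the abstract describes, obtain the feature positions from neural-network detectors before applying such a geometric model.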

    Infrared LEDs-based pose estimation with underground camera model for Boom-type roadheader in coal mining

    Accurate and reliable pose estimation of a boom-type roadheader is of great importance for maintaining the efficiency of automatic coal mining. The stability and accuracy of conventional measurement methods are difficult to guarantee on account of vibration noise, magnetic disturbance, electrostatic interference, and other factors in the underground environment. In this paper, a vision-based non-contact measurement method for cutting-head pose estimation is presented, which deploys a 16-point infrared LED target on the boom-type roadheader to cope with the low illumination, heavy dust, and complicated background. By establishing a monocular vision measurement system, the cutting-head pose is estimated by processing the LED target images obtained from an explosion-proof industrial camera mounted on the roadheader. After analyzing the measurement mechanism, an underground camera model based on an equivalent focal length is built to eliminate refraction errors caused by the two glass layers used for explosion protection and dust removal. The pose estimation steps, including infrared LED feature point extraction, spot center location, and an improved P4P method based on dual quaternions, are then carried out. The factors influencing cutting-head pose estimation accuracy are further studied by modeling, and the error distribution of the main parameters is investigated and evaluated. Numerical simulation and experimental evaluation are designed to verify the performance of the proposed method. The results show that the pose estimation error is in line with the numerical prediction, meeting the requirements of cutting-head pose estimation for underground roadway construction in coal mines.
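The equivalent-focal-length idea can be illustrated with a plain pinhole projection. The sketch below is a simplification under stated assumptions, not the paper's calibrated model: refraction through the protective glass is absorbed into a single equivalent focal length `f_eq` that replaces the nominal focal length, and the function and parameter names are invented here.

```python
def project(point_cam, f_eq, cx, cy):
    """Pinhole projection with an equivalent focal length.

    Illustrative sketch only: the paper's underground camera model
    calibrates an equivalent focal length so that refraction through
    the two protective glass layers is absorbed into it; that
    calibration procedure is not shown here.
    """
    X, Y, Z = point_cam
    u = f_eq * X / Z + cx   # horizontal pixel coordinate
    v = f_eq * Y / Z + cy   # vertical pixel coordinate
    return u, v

# A point 2 m in front of the camera, 1 m to the right.
print(project((1.0, 0.0, 2.0), 800.0, 320.0, 240.0))  # (720.0, 240.0)
```

Once projected LED spot centers are located in the image, a P4P-style solver (as in the paper) recovers the pose from the 3D-to-2D correspondences.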

    2D ์˜์ƒ์—์„œ ์‹œ๊ฐ„ ํŠน์ง•๊ณผ ์ง€์—ญ์  ํŠน์ง•์˜ ๋ถ„์„์„ ํ†ตํ•œ ์‚ฌ๋žŒ ํฌ์ฆˆ ๊ฒ€์ถœ ๋ฐ ์ถ”์ 

    Doctoral dissertation, Seoul National University Graduate School of Convergence Science and Technology, Department of Transdisciplinary Studies (Intelligent Convergence Systems major), February 2021. Advisor: Nojun Kwak. 2D human pose estimation and tracking aim to detect the locations of a person's parts and their trajectories. A pose is composed of the parts of a person, where a part is a body element such as an arm, a leg, or the head. Pose estimation techniques are utilized both industrially and academically. For example, in a home training system, pose detection can detect the user's pose and help the user correct their posture. Also, in human action recognition research, human pose information can be exploited as helpful supplementary information. In order to apply human pose studies to real-world systems, the model must achieve high accuracy while being light enough to run in real time. In this thesis, we have focused on improving accuracy, and have considered how to utilize feature values to achieve high accuracy using spatial and temporal features. A spatial feature captures characteristics such as textures, patterns, and postures that can be extracted from images. We have made better use of the spatial feature by dividing it into local and global features: the global feature is likely to include a large number of parts, while the local feature focuses on a relatively small number of parts. First, we have proposed a structure that uses the global and local features at the same time to improve performance. The global network intensively learns the global feature, while the local network learns various regional information from images and sequentially refines the pose detected by the global network. To prove the efficiency of the proposed method, experiments have been conducted on the Leeds Sports Pose (LSP) dataset, one of the single-person pose estimation datasets.
Secondly, we define rare poses using the global feature and resolve the resulting imbalance among poses. First, poses are classified using the location information of the entire pose. Experiments have shown that the data are distributed around certain poses (standing poses, upper-body poses, etc.), and an imbalance between them clearly exists. We have proposed methods such as a weighted loss and synthesis of rare-pose data to resolve the imbalance. Experiments are conducted using the MPII and COCO datasets, which are widely used in multi-person pose estimation. The temporal feature refers to how poses vary over time, and it is usually advisable to use temporal information when analyzing objects in a video. Therefore, thirdly, we estimate and track poses with a map that expresses the change of a person's movement. The network learns the spatial and temporal maps together so that they create synergy with each other. Experiments have been conducted on the multi-person pose tracking datasets PoseTrack 2017 and 2018. Although the three proposed methods address different issues, they can be utilized together. For example, the proposed structure is a top-down approach with two parallel deconvolution branches for the spatial map (heatmap) and the temporal map (TML), and the rare-pose data augmentation and the local network can additionally be applied to increase performance. Thus, adopting the three methods jointly improves performance and is readily extensible within the pose estimation field.

    Face Alignment Assisted by Head Pose Estimation

    In this paper we propose a supervised initialization scheme for cascaded face alignment based on explicit head pose estimation. We first investigate the failure cases of most state-of-the-art face alignment approaches and observe that these failures often share one common global property: the head pose variation is usually large. Inspired by this, we propose a deep convolutional network model for reliable and accurate head pose estimation. Instead of using a mean face shape, or randomly selected shapes, to initialize cascaded face alignment, we propose two schemes for generating the initialization: the first projects a mean 3D face shape (represented by 3D facial landmarks) onto the 2D image under the estimated head pose; the second searches for nearest-neighbour shapes in the training set according to head pose distance. By doing so, the initialization gets closer to the actual shape, which enhances the possibility of convergence and in turn improves the face alignment performance. We demonstrate the proposed method on the benchmark 300W dataset and show very competitive performance in both head pose estimation and face alignment. Comment: Accepted by BMVC201
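The first initialization scheme, projecting a mean 3D shape under the estimated head pose, can be sketched as follows. This is a simplified yaw-only, weak-perspective version; the paper uses the full estimated pose, and the function and parameter names here are invented for illustration.

```python
import math

def init_shape_from_pose(mean_shape_3d, yaw_deg, f=1.0):
    """Project mean 3D landmarks to 2D under an estimated yaw.

    Simplified sketch (yaw-only rotation, weak-perspective projection
    with scale f); pitch/roll and the nearest-neighbour shape lookup
    from the paper's second scheme are omitted.
    """
    yaw = math.radians(yaw_deg)
    cos_y, sin_y = math.cos(yaw), math.sin(yaw)
    shape_2d = []
    for x, y, z in mean_shape_3d:
        # Rotate about the vertical (y) axis, then drop depth.
        xr = cos_y * x + sin_y * z
        shape_2d.append((f * xr, f * y))
    return shape_2d

landmarks = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
print(init_shape_from_pose(landmarks, 0))  # [(0.0, 0.0), (1.0, 0.0)]
```

The cascaded regressor then starts from this pose-consistent shape rather than the mean frontal shape, which is the point of the scheme.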

    Robust Head-Pose Estimation Based on Partially-Latent Mixture of Linear Regressions

    Head-pose estimation has many applications, such as social event analysis, human-robot and human-computer interaction, driving assistance, and so forth. Head-pose estimation is challenging because it must cope with changing illumination conditions, variability in face orientation and appearance, partial occlusions of facial landmarks, and bounding-box-to-face alignment errors. We propose to use a mixture of linear regressions with partially-latent output. This regression method learns to map high-dimensional feature vectors (extracted from bounding boxes of faces) onto the joint space of head-pose angles and bounding-box shifts, such that they are robustly predicted in the presence of unobservable phenomena. We describe in detail the mapping method, which combines the merits of unsupervised manifold learning techniques and of mixtures of regressions. We validate our method on three publicly available datasets and thoroughly benchmark four variants of the proposed algorithm against several state-of-the-art head-pose estimation methods. Comment: 12 pages, 5 figures, 3 tables
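At inference time, a mixture of linear regressions predicts the output as a responsibility-weighted combination of linear experts. The toy sketch below shows only that combination step, with fixed gating weights; the paper learns the gating and the partially-latent output jointly, which is not reproduced here, and all names are invented for this sketch.

```python
import numpy as np

def mixture_predict(x, weights, biases, gates):
    """Predict pose angles with a mixture of K linear regressors.

    Toy sketch: `gates` are fixed mixing proportions, whereas the
    paper infers input-dependent responsibilities during training.
    """
    # Each expert k predicts W_k @ x + b_k; stack into a (K, D) array.
    preds = np.stack([W @ x + b for W, b in zip(weights, biases)])
    g = np.asarray(gates) / np.sum(gates)  # normalize responsibilities
    return g @ preds                       # responsibility-weighted mean

x = np.array([1.0, 2.0])
weights = [np.eye(2), 2 * np.eye(2)]
biases = [np.zeros(2), np.zeros(2)]
print(mixture_predict(x, weights, biases, [0.5, 0.5]))
```

Several experts let the model fit a regression that is only piecewise linear over the feature manifold, which is the motivation for using a mixture rather than a single linear map.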

    Learning to Find Eye Region Landmarks for Remote Gaze Estimation in Unconstrained Settings

    Conventional feature-based and model-based gaze estimation methods have proven to perform well in settings with controlled illumination and specialized cameras. In unconstrained real-world settings, however, such methods are surpassed by recent appearance-based methods due to difficulties in modeling factors such as illumination changes and other visual artifacts. We present a novel learning-based method for eye region landmark localization that enables conventional methods to be competitive with the latest appearance-based methods. Despite having been trained exclusively on synthetic data, our method exceeds the state of the art for iris localization and eye shape registration on real-world imagery. We then use the detected landmarks as input to iterative model-fitting and lightweight learning-based gaze estimation methods. Our approach outperforms existing model-fitting and appearance-based methods in the context of person-independent and personalized gaze estimation.

    A Differential Approach for Gaze Estimation

    Non-invasive gaze estimation methods usually regress gaze directions directly from a single face or eye image. However, due to important variability in eye shapes and inner eye structures amongst individuals, universal models obtain limited accuracy, and their outputs usually exhibit high variance as well as subject-dependent biases. Therefore, accuracy is usually increased through calibration, allowing gaze predictions for a subject to be mapped to his/her actual gaze. In this paper, we introduce a novel image differential method for gaze estimation. We propose to directly train a differential convolutional neural network to predict the gaze difference between two eye input images of the same subject. Then, given a set of subject-specific calibration images, we can use the inferred differences to predict the gaze direction of a novel eye sample. The assumption is that by allowing the comparison between two eye images, annoyance factors (alignment, eyelid closing, illumination perturbations) that usually plague single-image prediction methods can be much reduced, allowing better prediction altogether. Experiments on 3 public datasets validate our approach, which consistently outperforms state-of-the-art methods, even when using only one calibration sample or when the latter methods are followed by subject-specific gaze adaptation. Comment: Extension of our paper "A differential approach for gaze estimation with calibration" (BMVC 2018). Submitted to PAMI on Aug. 7th, 2018; accepted as a PAMI short paper in Dec. 2019, in IEEE Transactions on Pattern Analysis and Machine Intelligence
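The inference step of the differential scheme is simple to state: for each calibration image, add the network's predicted gaze difference to that image's known gaze, then aggregate the estimates. A minimal sketch, with a stand-in difference function in place of the trained CNN and invented names throughout:

```python
import numpy as np

def predict_gaze(diff_net, eye_img, calib_imgs, calib_gazes):
    """Differential gaze prediction from calibration samples.

    Sketch of the inference step only: `diff_net(a, b)` is assumed to
    return the gaze difference gaze(a) - gaze(b); the trained CNN
    itself is not reproduced here.
    """
    estimates = [g + diff_net(eye_img, c)   # calibration gaze + predicted delta
                 for c, g in zip(calib_imgs, calib_gazes)]
    return np.mean(estimates, axis=0)       # average over the calibration set

# Toy stand-in: 1-D "images" whose value is the gaze angle itself,
# so the true difference function is plain subtraction.
diff = lambda a, b: a - b
print(predict_gaze(diff, 5.0, [1.0, 3.0], [1.0, 3.0]))  # 5.0
```

Averaging over several calibration samples is one simple aggregation choice; it illustrates why even a single calibration image already yields a usable prediction.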