585 research outputs found

    Stereo facial image matching to aid in Fetal Alcohol Syndrome screening

    Includes abstract. Includes bibliographical references

    Face pose estimation with automatic 3D model creation for a driver inattention monitoring application

    Text in English; abstract in English and Spanish. Recent studies have identified inattention (including distraction and drowsiness) as the main cause of accidents, responsible for at least 25% of them. Driving distraction has been less studied, since it is more diverse and exhibits a higher risk factor than fatigue. In addition, it is present in over half of the inattention-related crashes. The increased presence of In-Vehicle Information Systems (IVIS) adds to the potential distraction risk and modifies driving behaviour, and thus research on this issue is of vital importance. Many researchers have been working on different approaches to deal with distraction during driving. Among them, Computer Vision is one of the most common, because it allows for cost-effective and non-invasive driver monitoring and sensing. Using Computer Vision techniques it is possible to evaluate some facial movements that characterise the state of attention of a driver. This thesis presents methods to estimate the face pose and gaze direction of a person in real time, using a stereo camera, as a basis for assessing driver distraction. The methods are completely automatic and user-independent. A set of features in the face is identified at initialisation and used to create a sparse 3D model of the face. These features are tracked from frame to frame, and the model is augmented to cover parts of the face that may have been occluded before. The algorithm is designed to work in a naturalistic driving simulator, which presents challenging low-light conditions. We evaluate several techniques to detect features on the face that can be matched between cameras and tracked successfully. Well-known methods such as SURF do not return good results, due to the lack of salient points in the face and the low illumination of the images. We introduce a novel multisize technique, based on the Harris corner detector and patch correlation.
This technique benefits from the better performance of small patches under rotations and illumination changes, and from the more robust correlation of bigger patches under motion blur. The head rotates in a range of ±90° in the yaw angle, and the appearance of the features changes noticeably. To deal with these changes, we implement a new re-registering technique that captures new textures of the features as the face rotates. These new textures are incorporated into the model, which mixes the views of both cameras. The captures are taken at regular angle intervals for rotations in yaw, so that each texture is only used in a range of ±7.5° around the capture angle. Rotations in pitch and roll are handled using affine patch warping. The 3D model created at initialisation can only take features in the frontal part of the face, and some of these may become occluded during rotations. The accuracy and robustness of the face tracking depend on the number of visible points, so new points are added to the 3D model when new parts of the face are visible from both cameras. Bundle adjustment is used to reduce the accumulated drift of the 3D reconstruction. We estimate the pose from the position of the features in the images and the 3D model using POSIT or Levenberg-Marquardt (LM). A RANSAC process detects incorrectly tracked points, which are not considered for pose estimation. POSIT is faster, while LM obtains more accurate results. Using the model extension and the re-registering technique, we can accurately estimate the pose in the full head rotation range, with error levels that improve on the state of the art. A coarse eye direction is composed with the face pose estimation to obtain the gaze and the driver's fixation area, a parameter that gives much information about the distraction pattern of the driver. The resulting gaze estimation algorithm proposed in this thesis has been tested on a set of driving experiments directed by a team of psychologists in a naturalistic driving simulator.
This simulator mimics conditions present in real driving, including weather changes, manoeuvring and distractions due to IVIS. Professional drivers participated in the tests. The drivers' fixation statistics obtained with the proposed system show how the utilisation of IVIS influences the distraction pattern of the drivers, increasing reaction times and affecting the fixation of attention on the road and the surroundings.
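
The yaw-dependent texture selection described in this abstract can be illustrated with a small sketch. It assumes the capture angles tile the ±90° range without overlap, so a ±7.5° validity window implies a 15° spacing; the abstract does not state the spacing explicitly, and the function name and parameters below are hypothetical, not taken from the thesis.

```python
import math


def nearest_capture_angle(yaw_deg, step_deg=15.0, max_yaw_deg=90.0):
    """Return the capture angle whose +/- step_deg/2 window contains yaw_deg.

    Assumption: textures are captured at multiples of step_deg, so each one
    is valid within +/- 7.5 degrees when step_deg is 15.
    """
    if abs(yaw_deg) > max_yaw_deg:
        raise ValueError("yaw outside the tracked range")
    # Round to the nearest multiple of step_deg (ties go to the higher angle).
    index = math.floor(yaw_deg / step_deg + 0.5)
    return index * step_deg


if __name__ == "__main__":
    for yaw in (0.0, 7.4, 7.6, -20.0, 90.0):
        print(yaw, "->", nearest_capture_angle(yaw))
```

Under this assumption, a head yaw of 7.4° would still use the frontal texture, while 7.6° would switch to the texture captured at 15°.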

    Illumination tolerance in facial recognition

    In this research work, five different preprocessing techniques were tested with two different classifiers to find the best preprocessor + classifier combination for building an illumination-tolerant face recognition system. Hence, a face recognition system is proposed based on illumination normalization techniques and a linear subspace model, using two distance metrics on three challenging databases: the CAS PEAL database, the Extended Yale B database, and the AT&T database. The research takes the form of experimentation and analysis in which five illumination normalization techniques were compared and analyzed using two different distance metrics. The performance and execution times of the various techniques were recorded and measured for accuracy and efficiency. The illumination normalization techniques were Gamma Intensity Correction (GIC), Discrete Cosine Transform (DCT), Histogram Remapping using Normal distribution (HRN), Histogram Remapping using Log-normal distribution (HRL), and Anisotropic Smoothing (AS). The linear subspace models utilized were Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA). The two distance metrics were Euclidean and cosine distance. The results showed that for databases with both illumination (shadows) and lighting (over-exposure) variations, like the CAS PEAL database, histogram remapping with the normal distribution produced the best result when the cosine distance was used as the classifier, with a 65% recognition rate at 15.8 ms/img. Alternatively, for databases consisting of pure illumination variation, like the Extended Yale B database, Gamma Intensity Correction (GIC) combined with the Euclidean distance metric gave the most accurate result, with 95.4% recognition accuracy at 1 ms/img.
It was further gathered from the set of experiments that the cosine distance produces more accurate results than the Euclidean distance metric. However, the Euclidean distance was faster than the cosine distance in all the experiments conducted.
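
To make the comparison of the two distance metrics concrete, here is a minimal nearest-neighbour sketch in plain Python. The feature vectors and labels are invented for illustration and are not data from the study; in the setting described above, the vectors would be PCA or LDA projections of normalized face images.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """One minus the cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def nearest_label(probe, gallery, distance):
    """Return the label of the gallery entry closest to the probe."""
    return min(gallery, key=lambda item: distance(probe, item[1]))[0]


# Hypothetical two-dimensional features, chosen so the metrics disagree.
gallery = [("A", [1.0, 0.0]), ("B", [3.0, 3.0])]
probe = [1.0, 0.9]

if __name__ == "__main__":
    print("Euclidean:", nearest_label(probe, gallery, euclidean_distance))
    print("Cosine:   ", nearest_label(probe, gallery, cosine_distance))
```

The Euclidean metric picks "A" because the probe is close in magnitude, while the cosine metric picks "B" because it ignores vector length and compares direction only. Ignoring length makes cosine distance less sensitive to uniform scaling of the features, and the extra norm computations are one plausible reason for it being slower.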

    Visual Perception For Robotic Spatial Understanding

    Humans understand the world through vision without much effort. We perceive the structure, objects, and people in the environment and pay little direct attention to most of it, until it becomes useful. Intelligent systems, especially mobile robots, have no such biologically engineered vision mechanism to take for granted. In contrast, we must devise algorithmic methods of taking raw sensor data and converting it to something useful very quickly. Vision is such a necessary part of building a robot or any intelligent system that is meant to interact with the world that it is somewhat surprising we don't have off-the-shelf libraries for this capability. Why is this? The simple answer is that the problem is extremely difficult. There has been progress, but the current state of the art is impressive and depressing at the same time. We now have neural networks that can recognize many objects in 2D images, in some cases performing better than a human. Some algorithms can also provide bounding boxes or pixel-level masks to localize the object. We have visual odometry and mapping algorithms that can build reasonably detailed maps over long distances with the right hardware and conditions. On the other hand, we have robots with many sensors and no efficient way to compute their relative extrinsic poses for integrating the data in a single frame. The same networks that produce good object segmentations and labels in a controlled benchmark still miss obvious objects in the real world and have no mechanism for learning on the fly while the robot is exploring. Finally, while we can detect pose for very specific objects, we don't yet have a mechanism that detects pose that generalizes well over categories or that can describe new objects efficiently. We contribute algorithms in four of the areas mentioned above. First, we describe a practical and effective system for calibrating many sensors on a robot with up to 3 different modalities.
Second, we present our approach to visual odometry and mapping that exploits the unique capabilities of RGB-D sensors to efficiently build detailed representations of an environment. Third, we describe a 3-D over-segmentation technique that utilizes the models and ego-motion output in the previous step to generate temporally consistent segmentations with camera motion. Finally, we develop a synthesized dataset of chair objects with part labels and investigate the influence of parts on RGB-D based object pose recognition using a novel network architecture we call PartNet

    A Study on Robust and Accurate Hand Motion Tracking Technology for Human-Machine Interaction

    Doctoral thesis -- Seoul National University Graduate School: College of Engineering, Department of Mechanical and Aerospace Engineering, August 2021. 이동준 (Dongjun Lee). Hand-based interfaces are promising for realizing intuitive, natural and accurate human-machine interaction (HMI), as the human hand is the main source of dexterity in our daily activities. To this end, the thesis begins with a human perception study on the detection threshold of visuo-proprioceptive conflict (i.e., allowable tracking error) with or without cutaneous haptic feedback, and suggests a tracking error specification for realistic and fluid hand-based HMI. The thesis then proposes a novel wearable hand tracking module which, to be compatible with cutaneous haptic devices that emit magnetic noise, opportunistically employs heterogeneous sensors (IMU/compass module and soft sensor) reflecting the anatomical properties of the human hand, and which is suited to a specific application (i.e., finger-based interaction with fingertip haptic devices). This hand tracking module, however, loses tracking when interacting with, or being near, electrical machines or ferromagnetic materials. To address this, the thesis presents its main contribution, a novel visual-inertial skeleton tracking (VIST) framework that can provide accurate and robust hand (and finger) motion tracking even in many challenging real-world scenarios and environments where state-of-the-art technologies are known to fail due to their respective fundamental limitations (e.g., severe occlusions for tracking purely with vision sensors; electromagnetic interference for tracking purely with IMUs (inertial measurement units) and compasses; and mechanical contacts for tracking purely with soft sensors).
The proposed VIST framework comprises a sensor glove with multiple IMUs and passive visual markers, a head-mounted stereo camera, and a tightly-coupled filtering-based visual-inertial fusion algorithm that estimates the hand/finger motion and auto-calibrates hand/glove-related kinematic parameters simultaneously while taking into account the anatomical constraints of the hand. The VIST framework exhibits good tracking accuracy and robustness, affordable material cost, lightweight hardware and software, and enough ruggedness and durability to permit washing. Quantitative and qualitative experiments are also performed to validate the advantages and properties of the VIST framework, thereby clearly demonstrating its potential for real-world applications. [Abstract in Korean, translated:] Hand-motion-based interfaces are attracting considerable attention in human-machine interaction because they can provide intuitiveness, immersion, and precision, and one of the most essential enabling technologies is robust and accurate hand motion tracking. To this end, this thesis first identifies the perceptible range of hand tracking error from the standpoint of human perception. Because this range can serve as an important design criterion when developing new hand tracking technology, it is quantified through experiments with human subjects, including how it changes when fingertip haptic devices are worn. Building on this, and since haptic feedback has been widely studied across human-machine interaction, a hand tracking module that can be used together with fingertip haptic devices is developed first. These fingertip haptic devices generate magnetic disturbances that disrupt the magnetometers commonly used in wearable technology; this is resolved by exploiting the anatomical properties of the human hand together with an appropriate use of inertial sensors, magnetometers, and soft sensors. Extending this, the thesis proposes a new hand tracking technology that can be used not only while wearing haptic devices but with any worn equipment, in any environment, and during interaction with objects. Existing hand tracking technologies can only be used in restricted environments because of occlusion (vision-based), magnetic disturbance (inertial/magnetometer-based), and contact with objects (soft-sensor-based). To address this, inertial and vision sensors, which have complementary characteristics, are fused without the problematic magnetometer, and many indistinguishable markers are used to track hand motion, which has many degrees of freedom within a small space. For the marker identification step (correspondence search), a tightly-coupled rather than the conventional loosely-coupled sensor fusion technique is proposed; this not only enables accurate hand motion tracking without a magnetometer but also automatically and accurately calibrates sensor attachment errors and the user's hand shape, which had undermined the accuracy and convenience of wearable sensors.
The excellent performance and robustness of the proposed visual-inertial sensor fusion technique, Visual-Inertial Skeleton Tracking (VIST), were verified through a variety of quantitative and qualitative experiments; by enabling hand tracking in everyday environments where existing systems could not, VIST shows its potential for many human-machine interaction applications.
Contents:
1 Introduction: 1.1 Motivation; 1.2 Related Work; 1.3 Contribution
2 Detection Threshold of Hand Tracking Error: 2.1 Motivation; 2.2 Experimental Environment (2.2.1 Hardware Setup; 2.2.2 Virtual Environment Rendering; 2.2.3 HMD Calibration); 2.3 Identifying the Detection Threshold of Tracking Error (2.3.1 Experimental Setup; 2.3.2 Procedure; 2.3.3 Experimental Result); 2.4 Enlarging the Detection Threshold of Tracking Error by Haptic Feedback (2.4.1 Experimental Setup; 2.4.2 Procedure; 2.4.3 Experimental Result); 2.5 Discussion
3 Wearable Finger Tracking Module for Haptic Interaction: 3.1 Motivation; 3.2 Development of Finger Tracking Module (3.2.1 Hardware Setup; 3.2.2 Tracking Algorithm; 3.2.3 Calibration Method); 3.3 Evaluation for VR Haptic Interaction Task (3.3.1 Quantitative Evaluation of FTM; 3.3.2 Implementation of Wearable Cutaneous Haptic Interface; 3.3.3 Usability Evaluation for VR Peg-in-Hole Task); 3.4 Discussion
4 Visual-Inertial Skeleton Tracking for Human Hand: 4.1 Motivation; 4.2 Hardware Setup and Hand Models (4.2.1 Human Hand Model; 4.2.2 Wearable Sensor Glove; 4.2.3 Stereo Camera); 4.3 Visual Information Extraction (4.3.1 Marker Detection in Raw Images; 4.3.2 Cost Function for Point Matching; 4.3.3 Left-Right Stereo Matching); 4.4 IMU-Aided Correspondence Search; 4.5 Filtering-based Visual-Inertial Sensor Fusion (4.5.1 EKF States for Hand Tracking and Auto-Calibration; 4.5.2 Prediction with IMU Information; 4.5.3 Correction with Visual Information; 4.5.4 Correction with Anatomical Constraints); 4.6 Quantitative Evaluation for Free Hand Motion (4.6.1 Experimental Setup; 4.6.2 Procedure; 4.6.3 Experimental Result); 4.7 Quantitative and Comparative Evaluation for Challenging Hand Motion (4.7.1 Experimental Setup; 4.7.2 Procedure; 4.7.3 Experimental Result; 4.7.4 Performance Comparison with Existing Methods for Challenging Hand Motion); 4.8 Qualitative Evaluation for Real-World Scenarios (4.8.1 Visually Complex Background; 4.8.2 Object Interaction; 4.8.3 Wearing Fingertip Cutaneous Haptic Devices; 4.8.4 Outdoor Environment); 4.9 Discussion
5 Conclusion
References
Abstract (in Korean)
Acknowledgment
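
The predict-with-IMU, correct-with-vision pattern named in this thesis outline can be sketched with a scalar linear Kalman filter. This is a deliberately simplified illustration: the thesis describes a tightly-coupled EKF over hand kinematics and calibration parameters, whereas the one-dimensional state, noise values, and function names below are assumptions made only for this example.

```python
def kalman_predict(x, p, u, q):
    """Propagate state x and variance p using a motion increment u.

    u stands in for an integrated IMU rate over one time step (assumption);
    q is the process noise variance added by that step.
    """
    return x + u, p + q


def kalman_correct(x, p, z, r):
    """Fuse a direct measurement z of the state, with noise variance r.

    z stands in for a vision-derived observation such as a marker position.
    """
    k = p / (p + r)          # Kalman gain: how much to trust the measurement
    x_new = x + k * (z - x)  # move the estimate toward the measurement
    p_new = (1.0 - k) * p    # uncertainty shrinks after the correction
    return x_new, p_new


if __name__ == "__main__":
    x, p = 0.0, 1.0                                # initial estimate and variance
    x, p = kalman_predict(x, p, u=1.0, q=0.5)      # IMU-driven prediction
    x, p = kalman_correct(x, p, z=2.0, r=0.5)      # vision-driven correction
    print(x, p)
```

When the visual measurement is unavailable, for example under occlusion, the correction step would simply be skipped and the variance would keep growing through successive predictions, which is the intuition behind combining the two sensor types.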

    Variable Resolution & Dimensional Mapping For 3d Model Optimization

    Three-dimensional computer models, especially geospatial architectural data sets, can be visualized in the same way humans experience the world, providing a realistic, interactive experience. Scene familiarization, architectural analysis, scientific visualization, and many other applications would benefit from finely detailed, high-resolution 3D models. Automated methods to construct these 3D models have traditionally produced data sets that are often low fidelity or inaccurate; otherwise, the data sets are initially highly detailed but very labor- and time-intensive to construct. Such data sets are often not practical for common real-time usage and are not easily updated. This thesis proposes Variable Resolution & Dimensional Mapping (VRDM), a methodology that has been developed to address some of the limitations of existing approaches to model construction from images. Key components of VRDM are texture palettes, which enable variable and ultra-high-resolution images to be easily composited, and texture features, which allow image features to be integrated as image or geometry and can modify the geometric model structure to add detail. These components support a primary VRDM objective of facilitating model refinement with additional data. This can be done until the desired fidelity is achieved as practical limits of infinite detail are approached. Texture Levels, the third component, enable real-time interaction with a very detailed model, along with the flexibility of having alternate pixel data for a given area of the model; this is achieved through extra dimensions. Together these techniques have been used to construct models that can contain GBs of imagery data.

    Information Extraction and Modeling from Remote Sensing Images: Application to the Enhancement of Digital Elevation Models

    To deal with high complexity data such as remote sensing images presenting metric resolution over large areas, an innovative, fast and robust image processing system is presented. The modeling of increasing level of information is used to extract, represent and link image features to semantic content. The potential of the proposed techniques is demonstrated with an application to enhance and regularize digital elevation models based on information collected from RS images

    On the popularization of digital close-range photogrammetry: a handbook for new users.

    National Technical University of Athens -- Master's Thesis. Interdisciplinary-Interdepartmental Postgraduate Studies Programme (D.P.M.S.) “Geoinformatics”