1,157 research outputs found

    Accurate pupil center detection in off-the-shelf eye tracking systems using convolutional neural networks

    Remote eye tracking technology has grown rapidly in recent years owing to its applicability in many research areas. In this paper, a video-oculography method based on convolutional neural networks (CNNs) for pupil center detection in webcam images is proposed. As the first contribution of this work, and in order to train the model, the pupil centers of a facial landmark dataset were labeled manually. The model has been tested on both real and synthetic databases and outperforms state-of-the-art methods, achieving pupil center estimation errors below the size of a constricted pupil in more than 95% of the images while reducing computing time by a factor of 8. Results show the importance of using high-quality training data and well-known architectures to achieve outstanding performance. This research was funded by the Public University of Navarra (pre-doctoral research grant) and by the Spanish Ministry of Science and Innovation under the contract 'Challenges of Eye Tracking Off-the-Shelf (ChETOS)', reference PID2020-118014RB-I0

    Unobtrusive and pervasive video-based eye-gaze tracking

    Eye-gaze tracking has long been considered a desktop technology that finds its use inside the traditional office setting, where the operating conditions may be controlled. Nonetheless, recent advancements in mobile technology and a growing interest in capturing natural human behaviour have motivated an emerging interest in tracking eye movements within unconstrained real-life conditions, referred to as pervasive eye-gaze tracking. This critical review focuses on emerging passive and unobtrusive video-based eye-gaze tracking methods in recent literature, with the aim to identify different research avenues that are being followed in response to the challenges of pervasive eye-gaze tracking. Different eye-gaze tracking approaches are discussed in order to bring out their strengths and weaknesses, and to identify any limitations, within the context of pervasive eye-gaze tracking, that have yet to be considered by the computer vision community. (Peer-reviewed.)

    Low Cost Eye Tracking: The Current Panorama

    Despite the availability of accurate, commercial gaze tracker devices working with infrared (IR) technology, visible light gaze tracking constitutes an interesting alternative by allowing scalability and removing hardware requirements. Over the last years, this field has seen examples of research showing performance comparable to the IR alternatives. In this work, we survey the previous work on remote, visible light gaze trackers and analyze the explored techniques from various perspectives such as calibration strategies, head pose invariance, and gaze estimation techniques. We also provide information on related aspects of research such as public datasets to test against, open source projects to build upon, and gaze tracking services to directly use in applications. With all this information, we aim to provide the contemporary and future researchers with a map detailing previously explored ideas and the required tools

    Low Cost Eye Tracking : The Current Panorama

    Other funding: Consolider 2010 MIPRCV, Universitat Autonoma de Barcelona and Google Faculty Award. Despite the availability of accurate, commercial gaze tracker devices working with infrared (IR) technology, visible light gaze tracking constitutes an interesting alternative by allowing scalability and removing hardware requirements. Over the last years, this field has seen examples of research showing performance comparable to the IR alternatives. In this work, we survey the previous work on remote, visible light gaze trackers and analyze the explored techniques from various perspectives such as calibration strategies, head pose invariance, and gaze estimation techniques. We also provide information on related aspects of research such as public datasets to test against, open source projects to build upon, and gaze tracking services to directly use in applications. With all this information, we aim to provide the contemporary and future researchers with a map detailing previously explored ideas and the required tools

    Deep into the Eyes: Applying Machine Learning to improve Eye-Tracking

    Eye-tracking has been an active research area with applications in personal and behavioral studies, medical diagnosis, virtual reality, and mixed reality applications. Improving the robustness, generalizability, accuracy, and precision of eye-trackers while maintaining privacy is crucial. Unfortunately, many existing low-cost portable commercial eye trackers suffer from signal artifacts and a low signal-to-noise ratio. These trackers are highly dependent on low-level features such as pupil edges or diffused bright spots in order to precisely localize the pupil and corneal reflection. As a result, they are not reliable for studying eye movements that require high precision, such as microsaccades, smooth pursuit, and vergence. Additionally, these methods suffer from reflective artifacts, occlusion of the pupil boundary by the eyelid and often require a manual update of person-dependent parameters to identify the pupil region. In this dissertation, I demonstrate (I) a new method to improve precision while maintaining the accuracy of head-fixed eye trackers by combining velocity information from iris textures across frames with position information, (II) a generalized semantic segmentation framework for identifying eye regions with a further extension to identify ellipse fits on the pupil and iris, (III) a data-driven rendering pipeline to generate a temporally contiguous synthetic dataset for use in many eye-tracking applications, and (IV) a novel strategy to preserve privacy in eye videos captured as part of the eye-tracking process. My work also provides the foundation for future research by addressing critical questions like the suitability of using synthetic datasets to improve eye-tracking performance in real-world applications, and ways to improve the precision of future commercial eye trackers with improved camera specifications

    Explainable Artificial Intelligence for Image Segmentation and for Estimation of Optical Aberrations

    State-of-the-art machine learning methods such as convolutional neural networks (CNNs) are frequently employed in computer vision. Despite their high performance on unseen data, CNNs are often criticized for lacking transparency, that is, providing very limited if any information about the internal decision-making process. In some applications, especially in healthcare, such transparency of algorithms is crucial for end users, as trust in diagnosis and prognosis is important not only for the satisfaction and potential adherence of patients, but also for their health. Explainable artificial intelligence (XAI) aims to open up this “black box,” often perceived as a cryptic and inconceivable algorithm, to increase understanding of the machines’ reasoning. XAI is an emerging field, and techniques for making machine learning explainable are becoming increasingly available. XAI for computer vision mainly focuses on image classification, whereas interpretability in other tasks remains challenging. Here, I examine explainability in computer vision beyond image classification, namely in semantic segmentation and 3D multitarget image regression. This thesis consists of five chapters. In Chapter 1 (Introduction), the background of artificial intelligence (AI), XAI, computer vision, and optics is presented, and the definitions of the terminology for XAI are proposed. Chapter 2 is focused on explaining the predictions of U-Net, a CNN commonly used for semantic image segmentation, and variations of this architecture. To this end, I propose the gradient-weighted class activation mapping for segmentation (Seg-Grad-CAM) method based on the well-known Grad-CAM method for explainable image classification. In Chapter 3, I present the application of deep learning to estimation of optical aberrations in microscopy biodata by identifying the present Zernike aberration modes and their amplitudes.
A CNN-based approach, PhaseNet, can accurately estimate monochromatic aberrations in images of point light sources. I extend this method to objects of complex shapes. In Chapter 4, an approach for explainable 3D multitarget image regression is reported. First, I visualize how the model differentiates the aberration modes using the local interpretable model-agnostic explanations (LIME) method adapted for 3D image classification. Then I “explain,” using LIME modified for multitarget 3D image regression (Image-Reg-LIME), the outputs of the regression model for estimation of the amplitudes. In Chapter 5, the results are discussed in a broader context. The contribution of this thesis is the development of explainability methods for semantic segmentation and 3D multitarget image regression of optical aberrations. The research opens the door for further enhancement of AI’s transparency.

    Contents:
    1 Introduction: 1.1 Essential Definitions (1.1.1 Artificial intelligence; 1.1.2 Explainable; 1.1.3 Proposed definitions); 1.2 Explainable Artificial Intelligence (1.2.1 Aims and applications; 1.2.2 Methods); 1.3 Computer Vision (1.3.1 Applications; 1.3.2 Image classification; 1.3.3 Image regression; 1.3.4 Image segmentation); 1.4 Optics (1.4.1 Aberrations; 1.4.2 Zernike polynomials); 1.5 Thesis Overview (1.5.1 Motivation; 1.5.2 Dissertation outline)
    2 Explainable Image Segmentation: 2.1 Abstract; 2.2 Related Work; 2.3 Methods (2.3.1 CAM; 2.3.2 Grad-CAM; 2.3.3 U-Net; 2.3.4 Seg-Grad-CAM); 2.4 Data (2.4.1 Circles; 2.4.2 TextureMNIST; 2.4.3 Cityscapes); 2.5 Results (2.5.1 Circles; 2.5.2 TextureMNIST; 2.5.3 Cityscapes); 2.6 Applications; 2.7 Conclusions
    3 Estimation of Aberrations: 3.1 Abstract; 3.2 Related Work; 3.3 Methods (3.3.1 PhaseNet; 3.3.2 PhaseNet data generator; 3.3.3 Retrieval of noise parameters; 3.3.4 Data generator with phantoms; 3.3.5 Restoration via deconvolution; 3.3.6 Convolution with the “zero” synthetic PSF); 3.4 Data (3.4.1 Astrocytes (synthetic data); 3.4.2 Fluorescent beads; 3.4.3 Drosophila embryo (live sample); 3.4.4 Neurons (fixed sample)); 3.5 Results (3.5.1 Astrocytes; 3.5.2 Conclusions on the results for astrocytes; 3.5.3 Fluorescent beads; 3.5.4 Conclusions on the results for fluorescent beads; 3.5.5 Drosophila embryo; 3.5.6 Conclusions on the results for Drosophila embryo; 3.5.7 Neurons); 3.6 Conclusions
    4 Explainable Multitarget Image Regression: 4.1 Abstract; 4.2 Related Work; 4.3 Methods (4.3.1 LIME; 4.3.2 Superpixel algorithms; 4.3.3 LIME for 3D image classification; 4.3.4 Image-Reg-LIME: LIME for 3D image regression); 4.4 Results: Classification of Aberrations (4.4.1 Transforming the regression task into classification; 4.4.2 Data augmentation; 4.4.3 Parameter search; 4.4.4 Clustering of 3D images; 4.4.5 Explanations of classification; 4.4.6 Conclusions on the results for classification); 4.5 Results: Explainable Regression of Aberrations (4.5.1 Explanations with a reference value; 4.5.2 Validation of explanations); 4.6 Conclusions
    5 Conclusions and Outlook
    References

    Object Detection using Dimensionality Reduction on Image Descriptors

    The aim of object detection is to recognize objects in a visual scene. Performing reliable object detection is becoming increasingly important in the fields of computer vision and robotics. Applications of object detection include video surveillance, traffic monitoring, digital libraries, navigation, and human-computer interaction. The challenges involved in detecting real-world objects include the multitude of colors, textures, and sizes, as well as cluttered or complex backgrounds, all of which make objects difficult to detect. This thesis explores various dimensionality reduction techniques applied to descriptors, with the goal of building an object detection system that achieves the best trade-off between performance and speed. Histogram of Oriented Gradients (HOG) and other histogram-based descriptors were used as input to a Support Vector Machine (SVM) classifier to achieve good classification performance. Binary descriptors were considered as a computationally efficient alternative to HOG. It was determined that single local binary descriptors in combination with an SVM classifier do not work as well as histograms of features for object detection. Thus, histograms of binary descriptor features were explored as a viable alternative, and the results were found to be comparable to those of the popular HOG descriptor. Histogram-based descriptors can be high dimensional, and working with large amounts of data can be computationally expensive and slow. Thus, various dimensionality reduction techniques were considered: principal component analysis (PCA), which is the most widely used technique; random projections, which are data-independent and fast to compute; unsupervised locality preserving projections (LPP); and supervised locality preserving projections (SLPP), which incorporate non-linear reduction techniques. The classification system was tested on eye detection as well as different object classes.
The eye database was created using BioID and FERET databases. Additionally, the CalTech-101 data set, which has 101 object categories, was used to evaluate the system. The results showed that the reduced-dimensionality descriptors based on SLPP gave improved classification performance with fewer computations
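As a rough illustration of two of the reduction techniques named in the abstract above, the following sketch contrasts a data-independent Gaussian random projection with PCA computed via SVD. It is not the thesis's implementation: the function names, the descriptor dimension (1024), the target dimensions, and the random stand-in data are all assumptions chosen for the example.

```python
import numpy as np


def random_projection(X, k, seed=0):
    """Project rows of X to k dimensions with a Gaussian random matrix.

    The matrix is drawn without looking at X (data-independent); the
    1/sqrt(k) scaling keeps pairwise distances roughly unchanged.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    R = rng.normal(0.0, 1.0 / np.sqrt(k), size=(d, k))
    return X @ R


def pca_projection(X, k):
    """Project rows of X onto the top-k principal components (data-dependent)."""
    Xc = X - X.mean(axis=0)                      # center the data
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:k].T                         # rows of vt are the components


# Hypothetical stand-in for histogram descriptors: 50 samples, 1024 bins each.
rng = np.random.default_rng(1)
X = rng.random((50, 1024))

Z_rp = random_projection(X, k=128)
Z_pca = pca_projection(X, k=16)

# Ratio of one pairwise distance after and before random projection.
ratio = np.linalg.norm(Z_rp[0] - Z_rp[1]) / np.linalg.norm(X[0] - X[1])
```

The random projection needs no fitting step, which is why it is cheap to compute, whereas PCA must see the data before it can choose its axes. Either reduced matrix could then be passed to a classifier such as an SVM.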

    Gaze Point Estimation Based on Convolutional Neural Networks Using an Inside-Out Camera

    The vision-based gaze estimation system (GES) involves multiple cameras and can estimate the gaze direction and what a user is looking at. The inside-out camera is a device that captures both the user's eye and the user's field of view. Such systems are widely used in many applications because eye images containing the pupil or cornea carry a great deal of information. These applications can improve quality of life for everyone, especially people with disabilities. However, commercial GES devices are hard for end users to access because they are expensive and difficult to use. A budget GES device can be built with a general-purpose camera. The common way to estimate the gaze point in a vision-based GES is to detect the pupil center position. However, the variable characteristics of the human eye, together with blinking, make reliable pupil detection a challenging problem. State-of-the-art pupil detection methods are designed not for wearable cameras but for desktop/TV panels. A small error in pupil detection can cause a large error in gaze point estimation. This thesis presents a novel, robust, and accurate GES framework based on learning-based methods. Its main contributions fall into two groups. The first is an effective pupil center detection framework that improves detection accuracy. The second is a calibration-free GES. For the first contribution, both handcrafted and learning-based methods are used to estimate the pupil center position. We design a handcrafted method that uses gradient values and RANSAC ellipse fitting. The pupil center position estimated by the proposed method was compared with that of the separability filter. The results show that the proposed method performs well in terms of accuracy and computation time.
However, when the user closes the eye, when no eye is present in the image, or when a large unexpected object appears in the image, the accuracy decreases significantly. It is difficult for a handcrafted method to achieve good accuracy in these cases. Learning-based methods have the potential to solve the general problem, and they are the focus of this thesis. This thesis presents a convolutional neural network (CNN) model that estimates the pupil position in various situations. Moreover, the model can recognize eye states such as open, middle, or closed. For the second contribution, we create a calibration-free GES. Calibration is the process of creating the coordinate transfer (CT) function, which maps the pupil position to the gaze point in the scene image. When the wearable camera moves during use, a static CT function cannot estimate the gaze point accurately. Learning-based methods have the potential to create a robust and adaptive CT function. An accurate calibration-free system can raise the accuracy of the GES and make it easier to use. We designed a CNN framework that can estimate the gaze position in various situations. This thesis also presents a process for creating a reliable dataset for GES. The results show that the proposed calibration-free GES can estimate the gaze point even when the glasses move. Kyushu Institute of Technology doctoral dissertation. Degree number: 情工博甲第338号. Date of conferral: March 25, 2019 (Heisei 31). Contents: 1 Introduction | 2 Pupil Detection using handcraft method | 3 Convolutional neural network | 4 Pupil detection using CNN method | 5 Calibration free approach for GES | 6 Character input system | 7 Conclusion. Kyushu Institute of Technology, 2018 (Heisei 30)
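The abstract above mentions RANSAC-based fitting for the handcrafted pupil detector but gives no details. The sketch below shows the general RANSAC idea on a deliberately simplified model: it fits a circle rather than an ellipse, and it works on synthetic 2D edge points rather than image gradients. The function names, tolerance, iteration count, and data are all hypothetical and are not taken from the thesis.

```python
import numpy as np


def circle_from_3_points(p):
    """Return (center, radius) of the circle through three 2D points, or None.

    Uses |p_i - c|^2 = |p_0 - c|^2, which gives the linear system
    2 (p_i - p_0) . c = |p_i|^2 - |p_0|^2 for i = 1, 2.
    """
    a = 2.0 * (p[1:] - p[0])
    b = (p[1:] ** 2).sum(axis=1) - (p[0] ** 2).sum()
    if abs(np.linalg.det(a)) < 1e-9:   # (nearly) collinear sample
        return None
    center = np.linalg.solve(a, b)
    return center, np.linalg.norm(p[0] - center)


def ransac_circle(points, n_iter=200, tol=0.5, seed=0):
    """Keep the candidate circle that has the most points within tol of it."""
    rng = np.random.default_rng(seed)
    best, best_count = None, -1
    for _ in range(n_iter):
        sample = points[rng.choice(len(points), size=3, replace=False)]
        fit = circle_from_3_points(sample)
        if fit is None:
            continue
        center, radius = fit
        dist = np.abs(np.linalg.norm(points - center, axis=1) - radius)
        count = int((dist < tol).sum())
        if count > best_count:
            best, best_count = (center, radius), count
    return best[0], best[1], best_count


# Synthetic "pupil boundary": 70 points on a circle plus 30 random outliers.
rng = np.random.default_rng(42)
theta = rng.uniform(0.0, 2.0 * np.pi, 70)
boundary = np.stack([40 + 12 * np.cos(theta), 30 + 12 * np.sin(theta)], axis=1)
outliers = rng.uniform(0.0, 80.0, size=(30, 2))
points = np.vstack([boundary, outliers])

center, radius, n_inliers = ransac_circle(points)
```

Because each candidate is built from only three points and scored by its inlier count, stray points (for example from eyelashes or reflections) do not pull the estimate the way they would in a least-squares fit over all points. An ellipse version would sample five points and fit a general conic instead.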