15 research outputs found

    A Coarse-to-Fine Adaptive Network for Appearance-Based Gaze Estimation

    Human gaze is essential for various appealing applications. Aiming at more accurate gaze estimation, a series of recent works propose to utilize face and eye images simultaneously. Nevertheless, in those works face and eye images serve only as independent or parallel feature sources, and the intrinsic correlation between their features is overlooked. In this paper we make the following contributions: 1) We propose a coarse-to-fine strategy which estimates a basic gaze direction from the face image and refines it with a corresponding residual predicted from the eye images. 2) Guided by the proposed strategy, we design a framework which introduces a bi-gram model to bridge the gaze residual and the basic gaze direction, and an attention component to adaptively acquire suitable fine-grained features. 3) Integrating the above innovations, we construct a coarse-to-fine adaptive network named CA-Net and achieve state-of-the-art performance on MPIIGaze and EyeDiap. Comment: 9 pages, 7 figures, AAAI-20
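
    Sketched below is the coarse-to-fine idea in PyTorch: a basic gaze direction from the face, refined by a residual predicted from eye features conditioned on that basic direction. The module layout and feature sizes are illustrative placeholders, not CA-Net's actual architecture (which additionally includes the bi-gram model and attention component).

    ```python
    import torch
    import torch.nn as nn

    class CoarseToFineGaze(nn.Module):
        def __init__(self):
            super().__init__()
            # Coarse branch: face image -> basic gaze direction (yaw, pitch)
            self.face_net = nn.Sequential(
                nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 2),
            )
            # Fine branch: eye image -> features for the residual correction
            self.eye_net = nn.Sequential(
                nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            )
            # Residual head sees both the eye features and the basic direction
            self.residual_head = nn.Linear(16 + 2, 2)

        def forward(self, face, eyes):
            basic = self.face_net(face)
            residual = self.residual_head(
                torch.cat([self.eye_net(eyes), basic], dim=1))
            return basic + residual  # refined gaze direction

    gaze = CoarseToFineGaze()(torch.rand(1, 3, 96, 96), torch.rand(1, 3, 96, 96))
    ```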

    Federated learning in gaze recognition (FLIGR)

    The efficiency and generalizability of a deep learning model depend on the amount and diversity of its training data. Although huge amounts of data are being collected, these data are not stored in centralized servers for further processing: it is often infeasible to collect and share data centrally due to various medical data regulations. This need for diverse, distributed data and the infeasibility of centralized storage call for Federated Learning (FL). FL is a way of utilizing privately stored data in model building without the need for data sharing. The idea is to train several models locally with the same architecture, share the model weights between the collaborators, aggregate the weights, and use the resulting global weights for further model building. FL is an iterative algorithm which repeats these steps over a defined number of rounds. By doing so, we negate the need for centralized data sharing and avoid the several regulations tied to it. In this work, federated learning is applied to gaze recognition, the task of identifying where a doctor is gazing. A global model is built by repeatedly aggregating local models trained on data from 8 institutions using the FL algorithm for 4 federated rounds. The results show an increase in the performance of the global model over the federated rounds. The study also shows that, at the end of FL, the global model can be trained once more locally at each institution to fine-tune it to the local data.
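
    The loop described above is essentially federated averaging. A minimal sketch, assuming each institution exposes a `local_train(weights)` callable over its private data (a hypothetical placeholder, not FLIGR's actual API):

    ```python
    import numpy as np

    def federated_rounds(global_weights, local_trainers, num_rounds=4):
        for _ in range(num_rounds):
            # Each collaborator trains a copy of the global model on private
            # data; local_train returns its updated weight arrays by name.
            local_weights = [train(dict(global_weights)) for train in local_trainers]
            # Aggregate: element-wise mean of each weight array across collaborators
            global_weights = {
                name: np.mean([w[name] for w in local_weights], axis=0)
                for name in global_weights
            }
        return global_weights
    ```

    In the paper's setting there are 8 collaborators and 4 rounds, with an optional final local pass to fine-tune the resulting global weights to each institution's own data.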

    DynamicRead: Exploring Robust Gaze Interaction Methods for Reading on Handheld Mobile Devices under Dynamic Conditions

    Enabling gaze interaction in real-time on handheld mobile devices has attracted significant attention in recent years. An increasing number of research projects have focused on sophisticated appearance-based deep learning models to enhance the precision of gaze estimation on smartphones. This inspires important research questions, including how gaze can be used in a real-time application, and what type of gaze interaction methods are preferable under dynamic conditions in terms of both user acceptance and reliable performance. To address these questions, we design four types of gaze scrolling techniques: three explicit techniques based on Gaze Gesture, Dwell time, and Pursuit, and one implicit technique based on reading speed, to support touch-free page scrolling in a reading application. We conduct a 20-participant user study under both sitting and walking settings, and our results reveal that the Gaze Gesture and Dwell time-based interfaces are more robust while walking, and that Gaze Gesture achieved consistently good usability scores without causing a high cognitive workload. Comment: Accepted by ETRA 2023 as a full paper, and as a journal paper in Proceedings of the ACM on Human-Computer Interaction
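
    As an illustration of one of these techniques, below is a minimal sketch of a dwell-time trigger, assuming normalized gaze y-coordinates streamed per frame; the region band and threshold are illustrative values, not the paper's implementation.

    ```python
    import time

    class DwellScroller:
        def __init__(self, dwell_seconds=0.8):
            self.dwell_seconds = dwell_seconds
            self.entered_at = None

        def update(self, gaze_y, region=(0.85, 1.0)):
            """Call once per gaze sample; returns True when a scroll should fire."""
            lo, hi = region  # normalized screen band, e.g. bottom 15% scrolls down
            if lo <= gaze_y <= hi:
                if self.entered_at is None:
                    self.entered_at = time.monotonic()
                elif time.monotonic() - self.entered_at >= self.dwell_seconds:
                    self.entered_at = None  # reset so scrolling repeats, not spams
                    return True
            else:
                self.entered_at = None
            return False
    ```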

    Farmland Segmentation in Landsat 8 Satellite Images Using Deep Learning and Conditional Generative Adversarial Networks

    Leveraging mid-resolution satellite images such as Landsat 8 for accurate farmland segmentation and land-change monitoring is crucial for agricultural management, yet it is hindered by the scarcity of labelled data for training supervised deep learning pipelines. The particular focus of this study is on addressing this scarcity of labelled images. This paper introduces several contributions, including a systematic satellite image data augmentation approach that aims to maintain data population consistency during model training, thus mitigating performance degradation. To alleviate the labour-intensive task of pixel-wise image labelling, we present a novel application of a modified conditional generative adversarial network (CGAN) to generate artificial satellite images and corresponding farm labels. Additionally, we scrutinize the role of spectral bands in satellite image segmentation and compare two prominent semantic segmentation models, U-Net and DeepLabV3+, with diverse backbone structures. Our empirical findings demonstrate that augmenting the dataset with up to 22.85% artificial samples significantly enhances model performance. Notably, the U-Net model, employing standard convolution, outperforms the DeepLabV3+ models with atrous convolution, achieving a segmentation accuracy of 86.92% on the test data.
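
    A minimal sketch of the consistency-preserving idea behind such augmentation, assuming paired image and mask arrays: every geometric transform is applied identically to both, so the pixel-wise labels stay aligned. The specific transforms and parameters are illustrative, not the paper's exact pipeline.

    ```python
    import numpy as np

    def augment_pair(image, mask, rng=None):
        """image: (H, W, bands) float array; mask: (H, W) integer labels."""
        rng = rng if rng is not None else np.random.default_rng()
        k = int(rng.integers(0, 4))              # random 90-degree rotation
        image = np.rot90(image, k, axes=(0, 1)).copy()
        mask = np.rot90(mask, k, axes=(0, 1)).copy()
        if rng.random() < 0.5:                   # random horizontal flip
            image, mask = image[:, ::-1].copy(), mask[:, ::-1].copy()
        # Radiometric jitter touches the image only, never the labels
        image = image * rng.uniform(0.9, 1.1)
        return image, mask
    ```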

    I Spy With My Little Eyes: A Convolutional Deep Learning Approach to Web Eye Tracking

    Eye tracking is the study of eye movements, blinks, and fixations, and aims to give insight into visual attention mechanisms. While it is common in marketing, usability research, and cognitive science, and well-established methods exist for lab eye tracking, web eye tracking relies on webcams of much lower quality. Web eye tracking can provide valuable information about users' engagement with digital content from the comfort of their own homes. This gives designers, developers, and researchers the chance to inform their decisions with data and optimize, for example, the user experience, while reaching large and demographically diverse samples without the need for lab-grade equipment. For web eye tracking, only limited tools exist, and they come with uncertainties which need to be addressed before they can be used for scientific research; improving the quality of data collected via such channels is also part of this goal. The project aims to develop a reliable deep learning solution, such as a convolutional neural network, capable of predicting gaze x/y screen coordinates from users' webcam video. The predictions of the proposed methods are compared to baseline models that use webcam data and to predictions made by a lab eye tracker.
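
    A minimal sketch of this kind of gaze regressor in PyTorch, mapping webcam frames to normalized (x, y) screen coordinates; the architecture and input size are illustrative assumptions, not the project's actual model.

    ```python
    import torch
    import torch.nn as nn

    # Webcam frames in, (x, y) screen coordinates out, kept in [0, 1] by a sigmoid
    gaze_regressor = nn.Sequential(
        nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(64, 2), nn.Sigmoid(),
    )

    frames = torch.rand(8, 3, 96, 96)   # a batch of cropped webcam face images
    xy = gaze_regressor(frames)         # shape (8, 2); train with e.g. an MSE loss
    ```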

    Real-time gaze estimation on embedded system

    In recent years, deep learning has revolutionized several computer vision tasks, including appearance-based gaze estimation. Over the past few years, the emphasis of gaze estimation research has shifted more and more to appearance-based methods, as they bring several advantages over more traditional methods: they do not require dedicated devices, and they are significantly more robust to environmental changes. Thanks to deep learning, appearance-based methods have achieved astonishing accuracy in gaze estimation, even in challenging environments. However, there is a large gap between the methods used in research and the methods that are applicable to embedded systems. Most research is carried out on powerful computers, whereas embedded systems have several limitations: they often have very limited memory and low computing power, and they typically run on batteries rather than mains power. In addition, embedded systems may not support the functions or methods used in the studies that achieve the best accuracies. Because computing power is so limited, device manufacturers must apply optimization methods to improve performance, so embedded systems may not be able to perform all the functions or methods supported by desktop computers.
    In this thesis, the performance of several recent gaze estimation methods, and whether they can be used on embedded systems, was tested. Furthermore, to reduce the gap between gaze estimation research and embedded systems, a convolutional neural network was developed using only simple and well-known neural network building blocks. Based on the results, the method developed in this thesis is the only one among the compared methods that works on embedded systems while achieving competitive accuracy. In addition, the developed method demonstrated the ability to run in real time on a memory-constrained, low-compute embedded device.
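
    In the spirit of the thesis, a minimal sketch of a gaze network built only from plain convolutions, ReLU activations, and pooling, the operations embedded inference runtimes most reliably support; all sizes here are illustrative assumptions, not the thesis model.

    ```python
    import torch
    import torch.nn as nn

    def simple_gaze_net():
        return nn.Sequential(
            nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(),
            nn.Linear(32 * 8 * 8, 2),   # gaze yaw and pitch
        )

    net = simple_gaze_net()
    params = sum(p.numel() for p in net.parameters())
    print(f"{params:,} parameters")     # small enough for memory-limited devices
    # Runs on a 64x64 grayscale eye crop: 64 -> 32 -> 16 -> 8 after three pools
    out = net(torch.rand(1, 1, 64, 64))
    ```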