7 research outputs found
Appearance-Based Gaze Estimation in the Wild
Appearance-based gaze estimation is believed to work well in real-world
settings, but existing datasets have been collected under controlled laboratory
conditions and methods have been not evaluated across multiple datasets. In
this work we study appearance-based gaze estimation in the wild. We present the
MPIIGaze dataset that contains 213,659 images we collected from 15 participants
during natural everyday laptop use over more than three months. Our dataset is
significantly more variable than existing ones with respect to appearance and
illumination. We also present a method for in-the-wild appearance-based gaze
estimation using multimodal convolutional neural networks that significantly
outperforms state-of-the art methods in the most challenging cross-dataset
evaluation. We present an extensive evaluation of several state-of-the-art
image-based gaze estimation algorithms on three current datasets, including our
own. This evaluation provides clear insights and allows us to identify key
research challenges of gaze estimation in the wild
Recommended from our members
Unmediated Interaction: Communicating with Computers and Embedded Devices as If They Are Not There
Although computers are smaller and more readily accessible today than they have ever been, I believe that we have barely scratched the surface of what computers can become. When we use computing devices today, we end up spending a lot of our time navigating to particular functions or commands to use devices their way rather than executing those commands immediately. In this dissertation, I explore what I call unmediated interaction, the notion of people using computers as if the computers are not there and as if the people are using their own abilities or powers instead. I argue that facilitating unmediated interaction via personalization, new input modalities, and improved text entry can reduce both input overhead and output overhead, which are the burden of providing inputs to and receiving outputs from the intermediate device, respectively. I introduce three computational methods for reducing input overhead and one for reducing output overhead. First, I show how input data mining can eliminate the need for user inputs altogether. Specifically, I develop a method for mining controller inputs to gain deep insights about a players playing style, their preferences, and the nature of video games that they are playing, all of which can be used to personalize their experience without any explicit input on their part. Next, I introduce gaze locking, a method for sensing eye contact from an image that allows people to interact with computers, devices, and other objects just by looking at them. Third, I introduce computationally optimized keyboard designs for touchscreen manual input that allow people to type on smartphones faster and with far fewer errors than currently possible. Last, I introduce the racing auditory display (RAD), an audio system that makes it possible for people who are blind to play the same types of racing games that sighted players can play, and with a similar speed and sense of control as sighted players. The RAD shows how we can reduce output overhead to provide user interface parity between people with and without disabilities. Together, I hope that these systems open the door to even more efforts in unmediated interaction, with the goal of making computers less like devices that we use and more like abilities or powers that we have
Gaze estimation and interaction in real-world environments
Human eye gaze has been widely used in human-computer interaction, as it is a promising modality for natural, fast, pervasive, and non-verbal interaction between humans and computers. As the foundation of gaze-related interactions, gaze estimation has been a hot research topic in recent decades. In this thesis, we focus on developing appearance-based gaze estimation methods and corresponding attentive user interfaces with a single webcam for challenging real-world environments. First, we collect a large-scale gaze estimation dataset, MPIIGaze, the first of its kind, outside of controlled laboratory conditions. Second, we propose an appearance-based method that, in stark contrast to a long-standing tradition in gaze estimation, only takes the full face image as input. Second, we propose an appearance-based method that, in stark contrast to a long-standing tradition in gaze estimation, only takes the full face image as input. Third, we study data normalisation for the first time in a principled way, and propose a modification that yields significant performance improvements. Fourth, we contribute an unsupervised detector for human-human and human-object eye contact. Finally, we study personal gaze estimation with multiple personal devices, such as mobile phones, tablets, and laptops.Der Blick des menschlichen Auges wird in Mensch-Computer-Interaktionen verbreitet eingesetzt, da dies eine vielversprechende Möglichkeit für natürliche, schnelle, allgegenwärtige und nonverbale Interaktion zwischen Mensch und Computer ist. Als Grundlage von blickbezogenen Interaktionen ist die Blickschätzung in den letzten Jahrzehnten ein wichtiges Forschungsthema geworden. In dieser Arbeit konzentrieren wir uns auf die Entwicklung Erscheinungsbild-basierter Methoden zur Blickschätzung und entsprechender “attentive user interfaces” (die Aufmerksamkeit des Benutzers einbeziehende Benutzerschnittstellen) mit nur einer Webcam für anspruchsvolle natürliche Umgebungen. Zunächst sammeln wir einen umfangreichen Datensatz zur Blickschätzung, MPIIGaze, der erste, der außerhalb von kontrollierten Laborbedingungen erstellt wurde. Zweitens schlagen wir eine Erscheinungsbild-basierte Methode vor, die im Gegensatz zur langjährigen Tradition in der Blickschätzung nur eine vollständige Aufnahme des Gesichtes als Eingabe verwendet. Drittens untersuchen wir die Datennormalisierung erstmals grundsätzlich und schlagen eine Modifizierung vor, die zu signifikanten Leistungsverbesserungen führt. Viertens stellen wir einen unüberwachten Detektor für Augenkontakte zwischen Mensch und Mensch und zwischen Mensch und Objekt vor. Abschließend untersuchen wir die persönliche Blickschätzung mit mehreren persönlichen Geräten wie Handy, Tablet und Laptop