1,596 research outputs found

    Gaze-tracking-based interface for robotic chair guidance

    This research focuses on finding solutions to enhance the quality of life for wheelchair users, specifically by applying a gaze-tracking-based interface for the guidance of a robotized wheelchair. For this purpose, the interface was applied in two different approaches for the wheelchair control system. The first one was an assisted control in which the user was continuously involved in controlling the movement of the wheelchair in the environment and the inclination of the different parts of the seat through the user’s gaze and eye blinks obtained with the interface. The second approach was to take the first steps to apply the device to an autonomous wheelchair control in which the wheelchair moves autonomously avoiding collisions towards the position defined by the user. To this end, the basis for obtaining the gaze position relative to the wheelchair and the object detection was developed in this project to be able to calculate in the future the optimal route to which the wheelchair should move. In addition, the integration of a robotic arm in the wheelchair to manipulate different objects was also considered, obtaining in this work the object of interest indicated by the user's gaze within the detected objects so that in the future the robotic arm could select and pick up the object the user wants to manipulate. In addition to the two approaches, an attempt was also made to estimate the user's gaze without the software interface. For this purpose, the gaze is obtained from pupil detection libraries, a calibration and a mathematical model that relates pupil positions to gaze. The results of the implementations have been analysed in this work, including some limitations encountered. Nevertheless, future improvements are proposed, with the aim of increasing the independence of wheelchair user
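
The calibration step mentioned above (a mathematical model relating pupil positions to gaze) can be illustrated with a minimal sketch. The abstract does not say which model was used; the second-order polynomial fitted by least squares below is an assumption, as are the function names `poly_features`, `fit_calibration` and `estimate_gaze` and the synthetic calibration data.

```python
import numpy as np

def poly_features(pupil_xy):
    """Second-order polynomial terms of pupil coordinates (assumed model form)."""
    p = np.atleast_2d(np.asarray(pupil_xy, dtype=float))
    x, y = p[:, 0], p[:, 1]
    return np.column_stack([np.ones_like(x), x, y, x * y, x**2, y**2])

def fit_calibration(pupil_xy, gaze_xy):
    """Least-squares fit of the pupil-to-gaze mapping from calibration pairs."""
    design = poly_features(pupil_xy)
    coeffs, *_ = np.linalg.lstsq(design, np.asarray(gaze_xy, dtype=float), rcond=None)
    return coeffs  # shape (6, 2): one column per gaze coordinate

def estimate_gaze(coeffs, pupil_xy):
    """Apply the fitted mapping to new pupil positions."""
    return poly_features(pupil_xy) @ coeffs

# Synthetic 3x3 calibration grid (hypothetical data, not from the thesis):
# the user looks at nine known targets while pupil positions are recorded.
pupil = np.array([(px, py) for px in (-1.0, 0.0, 1.0) for py in (-1.0, 0.0, 1.0)])
gaze = np.column_stack([100.0 + 20.0 * pupil[:, 0], 50.0 + 30.0 * pupil[:, 1]])

coeffs = fit_calibration(pupil, gaze)
print(estimate_gaze(coeffs, (0.5, -0.5)))
```

In practice a real calibration would use measured pupil centres from a pupil detection library and would need more targets than coefficients to average out noise.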

    Proficiency-aware systems

    In an increasingly digital world, technological developments such as data-driven algorithms and context-aware applications create opportunities for novel human-computer interaction (HCI). We argue that these systems have the latent potential to stimulate users and encourage personal growth. However, users increasingly rely on the intelligence of interactive systems. Thus, it remains a challenge to design for proficiency awareness, essentially demanding increased user attention whilst preserving user engagement. Designing and implementing systems that allow users to become aware of their own proficiency and encourage them to recognize learning benefits is the primary goal of this research. In this thesis, we introduce the concept of proficiency-aware systems as one solution. In our definition, proficiency-aware systems use estimates of the user's proficiency to tailor the interaction in a domain and facilitate a reflective understanding for this proficiency. We envision that proficiency-aware systems leverage collected data for learning benefit. Here, we see self-reflection as a key for users to become aware of necessary efforts to advance their proficiency. A key challenge for proficiency-aware systems is the fact that users often have a different self-perception of their proficiency. The benefits of personal growth and advancing one's repertoire might not necessarily be apparent to users, alienating them, and possibly leading to abandoning the system. To tackle this challenge, this work does not rely on learning strategies but rather focuses on the capabilities of interactive systems to provide users with the necessary means to reflect on their proficiency, such as showing calculated text difficulty to a newspaper editor or visualizing muscle activity to a passionate sportsperson. We first elaborate on how proficiency can be detected and quantified in the context of interactive systems using physiological sensing technologies. 
Through developing interaction scenarios, we demonstrate the feasibility of gaze- and electromyography-based proficiency-aware systems by utilizing machine learning algorithms that can estimate users' proficiency levels for stationary vision-dominant tasks (reading, information intake) and dynamic manual tasks (playing instruments, fitness exercises). Secondly, we show how to facilitate proficiency awareness for users, including design challenges on when and how to communicate proficiency. We complement this second part by highlighting the necessity of toolkits for sensing modalities to enable the implementation of proficiency-aware systems for a wide audience. In this thesis, we contribute a definition of proficiency-aware systems, which we illustrate by designing and implementing interactive systems. We derive technical requirements for real-time, objective proficiency assessment and identify design qualities of communicating proficiency through user reflection. We summarize our findings in a set of design and engineering guidelines for proficiency awareness in interactive systems, highlighting that proficiency feedback makes performance interpretable for the user.

    Dwell-free input methods for people with motor impairments

    Millions of individuals affected by disorders or injuries that cause severe motor impairments have difficulty performing compound manipulations using traditional input devices. This thesis first explores how effective various assistive technologies are for people with motor impairments. The following questions are studied: (1) What activities are performed? (2) What tools are used to support these activities? (3) What are the advantages and limitations of these tools? (4) How do users learn about and choose assistive technologies? (5) Why do users adopt or abandon certain tools? A qualitative study of fifteen people with motor impairments indicates that users have strong needs for efficient text entry and communication tools that are not met by existing technologies. To address these needs, this thesis proposes three dwell-free input methods, designed to improve the efficacy of target selection and text entry based on eye-tracking and head-tracking systems. They yield: (1) the Target Reverse Crossing selection mechanism, (2) the EyeSwipe eye-typing interface, and (3) the HGaze Typing interface. With Target Reverse Crossing, a user moves the cursor into a target and reverses over a goal to select it. This mechanism is significantly more efficient than dwell-time selection. Target Reverse Crossing is then adapted in EyeSwipe to delineate the start and end of a word that is eye-typed with a gaze path connecting the intermediate characters (as with traditional gesture typing). When compared with a dwell-based virtual keyboard, EyeSwipe affords higher text entry rates and a more comfortable interaction. Finally, HGaze Typing adds head gestures to gaze-path-based text entry to enable simple and explicit command activations. Results from a user study demonstrate that HGaze Typing has better performance and user satisfaction than a dwell-time method
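
For context, the dwell-time baseline that the three methods are compared against can be sketched as follows. This is a generic illustration of dwell-time selection, not code from the thesis; the function name `dwell_select`, the sample format and the 500 ms threshold are assumptions.

```python
def dwell_select(samples, target, dwell_ms=500):
    """Return the timestamp at which a dwell selection fires, or None.

    samples: iterable of (t_ms, x, y) gaze or cursor samples in time order.
    target:  (x_min, y_min, x_max, y_max) bounding box of the target.
    A selection fires once the pointer has stayed inside the target
    continuously for at least dwell_ms milliseconds.
    """
    x_min, y_min, x_max, y_max = target
    entered_at = None
    for t_ms, x, y in samples:
        inside = x_min <= x <= x_max and y_min <= y <= y_max
        if not inside:
            entered_at = None          # leaving the target resets the timer
            continue
        if entered_at is None:
            entered_at = t_ms          # first sample inside the target
        if t_ms - entered_at >= dwell_ms:
            return t_ms
    return None

# Hypothetical samples: one every 100 ms, all inside a 10x10 target.
steady = [(i * 100, 5, 5) for i in range(10)]
print(dwell_select(steady, (0, 0, 10, 10)))
```

The sketch makes the cost of the baseline visible: every selection takes at least the dwell threshold, which is the delay that dwell-free methods aim to avoid.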

    Perceptually-guided deep neural networks for ego-action prediction: Object grasping

    We tackle the problem of predicting a grasping action in ego-centric video for the assistance of upper-limb amputees. Our work is based on paradigms of neuroscience that state that human gaze expresses intention and anticipates actions. In our scenario, human gaze fixations are recorded by a glass-worn eye-tracker and then used to predict the grasping actions. We have studied two aspects of the problem: which object from a given taxonomy will be grasped, and when is the moment to trigger the grasping action. To recognize objects, we use gaze to guide Convolutional Neural Networks (CNNs) to focus on an object-to-grasp area. However, the acquired sequence of fixations is noisy due to saccades toward distractors and visual fatigue, and gaze is not always reliably directed toward the object of interest. To deal with this challenge, we use video-level annotations indicating the object to be grasped and a weak loss in deep CNNs. To detect the moment when a person will take an object, we take advantage of the predictive power of Long Short-Term Memory networks to analyze gaze and visual dynamics. Results show that our method achieves better performance than other approaches on a real-life dataset. © 2018 Elsevier Ltd. All rights reserved. This work was partially supported by the French National Center of Scientific Research with grant Suvipp PEPS CNRS-Idex 215-2016, by the French National Center of Scientific Research with the interdisciplinary project CNRS RoBioVis 2017–2019, the Scientific Council of Labri, University of Bordeaux, and the Spanish Ministry of Economy and Competitiveness under the National Grants TEC2014-53390-P and TEC2014-61729-EXP.

    Face pose estimation with automatic 3D model creation for a driver inattention monitoring application

    Text in English; abstract in English and Spanish. Recent studies have identified inattention (including distraction and drowsiness) as the main cause of accidents, being responsible for at least 25% of them. Driving distraction has been less studied, since it is more diverse and exhibits a higher risk factor than fatigue. In addition, it is present in over half of the inattention-involved crashes. The increased presence of In-Vehicle Information Systems (IVIS) adds to the potential distraction risk and modifies driving behaviour, and thus research on this issue is of vital importance. Many researchers have been working on different approaches to deal with distraction during driving. Among them, Computer Vision is one of the most common, because it allows for cost-effective and non-invasive driver monitoring and sensing. Using Computer Vision techniques it is possible to evaluate some facial movements that characterise the state of attention of a driver. This thesis presents methods to estimate the face pose and gaze direction of a person in real time, using a stereo camera, as a basis for assessing driver distractions. The methods are completely automatic and user-independent. A set of features in the face is identified at initialisation and used to create a sparse 3D model of the face. These features are tracked from frame to frame, and the model is augmented to cover parts of the face that may have been occluded before. The algorithm is designed to work in a naturalistic driving simulator, which presents challenging low-light conditions. We evaluate several techniques to detect features on the face that can be matched between cameras and tracked with success. Well-known methods such as SURF do not return good results, due to the lack of salient points in the face as well as the low illumination of the images. We introduce a novel multisize technique, based on the Harris corner detector and patch correlation. 
This technique benefits from the better performance of small patches under rotations and illumination changes, and the more robust correlation of the bigger patches under motion blur. The head rotates in a range of ±90° in the yaw angle, and the appearance of the features changes noticeably. To deal with these changes, we implement a new re-registering technique that captures new textures of the features as the face rotates. These new textures are incorporated into the model, which mixes the views of both cameras. The captures are taken at regular angle intervals for rotations in yaw, so that each texture is only used in a range of ±7.5° around the capture angle. Rotations in pitch and roll are handled using affine patch warping. The 3D model created at initialisation can only take features in the frontal part of the face, and some of these may be occluded during rotations. The accuracy and robustness of the face tracking depend on the number of visible points, so new points are added to the 3D model when new parts of the face are visible from both cameras. Bundle adjustment is used to reduce the accumulated drift of the 3D reconstruction. We estimate the pose from the position of the features in the images and the 3D model using POSIT or Levenberg-Marquardt (LM). A RANSAC process detects incorrectly tracked points, which are not considered for pose estimation. POSIT is faster, while LM obtains more accurate results. Using the model extension and the re-registering technique, we can accurately estimate the pose in the full head rotation range, with error levels that improve the state of the art. A coarse eye direction is composed with the face pose estimation to obtain the gaze and the driver's fixation area, a parameter that gives much information about the distraction pattern of the driver. The gaze estimation algorithm proposed in this thesis has been tested on a set of driving experiments directed by a team of psychologists in a naturalistic driving simulator. 
This simulator mimics conditions present in real driving, including weather changes, manoeuvring and distractions due to IVIS. Professional drivers participated in the tests. The driver's fixation statistics obtained with the proposed system show how the utilisation of IVIS influences the distraction pattern of the drivers, increasing reaction times and affecting the fixation of attention on the road and the surroundings
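
The re-registering rule described above (each texture valid only within ±7.5° of its capture angle) implies captures spaced 15° apart in yaw. A small sketch of how a tracker might pick the texture for a given yaw is shown below. It assumes the capture angles are multiples of 15° centred on the frontal view (0°), which the abstract does not state; the names `CAPTURE_STEP_DEG` and `nearest_capture_angle` are hypothetical.

```python
import math

CAPTURE_STEP_DEG = 15.0   # implied by the ±7.5° validity range of each texture
MAX_YAW_DEG = 90.0        # head rotation range given in the abstract

def nearest_capture_angle(yaw_deg):
    """Return the capture angle whose ±7.5° range contains yaw_deg.

    Assumes captures at multiples of 15° (..., -15, 0, 15, ...); a yaw exactly
    on a boundary such as 7.5° is assigned to the higher capture angle.
    """
    if not -MAX_YAW_DEG <= yaw_deg <= MAX_YAW_DEG:
        raise ValueError("yaw outside the ±90° range handled by the tracker")
    return CAPTURE_STEP_DEG * math.floor(yaw_deg / CAPTURE_STEP_DEG + 0.5)

for yaw in (0.0, 20.0, -80.0, 88.0):
    print(yaw, "->", nearest_capture_angle(yaw))
```

Under this assumption the full ±90° range is covered by 13 capture angles, and a new texture is needed only when the head first enters a bin that has no stored capture.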

    Driver Attention based on Deep Learning for a Smart Vehicle to Driver (V2D) Interaction

    Driver attention is a topic of interest in the field of intelligent vehicles, relevant to tasks ranging from driver monitoring to autonomous driving. This thesis addresses the topic using deep learning algorithms to achieve intelligent interaction between the vehicle and the driver. Driver monitoring requires an accurate estimate of the driver's gaze in a 3D environment in order to determine their state of attention. This thesis tackles the problem with a single camera, so that the approach can be used in real applications without high cost and without disturbing the driver. The tool developed was evaluated on a public dataset (DADA2000), obtaining results similar to those of an expensive eye tracker that cannot be used in a real vehicle. It was also used in an application that evaluates driver attention during a simulated transition from autonomous to manual mode, proposing a novel metric to assess the driver's situational state based on their attention to the different objects in the scene. In addition, a driver attention estimation algorithm is proposed that uses recent deep learning techniques such as conditional Generative Adversarial Networks (cGANs) and Multi-Head Self-Attention. This makes it possible to emphasise certain regions of the scene as a human would. The model was trained and validated on two public datasets (BDD-A and DADA2000), outperforming other state-of-the-art proposals and achieving inference times that allow its use in real applications. 
Finally, a model was developed that leverages our driver attention algorithm to understand a traffic scene, producing the decision taken by the vehicle and its explanation from images captured by a camera mounted at the front of the vehicle. It was trained on a public dataset (BDD-OIA), and the proposed model captures the temporal sequence of events using a Transformer Encoder, outperforming other state-of-the-art proposals. Beyond its validation on the dataset, it was implemented in an application that interacts with the driver, advising on the decisions to take and their explanations for different use cases in a simulated environment. This thesis explores and demonstrates the benefits of driver attention for intelligent vehicles, achieving vehicle-driver interaction through recent deep learning techniques

    Attention Allocation Aid for Visual Search

    This paper outlines the development and testing of a novel, feedback-enabled attention allocation aid (AAAD), which uses real-time physiological data to improve human performance in a realistic sequential visual search task. By optimizing over search duration, the aid improves efficiency while preserving decision accuracy as the operator identifies and classifies targets within simulated aerial imagery. Specifically, using experimental eye-tracking data and measurements of target detectability across the human visual field, we develop functional models of detection accuracy as a function of search time, number of eye movements, scan path, and image clutter. These models are then used by the AAAD in conjunction with real-time eye position data to make probabilistic estimates of attained search accuracy and to recommend that the observer either move on to the next image or continue exploring the present image. An experimental evaluation in a scenario motivated by human supervisory control in surveillance missions confirms the benefits of the AAAD. Comment: To be presented at the ACM CHI conference in Denver, Colorado in May 201
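
The move-on-or-continue recommendation can be illustrated with a deliberately simplified sketch. The paper's models depend on search time, number of eye movements, scan path and image clutter; the version below uses search time only, with an assumed exponential-saturation curve and made-up parameters (`a0`, `a_max`, `tau`, `target`), so it shows the shape of the decision rule rather than the authors' model.

```python
import math

def estimated_accuracy(t_s, a0=0.5, a_max=0.95, tau=3.0):
    """Assumed accuracy model: rises from a0 toward a_max with time constant tau (seconds)."""
    return a_max - (a_max - a0) * math.exp(-t_s / tau)

def recommend(t_s, target=0.9, **model):
    """Recommend moving to the next image once the estimated accuracy reaches target."""
    return "move on" if estimated_accuracy(t_s, **model) >= target else "continue"

for t in (2.0, 8.0):
    print(t, round(estimated_accuracy(t), 3), recommend(t))
```

With these illustrative parameters the recommendation switches at about 6.6 s of search (3·ln 9), which is where the assumed curve crosses the 0.9 target.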