124 research outputs found

    Real-time gaze estimation using a Kinect and a HD webcam

    Get PDF
    In human-computer interaction, gaze orientation is an important and promising source of information about the attention and focus of users. Gaze detection can also be an extremely useful metric for analysing human mood and affect. Furthermore, gaze can be used as an input method for human-computer interaction. However, accurate real-time gaze estimation is still an open problem. In this paper, we propose a simple and novel model for estimating, in real time, the gaze direction of a user on a computer screen. The method utilises cheap capture devices, an HD webcam and a Microsoft Kinect. We model the gaze of a user facing forwards as the composition of local gaze motion, driven by eye movement, and global gaze motion, driven by face movement. We validate the proposed gaze estimation model and provide an experimental evaluation of the reliability and precision of the method.
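    The abstract does not give the model's equations, but the stated decomposition suggests combining the Kinect-derived head pose (global motion) with a webcam-derived eye-in-head direction (local motion) and intersecting the resulting ray with the screen. A minimal sketch under that assumption; all names and the ray-plane construction are our own illustration, not the paper's published method:

```python
import numpy as np

def screen_gaze_point(head_R, head_t, local_gaze, screen_normal, screen_point):
    """Compose global (head) and local (eye) motion into a screen gaze point.

    head_R, head_t : head rotation (3x3) and position (3,), e.g. from the Kinect
    local_gaze     : unit gaze direction in the head frame, e.g. from webcam
                     iris localisation
    The screen is the plane through screen_point with unit normal screen_normal.
    """
    g = head_R @ local_gaze       # rotate eye-in-head gaze into the world frame
    o = head_t                    # gaze ray origin at the head position
    # ray-plane intersection: find s such that (o + s*g - screen_point) . n = 0
    s = np.dot(screen_point - o, screen_normal) / np.dot(g, screen_normal)
    return o + s * g

# Example: head 0.6 m in front of the screen plane z = 0, eyes 10 deg right.
g_local = np.array([np.sin(np.radians(10)), 0.0, -np.cos(np.radians(10))])
print(screen_gaze_point(np.eye(3), np.array([0.0, 0.0, 0.6]), g_local,
                        screen_normal=np.array([0.0, 0.0, 1.0]),
                        screen_point=np.zeros(3)))
```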

    A low-cost head and eye tracking system for realistic eye movements in virtual avatars

    Get PDF
    A virtual avatar or autonomous agent is a digital representation of a human being that can be controlled by either a human or an artificially intelligent computer system. Increasingly, avatars are becoming realistic virtual human characters that exhibit human behavioral traits, body language, and eye and head movements. As the interpretation of eye and head movements represents an important part of nonverbal human communication, it is extremely important to reproduce these movements accurately in virtual avatars to avoid falling into the well-known ``uncanny valley''. In this paper we present a cheap hybrid real-time head and eye tracking system based on existing open source software and commonly available hardware. Our evaluation indicates that the head and eye tracking system is stable and accurate, and allows a human user to robustly puppeteer a virtual avatar, potentially allowing us to train an A.I. system to learn realistic human head and eye movements.

    Estimating Point of Regard with a Consumer Camera at a Distance

    Full text link
    In this work, we have studied the viability of a novel technique to estimate the point of regard (POR) that only requires the video feed from a consumer camera. The system can work under uncontrolled lighting conditions and does not require any complex hardware setup. To that end we propose a system that uses PCA feature extraction from the eye region followed by non-linear regression, and we evaluate three state-of-the-art non-linear regression algorithms. In the study, we also compared performance using a high-quality webcam versus a Kinect sensor. We found that, despite the relatively low quality of the Kinect images, it achieves performance similar to the high-quality camera. These results show that the proposed approach could be extended to estimate the POR in a completely non-intrusive way. (Mansanet Sandin, J.; Albiol Colomer, A.; Paredes Palacios, R.; Mossi García, J.M.; Albiol Colomer, A.J. (2013). Estimating Point of Regard with a Consumer Camera at a Distance. In: Pattern Recognition and Image Analysis. Springer. 7887:881-888. doi:10.1007/978-3-642-38628-2_104)
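    As an illustration of the pipeline the abstract describes (PCA over eye-region appearance followed by non-linear regression to screen coordinates), here is a minimal sketch using scikit-learn, with an RBF-kernel support vector regressor standing in for one of the evaluated regressors; the data shapes and hyperparameters are placeholder assumptions:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.multioutput import MultiOutputRegressor
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVR

# Stand-in data: flattened grey-level eye-region crops and the on-screen
# point-of-regard targets; a real system would collect these per frame.
rng = np.random.default_rng(0)
X = rng.random((500, 24 * 48))   # 500 frames of 24x48-pixel eye patches
Y = rng.random((500, 2))         # (x, y) point of regard, normalised to [0, 1]

# PCA compresses the eye appearance; an RBF-kernel SVR (one family of
# non-linear regressor) maps the compressed features to screen coordinates.
model = make_pipeline(
    PCA(n_components=50),
    MultiOutputRegressor(SVR(kernel="rbf", C=10.0, epsilon=0.01)),
)
model.fit(X, Y)
print(model.predict(X[:1]))      # predicted point of regard for one frame
```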

    Development Of Eye Gaze Estimation System Using Two Cameras

    Get PDF
    Eye gaze is the direction in which a person is looking, and it is suitable as a form of natural Human-Computer Interface (HCI). Current research often uses infrared or LED illumination to locate the iris of the user, which gives better gaze estimation accuracy than approaches that do not. However, infrared and LED light sources are intrusive to human eyes and might damage the cornea and the retina. This research proposes a non-intrusive approach to locating the iris of the user: by using two remote cameras to capture images of the user, a more accurate gaze estimation system can be achieved. The system uses Haar cascade algorithms to detect the face and eye regions, and iris detection uses the Hough Circle Transform algorithm to locate the position of the iris, which is critical for the gaze estimation calculation. To track the eye and iris of the user in real time, the system uses CAMShift (Continuously Adaptive Mean Shift). The parameters of the eye and iris are then collected and used to calculate the gaze direction of the user. The left and right cameras achieve 70.00% and 74.67% accuracy respectively, while using both cameras to estimate the gaze direction achieves 88.67% accuracy. This shows that using two cameras improves the accuracy of gaze estimation.
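    A minimal OpenCV sketch of the detection stage described above (Haar cascades for the face and eye regions, then a Hough circle transform for the iris). The parameter values are illustrative assumptions, and the CAMShift tracking and two-camera fusion stages are omitted:

```python
import cv2

# Stock Haar cascades shipped with OpenCV
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
eye_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_eye.xml")

def detect_iris_centres(frame):
    """Return (x, y, radius) of iris candidates found in a BGR frame."""
    grey = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    centres = []
    for (fx, fy, fw, fh) in face_cascade.detectMultiScale(grey, 1.3, 5):
        face = grey[fy:fy + fh, fx:fx + fw]
        for (ex, ey, ew, eh) in eye_cascade.detectMultiScale(face):
            eye = cv2.medianBlur(face[ey:ey + eh, ex:ex + ew], 5)
            circles = cv2.HoughCircles(          # iris as the dominant circle
                eye, cv2.HOUGH_GRADIENT, dp=1, minDist=ew,
                param1=100, param2=15, minRadius=eh // 8, maxRadius=eh // 3)
            if circles is not None:
                cx, cy, r = circles[0][0]        # strongest circle in eye ROI
                centres.append((fx + ex + cx, fy + ey + cy, r))
    return centres
```

    In a live system these detections would seed a CAMShift tracker so the iris can be followed between (slower) full detections, and the per-camera estimates would then be fused across the two views.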

    A Framework for Controlling Wheelchair Motion by using Gaze Information

    Get PDF
    Users with severe motor impairments are unable to control their wheelchair using a standard joystick, so an alternative control input is needed. In this paper, a method that enables users with severe impairments to control a wheelchair via gaze information is proposed. Since such an input significantly increases the navigation burden on the user, an assistive navigation platform is also proposed to reduce that burden. First, user information is inferred using a camera and a bite-like switch. Then, information about the environment is obtained using a combination of laser and Kinect sensors. Finally, information from both the environment and the user is analysed to decide on a final control action that accords with the user's intention and is safe from collision. Experimental results demonstrate the feasibility of the proposed approach.
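    The abstract does not detail how user intention and obstacle safety are arbitrated; a hedged sketch of one plausible scheme, with the gaze-derived speed command scaled by the sensed obstacle clearance, is shown below. All names and thresholds here are our own assumptions, not the paper's algorithm:

```python
from dataclasses import dataclass

@dataclass
class GazeCommand:
    heading: float  # desired direction inferred from gaze, radians
    speed: float    # forward speed, confirmed via the bite-like switch

def arbitrate(cmd: GazeCommand, clearance: float,
              stop_at: float = 0.5, slow_from: float = 1.5) -> GazeCommand:
    """Pass the user's command through, but slow down as the obstacle
    clearance (metres, from the laser/Kinect sensors) shrinks, and stop
    before a collision becomes possible."""
    if clearance <= stop_at:                   # too close: refuse to move
        return GazeCommand(cmd.heading, 0.0)
    if clearance < slow_from:                  # ramp the speed down linearly
        scale = (clearance - stop_at) / (slow_from - stop_at)
        return GazeCommand(cmd.heading, cmd.speed * scale)
    return cmd                                 # open space: obey the user
```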

    Unobtrusive and pervasive video-based eye-gaze tracking

    Get PDF
    Eye-gaze tracking has long been considered a desktop technology that finds its use inside the traditional office setting, where the operating conditions may be controlled. Nonetheless, recent advancements in mobile technology and a growing interest in capturing natural human behaviour have motivated an emerging interest in tracking eye movements within unconstrained real-life conditions, referred to as pervasive eye-gaze tracking. This critical review focuses on emerging passive and unobtrusive video-based eye-gaze tracking methods in recent literature, with the aim of identifying the different research avenues being followed in response to the challenges of pervasive eye-gaze tracking. Different eye-gaze tracking approaches are discussed in order to bring out their strengths and weaknesses, and to identify any limitations, within the context of pervasive eye-gaze tracking, that have yet to be considered by the computer vision community.

    Videos in Context for Telecommunication and Spatial Browsing

    Get PDF
    The research presented in this thesis explores the use of videos embedded in panoramic imagery to transmit spatial and temporal information describing remote environments and their dynamics. Virtual environments (VEs) through which users can explore remote locations are rapidly emerging as a popular medium of presence and remote collaboration. However, capturing a visual representation of locations to be used in VEs is usually a tedious process that requires either manual modelling of environments or the employment of specific hardware. Capturing environment dynamics is not straightforward either, and is usually performed with specific tracking hardware. Similarly, browsing large unstructured video collections with available tools is difficult, as the abundance of spatial and temporal information makes them hard to comprehend. On a spectrum between 3D VEs and 2D images, panoramas lie in between: they offer the accessibility of 2D images while preserving the surrounding representation of 3D virtual environments. For this reason, panoramas are an attractive basis for videoconferencing and browsing tools, as they can relate several videos temporally and spatially. This research explores methods to acquire, fuse, render and stream data coming from heterogeneous cameras with the help of panoramic imagery. Three distinct but interrelated questions are addressed. First, the thesis considers how spatially localised video can be used to increase the spatial information transmitted during video-mediated communication, and whether this improves the quality of communication. Second, the research asks whether videos in panoramic context can be used to convey spatial and temporal information about a remote place and the dynamics within it, and whether this improves users' performance in tasks that require spatio-temporal thinking. Finally, the thesis considers whether display type has an impact on reasoning about events within videos in panoramic context. These research questions were investigated over three experiments, covering scenarios common to computer-supported cooperative work and video browsing. To support the investigation, two distinct video+context systems were developed. The first telecommunication experiment compared our videos-in-context interface with fully panoramic video and conventional webcam video conferencing in an object placement scenario. The second experiment investigated the impact of videos in panoramic context on the quality of spatio-temporal thinking during localisation tasks; to support this experiment, a novel interface to video collections in panoramic context was developed and compared with common video-browsing tools. The final experimental study investigated the impact of display type on reasoning about events, exploring three adaptations of our video-collection interface to three display types. The overall conclusion is that videos in panoramic context offer a valid solution to spatio-temporal exploration of remote locations. Our approach presents a richer visual representation in terms of space and time than standard tools, showing that providing panoramic contexts to video collections makes spatio-temporal tasks easier. To this end, videos in context are a suitable alternative to more difficult, and often expensive, solutions. These findings are beneficial to many applications, including teleconferencing, virtual tourism and remote assistance.

    Multisensory integration across exteroceptive and interoceptive domains modulates self-experience in the rubber-hand illusion

    Get PDF
    Identifying with a body is central to being a conscious self. The now classic “rubber hand illusion” demonstrates that the experience of body ownership can be modulated by manipulating the timing of exteroceptive (visual and tactile) body-related feedback. Moreover, the strength of this modulation is related to individual differences in sensitivity to internal bodily signals (interoception). However, the interaction of exteroceptive and interoceptive signals in determining the experience of body ownership within an individual remains poorly understood. Here, we demonstrate that this depends on the online integration of exteroceptive and interoceptive signals by implementing an innovative “cardiac rubber hand illusion” that combined computer-generated augmented reality with feedback of interoceptive (cardiac) information. We show that both subjective and objective measures of virtual-hand ownership are enhanced by cardio-visual feedback in time with the actual heartbeat, as compared to asynchronous feedback. We further show that these measures correlate with individual differences in interoceptive sensitivity, and are also modulated by the integration of proprioceptive signals instantiated using real-time visual remapping of finger movements to the virtual hand. Our results demonstrate that interoceptive signals directly influence the experience of body ownership via multisensory integration, and they lend support to models of conscious selfhood based on interoceptive predictive coding.

    2D and 3D computer vision analysis of gaze, gender and age

    Get PDF
    Human-Computer Interaction (HCI) has been an active research area for over four decades. Research studies and commercial designs in this area have been largely facilitated by the visual modality, which brings diversified functionality and improved usability to HCI interfaces by employing various computer vision techniques. This thesis explores a number of facial cues, such as gender, age and gaze, by performing 2D and 3D based computer vision analysis. The ultimate aim is to create a natural HCI strategy that can fulfil user expectations, augment user satisfaction and enrich user experience by understanding user characteristics and behaviours. To this end, salient features have been extracted and analysed from 2D and 3D face representations; 3D reconstruction algorithms and their compatible real-world imaging systems have been investigated; and case study HCI systems have been designed to demonstrate the reliability, robustness and applicability of the proposed methods. More specifically, an unsupervised approach has been proposed to localise eye centres in images and videos accurately and efficiently. This is achieved by utilising two types of geometric features and eye models, complemented by an iris radius constraint and a selective oriented gradient filter specifically tailored to this modular scheme. The approach resolves challenges such as interfering facial edges, undesirable illumination conditions, head poses, and the presence of facial accessories and makeup. Tested on three publicly available databases (the BioID database, the GI4E database and the extended Yale Face Database B) and a self-collected database, this method outperforms all compared methods and thus proves to be highly accurate and robust. Based on this approach, a gaze gesture recognition algorithm has been designed to increase the interactivity of HCI systems by encoding eye saccades into a communication channel, similar to the role of hand gestures. As well as analysing eye/gaze data that represent user behaviours and reveal user intentions, this thesis also investigates the automatic recognition of user demographics such as gender and age. The Fisher Vector encoding algorithm is employed to construct visual vocabularies as salient features for gender and age classification. Algorithm evaluations on three publicly available databases (the FERET database, the LFW database and the FRCVv2 database) demonstrate the superior performance of the proposed method in both laboratory and unconstrained environments. In order to achieve enhanced robustness, a two-source photometric stereo method has been introduced to recover surface normals, so that more invariant 3D facial features become available to further boost classification accuracy and robustness. A 2D+3D imaging system has been designed for the construction of a self-collected dataset including 2D and 3D facial data. Experiments show that utilising 3D facial features can increase the gender classification rate by up to 6% (on the self-collected dataset) and the age classification rate by up to 12% (on the Photoface database). Finally, two case study HCI systems, a gaze gesture based map browser and a directed advertising billboard, have been designed by adopting all the proposed algorithms as well as the fully compatible imaging system. The proposed algorithms ensure that the case study systems are highly robust to head pose and illumination variation and achieve excellent real-time performance. Overall, the proposed HCI strategy, enabled by reliably recognised facial cues, can serve to spawn a wide array of innovative systems and to bring HCI to a more natural and intelligent state.
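    The thesis's exact eye-centre scheme (two types of geometric features, an iris radius constraint and a selective oriented gradient filter) is not spelled out in the abstract, but a minimal sketch of the classic gradient-voting objective that unsupervised eye-centre localisers of this kind build on may help: the centre is the point whose displacement vectors to strong edges best align with the image gradients, since gradients around a dark iris point radially outward. Everything below is an illustrative assumption rather than the thesis's algorithm:

```python
import numpy as np

def eye_centre(patch):
    """Estimate the iris centre in a small greyscale eye patch by maximising
    the mean squared alignment between normalised displacement vectors
    (candidate centre -> edge pixel) and normalised image gradients."""
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    strong = mag > 0.3 * mag.max()                 # keep only strong edges
    ys, xs = np.nonzero(strong)
    gxs, gys = gx[strong] / mag[strong], gy[strong] / mag[strong]

    h, w = patch.shape
    best, best_score = (0, 0), -1.0
    for cy in range(h):                            # brute force over candidates
        for cx in range(w):
            dx, dy = xs - cx, ys - cy
            norm = np.hypot(dx, dy)
            ok = norm > 0                          # skip the candidate pixel
            dots = (dx[ok] * gxs[ok] + dy[ok] * gys[ok]) / norm[ok]
            score = np.mean(np.maximum(dots, 0.0) ** 2)
            if score > best_score:
                best_score, best = score, (cx, cy)
    return best                                    # (x, y) iris centre estimate
```

    An iris radius constraint would restrict the vote to edge pixels at a plausible distance from the candidate centre, and an oriented gradient filter would discard edges (e.g. eyelids) whose orientation cannot belong to the iris boundary.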

    Recurso de tecnologia assistiva vestível para rastreamento do olhar (Wearable assistive technology resource for eye-gaze tracking)

    Get PDF
    Advisor: Prof. Dr. Luciano Silva. Master's dissertation - Universidade Federal do Paraná, Setor de Ciências Exatas, Programa de Pós-Graduação em Informática. Defence: Curitiba, 08/02/2019. Includes references: p. 53-57. Area of concentration: Computer Science. Abstract: People with disabilities have difficulty interacting with the environment in which they live, owing to the limitations caused by their disability. Simple activities such as independently turning on a TV or any other equipment may be impossible for this group of people with motor disabilities. This work presents a low-cost eye-tracking device and a computational tool for browsing the internet through biological signals. The potential users of the device are people with mild or severe motor impairments who wish to gain autonomy in web browsing by interacting with a tool based on the concept of Augmentative and Alternative Communication (AAC). The biological signal used is video-oculography (VOG), which enables interaction through eye movements, from which gaze fixations are tracked. An important point of this work is the use of low-cost conventional equipment and 3D printing to build the structure of the eye-tracking device, which is easy to handle and quick to adapt to. In the first test, the objective was to evaluate the user's adaptation to the device through the calibration process, comparing completion times against the allotted time. In the second test, the objective was to evaluate each user's performance when typing a word on the virtual keyboard of the computational tool. The third and final test evaluated usability (SUS) from the user's perspective, using the computational tool classified as an Augmentative and Alternative Communication (AAC) system. The tests were performed by 16 participants, 8 with motor disabilities and 8 without. Regarding the usability results, participants achieved scores at or above the expected level on first use. Accuracy can be further improved as the user operates the eye-tracking device more often. Keywords: Eye Tracking, Augmentative and Alternative Communication, Human-Computer Interaction.