A low cost virtual reality interface for educational games
Mobile virtual reality has the potential to improve learning experiences by making them more immersive and engaging for students. This type of virtual reality also aims to be more cost-effective by using a smartphone to drive the virtual reality experience. One issue with mobile virtual reality is that the screen (i.e. the main interface) of the smartphone is occluded by the virtual reality headset. To investigate solutions to this issue, this project details the development and testing of a computer-vision-based controller that aims to have a lower per-unit cost than a conventional electronic controller by making use of 3D printing and the built-in camera of a smartphone. Reducing the cost per unit is useful for educational contexts, as solutions would need to scale to classroom sizes. The research question for this project is thus: "can a computer-vision-based virtual reality controller provide comparable immersion to a conventional electronic controller?" It was found that a computer-vision-based controller can provide comparable immersion, though it is more challenging to use. This challenge was found to contribute more towards engagement, as it did not diminish the performance of users in terms of question scores.
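The abstract does not specify the vision algorithm used, but a common minimal approach for tracking a 3D-printed controller through a phone camera is to segment a distinctively coloured marker and take the centroid of the matching pixels. The sketch below illustrates this idea with plain NumPy on a synthetic frame; the marker colour, thresholds, and frame size are all assumptions for illustration.

```python
import numpy as np

def find_marker_centroid(frame, lower, upper):
    """Return the (row, col) centroid of pixels whose RGB values fall
    inside [lower, upper], or None if no pixel matches."""
    mask = np.all((frame >= lower) & (frame <= upper), axis=-1)
    ys, xs = np.nonzero(mask)
    if len(xs) == 0:
        return None
    return (ys.mean(), xs.mean())

# Synthetic 8x8 "camera frame": black background with a red marker blob.
frame = np.zeros((8, 8, 3), dtype=np.uint8)
frame[2:4, 5:7] = [200, 30, 30]  # hypothetical marker colour

centroid = find_marker_centroid(frame,
                                lower=np.array([150, 0, 0]),
                                upper=np.array([255, 80, 80]))
print(centroid)  # centre of the 2x2 red blob: (2.5, 5.5)
```

In a real system the centroid would be mapped to a pointer position in the virtual scene each frame; a library such as OpenCV would typically replace the hand-rolled thresholding.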
Leveraging Eye Structure and Motion to Build a Low-Power Wearable Gaze Tracking System
Clinical studies have shown that features of a person's eyes can function as an effective proxy for cognitive state and neurological function. Technological advances in recent decades have allowed us to deepen this understanding and discover that the actions of the eyes are in fact very tightly coupled to the operation of the brain. Researchers have used camera-based eye monitoring technology to exploit this connection and analyze mental state across many different metrics of interest. These range from simple things like attention and scene processing, to impairments such as fatigue or substance use, and even significant mental disorders such as Parkinson's, autism, and schizophrenia.
While there is a wealth of knowledge and social benefit to be gained from eye tracking, the field has historically been restricted to laboratory use by crippling technological limitations - most notably, device size and power consumption. These issues primarily stem from the use of high-resolution cameras and heavyweight video-processing algorithms, both of which induce extremely high performance overhead on the eye tracker. To address this problem, we have constructed a lightweight, ultra-low-power eye monitoring device in the form factor of a pair of eyeglasses. The key guiding design principle for its construction was saliency-aware resource minimization. Specifically, our design leverages the fact that close-up images of the eye are characterized by large salient features which provide a high degree of redundant information; we exploit this to heavily subsample the eye image and reduce resource utilization while performing effective eye tracking.
In the first part of this thesis, we present an initial design of a wearable system to enable ubiquitous eye tracking. By exploiting the fact that the eye has several large, visually redundant features such as the iris and pupil, we were able to develop a neural-network-based adaptive-sampling algorithm for predicting the gaze point while sampling a minimal number of pixels from the image. This enabled us to realize power savings using specialized imaging hardware that would sample only the most salient pixels, which proportionally reduced the power and time cost of reading images for eye tracking. With these optimizations we were able to build a first-of-its-kind wearable eye tracker that consumed 40 mW of power and demonstrated a gaze tracking error of only 3 degrees across multiple subjects. We refer to this device as the iShadow platform.
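The core of this power saving is that only a small, salient subset of pixels is ever read from the sensor. The fragment below sketches that idea with NumPy; the image size and the 10% sampling rate are illustrative, and the random mask stands in for the learned saliency mask described in the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Full eye image (size is illustrative) vs. the sparse subset actually read.
full = rng.random((112, 112))

def sparse_sample(image, mask):
    """Read only the pixels selected by a saliency mask, mimicking
    imaging hardware that skips non-salient pixels entirely."""
    return image[mask]

# Stand-in for a learned mask: keep ~10% of pixels (the thesis learns
# which pixels matter; here the mask is random for illustration).
mask = rng.random(full.shape) < 0.10
pixels = sparse_sample(full, mask)

print(f"pixels read: {pixels.size} of {full.size}")
```

Because sensor readout cost scales with the number of pixels transferred, reading ~10% of the image translates roughly proportionally into readout energy and time savings, which is the effect the 40 mW figure reflects.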
The second contribution and section of this thesis is a significant improvement upon the original iShadow design, targeting both power utilization and eye tracking performance. We constructed a new pupil-tracking algorithm based on lightweight computer vision features, which leverages the smoothness of the eye's motion to reduce even further the amount of camera sampling needed. To guard against infrequent discontinuities resulting from blinks or reflections off the eye, we integrated this model with the previously-used one-shot neural network algorithm. Because the common case (smooth, uninterrupted eye motion) occurs 90% of the time, we were able to realize a dramatic increase in performance due to the efficiency of the smooth tracking algorithm. The new and improved system, labeled CIDER, enabled much more accurate eye tracking - 0.4 degree error - with power consumption as low as 7 mW. This design also enabled a tradeoff between power consumption and eye tracking rate: drawing a higher power of ~30 mW made it possible to track at rates of up to 240 frames per second.
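The hybrid structure described above can be sketched as a simple control loop: a cheap predictor extrapolates the pupil position from recent motion, and only when an observation deviates sharply (e.g. after a blink) does the system fall back to the expensive one-shot detector. This toy 1-D version is an illustration of the scheme, not CIDER's actual algorithm; the threshold and trace values are made up.

```python
def track(positions, jump_threshold=5.0):
    """Toy hybrid tracker: extrapolate the pupil position from its
    recent velocity (cheap smooth path); fall back to a 'full'
    one-shot re-detection when the observation jumps."""
    fallbacks = 0
    est = positions[0]
    vel = 0.0
    for obs in positions[1:]:
        pred = est + vel              # cheap smooth-motion prediction
        if abs(obs - pred) > jump_threshold:
            fallbacks += 1            # expensive one-shot re-detection
            est, vel = obs, 0.0
        else:
            vel = obs - est
            est = obs
    return est, fallbacks

# Smooth motion with one blink-like discontinuity at sample 5.
trace = [0, 1, 2, 3, 4, 50, 51, 52]
final, n_fallbacks = track(trace)
print(final, n_fallbacks)  # 52 1
```

Since the smooth path dominates (the 90% common case), the average per-frame cost approaches that of the cheap predictor, which is what drives the drop to 7 mW.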
The final contribution of this thesis is a re-designed version of the iShadow glasses hardware that is suitable for "in-the-wild" studies on subjects in their daily living environment. A wearable device, especially one that is worn on the head, must be minimally obtrusive in order to be accepted and used in the field by subjects. This design goal conflicts with the ideal placement of cameras that is needed for achieving consistent eye tracking fidelity. We present multiple possible methods we explored for addressing these competing design challenges, and discuss the reasons that many proved infeasible. To conclude, we present a working design solution that appears to optimally trade off user comfort and convenience against the technical requirements of the system.
Managing heterogeneous cues in social contexts. A holistic approach for social interactions analysis
Social interaction refers to any interaction between two or more individuals, in which information sharing is carried out without any mediating technology.
This interaction is a significant part of an individual's socialization and of the experience gained throughout one's lifetime, and it is of interest to different disciplines (sociology, psychology, medicine, etc.). In the context of testing and observational studies, multiple mechanisms are used to study these interactions, such as questionnaires, direct observation and analysis of events by human operators, or a posteriori observation and analysis of recorded events by specialists (psychologists, sociologists, doctors, etc.). However, such mechanisms are expensive in terms of processing time, require a high level of attention to analyze several cues simultaneously, depend on the operator (the analysis is subjective), and can only target one facet of the interaction. To address these issues, the social interaction analysis process needs to be automated; the question is thus how to bridge the gap between human-based and machine-based social interaction analysis. We therefore propose a holistic approach that integrates multimodal heterogeneous cues and contextual information (complementary "exogenous" data) dynamically and optionally, according to their availability. Such an approach allows multiple "signals" to be analyzed in parallel (where humans are able to focus on only one). This analysis can be further enriched with data related to the context of the scene (location, date, type of music, event description, etc.) or to the individuals (name, age, gender, data extracted from their social networks, etc.). The contextual information enriches the modeling of the extracted metadata and gives it a more "semantic" dimension. Managing this heterogeneity is an essential step towards implementing a holistic approach.
The automation of "in vivo" capture and observation using non-intrusive devices, without predefined scenarios, introduces various issues related to data (i) privacy and security; (ii) heterogeneity; and (iii) volume. Hence, within the holistic approach we propose (1) a privacy-preserving comprehensive data model that ensures decoupling between metadata extraction and social interaction analysis methods; (2) a geometric, non-intrusive eye-contact detection method; and (3) a deep French-food classification model to extract information from video content. The proposed approach manages heterogeneous cues coming from different modalities as multi-layer sources (visual signals, voice signals, contextual information) at different time scales, with different combinations between layers (the cues are represented as time series). The approach has been designed to operate without intrusive devices, in order to capture real behaviors and achieve naturalistic observation. We have deployed the proposed approach on the OVALIE platform, which aims to study eating behaviors in different real-life contexts and is located at University Toulouse-Jean Jaurès, France.
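One simple way to picture the multi-layer, multi-rate representation described above is a set of per-layer time series that can be sliced at any instant by taking each layer's most recent sample. The sketch below is a minimal illustration under assumed layer names and rates; it is not the thesis's actual data model.

```python
# Each cue layer carries samples at its own rate; a static "context"
# layer sits alongside faster signal layers. Names, rates, and values
# are illustrative only.
layers = {
    "gaze":    {t: f"g{t}" for t in range(0, 10, 1)},  # fast visual cue
    "voice":   {t: f"v{t}" for t in range(0, 10, 2)},  # slower vocal cue
    "context": {0: "lunch, Toulouse"},                 # static context layer
}

def slice_at(layers, t):
    """Return, for time t, the most recent sample of every layer
    (None if a layer has produced nothing yet)."""
    out = {}
    for name, series in layers.items():
        past = [ts for ts in series if ts <= t]
        out[name] = series[max(past)] if past else None
    return out

print(slice_at(layers, 5))
# {'gaze': 'g5', 'voice': 'v4', 'context': 'lunch, Toulouse'}
```

Aligning layers on a shared timeline like this is what lets analyses combine cues sampled at different scales, while layers that are unavailable simply contribute nothing.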