66 research outputs found

    PICOZOOM: A context sensitive multimodal zooming interface

    Get PDF
    This paper introduces a novel zooming interface that deploys a pico projector and, instead of a second visual display, leverages audioscapes for contextual information. The technique enhances current flashlight-metaphor approaches, supporting flexible use within spatial augmented reality to focus on object- or environment-related details. In a user study we quantified the projection limitations related to the depiction of detail through the pico projector and validated the interaction approach. The quantified results correlate pixel density, detail and proximity, which can greatly aid the design of more effective, legible zooming interfaces for pico projectors; the study can also serve as an example testbed for assessing aberrations with other projectors. Furthermore, users rated the zooming technique using audioscapes favourably, showing the validity of the approach. These studies form the foundation for extending our work by detailing the audio-visual approach and looking more closely at the role of real-world features in interpreting projected content.
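
    As an aside on the pixel density/proximity relationship summarised above, the following is a minimal sketch (not the authors' code) of how the on-surface pixel density of a fixed-focus pico projector could be estimated from projection distance; the throw ratio and native resolution used here are assumed example values.

        # Illustrative sketch: on-surface pixel density of a pico projector as a
        # function of projection distance. Throw ratio and native resolution are
        # assumed example values, not measurements from the paper.

        def projected_pixel_density(distance_m: float,
                                    throw_ratio: float = 1.4,
                                    native_width_px: int = 854) -> float:
            """Horizontal pixels per centimetre on the projection surface."""
            image_width_m = distance_m / throw_ratio          # image widens with distance
            return native_width_px / (image_width_m * 100.0)  # pixels per centimetre

        # Closer projection -> denser pixels -> finer depictable detail.
        for d in (0.3, 0.6, 1.0, 2.0):
            print(f"{d:.1f} m -> {projected_pixel_density(d):.1f} px/cm")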

    Multimodal Contrastive Learning with Hard Negative Sampling for Human Activity Recognition

    Full text link
    Human Activity Recognition (HAR) systems have been extensively studied by the vision and ubiquitous computing communities due to their practical applications in daily life, such as smart homes, surveillance, and health monitoring. Typically, this process is supervised in nature, and the development of such systems requires access to large quantities of annotated data. However, the higher costs and challenges associated with obtaining good-quality annotations have made self-supervised methods an attractive option, and contrastive learning is one such method. A major component of successful contrastive learning is the selection of good positive and negative samples. Although positive samples are directly obtainable, sampling good negative samples remains a challenge. As human activities can be recorded by several modalities, such as cameras and IMU sensors, we propose a hard negative sampling method for multimodal HAR with a hard negative sampling loss for skeleton and IMU data pairs. We exploit hard negatives that have different labels from the anchor but are projected nearby in the latent space, using an adjustable concentration parameter. Through extensive experiments on two benchmark datasets, UTD-MHAD and MMAct, we demonstrate the robustness of our approach for learning strong feature representations for HAR tasks, including in limited-data settings. We further show that our model outperforms all other state-of-the-art methods on the UTD-MHAD dataset, and all self-supervised methods on MMAct (cross-session), even when only uni-modal data are used during downstream activity recognition.
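
    The loss below is a minimal sketch of the sampling idea described above, not the authors' implementation: an InfoNCE-style contrastive loss in which negatives with labels different from the anchor are reweighted by their similarity to it, with an adjustable concentration parameter beta controlling how strongly hard (nearby) negatives dominate the denominator. All names, shapes and default values are assumptions.

        import torch
        import torch.nn.functional as F

        def hard_negative_info_nce(anchor, positive, negatives, beta=1.0, tau=0.1):
            """anchor, positive: (B, D) embeddings; negatives: (B, K, D) embeddings."""
            a = F.normalize(anchor, dim=-1)
            p = F.normalize(positive, dim=-1)
            n = F.normalize(negatives, dim=-1)

            pos_sim = (a * p).sum(dim=-1, keepdim=True) / tau   # (B, 1)
            neg_sim = torch.einsum('bd,bkd->bk', a, n) / tau    # (B, K)

            # Concentration weighting: negatives lying close to the anchor ("hard"
            # negatives) receive exponentially more weight as beta grows.
            weights = torch.softmax(beta * neg_sim, dim=-1) * neg_sim.size(-1)

            # Cross-entropy over [positive | weighted negatives] reproduces
            # -log( exp(pos) / (exp(pos) + sum_k w_k * exp(neg_k)) ).
            logits = torch.cat([pos_sim, neg_sim + weights.log()], dim=-1)
            labels = torch.zeros(a.size(0), dtype=torch.long, device=a.device)
            return F.cross_entropy(logits, labels)

        # Example with random skeleton/IMU pair embeddings: batch 8, 16 negatives, dim 128.
        loss = hard_negative_info_nce(torch.randn(8, 128), torch.randn(8, 128),
                                      torch.randn(8, 16, 128), beta=2.0)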

    WEAR: A Multimodal Dataset for Wearable and Egocentric Video Activity Recognition

    Full text link
    Though research has shown the complementarity of camera- and inertial-based data, datasets which offer both modalities remain scarce. In this paper we introduce WEAR, a multimodal benchmark dataset for both vision- and wearable-based Human Activity Recognition (HAR). The dataset comprises data from 18 participants performing a total of 18 different workout activities, with untrimmed inertial (acceleration) and camera (egocentric video) data recorded at 10 different outdoor locations. WEAR features a diverse set of activities which are low in inter-class similarity and which, unlike in previous egocentric datasets, are neither defined by human-object interactions nor drawn from inherently distinct activity categories. The provided benchmark results reveal that single-modality architectures have different strengths and weaknesses in their prediction performance. Further, in light of the recent success of transformer-based video action detection models, we demonstrate their versatility by applying them in a plain fashion using vision, inertial and combined (vision + inertial) features as input. Results show that vision transformers are not only able to produce competitive results using only inertial data, but can also function as an architecture to fuse both modalities by means of simple concatenation, with the multimodal approach producing the highest average mAP and precision and close-to-best F1-scores. Up until now, vision-based transformers had been explored in neither inertial nor multimodal human activity recognition, making our approach the first to do so. The dataset and code to reproduce our experiments are publicly available via mariusbock.github.io/wear. Comment: 12 pages, 2 figures, 2 tables.
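
    As an illustration of the fusion by simple concatenation mentioned above, here is a minimal sketch (not the WEAR reference code): per-timestep vision and inertial feature vectors are concatenated along the channel axis and projected to a common width before being passed to a transformer-based detection head. The feature dimensions are assumed example values.

        import torch
        import torch.nn as nn

        class ConcatFusion(nn.Module):
            """Fuse clip-level vision features with windowed inertial features."""
            def __init__(self, vision_dim=2048, inertial_dim=128, model_dim=512):
                super().__init__()
                self.proj = nn.Linear(vision_dim + inertial_dim, model_dim)

            def forward(self, vision_feats, inertial_feats):
                # vision_feats: (B, T, vision_dim), inertial_feats: (B, T, inertial_dim)
                fused = torch.cat([vision_feats, inertial_feats], dim=-1)
                return self.proj(fused)   # (B, T, model_dim) tokens for a detection head

        fusion = ConcatFusion()
        v = torch.randn(2, 64, 2048)      # e.g. per-clip video features
        i = torch.randn(2, 64, 128)       # e.g. per-window accelerometer features
        print(fusion(v, i).shape)         # torch.Size([2, 64, 512])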

    Confirmation Report: Modelling Interlocutor Confusion in Situated Human Robot Interaction

    Get PDF
    Human-Robot Interaction (HRI) is an important but challenging field focused on improving the interaction between humans and robots so as to make the interaction more intelligent and effective. However, building natural conversational HRI is an interdisciplinary challenge for scholars, engineers, and designers. It is generally assumed that the pinnacle of human-robot interaction will be fluid, naturalistic conversational interaction that in important ways mimics how humans interact with each other. This is of course challenging at a number of levels, and in particular there are considerable difficulties when it comes to naturally monitoring and responding to the user's mental state. Among mental states, one that has received little attention to date is monitoring the user for possible confusion states. Confusion is a non-trivial mental state which can be seen as having at least two substates, associated with either positive or negative emotions. When people are productively confused, they are motivated to resolve their current difficulties; people in unproductive confusion, by contrast, may lose their engagement and motivation to overcome those difficulties, which in turn may even lead them to abandon the current conversation. While there has been some research on confusion monitoring and detection, it has been limited, with most work focused on evaluating confusion states in online learning tasks. The central hypothesis of this research is that the monitoring and detection of confusion states in users is essential to fluid, task-centric HRI, and that it should be possible to detect such confusion and adjust policies to mitigate it. In this report, I expand on this hypothesis and set out several research questions. I also provide a comprehensive literature review, outline work done to date towards my research hypothesis, and set out plans for future experimental work.

    Challenges in transcribing multimodal data: A case study

    Get PDF
    Computer-mediated communication (CMC) once meant principally text-based communication mediated by computers, but rapid technological advances in recent years have heralded an era of multimodal communication with a growing emphasis on audio and video synchronous interaction. As CMC, in all its variants (text chats, video chats, forums, blogs, SMS, etc.), has become normalized practice in personal and professional lives, educational initiatives, particularly language teaching and learning, are following suit. For researchers interested in exploring learner interactions in complex technology-supported learning environments, new challenges inevitably emerge. This article looks at the challenges of transcribing and representing multimodal data (visual, oral, and textual) when engaging in computer-assisted language learning research. When transcribing and representing such data, the choices made depend very much on the specific research questions addressed; hence, in this paper we explore these challenges through discussion of a specific case study where the researchers were seeking to explore the emergence of identity through interaction in an online, multimodal situated space. Given the limited amount of literature addressing the transcription of online multimodal communication, it is felt that this article is a timely contribution to researchers interested in exploring interaction in CMC language and intercultural learning environments.

    Interactive Virtual Directory for Shopping Mall (Suria KLCC)

    Get PDF
    As Internet-related technology advances rapidly, the number of systems presenting information using VR techniques is also increasing, promoting better understanding of information. Static directories are still very much lacking as information providers because of their inability to give users adequate, good-quality information in an interesting and interactive manner. The objective of this system is to help shopping mall visitors know where they are and where they are going through a simple, intuitive, observable and interactive directory system. By combining VR technology with an interactive directory, an Interactive Virtual Directory for Shopping Mall providing adequate information has been developed. To form the basis of the system development, a pre-survey questionnaire was conducted to gather customers' opinions on static directories. The survey showed that 70% of respondents (35 out of 50) know and understand VR technology, and the results of this analysis provided the motivation for developing the interactive virtual directory system. The development of the system follows Kulwinder Kaur's design framework, which covers analysis of the requirements and project scope, the task and domain of the project, design of the environment, design of user support and navigational tools, and evaluation through prototyping and an iterative process. The evaluation of the system shows that having experience with both static and virtual maps helps users understand the system more precisely; however, if the mouse-click interaction were replaced with a touch-screen interface, users could navigate more easily. In conclusion, a directory with additional functionality could be an informative and more usable directory.

    Toward an intelligent multimodal interface for natural interaction

    Get PDF
    Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2010. Cataloged from PDF version of thesis. Includes bibliographical references (p. 73-76). Advances in technology are enabling novel approaches to human-computer interaction (HCI) in a wide variety of devices and settings (e.g., the Microsoft® Surface, the Nintendo® Wii, the iPhone®, etc.). While many of these devices have been commercially successful, the use of multimodal interaction technology is still not well understood from a more principled system design or cognitive science perspective. The long-term goal of our research is to build an intelligent multimodal interface for natural interaction that can serve as a testbed for formulating a more principled system design framework for multimodal HCI. This thesis focuses on the gesture input modality. Using a new hand-tracking technology capable of tracking 3D hand postures in real time, we developed a recognition system for continuous natural gestures. By natural gestures, we mean those encountered in spontaneous interaction, rather than a set of artificial gestures designed for the convenience of recognition. To date we have achieved 96% accuracy on isolated gesture recognition, and a 74% correct rate on continuous gesture recognition, with data from different users and twelve gesture classes. We are able to connect the gesture recognition system with Google Earth, enabling gestural control of a 3D map. In particular, users can tilt the map in 3D using non-touch-based gestures, which are more intuitive than touch-based ones. We also conducted an exploratory user study to observe natural behavior in an urban search and rescue scenario with a large tabletop display. The qualitative results from the study provide us with good starting points for understanding how users naturally gesture and how to integrate different modalities. This thesis has set the stage for further development towards our long-term goal. by Ying Yin. S.M.
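
    Purely as an illustration of the kind of glue implied above between a continuous gesture recognizer and a 3D map (this is not code from the thesis; the gesture labels and the MapController interface are hypothetical, not a real Google Earth API), a recognized gesture event could be dispatched to camera commands as follows.

        from dataclasses import dataclass
        from typing import Protocol

        @dataclass
        class GestureEvent:
            label: str          # e.g. "pan", "zoom", "tilt" (hypothetical label set)
            magnitude: float    # normalized [-1, 1] extent reported by the hand tracker
            confidence: float   # recognizer confidence in [0, 1]

        class MapController(Protocol):
            def pan(self, dx: float) -> None: ...
            def zoom(self, factor: float) -> None: ...
            def tilt(self, degrees: float) -> None: ...

        def dispatch(event: GestureEvent, controller: MapController,
                     min_confidence: float = 0.7) -> None:
            """Forward a sufficiently confident gesture to the map controller."""
            if event.confidence < min_confidence:
                return                                   # ignore uncertain recognitions
            if event.label == "tilt":
                controller.tilt(event.magnitude * 45.0)  # map gesture extent to degrees
            elif event.label == "zoom":
                controller.zoom(1.0 + event.magnitude)
            elif event.label == "pan":
                controller.pan(event.magnitude)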

    Interactivity Improves Usability of Geographic Maps for Visually Impaired People

    Get PDF
    Tactile relief maps are used by visually impaired people to acquire a mental representation of space, but they retain important limitations (limited amount of information, braille text, etc.). Interactive maps may overcome these limitations. However, the usability of these two types of maps had never been compared, so it was unknown whether interactive maps are an equivalent or even better solution than traditional raised-line maps. This study presents a comparison of the usability of a classical raised-line map versus an interactive map composed of a multi-touch screen, a raised-line overlay and audio output. Both maps were tested by 24 blind participants. We measured usability in terms of efficiency, effectiveness and satisfaction. Our results show that replacing braille with simple audio-tactile interaction significantly improved efficiency and user satisfaction. Effectiveness was not related to the map type but depended on users' characteristics as well as the category of assessed spatial knowledge. Long-term evaluation of the acquired spatial information revealed that maps, whether interactive or not, help blind users build robust survey-type mental representations. Altogether, these results are encouraging as they show that interactive maps are a good solution for improving map exploration and cognitive mapping in visually impaired people.
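
    The core audio-tactile interaction described above, replacing braille labels with speech triggered by touch, can be sketched as a lookup from touch coordinates on the overlay to a spoken element name. This is an illustrative sketch only; the region geometry and the speak() backend are assumptions.

        from dataclasses import dataclass

        @dataclass
        class Region:
            """A map element lying under the raised-line overlay, in screen coordinates."""
            name: str
            x: float
            y: float
            w: float
            h: float

            def contains(self, px: float, py: float) -> bool:
                return self.x <= px <= self.x + self.w and self.y <= py <= self.y + self.h

        REGIONS = [Region("Town hall", 120, 80, 60, 40),
                   Region("Train station", 300, 210, 90, 50)]

        def on_touch(px: float, py: float, speak=print) -> None:
            """Called for each touch event; announces the element under the finger."""
            for region in REGIONS:
                if region.contains(px, py):
                    speak(region.name)   # in practice, route to a TTS engine
                    return

        on_touch(130, 95)                # -> "Town hall"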