PICOZOOM: A context sensitive multimodal zooming interface
This paper introduces a novel zooming interface deploying a pico projector that, instead of a second visual display, leverages audioscapes for contextual information. The technique enhances current flashlight-metaphor approaches, supporting flexible usage within the domain of spatial augmented reality to focus on object- or environment-related details. In a user study we focused on quantifying the projection limitations related to the depiction of details through the pico projector and validated the interaction approach. The quantified results of the study correlate pixel density, detail, and proximity, which can greatly aid the design of more effective, legible zooming interfaces for pico projectors; the study can also serve as an example testbed for testing aberrations with other projectors. Furthermore, users rated the zooming technique using audioscapes well, showing the validity of the approach. These studies form the foundation for extending our work by detailing the audio-visual approach and looking more closely at the role of real-world features in interpreting projected content.
Challenges in Transcribing Multimodal Data: A Case Study
Computer-mediated communication (CMC) once meant principally text-based communication mediated by computers, but rapid technological advances in recent years have heralded an era of multimodal communication with a growing emphasis on audio and video synchronous interaction. As CMC, in all its variants (text chats, video chats, forums, blogs, SMS, etc.), has become normalized practice in personal and professional lives, educational initiatives, particularly language teaching and learning, are following suit. For researchers interested in exploring learner interactions in complex technology-supported learning environments, new challenges inevitably emerge. This article looks at the challenges of transcribing and representing multimodal data (visual, oral, and textual) when engaging in computer-assisted language learning research. When transcribing and representing such data, the choices made depend very much on the specific research questions addressed, hence in this paper we explore these challenges through discussion of a specific case study where the researchers were seeking to explore the emergence of identity through interaction in an online, multimodal situated space. Given the limited amount of literature addressing the transcription of online multimodal communication, it is felt that this article is a timely contribution to researchers interested in exploring interaction in CMC language and intercultural learning environments.
Language Learning Sans Frontiers: A Translanguaging View
L Wei, WYJ Ho - Annual Review of Applied Linguistics, 2018 - cambridge.org
In this article, we present an analytical approach that focuses on how transnational and
translingual learners mobilize their multilingual, multimodal, and multisemiotic repertoires,
as well as their learning and work experiences, as resources in language learning. The …
Multimodal Contrastive Learning with Hard Negative Sampling for Human Activity Recognition
Human Activity Recognition (HAR) systems have been extensively studied by the
vision and ubiquitous computing communities due to their practical applications
in daily life, such as smart homes, surveillance, and health monitoring.
Typically, this process is supervised in nature and the development of such
systems requires access to large quantities of annotated data.
However, the higher costs and challenges associated with obtaining good-quality
annotations have made self-supervised methods an attractive option, and
contrastive learning is one such method.
A major component of successful contrastive learning is the selection of good
positive and negative samples.
Although positive samples are directly obtainable, sampling good negative
samples remains a challenge.
As human activities can be recorded by several modalities like camera and IMU
sensors, we propose a hard negative sampling method for multimodal HAR with a
hard negative sampling loss for skeleton and IMU data pairs.
We exploit hard negatives that have different labels from the anchor but are
projected nearby in the latent space using an adjustable concentration
parameter.
Through extensive experiments on two benchmark datasets, UTD-MHAD and MMAct,
we demonstrate the robustness of our approach for learning strong feature
representations for HAR tasks, including in the limited-data setting.
We further show that our model outperforms all other state-of-the-art methods
on the UTD-MHAD dataset, and all self-supervised methods on the MMAct
cross-session protocol, even when uni-modal data are used during downstream
activity recognition.
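The adjustable concentration parameter described in this abstract resembles importance-weighted negative sampling in an InfoNCE-style loss. As a rough sketch only (the function name, `beta`, and `tau` are assumptions, not the paper's exact formulation), the idea can be written as:

```python
import numpy as np

def hard_negative_contrastive_loss(anchor, positive, negatives, beta=1.0, tau=0.1):
    """Hypothetical InfoNCE-style loss with hard negative weighting.

    `anchor` could hold skeleton embeddings and `positive` the paired IMU
    embeddings (both shape (B, D)); `negatives` (shape (N, D)) are samples
    whose labels differ from the anchor. `beta` is the concentration
    parameter: larger values up-weight negatives that are projected close
    to the anchor in the latent space, i.e. the hard negatives.
    """
    def norm(x):
        # L2-normalize rows so dot products are cosine similarities.
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    a, p, n = norm(anchor), norm(positive), norm(negatives)
    pos = np.exp(np.sum(a * p, axis=-1) / tau)   # (B,) positive similarities
    neg = np.exp(a @ n.T / tau)                  # (B, N) negative similarities
    # Concentration weighting: harder (more similar) negatives get more mass.
    w = neg ** beta
    w = w / w.sum(axis=1, keepdims=True)
    weighted_neg = (w * neg).sum(axis=1) * n.shape[0]
    return float(np.mean(-np.log(pos / (pos + weighted_neg))))
```

With `beta = 0` every negative is weighted equally and the expression reduces to a standard contrastive objective; increasing `beta` shifts the mass toward the negatives nearest the anchor, which is the hard-negative effect the abstract describes.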
WEAR: A Multimodal Dataset for Wearable and Egocentric Video Activity Recognition
Though research has shown the complementarity of camera- and inertial-based
data, datasets which offer both modalities remain scarce. In this paper we
introduce WEAR, a multimodal benchmark dataset for both vision- and
wearable-based Human Activity Recognition (HAR). The dataset comprises data
from 18 participants performing a total of 18 different workout activities with
untrimmed inertial (acceleration) and camera (egocentric video) data recorded
at 10 different outside locations. WEAR features a diverse set of activities
which are low in inter-class similarity and which, unlike those in previous
egocentric datasets, are neither defined by human-object interactions nor
drawn from inherently distinct activity categories. The provided benchmark
results reveal that
single-modality architectures have different strengths and weaknesses in their
prediction performance. Further, in light of the recent success of
transformer-based video action detection models, we demonstrate their
versatility by applying them in a plain fashion using vision, inertial and
combined (vision + inertial) features as input. Results show that vision
transformers are not only able to produce competitive results using only
inertial data, but also can function as an architecture to fuse both modalities
by means of simple concatenation, with the multimodal approach being able to
produce the highest average mAP, precision, and close-to-best F1-scores. Up
until now, vision-based transformers had been explored in neither inertial nor
multimodal human activity recognition, making our approach the first to do
so. The dataset and code to reproduce the experiments are publicly available
via: mariusbock.github.io/wear
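The "simple concatenation" fusion mentioned above can be illustrated in a few lines; a minimal sketch, assuming time-aligned per-window feature streams (the function name and shapes are illustrative, not taken from the WEAR codebase):

```python
import numpy as np

def fuse_features(vision_feats, inertial_feats):
    """Fuse two modality streams by concatenation along the feature axis.

    vision_feats:   (T, Dv) per-window vision features
    inertial_feats: (T, Di) per-window inertial features, time-aligned
    Returns a (T, Dv + Di) array that a single-stream action detection
    head (e.g. a transformer) can consume unchanged.
    """
    assert vision_feats.shape[0] == inertial_feats.shape[0], \
        "modality streams must be time-aligned window for window"
    return np.concatenate([vision_feats, inertial_feats], axis=1)
```

Because the fused array keeps the same two-dimensional layout as either single modality, the downstream detector needs no architectural change, which is presumably what makes this a "plain fashion" application of existing models.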
Confirmation Report: Modelling Interlocutor Confusion in Situated Human Robot Interaction
Human-Robot Interaction (HRI) is an important but challenging field focused on improving the interaction between humans and robots so as to make the interaction more intelligent and effective. However, building a natural conversational HRI system is an interdisciplinary challenge for scholars, engineers, and designers. It is generally assumed that the pinnacle of human-robot interaction will be fluid, naturalistic conversational interaction that in important ways mimics how humans interact with each other. This is challenging at a number of levels, and in particular there are considerable difficulties when it comes to naturally monitoring and responding to the user's mental state. On the topic of mental states, one area that has received little attention to date is monitoring the user for possible confusion states. Confusion is a non-trivial mental state which can be seen as having at least two substates, associated with either negative or positive emotions. In the former case, when people are productively confused, they have a passion to solve their current difficulties. Meanwhile, people in unproductive confusion may lose their engagement and motivation to overcome those difficulties, which in turn may even lead them to abandon the current conversation. While there has been some research on confusion monitoring and detection, it has been limited, with most work focused on evaluating confusion states in online learning tasks. The central hypothesis of this research is that the monitoring and detection of confusion states in users is essential to fluid task-centric HRI, and that it should be possible to detect such confusion and adjust policies to mitigate it. In this report, I expand on this hypothesis and set out several research questions.
I also provide a comprehensive literature review before outlining the work done to date towards my research hypothesis, and I set out plans for future experimental work.
Interactive Virtual Directory for Shopping Mall (Suria KLCC)
As Internet-related technology advances rapidly, the number of systems presenting
information using VR techniques is also increasing, promoting better understanding
of information. Static directories today are still very much lacking as information
providers; this is due to their inability to provide users with adequate, quality
information in an interesting and interactive manner. The objective of this system
is to help shopping mall visitors know where they are and where they are going
through a simple, intuitive, observable, and interactive directory system. By
combining VR technology with an interactive directory, an Interactive Virtual
Directory for Shopping Mall providing adequate information has been developed.
To form the basis of the system development, a pre-survey questionnaire was
conducted to find out customers' opinions on static directories. The survey showed
that 70%, or 35 out of 50 respondents, know and understand VR technology. The
results of this analysis provide the motivation for developing the interactive
virtual directory system. The development of the system is based on Kulwinder
Kaur's design framework, which covers analysis of the requirements and project
scope, the task and domain of the project, design of the environment, design of
user support and navigational tools, and evaluation through prototyping and an
iterative process. An evaluation of the system shows that experience with both
static and virtual maps helps users understand the system precisely. However, if
the mouse-click interaction were replaced with a touch-screen interface, users
could navigate more easily. In conclusion, a directory with additional
functionalities can be an informative and more usable directory.
Toward an intelligent multimodal interface for natural interaction
Thesis (S.M.) -- Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2010. Cataloged from PDF version of thesis. Includes bibliographical references (p. 73-76). Advances in technology are enabling novel approaches to human-computer interaction (HCI) in a wide variety of devices and settings (e.g., the Microsoft® Surface, the Nintendo® Wii, iPhone®, etc.). While many of these devices have been commercially successful, the use of multimodal interaction technology is still not well understood from a more principled system design or cognitive science perspective. The long-term goal of our research is to build an intelligent multimodal interface for natural interaction that can serve as a testbed for formulating a more principled system design framework for multimodal HCI. This thesis focuses on the gesture input modality. Using a new hand tracking technology capable of tracking 3D hand postures in real time, we developed a recognition system for continuous natural gestures. By natural gestures, we mean those encountered in spontaneous interaction, rather than a set of artificial gestures designed for the convenience of recognition. To date we have achieved 96% accuracy on isolated gesture recognition, and a 74% correct rate on continuous gesture recognition with data from different users and twelve gesture classes. We are able to connect the gesture recognition system with Google Earth, enabling gestural control of a 3D map. In particular, users can tilt the map in 3D using a non-touch-based gesture, which is more intuitive than touch-based ones. We also conducted an exploratory user study to observe natural behavior under an urban search and rescue scenario with a large tabletop display. The qualitative results from the study provide us with good starting points for understanding how users naturally gesture, and how to integrate different modalities.
This thesis has set the stage for further development towards our long-term goal. By Ying Yin, S.M.
Interactivity Improves Usability of Geographic Maps for Visually Impaired People
Tactile relief maps are used by visually impaired people to acquire a mental representation of space, but they retain important limitations (limited amount of information, braille text, etc.). Interactive maps may overcome these limitations. However, the usability of these two types of maps had never been compared, so it was unknown whether interactive maps are an equivalent or even better solution than traditional raised-line maps. This study presents a comparison of the usability of a classical raised-line map vs. an interactive map composed of a multi-touch screen, a raised-line overlay, and audio output. Both maps were tested by 24 blind participants. We measured usability as efficiency, effectiveness, and satisfaction. Our results show that replacing braille with simple audio-tactile interaction significantly improved efficiency and user satisfaction. Effectiveness was not related to the map type but depended on users' characteristics as well as the category of assessed spatial knowledge. Long-term evaluation of acquired spatial information revealed that maps, whether interactive or not, are useful for building robust survey-type mental representations in blind users. Altogether, these results are encouraging, as they show that interactive maps are a good solution for improving map exploration and cognitive mapping in visually impaired people.