PANEL: Challenges for multimedia/multimodal research in the next decade
The multimedia and multimodal community has undergone an explosive transformation in recent years, with major societal impact. With the unprecedented deployment of multimedia devices and systems, multimedia research is critical to our ability to advance state-of-the-art technologies and solve real-world challenges facing society and the nation. To respond to these challenges and further advance the frontiers of the field of multimedia, this panel will discuss the challenges and visions that may guide research over the next ten years.
Transportation mode recognition fusing wearable motion, sound and vision sensors
We present the first work that investigates the potential of improving the performance of transportation mode recognition by fusing multimodal data from wearable sensors: motion, sound and vision. We first train three independent deep neural network (DNN) classifiers, one for each of the three sensor types. We then propose two schemes that fuse the classification results from the three mono-modal classifiers. The first scheme makes an ensemble decision with fixed rules, including Sum, Product, Majority Voting, and Borda Count. The second scheme is an adaptive fuser built as another classifier (Naive Bayes, Decision Tree, Random Forest or Neural Network) that learns enhanced predictions by combining the outputs of the three mono-modal classifiers. We verify the advantage of the proposed method on the state-of-the-art Sussex-Huawei Locomotion and Transportation (SHL) dataset, recognizing eight transportation activities: Still, Walk, Run, Bike, Bus, Car, Train and Subway. We achieve F1 scores of 79.4%, 82.1% and 72.8% with the mono-modal motion, sound and vision classifiers, respectively. The two data fusion schemes raise the F1 score to 94.5% and 95.5%, respectively. The recognition performance can be further improved with a post-processing scheme that exploits the temporal continuity of transportation. When assessing generalization of the model to unseen data, we show that while performance is reduced, as expected, for each individual classifier, the benefits of fusion are retained, with performance improved by 15 percentage points. Beyond the performance increase itself, this work, most importantly, opens up the possibility of dynamically fusing modalities to achieve different power-performance trade-offs at run time.
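The four fixed fusion rules named in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each mono-modal classifier outputs one probability per class, and the function names, the tie-breaking behaviour and the example probabilities are all hypothetical.

```python
from collections import Counter
from math import prod

# The eight SHL activities listed in the abstract.
CLASSES = ["Still", "Walk", "Run", "Bike", "Bus", "Car", "Train", "Subway"]


def argmax(scores):
    """Index of the largest score (first one wins on ties)."""
    return max(range(len(scores)), key=lambda i: scores[i])


def fuse_sum(probs):
    """Sum rule: add the per-class probabilities of all classifiers."""
    return argmax([sum(col) for col in zip(*probs)])


def fuse_product(probs):
    """Product rule: multiply the per-class probabilities."""
    return argmax([prod(col) for col in zip(*probs)])


def fuse_majority(probs):
    """Majority voting: each classifier votes for its top class.

    Assumption: on a tie, the class voted for first wins.
    """
    votes = Counter(argmax(p) for p in probs)
    return votes.most_common(1)[0][0]


def fuse_borda(probs):
    """Borda count: a class earns one point per class ranked below it."""
    n = len(probs[0])
    points = [0] * n
    for p in probs:
        worst_to_best = sorted(range(n), key=lambda i: p[i])
        for rank, cls in enumerate(worst_to_best):
            points[cls] += rank
    return argmax(points)


# Hypothetical classifier outputs for one time window (made-up numbers).
motion = [0.05, 0.10, 0.05, 0.05, 0.40, 0.30, 0.03, 0.02]
sound = [0.02, 0.03, 0.02, 0.03, 0.30, 0.50, 0.05, 0.05]
vision = [0.05, 0.05, 0.05, 0.05, 0.25, 0.45, 0.05, 0.05]
probs = [motion, sound, vision]

for rule in (fuse_sum, fuse_product, fuse_majority, fuse_borda):
    print(rule.__name__, "->", CLASSES[rule(probs)])
```

In this made-up window the motion classifier alone would pick Bus, while all four rules settle on Car because the sound and vision classifiers agree. The adaptive scheme described in the abstract would instead feed the concatenated classifier outputs to a trained second-stage classifier.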
Services surround you: physical-virtual linkage with contextual bookmarks
Our daily life is pervaded by digital information and devices, not least the common mobile phone. However, a seamless connection between our physical world, such as a movie trailer on a screen in the main rail station, and its digital counterparts, such as an online ticket service, remains difficult. In this paper, we present contextual bookmarks that enable users to capture information of interest with a mobile camera phone. Depending on the user’s context, the snapshot is mapped to a digital service such as ordering tickets for a nearby movie theater or a link to the upcoming movie’s Web page.
Characteristics of pervasive learning environments in museum contexts
There is no appropriate learning model for pervasive learning environments (PLEs), and museums maintain authenticity at the cost of unmarked information. To address these problems, we present the LieksaMyst PLE developed for Pielinen Museum, and we derive a set of characteristics that an effective PLE should meet and that form the basis of a new learning model currently under development. We discuss how the characteristics are addressed in LieksaMyst and present an evaluation of the game component of LieksaMyst. Results indicate that, while some usability issues remain to be resolved, the game was well received by the participants, enabling them to immerse themselves in the story and to interact effectively with its virtual characters.
Multimodal Generic Framework for Multimedia Documents Adaptation
Today, people are increasingly capable of creating and sharing documents (which are generally multimedia oriented) via the internet. These multimedia documents can be accessed at any time and anywhere (city, home, etc.) on a wide variety of devices, such as laptops, tablets and smartphones. The heterogeneity of devices and user preferences has raised a serious issue for multimedia content adaptation. Our research focuses on multimedia document adaptation, with a strong focus on interaction with users and exploration of multimodality. We propose a multimodal framework for adapting multimedia documents based on a distributed implementation of W3C’s Multimodal Architecture and Interfaces applied to ubiquitous computing. The core of our proposed architecture is a smart interaction manager that accepts context-related information from sensors in the environment as well as from other sources, including information available on the web and multimodal user inputs. The interaction manager integrates and reasons over this information to predict the user’s situation and service use. A key to realizing this framework is the use of an ontology that undergirds the communication and representation, and the use of the cloud to ensure service continuity on heterogeneous mobile devices. A smart city is assumed as the reference scenario.
- …