143 research outputs found

    Mixed reality participants in smart meeting rooms and smart home enviroments

    Get PDF
    Human–computer interaction requires modeling of the user. A user profile typically contains preferences, interests, characteristics, and interaction behavior. However, in its multimodal interaction with a smart environment the user displays characteristics that show how the user, not necessarily consciously, verbally and nonverbally provides the smart environment with useful input and feedback. Especially in ambient intelligence environments we encounter situations where the environment supports interaction between the environment, smart objects (e.g., mobile robots, smart furniture) and human participants in the environment. Therefore it is useful for the profile to contain a physical representation of the user obtained by multi-modal capturing techniques. We discuss the modeling and simulation of interacting participants in a virtual meeting room, we discuss how remote meeting participants can take part in meeting activities and they have some observations on translating research results to smart home environments

    Presenting in Virtual Worlds: Towards an Architecture for a 3D Presenter explaining 2D-Presented Information

    Get PDF
    Entertainment, education and training are changing because of multi-party interaction technology. In the past we have seen the introduction of embodied agents and robots that take the role of a museum guide, a news presenter, a teacher, a receptionist, or someone who is trying to sell you insurances, houses or tickets. In all these cases the embodied agent needs to explain and describe. In this paper we contribute the design of a 3D virtual presenter that uses different output channels to present and explain. Speech and animation (posture, pointing and involuntary movements) are among these channels. The behavior is scripted and synchronized with the display of a 2D presentation with associated text and regions that can be pointed at (sheets, drawings, and paintings). In this paper the emphasis is on the interaction between 3D presenter and the 2D presentation

    Realistic multi-microphone data simulation for distant speech recognition

    Full text link
    The availability of realistic simulated corpora is of key importance for the future progress of distant speech recognition technology. The reliability, flexibility and low computational cost of a data simulation process may ultimately allow researchers to train, tune and test different techniques in a variety of acoustic scenarios, avoiding the laborious effort of directly recording real data from the targeted environment. In the last decade, several simulated corpora have been released to the research community, including the data-sets distributed in the context of projects and international challenges, such as CHiME and REVERB. These efforts were extremely useful to derive baselines and common evaluation frameworks for comparison purposes. At the same time, in many cases they highlighted the need of a better coherence between real and simulated conditions. In this paper, we examine this issue and we describe our approach to the generation of realistic corpora in a domestic context. Experimental validation, conducted in a multi-microphone scenario, shows that a comparable performance trend can be observed with both real and simulated data across different recognition frameworks, acoustic models, as well as multi-microphone processing techniques.Comment: Proc. of Interspeech 201

    Evaluation campaigns and TRECVid

    Get PDF
    The TREC Video Retrieval Evaluation (TRECVid) is an international benchmarking activity to encourage research in video information retrieval by providing a large test collection, uniform scoring procedures, and a forum for organizations interested in comparing their results. TRECVid completed its fifth annual cycle at the end of 2005 and in 2006 TRECVid will involve almost 70 research organizations, universities and other consortia. Throughout its existence, TRECVid has benchmarked both interactive and automatic/manual searching for shots from within a video corpus, automatic detection of a variety of semantic and low-level video features, shot boundary detection and the detection of story boundaries in broadcast TV news. This paper will give an introduction to information retrieval (IR) evaluation from both a user and a system perspective, highlighting that system evaluation is by far the most prevalent type of evaluation carried out. We also include a summary of TRECVid as an example of a system evaluation benchmarking campaign and this allows us to discuss whether such campaigns are a good thing or a bad thing. There are arguments for and against these campaigns and we present some of them in the paper concluding that on balance they have had a very positive impact on research progress
    corecore