69,071 research outputs found

    Pervasive and standalone computing: The perceptual effects of variable multimedia quality.

    The introduction of multimedia on pervasive and mobile communication devices raises a number of perceptual quality issues; however, limited work has examined the three-way interaction between use of equipment, quality of perception and quality of service. Our work measures levels of information transfer (objective) and user satisfaction (subjective) when users are presented with multimedia video clips at three different frame rates, using four different display devices, simulating variation in participant mobility. Our results show that variation in frame rate does not affect a user’s level of information assimilation, but does affect their perception of multimedia video ‘quality’. Additionally, increased visual immersion can be used to increase the transfer of video information, but can negatively affect users’ perception of ‘quality’. Finally, we illustrate the significant effect of clip content on the transfer of video, audio and textual information, placing into doubt the use of purely objective quality definitions when considering multimedia presentations.
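
    As a rough illustration of the frame-rate manipulation this abstract describes, presentation clips at several frame rates can be derived from one source clip by frame dropping. The minimal Python sketch below uses assumed frame rates (the abstract does not state which three were used) and is not the authors' tooling:

        # Derive lower-frame-rate clips from a 25 fps source by frame dropping.
        # The three target rates below are illustrative assumptions.
        def subsample(frames, src_fps, dst_fps):
            """Keep roughly every (src_fps/dst_fps)-th frame to approximate dst_fps."""
            step = src_fps / dst_fps
            return [frames[int(i * step)] for i in range(int(len(frames) / step))]

        source = list(range(250))      # 10 s of video at 25 fps (frame indices)
        for fps in (5, 15, 25):        # three hypothetical presentation rates
            clip = subsample(source, 25, fps)
            print(fps, "fps ->", len(clip), "frames")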

    Oral computer-mediated interaction between L2 learners: it’s about time!

    This study explores task-based, synchronous oral computer-mediated communication (CMC) among intermediate-level learners of Spanish. In particular, this paper examines (a) how learners in video and audio CMC groups negotiate for meaning during task-based interaction, (b) possible differences between the two oral CMC modes and traditional face-to-face (FTF) communication, and (c) how this oral computer-mediated negotiation compares to that found in the text-based CMC literature. Fifteen learner-to-learner dyads were randomly assigned to an audio group, a video group, and an FTF control group to complete a jigsaw task that was seeded with 16 unknown lexical items. The experimental groups used Skype, a free online communication program, to carry out the task. The transcripts of the conversations reveal that oral CMC groups do indeed negotiate for meaning in this multimedia context when non-understanding occurs between speakers. In addition, results showed differences in the way the audio and video groups carry out these negotiations, mainly due to the lack of visual contact in the audio group. No differences were found between the video and FTF groups. Furthermore, oral CMC turn-taking patterns were shown to be very similar to FTF patterns but opposite to those found in written synchronous CMC. Oral CMC interaction patterns are shown to be more versatile.

    CHORUS Deliverable 4.3: Report from CHORUS workshops on national initiatives and metadata

    Minutes of the following workshops:
    • National Initiatives on Multimedia Content Description and Retrieval, Geneva, October 10th, 2007.
    • Metadata in Audio-Visual/Multimedia Production and Archiving, Munich, IRT, 21st–22nd November 2007.

    Workshop in Geneva, 10/10/2007. This highly successful workshop was organised in cooperation with the European Commission. The event brought together the technical, administrative and financial representatives of the various national initiatives which have recently been established in some European countries to support research and technical development in audio-visual content processing, indexing and searching for the next-generation Internet using semantic technologies, and which may lead to an Internet-based knowledge infrastructure. The objective of this workshop was to provide a platform for mutual information and exchange between these initiatives, the European Commission and the participants. Top speakers were present from each of the national initiatives, and there was time for discussion with the audience and amongst the European national initiatives. The challenges, commonalities, difficulties, targeted/expected impact, success criteria, etc. were tackled, and the workshop addressed how these national initiatives could work together and benefit from each other.

    Workshop in Munich, 21–22/11/2007. Numerous EU and national research projects are working on the automatic or semi-automatic generation of descriptive and functional metadata derived from analysing audio-visual content. The owners of AV archives and production facilities are eagerly awaiting such methods, which would help them to better exploit their assets. Hand in hand with the digitization of analogue archives and the archiving of digital AV material, metadata should be generated at as high a semantic level as possible, preferably fully automatically. All users of metadata rely on a certain metadata model, and all AV/multimedia search engines, developed or under current development, would have to respect some compatibility or compliance with the metadata models in use. The purpose of this workshop was to draw attention to the specific problem of metadata models in the context of (semi-)automatic multimedia search.

    Harnessing AI for Speech Reconstruction using Multi-view Silent Video Feed

    Speechreading, or lipreading, is the technique of understanding and extracting phonetic features from a speaker's visual features, such as the movement of the lips, face, teeth and tongue. It has a wide range of multimedia applications, such as surveillance, Internet telephony, and aids for people with hearing impairments. However, most work in speechreading has been limited to generating text from silent videos. Recently, research has started venturing into generating (audio) speech from silent video sequences, but there have been no developments thus far in dealing with divergent views and poses of a speaker. Thus, although multiple camera feeds of a speaker are often available, these multiple video feeds have not been used to deal with the different poses. To this end, this paper presents the world's first multi-view speechreading and reconstruction system. This work pushes the boundaries of multimedia research by putting forth a model which leverages silent video feeds from multiple cameras recording the same subject to generate intelligible speech for a speaker. Initial results confirm the usefulness of exploiting multiple camera views in building an efficient speechreading and reconstruction system, and further show the optimal placement of cameras for maximum intelligibility of speech. The paper also lays out various innovative applications for the proposed system, focusing on its potentially prodigious impact not just in the security arena but in many other multimedia analytics problems. Comment: 2018 ACM Multimedia Conference (MM '18), October 22–26, 2018, Seoul, Republic of Korea.
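
    The abstract does not describe a concrete architecture, but the core idea of fusing several silent camera views into one speech prediction can be sketched in a few lines. Below is a minimal, illustrative PyTorch model under assumed layer sizes, with simple additive fusion of per-view features and a recurrent decoder emitting mel-spectrogram-like frames; it is not the authors' system:

        import torch
        import torch.nn as nn

        class MultiViewSpeechReconstructor(nn.Module):
            """Toy multi-view lipreading-to-speech model (illustrative only)."""
            def __init__(self, n_views=2, feat_dim=256, n_mels=80):
                super().__init__()
                # One lightweight frame encoder per camera view.
                self.encoders = nn.ModuleList([
                    nn.Sequential(
                        nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
                        nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
                        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                        nn.Linear(64, feat_dim),
                    )
                    for _ in range(n_views)
                ])
                # Temporal decoder maps fused per-frame features to mel frames.
                self.rnn = nn.GRU(feat_dim, feat_dim, batch_first=True)
                self.head = nn.Linear(feat_dim, n_mels)

            def forward(self, views):
                # views: list of n_views tensors shaped (batch, time, 1, H, W)
                fused = 0
                for enc, view in zip(self.encoders, views):
                    b, t, c, h, w = view.shape
                    feats = enc(view.reshape(b * t, c, h, w)).reshape(b, t, -1)
                    fused = fused + feats          # simple additive view fusion
                out, _ = self.rnn(fused / len(views))
                return self.head(out)              # (batch, time, n_mels)

        # Toy usage: two 64x64 grayscale views, 25 frames each.
        model = MultiViewSpeechReconstructor(n_views=2)
        views = [torch.randn(1, 25, 1, 64, 64) for _ in range(2)]
        print(model(views).shape)                  # torch.Size([1, 25, 80])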

    A cognitive approach to user perception of multimedia quality: An empirical investigation

    Whilst multimedia technology has been one of the main contributing factors behind the Web's success, delivery of personalized multimedia content has been a desire seldom achieved in practice. Moreover, the issue is rarely examined from a cognitive styles standpoint, notwithstanding the fact that cognitive styles have significant effects on users’ preferences with respect to the presentation of multimedia content. Indeed, research has thus far neglected to examine the effect of cognitive styles on users’ subjective perceptions of multimedia quality. This paper aims to examine the relationships between users’ cognitive styles, the multimedia quality of service delivered by the underlying network, and users’ quality of perception (understood as both enjoyment and informational assimilation) associated with the viewed multimedia content. Results from the empirical study reported here show that all users, regardless of cognitive style, have higher levels of understanding of informational content in multimedia video clips (represented in our study by excerpts from television programmes) with weak dynamism, but that they enjoy moderately dynamic clips most. Additionally, multimedia content was found to significantly influence users’ levels of understanding and enjoyment. Surprisingly, our study highlighted the fact that Bimodal users prefer to draw on visual sources for informational purposes, and that the presence of text in multimedia clips has a detrimental effect on the knowledge acquisition of all three cognitive style groups.

    A review of the empirical studies of computer supported human-to-human communication

    This paper presents a review of the empirical studies of human-to-human communication carried out over the last three decades. Although the review is primarily concerned with empirical studies of computer-supported human-to-human communication, a number of studies dealing with group work in non-computer-based collaborative environments, which form the basis of many recent empirical studies in the area of CSCW, are also discussed. The concept of person and task spaces is introduced and then used to categorise the large volume of studies reported in this review. The paper also gives a comparative analysis of the findings of these studies, and draws a number of general conclusions to guide the design and evaluation of future CSCW systems.

    Perceived quality of audio-visual stimuli containing streaming audio degradations

    Multimedia services play an important role in modern human communication. Understanding the impact of multisensory input (audio and video) on perceived quality is important for optimizing the delivery of these services. This work explores the impact of audio degradations on audio-visual quality. With this goal, we present a new dataset that contains audio-visual sequences with distortions only in the audio component (Im-AV-Exp2). The degradations in this new dataset correspond to commonly encountered streaming degradations, matching those found in the audio-only TCD-VoIP dataset. Using the Immersive Methodology, we perform a subjective experiment with the Im-AV-Exp2 dataset. We analyze the experimental data and compare the quality scores of the Im-AV-Exp2 and TCD-VoIP datasets. Results show that the video component acts as a masking factor for certain classes of audio degradations (e.g. echo), showing that there is an interaction of video and audio quality that may depend on content.
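
    The masking effect reported above amounts to comparing per-condition quality scores across the audio-only and audio-visual datasets. The short pandas sketch below illustrates that comparison; the file names and the one-rating-per-row layout with a shared "condition" column are assumptions, not the authors' analysis code:

        import pandas as pd

        # Hypothetical score files: one row per rating, columns: condition, score.
        audio_only = pd.read_csv("tcd_voip_scores.csv")      # TCD-VoIP ratings
        audio_visual = pd.read_csv("im_av_exp2_scores.csv")  # Im-AV-Exp2 ratings

        mos_a = audio_only.groupby("condition")["score"].mean()
        mos_av = audio_visual.groupby("condition")["score"].mean()

        # A positive delta for a degradation class (e.g. echo) would suggest
        # that the video component masked the audio degradation.
        print((mos_av - mos_a).sort_values(ascending=False))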

    Sensing and mapping for interactive performance

    This paper describes a trans-domain mapping (TDM) framework for translating meaningful activities from one creative domain onto another. The multi-disciplinary framework is designed to facilitate an intuitive and non-intrusive interactive multimedia performance interface that offers users or performers real-time control of multimedia events using their physical movements. It is intended to be a highly dynamic real-time performance tool, sensing and tracking activities and changes in order to provide interactive multimedia performances. Starting from a straightforward definition of the TDM framework, the paper reports on several implementations and multi-disciplinary collaborative projects using the proposed framework, including a motion- and colour-sensitive system, a sensor-based system for triggering musical events, and a distributed multimedia server for audio mapping of a real-time face tracker, and discusses different aspects of mapping strategies in their context. Plausible future directions, developments and explorations with the proposed framework, including stage augmentation and virtual and augmented reality, which involve sensing and mapping physical and non-physical changes onto multimedia control events, are discussed.
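
    As a toy illustration of the trans-domain mapping idea, a normalised reading from one domain (say, motion intensity from a tracker) can be scaled onto control events in another (musical pitch and loudness). The ranges and MIDI-style targets below are assumptions for illustration, not the framework's actual mapping strategy:

        def map_range(x, in_lo, in_hi, out_lo, out_hi):
            """Linearly map x from [in_lo, in_hi] to [out_lo, out_hi], clamped."""
            x = max(in_lo, min(in_hi, x))
            t = (x - in_lo) / (in_hi - in_lo)
            return out_lo + t * (out_hi - out_lo)

        def motion_to_note(intensity):
            # Motion intensity in [0, 1] -> MIDI-style pitch and velocity.
            pitch = round(map_range(intensity, 0.0, 1.0, 48, 84))
            velocity = round(map_range(intensity, 0.0, 1.0, 40, 127))
            return pitch, velocity

        for intensity in (0.1, 0.5, 0.9):   # e.g. successive tracker frames
            print(intensity, "->", motion_to_note(intensity))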