    Gathering a corpus of multimodal computer-mediated meetings with focus on text and audio interaction

    In this paper we describe the gathering of a corpus of synchronised speech and text interaction over the network. The data collection scenarios characterise audio meetings with a significant textual component. Unlike existing meeting corpora, the corpus described in this paper emphasises temporal relationships between speech and text media streams. This is achieved through detailed logging and time stamping of text editing operations, actions on shared user interface widgets and gesturing, as well as generation of speech activity profiles. A set of tools has been developed specifically for these purposes and can be used as a data collection platform for the development of meeting browsers. The data gathered to date consists of nearly 30 hours of recorded audio and time-stamped editing operations and gestures.
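
    To make the time-stamping concrete, here is a minimal sketch of the kind of synchronised event log the abstract describes. It is written in Python, and the event kinds and field names are illustrative assumptions rather than the corpus's actual schema:

```python
# Illustrative sketch only: the event kinds and fields below are
# assumptions, not the schema used by the corpus described above.
import time
from dataclasses import dataclass, field


@dataclass
class Event:
    timestamp: float   # seconds since the start of the meeting
    participant: str   # who produced the event
    kind: str          # e.g. "edit", "gesture", "speech_on", "speech_off"
    payload: dict = field(default_factory=dict)


class SessionLog:
    """Collects time-stamped events so that text-editing operations,
    gestures and speech activity share one common timeline."""

    def __init__(self) -> None:
        self.start = time.monotonic()
        self.events: list[Event] = []

    def record(self, participant: str, kind: str, **payload) -> None:
        self.events.append(
            Event(time.monotonic() - self.start, participant, kind, payload))


log = SessionLog()
log.record("alice", "edit", op="insert", offset=120, text="agenda item 3")
log.record("bob", "speech_on")   # start of a speech-activity segment
log.record("bob", "speech_off")  # end of the segment
```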

    History-based visual mining of semi-structured audio and text

    Accessing specific or salient parts of multimedia recordings remains a challenge, as there is no obvious way of structuring and representing a mix of space-based and time-based media. A number of approaches have been proposed which usually involve translating the continuous component of the multimedia recording into a space-based representation, such as text from audio through automatic speech recognition and images from video (keyframes). In this paper, we present a novel technique which defines retrieval units in terms of a log of actions performed on space-based artefacts, and exploits timing properties and extended concurrency to construct a visual presentation of text and speech data. This technique can be easily adapted to any mix of space-based artefacts and continuous media.
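
    As a rough illustration of how retrieval units might be derived from such an action log, the sketch below merges temporally close actions on space-based artefacts into units, then attaches the speech segments that overlap each unit. The gap threshold and data structures are hypothetical, not the authors' actual algorithm:

```python
# Hypothetical sketch: group logged actions into retrieval units by
# temporal proximity, then attach concurrent speech segments.
def build_units(actions, speech_segments, gap=5.0):
    """actions: list of (start, end, artefact_id) tuples;
    speech_segments: list of (start, end, speaker) tuples."""
    units = []
    for start, end, artefact in sorted(actions):
        if units and start - units[-1]["end"] <= gap:
            unit = units[-1]                      # extend the current unit
            unit["end"] = max(unit["end"], end)
            unit["artefacts"].add(artefact)
        else:                                     # start a new unit
            units.append({"start": start, "end": end,
                          "artefacts": {artefact}, "speech": []})
    for unit in units:                            # attach overlapping speech
        unit["speech"] = [s for s in speech_segments
                          if s[0] < unit["end"] and s[1] > unit["start"]]
    return units


units = build_units(
    actions=[(0.0, 2.0, "doc1"), (3.5, 6.0, "doc1"), (40.0, 42.0, "doc2")],
    speech_segments=[(1.0, 5.0, "alice"), (41.0, 44.0, "bob")])
print(len(units))  # -> 2: the first two actions merge into one unit
```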

    First impressions: A survey on vision-based apparent personality trait analysis

    Personality analysis has been widely studied in psychology, neuropsychology, and signal processing, among other fields. Over the past few years it has also become an attractive research area in visual computing. From the computational point of view, speech and text have by far been the most widely considered cues for analyzing personality. Recently, however, there has been increasing interest from the computer vision community in analyzing personality from visual data. Recent computer vision approaches are able to accurately analyze human faces, body postures and behaviors, and use this information to infer apparent personality traits. Because of the overwhelming research interest in this topic, and the potential impact such methods could have on society, we present in this paper an up-to-date review of existing vision-based approaches for apparent personality trait recognition. We describe seminal and cutting-edge works on the subject, discussing and comparing their distinctive features and limitations. Future avenues of research in the field are identified and discussed. Furthermore, we review aspects of subjectivity in data labeling and evaluation, as well as current datasets and challenges organized to push research in the field.

    Towards Universally Designed Communication: Opportunities and Challenges in the Use of Automatic Speech Recognition Systems to Support Access, Understanding and Use of Information in Communicative Settings

    Unlike physical barriers, communication barriers do not have an easy solution: people speak or sign in different languages and may have wide-ranging proficiency levels in the languages they understand and produce. Universal Design (UD) principles in the domain of language and communication have guided the production of multimodal (audio, visual, written) information. For example, UD guidelines encourage websites to provide information in alternative formats (for example, a video with captions, or a sign language version). The same UD for Learning principles apply in the classroom, where instructors are encouraged to present content multimodally, making use of increasingly available technology. In this chapter, I address some of the opportunities and challenges offered by automatic speech recognition (ASR) systems. These systems have many strengths, the most evident being speed: they convert speech into written form faster than human transcribers can. They also have weaknesses, most notably a higher error rate than human-generated transcriptions. It is essential to weigh the strengths and weaknesses of the technology when choosing which device(s) to use in a universally designed environment to enhance access to information and communication. It is equally imperative to understand which tools are most appropriate for diverse populations. Researchers should therefore continue investigating how people process information in multimodal formats, and how technology can be improved based on this knowledge and on users' needs and feedback.
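
    The error-rate comparison mentioned above is commonly quantified as word error rate (WER): the word-level edit distance between the ASR output and a human reference transcription, divided by the reference length. A minimal, self-contained sketch, with invented example sentences:

```python
# Word error rate: Levenshtein distance over words, normalised by the
# length of the human reference transcription.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit-distance table.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / max(len(ref), 1)


# One substituted word out of four: WER = 0.25.
print(wer("universal design for learning", "universal design of learning"))
```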

    An MPEG-7 scheme for semantic content modelling and filtering of digital video

    Part 5 of the MPEG-7 standard specifies Multimedia Description Schemes (MDS), that is, the format to which multimedia content models should conform in order to ensure interoperability across multiple platforms and applications. However, the standard does not specify how the content or the associated model may be filtered. This paper proposes an MPEG-7 scheme that can be deployed for digital video content modelling and filtering. The proposed scheme, COSMOS-7, produces rich and multi-faceted semantic content models and supports a content-based filtering approach that only analyses content relating directly to the user's preferred content requirements. We present details of the scheme, the front-end systems used for content modelling and filtering, and experiences with a number of users.
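
    As a rough illustration of the filtering idea, the sketch below keeps only video segments whose semantic annotations match every facet of a user's stated preferences. The segment structure and facet names are invented for illustration; COSMOS-7's actual MPEG-7 descriptions are considerably richer:

```python
# Invented example data: segments annotated with semantic facets.
segments = [
    {"id": "seg1", "start": 0.0, "end": 12.5,
     "objects": {"anchor", "studio"}, "events": {"introduction"}},
    {"id": "seg2", "start": 12.5, "end": 47.0,
     "objects": {"reporter", "crowd"}, "events": {"interview"}},
]


def filter_segments(segments, preferences):
    """Keep segments whose annotations intersect every preference facet."""
    return [s for s in segments
            if all(s.get(facet, set()) & wanted
                   for facet, wanted in preferences.items())]


# Only seg2 carries an "interview" event annotation.
print(filter_segments(segments, {"events": {"interview"}}))
```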