686 research outputs found

    Searching Spontaneous Conversational Speech:Proceedings of ACM SIGIR Workshop (SSCS2008)

    Get PDF

    Unsupervised video indexing on audiovisual characterization of persons

    Get PDF
    Cette thèse consiste à proposer une méthode de caractérisation non-supervisée des intervenants dans les documents audiovisuels, en exploitant des données liées à leur apparence physique et à leur voix. De manière générale, les méthodes d'identification automatique, que ce soit en vidéo ou en audio, nécessitent une quantité importante de connaissances a priori sur le contenu. Dans ce travail, le but est d'étudier les deux modes de façon corrélée et d'exploiter leur propriété respective de manière collaborative et robuste, afin de produire un résultat fiable aussi indépendant que possible de toute connaissance a priori. Plus particulièrement, nous avons étudié les caractéristiques du flux audio et nous avons proposé plusieurs méthodes pour la segmentation et le regroupement en locuteurs que nous avons évaluées dans le cadre d'une campagne d'évaluation. Ensuite, nous avons mené une étude approfondie sur les descripteurs visuels (visage, costume) qui nous ont servis à proposer de nouvelles approches pour la détection, le suivi et le regroupement des personnes. Enfin, le travail s'est focalisé sur la fusion des données audio et vidéo en proposant une approche basée sur le calcul d'une matrice de cooccurrence qui nous a permis d'établir une association entre l'index audio et l'index vidéo et d'effectuer leur correction. Nous pouvons ainsi produire un modèle audiovisuel dynamique des intervenants.This thesis consists to propose a method for an unsupervised characterization of persons within audiovisual documents, by exploring the data related for their physical appearance and their voice. From a general manner, the automatic recognition methods, either in video or audio, need a huge amount of a priori knowledge about their content. In this work, the goal is to study the two modes in a correlated way and to explore their properties in a collaborative and robust way, in order to produce a reliable result as independent as possible from any a priori knowledge. More particularly, we have studied the characteristics of the audio stream and we have proposed many methods for speaker segmentation and clustering and that we have evaluated in a french competition. Then, we have carried a deep study on visual descriptors (face, clothing) that helped us to propose novel approches for detecting, tracking, and clustering of people within the document. Finally, the work was focused on the audiovisual fusion by proposing a method based on computing the cooccurrence matrix that allowed us to establish an association between audio and video indexes, and to correct them. That will enable us to produce a dynamic audiovisual model for each speaker

    Recherche du rôle des intervenants et de leurs interactions pour la structuration de documents audiovisuels

    Get PDF
    Nous présentons un système de structuration automatique d'enregistrements audiovisuels s'appuyant sur des informations non lexicales caractéristiques des rôles des intervenants et de leurs interactions. Dans une première étape, nous proposons une méthode de détection et de caractérisation de séquences temporelles, nommée « zones d'interaction », susceptibles de correspondre à des conversations. La seconde étape de notre système réalise une reconnaissance du rôle des intervenants : présentateur, journaliste et autre. Notre contribution au domaine de la reconnaissance automatique du rôle se distingue en reposant sur l'hypothèse selon laquelle les rôles des intervenants sont accessibles à travers des paramètres « bas-niveau » inscrits d'une part dans l'organisation temporelle des tours de parole des intervenants, dans les environnements acoustiques dans lesquels ils apparaissent, ainsi que dans plusieurs paramètres prosodiques (intonation et débit). Dans une dernière étape, nous combinons l'information du rôle des intervenants à la connaissance des séquences d'interaction afin de produire deux niveaux de description du contenu des documents. Le premier niveau de description segmente les enregistrements en zones de 4 types : informations, entretiens, transition et intermède. Un second niveau de description classe les zones d'interaction orales en 4 catégories : débat, interview, chronique et relais. Chaque étape du système est validée par une grand nombre d'expériences menées sur le corpus du projet EPAC et celui de la campagne d'évaluation ESTER.We present a system for audiovisual document structuring, based-on speaker role recognition and speech interaction zone detection. The first stage of our system consists in an automatic method for speech interaction zones detection and characterization. Such zones correspond to temporal sequences of documents which potentially contain conversations between speakers. The second stage of our system achieves the recognition of speaker roles : anchorman, journalist and other. Our contribution to this domain is based on the hypothesis that cues about speaker roles are available through low-level features extracted from the temporal organization of turn-takings and from acoustic and prosodic features (speech rate and pitch). In the last stage of our system, we combine speaker roles and speech interaction zones to provide two descriptive layers of the audiovisual document contents. The first descriptive layer gathers segments of 4 types : informations, meeting, transition and interlude. The second descriptive layer consists in a classification of speech interaction zones into 4 categories : debate, interview, chronicle and relay. Each step of the system has been evaluated using a large number of experiments realized using the EPAC project and ESTER campaign corpora

    Beyond Media Borders, Volume 1

    Get PDF
    This open access book promotes the idea that all media types are multimodal and that comparing media types, through an intermedial lens, necessarily involves analysing these multimodal traits. The collection includes a series of interconnected articles that illustrate and clarify how the concepts developed in Elleström’s influential article The Modalities of Media: A Model for Understanding Intermedial Relations (Palgrave Macmillan, 2010) can be used for methodical investigation and interpretation of media traits and media interrelations. The authors work with a wide range of old and new media types that are traditionally investigated through limited, media-specific concepts. The publication is a significant contribution to interdisciplinary research, advancing the frontiers of conceptual as well as practical understanding of media interrelations. This is the first of two volumes. It contains Elleström’s revised article and six other contributions focusing especially on media integration: how media products and media types are combined and merged in various ways

    Beyond Media Borders, Volume 1

    Get PDF
    This open access book promotes the idea that all media types are multimodal and that comparing media types, through an intermedial lens, necessarily involves analysing these multimodal traits. The collection includes a series of interconnected articles that illustrate and clarify how the concepts developed in Elleström’s influential article The Modalities of Media: A Model for Understanding Intermedial Relations (Palgrave Macmillan, 2010) can be used for methodical investigation and interpretation of media traits and media interrelations. The authors work with a wide range of old and new media types that are traditionally investigated through limited, media-specific concepts. The publication is a significant contribution to interdisciplinary research, advancing the frontiers of conceptual as well as practical understanding of media interrelations. This is the first of two volumes. It contains Elleström’s revised article and six other contributions focusing especially on media integration: how media products and media types are combined and merged in various ways

    Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020

    Get PDF
    On behalf of the Program Committee, a very warm welcome to the Seventh Italian Conference on Computational Linguistics (CLiC-it 2020). This edition of the conference is held in Bologna and organised by the University of Bologna. The CLiC-it conference series is an initiative of the Italian Association for Computational Linguistics (AILC) which, after six years of activity, has clearly established itself as the premier national forum for research and development in the fields of Computational Linguistics and Natural Language Processing, where leading researchers and practitioners from academia and industry meet to share their research results, experiences, and challenges

    MediaSync: Handbook on Multimedia Synchronization

    Get PDF
    This book provides an approachable overview of the most recent advances in the fascinating field of media synchronization (mediasync), gathering contributions from the most representative and influential experts. Understanding the challenges of this field in the current multi-sensory, multi-device, and multi-protocol world is not an easy task. The book revisits the foundations of mediasync, including theoretical frameworks and models, highlights ongoing research efforts, like hybrid broadband broadcast (HBB) delivery and users' perception modeling (i.e., Quality of Experience or QoE), and paves the way for the future (e.g., towards the deployment of multi-sensory and ultra-realistic experiences). Although many advances around mediasync have been devised and deployed, this area of research is getting renewed attention to overcome remaining challenges in the next-generation (heterogeneous and ubiquitous) media ecosystem. Given the significant advances in this research area, its current relevance and the multiple disciplines it involves, the availability of a reference book on mediasync becomes necessary. This book fills the gap in this context. In particular, it addresses key aspects and reviews the most relevant contributions within the mediasync research space, from different perspectives. Mediasync: Handbook on Multimedia Synchronization is the perfect companion for scholars and practitioners that want to acquire strong knowledge about this research area, and also approach the challenges behind ensuring the best mediated experiences, by providing the adequate synchronization between the media elements that constitute these experiences

    Gesture and Speech in Interaction - 4th edition (GESPIN 4)

    Get PDF
    International audienceThe fourth edition of Gesture and Speech in Interaction (GESPIN) was held in Nantes, France. With more than 40 papers, these proceedings show just what a flourishing field of enquiry gesture studies continues to be. The keynote speeches of the conference addressed three different aspects of multimodal interaction:gesture and grammar, gesture acquisition, and gesture and social interaction. In a talk entitled Qualitiesof event construal in speech and gesture: Aspect and tense, Alan Cienki presented an ongoing researchproject on narratives in French, German and Russian, a project that focuses especially on the verbal andgestural expression of grammatical tense and aspect in narratives in the three languages. Jean-MarcColletta's talk, entitled Gesture and Language Development: towards a unified theoretical framework,described the joint acquisition and development of speech and early conventional and representationalgestures. In Grammar, deixis, and multimodality between code-manifestation and code-integration or whyKendon's Continuum should be transformed into a gestural circle, Ellen Fricke proposed a revisitedgrammar of noun phrases that integrates gestures as part of the semiotic and typological codes of individuallanguages. From a pragmatic and cognitive perspective, Judith Holler explored the use ofgaze and hand gestures as means of organizing turns at talk as well as establishing common ground in apresentation entitled On the pragmatics of multi-modal face-to-face communication: Gesture, speech andgaze in the coordination of mental states and social interaction.Among the talks and posters presented at the conference, the vast majority of topics related, quitenaturally, to gesture and speech in interaction - understood both in terms of mapping of units in differentsemiotic modes and of the use of gesture and speech in social interaction. Several presentations explored the effects of impairments(such as diseases or the natural ageing process) on gesture and speech. The communicative relevance ofgesture and speech and audience-design in natural interactions, as well as in more controlled settings liketelevision debates and reports, was another topic addressed during the conference. Some participantsalso presented research on first and second language learning, while others discussed the relationshipbetween gesture and intonation. While most participants presented research on gesture and speech froman observer's perspective, be it in semiotics or pragmatics, some nevertheless focused on another importantaspect: the cognitive processes involved in language production and perception. Last but not least,participants also presented talks and posters on the computational analysis of gestures, whether involvingexternal devices (e.g. mocap, kinect) or concerning the use of specially-designed computer software forthe post-treatment of gestural data. Importantly, new links were made between semiotics and mocap data

    Specialised Languages and Multimedia. Linguistic and Cross-cultural Issues

    Get PDF
    none2noThis book collects academic works focusing on scientific and technical discourse and on the ways in which this type of discourse appears in or is shaped by multimedia products. The originality of this book is to be seen in the variety of approaches used and of the specialised languages investigated in relation to multimodal and multimedia genres. Contributions will particularly focus on new multimodal or multimedia forms of specialised discourse (in institutional, academic, technical, scientific, social or popular settings), linguistic features of specialised discourse in multimodal or multimedia genres, the popularisation of specialised knowledge in multimodal or multimedia genres, the impact of multimodality and multimediality on the construction of scientific and technical discourse, the impact of multimodality/multimediality in the practice and teaching of language, the impact of multimodality/multimediality in the practice and teaching of translation, new multimedia modes of knowledge dissemination, the translation/adaptation of scientific discourse in multimedia products. This volume contributes to the theory and practice of multimodal studies and translation, with a specific focus on specialized discourse.Rivista di Classe A - Volume specialeopenManca E., Bianchi F.Manca, E.; Bianchi, F
    • …
    corecore