103 research outputs found

    Multi-Speaker Tracking with Audio-Visual Information for Robot Perception

    Robot perception plays a crucial role in human-robot interaction (HRI). The perception system provides the robot with information about its surroundings and enables it to give feedback. In a conversational scenario, a group of people may chat in front of the robot and move freely. In such situations, robots are expected to understand where the people are, who is speaking, and what they are talking about. This thesis concentrates on answering the first two questions, namely speaker tracking and diarization. We use different modalities of the robot's perception system to achieve this goal. Just as seeing and hearing are for a human being, audio and visual information are the critical cues for a robot in a conversational scenario. The advances in computer vision and audio processing over the last decade have revolutionized robot perception abilities. This thesis makes the following contributions. We first develop a variational Bayesian framework for tracking multiple objects. The variational Bayesian framework yields closed-form, tractable solutions, which makes the tracking process efficient. The framework is first applied to visual multiple-person tracking, with birth and death processes built jointly into the framework to handle the varying number of people in the scene. Furthermore, we exploit the complementarity of vision and robot motor information: on the one hand, the robot's active motion can be integrated into the visual tracking system to stabilize tracking; on the other hand, visual information can be used to perform motor servoing. Audio and visual information are then combined in the variational framework to estimate smooth trajectories of speaking people and to infer the acoustic status of a person: speaking or silent. In addition, we apply the model to acoustic-only speaker localization and tracking, where online dereverberation techniques are first applied and their output is fed to the tracking system. Finally, a variant of the acoustic speaker tracking model based on the von Mises distribution is proposed, which is specifically adapted to directional data. All the proposed methods are validated on datasets appropriate to each application.
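    The abstract above mentions a tracking variant built on the von Mises distribution for directional data such as speaker azimuths. As a minimal illustrative sketch (not the thesis's actual model), the Python snippet below scores a single direction-of-arrival (DOA) observation against candidate speaker directions with a von Mises likelihood; the function name, the concentration value kappa, and the uniform priors are assumptions made for this example.

    # Illustrative sketch only: score one DOA observation against candidate
    # speaker directions with a von Mises likelihood and normalize the scores.
    import numpy as np
    from scipy.stats import vonmises

    def doa_responsibilities(doa_obs, speaker_means, kappa=4.0, priors=None):
        """Return normalized per-speaker responsibilities for one DOA (radians).

        kappa (concentration of the angular observation noise) and the uniform
        priors are illustrative assumptions, not values from the thesis.
        """
        speaker_means = np.asarray(speaker_means, dtype=float)
        if priors is None:
            priors = np.full(len(speaker_means), 1.0 / len(speaker_means))
        priors = np.asarray(priors, dtype=float)
        # von Mises pdf: exp(kappa * cos(x - mu)) / (2 * pi * I0(kappa)); wraps at +/- pi
        likelihoods = vonmises.pdf(doa_obs, kappa, loc=speaker_means)
        weighted = priors * likelihoods
        return weighted / weighted.sum()

    # Example: one observation at 0.1 rad, two hypothesized speakers at 0 and pi/2.
    print(doa_responsibilities(0.1, [0.0, np.pi / 2]))

    Because the von Mises density is defined on the circle and wraps around at plus or minus pi, angles on either side of the wrap-around are scored consistently, which is the property that makes it better suited to directional data than a Gaussian.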

    Tangible auditory interfaces : combining auditory displays and tangible interfaces

    Bovermann T. Tangible auditory interfaces: combining auditory displays and tangible interfaces. Bielefeld (Germany): Bielefeld University; 2009. This thesis investigates the capabilities that arise from interconnecting Tangible User Interfaces and Auditory Displays. Tangible Auditory Interfaces (TAIs) use artificial physical objects as well as soundscapes to represent digital information. The interconnection of the two fields establishes a tight coupling between information and operation that builds on people's familiarity with the underlying interrelations. The work gives a formal introduction to TAIs and demonstrates their key features through seven proof-of-concept applications.

    Structure out of sound

    Thesis (Ph.D.), Massachusetts Institute of Technology, Program in Media Arts & Sciences, 1993. Author: Michael Jerome Hawley. Includes bibliographical references (p. 155-170).

    Sonic Interactions in Virtual Environments

    This open access book tackles the design of 3D spatial interactions from an audio-centered, audio-first perspective, providing the fundamental notions related to the creation and evaluation of immersive sonic experiences. The key elements that enhance the sensation of place in a virtual environment (VE) are: immersive audio, the computational aspects of the acoustic-space properties of Virtual Reality (VR) technologies; sonic interaction, the human-computer interplay through auditory feedback in VEs; and VR systems, which naturally support multimodal integration and impact different application domains. Sonic Interactions in Virtual Environments features state-of-the-art research on real-time auralization, sonic interaction design in VR, quality of experience in multimodal scenarios, and applications. Contributors and editors include interdisciplinary experts from the fields of computer science, engineering, acoustics, psychology, design, the humanities, and beyond. Their mission is to shape an emerging field of study at the intersection of sonic interaction design and immersive media, embracing an archipelago of existing research spread across different audio communities, and to raise awareness among VR communities, researchers, and practitioners of the importance of sonic elements when designing immersive environments.

    Designing instruments towards networked music practices

    It is commonly noted in New Interfaces for Musical Expression (NIME) research that few of these instruments make it to the mainstream and are adopted by the general public. Some research in Sound and Music Computing (SMC) suggests that a lack of humanistic research guiding technological development may be one of the causes. Many new technologies are invented with no real aim other than technical innovation, whereas successful products emphasize user-friendliness, user involvement in the design process, and User-Centred Design (UCD), which seek to guarantee that innovation addresses real, existing needs among users. Such an approach includes not only traditionally quantifiable usability goals, but also qualitative, psychological, philosophical, and musical ones. The latter approach has come to be called experience design, while the former is referred to as interaction design. Although the Human-Computer Interaction (HCI) community in general has recognized the significance of qualitative needs and experience design, NIME has been slower to adopt this new paradigm. This thesis therefore investigates its relevance to NIME, and specifically to Computer Supported Cooperative Work (CSCW) for music applications, by devising a prototype for group music action based on needs elicited from pianists engaging in piano duets, one of the more common forms of group creation in the Western musical tradition. These needs, some of which are socio-emotional in nature, are addressed through our prototype in the context of computers and global networks, by allowing composers from all over the world to submit music to a group concert on a Yamaha Disklavier located in Porto, Portugal. Although this prototype is not a new gestural controller per se, and therefore not a traditional NIME, but rather a platform that interfaces groups of composers with a remote audience, the aim of this research is to investigate how contextual parameters such as venue, audience, joint concert, and technologies impact the overall user experience of such a system. The results of this research have been important not only for understanding the processes, services, events, and environments in which NIMEs operate, but also for understanding reciprocity, creativity, and experience design in networked music practices.
