Multi-Speaker Tracking with Audio-Visual Information for Robot Perception
Robot perception plays a crucial role in human-robot interaction (HRI). The perception system provides the robot with information about its surroundings and enables it to give feedback. In a conversational scenario, a group of people may chat in front of the robot and move freely. In such situations, the robot is expected to understand where the people are, who is speaking, and what they are talking about. This thesis concentrates on answering the first two questions, namely speaker tracking and diarization. We use different modalities of the robot's perception system to achieve this goal. Like seeing and hearing for a human being, audio and visual information are the critical cues for a robot in a conversational scenario. Advances in computer vision and audio processing over the last decade have revolutionized robot perception abilities. This thesis makes the following contributions. We first develop a variational Bayesian framework for tracking multiple objects. The framework yields closed-form, tractable solutions, which makes the tracking process efficient. It is first applied to visual multiple-person tracking, with a birth-and-death process built jointly into the framework to handle the varying number of people in the scene. Furthermore, we exploit the complementarity of vision and robot motor information: on the one hand, the robot's active motion can be integrated into the visual tracking system to stabilize tracking; on the other hand, visual information can be used to perform motor servoing. Audio and visual information are then combined in the variational framework to estimate smooth trajectories of speaking people and to infer the acoustic status of a person: speaking or silent. In addition, we apply the model to acoustic-only speaker localization and tracking, where online dereverberation techniques are applied first and their output is fed to the tracking system.
Finally, a variant of the acoustic speaker-tracking model based on the von Mises distribution is proposed, which is specifically adapted to directional data. All the proposed methods are validated on datasets suited to each application.
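The abstract above motivates the von Mises distribution for directional data such as a speaker's direction of arrival, where ordinary Gaussian statistics fail at the ±180° wrap-around. The thesis's actual tracking model is not given here; the following is only a minimal sketch of the two building blocks such a model rests on, the von Mises density and the circular mean, written with plain NumPy:

```python
import numpy as np

def von_mises_pdf(theta, mu, kappa):
    """Density of the von Mises distribution on the circle.

    theta : angle(s) in radians
    mu    : mean direction
    kappa : concentration (kappa -> 0 approaches uniform on the circle)
    """
    # np.i0 is the modified Bessel function of order 0 (the normalizer)
    return np.exp(kappa * np.cos(theta - mu)) / (2.0 * np.pi * np.i0(kappa))

def circular_mean(angles, weights=None):
    """Weighted circular mean: the natural point estimate for directional data,
    unaffected by the discontinuity at +/- pi."""
    w = np.ones_like(angles) if weights is None else np.asarray(weights)
    s = np.sum(w * np.sin(angles))
    c = np.sum(w * np.cos(angles))
    return np.arctan2(s, c)

# Noisy direction-of-arrival measurements concentrated around 30 degrees
rng = np.random.default_rng(0)
doa = np.deg2rad(30) + rng.vonmises(0.0, 10.0, size=200)
est = circular_mean(doa)  # close to np.deg2rad(30)
```

Averaging the raw angles would give the wrong answer for a source near ±180°; the sine/cosine decomposition above is what makes the estimate well defined on the circle.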
Tangible auditory interfaces : combining auditory displays and tangible interfaces
Bovermann T. Tangible auditory interfaces: combining auditory displays and tangible interfaces. Bielefeld (Germany): Bielefeld University; 2009.
Tangible Auditory Interfaces (TAIs) investigate the interconnection of Tangible User Interfaces and Auditory Displays. TAIs use artificial physical objects as well as soundscapes to represent digital information. Connecting the two fields establishes a tight coupling between information and operation that builds on people's familiarity with the incorporated interrelations. This work gives a formal introduction to TAIs and demonstrates their key features through seven proof-of-concept applications.
Structure out of sound
Michael Jerome Hawley. Thesis (Ph.D.), Massachusetts Institute of Technology, Program in Media Arts & Sciences, 1993. Includes vita and bibliographical references (p. 155-170).
Sonic Interactions in Virtual Environments
This open access book tackles the design of 3D spatial interactions from an audio-centered, audio-first perspective, providing the fundamental notions related to the creation and evaluation of immersive sonic experiences. The key elements that enhance the sensation of place in a virtual environment (VE) are:
Immersive audio: the computational aspects of the acoustical-space properties of Virtual Reality (VR) technologies
Sonic interaction: the human-computer interplay through auditory feedback in VEs
VR systems: these naturally support multimodal integration, impacting different application domains
Sonic Interactions in Virtual Environments features state-of-the-art research on real-time auralization, sonic interaction design in VR, quality of experience in multimodal scenarios, and applications. Contributors and editors include interdisciplinary experts from the fields of computer science, engineering, acoustics, psychology, design, the humanities, and beyond. Their mission is to shape an emerging field of study at the intersection of sonic interaction design and immersive media, embracing an archipelago of existing research spread across different audio communities, and to raise awareness among VR researchers and practitioners of the importance of sonic elements when designing immersive environments.
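The blurb above speaks of sonic interaction through auditory feedback in VEs at a high level; the book's own techniques are not reproduced here. As a hedged illustration of the simplest spatial-audio building block underlying such systems, the sketch below implements constant-power stereo amplitude panning, one standard way to place a virtual sound source between two loudspeakers:

```python
import numpy as np

def constant_power_pan(mono, pan):
    """Pan a mono signal across two channels with constant perceived power.

    mono : 1-D array of samples
    pan  : -1.0 (full left) .. +1.0 (full right)
    """
    angle = (pan + 1.0) * np.pi / 4.0   # map [-1, 1] -> [0, pi/2]
    left = np.cos(angle) * mono
    right = np.sin(angle) * mono
    # cos^2 + sin^2 = 1, so left^2 + right^2 equals the mono power
    # at every sample, for any pan position.
    return np.stack([left, right], axis=0)

# A 0.1 s, 440 Hz test tone at 48 kHz, placed right of centre
tone = np.sin(2 * np.pi * 440 * np.arange(4800) / 48000)
stereo = constant_power_pan(tone, 0.5)
```

The cosine/sine gain law (rather than linear gains) is what keeps the perceived loudness constant as the source sweeps across the stereo field; full VR auralization adds per-ear delays, filtering and room simulation on top of this idea.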
Designing instruments towards networked music practices
It is commonly noted in New Interfaces for Musical Expression (NIME) research that few of the resulting instruments make it to the mainstream and are adopted by the general public. Some research in Sound and Music Computing (SMC) suggests that a lack of humanistic research guiding technological development may be one of the causes. Many new technologies are invented with no real aim beyond technical innovation, whereas great products emphasize user-friendliness and user involvement in the design process, or User-Centred Design (UCD), which seeks to guarantee that innovation addresses real, existing needs among users. Such an approach includes not only traditionally quantifiable usability goals, but also qualitative, psychological, philosophical and musical ones. The latter approach has come to be called experience design, while the former is referred to as interaction design. Although the Human-Computer Interaction (HCI) community in general has recognized the significance of qualitative needs and experience design, NIME has been slower to adopt this new paradigm. This thesis therefore investigates its relevance to NIME, and specifically to Computer Supported Cooperative Work (CSCW) for music applications, by devising a prototype for group music action based on needs elicited from pianists engaging in piano duets, one of the more common forms of group creation in the Western musical tradition. These needs, some of which are socio-emotional in nature, are addressed through our prototype, in the context of computers and global networks, by allowing composers from all over the world to submit music to a group concert on a Yamaha Disklavier located in Porto, Portugal. Although this prototype is not a new gestural controller per se, and therefore not a traditional NIME, but rather a platform that interfaces groups of composers with a remote audience, the aim of this research is to investigate how contextual parameters such as venue, audience, joint concert and technologies impact the overall user experience of such a system. The results of this research have been important not only for understanding the processes, services, events and environments in which NIMEs operate, but also for understanding reciprocity, creativity and experience design in Networked Music
practices.