215 research outputs found

    ベイズ法によるマイクロフォンアレイ処理

    Get PDF
    京都大学0048新制・課程博士博士(情報学)甲第18412号情博第527号新制||情||93(附属図書館)31270京都大学大学院情報学研究科知能情報学専攻(主査)教授 奥乃 博, 教授 河原 達也, 准教授 CUTURI CAMETO Marco, 講師 吉井 和佳学位規則第4条第1項該当Doctor of InformaticsKyoto UniversityDFA

    Proceedings of the 10th International Conference on Ecological Informatics: translating ecological data into knowledge and decisions in a rapidly changing world: ICEI 2018

    Get PDF
    The Conference Proceedings are an impressive display of the current scope of Ecological Informatics. Whilst Data Management, Analysis, Synthesis and Forecasting have been lasting popular themes over the past nine biannual ICEI conferences, ICEI 2018 addresses distinctively novel developments in Data Acquisition enabled by cutting edge in situ and remote sensing technology. The here presented ICEI 2018 abstracts captures well current trends and challenges of Ecological Informatics towards: • regional, continental and global sharing of ecological data, • thorough integration of complementing monitoring technologies including DNA-barcoding, • sophisticated pattern recognition by deep learning, • advanced exploration of valuable information in ‘big data’ by means of machine learning and process modelling, • decision-informing solutions for biodiversity conservation and sustainable ecosystem management in light of global changes

    Proceedings of the 10th International Conference on Ecological Informatics: translating ecological data into knowledge and decisions in a rapidly changing world: ICEI 2018

    Get PDF
    The Conference Proceedings are an impressive display of the current scope of Ecological Informatics. Whilst Data Management, Analysis, Synthesis and Forecasting have been lasting popular themes over the past nine biannual ICEI conferences, ICEI 2018 addresses distinctively novel developments in Data Acquisition enabled by cutting edge in situ and remote sensing technology. The here presented ICEI 2018 abstracts captures well current trends and challenges of Ecological Informatics towards: • regional, continental and global sharing of ecological data, • thorough integration of complementing monitoring technologies including DNA-barcoding, • sophisticated pattern recognition by deep learning, • advanced exploration of valuable information in ‘big data’ by means of machine learning and process modelling, • decision-informing solutions for biodiversity conservation and sustainable ecosystem management in light of global changes

    Suivi Multi-Locuteurs avec des Informations Audio-Visuelles pour la Perception des Robots

    Get PDF
    Robot perception plays a crucial role in human-robot interaction (HRI). Perception system provides the robot information of the surroundings and enables the robot to give feedbacks. In a conversational scenario, a group of people may chat in front of the robot and move freely. In such situations, robots are expected to understand where are the people, who are speaking, or what are they talking about. This thesis concentrates on answering the first two questions, namely speaker tracking and diarization. We use different modalities of the robot’s perception system to achieve the goal. Like seeing and hearing for a human-being, audio and visual information are the critical cues for a robot in a conversational scenario. The advancement of computer vision and audio processing of the last decade has revolutionized the robot perception abilities. In this thesis, we have the following contributions: we first develop a variational Bayesian framework for tracking multiple objects. The variational Bayesian framework gives closed-form tractable problem solutions, which makes the tracking process efficient. The framework is first applied to visual multiple-person tracking. Birth and death process are built jointly with the framework to deal with the varying number of the people in the scene. Furthermore, we exploit the complementarity of vision and robot motorinformation. On the one hand, the robot’s active motion can be integrated into the visual tracking system to stabilize the tracking. On the other hand, visual information can be used to perform motor servoing. Moreover, audio and visual information are then combined in the variational framework, to estimate the smooth trajectories of speaking people, and to infer the acoustic status of a person- speaking or silent. In addition, we employ the model to acoustic-only speaker localization and tracking. Online dereverberation techniques are first applied then followed by the tracking system. Finally, a variant of the acoustic speaker tracking model based on von-Mises distribution is proposed, which is specifically adapted to directional data. All the proposed methods are validated on datasets according to applications.La perception des robots joue un rôle crucial dans l’interaction homme-robot (HRI). Le système de perception fournit les informations au robot sur l’environnement, ce qui permet au robot de réagir en consequence. Dans un scénario de conversation, un groupe de personnes peut discuter devant le robot et se déplacer librement. Dans de telles situations, les robots sont censés comprendre où sont les gens, ceux qui parlent et de quoi ils parlent. Cette thèse se concentre sur les deux premières questions, à savoir le suivi et la diarisation des locuteurs. Nous utilisons différentes modalités du système de perception du robot pour remplir cet objectif. Comme pour l’humain, l’ouie et la vue sont essentielles pour un robot dans un scénario de conversation. Les progrès de la vision par ordinateur et du traitement audio de la dernière décennie ont révolutionné les capacités de perception des robots. Dans cette thèse, nous développons les contributions suivantes : nous développons d’abord un cadre variationnel bayésien pour suivre plusieurs objets. Le cadre bayésien variationnel fournit des solutions explicites, rendant le processus de suivi très efficace. Cette approche est d’abord appliqué au suivi visuel de plusieurs personnes. Les processus de créations et de destructions sont en adéquation avecle modèle probabiliste proposé pour traiter un nombre variable de personnes. De plus, nous exploitons la complémentarité de la vision et des informations du moteur du robot : d’une part, le mouvement actif du robot peut être intégré au système de suivi visuel pour le stabiliser ; d’autre part, les informations visuelles peuvent être utilisées pour effectuer l’asservissement du moteur. Par la suite, les informations audio et visuelles sont combinées dans le modèle variationnel, pour lisser les trajectoires et déduire le statut acoustique d’une personne : parlant ou silencieux. Pour experimenter un scenario où l’informationvisuelle est absente, nous essayons le modèle pour la localisation et le suivi des locuteurs basé sur l’information acoustique uniquement. Les techniques de déréverbération sont d’abord appliquées, dont le résultat est fourni au système de suivi. Enfin, une variante du modèle de suivi des locuteurs basée sur la distribution de von-Mises est proposée, celle-ci étant plus adaptée aux données directionnelles. Toutes les méthodes proposées sont validées sur des bases de données specifiques à chaque application

    Effects of errorless learning on the acquisition of velopharyngeal movement control

    Get PDF
    Session 1pSC - Speech Communication: Cross-Linguistic Studies of Speech Sound Learning of the Languages of Hong Kong (Poster Session)The implicit motor learning literature suggests a benefit for learning if errors are minimized during practice. This study investigated whether the same principle holds for learning velopharyngeal movement control. Normal speaking participants learned to produce hypernasal speech in either an errorless learning condition (in which the possibility for errors was limited) or an errorful learning condition (in which the possibility for errors was not limited). Nasality level of the participants’ speech was measured by nasometer and reflected by nasalance scores (in %). Errorless learners practiced producing hypernasal speech with a threshold nasalance score of 10% at the beginning, which gradually increased to a threshold of 50% at the end. The same set of threshold targets were presented to errorful learners but in a reversed order. Errors were defined by the proportion of speech with a nasalance score below the threshold. The results showed that, relative to errorful learners, errorless learners displayed fewer errors (50.7% vs. 17.7%) and a higher mean nasalance score (31.3% vs. 46.7%) during the acquisition phase. Furthermore, errorless learners outperformed errorful learners in both retention and novel transfer tests. Acknowledgment: Supported by The University of Hong Kong Strategic Research Theme for Sciences of Learning © 2012 Acoustical Society of Americapublished_or_final_versio

    Sensor fusion in distributed cortical circuits

    Get PDF
    The substantial motion of the nature is to balance, to survive, and to reach perfection. The evolution in biological systems is a key signature of this quintessence. Survival cannot be achieved without understanding the surrounding world. How can a fruit fly live without searching for food, and thereby with no form of perception that guides the behavior? The nervous system of fruit fly with hundred thousand of neurons can perform very complicated tasks that are beyond the power of an advanced supercomputer. Recently developed computing machines are made by billions of transistors and they are remarkably fast in precise calculations. But these machines are unable to perform a single task that an insect is able to do by means of thousands of neurons. The complexity of information processing and data compression in a single biological neuron and neural circuits are not comparable with that of developed today in transistors and integrated circuits. On the other hand, the style of information processing in neural systems is also very different from that of employed by microprocessors which is mostly centralized. Almost all cognitive functions are generated by a combined effort of multiple brain areas. In mammals, Cortical regions are organized hierarchically, and they are reciprocally interconnected, exchanging the information from multiple senses. This hierarchy in circuit level, also preserves the sensory world within different levels of complexity and within the scope of multiple modalities. The main behavioral advantage of that is to understand the real-world through multiple sensory systems, and thereby to provide a robust and coherent form of perception. When the quality of a sensory signal drops, the brain can alternatively employ other information pathways to handle cognitive tasks, or even to calibrate the error-prone sensory node. Mammalian brain also takes a good advantage of multimodal processing in learning and development; where one sensory system helps another sensory modality to develop. Multisensory integration is considered as one of the main factors that generates consciousness in human. Although, we still do not know where exactly the information is consolidated into a single percept, and what is the underpinning neural mechanism of this process? One straightforward hypothesis suggests that the uni-sensory signals are pooled in a ploy-sensory convergence zone, which creates a unified form of perception. But it is hard to believe that there is just one single dedicated region that realizes this functionality. Using a set of realistic neuro-computational principles, I have explored theoretically how multisensory integration can be performed within a distributed hierarchical circuit. I argued that the interaction of cortical populations can be interpreted as a specific form of relation satisfaction in which the information preserved in one neural ensemble must agree with incoming signals from connected populations according to a relation function. This relation function can be seen as a coherency function which is implicitly learnt through synaptic strength. Apart from the fact that the real world is composed of multisensory attributes, the sensory signals are subject to uncertainty. This requires a cortical mechanism to incorporate the statistical parameters of the sensory world in neural circuits and to deal with the issue of inaccuracy in perception. I argued in this thesis how the intrinsic stochasticity of neural activity enables a systematic mechanism to encode probabilistic quantities within neural circuits, e.g. reliability, prior probability. The systematic benefit of neural stochasticity is well paraphrased by the problem of Duns Scotus paradox: imagine a donkey with a deterministic brain that is exposed to two identical food rewards. This may make the animal suffer and die starving because of indecision. In this thesis, I have introduced an optimal encoding framework that can describe the probability function of a Gaussian-like random variable in a pool of Poisson neurons. Thereafter a distributed neural model is proposed that can optimally combine conditional probabilities over sensory signals, in order to compute Bayesian Multisensory Causal Inference. This process is known as a complex multisensory function in the cortex. Recently it is found that this process is performed within a distributed hierarchy in sensory cortex. Our work is amongst the first successful attempts that put a mechanistic spotlight on understanding the underlying neural mechanism of Multisensory Causal Perception in the brain, and in general the theory of decentralized multisensory integration in sensory cortex. Engineering information processing concepts in the brain and developing new computing technologies have been recently growing. Neuromorphic Engineering is a new branch that undertakes this mission. In a dedicated part of this thesis, I have proposed a Neuromorphic algorithm for event-based stereoscopic fusion. This algorithm is anchored in the idea of cooperative computing that dictates the defined epipolar and temporal constraints of the stereoscopic setup, to the neural dynamics. The performance of this algorithm is tested using a pair of silicon retinas

    Sensor fusion in distributed cortical circuits

    Get PDF
    The substantial motion of the nature is to balance, to survive, and to reach perfection. The evolution in biological systems is a key signature of this quintessence. Survival cannot be achieved without understanding the surrounding world. How can a fruit fly live without searching for food, and thereby with no form of perception that guides the behavior? The nervous system of fruit fly with hundred thousand of neurons can perform very complicated tasks that are beyond the power of an advanced supercomputer. Recently developed computing machines are made by billions of transistors and they are remarkably fast in precise calculations. But these machines are unable to perform a single task that an insect is able to do by means of thousands of neurons. The complexity of information processing and data compression in a single biological neuron and neural circuits are not comparable with that of developed today in transistors and integrated circuits. On the other hand, the style of information processing in neural systems is also very different from that of employed by microprocessors which is mostly centralized. Almost all cognitive functions are generated by a combined effort of multiple brain areas. In mammals, Cortical regions are organized hierarchically, and they are reciprocally interconnected, exchanging the information from multiple senses. This hierarchy in circuit level, also preserves the sensory world within different levels of complexity and within the scope of multiple modalities. The main behavioral advantage of that is to understand the real-world through multiple sensory systems, and thereby to provide a robust and coherent form of perception. When the quality of a sensory signal drops, the brain can alternatively employ other information pathways to handle cognitive tasks, or even to calibrate the error-prone sensory node. Mammalian brain also takes a good advantage of multimodal processing in learning and development; where one sensory system helps another sensory modality to develop. Multisensory integration is considered as one of the main factors that generates consciousness in human. Although, we still do not know where exactly the information is consolidated into a single percept, and what is the underpinning neural mechanism of this process? One straightforward hypothesis suggests that the uni-sensory signals are pooled in a ploy-sensory convergence zone, which creates a unified form of perception. But it is hard to believe that there is just one single dedicated region that realizes this functionality. Using a set of realistic neuro-computational principles, I have explored theoretically how multisensory integration can be performed within a distributed hierarchical circuit. I argued that the interaction of cortical populations can be interpreted as a specific form of relation satisfaction in which the information preserved in one neural ensemble must agree with incoming signals from connected populations according to a relation function. This relation function can be seen as a coherency function which is implicitly learnt through synaptic strength. Apart from the fact that the real world is composed of multisensory attributes, the sensory signals are subject to uncertainty. This requires a cortical mechanism to incorporate the statistical parameters of the sensory world in neural circuits and to deal with the issue of inaccuracy in perception. I argued in this thesis how the intrinsic stochasticity of neural activity enables a systematic mechanism to encode probabilistic quantities within neural circuits, e.g. reliability, prior probability. The systematic benefit of neural stochasticity is well paraphrased by the problem of Duns Scotus paradox: imagine a donkey with a deterministic brain that is exposed to two identical food rewards. This may make the animal suffer and die starving because of indecision. In this thesis, I have introduced an optimal encoding framework that can describe the probability function of a Gaussian-like random variable in a pool of Poisson neurons. Thereafter a distributed neural model is proposed that can optimally combine conditional probabilities over sensory signals, in order to compute Bayesian Multisensory Causal Inference. This process is known as a complex multisensory function in the cortex. Recently it is found that this process is performed within a distributed hierarchy in sensory cortex. Our work is amongst the first successful attempts that put a mechanistic spotlight on understanding the underlying neural mechanism of Multisensory Causal Perception in the brain, and in general the theory of decentralized multisensory integration in sensory cortex. Engineering information processing concepts in the brain and developing new computing technologies have been recently growing. Neuromorphic Engineering is a new branch that undertakes this mission. In a dedicated part of this thesis, I have proposed a Neuromorphic algorithm for event-based stereoscopic fusion. This algorithm is anchored in the idea of cooperative computing that dictates the defined epipolar and temporal constraints of the stereoscopic setup, to the neural dynamics. The performance of this algorithm is tested using a pair of silicon retinas

    Towards using Cough for Respiratory Disease Diagnosis by leveraging Artificial Intelligence: A Survey

    Full text link
    Cough acoustics contain multitudes of vital information about pathomorphological alterations in the respiratory system. Reliable and accurate detection of cough events by investigating the underlying cough latent features and disease diagnosis can play an indispensable role in revitalizing the healthcare practices. The recent application of Artificial Intelligence (AI) and advances of ubiquitous computing for respiratory disease prediction has created an auspicious trend and myriad of future possibilities in the medical domain. In particular, there is an expeditiously emerging trend of Machine learning (ML) and Deep Learning (DL)-based diagnostic algorithms exploiting cough signatures. The enormous body of literature on cough-based AI algorithms demonstrate that these models can play a significant role for detecting the onset of a specific respiratory disease. However, it is pertinent to collect the information from all relevant studies in an exhaustive manner for the medical experts and AI scientists to analyze the decisive role of AI/ML. This survey offers a comprehensive overview of the cough data-driven ML/DL detection and preliminary diagnosis frameworks, along with a detailed list of significant features. We investigate the mechanism that causes cough and the latent cough features of the respiratory modalities. We also analyze the customized cough monitoring application, and their AI-powered recognition algorithms. Challenges and prospective future research directions to develop practical, robust, and ubiquitous solutions are also discussed in detail.Comment: 30 pages, 12 figures, 9 table

    Mathematical modelling ano optimization strategies for acoustic source localization in reverberant environments

    Get PDF
    La presente Tesis se centra en el uso de técnicas modernas de optimización y de procesamiento de audio para la localización precisa y robusta de personas dentro de un entorno reverberante dotado con agrupaciones (arrays) de micrófonos. En esta tesis se han estudiado diversos aspectos de la localización sonora, incluyendo el modelado, la algoritmia, así como el calibrado previo que permite usar los algoritmos de localización incluso cuando la geometría de los sensores (micrófonos) es desconocida a priori. Las técnicas existentes hasta ahora requerían de un número elevado de micrófonos para obtener una alta precisión en la localización. Sin embargo, durante esta tesis se ha desarrollado un nuevo método que permite una mejora de más del 30\% en la precisión de la localización con un número reducido de micrófonos. La reducción en el número de micrófonos es importante ya que se traduce directamente en una disminución drástica del coste y en un aumento de la versatilidad del sistema final. Adicionalmente, se ha realizado un estudio exhaustivo de los fenómenos que afectan al sistema de adquisición y procesado de la señal, con el objetivo de mejorar el modelo propuesto anteriormente. Dicho estudio profundiza en el conocimiento y modelado del filtrado PHAT (ampliamente utilizado en localización acústica) y de los aspectos que lo hacen especialmente adecuado para localización. Fruto del anterior estudio, y en colaboración con investigadores del instituto IDIAP (Suiza), se ha desarrollado un sistema de auto-calibración de las posiciones de los micrófonos a partir del ruido difuso presente en una sala en silencio. Esta aportación relacionada con los métodos previos basados en la coherencia. Sin embargo es capaz de reducir el ruido atendiendo a parámetros físicos previamente conocidos (distancia máxima entre los micrófonos). Gracias a ello se consigue una mejor precisión utilizando un menor tiempo de cómputo. El conocimiento de los efectos del filtro PHAT ha permitido crear un nuevo modelo que permite la representación 'sparse' del típico escenario de localización. Este tipo de representación se ha demostrado ser muy conveniente para localización, permitiendo un enfoque sencillo del caso en el que existen múltiples fuentes simultáneas. La última aportación de esta tesis, es el de la caracterización de las Matrices TDOA (Time difference of arrival -Diferencia de tiempos de llegada, en castellano-). Este tipo de matrices son especialmente útiles en audio pero no están limitadas a él. Además, este estudio transciende a la localización con sonido ya que propone métodos de reducción de ruido de las medias TDOA basados en una representación matricial 'low-rank', siendo útil, además de en localización, en técnicas tales como el beamforming o el autocalibrado

    Sonic interactions in virtual environments

    Get PDF
    This book tackles the design of 3D spatial interactions in an audio-centered and audio-first perspective, providing the fundamental notions related to the creation and evaluation of immersive sonic experiences. The key elements that enhance the sensation of place in a virtual environment (VE) are: Immersive audio: the computational aspects of the acoustical-space properties of Virutal Reality (VR) technologies Sonic interaction: the human-computer interplay through auditory feedback in VE VR systems: naturally support multimodal integration, impacting different application domains Sonic Interactions in Virtual Environments will feature state-of-the-art research on real-time auralization, sonic interaction design in VR, quality of the experience in multimodal scenarios, and applications. Contributors and editors include interdisciplinary experts from the fields of computer science, engineering, acoustics, psychology, design, humanities, and beyond. Their mission is to shape an emerging new field of study at the intersection of sonic interaction design and immersive media, embracing an archipelago of existing research spread in different audio communities and to increase among the VR communities, researchers, and practitioners, the awareness of the importance of sonic elements when designing immersive environments
    corecore