
    Multi-Speaker Tracking with Audio-Visual Information for Robot Perception

    Robot perception plays a crucial role in human-robot interaction (HRI). The perception system provides the robot with information about its surroundings and enables it to give feedback. In a conversational scenario, a group of people may chat in front of the robot and move freely. In such situations, robots are expected to understand where the people are, who is speaking, and what they are talking about. This thesis concentrates on answering the first two questions, namely speaker tracking and diarization. We use different modalities of the robot's perception system to achieve this goal. Like sight and hearing for a human being, audio and visual information are the critical cues for a robot in a conversational scenario. Advances in computer vision and audio processing over the last decade have revolutionized robot perception abilities. This thesis makes the following contributions. We first develop a variational Bayesian framework for tracking multiple objects. The variational Bayesian framework yields closed-form, tractable solutions, which makes the tracking process efficient. The framework is first applied to visual multiple-person tracking; birth and death processes are built jointly with the framework to deal with the varying number of people in the scene. Furthermore, we exploit the complementarity of vision and robot motor information: on the one hand, the robot's active motion can be integrated into the visual tracking system to stabilize the tracking; on the other hand, visual information can be used to perform motor servoing. Audio and visual information are then combined in the variational framework to estimate the smooth trajectories of speaking people and to infer the acoustic status of a person: speaking or silent. In addition, we apply the model to acoustic-only speaker localization and tracking, where online dereverberation techniques are first applied and their output is fed to the tracking system. Finally, a variant of the acoustic speaker tracking model based on the von Mises distribution is proposed, which is specifically adapted to directional data. All the proposed methods are validated on datasets specific to each application.
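    The last contribution above adapts the acoustic tracker to directional data through the von Mises distribution. The snippet below is a purely illustrative sketch, not the thesis model: it evaluates a von Mises likelihood for a single direction-of-arrival observation against a few hypothetical speaker directions; all names, angles, and the concentration value are assumptions.

```python
# Minimal sketch (not the thesis model): a von Mises observation likelihood
# for a direction-of-arrival (DOA) measurement, the kind of directional data
# the abstract refers to. All names and values are illustrative assumptions.
import numpy as np
from scipy.stats import vonmises

def doa_likelihoods(observed_doa, candidate_doas, kappa=4.0):
    """Likelihood of an observed DOA (radians) under von Mises densities
    centred on each candidate speaker direction."""
    return np.array([
        vonmises.pdf(observed_doa, kappa, loc=mu) for mu in candidate_doas
    ])

if __name__ == "__main__":
    candidates = np.deg2rad([0.0, 60.0, -90.0])  # hypothetical speaker directions
    obs = np.deg2rad(55.0)                       # one noisy DOA measurement
    lik = doa_likelihoods(obs, candidates)
    print(lik / lik.sum())                       # posterior under a uniform prior
```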

    Neural and computational approaches to auditory scene analysis

    Our perception of the world is highly dependent on the complex processing of sensory inputs by the brain. Hearing is one of those seemingly effortless sensory tasks that enables us to perceive the auditory world and integrate acoustic information from the environment into cognitive experiences. The main purpose of studying the auditory system is to shed light on the neural mechanisms underlying our hearing ability. Understanding the systematic approach of the brain in performing such complicated tasks is an ultimate goal with numerous clinical and intellectual applications. In this thesis, we take advantage of various experimental and computational approaches to understand how the brain analyzes complex auditory scenes. We first focus on investigating the behavioral and neural mechanisms underlying auditory sound segregation, also known as auditory streaming. Employing an informational masking paradigm, we explore the interaction between stimulus-driven and task-driven attentional processes in the auditory cortex using magnetoencephalography (MEG) recordings from the human brain. The results demonstrate close links between the perceptual and neural consequences of auditory stream segregation, suggesting that neural activity can be viewed as an indicator of the auditory streaming percept. We then examine more realistic auditory scenarios consisting of two speakers simultaneously present in an auditory scene and introduce a novel computational approach for decoding the attentional state of listeners in such environments. The proposed model focuses on an efficient implementation of a decoder for tracking the cognitive state of the brain, inspired by the neural representation of auditory objects in the auditory cortex. The structure is based on a state-space model with the recorded MEG signal and the individual speech envelopes as input and the probability of attending to the target speaker as output. The proposed approach benefits from accurate and highly resolved estimation of the attentional state in time, as well as from the inherent model-based dynamic denoising of the underlying state-space model, which makes it possible to reliably decode the attentional state under very low SNR conditions. As part of this research work, we also investigate the neural representation of ambiguous auditory stimuli at the level of the auditory cortex. In perceiving a typical auditory scene, we may receive incomplete or ambiguous auditory information from the environment. This can lead to multiple interpretations of the same acoustic scene and to the formation of an ambiguous perceptual state in the brain. Here, in a series of experimental studies, we focus on a particular example of an ambiguous stimulus (an ambiguous Shepard tone pair) and investigate the neural correlates of the contextual effect and perceptual biasing using MEG. The results from psychoacoustic and neural recordings suggest a set of hypotheses about the underlying neural mechanisms of short-term memory and expectation modulation in the nervous system.
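    The attention decoder described above is a state-space model taking the MEG signal and the speech envelopes as input and producing the probability of attending to the target speaker. The sketch below is a heavily simplified stand-in, not the published decoder: it scores a hypothetically reconstructed stimulus envelope against each speaker's envelope by windowed correlation and applies a first-order recursion in place of the full state-space smoothing; all names and constants are illustrative assumptions.

```python
# Minimal sketch, not the MEG decoder from the abstract: windowed correlation
# between a (hypothetically) reconstructed stimulus envelope and two speakers'
# envelopes, with first-order recursive smoothing standing in for the dynamic
# denoising that a state-space model provides.
import numpy as np

def attention_evidence(reconstructed, env_a, env_b, win=128, alpha=0.9):
    """Return a smoothed probability-like score of attending to speaker A."""
    probs, state = [], 0.5
    for start in range(0, len(reconstructed) - win, win):
        seg = slice(start, start + win)
        r_a = np.corrcoef(reconstructed[seg], env_a[seg])[0, 1]
        r_b = np.corrcoef(reconstructed[seg], env_b[seg])[0, 1]
        evidence = 1.0 / (1.0 + np.exp(-(r_a - r_b) * 10.0))  # squash to (0, 1)
        state = alpha * state + (1 - alpha) * evidence        # temporal smoothing
        probs.append(state)
    return np.array(probs)
```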

    Innovative Methods and Materials in Structural Health Monitoring of Civil Infrastructures

    In the past, when elements in structures were composed of perishable materials, such as wood, the maintenance of houses, bridges, etc., was considered of vital importance for their safe use and to preserve their efficiency. With the advent of materials such as reinforced concrete and steel, given their relatively long useful life, periodic and constant maintenance has often been considered a secondary concern. When it was realized that even structures fabricated with these materials have a useful life that ends, and that this end was being approached, maintenance planning became an important and non-negligible aspect. Thus, the concept of structural health monitoring (SHM) was introduced, designed, and implemented as a multidisciplinary method. Computational mechanics, static and dynamic analysis of structures, electronics, sensors, and, recently, the Internet of Things (IoT) and artificial intelligence (AI) are required, but it is also important to consider new materials, especially those with intrinsic self-diagnosis characteristics, and to use measurement and survey methods typical of modern geomatics, such as satellite surveys and highly sophisticated laser tools.

    Metamodel-based uncertainty quantification for the mechanical behavior of braided composites

    The main design requirement for any high-performance structure is minimal dead weight. Producing lighter structures for the aerospace and automotive industries directly leads to fuel efficiency and, hence, cost reduction. For wind energy, lighter wings allow larger rotor blades and, consequently, better performance. Prosthetic implants for missing body parts and athletic equipment such as rackets and sticks should also be lightweight for augmented functionality. Additional demands, depending on the application, can include improved fatigue strength and damage tolerance, crashworthiness, and temperature and corrosion resistance. Fiber-reinforced composite materials lie at the intersection of all the above requirements, since they offer competitive stiffness and ultimate strength levels at much lower weight than metals, as well as high optimization and design potential due to their versatility. Braided composites are a special category with continuous fiber bundles interlaced around a preform. The automated braiding manufacturing process allows simultaneous material-structure assembly and, therefore, high-rate production with minimal material waste. The multi-step material processes and the intrinsic heterogeneity are the basic origins of the variability observed during mechanical characterization and operation of composite end-products. Conservative safety factors are applied during the design process to account for uncertainties, even though stochastic modeling approaches lead to more rational estimations of structural safety and reliability. Such approaches require statistical modeling of the uncertain parameters, which is quite expensive to perform experimentally. A robust virtual uncertainty quantification framework is presented, able to integrate material and geometric uncertainties of different natures and to statistically assess the response variability of braided composites in terms of effective properties. Information-passing multiscale algorithms are employed for high-fidelity predictions of stiffness and strength. To bypass the numerical cost of the repeated multiscale model evaluations required by the probabilistic approach, smart and efficient solutions are needed. Surrogate models are thus trained to map manifolds at different scales and eventually substitute for the finite element models. The use of machine learning is viable for uncertainty quantification, optimization, and reliability applications of textile materials, but not straightforward for failure responses with complex response surfaces. Novel techniques based on variable-fidelity data and hybrid surrogate models are also integrated. Uncertain parameters are classified according to their significance for the corresponding response via variance-based global sensitivity analysis procedures. Quantification of the random properties in terms of mean and variance can be achieved by inverse approaches based on Bayesian inference. All stochastic and machine learning methods included in the framework are non-intrusive and data-driven, ensuring direct extension to more load cases and different materials. Moreover, experimental validation of the adopted multiscale models is presented, and an application of stochastic recreation of random textile yarn distortions based on computed tomography data is demonstrated.
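    To illustrate the surrogate-based uncertainty propagation described above, the sketch below fits a Gaussian-process surrogate to a handful of evaluations of a cheap placeholder function standing in for the expensive multiscale finite-element model, then runs Monte Carlo sampling on the surrogate to estimate the mean and variance of an effective property. The placeholder model, input distributions, and parameter ranges are assumptions for illustration only, not the thesis framework.

```python
# Minimal sketch of surrogate-based uncertainty propagation (not the thesis
# framework): a Gaussian-process surrogate is fitted to a few evaluations of a
# placeholder "multiscale model", then cheap Monte Carlo sampling on the
# surrogate estimates the mean and variance of an effective property.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def multiscale_model(x):
    """Placeholder for an expensive multiscale FE evaluation:
    x = [fiber_volume_fraction, braid_angle_deg] -> effective stiffness (GPa)."""
    vf, angle = x
    return 120.0 * vf * np.cos(np.deg2rad(angle)) ** 2 + 8.0

rng = np.random.default_rng(0)
X_train = rng.uniform([0.45, 15.0], [0.65, 45.0], size=(30, 2))  # design of experiments
y_train = np.array([multiscale_model(x) for x in X_train])

surrogate = GaussianProcessRegressor(kernel=RBF(length_scale=[0.1, 10.0]),
                                     normalize_y=True).fit(X_train, y_train)

# Monte Carlo on the surrogate: uncertain inputs with assumed distributions.
X_mc = np.column_stack([rng.normal(0.55, 0.03, 100_000),   # fiber volume fraction
                        rng.normal(30.0, 2.0, 100_000)])   # braid angle (deg)
y_mc = surrogate.predict(X_mc)
print(f"effective stiffness: mean={y_mc.mean():.2f} GPa, std={y_mc.std():.2f} GPa")
```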

    Stochastic Methods for Fine-Grained Image Segmentation and Uncertainty Estimation in Computer Vision

    In this dissertation, we exploit concepts from probability theory, stochastic methods, and machine learning to address three existing limitations of deep-learning-based models for image understanding. First, although convolutional neural networks (CNNs) have substantially improved the state of the art in image understanding, conventional CNNs provide segmentation masks that poorly adhere to object boundaries, a critical limitation for many potential applications. Second, training deep learning models requires large amounts of carefully selected and annotated data, but large-scale annotation of image segmentation datasets is often prohibitively expensive. And third, conventional deep learning models also lack the capability of uncertainty estimation, which compromises both decision making and model interpretability. To address these limitations, we introduce the Region Growing Refinement (RGR) algorithm, an unsupervised post-processing algorithm that exploits Monte Carlo sampling and pixel similarities to propagate high-confidence labels into regions of low-confidence classification. The probabilistic Region Growing Refinement (pRGR) provides RGR with a rigorous mathematical foundation that exploits concepts of Bayesian estimation and variance reduction techniques. Experiments demonstrate both the effectiveness of (p)RGR for the refinement of segmentation predictions and its suitability for uncertainty estimation, since the variance estimates obtained in its Monte Carlo iterations are highly correlated with segmentation accuracy. We also introduce FreeLabel, an intuitive open-source web interface that exploits RGR to allow users to obtain high-quality segmentation masks with just a few freehand scribbles, in a matter of seconds. Designed to benefit the computer vision community, FreeLabel can be used for both crowdsourced and private annotation and has a modular structure that can be easily adapted to any image dataset. The practical relevance of the methods developed in this dissertation is illustrated through applications in agricultural and healthcare-related domains. We have combined RGR and modern CNNs for fine segmentation of fruit flowers, motivated by the importance of automated bloom intensity estimation for optimizing fruit orchard management and, possibly, automating procedures such as flower thinning and pollination. We also exploited an early version of FreeLabel to annotate novel datasets for segmentation of fruit flowers, which are currently publicly available. Finally, this dissertation also describes work on fine segmentation and gaze estimation for images collected from assisted living environments, with the ultimate goal of assisting geriatricians in evaluating the health status of patients in such facilities.
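    As a rough illustration of the label-propagation idea behind RGR (not the published (p)RGR algorithm, which adds Monte Carlo sampling and a probabilistic formulation), the sketch below grows labels from high-confidence pixels into neighbouring low-confidence pixels whenever their colours are similar enough; the confidence and colour thresholds and the 4-connected neighbourhood are illustrative assumptions.

```python
# Minimal sketch of confidence-driven label propagation in the spirit of RGR
# (not the published algorithm): labels from high-confidence pixels are grown
# into neighbouring low-confidence pixels when colours are similar enough.
from collections import deque
import numpy as np

def grow_labels(image, labels, confidence, conf_thresh=0.9, color_thresh=20.0):
    """image: HxWx3 float array, labels: HxW int array, confidence: HxW in [0, 1]."""
    h, w = labels.shape
    out = np.where(confidence >= conf_thresh, labels, -1)   # -1 = undecided
    queue = deque(zip(*np.nonzero(out >= 0)))               # seed with confident pixels
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and out[ny, nx] == -1:
                if np.linalg.norm(image[ny, nx] - image[y, x]) < color_thresh:
                    out[ny, nx] = out[y, x]                  # inherit the neighbour's label
                    queue.append((ny, nx))
    return out
```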

    Engineering Dynamics and Life Sciences

    From the Preface: This is the fourteenth time that the conference "Dynamical Systems: Theory and Applications" has gathered a large group of outstanding scientists and engineers who deal with a broad range of problems in theoretical and applied dynamics. Organization of the conference would not have been possible without the great effort of the staff of the Department of Automation, Biomechanics and Mechatronics. The conference is held under the patronage of the Committee of Mechanics of the Polish Academy of Sciences and the Ministry of Science and Higher Education of Poland. It is a great pleasure that our invitation has been accepted by a record number of people in the history of our conference, including good colleagues and friends as well as a large group of researchers and scientists who decided to participate in the conference for the first time. With pride and satisfaction we welcomed over 180 participants from 31 countries all over the world, who decided to share the results of their research and many years of experience in the discipline of dynamical systems by submitting many very interesting papers. This year, the DSTA Conference Proceedings were split into three volumes entitled "Dynamical Systems" with the respective subtitles: Vibration, Control and Stability of Dynamical Systems; Mathematical and Numerical Aspects of Dynamical System Analysis; and Engineering Dynamics and Life Sciences. Additionally, two volumes of Springer Proceedings in Mathematics and Statistics, entitled "Dynamical Systems in Theoretical Perspective" and "Dynamical Systems in Applications", will also be published.

    6th International Conference on Mechanical Models in Structural Engineering

    This ebook contains the 37 full papers submitted to the 6th International Conference on Mechanical Models in Structural Engineering (CMMOST 2021), held in Valladolid in December 2021.

    Application of generative models in speech processing tasks

    Generative probabilistic and neural models of the speech signal are shown to be effective in speech synthesis and speech enhancement, where generating natural and clean speech is the goal. This thesis develops two probabilistic signal processing algorithms based on the source-filter model of speech production, and two based on neural generative models of the speech signal. They are a model-based speech enhancement algorithm for ad-hoc microphone arrays, called GRAB; a probabilistic generative model of speech called PAT; a neural generative F0 model called TEReTA; and a Bayesian enhancement network, called BaWN, that incorporates a neural generative model of speech called WaveNet. PAT and TEReTA aim to develop better generative models for speech synthesis. BaWN and GRAB aim to improve the naturalness and noise robustness of speech enhancement algorithms. The Probabilistic Acoustic Tube (PAT) is a probabilistic generative model of speech whose basis is the source-filter model. The highlights of the model are threefold. First, it is among the very first works to build a complete probabilistic model for speech. Second, it has a well-designed model for the phase spectrum of speech, which has been hard to model and is often neglected. Third, it models the AM-FM effects in speech, which are perceptually significant but often ignored in frame-based speech processing algorithms. Experiments show that the proposed model has good potential for a number of speech processing tasks. TEReTA generates pitch contours by incorporating a theoretical model of pitch planning, the piece-wise linear target approximation (TA) model, as the output layer of a deep recurrent neural network. It aims to model semantic variations in the F0 contour, which is challenging for existing networks. By incorporating the TA model, TEReTA is able to memorize semantic context and capture the semantic variations. Experiments on contrastive focus verify TEReTA's ability in semantic modeling. BaWN is a neural-network-based algorithm for single-channel enhancement. The biggest challenges for neural-network-based speech enhancement algorithms are poor generalizability to unseen noises and the unnaturalness of the output speech. By incorporating a neural generative model, WaveNet, in a Bayesian framework, where WaveNet predicts the prior for speech and a separate enhancement network incorporates the likelihood function, BaWN is able to achieve satisfactory generalizability and a good intelligibility score for its output, even when the noisy training set is small. GRAB is a beamforming algorithm for ad-hoc microphone arrays. Enhancing speech with an ad-hoc microphone array is challenging because of inaccuracies in position and interference calibration. Inspired by the source-filter model, GRAB does not rely on any position or interference calibration. Instead, it incorporates a source-filter speech model and minimizes the energy that cannot be accounted for by the model. Objective and subjective evaluations on both simulated and real-world data show that GRAB is able to suppress noise effectively while keeping the speech natural and dry. The final chapters discuss the implications of this work for future research in speech processing.
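    Both PAT and GRAB build on the classical source-filter model of speech production mentioned above. The sketch below illustrates that underlying model only, not the probabilistic formulations developed in the thesis: a periodic pulse train (the source) is shaped by an all-pole filter (a stand-in for the vocal tract). The sample rate, pitch, and formant-like pole locations are illustrative assumptions.

```python
# Minimal sketch of the classical source-filter model that PAT and GRAB build
# on (not their probabilistic formulations): a periodic pulse-train "source"
# is shaped by an all-pole vocal-tract "filter". All values are illustrative.
import numpy as np
from scipy.signal import lfilter

fs = 16000          # sample rate (Hz)
f0 = 120            # fundamental frequency of the voiced source (Hz)
duration = 0.5      # seconds

# Source: impulse train at the pitch period.
n = int(fs * duration)
source = np.zeros(n)
source[::fs // f0] = 1.0

# Filter: all-pole filter with two resonances standing in for vocal-tract formants.
poles = [0.97 * np.exp(1j * 2 * np.pi * f / fs) for f in (700, 1200)]
a = np.poly(poles + [p.conjugate() for p in poles]).real   # denominator coefficients
speech_like = lfilter([1.0], a, source)                    # synthetic "speech" signal
```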
