
    Detection of acoustic events with application to environment monitoring

    The goal of this work is to present different detection techniques and to assess their feasibility for detecting unknown acoustic signals under a range of noise conditions. These conditions replicate those commonly found in real-world acoustic scenarios, where information about the noise and signal characteristics is frequently lacking. For this purpose, different extensions of the energy detector, as well as new structures for improving robustness in detection, are considered and explained. Furthermore, three research lines of application are presented in which the energy detector and its extensions are used to improve the localization accuracy and the classification rates of acoustic sounds.
    Moragues Escrivá, J.; Serrano Cartagena, A.; Lara Martínez, G.; Gosálbez Castillo, J.; Vergara Domínguez, L. (2012). Detection of acoustic events with application to environment monitoring. Waves. 4:25-33. http://hdl.handle.net/10251/56161
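The energy-detector family referred to in this abstract can be illustrated with a minimal sketch: compute short-time frame energy and flag frames that exceed an adaptive noise-floor threshold. The frame length, hop size, and threshold factor below are illustrative choices, not parameters from the paper.

```python
import numpy as np

def energy_detector(x, frame_len=256, hop=128, k=3.0):
    """Flag frames whose short-time energy exceeds an adaptive
    threshold (median frame energy scaled by k, a crude noise-floor
    estimate when events are sparse)."""
    frames = [x[i:i + frame_len]
              for i in range(0, len(x) - frame_len + 1, hop)]
    energy = np.array([np.sum(f ** 2) for f in frames])
    threshold = k * np.median(energy)
    return energy > threshold

# Synthetic check: background noise with a short tonal burst in the middle
rng = np.random.default_rng(0)
x = rng.normal(0, 0.1, 8000)
x[4000:4400] += np.sin(2 * np.pi * 440 * np.arange(400) / 8000)
detections = energy_detector(x)
```

The median-based threshold is what makes the detector usable when the noise level is unknown; the extensions discussed in the paper aim at improving robustness beyond this basic scheme.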

    A pervasive body sensor network for monitoring post-operative recovery

    Over the past decade, miniaturisation and cost reduction brought about by the semiconductor industry have led to computers smaller in size than a pin head, powerful enough to carry out the processing required, and affordable enough to be disposable. Similar technological advances in wireless communication, sensor design, and energy storage have resulted in the development of wireless "Body Sensor Network" (BSN) platforms comprising tiny integrated micro sensors with onboard processing and wireless data transfer capability, offering the prospect of pervasive and continuous home health monitoring. In surgery, the reduced trauma of minimally invasive interventions, combined with initiatives to reduce length of hospital stay and a socioeconomic drive to reduce hospitalisation costs, has resulted in a trend towards earlier discharge from hospital. There is now a real need for objective, pervasive, and continuous post-operative home recovery monitoring systems. Surgical recovery is a multi-faceted and dynamic process involving biological, physiological, functional, and psychological components. Functional recovery (physical independence, activities of daily living, and mobility) is recognised as a good global indicator of a patient's post-operative course, but has traditionally been difficult to objectively quantify. This thesis outlines the development of a pervasive wireless BSN system to objectively monitor the functional recovery of post-operative patients at home. Biomechanical markers were identified as surrogate measures for activities of daily living and mobility impairment, and an ear-worn activity recognition (e-AR) sensor containing a three-axis accelerometer and a pulse oximeter was used to collect this data. A simulated home environment was created to test a Bayesian classifier framework with multivariate Gaussians to model activity classes. A real-time activity index was used to provide information on the intensity of activity being performed.
Mobility impairment was simulated with bracing systems, and a multiresolution wavelet analysis and margin-based feature selection framework was used to detect impaired mobility. The e-AR sensor was tested in a home environment before its clinical use in monitoring the post-operative home recovery of real patients who had undergone surgery. Such a system may eventually form part of an objective pervasive home recovery monitoring system tailored to the needs of today's post-operative patient.
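The "Bayesian classifier framework with multivariate Gaussians" mentioned above can be sketched as follows: fit one Gaussian per activity class and assign new feature vectors to the maximum-a-posteriori class. The class labels and two-dimensional features here are synthetic placeholders, not the thesis's actual e-AR feature set.

```python
import numpy as np

class GaussianActivityClassifier:
    """One multivariate Gaussian per activity class; MAP decision rule."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.params_ = {}
        for c in self.classes_:
            Xc = X[y == c]
            mu = Xc.mean(axis=0)
            # Small ridge keeps the covariance invertible
            cov = np.cov(Xc, rowvar=False) + 1e-6 * np.eye(X.shape[1])
            self.params_[c] = (mu, np.linalg.inv(cov),
                               np.log(np.linalg.det(cov)),
                               np.log(len(Xc) / len(X)))
        return self

    def predict(self, X):
        scores = []
        for c in self.classes_:
            mu, icov, logdet, logprior = self.params_[c]
            d = X - mu
            # Log Gaussian density up to a constant: Mahalanobis + logdet
            logpdf = -0.5 * (np.einsum('ij,jk,ik->i', d, icov, d) + logdet)
            scores.append(logpdf + logprior)
        return self.classes_[np.argmax(scores, axis=0)]

# Synthetic accelerometer-like features for two activity classes
rng = np.random.default_rng(1)
walk = rng.normal([1.0, 2.0], 0.1, (100, 2))
rest = rng.normal([-1.0, -2.0], 0.1, (100, 2))
X = np.vstack([walk, rest])
y = np.array([0] * 100 + [1] * 100)
clf = GaussianActivityClassifier().fit(X, y)
pred = clf.predict(np.array([[1.0, 2.0], [-1.0, -2.0]]))
```

In practice the features would be derived from the accelerometer stream (e.g. wavelet coefficients, as the impairment-detection part of the thesis uses), but the decision rule is the same.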

    Deep Learning for Distant Speech Recognition

    Deep learning is an emerging technology that is considered one of the most promising directions for reaching higher levels of artificial intelligence. Among other achievements, building computers that understand speech represents a crucial leap towards intelligent machines. Despite the great efforts of the past decades, however, natural and robust human-machine speech interaction still appears to be out of reach, especially when users interact with a distant microphone in noisy and reverberant environments. These disturbances severely hamper the intelligibility of a speech signal, making Distant Speech Recognition (DSR) one of the major open challenges in the field. This thesis addresses that scenario and proposes novel techniques, architectures, and algorithms to improve the robustness of distant-talking acoustic models. We first elaborate on methodologies for realistic data contamination, with a particular emphasis on DNN training with simulated data. We then investigate approaches for better exploiting speech contexts, proposing original methodologies for both feed-forward and recurrent neural networks. Lastly, inspired by the idea that cooperation across different DNNs could be the key to counteracting the harmful effects of noise and reverberation, we propose a novel deep learning paradigm called network of deep neural networks. The analysis of the original concepts was based on extensive experimental validations conducted on both real and simulated data, considering different corpora, microphone configurations, environments, noisy conditions, and ASR tasks.
    Comment: PhD Thesis Unitn, 201
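The "realistic data contamination" step described above is commonly implemented by convolving clean speech with a room impulse response (RIR) and adding noise at a target SNR. A minimal sketch, using a synthetic exponentially decaying RIR rather than a measured one:

```python
import numpy as np

def contaminate(clean, rir, snr_db, rng):
    """Simulate a distant-talking recording: convolve clean speech with
    a room impulse response, then add noise at the requested SNR."""
    reverberant = np.convolve(clean, rir)[:len(clean)]
    noise = rng.normal(0, 1, len(reverberant))
    sig_pow = np.mean(reverberant ** 2)
    noise_pow = np.mean(noise ** 2)
    # Scale the noise so that 10*log10(sig_pow / noise_pow') == snr_db
    scale = np.sqrt(sig_pow / (noise_pow * 10 ** (snr_db / 10)))
    return reverberant + scale * noise

# Hypothetical RIR: noise with an exponential decay envelope
rng = np.random.default_rng(0)
rir = rng.normal(0, 1, 2000) * np.exp(-np.arange(2000) / 300.0)
rir /= np.abs(rir).max()
clean = np.sin(2 * np.pi * 220 * np.arange(16000) / 16000)
noisy = contaminate(clean, rir, snr_db=10, rng=rng)
```

Training acoustic models on such contaminated copies of clean corpora is one way to obtain matched distant-talking training data without large-scale far-field recording campaigns.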

    Acoustic event detection and localization using distributed microphone arrays

    Automatic acoustic scene analysis is a complex task that involves several functionalities: detection (time), localization (space), separation, recognition, etc. This thesis focuses on both acoustic event detection (AED) and acoustic source localization (ASL), when several sources may be simultaneously present in a room. In particular, the experimentation work is carried out with a meeting-room scenario. Unlike previous works that either employed models of all possible sound combinations or additionally used video signals, in this thesis, the time-overlapping sound problem is tackled by exploiting the signal diversity that results from the usage of multiple microphone array beamformers. The core of this thesis work is a rather computationally efficient approach that consists of three processing stages. In the first stage, a set of (null) steering beamformers is used to carry out diverse partial signal separations, using multiple arbitrarily located linear microphone arrays, each of them composed of a small number of microphones. In the second stage, each beamformer output goes through a classification step, which uses models for all the targeted sound classes (HMM-GMM, in the experiments). Then, in a third stage, the classifier scores, whether intra- or inter-array, are combined using a probabilistic criterion (like MAP) or a machine learning fusion technique (fuzzy integral (FI), in the experiments). The above-mentioned processing scheme is applied in this thesis to a set of complexity-increasing problems, which are defined by the assumptions made regarding identities (plus time endpoints) and/or positions of sounds. In fact, the thesis report starts with the problem of unambiguously mapping the identities to the positions, continues with AED (positions assumed) and ASL (identities assumed), and ends with the integration of AED and ASL in a single system, which does not need any assumption about identities or positions.
The evaluation experiments are carried out in a meeting-room scenario, where two sources are temporally overlapped; one of them is always speech and the other is an acoustic event from a pre-defined set. Two different databases are used: one produced by merging signals actually recorded in the UPC's department smart-room, and the other consisting of overlapping sound signals directly recorded in the same room in a rather spontaneous way. From the experimental results with a single array, it can be observed that the proposed detection system performs better than either the model-based system or a blind source separation based system. Moreover, the product-rule-based combination and the FI-based fusion of the scores resulting from the multiple arrays improve the accuracies further. On the other hand, the posterior position assignment is performed with a very small error rate. Regarding ASL, and assuming an accurate AED system output, the 1-source localization performance of the proposed system is slightly better than that of the widely used SRP-PHAT system working in an event-based mode, and it performs significantly better than the latter in the more complex 2-source scenario. Finally, though the joint system suffers from a slight degradation in terms of classification accuracy with respect to the case where the source positions are known, it shows the advantage of carrying out the two tasks, recognition and localization, with a single system, and it allows the inclusion of information about the prior probabilities of the source positions. It is also worth noting that, although the acoustic scenario used for experimentation is rather limited, the approach and its formalism were developed for a general case, where the number and identities of sources are not constrained.
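The third-stage score combination mentioned above can be sketched for the product-rule case: multiply the per-array class posteriors (equivalently, sum their logs) and pick the MAP class. The posterior values below are made-up illustrations, not results from the thesis.

```python
import numpy as np

def product_rule_fusion(score_matrix):
    """Combine per-array class posteriors with the product rule.

    score_matrix has shape (n_arrays, n_classes); the product across
    arrays is computed as a sum of logs for numerical stability."""
    log_scores = np.log(np.clip(score_matrix, 1e-12, None))
    combined = log_scores.sum(axis=0)   # shape: (n_classes,)
    return combined.argmax(), combined

# Hypothetical posteriors from three arrays over four sound classes
posteriors = np.array([
    [0.60, 0.20, 0.10, 0.10],   # array 1
    [0.30, 0.40, 0.20, 0.10],   # array 2
    [0.50, 0.25, 0.15, 0.10],   # array 3
])
best, log_post = product_rule_fusion(posteriors)
```

Note that array 2 alone would favour class 1, but the product rule lets the stronger agreement of arrays 1 and 3 dominate; the fuzzy-integral fusion used in the thesis replaces this fixed rule with a learned aggregation.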