    Detection of activity and position of speakers by using deep neural networks and acoustic data augmentation

    Speaker LOCalization (SLOC) has been the focus of numerous works in which SLOC is performed on pure speech data, which requires an Oracle Voice Activity Detection (VAD) algorithm. This ideal working condition is not satisfied in a real-world scenario, where the employed VADs do commit errors. This work addresses the issue with an extensive analysis of the relationship between several data-driven VAD and SLOC models, and proposes a reliable framework for joint VAD and SLOC. The effectiveness of the approach is assessed in a multi-room scenario, which is close to a real-world environment. To the best of the authors' knowledge, only one contribution proposes a single framework for VAD and SLOC in this scenario, and that solution does not rely on data-driven approaches. This work extends the authors' previous research on the VAD and SLOC tasks by proposing numerous advancements to the original neural network architectures. In detail, four different models based on convolutional neural networks (CNNs) are tested in order to highlight the advantages of the introduced novelties. In addition, two different CNN models are studied for SLOC. Training of the data-driven models is improved through a specific data augmentation technique, in which the room impulse responses (RIRs) of two virtual rooms are generated from the room size, the reverberation time, and the placement of microphones and sources. Finally, the only other framework for simultaneous detection and localization in a multi-room scenario is taken into account for a fair comparison with the proposed method. As a result, the proposed method is more accurate than the baseline framework, and remarkable improvements are observed especially when the data augmentation techniques are applied to both the VAD and SLOC tasks.
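The augmentation step described in this abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the abstract says the RIRs are generated from room size, reverberation time, and microphone and source placement, whereas the toy generator below uses only a reverberation time and shapes white noise with an exponential decay. The function names `synthetic_rir` and `reverberate`, the sampling rate, and the normalization are all assumptions made for illustration.

```python
import numpy as np

def synthetic_rir(rt60_s, fs, seed=0):
    """Toy RIR (assumption, not a geometric room model): white noise
    shaped by an exponential decay that reaches -60 dB at rt60_s."""
    rng = np.random.default_rng(seed)
    n = int(fs * rt60_s)
    t = np.arange(n) / fs
    # Amplitude drops by 60 dB (a factor of 10**-3) after rt60_s seconds.
    envelope = 10.0 ** (-3.0 * t / rt60_s)
    rir = rng.standard_normal(n) * envelope
    rir[0] = 1.0  # crude stand-in for the direct path
    return rir / np.max(np.abs(rir))

def reverberate(speech, rir):
    """Convolve a dry signal with an RIR, keep the original length,
    and peak-normalize the result."""
    wet = np.convolve(speech, rir)[: len(speech)]
    peak = np.max(np.abs(wet))
    return wet / peak if peak > 0 else wet

# Hypothetical usage: augment one dry utterance with two reverberation times.
fs = 16000
dry = np.random.default_rng(1).standard_normal(fs)  # placeholder for real speech
augmented = [reverberate(dry, synthetic_rir(rt60, fs)) for rt60 in (0.3, 0.7)]
```

Each augmented copy keeps the labels of the dry utterance, so the training set grows without new annotation. A geometric simulator would additionally encode the source and microphone positions, which matters for the localization task.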

    Secure indoor navigation and operation of mobile robots

    In future work environments, robots will navigate and work side by side with humans. This raises major challenges related to the safety of these robots. In this dissertation, three tasks have been realized: 1) implementing a localization and navigation system based on the StarGazer sensor and a Kalman filter; 2) realizing a human-robot interaction system that uses a Kinect sensor with BPNN and SVM models to define the gestures; and 3) realizing a new collision avoidance system, which generates collision-free paths based on the interaction between the human and the robot.

    A Novel Electrocardiogram Segmentation Algorithm Using a Multiple Model Adaptive Estimator

    This thesis presents a novel electrocardiogram (ECG) processing algorithm based on a Multiple Model Adaptive Estimator (MMAE) for a physiological monitoring system. Twenty ECG signals from the MIT ECG database were used to develop system models for the MMAE. The P-wave, QRS complex, and T-wave segments of the characteristic ECG waveform were used to develop hypothesis filter banks. By adding a threshold filter-switching algorithm to the conventional MMAE implementation, the device mimics the way a human analyzer searches the complex ECG signal for a usable temporal landmark and then branches out to find the other key wave components and their timing. The twenty signals and an additional signal from an animal exsanguination experiment were then used to test the algorithm. Using a conditional hypothesis-testing algorithm, the MMAE correctly identified the ECG signal segments corresponding to the hypothesis models with a 96.8% accuracy rate over the 11,539 possible segments tested. The robust MMAE algorithm also detected any misalignments in the filter hypotheses and automatically restarted filters within the MMAE to synchronize the hypotheses with the incoming signal. Finally, the MMAE selects the optimal filter bank based on incoming ECG measurements. The algorithm also provides critical heart-related information such as heart rate and the QT and PR intervals from the ECG signal. This analyzer could easily be added as a software update to the standard physiological monitors universally used in emergency vehicles and treatment facilities, potentially saving thousands of lives and reducing the pain and suffering of the injured.
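For readers unfamiliar with MMAE, the core step is a Bayesian re-weighting of the hypothesis filters by how well each one predicts the latest measurement. The sketch below shows that textbook update for scalar residuals. It is not taken from the thesis: the function name `update_hypothesis_probs`, the scalar Gaussian likelihood, and the probability floor are assumptions, and the thesis's threshold filter-switching and filter-restart logic are not modeled.

```python
import numpy as np

def update_hypothesis_probs(prior, residuals, variances, floor=1e-3):
    """One MMAE probability update for scalar measurements (illustrative).

    prior      : probability of each hypothesis before the measurement
    residuals  : measurement minus each filter's predicted measurement
    variances  : each filter's predicted residual variance
    floor      : lower bound that keeps a hypothesis from being locked out
    """
    prior = np.asarray(prior, dtype=float)
    r = np.asarray(residuals, dtype=float)
    s = np.asarray(variances, dtype=float)
    # Gaussian likelihood of each residual under its own hypothesis.
    likelihood = np.exp(-0.5 * r**2 / s) / np.sqrt(2.0 * np.pi * s)
    posterior = likelihood * prior
    total = posterior.sum()
    if total == 0.0:
        # Every likelihood underflowed; keep the prior rather than divide by zero.
        return prior
    posterior = np.maximum(posterior / total, floor)
    return posterior / posterior.sum()

# Hypothetical usage: three hypothesis filters (e.g. P-wave, QRS, T-wave models).
probs = update_hypothesis_probs([1/3, 1/3, 1/3], [0.1, 2.0, 5.0], [1.0, 1.0, 1.0])
best = int(np.argmax(probs))  # index of the hypothesis with the smallest residual here
```

The floor is a common safeguard in multiple-model estimation: without it, a hypothesis whose probability reaches zero can never recover, even when the signal later matches it.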

    Speech Modeling and Robust Estimation for Diagnosis of Parkinson’s Disease

    Articulated human tracking and behavioural analysis in video sequences

    Recently, there has been a dramatic growth of interest in the observation and tracking of human subjects through video sequences. Arguably, the principal impetus has come from the perceived demand for technological surveillance; however, applications in entertainment, intelligent domiciles and medicine are also increasing. This thesis examines human articulated tracking and the classification of human movement, first separately and then as a sequential process. First, this thesis considers the development and training of a 3D model of human body structure and dynamics. To process video sequences, an observation model is also designed with a multi-component likelihood based on edge, silhouette and colour. This is defined on the articulated limbs, and visible from a single or multiple cameras, each of which may be calibrated from that sequence. Second, for behavioural analysis, we develop a methodology in which actions and activities are described by semantic labels generated from a Movement Cluster Model (MCM). Third, a Hierarchical Partitioned Particle Filter (HPPF) was developed for human tracking that allows multi-level parameter search consistent with the body structure. This tracker relies on the articulated motion prediction provided by the MCM at pose or limb level. Fourth, tracking and movement analysis are integrated to generate a probabilistic activity description with action labels. The implemented algorithms for tracking and behavioural analysis are tested extensively and independently against ground truth on human tracking and surveillance datasets. Dynamic models are shown to predict and generate synthetic motion, while MCM recovers both periodic and non-periodic activities, defined either on the whole body or at the limb level. Tracking results are comparable with the state of the art; however, the integrated behaviour analysis adds to the value of the approach. Overseas Research Students Awards Scheme (ORSAS)