4,072 research outputs found

    Semi-supervised Deep Generative Modelling of Incomplete Multi-Modality Emotional Data

    Full text link
    There are threefold challenges in emotion recognition. First, it is difficult to recognize human's emotional states only considering a single modality. Second, it is expensive to manually annotate the emotional data. Third, emotional data often suffers from missing modalities due to unforeseeable sensor malfunction or configuration issues. In this paper, we address all these problems under a novel multi-view deep generative framework. Specifically, we propose to model the statistical relationships of multi-modality emotional data using multiple modality-specific generative networks with a shared latent space. By imposing a Gaussian mixture assumption on the posterior approximation of the shared latent variables, our framework can learn the joint deep representation from multiple modalities and evaluate the importance of each modality simultaneously. To solve the labeled-data-scarcity problem, we extend our multi-view model to semi-supervised learning scenario by casting the semi-supervised classification problem as a specialized missing data imputation task. To address the missing-modality problem, we further extend our semi-supervised multi-view model to deal with incomplete data, where a missing view is treated as a latent variable and integrated out during inference. This way, the proposed overall framework can utilize all available (both labeled and unlabeled, as well as both complete and incomplete) data to improve its generalization ability. The experiments conducted on two real multi-modal emotion datasets demonstrated the superiority of our framework.Comment: arXiv admin note: text overlap with arXiv:1704.07548, 2018 ACM Multimedia Conference (MM'18

    Deep learning-based anomalous object detection system powered by microcontroller for PTZ cameras

    Get PDF
    Automatic video surveillance systems are usually designed to detect anomalous objects being present in a scene or behaving dangerously. In order to perform adequately, they must incorporate models able to achieve accurate pattern recognition in an image, and deep learning neural networks excel at this task. However, exhaustive scan of the full image results in multiple image blocks or windows to analyze, which could make the time performance of the system very poor when implemented on low cost devices. This paper presents a system which attempts to detect abnormal moving objects within an area covered by a PTZ camera while it is panning. The decision about the block of the image to analyze is based on a mixture distribution composed of two components: a uniform probability distribution, which represents a blind random selection, and a mixture of Gaussian probability distributions. Gaussian distributions represent windows in the image where anomalous objects were detected previously and contribute to generate the next window to analyze close to those windows of interest. The system is implemented on a Raspberry Pi microcontroller-based board, which enables the design and implementation of a low-cost monitoring system that is able to perform image processing.Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech

    A machine learning approach to pedestrian detection for autonomous vehicles using High-Definition 3D Range Data

    Get PDF
    This article describes an automated sensor-based system to detect pedestrians in an autonomous vehicle application. Although the vehicle is equipped with a broad set of sensors, the article focuses on the processing of the information generated by a Velodyne HDL-64E LIDAR sensor. The cloud of points generated by the sensor (more than 1 million points per revolution) is processed to detect pedestrians, by selecting cubic shapes and applying machine vision and machine learning algorithms to the XY, XZ, and YZ projections of the points contained in the cube. The work relates an exhaustive analysis of the performance of three different machine learning algorithms: k-Nearest Neighbours (kNN), Naïve Bayes classifier (NBC), and Support Vector Machine (SVM). These algorithms have been trained with 1931 samples. The final performance of the method, measured a real traffic scenery, which contained 16 pedestrians and 469 samples of non-pedestrians, shows sensitivity (81.2%), accuracy (96.2%) and specificity (96.8%).This work was partially supported by ViSelTR (ref. TIN2012-39279) and cDrone (ref. TIN2013-45920-R) projects of the Spanish Government, and the “Research Programme for Groups of Scientific Excellence at Region of Murcia” of the Seneca Foundation (Agency for Science and Technology of the Region of Murcia—19895/GERM/15). 3D LIDAR has been funded by UPCA13-3E-1929 infrastructure projects of the Spanish Government. Diego Alonso wishes to thank the Spanish Ministerio de Educación, Cultura y Deporte, Subprograma Estatal de Movilidad, Plan Estatal de Investigación Científica y Técnica y de Innovación 2013–2016 for grant CAS14/00238

    Bibliographic Review on Distributed Kalman Filtering

    Get PDF
    In recent years, a compelling need has arisen to understand the effects of distributed information structures on estimation and filtering. In this paper, a bibliographical review on distributed Kalman filtering (DKF) is provided.\ud The paper contains a classification of different approaches and methods involved to DKF. The applications of DKF are also discussed and explained separately. A comparison of different approaches is briefly carried out. Focuses on the contemporary research are also addressed with emphasis on the practical applications of the techniques. An exhaustive list of publications, linked directly or indirectly to DKF in the open literature, is compiled to provide an overall picture of different developing aspects of this area

    Visual / acoustic detection and localisation in embedded systems

    Get PDF
    ©Cranfield UniversityThe continuous miniaturisation of sensing and processing technologies is increasingly offering a variety of embedded platforms, enabling the accomplishment of a broad range of tasks using such systems. Motivated by these advances, this thesis investigates embedded detection and localisation solutions using vision and acoustic sensors. Focus is particularly placed on surveillance applications using sensor networks. Existing vision-based detection solutions for embedded systems suffer from the sensitivity to environmental conditions. In the literature, there seems to be no algorithm able to simultaneously tackle all the challenges inherent to real-world videos. Regarding the acoustic modality, many research works have investigated acoustic source localisation solutions in distributed sensor networks. Nevertheless, it is still a challenging task to develop an ecient algorithm that deals with the experimental issues, to approach the performance required by these systems and to perform the data processing in a distributed and robust manner. The movement of scene objects is generally accompanied with sound emissions with features that vary from an environment to another. Therefore, considering the combination of the visual and acoustic modalities would offer a significant opportunity for improving the detection and/or localisation using the described platforms. In the light of the described framework, we investigate in the first part of the thesis the use of a cost-effective visual based method that can deal robustly with the issue of motion detection in static, dynamic and moving background conditions. For motion detection in static and dynamic backgrounds, we present the development and the performance analysis of a spatio- temporal form of the Gaussian mixture model. On the other hand, the problem of motion detection in moving backgrounds is addressed by accounting for registration errors in the captured images. By adopting a robust optimisation technique that takes into account the uncertainty about the visual measurements, we show that high detection accuracy can be achieved. In the second part of this thesis, we investigate solutions to the problem of acoustic source localisation using a trust region based optimisation technique. The proposed method shows an overall higher accuracy and convergence improvement compared to a linear-search based method. More importantly, we show that through characterising the errors in measurements, which is a common problem for such platforms, higher accuracy in the localisation can be attained. The last part of this work studies the different possibilities of combining visual and acoustic information in a distributed sensors network. In this context, we first propose to include the acoustic information in the visual model. The obtained new augmented model provides promising improvements in the detection and localisation processes. The second investigated solution consists in the fusion of the measurements coming from the different sensors. An evaluation of the accuracy of localisation and tracking using a centralised/decentralised architecture is conducted in various scenarios and experimental conditions. Results have shown the capability of this fusion approach to yield higher accuracy in the localisation and tracking of an active acoustic source than by using a single type of data
    corecore