3,072 research outputs found

    Online Audio-Visual Multi-Source Tracking and Separation: A Labeled Random Finite Set Approach

    The dissertation proposes an online solution for separating an unknown and time-varying number of moving sources using audio and visual data. The random finite set framework is used for the modeling and fusion of the audio and visual data. This enables an online tracking algorithm to estimate the source positions and identities at each time point. With this information, a set of beamformers can be designed to separate each desired source and suppress the interfering sources.
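    As a hedged illustration of the final separation stage, the sketch below steers a plain delay-and-sum beamformer towards one tracked source position. The dissertation designs more capable beamformers from the tracker output, so this is only a minimal stand-in; the array geometry, sampling rate and function names are illustrative.

```python
import numpy as np

def delay_and_sum(mic_signals, mic_positions, source_position, fs, c=343.0):
    """Steer a simple delay-and-sum beamformer towards one tracked source.

    mic_signals:     (num_mics, num_samples) time-domain signals
    mic_positions:   (num_mics, 3) Cartesian microphone coordinates in metres
    source_position: (3,) estimated source position from the tracker
    fs: sampling rate in Hz; c: speed of sound in m/s
    """
    # Propagation distance from the source to each microphone
    dists = np.linalg.norm(mic_positions - source_position, axis=1)
    delays = (dists - dists.min()) / c        # relative delays in seconds
    num_mics, n = mic_signals.shape
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    out = np.zeros(n)
    for m in range(num_mics):
        # Advance each channel by its relative delay (fractional delay
        # implemented as a phase shift in the frequency domain)
        spec = np.fft.rfft(mic_signals[m]) * np.exp(2j * np.pi * freqs * delays[m])
        out += np.fft.irfft(spec, n)
    return out / num_mics
```

    Running one such beamformer per estimated source, each steered by its own track, mirrors the separate-and-suppress scheme described above.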

    Multi-sensor data fusion techniques for RPAS detect, track and avoid

    Accurate and robust tracking of objects is of growing interest amongst the computer vision scientific community. The ability of a multi-sensor system to detect and track objects, and to accurately predict their future trajectory, is critical in the context of mission- and safety-critical applications. Remotely Piloted Aircraft Systems (RPAS) are currently not equipped to routinely access all classes of airspace, since certified Detect-and-Avoid (DAA) systems are yet to be developed. Such capabilities can be achieved by incorporating both cooperative and non-cooperative DAA functions, as well as by providing enhanced communications, navigation and surveillance (CNS) services. DAA is highly dependent on the performance of CNS systems for Detection, Tracking and Avoidance (DTA) tasks and maneuvers. In order to perform an effective detection of objects, a number of high-performance, reliable and accurate avionics sensors and systems are adopted, including non-cooperative sensors (visual and thermal cameras, laser radar (LIDAR) and acoustic sensors) and cooperative systems (Automatic Dependent Surveillance-Broadcast (ADS-B) and Traffic Collision Avoidance System (TCAS)). In this paper the sensor and system information candidates are fully exploited in a Multi-Sensor Data Fusion (MSDF) architecture. An Unscented Kalman Filter (UKF) and a more advanced Particle Filter (PF) are adopted to estimate the state vector of the objects for maneuvering and non-maneuvering DTA tasks. Furthermore, an artificial neural network is conceptualised to exploit statistical learning methods, combining the information obtained from the UKF and the PF. After describing the MSDF architecture, the key mathematical models for data fusion are presented. Conceptual studies are carried out on visual and thermal image fusion architectures.
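    As a sketch of the state-estimation stage, the following bootstrap particle filter propagates a constant-velocity target state and weights particles against a fused position measurement. It is a generic PF, not the paper's MSDF implementation, and all noise parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_pf_step(particles, weights, z, dt, q=1.0, r=15.0):
    """One predict/update cycle of a bootstrap particle filter.

    particles: (N, 4) states [x, y, vx, vy]; z: measured 2-D position;
    q: process-noise std, r: measurement-noise std (illustrative values).
    """
    n = len(particles)
    # Predict: constant-velocity motion plus process noise on the velocities
    particles[:, 0:2] += particles[:, 2:4] * dt
    particles[:, 2:4] += rng.normal(0.0, q, size=(n, 2))
    # Update: weight each particle by the Gaussian likelihood of the measurement
    d2 = np.sum((particles[:, 0:2] - z) ** 2, axis=1)
    weights *= np.exp(-0.5 * d2 / r**2)
    weights /= weights.sum()
    # Resample when the effective sample size collapses below N/2
    if 1.0 / np.sum(weights**2) < n / 2:
        idx = rng.choice(n, size=n, p=weights)
        particles, weights = particles[idx], np.full(n, 1.0 / n)
    return particles, weights
```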

    Visual / acoustic detection and localisation in embedded systems

    © Cranfield University. The continuous miniaturisation of sensing and processing technologies is increasingly offering a variety of embedded platforms, enabling the accomplishment of a broad range of tasks using such systems. Motivated by these advances, this thesis investigates embedded detection and localisation solutions using vision and acoustic sensors. Focus is particularly placed on surveillance applications using sensor networks. Existing vision-based detection solutions for embedded systems suffer from sensitivity to environmental conditions, and in the literature there seems to be no algorithm able to simultaneously tackle all the challenges inherent to real-world videos. Regarding the acoustic modality, many research works have investigated acoustic source localisation solutions in distributed sensor networks. Nevertheless, it remains a challenging task to develop an efficient algorithm that deals with the experimental issues, approaches the performance required by these systems, and performs the data processing in a distributed and robust manner. The movement of scene objects is generally accompanied by sound emissions whose features vary from one environment to another. Therefore, combining the visual and acoustic modalities offers a significant opportunity for improving detection and/or localisation on the described platforms. In the light of this framework, the first part of the thesis investigates a cost-effective vision-based method that can deal robustly with motion detection in static, dynamic and moving-background conditions. For motion detection in static and dynamic backgrounds, we present the development and performance analysis of a spatio-temporal form of the Gaussian mixture model. The problem of motion detection in moving backgrounds is addressed by accounting for registration errors in the captured images; by adopting a robust optimisation technique that takes into account the uncertainty about the visual measurements, we show that high detection accuracy can be achieved. In the second part of the thesis, we investigate solutions to the problem of acoustic source localisation using a trust-region-based optimisation technique. The proposed method shows overall higher accuracy and improved convergence compared to a linear-search-based method. More importantly, we show that by characterising the errors in measurements, a common problem for such platforms, higher localisation accuracy can be attained. The last part of this work studies the different possibilities for combining visual and acoustic information in a distributed sensor network. In this context, we first propose to include the acoustic information in the visual model; the resulting augmented model provides promising improvements in the detection and localisation processes. The second investigated solution consists of fusing the measurements coming from the different sensors. An evaluation of the localisation and tracking accuracy using a centralised/decentralised architecture is conducted in various scenarios and experimental conditions. Results have shown the capability of this fusion approach to yield higher accuracy in the localisation and tracking of an active acoustic source than using a single type of data.
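    For flavour, the snippet below runs OpenCV's stock GMM background subtractor (MOG2) for motion detection against a static or mildly dynamic background. The thesis develops a spatio-temporal extension of the GMM, which this off-the-shelf variant does not capture; the input path is illustrative.

```python
import cv2

# Stock per-pixel GMM background subtractor; the thesis extends the GMM
# spatio-temporally, which this off-the-shelf model does not do.
subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16,
                                                detectShadows=True)

cap = cv2.VideoCapture("surveillance.mp4")   # illustrative input path
while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = subtractor.apply(frame)           # per-pixel foreground mask
    # MOG2 marks shadow pixels as 127; keep only confident foreground (255)
    _, mask = cv2.threshold(mask, 200, 255, cv2.THRESH_BINARY)
    # Morphological opening removes small speckle noise from the mask
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
cap.release()
```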

    Sensor based real-time process monitoring for ultra-precision manufacturing processes with non-linearity and non-stationarity

    Methodologies for real-time process monitoring in ultra-precision manufacturing processes, specifically chemical mechanical planarization (CMP) and ultra-precision machining (UPM), are investigated in this dissertation. The three main components of this research are as follows: (1) developing predictive modeling approaches for early detection of process anomalies/change points, (2) devising approaches that can capture the non-Gaussian and non-stationary characteristics of CMP and UPM processes, and (3) integrating multiple sensor data to make more reliable process-related decisions in real time. In the first part, we establish a quantitative relationship between CMP process performance, such as material removal rate (MRR), and data acquired from wireless vibration sensors. Subsequently, a non-linear sequential Bayesian analysis is integrated with decision-theoretic concepts for detection of the CMP process end-point for blanket copper wafers. Using this approach, the CMP polishing end-point was detected within a 5% error rate. Next, a non-parametric Bayesian analytical approach is utilized to capture the inherently complex, non-Gaussian, and non-stationary sensor signal patterns observed in the CMP process. An evolutionary clustering analysis, called the Recurrent Nested Dirichlet Process (RNDP) approach, is developed for monitoring CMP process changes using MEMS vibration signals. Using this novel signal analysis approach, process drifts are detected within 20 milliseconds, assessed to be 3-7 times faster than traditional SPC charts. This is very beneficial to the industry from an application standpoint, because wafer yield losses will be mitigated to a great extent if the onset of CMP process drifts can be detected in a timely and accurate manner. Lastly, a non-parametric Bayesian modeling approach, termed the Dirichlet Process (DP), is combined with a multi-level hierarchical information fusion technique for monitoring of surface finish in the UPM process. Using this approach, signal patterns from six different sensors (three-axis vibration and force) are integrated based on information fusion theory. Using experimental UPM sensor data, it was observed that process decisions based on the multiple-sensor information fusion approach were 15%-30% more accurate than the decisions from individual sensors. This will enable more accurate and reliable estimation of process conditions in ultra-precision manufacturing applications.
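    The RNDP machinery is beyond the scope of an abstract, but the kind of real-time drift detection it is benchmarked against can be illustrated with a classical two-sided CUSUM detector over a vibration feature stream. This is a traditional SPC-style baseline, not the dissertation's method, and the thresholds below are illustrative.

```python
import numpy as np

def cusum_drift(signal, target_mean, k=0.5, h=5.0):
    """Two-sided CUSUM drift detector on a standardized feature stream.

    k is the allowance and h the decision threshold, both in units of the
    signal's standard deviation (illustrative values). Returns the first
    sample index at which a drift is flagged, or None if none is found.
    """
    s = (signal - target_mean) / signal.std()
    g_pos = g_neg = 0.0
    for i, x in enumerate(s):
        g_pos = max(0.0, g_pos + x - k)   # accumulates upward mean shifts
        g_neg = max(0.0, g_neg - x - k)   # accumulates downward mean shifts
        if g_pos > h or g_neg > h:
            return i
    return None
```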

    Performance Analysis of Bearings-only Tracking Problems for Maneuvering Target and Heterogeneous Sensor Applications

    State estimation, i.e. determining the trajectory of a maneuvering target from noisy measurements collected by one or more passive sensors (e.g. passive sonar and radar), has wide civil and military applications, for example underwater surveillance, air defence, wireless communications, and self-protection of military vehicles. These passive sensors listen to target-emitted signals without emitting signals themselves, which gives them concealment properties. Tactical scenarios exist where one's own position shall not be revealed, e.g. when tracking submarines with passive sonar or tracking an aerial target by means of electro-optic image sensors such as infrared sensors. This estimation process is widely known as bearings-only tracking. On the one hand, a challenge is the high degree of nonlinearity in the estimation process caused by the nonlinear relation of angular measurements to the Cartesian state. On the other hand, passive sensors cannot provide direct target location measurements, so bearings-only tracking suffers from poor target trajectory estimation accuracy due to marginal observability from sensor measurements. In order to achieve observability, that is, to be able to estimate the complete target state, multiple passive sensor measurements must be fused. The measurements can be recorded spatially distributed by multiple dislocated sensor platforms, or temporally distributed by a single, moving sensor platform. Furthermore, an extended case of bearings-only tracking arises when heterogeneous measurements from targets emitting different types of signals are involved; with these, observability can also be achieved on a single, not necessarily moving platform. In this work, a performance bound for complex motion models, i.e. piecewise maneuvering targets with unknown maneuver change times, observed through bearings-only measurements from a single, moving sensor platform, is derived, and an efficient estimator is implemented and analyzed. Furthermore, an observability analysis is carried out for targets emitting acoustic and electromagnetic signals; here, the different signal propagation velocities can be exploited to ensure observability on a single, not necessarily moving platform. Based on the theoretical performance and observability analyses, a distributed fusion system has been realized by means of heterogeneous sensors, which shall detect an event and localize a threat. This is performed by a microphone array that detects sound waves emitted by the threat, as well as a radar detector that detects electromagnetic emissions from the threat. Since multiple platforms are involved to provide increased observability and also redundancy against possible breakdowns, a WiFi mobile ad hoc network is used for communications. In order to keep the network up in case of a breakdown, the OLSR (Optimized Link State Routing) approach is employed.
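    To make the nonlinearity concrete, the sketch below shows an extended Kalman filter update with a single bearing measurement, where the atan2 measurement model relates nonlinearly to the Cartesian state. This is a textbook building block, not the estimator derived in the thesis, and the noise level is illustrative.

```python
import numpy as np

def ekf_bearing_update(x, P, z_bearing, sensor_pos, sigma_b=np.deg2rad(1.0)):
    """EKF update with one bearing measurement (bearings-only tracking).

    x: state [px, py, vx, vy]; P: 4x4 covariance; sensor_pos: (2,) observer
    position; z_bearing: measured bearing in radians from the x-axis.
    """
    dx, dy = x[0] - sensor_pos[0], x[1] - sensor_pos[1]
    r2 = dx**2 + dy**2
    h = np.arctan2(dy, dx)                       # predicted bearing
    # Jacobian of atan2(dy, dx) with respect to the state
    H = np.array([[-dy / r2, dx / r2, 0.0, 0.0]])
    S = H @ P @ H.T + sigma_b**2                 # innovation covariance (1x1)
    K = (P @ H.T) / S                            # Kalman gain (4x1)
    # Wrap the angular innovation to (-pi, pi] before applying the gain
    innov = np.arctan2(np.sin(z_bearing - h), np.cos(z_bearing - h))
    x = x + (K * innov).ravel()
    P = (np.eye(4) - K @ H) @ P
    return x, P
```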

    Suivi Multi-Locuteurs avec des Informations Audio-Visuelles pour la Perception des Robots

    Robot perception plays a crucial role in human-robot interaction (HRI). The perception system provides the robot with information about its surroundings and enables the robot to give feedback. In a conversational scenario, a group of people may chat in front of the robot and move freely. In such situations, robots are expected to understand where the people are, who is speaking, and what they are talking about. This thesis concentrates on answering the first two questions, namely speaker tracking and diarization. We use different modalities of the robot's perception system to achieve this goal. Like sight and hearing for a human being, audio and visual information are the critical cues for a robot in a conversational scenario. The advancement of computer vision and audio processing over the last decade has revolutionized robot perception abilities. This thesis makes the following contributions. We first develop a variational Bayesian framework for tracking multiple objects. The variational Bayesian framework gives closed-form tractable solutions, which makes the tracking process efficient. The framework is first applied to visual multiple-person tracking, where birth and death processes are built jointly with the framework to deal with the varying number of people in the scene. Furthermore, we exploit the complementarity of vision and robot motor information: on the one hand, the robot's active motion can be integrated into the visual tracking system to stabilize the tracking; on the other hand, visual information can be used to perform motor servoing. Audio and visual information are then combined in the variational framework to estimate the smooth trajectories of speaking people and to infer the acoustic status of a person: speaking or silent. In addition, we apply the model to acoustic-only speaker localization and tracking, where online dereverberation techniques are first applied and then followed by the tracking system. Finally, a variant of the acoustic speaker tracking model based on the von Mises distribution is proposed, which is specifically adapted to directional data. All the proposed methods are validated on datasets appropriate to each application.
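    A minimal sketch of why the von Mises distribution suits directional data: the product of a von Mises prior over the speaker azimuth and a von Mises observation likelihood is again von Mises, giving a closed-form fusion step on the circle. The function below is illustrative and is not the thesis model.

```python
import numpy as np

def von_mises_fuse(mu_prior, kappa_prior, z_doa, kappa_obs):
    """Fuse a predicted speaker azimuth with a noisy DOA observation.

    Both prior and likelihood are von Mises densities; their product is
    again von Mises, obtained by adding the mean directions as 2-D unit
    vectors scaled by their concentrations (kappa).
    """
    c = kappa_prior * np.cos(mu_prior) + kappa_obs * np.cos(z_doa)
    s = kappa_prior * np.sin(mu_prior) + kappa_obs * np.sin(z_doa)
    mu_post = np.arctan2(s, c)        # posterior mean direction (wrapped)
    kappa_post = np.hypot(c, s)       # posterior concentration
    return mu_post, kappa_post
```

    Because the update stays on the circle, angles near the ±180° wrap-around are fused correctly, which a Gaussian on raw angle values would not handle.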

    Audio‐Visual Speaker Tracking

    Target motion tracking has found application in interdisciplinary fields including, but not limited to, surveillance and security, forensic science, intelligent transportation systems, driving assistance, monitoring of prohibited areas, medical science, robotics, action and expression recognition, individual speaker discrimination in multi-speaker environments, and video conferencing in the fields of computer vision and signal processing. Among these applications, speaker tracking in enclosed spaces has been gaining relevance due to widespread advances in devices and technologies and the necessity for seamless solutions for real-time tracking and localization of speakers. However, speaker tracking is a challenging task in real-life scenarios, as several distinctive issues influence the tracking process, such as occlusions and an unknown number of speakers. One approach to overcoming these issues is to use multi-modal information, as it conveys complementary information about the state of the speakers compared to single-modal tracking. To use multi-modal information, several approaches have been proposed, which can be classified into two categories, namely deterministic and stochastic. This chapter aims at providing multimedia researchers with a state-of-the-art overview of tracking methods used for combining multiple modalities to accomplish various multimedia analysis tasks, classifying them into different categories and listing new and future trends in this field.
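    As a minimal example of the stochastic category, the sketch below fuses a visual and an audio position measurement by sequential Kalman updates on a common speaker state. The measurement models and noise levels are illustrative assumptions, not taken from the chapter.

```python
import numpy as np

def kf_update(x, P, z, H, R):
    """Standard Kalman measurement update."""
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ (z - H @ x)
    P = (np.eye(len(x)) - K @ H) @ P
    return x, P

# State: speaker position/velocity [x, y, vx, vy]; both cues observe position.
H = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0]])
R_visual = np.diag([0.05, 0.05])  # face detector: precise, but fails under occlusion
R_audio = np.diag([0.50, 0.50])   # audio localiser: coarser, but occlusion-free

x, P = np.zeros(4), np.eye(4)
z_visual = np.array([1.2, 0.8])   # illustrative measurements for one frame
z_audio = np.array([1.4, 0.7])
x, P = kf_update(x, P, z_visual, H, R_visual)  # fuse whichever cues are present
x, P = kf_update(x, P, z_audio, H, R_audio)
```

    Skipping the visual update whenever the detector drops out is what makes the audio cue complementary rather than redundant in this scheme.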