    Hierarchical Task Network Planning with Common-Sense Reasoning for Multiple-People Behaviour Analysis

    Safety on public transport is a major concern for the relevant authorities. We address this issue by proposing an automated surveillance platform which combines data from video, infrared and pressure sensors. Data homogenisation and integration are achieved by a distributed architecture based on communication middleware that resolves interconnection issues, thereby enabling data modelling. A common-sense knowledge base models and encodes knowledge about public-transport platforms and the actions and activities of passengers. Trajectory data from passengers is modelled as a time-series of human activities. Common-sense knowledge and rules are then applied to detect inconsistencies or errors in the data interpretation. Lastly, the rationality that characterises human behaviour is also captured here through a bottom-up Hierarchical Task Network planner that, along with common-sense, corrects misinterpretations to explain passenger behaviour. The system is validated using a simulated bus saloon scenario as a case-study. Eighteen video sequences were recorded with up to six passengers. Four metrics were used to evaluate performance. The system, with an accuracy greater than 90% for each of the four metrics, was found to outperform a rule-based system and a system containing planning alone.

    Multiple-Target Tracking in Complex Scenarios

    In this dissertation, we develop computationally efficient algorithms for multiple-target tracking (MTT) in complex scenarios. For each of these scenarios, we develop measurement and state-space models, and then exploit the structure in these models to propose efficient tracking algorithms. In addition, we address design issues such as sensor selection and resource allocation. First, we consider MTT when the targets themselves are moving in a time-varying multipath environment. We develop a sparse-measurement model that allows us to exploit the inherent joint delay-Doppler diversity offered by the environment. We then reformulate the problem of MTT as a block-support recovery problem using the sparse measurement model. We exploit the structure of the dictionary matrix to develop a computationally efficient block support recovery algorithm (and thereby a multiple-target tracking algorithm) under the assumption that the channel state describing the time-varying multipath environment is known. Further, we also derive an upper bound on the overall error probability of wrongly identifying the support of the sparse signal. We then relax the assumption that the channel state is known. We develop a new particle filter called the Multiple Rao-Blackwellized Particle Filter (MRBPF) to jointly estimate both the target and the channel states. We also compute the posterior Cramér-Rao bound (PCRB) on the estimates of the target and the channel states and use the PCRB to find a suitable subset of antennas to be used for transmission in each tracking interval, as well as the power transmitted by these antennas. Second, we consider the problem of tracking an unknown number and types of targets using a multi-modal sensor network. In a multi-modal sensor network, different quantities associated with the same state are measured using sensors of different kinds. Hence, an efficient method that can suitably combine the diverse information measured by each sensor is required.
We first develop a Hierarchical Particle Filter (HPF) to estimate the unknown state from the multi-modal measurements for a special class of problems which can be modeled hierarchically. We then model our tracking problem hierarchically and use the proposed HPF for joint initiation, termination and tracking of multiple targets. The multi-modal data consists of the measurements collected from a radar, an infrared camera and a human scout. We also propose a unified framework for multi-modal sensor management that comprises sensor selection (SS), resource allocation (RA) and data fusion (DF). Our approach is inspired by the trading behavior of economic agents in commercial markets. We model the sensors and the sensor manager as economic agents, and the interaction among them as a double-sided market with both consumers and producers. We propose an iterative double auction mechanism for computing the equilibrium of such a market. We relate the equilibrium point to the solutions of SS, RA and DF. Third, we address the MTT problem in the presence of data association ambiguity that arises due to clutter. Data association corresponds to the problem of assigning a measurement to each target. We treat the data association and state estimation as separate subproblems. We develop a game-theoretic framework to solve the data association, in which we model each tracker as a player and the set of measurements as strategies. We develop utility functions for each player, and then use a regret-based learning algorithm to find the correlated equilibrium of this game. The game-theoretic approach allows us to associate measurements to all the targets simultaneously. We then use particle filtering on the reduced-dimensional state of each target, independently.
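
For readers unfamiliar with the particle filtering that the abstract above relies on, the following is a minimal sketch of a generic bootstrap particle filter for a single target. It is not the MRBPF or HPF developed in the dissertation. The 1-D random-walk motion model, the Gaussian measurement model, the function name `bootstrap_particle_filter` and all parameter values are assumptions chosen purely for illustration.

```python
import math
import random


def bootstrap_particle_filter(measurements, n_particles=500,
                              process_std=0.5, meas_std=0.5, seed=0):
    """Track a 1-D random-walk state from noisy position measurements.

    Illustrative only: the models and default parameters are assumptions.
    Returns one posterior-mean estimate per measurement.
    """
    rng = random.Random(seed)
    # Initialise particles around the first measurement (an assumption).
    particles = [measurements[0] + rng.gauss(0.0, meas_std)
                 for _ in range(n_particles)]
    estimates = []
    for z in measurements:
        # Predict: propagate each particle through the random-walk model.
        particles = [x + rng.gauss(0.0, process_std) for x in particles]
        # Update: weight each particle by the Gaussian measurement likelihood.
        weights = [math.exp(-0.5 * ((z - x) / meas_std) ** 2)
                   for x in particles]
        total = sum(weights)
        if total == 0.0:
            # All likelihoods underflowed; fall back to uniform weights.
            weights = [1.0 / n_particles] * n_particles
        else:
            weights = [w / total for w in weights]
        # Estimate: weighted mean of the particles.
        estimates.append(sum(w * x for w, x in zip(weights, particles)))
        # Resample (multinomial) to counter weight degeneracy.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates
```

Calling `bootstrap_particle_filter([5.0] * 20)` returns twenty estimates, one per measurement. In the setting described by the abstract, one such filter would run independently per target once measurements have been associated with targets.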

    Towards Automatic Speech Identification from Vocal Tract Shape Dynamics in Real-time MRI

    Vocal tract configurations play a vital role in generating distinguishable speech sounds, by modulating the airflow and creating different resonant cavities in speech production. They contain abundant information that can be utilized to better understand the underlying speech production mechanism. As a step towards automatic mapping of vocal tract shape geometry to acoustics, this paper employs effective video action recognition techniques, like Long-term Recurrent Convolutional Networks (LRCN) models, to identify different vowel-consonant-vowel (VCV) sequences from dynamic shaping of the vocal tract. Such a model typically combines a CNN-based deep hierarchical visual feature extractor with recurrent networks, which ideally makes the network spatio-temporally deep enough to learn the sequential dynamics of a short video clip for video classification tasks. We use a database consisting of 2D real-time MRI of vocal tract shaping during VCV utterances by 17 speakers. The comparative performances of this class of algorithms under various parameter settings and for various classification tasks are discussed. Interestingly, the results show a marked difference in the model performance in the context of speech classification with respect to generic sequence or video classification tasks. Comment: To appear in the INTERSPEECH 2018 Proceedings