Dynamical Hyperspectral Unmixing with Variational Recurrent Neural Networks
Multitemporal hyperspectral unmixing (MTHU) is a fundamental tool in the
analysis of hyperspectral image sequences. It reveals the dynamical evolution
of the materials (endmembers) and of their proportions (abundances) in a given
scene. However, adequately accounting for the spatial and temporal variability
of the endmembers in MTHU is challenging, and has not been fully addressed so
far in unsupervised frameworks. In this work, we propose an unsupervised MTHU
algorithm based on variational recurrent neural networks. First, a stochastic
model is proposed to represent the dynamical evolution of both the endmembers
and their abundances, as well as the mixing process. Moreover, a new model
based on a low-dimensional parametrization is used to represent spatial and
temporal endmember variability, significantly reducing the number of variables
to be estimated. We formulate MTHU as a Bayesian inference problem.
However, this problem does not have an analytical solution due
to the nonlinearity and non-Gaussianity of the model. Thus, we propose a
solution based on deep variational inference, in which the posterior
distribution of the estimated abundances and endmembers is represented using
a combination of recurrent neural networks and a physically motivated model.
The parameters of the model are learned using stochastic backpropagation.
Experimental results show that the proposed method outperforms state-of-the-art
MTHU algorithms.
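As background for the mixing process mentioned above, the textbook linear mixing model (LMM), written with a time index t, is shown below. This is an assumed reference model only: the abstract does not state the exact observation model, which in this work additionally uses a low-dimensional parametrization of endmember variability.

```latex
\mathbf{y}_{n,t} = \mathbf{M}_{n,t}\,\mathbf{a}_{n,t} + \mathbf{e}_{n,t},
\qquad \mathbf{a}_{n,t} \succeq \mathbf{0},
\qquad \mathbf{1}^{\top}\mathbf{a}_{n,t} = 1
```

Here y_{n,t} in R^L is the spectrum of pixel n at time t, M_{n,t} in R^{L x P} collects the P endmember spectra (allowed to vary across space and time), a_{n,t} holds the abundances, constrained to be nonnegative and to sum to one, and e_{n,t} is noise.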
Gaussian mixture model classifiers for detection and tracking in UAV video streams.
Master's degree. University of KwaZulu-Natal, Durban. Manual visual surveillance systems are subject to a high degree of human error and operator fatigue. The automation of such systems often employs detectors, trackers and classifiers as fundamental building blocks. Detection, tracking and classification are especially useful and challenging in Unmanned Aerial Vehicle (UAV) based surveillance systems. Previous solutions have addressed these challenges via complex classification methods. This dissertation proposes less complex Gaussian Mixture Model (GMM) based classifiers that can simplify the process: data is represented as a reduced set of model parameters, and classification is performed in the low-dimensional parameter space. The specification and adoption of GMM-based classifiers on the UAV visual tracking feature space formed the principal contribution of the work. This methodology can be generalised to other feature spaces.
This dissertation presents two main contributions in the form of submissions to ISI-accredited journals. In the first paper, the objectives are demonstrated with a vehicle detector incorporating a two-stage GMM classifier applied to a single feature space, namely the Histogram of Oriented Gradients (HoG). The second paper demonstrates the objectives with a vehicle tracker using colour histograms (in RGB and HSV) with GMM classifiers and a Kalman filter.
The proposed works are comparable to related works, with testing performed on benchmark datasets. In the tracking domain for such platforms, tracking alone is insufficient: adaptive detection and classification can assist in search-space reduction, the building of knowledge priors, and improved target representations. Results show that the proposed approach improves performance and robustness. Findings also indicate potential further enhancements, such as a multi-mode tracker with global and local tracking based on a combination of both papers.
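The abstract does not give the classifier's details. A common GMM-based scheme, assumed here, fits one GMM per class with expectation-maximisation (EM) and assigns a sample to the class whose model gives the highest log-likelihood, so each class is summarised by a small set of weights, means and variances. The sketch uses diagonal covariances, NumPy only, and a toy 2-D feature space in place of HoG or colour histograms; `fit_gmm` and `GMMClassifier` are hypothetical names, and the dissertation's two-stage design is not reproduced.

```python
import numpy as np


def _log_component_densities(x, weights, means, variances):
    """Return log(w_k * N(x_n | mu_k, diag(var_k))) with shape (n, k)."""
    diff = x[:, None, :] - means[None, :, :]                      # (n, k, d)
    log_det = np.sum(np.log(2.0 * np.pi * variances), axis=1)     # (k,)
    maha = np.sum(diff ** 2 / variances[None, :, :], axis=2)      # (n, k)
    return np.log(weights)[None, :] - 0.5 * (log_det[None, :] + maha)


def _logsumexp_rows(a):
    """Numerically stable log(sum(exp(a), axis=1))."""
    m = a.max(axis=1)
    return m + np.log(np.exp(a - m[:, None]).sum(axis=1))


def fit_gmm(x, k, n_iter=100, seed=0, reg=1e-6):
    """Fit a diagonal-covariance GMM to x of shape (n, d) with EM."""
    rng = np.random.default_rng(seed)
    n, d = x.shape
    means = x[rng.choice(n, size=k, replace=False)].astype(float)
    variances = np.tile(x.var(axis=0) + reg, (k, 1))
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each sample.
        log_p = _log_component_densities(x, weights, means, variances)
        resp = np.exp(log_p - _logsumexp_rows(log_p)[:, None])
        # M-step: re-estimate weights, means and diagonal variances.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / n
        means = (resp.T @ x) / nk[:, None]
        diff = x[:, None, :] - means[None, :, :]
        variances = np.einsum("nk,nkd->kd", resp, diff ** 2) / nk[:, None] + reg
    return weights, means, variances


class GMMClassifier:
    """One GMM per class; predict the class with the highest log-likelihood."""

    def __init__(self, k=2):
        self.k = k

    def fit(self, x, y):
        self.classes_ = np.unique(y)
        self.models_ = [fit_gmm(x[y == c], self.k) for c in self.classes_]
        return self

    def predict(self, x):
        scores = np.stack(
            [_logsumexp_rows(_log_component_densities(x, *m)) for m in self.models_],
            axis=1,
        )
        return self.classes_[np.argmax(scores, axis=1)]
```

Classification then happens entirely in the parameter space of the fitted models, which is the simplification the dissertation argues for; the number of components `k` is an illustrative choice.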
Veni Vidi Vici, A Three-Phase Scenario For Parameter Space Analysis in Image Analysis and Visualization
Automatic analysis of enormous image sets is a critical task in the life
sciences. It faces many challenges: algorithms are highly
parameterized, significant human input is intertwined with the workflow, and
a standard meta-visualization approach is lacking. This paper proposes an alternative iterative
approach for optimizing input parameters that saves time by minimizing user
involvement, helps users understand the workflow of algorithms, and supports
discovering new ones. The main focus is on developing an interactive
visualization technique that enables users to analyze the relationships between
sampled input parameters and the corresponding output. This technique is
implemented as a prototype called Veni Vidi Vici, or "I came, I saw, I
conquered." This strategy is inspired by the mathematical formulas of numbering
computable functions and is developed atop ImageJ, a scientific image
processing program. A case study is presented to investigate the proposed
framework. Finally, the paper explores some potential future issues in the
application of the proposed approach to parameter space analysis in
visualization.
Large Scale Inverse Problems
This book is the second volume of a three-volume series recording the "Radon Special Semester 2011 on Multiscale Simulation & Analysis in Energy and the Environment" that took place in Linz, Austria, October 3-7, 2011. This volume addresses the common ground in the mathematical and computational procedures required for large-scale inverse problems and data assimilation in forefront applications. The solution of inverse problems is fundamental to a wide variety of applications such as weather forecasting, medical tomography, and oil exploration. Regularisation techniques are needed to ensure solutions that are of sufficient quality to be useful and are soundly based in theory. This book addresses the common techniques required for all the applications, and is thus truly interdisciplinary. This collection of survey articles focuses on the large inverse problems commonly arising in simulation and forecasting in the earth sciences.
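The volume does not prescribe one regulariser, but Tikhonov regularisation is the textbook instance of the idea. The sketch below, assuming a small Hilbert-matrix test problem chosen purely for illustration, contrasts a naive solve, where tiny data noise is hugely amplified, with a Tikhonov-regularised solve; `tikhonov`, `hilbert` and `lam` are illustrative names, not taken from the book.

```python
import numpy as np


def tikhonov(A, b, lam):
    """Solve min_x ||A x - b||^2 + lam * ||x||^2 via the regularised normal equations."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ b)


def hilbert(n):
    """Hilbert matrix H[i, j] = 1 / (i + j + 1): a classic ill-conditioned operator."""
    i = np.arange(n)
    return 1.0 / (i[:, None] + i[None, :] + 1.0)


# Illustrative ill-posed problem: smooth true solution, slightly noisy data.
rng = np.random.default_rng(0)
A = hilbert(10)
x_true = np.ones(10)
b = A @ x_true + 1e-6 * rng.standard_normal(10)

x_naive = np.linalg.solve(A, b)        # unregularised: noise is amplified by 1/sigma_min
x_reg = tikhonov(A, b, lam=1e-10)      # regularised: small singular values are damped

err_naive = np.linalg.norm(x_naive - x_true)
err_reg = np.linalg.norm(x_reg - x_true)
```

The parameter `lam` trades noise amplification against bias toward zero; in practice it is chosen by rules such as the discrepancy principle or the L-curve.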
MEG and fMRI Fusion for Non-Linear Estimation of Neural and BOLD Signal Changes
The combined analysis of magnetoencephalography (MEG)/electroencephalography (EEG) and functional magnetic resonance imaging (fMRI) measurements can improve the description of the dynamical and spatial properties of brain activity. In this paper we empirically demonstrate this improvement using simulated and recorded task-related MEG and fMRI activity. Neural activity estimates were derived using a dynamic Bayesian network with continuous real-valued parameters by means of a sequential Monte Carlo technique. In synthetic data, we show that MEG and fMRI fusion improves estimation of the indirectly observed neural activity and smooths tracking of the blood oxygenation level dependent (BOLD) response. In recordings of task-related neural activity, the combination of MEG and fMRI produces a result with a greater signal-to-noise ratio, which confirms the expectation arising from the nature of the experiment. The highly non-linear model of the BOLD response poses a difficult inference problem for neural activity estimation; computational requirements are also high due to the time and space complexity. We show that joint analysis of the data improves the system's behavior by stabilizing the system of differential equations and by requiring fewer computational resources.
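The abstract does not detail its sequential Monte Carlo scheme. As a generic illustration only, a bootstrap (sequential importance resampling) particle filter for a scalar state is sketched below; the paper's dynamic Bayesian network and BOLD model are not reproduced, and `bootstrap_pf`, `transition` and `log_likelihood` are hypothetical names, with the model supplied by the caller as two functions.

```python
import numpy as np


def bootstrap_pf(ys, transition, log_likelihood, init_particles, rng):
    """Bootstrap (SIR) particle filter returning posterior-mean state estimates.

    transition(particles, rng) -> propagated particles (the dynamics model)
    log_likelihood(y, particles) -> log p(y | particle) for each particle
    """
    particles = np.array(init_particles, dtype=float)
    n = len(particles)
    means = []
    for y in ys:
        particles = transition(particles, rng)       # predict: sample the dynamics
        logw = log_likelihood(y, particles)          # update: weight by the observation
        w = np.exp(logw - logw.max())                # subtract max for stability
        w /= w.sum()
        means.append(np.sum(w * particles))          # weighted posterior mean
        idx = rng.choice(n, size=n, p=w)             # multinomial resampling
        particles = particles[idx]
    return np.array(means)
```

For a toy random-walk state observed in Gaussian noise, one would pass `transition=lambda p, g: p + g.normal(0, q, p.shape)` and `log_likelihood=lambda y, p: -0.5 * ((y - p) / r) ** 2`; a nonlinear model such as a haemodynamic response would be plugged in through the same two callables.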
Multichannel source separation and tracking with phase differences by random sample consensus
Blind audio source separation (BASS) is a fascinating problem that has been tackled from many different angles. The use case of interest in this thesis is that of multiple moving and simultaneously active speakers in a reverberant room. This is a common situation, for example, in social gatherings. We human beings have the remarkable ability to focus attention on a particular speaker while effectively ignoring the rest. This is referred to as the "cocktail party effect" and has been the holy grail of source separation for many decades. Replicating this feat in real time with a machine is the goal of BASS.
Single-channel methods attempt to identify the individual speakers from a single recording. However, with the advent of hand-held consumer electronics, techniques based on microphone array processing are becoming increasingly popular. Multichannel methods record a sound field from various locations to incorporate spatial information. If the speakers move over time, we need an algorithm capable of tracking their positions in the room. For compact arrays with 1-10 cm of separation between the microphones, this can be accomplished by applying a temporal filter on estimates of the directions-of-arrival (DOA) of the speakers.
In this thesis, we review recent work on blind source separation (BSS) with inter-channel phase difference (IPD) features and provide extensions to the case of moving speakers. It is shown that IPD features form a noisy circular-linear dataset. This data is clustered with the RANdom SAmple Consensus (RANSAC) algorithm in the presence of strong reverberation to simultaneously localize and separate speakers. The remarkable performance of RANSAC is due to its natural tendency to reject outliers. To handle the case of non-stationary speakers, a factorial wrapped Kalman filter (FWKF) and a factorial von Mises-Fisher particle filter (FvMFPF) are proposed to track source DOAs directly on the unit circle and unit sphere, respectively. These algorithms combine directional statistics, Bayesian filtering theory, and probabilistic data association techniques to track the speakers with mixtures of directional distributions.
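The thesis abstract does not give implementation details. Assuming an anechoic far-field pair of microphones, where the IPD of a source with inter-microphone delay tau at frequency f is wrap(2*pi*f*tau), one way to cluster circular-linear IPD data with RANSAC is sequential: fit one delay, remove its inliers, and repeat. The threshold, iteration count, delay bound `tau_max` and the function names below are illustrative assumptions; the thesis's handling of reverberation is not reproduced.

```python
import numpy as np


def wrap(phase):
    """Wrap angles to the interval (-pi, pi]."""
    return np.angle(np.exp(1j * phase))


def ransac_delay(freqs, ipd, tau_max, n_iter=200, thresh=0.3, rng=None):
    """Fit one delay tau (s) to IPD samples under the model ipd = wrap(2*pi*f*tau).

    Returns (tau, inlier_mask). Assumes freqs > 0 and |tau| <= tau_max.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    best_tau, best_mask = 0.0, np.zeros(len(freqs), dtype=bool)
    for _ in range(n_iter):
        # Minimal sample: a single (frequency, phase) pair.
        i = rng.integers(len(freqs))
        f, phi = freqs[i], ipd[i]
        # Phase wrapping makes one sample consistent with several delays,
        # tau = (phi + 2*pi*m) / (2*pi*f), so test every admissible integer m.
        m_lo = int(np.ceil((-2 * np.pi * f * tau_max - phi) / (2 * np.pi)))
        m_hi = int(np.floor((2 * np.pi * f * tau_max - phi) / (2 * np.pi)))
        for m in range(m_lo, m_hi + 1):
            tau = (phi + 2 * np.pi * m) / (2 * np.pi * f)
            # Inliers: samples whose wrapped phase residual is below the threshold.
            mask = np.abs(wrap(ipd - 2 * np.pi * freqs * tau)) < thresh
            if mask.sum() > best_mask.sum():
                best_tau, best_mask = tau, mask
    return best_tau, best_mask


def ransac_multi(freqs, ipd, n_sources, tau_max, **kwargs):
    """Sequential RANSAC: find one source, remove its inliers, and repeat."""
    remaining = np.ones(len(freqs), dtype=bool)
    taus = []
    for _ in range(n_sources):
        idx = np.flatnonzero(remaining)
        if idx.size == 0:
            break
        tau, mask = ransac_delay(freqs[idx], ipd[idx], tau_max, **kwargs)
        taus.append(tau)
        remaining[idx[mask]] = False
    return taus
```

For example, with microphones d = 0.05 m apart and c = 343 m/s, tau_max = d / c is about 0.146 ms; calling `ransac_multi(freqs, ipd, 2, tau_max)` on IPD samples from two speakers returns one delay per speaker, from which a DOA follows as arcsin(c * tau / d) under the same far-field assumption.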
Multimodal methods for blind source separation of audio sources
The enhancement of the performance of frequency domain convolutive
blind source separation (FDCBSS) techniques when applied to the
problem of separating audio sources recorded in a room environment
is the focus of this thesis. This challenging application is termed the
cocktail party problem and the ultimate aim would be to build a machine
which matches the ability of a human being to solve this task.
Human beings exploit both their eyes and their ears in solving this task
and hence they adopt a multimodal approach, i.e. they exploit both
audio and video modalities. New multimodal methods for blind source
separation of audio sources are therefore proposed in this work as a
step towards realizing such a machine.
The geometry of the room environment is initially exploited to improve
the separation performance of an FDCBSS algorithm. The positions
of the human speakers are monitored by video cameras and this
information is incorporated within the FDCBSS algorithm in the form
of constraints added to the underlying cross-power spectral density
matrix-based cost function which measures separation performance. [Continues.]
Review on Sparse-Based Multipath Estimation and Mitigation: Intense Solution to Counteract the Effects in Software GPS Receivers
Multipath is a major concern in GPS receivers: it fades the actual GPS signal and can cause positioning errors of up to 10 m, so special care needs to be taken to mitigate multipath effects. Numerous methods have been used to address the problem, including hardware-based antenna array techniques, receiver-based methods such as the narrow correlator, the double-delta discriminator and the Adaptive Multipath Estimator, and post-receiver methods based on wavelet transformation, particle filters and Kalman filters. However, some of these methods can only reduce code multipath error and are not effective in eliminating carrier multipath error. Most of these techniques assume that the Line-of-Sight (LOS) signal is stronger than the Non-Line-of-Sight (NLOS) signals; in scenarios where the LOS signal is weaker than the composite multipath signal, this assumption may bias code tracking. In this chapter, different types of multipath mitigation and their limitations are described. Recent developments in sparse-signal-processing-based blind channel estimation are investigated to compensate for the multipath error. Rayleigh and Rician fading models with different multipath parameters are simulated to represent the urban scenario. The inverse problem of recovering the GPS signal is addressed with a deconvolution approach, for which an appropriate objective function is formulated to find the signal of interest. Using these methods, the signal is observed and the carrier and code tracking loop parameters are computed with minimal error.
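The chapter casts multipath compensation as a sparse deconvolution problem but does not state its objective function. One common choice, assumed here, is l1-regularised least squares solved with ISTA (iterative soft-thresholding). For simplicity the sketch is non-blind: it assumes the spreading code is known and the channel is a short sparse tap vector (a LOS tap plus a few delayed NLOS taps). `convolution_matrix`, `ista` and `lam` are illustrative names.

```python
import numpy as np


def convolution_matrix(code, n_taps):
    """Matrix C with C @ h == np.convolve(code, h) for any h of length n_taps."""
    C = np.zeros((len(code) + n_taps - 1, n_taps))
    for k in range(n_taps):
        C[k:k + len(code), k] = code           # column k: code delayed by k samples
    return C


def ista(A, y, lam, n_iter=500):
    """Minimise 0.5*||A h - y||^2 + lam*||h||_1 by iterative soft-thresholding."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2    # 1 / Lipschitz constant of the gradient
    h = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = h - step * (A.T @ (A @ h - y))     # gradient step on the data-fit term
        h = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft threshold
    return h
```

The l1 penalty drives the taps of absent paths to exactly zero, so the recovered support indicates the relative delays of the LOS and NLOS components; a larger `lam` gives sparser estimates at the cost of shrinking the true taps.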
Bayesian framework for multiple acoustic source tracking
Acoustic source (speaker) tracking in the room environment plays an important role in many
speech and audio applications such as multimedia, hearing aids and hands-free speech communication
and teleconferencing systems; the position information can be fed into a higher
processing stage for high-quality speech acquisition, enhancement of a specific speech signal
in the presence of other competing talkers, or keeping a camera focused on the speaker in
a video-conferencing scenario. Most existing systems focus on the single-source tracking
problem, which assumes that exactly one source is active at all times, and the state to be estimated
is simply the source position. However, in practical scenarios, multiple speakers may
be simultaneously active, and the tracking algorithm should be able to localise each individual
source and estimate the number of sources. This thesis contains three contributions towards
solutions to multiple acoustic source tracking in moderately noisy and reverberant environments.
The first contribution of this thesis is a time delay of arrival (TDOA) estimation
approach for multiple sources. Although the phase transform (PHAT) weighted generalised
cross-correlation (GCC) method has been employed to extract the TDOAs of multiple sources,
it is primarily used for single-source scenarios, and its performance for multiple TDOA estimation
has not been comprehensively studied. The proposed approach combines the degenerate
unmixing estimation technique (DUET) and the GCC method. Since the speech mixtures are assumed to be
window-disjoint orthogonal (WDO) in the time-frequency domain, the spectrograms can
be separated by employing DUET, and the GCC method can then be applied to the spectrogram
of each individual source. The probabilities of detection and false alarm are also proposed as metrics to
evaluate the TDOA estimation performance under a series of experimental parameters.
Next, considering that multiple acoustic sources may appear nonconcurrently, an extended Kalman
particle filter (EKPF) is developed for a special multiple acoustic source tracking problem,
namely “nonconcurrent multiple acoustic tracking (NMAT)”. The extended Kalman filter
(EKF) is used to approximate the optimum weights, and the subsequent particle filtering (PF)
naturally takes the previous position estimates as well as the current TDOA measurements into
account. The proposed approach is thus able to lock onto sharp changes in the source position
quickly and avoid the tracking lag of the generic sequential importance resampling (SIR) PF.
Finally, these investigations are extended into an approach to track an unknown and
time-varying number of acoustic sources. The DUET-GCC method is used to obtain the TDOA
measurements for multiple sources, and a random finite set (RFS) based Rao-Blackwellised PF
is employed and modified to track the sources. Each particle has an RFS form encapsulating
the states of all sources and is capable of addressing source dynamics: source survival, new
source appearance and source deactivation. A data association variable is defined to depict the
source dynamics and their relation to the measurements. The Rao-Blackwellisation step is used
to decompose the state: the source positions are marginalised by using an EKF, and only the
data association variable needs to be handled by a PF. The performances of all the proposed
approaches are extensively studied under different noisy and reverberant environments, and they
compare favourably with existing tracking techniques.
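The thesis combines DUET separation with PHAT-weighted GCC for multiple TDOAs. The GCC-PHAT step for a single microphone pair could be sketched as below; the DUET masking stage is omitted, and the function name, the sign convention (a positive result means the second channel lags the first) and the optional lag limit `max_tau` are assumptions for illustration.

```python
import numpy as np


def gcc_phat(x1, x2, fs, max_tau=None):
    """Delay of x2 relative to x1 in seconds (positive: x2 lags x1), via GCC-PHAT."""
    n = len(x1) + len(x2)                      # zero-pad to avoid circular wrap-around
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    cross = X2 * np.conj(X1)                   # cross-power spectrum
    cross /= np.abs(cross) + 1e-12             # PHAT weighting: keep only the phase
    cc = np.fft.irfft(cross, n=n)              # generalised cross-correlation
    max_shift = n // 2
    if max_tau is not None:                    # restrict to physically possible lags
        max_shift = max(1, min(int(max_tau * fs), max_shift))
    # Reorder so that index i corresponds to lag i - max_shift.
    cc = np.concatenate((cc[-max_shift:], cc[: max_shift + 1]))
    lag = int(np.argmax(np.abs(cc))) - max_shift
    return lag / fs
```

In the DUET-GCC combination described above, this function would be applied to each DUET-separated source rather than to the raw mixture, giving one TDOA per source; the PHAT weighting whitens the spectrum so that the correlation peak stays sharp under reverberation.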