Deep Learning Methods for Detection and Tracking of Particles in Fluorescence Microscopy Images
Studying the dynamics of sub-cellular structures such as receptors, filaments, and vesicles is a prerequisite for investigating cellular processes at the molecular level. In addition, it is important to characterize the dynamic behavior of virus structures to gain a better understanding of infection mechanisms and to develop novel drugs. To investigate the dynamics of fluorescently labeled sub-cellular and viral structures, time-lapse fluorescence microscopy is the most widely used imaging technique. Due to the limited spatial resolution of microscopes caused by diffraction, these very small structures appear as bright, blurred spots, denoted as particles, in microscopy images. To draw statistically meaningful biological conclusions, a large number of such particles need to be analyzed. However, since manual analysis of fluorescent particles is very time-consuming, fully automated computer-based methods are indispensable.
We introduce novel deep learning methods for detection and tracking of multiple particles in fluorescence microscopy images. We propose a particle detection method based on a convolutional neural network which performs image-to-image mapping by density map regression and uses the adaptive wing loss. For particle tracking, we present a recurrent neural network that exploits past and future information in both forward and backward directions. Assignment probabilities across multiple detections as well as the probabilities for missing detections are computed jointly. To resolve tracking ambiguities using future information, several track hypotheses are propagated to later time points. In addition, we developed a novel probabilistic deep learning method for particle tracking, which is based on a recurrent neural network mimicking classical Bayesian filtering. The method includes both aleatoric and epistemic uncertainty, and provides valuable information about the reliability of the computed trajectories. Short- and long-term temporal dependencies of individual object dynamics are exploited for state prediction, and assigned detections are used to update the predicted states. Moreover, we developed a convolutional Long Short-Term Memory neural network for combined particle tracking and colocalization analysis in two-channel microscopy image sequences. The network determines colocalization probabilities, and colocalization information is exploited to improve tracking. Short- and long-term temporal dependencies of object motion as well as image intensities are taken into account to compute assignment probabilities jointly across multiple detections. We also introduce a deep learning method for probabilistic particle detection and tracking. For particle detection, temporal information is integrated to regress a density map and determine sub-pixel particle positions.
For tracking, a fully Bayesian neural network is presented that mimics classical Bayesian filtering and takes into account both aleatoric and epistemic uncertainty. Uncertainty information of individual particle detections is considered. Network training for the developed deep learning-based particle tracking methods relies only on synthetic data, avoiding the need for time-consuming manual annotation. We performed an extensive evaluation of our methods on image data of the Particle Tracking Challenge as well as on fluorescence microscopy images displaying virus proteins of HCV and HIV, chromatin structures, and cell-surface receptors. The evaluation showed that our methods outperform previous approaches.
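The density-map detection idea can be illustrated with a small sketch: the network outputs a map with Gaussian-shaped peaks at particle locations, and sub-pixel positions are recovered from local maxima by centroid refinement. The network itself is omitted; the synthetic map, threshold, and window size below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.ndimage import maximum_filter

def detect_subpixel(density, thresh=0.2, win=3):
    """Find local maxima above `thresh` in a density map and refine each
    to a sub-pixel position via an intensity-weighted centroid."""
    peaks = (density == maximum_filter(density, size=2 * win + 1)) & (density > thresh)
    positions = []
    for r, c in zip(*np.nonzero(peaks)):
        r0, r1 = max(r - win, 0), min(r + win + 1, density.shape[0])
        c0, c1 = max(c - win, 0), min(c + win + 1, density.shape[1])
        patch = density[r0:r1, c0:c1]
        rr, cc = np.mgrid[r0:r1, c0:c1]
        w = patch.sum()  # total mass in the window
        positions.append((float((rr * patch).sum() / w),
                          float((cc * patch).sum() / w)))
    return positions

# Synthetic density map: one Gaussian peak centered at (10.3, 14.7)
yy, xx = np.mgrid[0:32, 0:32]
density = np.exp(-(((yy - 10.3) ** 2 + (xx - 14.7) ** 2) / (2 * 1.5 ** 2)))
pos = detect_subpixel(density)
```

On this toy map the recovered position lands within a few hundredths of a pixel of the true sub-pixel center, which is the point of regressing a density map rather than classifying whole pixels.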
Sparse variational regularization for visual motion estimation
The computation of visual motion is a key component in numerous computer vision tasks such as object detection, visual object tracking, and activity recognition. Despite extensive research effort, efficient handling of motion discontinuities, occlusions, and illumination changes still remains elusive in visual motion estimation. The work presented in this thesis utilizes variational methods to handle the aforementioned problems because these methods allow the integration of various mathematical concepts into a single energy minimization framework. This thesis applies concepts from signal sparsity to variational regularization for visual motion estimation. The regularization is designed in such a way that it handles motion discontinuities and can detect object occlusions.
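The role of a sparsity-inducing regularizer can be sketched on a 1-D toy problem: an L1 (total-variation-like) penalty on the gradient of the estimated field preserves discontinuities that a quadratic penalty would blur. The energy below is a generic smoothed-TV model with a quadratic data term, an illustrative stand-in, not the specific regularizers proposed in the thesis.

```python
import numpy as np
from scipy.optimize import minimize

def tv_energy(u, data, lam, eps=1e-3):
    """Variational energy: quadratic data-fidelity term plus a smoothed
    L1 penalty on the gradient of u (the sparsity-inducing regularizer
    that preserves discontinuities)."""
    fidelity = np.sum((u - data) ** 2)
    smoothness = lam * np.sum(np.sqrt(np.diff(u) ** 2 + eps ** 2))
    return fidelity + smoothness

# Toy 1-D "motion" field: two regions moving at different speeds,
# observed with noise; TV regularization recovers the sharp boundary.
rng = np.random.default_rng(0)
true = np.concatenate([np.full(20, 1.0), np.full(20, 3.0)])
noisy = true + 0.3 * rng.standard_normal(40)
res = minimize(tv_energy, noisy, args=(noisy, 2.0), method="L-BFGS-B")
u = res.x
```

The minimizer smooths the noise within each region while keeping the jump at index 19, whereas a quadratic (L2) smoothness term would spread that discontinuity over many samples.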
Robust Subspace Estimation via Low-Rank and Sparse Decomposition and Applications in Computer Vision
Recent advances in robust subspace estimation have made dimensionality reduction and noise and outlier suppression an active area of research, alongside continuous improvements in computer vision applications. Because image and video signals require a high-dimensional representation, their storage, processing, transmission, and analysis is often a difficult task. It is therefore desirable to obtain a low-dimensional representation for such signals and, at the same time, to correct for corruptions, errors, and outliers, so that the signals can be readily used for later processing. Major recent advances in low-rank modelling in this context were initiated by the work of Candès et al. [17], where the authors provided a solution to the long-standing problem of decomposing a matrix into low-rank and sparse components in a Robust Principal Component Analysis (RPCA) framework. However, for computer vision applications RPCA is often too complex and/or may not yield desirable results. The low-rank component obtained by RPCA usually has an unnecessarily high rank, while certain tasks require lower-dimensional representations. RPCA can robustly estimate noise and outliers and separate them from the low-rank component via a sparse part, but it provides no insight into the structure of the sparse solution, nor a way to further decompose the sparse part into random noise and a structured sparse component, which would be advantageous in many computer vision tasks. Moreover, as video signals are usually captured by a moving camera, obtaining a low-rank component directly by RPCA becomes impossible. In this thesis, novel Approximated RPCA algorithms are presented, targeting different shortcomings of RPCA. The RPCA algorithm was analysed to identify its most time-consuming steps, which were replaced with simpler yet tractable alternatives. The proposed method obtains the exact desired rank for the low-rank component while estimating a global transformation to describe camera-induced motion. Furthermore, it decomposes the sparse part into a foreground sparse component and a random-noise part that contains no useful information for computer vision processing. The foreground sparse component is obtained by several novel structured sparsity-inducing norms that better encapsulate the pixel structure needed in visual signals. Moreover, algorithms for reducing the complexity of low-rank estimation are proposed that achieve significant complexity reduction without sacrificing the visual representation of video and image information. The proposed algorithms are applied to several fundamental computer vision tasks, namely high-efficiency video coding, batch image alignment, inpainting and recovery, video stabilisation, background modelling and foreground segmentation, robust subspace clustering and motion estimation, face recognition, and ultra-high-definition image and video super-resolution. The algorithms proposed in this thesis, including batch image alignment and recovery, background modelling and foreground segmentation, robust subspace clustering and motion segmentation, and ultra-high-definition image and video super-resolution, achieve results that are state-of-the-art or comparable to existing methods.
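The baseline decomposition that the thesis builds on, RPCA solved by the inexact augmented Lagrange multiplier (IALM) method, can be sketched as follows. The parameter choices (λ = 1/√max(m,n), the initial μ, and the continuation factor) follow common defaults from the RPCA literature and are assumptions here, not the thesis's approximated algorithm.

```python
import numpy as np

def rpca(M, lam=None, tol=1e-7, max_iter=500):
    """Robust PCA via inexact ALM: decompose M into a low-rank L and a
    sparse S with M = L + S, alternating singular-value thresholding
    (for L) and elementwise soft-thresholding (for S)."""
    m, n = M.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))
    mu = 0.25 * m * n / np.abs(M).sum()  # common initial penalty weight
    S = np.zeros_like(M)
    Y = np.zeros_like(M)                 # Lagrange multiplier
    for _ in range(max_iter):
        # L-update: singular value thresholding of (M - S + Y/mu)
        U, sig, Vt = np.linalg.svd(M - S + Y / mu, full_matrices=False)
        L = (U * np.maximum(sig - 1.0 / mu, 0.0)) @ Vt
        # S-update: soft-thresholding (elementwise shrinkage)
        R = M - L + Y / mu
        S = np.sign(R) * np.maximum(np.abs(R) - lam / mu, 0.0)
        Y += mu * (M - L - S)
        mu = min(mu * 1.05, 1e7)         # continuation on the penalty
        if np.linalg.norm(M - L - S) <= tol * np.linalg.norm(M):
            break
    return L, S

# Rank-1 matrix corrupted by a few large sparse outliers
rng = np.random.default_rng(1)
L0 = np.outer(rng.standard_normal(60), rng.standard_normal(40))
S0 = np.zeros((60, 40))
idx = rng.choice(60 * 40, 60, replace=False)
S0.flat[idx] = 10.0
L, S = rpca(L0 + S0)
```

On this synthetic example the rank-1 component and the 2.5% sparse corruptions are recovered essentially exactly, which is the behaviour the thesis's approximated variants then trade off against runtime and rank control.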
Online Audio-Visual Multi-Source Tracking and Separation: A Labeled Random Finite Set Approach
The dissertation proposes an online solution for separating an unknown and time-varying number of moving sources using audio and visual data. The random finite set framework is used for the modeling and fusion of audio and visual data. This enables an online tracking algorithm to estimate the source positions and identities at each time point. With this information, a set of beamformers can be designed to separate each desired source and suppress the interfering sources.
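Once the tracker provides a source position, the corresponding steering delays define a beamformer. A minimal delay-and-sum sketch is shown below, with fractional delays applied as FFT phase shifts; the two-microphone setup, signal, and known delays are illustrative assumptions, not the labeled random finite set pipeline of the dissertation.

```python
import numpy as np

def delay_and_sum(mics, fs, shifts):
    """Delay-and-sum beamformer: time-shift each microphone signal by
    `shifts[i]` seconds (positive = advance) via an FFT phase shift,
    then average. Aligning the shifts to a source reinforces it;
    misaligned shifts attenuate it."""
    n = mics.shape[1]
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    out = np.zeros(n)
    for sig, d in zip(mics, shifts):
        out += np.fft.irfft(np.fft.rfft(sig) * np.exp(2j * np.pi * freqs * d), n)
    return out / len(mics)

# Two mics; a 500 Hz source reaches mic 2 two samples later than mic 1.
fs, n = 8000, 1024
t = np.arange(n) / fs
src = np.sin(2 * np.pi * 500 * t)
mics = np.stack([src, np.roll(src, 2)])              # mic 2 delayed 2 samples
aligned = delay_and_sum(mics, fs, [0.0, 2.0 / fs])   # steered at the source
mis = delay_and_sum(mics, fs, [0.0, 10.0 / fs])      # steered half a period off
```

Steering at the source reproduces it at full power, while the mis-steered output cancels almost completely; with the tracked positions of several sources, one such beamformer per desired source implements the separation stage described above.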
Artificial Intelligence Tools for Facial Expression Analysis.
Inner emotions show visibly on the human face and are understood as a basic guide to an individual's inner world. It is, therefore, possible to determine a person's attitudes, and the effects of others' behaviour on their deeper feelings, by examining facial expressions. In real-world applications, machines that interact with people need robust facial expression recognition. Such recognition holds advantages for varied applications in affective computing, advanced human-computer interaction, security, stress and depression analysis, robotic systems, and machine learning. This thesis starts by proposing a benchmark of dynamic versus static methods for facial Action Unit (AU) detection. An AU is the activation of an individual facial muscle or muscle group; AUs occurring in unison constitute a natural facial expression event. Detecting AUs automatically can provide explicit benefits since it considers both static and dynamic facial features. For this research, AU occurrence detection was conducted by extracting static and dynamic features, from both hand-crafted and deep-learning representations, for each static image of a video. This confirmed the superior performance of pretrained deep models. Next, temporal modelling was investigated to detect the underlying temporal variation phases in dynamic sequences using supervised and unsupervised methods. In the process, stacking dynamic features on top of static ones proved important for encoding deep features that learn temporal information when the spatial and temporal schemes are combined simultaneously. The study also found that fusing static and temporal features yields richer long-term temporal pattern information. Moreover, we hypothesised that an unsupervised method would enable invariant information to be extracted from dynamic textures.
Recently, cutting-edge results have been achieved by approaches based on Generative Adversarial Networks (GANs). In the second section of this thesis, we propose a model based on an unsupervised DCGAN for facial feature extraction and classification, with two aims: the generation of facial expression images under different arbitrary poses (frontal, multi-view, and in the wild), and the recognition of emotion categories and AUs, in an attempt to resolve the problem of recognising the seven basic emotion classes from static images in the wild. Thorough cross-database experimentation demonstrates that this approach can improve generalization. We also showed that the features learnt by the DCGAN are poorly suited to encoding facial expressions observed under multiple views, or when trained from a limited number of positive examples. Finally, this research focuses on disentangling identity from expression for facial expression recognition. A novel technique was implemented for emotion recognition from a single monocular image. A large-scale dataset (Face vid) was created from facial image videos rich in variations and distribution of facial dynamics, appearance, identities, expressions, and 3D poses. This dataset was used to train a DCNN (ResNet) to regress the expression parameters of a 3D Morphable Model jointly with a back-end classifier.
High dimensional information processing
Part I: Consider the n-dimensional vector y = Xβ + ε, where β ∈ ℝ^p has only k nonzero entries and ε ∈ ℝ^n is Gaussian noise. This can be viewed as a linear system with sparsity constraints corrupted by noise, where the objective is to estimate the sparsity pattern of β given the observation vector y and the measurement matrix X. First, we derive a non-asymptotic upper bound on the probability that a specific wrong sparsity pattern is identified by the maximum-likelihood estimator. We find that this probability depends (inversely) exponentially on the difference between ‖Xβ‖2 and the ℓ2-norm of the projection of Xβ onto the range of the columns of X indexed by the wrong sparsity pattern. Second, when X is randomly drawn from a Gaussian ensemble, we calculate a non-asymptotic upper bound on the probability of the maximum-likelihood decoder not declaring (partially) the true sparsity pattern. Consequently, we obtain sufficient conditions on the sample size n that guarantee almost surely the recovery of the true sparsity pattern. We find that the required growth rate of the sample size n matches the growth rate of previously established necessary conditions. Part II: Estimating two-dimensional firing rate maps is a common problem, arising in a number of contexts: the estimation of place fields in hippocampus, the analysis of temporally nonstationary tuning curves in sensory and motor areas, the estimation of firing rates following spike-triggered covariance analyses, etc. Here we introduce methods based on Gaussian process nonparametric Bayesian techniques for estimating these two-dimensional rate maps. These techniques offer a number of advantages: the estimates may be computed efficiently, come equipped with natural error bars, adapt their smoothness automatically to the local density and informativeness of the observed data, and permit direct fitting of the model hyperparameters (e.g., the prior smoothness of the rate map) via maximum marginal likelihood.
We illustrate the flexibility and performance of the new techniques on a variety of simulated and real data. Part III: Many fundamental questions in theoretical neuroscience involve optimal decoding and the computation of Shannon information rates in populations of spiking neurons. In this paper, we apply methods from the asymptotic theory of statistical inference to obtain a clearer analytical understanding of these quantities. We find that for large neural populations carrying a finite total amount of information, the full spiking population response is asymptotically as informative as a single observation from a Gaussian process whose mean and covariance can be characterized explicitly in terms of network and single neuron properties. The Gaussian form of this asymptotic sufficient statistic allows us in certain cases to perform optimal Bayesian decoding by simple linear transformations, and to obtain closed-form expressions of the Shannon information carried by the network. One technical advantage of the theory is that it may be applied easily even to non-Poisson point process network models; for example, we find that under some conditions, neural populations with strong history-dependent (non-Poisson) effects carry exactly the same information as do simpler equivalent populations of non-interacting Poisson neurons with matched firing rates. We argue that our findings help to clarify some results from the recent literature on neural decoding and neuroprosthetic design. Part IV: A model of distributed parameter estimation in networks is introduced, where agents have access to partially informative measurements over time. Each agent faces a local identification problem, in the sense that it cannot consistently estimate the parameter in isolation. 
We prove that, despite local identification problems, if agents update their estimates recursively as a function of their neighbors' beliefs, they can consistently estimate the true parameter provided that the communication network is strongly connected; that is, there exists an information path between any two agents in the network. We also show that the estimates of all agents are asymptotically normally distributed. Finally, we compute the asymptotic variance of the agents' estimates in terms of their observation models and the network topology, and provide conditions under which the distributed estimators are as efficient as any centralized estimator.
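The Part I setting can be made concrete with a small sketch: for a given measurement matrix, the maximum-likelihood decoder selects the k-subset of columns whose span captures the most energy of y (equivalently, minimizes the residual), which ties directly to the projected-norm quantity in the error bound. Exhaustive search is feasible only for tiny p; the dimensions, support, and noise level below are illustrative assumptions.

```python
import numpy as np
from itertools import combinations

def ml_support(y, X, k):
    """Exhaustive maximum-likelihood sparsity-pattern recovery: among all
    k-subsets T of columns, pick the one maximizing the energy of y
    projected onto span(X_T)."""
    best, best_energy = None, -np.inf
    for T in combinations(range(X.shape[1]), k):
        Q, _ = np.linalg.qr(X[:, T])               # orthonormal basis of span(X_T)
        energy = np.linalg.norm(Q.T @ y) ** 2      # ||P_T y||^2
        if energy > best_energy:
            best, best_energy = set(T), energy
    return best

# Gaussian ensemble, k = 3 nonzero entries, mild noise
rng = np.random.default_rng(2)
n, p, k = 40, 10, 3
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[[1, 4, 7]] = [2.0, -3.0, 2.5]
y = X @ beta + 0.1 * rng.standard_normal(n)
support = ml_support(y, X, k)
```

With n well above the sufficient sample-size growth rate discussed in Part I, the decoder recovers the true support; shrinking n or raising the noise level makes the wrong-pattern events whose probability the abstract bounds increasingly likely.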
Super-resolution of 3-dimensional scenes
Super-resolution is an image enhancement method that increases the resolution of images and video. Previously this technique could only be applied to 2D scenes. The super-resolution algorithm developed in this thesis creates high-resolution views of 3-dimensional scenes, using low-resolution images captured from varying, unknown positions.