A new ADMM algorithm for the Euclidean median and its application to robust patch regression
The Euclidean Median (EM) of a set of points in a Euclidean space
is the point x minimizing the (weighted) sum of the Euclidean distances of x to
the points in the set. While there exists no closed-form expression for the EM,
it can nevertheless be computed using iterative methods such as the Weiszfeld
algorithm. The EM has classically been used as a robust estimator of centrality
for multivariate data. It was recently demonstrated that the EM can be used to
perform robust patch-based denoising of images by generalizing the popular
Non-Local Means algorithm. In this paper, we propose a novel algorithm for
computing the EM (and its box-constrained counterpart) using variable splitting
and the method of augmented Lagrangian. The attractive feature of this approach
is that the subproblems involved in the ADMM-based optimization of the
augmented Lagrangian can be resolved using simple closed-form projections. The
proposed ADMM solver is used for robust patch-based image denoising and is
shown to exhibit faster convergence compared to an existing solver.
Comment: 5 pages, 3 figures, 1 table. To appear in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing, April 19-24, 2015.
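For readers who want to experiment, here is a minimal NumPy sketch of the classical Weiszfeld iteration mentioned above (the baseline iterative method, not the paper's ADMM solver); the initialization at the centroid and the tolerance are generic choices.

import numpy as np

def weiszfeld(points, weights=None, tol=1e-8, max_iter=1000):
    """Weighted Euclidean median via the classical Weiszfeld iteration."""
    points = np.asarray(points, dtype=float)          # shape (n, d)
    n = len(points)
    weights = np.ones(n) if weights is None else np.asarray(weights, float)
    x = points.mean(axis=0)                           # initial guess: centroid
    for _ in range(max_iter):
        d = np.linalg.norm(points - x, axis=1)
        if np.any(d < 1e-12):                         # x coincides with a data point
            return x
        w = weights / d                               # inverse-distance weights
        x_new = (w[:, None] * points).sum(axis=0) / w.sum()
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x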
Bayesian image restoration and bacteria detection in optical endomicroscopy
Optical microscopy systems can be used to obtain high-resolution microscopic images of tissue cultures and ex vivo tissue samples. This imaging technique can be translated to in vivo, in situ applications by using optical fibres and miniature optics. Fibred optical endomicroscopy (OEM) enables optical biopsy in organs inaccessible to any other imaging system, and hence can provide rapid and accurate diagnosis. The raw data the system produces is difficult to interpret, as it is modulated by a fibre bundle pattern that produces what is called the "honeycomb effect". Moreover, the data is further degraded by fibre core cross coupling. In addition, there is an unmet clinical need for automatic tools that can help clinicians detect fluorescently labelled bacteria in distal lung images. The aim of this thesis is to develop advanced image processing algorithms that address these problems.
First, we provide a statistical model for the fibre core cross coupling problem and for the sparse sampling by imaging fibre bundles (the honeycomb artefact), which are formulated here as a restoration problem for the first time in the literature. We then introduce a non-linear interpolation method, based on Gaussian process regression, in order to recover an interpretable scene from the deconvolved data.
Second, we develop two bacteria detection algorithms, each of which provides different characteristics. The first approach considers a joint formulation of the sparse coding and anomaly detection problems. The anomalies here are treated as candidate bacteria, which are annotated with the help of a trained clinician. Although this approach provides good detection performance and outperforms existing methods in the literature, the user has to carefully tune some crucial model parameters. Hence, we propose a more adaptive approach, for which a Bayesian framework is adopted. This approach not only outperforms the supervised approach above and existing methods in the literature, but also offers computation time that competes with optimization-based methods.
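To illustrate the flavour of the interpolation step, here is a sketch using off-the-shelf Gaussian process regression from scikit-learn; the core coordinates, intensities, and kernel settings are synthetic stand-ins, not the thesis implementation.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Synthetic stand-ins for fibre-core centre coordinates and their readings.
rng = np.random.default_rng(0)
cores = rng.uniform(0, 1, size=(500, 2))                     # irregular sample sites
values = np.sin(6 * cores[:, 0]) * np.cos(6 * cores[:, 1])   # measured intensities

# Fit a GP to the scattered core measurements ...
gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.1), alpha=1e-3)
gp.fit(cores, values)

# ... and evaluate it on a regular pixel grid to obtain an artefact-free image.
gx, gy = np.meshgrid(np.linspace(0, 1, 128), np.linspace(0, 1, 128))
image = gp.predict(np.column_stack([gx.ravel(), gy.ravel()])).reshape(128, 128)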
Proceedings of the second "international Traveling Workshop on Interactions between Sparse models and Technology" (iTWIST'14)
The implicit objective of the biennial "international Traveling Workshop on
Interactions between Sparse models and Technology" (iTWIST) is to foster
collaboration between international scientific teams by disseminating ideas
through both specific oral/poster presentations and free discussions. For its
second edition, the iTWIST workshop took place in the medieval and picturesque
town of Namur in Belgium, from Wednesday August 27th till Friday August 29th,
2014. The workshop was conveniently located in "The Arsenal" building within
walking distance of both hotels and town center. iTWIST'14 gathered about
70 international participants and featured 9 invited talks, 10 oral
presentations, and 14 posters on the following themes, all related to the
theory, application, and generalization of the "sparsity paradigm":
Sparsity-driven data sensing and processing; Union of low dimensional
subspaces; Beyond linear and convex inverse problems; Matrix/manifold/graph
sensing/processing; Blind inverse problems and dictionary learning; Sparsity
and computational neuroscience; Information theory, geometry and randomness;
Complexity/accuracy tradeoffs in numerical methods; Sparsity? What's next?;
Sparse machine learning and inference.
Comment: 69 pages, 24 extended abstracts, iTWIST'14 website: http://sites.google.com/site/itwist1
Multi-frame reconstruction using super-resolution, inpainting, segmentation and codecs
In this thesis, different aspects of video and light field reconstruction are considered, such as super-resolution, inpainting, segmentation, and codecs. For this purpose, each of these strategies is analyzed with respect to a specific goal and a specific database. Accordingly, databases relevant to the film industry, sports videos, light fields, and hyperspectral videos are used to assess the improvements.
This thesis is constructed around six related manuscripts, in which several approaches are proposed for multi-frame reconstruction. First, a novel multi-frame reconstruction strategy is proposed for light field super-resolution, in which graph-based regularization is applied along with edge-preserving filtering to improve the spatio-angular quality of the light field. Second, a novel video reconstruction method is proposed, built on compressive sensing (CS), Gaussian mixture models (GMMs), and sparse 3D transform-domain block matching. The motivation of the proposed technique is to improve the visual quality of the video frames and to decrease the reconstruction error compared with earlier video reconstruction methods. In the next approach, Student-t mixture models and edge-preserving filtering are applied for video super-resolution. The Student-t mixture model has heavy tails, which make it robust and suitable as a prior for video frame patches, and rich in terms of log-likelihood for information retrieval. In another approach, a hyperspectral video database is considered, and a Bayesian dictionary learning process is used for hyperspectral video super-resolution. To that end, a Beta process is used within Bayesian dictionary learning to generate a sparse coding for the hyperspectral video super-resolution. The spatial super-resolution is followed by a spectral video restoration strategy, and the whole process leverages two different learned dictionaries, the first trained for spatial super-resolution and the second trained for spectral restoration.
Furthermore, in another approach, a novel framework is proposed for automatically replacing advertisement content in soccer videos using deep learning strategies. For this purpose, a U-Net architecture (a convolutional neural network for image segmentation) is applied for content segmentation and detection. Subsequently, after reconstructing the segmented content in the video frames (accounting for the apparent loss in detection), the unwanted content is replaced with new content using a homography mapping procedure.
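As a rough illustration of the homography mapping step, the following OpenCV sketch warps a replacement advertisement onto a detected quadrilateral; the `replace_billboard` name and the `corners` input (assumed to come from the segmentation mask) are illustrative assumptions, not the thesis code.

import cv2
import numpy as np

def replace_billboard(frame, new_ad, corners):
    """Warp new_ad onto the quadrilateral `corners` (from the segmentation mask)."""
    h, w = new_ad.shape[:2]
    src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])   # corners of the new ad
    dst = np.float32(corners)                            # detected billboard corners
    H, _ = cv2.findHomography(src, dst)
    warped = cv2.warpPerspective(new_ad, H, (frame.shape[1], frame.shape[0]))
    mask = cv2.warpPerspective(np.ones((h, w), np.uint8) * 255, H,
                               (frame.shape[1], frame.shape[0]))
    out = frame.copy()
    out[mask > 0] = warped[mask > 0]                     # composite into the frame
    return out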
In addition, in another research work, a novel video compression framework is presented using autoencoder networks that encode and decode videos using less chroma information than luma information. For this purpose, instead of converting Y'CbCr 4:2:2/4:2:0 videos to and from RGB 4:4:4, the video is kept in Y'CbCr 4:2:2/4:2:0, and the luma and chroma channels are merged after the luma is downsampled to match the chroma size. The decoder performs the inverse operation. The performance of these models is evaluated using the CPSNR, MS-SSIM, and VMAF metrics. The experiments reveal that, compared with video compression involving conversion to and from RGB 4:4:4, the proposed method increases video quality by about 5.5% for Y'CbCr 4:2:2 and 8.3% for Y'CbCr 4:2:0, while reducing the amount of computation by nearly 37% for Y'CbCr 4:2:2 and 40% for Y'CbCr 4:2:0.
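The channel-merging step described above can be sketched as follows in PyTorch; the tensor shapes and the 2x average-pooling used to match the 4:2:0 chroma size are assumptions about one plausible realisation, not the thesis implementation.

import torch
import torch.nn.functional as F

def merge_ycbcr_420(y, cb, cr):
    """Downsample luma to the chroma resolution and stack the three planes.

    y : (B, 1, H, W) luma;  cb, cr : (B, 1, H/2, W/2) chroma (4:2:0).
    Returns a (B, 3, H/2, W/2) tensor to feed the autoencoder; the decoder
    would split the channels and upsample the luma back to (H, W).
    """
    y_ds = F.avg_pool2d(y, kernel_size=2)        # match the chroma size
    return torch.cat([y_ds, cb, cr], dim=1)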
The thread that ties these approaches together is the reconstruction of video and light field frames under different problem aspects, such as loss of information, blur in the frames, residual noise after reconstruction, unwanted content, excessive data size, and high computational overhead. In three of the proposed approaches, we use the Plug-and-Play ADMM model, applied for the first time to the reconstruction of videos and light fields, in order to address information retrieval in the frames while simultaneously tackling noise and blur. In two of the proposed models, we apply sparse dictionary learning to reduce the data dimension and to represent the data as an efficient linear combination of basis frame patches. Two of the proposed approaches are developed in collaboration with industry, in which deep learning frameworks are used to handle large sets of features and to learn high-level features from the data.
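Since Plug-and-Play ADMM recurs in several of these approaches, a generic sketch may help convey the idea; the gradient-based x-update, the step sizes, and the `denoiser` callable below are placeholders rather than the thesis implementation.

import numpy as np

def pnp_admm(y, forward, adjoint, denoiser, rho=1.0, n_iter=50):
    """Generic Plug-and-Play ADMM for y = A x + noise (illustrative sketch).

    `forward`/`adjoint` implement A and A^T; `denoiser` replaces the usual
    proximal operator of the prior with an off-the-shelf denoiser.
    """
    x = adjoint(y)
    z = x.copy()
    u = np.zeros_like(x)
    for _ in range(n_iter):
        # x-update: least-squares data term, solved here by a few gradient steps.
        for _ in range(5):
            grad = adjoint(forward(x) - y) + rho * (x - z + u)
            x = x - 0.1 * grad
        z = denoiser(x + u)          # z-update: plug in the denoiser
        u = u + x - z                # dual update
    return x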
Robust motion segmentation with subspace constraints
Motion segmentation is an important task in computer vision with
many applications such as dynamic scene understanding and
multi-body structure from motion. When the point correspondences
across frames are given, motion segmentation can be addressed as
a subspace clustering problem under an affine camera model. In
the first two parts of this thesis, we target the general
subspace clustering problem and propose two novel methods, namely
Efficient Dense Subspace Clustering (EDSC) and the Robust Shape
Interaction Matrix (RSIM) method.
Instead of following the standard compressive sensing approach,
in EDSC we formulate subspace clustering as a Frobenius norm
minimization problem, which inherently yields denser connections
between data points. While in the noise-free case we rely on the
self-expressiveness of the observations, in the presence of noise
we recover a clean dictionary to represent the data. Our
formulation lets us solve the subspace clustering problem
efficiently. More specifically, for outlier-free observations,
the solution can be obtained in closed-form, and in the presence
of outliers, we solve the problem by performing a series of
linear operations. Furthermore, we show that our Frobenius norm
formulation shares the same solution as the popular nuclear norm
minimization approach when the data is free of any noise.
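To convey the flavour of the closed-form solution in the outlier-free case, here is an illustrative NumPy sketch; the regularization weight and the spectral-clustering post-processing are generic choices, not necessarily those of EDSC.

import numpy as np
from sklearn.cluster import SpectralClustering

def frobenius_self_expression(X, lam=0.1, n_clusters=3):
    """Self-expressive coefficients from a Frobenius-norm objective (illustrative).

    Solves  min_C ||X - X C||_F^2 + lam ||C||_F^2  in closed form, where the
    columns of X are the data points, then clusters the symmetrised affinity.
    """
    G = X.T @ X
    C = np.linalg.solve(G + lam * np.eye(G.shape[0]), G)  # (G + lam I)^{-1} G
    A = np.abs(C) + np.abs(C).T                           # symmetric affinity
    return SpectralClustering(n_clusters=n_clusters,
                              affinity='precomputed').fit_predict(A)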
In RSIM, we revisit the Shape Interaction Matrix (SIM) method,
one of the earliest approaches for motion segmentation (or
subspace clustering), and reveal its connections to several
recent subspace clustering methods. We derive a simple, yet
effective algorithm to robustify the SIM method and make it
applicable to real-world scenarios where the data is corrupted by
noise. We validate the proposed method by intuitive examples and
justify it with the matrix perturbation theory. Moreover, we show
that RSIM can be extended to handle missing data with a
Grassmannian gradient descent method.
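The classical SIM that RSIM robustifies can be computed in a few lines; this sketch assumes a known rank and noise-free trajectories.

import numpy as np

def shape_interaction_matrix(W, rank):
    """Classical SIM (Costeira-Kanade): Q = V V^T from the rank-r SVD of W.

    W is the 2F x N trajectory matrix. Under an affine camera and in the
    noise-free case, Q_ij is (near) zero for points i, j from different motions.
    """
    _, _, Vt = np.linalg.svd(W, full_matrices=False)
    V = Vt[:rank].T                    # leading right singular vectors
    return V @ V.T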
The above subspace clustering methods work well for motion
segmentation, yet they require that point trajectories across
frames are known a priori. However, finding point
correspondences is in itself a challenging task. Existing
approaches tackle the correspondence estimation and motion
segmentation problems separately. In the third part of this
thesis, given a set of feature points detected in each frame of
the sequence, we develop an approach which simultaneously
performs motion segmentation and finds point correspondences
across the frames. We formulate this problem in terms of Partial
Permutation Matrices (PPMs) and aim to match feature descriptors
while simultaneously encouraging point trajectories to satisfy
subspace constraints. This lets us handle outliers in both point
locations and feature appearance. The resulting optimization
problem is solved via the Alternating Direction Method of
Multipliers (ADMM), where each subproblem has an efficient
solution. In particular, we show that most of the subproblems can
be solved in closed-form, and one binary assignment subproblem
can be solved by the Hungarian algorithm.
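The binary assignment subproblem can be pictured with SciPy's linear-sum-assignment routine (a Hungarian-style solver); the cost matrix and the surrounding ADMM machinery are omitted, so this shows only the flavour of that one subproblem.

import numpy as np
from scipy.optimize import linear_sum_assignment

def ppm_assignment(cost):
    """Binary assignment subproblem: a minimising (partial) permutation matrix.

    `cost` measures descriptor dissimilarity between feature points in two
    frames; the solver returns the minimising one-to-one matching.
    """
    rows, cols = linear_sum_assignment(cost)
    P = np.zeros_like(cost, dtype=float)
    P[rows, cols] = 1.0                # entries of the partial permutation
    return P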
Obtaining reliable feature tracks in a frame-by-frame manner is
desirable in applications such as online motion segmentation. In
the final part of the thesis, we introduce a novel multi-body
feature tracker that exploits a multi-body rigidity assumption to
improve tracking robustness under a general perspective camera
model. A conventional approach to addressing this problem would
consist of alternating between solving two subtasks: motion
segmentation and feature tracking under rigidity constraints for
each segment. This approach, however, requires knowing the number
of motions, as well as assigning points to motion groups, which
is typically sensitive to motion estimates. By contrast, we
introduce a segmentation-free solution to multi-body feature
tracking that bypasses the motion assignment step and reduces to
solving a series of subproblems with closed-form solutions.
In summary, in this thesis, we exploit the powerful subspace
constraints and develop robust motion segmentation methods in
different challenging scenarios where the trajectories are either
given as input, or unknown beforehand. We also present a general
robust multi-body feature tracker which can be used as the first
step of motion segmentation to get reliable trajectories.
Advanced Restoration Techniques for Images and Disparity Maps
With the increasing popularity of digital cameras, the field of Computational Photography has emerged as one of the most demanding areas of research. In this thesis we study and develop novel priors and optimization techniques to solve inverse problems, including disparity estimation and image restoration.
The disparity map estimation method proposed in this thesis incorporates multiple frames of a stereo video sequence to ensure temporal coherency. To enforce smoothness, we use spatio-temporal connections between the pixels of the disparity map to constrain our solution. Apart from smoothness, we enforce a consistency constraint on the disparity assignments by using connections between the left and right views. These constraints are formulated in a graphical model, which we solve using mean-field approximation. We use a filter-based mean-field optimization that performs efficiently by updating the disparity variables in parallel. The parallel update scheme, however, is not guaranteed to converge to a stationary point. To compare against and demonstrate the effectiveness of our approach, we developed a new optimization technique that uses sequential updates, which runs efficiently and guarantees convergence. Our empirical results indicate that, with proper initialization, we can employ the parallel update scheme and efficiently optimize our disparity maps without loss of quality. Our method ranks amongst the state of the art in common benchmarks, and significantly reduces the temporal flickering artifacts in the disparity maps.
In the second part of this thesis, we address several image restoration problems such as image deblurring, demosaicing, and super-resolution. We propose to use denoising autoencoders to learn an approximation of the true natural image distribution. We parametrize our denoisers using deep neural networks and show that they learn the gradient of the smoothed density of natural images. Based on this analysis, we propose a restoration technique that moves the solution towards the local extrema of this distribution by minimizing the difference between the input and output of our denoiser. We demonstrate the effectiveness of our approach using a single trained neural network in several restoration tasks such as deblurring and super-resolution. In a more general framework, we define a new Bayes formulation for the restoration problem, which leads to a more efficient and robust estimator. The proposed framework achieves state-of-the-art performance in various restoration tasks such as deblurring and demosaicing, and also for more challenging tasks such as noise- and kernel-blind image deblurring.
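The core restoration update can be pictured as gradient descent combining the data term with the denoiser residual, which approximates the (negative) gradient of the smoothed log-density; the operators, step sizes, and `denoiser` callable below are placeholders, not the thesis code.

import numpy as np

def dae_prior_restore(y, forward, adjoint, denoiser, lam=0.1,
                      step=0.1, n_iter=100):
    """Restoration with a denoising-autoencoder prior (illustrative sketch).

    A trained DAE satisfies denoiser(x) - x ~ gradient of the smoothed
    log-density of natural images, so its residual drives the prior term.
    """
    x = adjoint(y)
    for _ in range(n_iter):
        data_grad = adjoint(forward(x) - y)       # gradient of the data term
        prior_grad = x - denoiser(x)              # negative smoothed score
        x = x - step * (data_grad + lam * prior_grad)
    return x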
Keywords. disparity map estimation, stereo matching, mean-field optimization, graphical models, image processing, linear inverse problems, image restoration, image deblurring, image denoising, single image super-resolution, image demosaicing, deep neural networks, denoising autoencoder.
What's in a Prior? Learned Proximal Networks for Inverse Problems
Proximal operators are ubiquitous in inverse problems, commonly appearing as
part of algorithmic strategies to regularize problems that are otherwise
ill-posed. Modern deep learning models have been brought to bear for these
tasks too, as in the framework of plug-and-play or deep unrolling, where they
loosely resemble proximal operators. Yet, something essential is lost in
employing these purely data-driven approaches: there is no guarantee that a
general deep network represents the proximal operator of any function, nor is
there any characterization of the function for which the network might provide
some approximate proximal. This not only makes guaranteeing convergence of
iterative schemes challenging but, more fundamentally, complicates the analysis
of what has been learned by these networks about their training data. Herein we
provide a framework to develop learned proximal networks (LPN), prove that they
provide exact proximal operators for a data-driven nonconvex regularizer, and
show how a new training strategy, dubbed proximal matching, provably promotes
the recovery of the log-prior of the true data distribution. Such LPN provide
general, unsupervised, expressive proximal operators that can be used for
general inverse problems with convergence guarantees. We illustrate our results
in a series of cases of increasing complexity, demonstrating that these models
not only result in state-of-the-art performance, but provide a window into the
resulting priors learned from data.
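For readers new to proximal operators, soft-thresholding, the proximal operator of the l1 norm, is the canonical hand-crafted instance of the kind of object that LPN learn from data:

import numpy as np

def prox_l1(v, lam):
    """prox_{lam*||.||_1}(v) = argmin_x 0.5*||x - v||^2 + lam*||x||_1.

    The closed form is elementwise soft-thresholding; LPN instead learn a
    network guaranteed to be the prox of some (nonconvex) regularizer.
    """
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)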
Generative Models for Preprocessing of Hospital Brain Scans
I will in this thesis present novel computational methods for processing routine clinical brain scans. Such scans were originally acquired for qualitative assessment by trained radiologists, and present a number of difficulties for computational models, such as those within common neuroimaging analysis software. The overarching objective of this work is to enable efficient and fully automated analysis of large neuroimaging datasets, of the type currently present in many hospitals worldwide. The methods presented are based on probabilistic, generative models of the observed imaging data, and therefore rely on informative priors and realistic forward models.
The first part of the thesis will present a model for image quality improvement, whose key component is a novel prior for multimodal datasets. I will demonstrate its effectiveness for super-resolving thick-sliced clinical MR scans and for denoising CT images and MR-based, multi-parametric mapping acquisitions. I will then show how the same prior can be used for within-subject, intermodal image registration, for more robustly registering large numbers of clinical scans.
The second part of the thesis focusses on improved, automatic segmentation and spatial normalisation of routine clinical brain scans. I propose two extensions to a widely used segmentation technique. First, a method for this model to handle missing data, which allows me to predict entirely missing modalities from one, or a few, MR contrasts. Second, a principled way of combining the strengths of probabilistic, generative models with the unprecedented discriminative capability of deep learning. By introducing a convolutional neural network as a Markov random field prior, I can model nonlinear class interactions and learn these using backpropagation. I show that this model is robust to sequence and scanner variability. Finally, I show examples of fitting a population-level, generative model to various neuroimaging data, which can model, e.g., CT scans with haemorrhagic lesions.
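The generative backbone of such segmentation models is a Gaussian mixture over voxel intensities; the minimal EM sketch below keeps only that core and omits the spatial tissue priors, bias-field correction, and deep MRF prior the thesis develops.

import numpy as np

def gmm_em_segment(intensities, k=3, n_iter=50):
    """Minimal EM for a K-class Gaussian mixture over voxel intensities."""
    x = intensities.ravel().astype(float)
    mu = np.quantile(x, np.linspace(0.1, 0.9, k))   # spread-out initial means
    var = np.full(k, x.var())
    pi = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: class responsibilities under the current Gaussians.
        ll = -0.5 * ((x[:, None] - mu) ** 2 / var + np.log(2 * np.pi * var))
        r = pi * np.exp(ll)
        r /= r.sum(axis=1, keepdims=True) + 1e-12
        # M-step: update mixture weights, means, and variances.
        nk = r.sum(axis=0)
        pi = nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return r.argmax(axis=1).reshape(intensities.shape)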