34 research outputs found
Image Deblurring and Super-resolution by Adaptive Sparse Domain Selection and Adaptive Regularization
As a powerful statistical image modeling technique, sparse representation has
been successfully used in various image restoration applications. The success
of sparse representation owes to the development of l1-norm optimization
techniques, and the fact that natural images are intrinsically sparse in some
domain. The image restoration quality largely depends on whether the employed
sparse domain can represent well the underlying image. Considering that the
contents can vary significantly across different images or different patches in
a single image, we propose to learn various sets of bases from a pre-collected
dataset of example image patches, and then for a given patch to be processed,
one set of bases are adaptively selected to characterize the local sparse
domain. We further introduce two adaptive regularization terms into the sparse
representation framework. First, a set of autoregressive (AR) models are
learned from the dataset of example image patches. The best fitted AR models to
a given patch are adaptively selected to regularize the image local structures.
Second, the image non-local self-similarity is introduced as another
regularization term. In addition, the sparsity regularization parameter is
adaptively estimated for better image restoration performance. Extensive
experiments on image deblurring and super-resolution validate that by using
adaptive sparse domain selection and adaptive regularization, the proposed
method achieves much better results than many state-of-the-art algorithms in
terms of both PSNR and visual perception.Comment: 35 pages. This paper is under review in IEEE TI
Sparse Modeling for Image and Vision Processing
In recent years, a large amount of multi-disciplinary research has been
conducted on sparse models and their applications. In statistics and machine
learning, the sparsity principle is used to perform model selection---that is,
automatically selecting a simple model among a large collection of them. In
signal processing, sparse coding consists of representing data with linear
combinations of a few dictionary elements. Subsequently, the corresponding
tools have been widely adopted by several scientific communities such as
neuroscience, bioinformatics, or computer vision. The goal of this monograph is
to offer a self-contained view of sparse modeling for visual recognition and
image processing. More specifically, we focus on applications where the
dictionary is learned and adapted to data, yielding a compact representation
that has been successful in various contexts.Comment: 205 pages, to appear in Foundations and Trends in Computer Graphics
and Visio
Methods to Improve the Prediction Accuracy and Performance of Ensemble Models
The application of ensemble predictive models has been an important research area in predicting medical diagnostics, engineering diagnostics, and other related smart devices and
related technologies. Most of the current predictive models are complex and not reliable despite numerous efforts in the past by the research community. The performance accuracy of the predictive models have not always been realised due to many factors such as complexity and class imbalance. Therefore there is a need to improve the predictive accuracy of current ensemble models and to enhance their applications and reliability and non-visual predictive tools.
The research work presented in this thesis has adopted a pragmatic phased approach to propose and develop new ensemble models using multiple methods and validated the methods through rigorous testing and implementation in different phases. The first phase comprises of empirical investigations on standalone and ensemble algorithms that were carried out to ascertain their performance effects on complexity and simplicity of the classifiers. The second phase comprises of an improved ensemble model based on the integration of Extended Kalman Filter (EKF), Radial Basis Function Network (RBFN) and AdaBoost algorithms. The third phase comprises of an extended model based on early stop concepts, AdaBoost algorithm, and statistical performance of the training samples to minimize overfitting performance of the proposed model. The fourth phase comprises of an enhanced analytical multivariate logistic regression predictive model developed to minimize the complexity and improve prediction accuracy of logistic regression model.
To facilitate the practical application of the proposed models; an ensemble non-invasive analytical tool is proposed and developed. The tool links the gap between theoretical concepts and practical application of theories to predict breast cancer survivability. The empirical findings suggested that: (1) increasing the complexity and topology of algorithms does not necessarily lead to a better algorithmic performance, (2) boosting by resampling performs slightly better than boosting by reweighting, (3) the prediction accuracy of the proposed ensemble EKF-RBFN-AdaBoost model performed better than several established ensemble models, (4) the proposed early stopped model converges faster and minimizes overfitting better compare with other models, (5) the proposed multivariate logistic regression concept minimizes the complexity models (6) the performance of the proposed analytical non-invasive tool performed comparatively better than many of the benchmark analytical tools used in predicting breast cancers and diabetics ailments.
The research contributions to ensemble practice are: (1) the integration and development of EKF, RBFN and AdaBoost algorithms as an ensemble model, (2) the development and validation of ensemble model based on early stop concepts, AdaBoost, and statistical concepts of the training samples, (3) the development and validation of predictive logistic regression model based on breast cancer, and (4) the development and validation of a non-invasive breast cancer analytic tools based on the proposed and developed predictive models in this thesis.
To validate prediction accuracy of ensemble models, in this thesis the proposed models were applied in modelling breast cancer survivability and diabeticsâ diagnostic tasks. In comparison with other established models the simulation results of the models showed improved predictive accuracy.
The research outlines the benefits of the proposed models, whilst proposes new directions for future work that could further extend and improve the proposed models discussed in this
thesis
The Emerging Trends of Multi-Label Learning
Exabytes of data are generated daily by humans, leading to the growing need
for new efforts in dealing with the grand challenges for multi-label learning
brought by big data. For example, extreme multi-label classification is an
active and rapidly growing research area that deals with classification tasks
with an extremely large number of classes or labels; utilizing massive data
with limited supervision to build a multi-label classification model becomes
valuable for practical applications, etc. Besides these, there are tremendous
efforts on how to harvest the strong learning capability of deep learning to
better capture the label dependencies in multi-label learning, which is the key
for deep learning to address real-world classification tasks. However, it is
noted that there has been a lack of systemic studies that focus explicitly on
analyzing the emerging trends and new challenges of multi-label learning in the
era of big data. It is imperative to call for a comprehensive survey to fulfill
this mission and delineate future research directions and new applications.Comment: Accepted to TPAMI 202
Sparse and low-rank techniques for the efficient restoration of images
Image reconstruction is a key problem in numerous applications of computer vision and medical imaging. By removing noise and artifacts from corrupted images, or by enhancing the quality of low-resolution images, reconstruction methods are essential to provide high-quality images for these applications. Over the years, extensive research efforts have been invested toward the development of accurate and efficient approaches for this problem.
Recently, considerable improvements have been achieved by exploiting the principles of sparse representation and nonlocal self-similarity. However, techniques based on these principles often suffer from important limitations that impede their use in high-quality and large-scale applications. Thus, sparse representation approaches consider local patches during reconstruction, but ignore the global structure of the image. Likewise, because they average over groups of similar patches, nonlocal self-similarity methods tend to over-smooth images. Such methods can also be computationally expensive, requiring a hour or more to reconstruct a single image. Furthermore, existing reconstruction approaches consider either local patch-based regularization or global structure regularization, due to the complexity of combining both regularization strategies in a single model. Yet, such combined model could improve upon existing techniques by removing noise or reconstruction artifacts, while preserving both local details and global structure in the image. Similarly, current approaches rarely consider external information during the reconstruction process. When the structure to reconstruct is known, external information like statistical atlases or geometrical priors could also improve performance by guiding the reconstruction.
This thesis addresses limitations of the prior art through three distinct contributions. The first contribution investigates the histogram of image gradients as a powerful prior for image reconstruction. Due to the trade-off between noise removal and smoothing, image reconstruction techniques based on global or local regularization often over-smooth the image, leading to the loss of edges and textures. To alleviate this problem, we propose a novel prior for preserving the distribution of image gradients modeled as a histogram. This prior is combined with low-rank patch regularization in a single efficient model, which is then shown to improve reconstruction accuracy for the problems of denoising and deblurring.
The second contribution explores the joint modeling of local and global structure regularization for image restoration. Toward this goal, groups of similar patches are reconstructed simultaneously using an adaptive regularization technique based on the weighted nuclear norm. An innovative strategy, which decomposes the image into a smooth component and a sparse residual, is proposed to preserve global image structure. This strategy is shown to better exploit the property of structure sparsity than standard techniques like total variation. The proposed model is evaluated on the problems of completion and super-resolution, outperforming state-of-the-art approaches for these tasks.
Lastly, the third contribution of this thesis proposes an atlas-based prior for the efficient reconstruction of MR data. Although popular, image priors based on total variation and nonlocal patch similarity often over-smooth edges and textures in the image due to the uniform regularization of gradients. Unlike natural images, the spatial characteristics of medical images are often restricted by the target anatomical structure and imaging modality. Based on this principle, we propose a novel MRI reconstruction method that leverages external information in the form of an probabilistic atlas. This atlas controls the level of gradient regularization at each image location, via a weighted total-variation prior. The proposed method also exploits the redundancy of nonlocal similar patches through a sparse representation model. Experiments on a large scale dataset of T1-weighted images show this method to be highly competitive with the state-of-the-art
Feature Learning for RGB-D Data
RGB-D data has turned out to be a very useful representation for solving fundamental computer
vision problems. It takes the advantages of the color images that provide appearance
information of an object and also the depth image that is immune to the variations in color,
illumination, rotation angle and scale. With the invention of the low-cost Microsoft Kinect
sensor, which was initially used for gaming and later became a popular device for computer
vision, high quality RGB-D data can be acquired easily. RGB-D image/video can facilitate
a wide range of application areas, such as computer vision, robotics, construction and medical
imaging. Furthermore, how to fuse RGB information and depth information is still a
problem in computer vision. It is not enough to simply concatenate RGB data and depth
data together. A new fusion method could better fuse RGB images and depth images. It
still needs more powerful algorithms on this. In this thesis, to explore more advantages of
RGB-D data, we use some popular RGB-D datasets for deep feature learning algorithms
evaluation, hyper-parameter optimization, local multi-modal feature learning, RGB-D data
fusion and recognizing RGB information from RGB-D images: i)With the success of Deep
Neural Network in computer vision, deep features from fused RGB-D data can be proved to
gain better results than RGB data only. However, different deep learning algorithms show
different performance on different RGB-D datasets. Through large-scale experiments to
comprehensively evaluate the performance of deep feature learning models for RGB-D image/
video classification, we obtain the conclusion that RGB-D fusion methods using CNNs
always outperform other selected methods (DBNs, SDAE and LSTM). On the other side, since
LSTM can learn from experience to classify, process and predict time series, it achieved
better performances than DBN and SDAE in video classification tasks. ii) Hyper-parameter
optimization can help researchers quickly choose an initial set of hyper-parameters for a new
coming classification task, thus reducing the number of trials in terms of hyper-parameter
space. We present a simple and efficient framework for improving the efficiency and accuracy
of hyper-parameter optimization by considering the classification complexity of a
particular dataset. We verify this framework on three real-world RGB-D datasets. After
the analysis of experiments, we confirm that our framework can provide deeper insights
into the relationship between dataset classification tasks and hyperparameters optimization, thus quickly choosing an accurate initial set of hyper-parameters for a new coming classification
task. iii) We propose a new Convolutional Neural Networks (CNNs)-based local
multi-modal feature learning framework for RGB-D scene classification. This method can
effectively capture much of the local structure from the RGB-D scene images and automatically
learn a fusion strategy for the object-level recognition step instead of simply training a
classifier on top of features extracted from both modalities. Experiments are conducted on
two popular datasets to thoroughly test the performance of our method, which show that our
method with local multi-modal CNNs greatly outperforms state-of-the-art approaches. Our
method has the potential to improve RGB-D scene understanding. Some extended evaluation
shows that CNNs trained using a scene-centric dataset is able to achieve an improvement
on scene benchmarks compared to a network trained using an object-centric dataset.
iv) We propose a novel method for RGB-D data fusion. We project raw RGB-D data into
a complex space and then jointly extract features from the fused RGB-D images. Besides
three observations about the fusion methods, the experimental results also show that our
method achieves competing performance against the classical SIFT. v) We propose a novel
method called adaptive Visual-Depth Embedding (aVDE) which learns the compact shared
latent space between two representations of labeled RGB and depth modalities in the source
domain first. Then the shared latent space can help the transfer of the depth information to
the unlabeled target dataset. At last, aVDE matches features and reweights instances jointly
across the shared latent space and the projected target domain for an adaptive classifier. This
method can utilize the additional depth information in the source domain and simultaneously
reduce the domain mismatch between the source and target domains. On two real-world
image datasets, the experimental results illustrate that the proposed method significantly
outperforms the state-of-the-art methods
Recent Advances of Local Mechanisms in Computer Vision: A Survey and Outlook of Recent Work
Inspired by the fact that human brains can emphasize discriminative parts of
the input and suppress irrelevant ones, substantial local mechanisms have been
designed to boost the development of computer vision. They can not only focus
on target parts to learn discriminative local representations, but also process
information selectively to improve the efficiency. In terms of application
scenarios and paradigms, local mechanisms have different characteristics. In
this survey, we provide a systematic review of local mechanisms for various
computer vision tasks and approaches, including fine-grained visual
recognition, person re-identification, few-/zero-shot learning, multi-modal
learning, self-supervised learning, Vision Transformers, and so on.
Categorization of local mechanisms in each field is summarized. Then,
advantages and disadvantages for every category are analyzed deeply, leaving
room for exploration. Finally, future research directions about local
mechanisms have also been discussed that may benefit future works. To the best
our knowledge, this is the first survey about local mechanisms on computer
vision. We hope that this survey can shed light on future research in the
computer vision field
Discovery in Physics
Volume 2 covers knowledge discovery in particle and astroparticle physics. Instruments gather petabytes of data and machine learning is used to process the vast amounts of data and to detect relevant examples efficiently. The physical knowledge is encoded in simulations used to train the machine learning models. The interpretation of the learned models serves to expand the physical knowledge resulting in a cycle of theory enhancement
Statistical Models and Optimization Algorithms for High-Dimensional Computer Vision Problems
Data-driven and computational approaches are showing significant promise in solving several challenging problems in various fields such as bioinformatics, finance and many branches of engineering. In this dissertation, we explore the potential of these approaches, specifically statistical data models and optimization algorithms, for solving several challenging problems in computer vision. In doing so, we contribute to the literatures of both statistical data models and computer vision. In the context of statistical data models, we propose principled approaches for solving robust regression problems, both linear and kernel, and missing data matrix factorization problem. In computer vision, we propose statistically optimal and efficient algorithms for solving the remote face recognition and structure from motion (SfM) problems.
The goal of robust regression is to estimate the functional relation between two variables from a given data set which might be contaminated with outliers. Under the reasonable assumption that there are fewer outliers than inliers in a data set, we formulate the robust linear regression problem as a sparse learning problem, which can be solved using efficient polynomial-time algorithms. We also provide sufficient conditions under which the proposed algorithms correctly solve the robust regression problem. We then extend our robust formulation to the case of kernel regression, specifically to propose a robust version for relevance vector machine (RVM) regression.
Matrix factorization is used for finding a low-dimensional representation for data embedded in a high-dimensional space. Singular value decomposition is the standard algorithm for solving this problem. However, when the matrix has many missing elements this is a hard problem to solve. We formulate the missing data matrix factorization problem as a low-rank semidefinite programming problem (essentially a rank constrained SDP), which allows us to find accurate and efficient solutions for large-scale factorization problems.
Face recognition from remotely acquired images is a challenging problem because of variations due to blur and illumination. Using the convolution model for blur, we show that the set of all images obtained by blurring a given image forms a convex set. We then use convex optimization techniques to find the distances between a given blurred (probe) image and the gallery images to find the best match. Further, using a low-dimensional linear subspace model for illumination variations, we extend our theory in a similar fashion to recognize blurred and poorly illuminated faces.
Bundle adjustment is the final optimization step of the SfM problem where the goal is to obtain the 3-D structure of the observed scene and the camera parameters from multiple images of the scene. The traditional bundle adjustment algorithm, based on minimizing the l_2 norm of the image re-projection error, has cubic complexity in the number of unknowns. We propose an algorithm, based on minimizing the l_infinity norm of the re-projection error, that has quadratic complexity in the number of unknowns. This is achieved by reducing the large-scale optimization problem into many small scale sub-problems each of which can be solved using second-order cone programming