59 research outputs found
Low-level spatiochromatic grouping for saliency estimation
We propose a saliency model termed SIM (saliency by induction mechanisms), which is based on a low-level spatiochromatic model that has successfully predicted chromatic induction phenomena. In so doing, we hypothesize that the low-level visual mechanisms that enhance or suppress image detail are also responsible for making some image regions more salient. Moreover, SIM adds geometrical grouplets to enhance complex low-level features such as corners, and suppress relatively simpler features such as edges. Since our model has been fitted on psychophysical chromatic induction data, it is largely nonparametric. SIM outperforms state-of-the-art methods in predicting eye fixations on two datasets and using two metrics
A Hierarchical Bayesian Model for Frame Representation
In many signal processing problems, it may be fruitful to represent the
signal under study in a frame. If a probabilistic approach is adopted, it
becomes then necessary to estimate the hyper-parameters characterizing the
probability distribution of the frame coefficients. This problem is difficult
since in general the frame synthesis operator is not bijective. Consequently,
the frame coefficients are not directly observable. This paper introduces a
hierarchical Bayesian model for frame representation. The posterior
distribution of the frame coefficients and model hyper-parameters is derived.
Hybrid Markov Chain Monte Carlo algorithms are subsequently proposed to sample
from this posterior distribution. The generated samples are then exploited to
estimate the hyper-parameters and the frame coefficients of the target signal.
Validation experiments show that the proposed algorithms provide an accurate
estimation of the frame coefficients and hyper-parameters. Application to
practical problems of image denoising show the impact of the resulting Bayesian
estimation on the recovered signal quality
Active Learning of Classification Models from Enriched Label-related Feedback
Our ability to learn accurate classification models from data is often limited by the number of available labeled data instances. This limitation is of particular concern when data instances need to be manually labeled by human annotators and when the labeling process carries a significant cost. Recent years witnessed increased research interest in developing methods in different directions capable of learning models from a smaller number of examples. One such direction is active learning, which finds the most informative unlabeled instances to be labeled next. Another, more recent direction showing a great promise utilizes enriched label-related feedback. In this case, such feedback from the human annotator provides additional information reflecting the relations among possible labels. The cost of such feedback is often negligible compared with the cost of instance review. The enriched label-related feedback may come in different forms. In this work, we propose, develop and study classification models for binary, multi-class and multi-label classification problems that utilize the different forms of enriched label-related feedback. We show that this new feedback can help us improve the quality of classification models compared with the standard class-label feedback. For each of the studied feedback forms, we also develop new active learning strategies for selecting the most informative unlabeled instances that are compatible with the respective feedback form, effectively combining two approaches for reducing the number of required labeled instances. We demonstrate the effectiveness of our new framework on both simulated and real-world datasets
Bayesian optimization for sparse neural networks with trainable activation functions
In the literature on deep neural networks, there is considerable interest in
developing activation functions that can enhance neural network performance. In
recent years, there has been renewed scientific interest in proposing
activation functions that can be trained throughout the learning process, as
they appear to improve network performance, especially by reducing overfitting.
In this paper, we propose a trainable activation function whose parameters need
to be estimated. A fully Bayesian model is developed to automatically estimate
from the learning data both the model weights and activation function
parameters. An MCMC-based optimization scheme is developed to build the
inference. The proposed method aims to solve the aforementioned problems and
improve convergence time by using an efficient sampling scheme that guarantees
convergence to the global maximum. The proposed scheme is tested on three
datasets with three different CNNs. Promising results demonstrate the
usefulness of our proposed approach in improving model accuracy due to the
proposed activation function and Bayesian estimation of the parameters
A Panorama on Multiscale Geometric Representations, Intertwining Spatial, Directional and Frequency Selectivity
The richness of natural images makes the quest for optimal representations in
image processing and computer vision challenging. The latter observation has
not prevented the design of image representations, which trade off between
efficiency and complexity, while achieving accurate rendering of smooth regions
as well as reproducing faithful contours and textures. The most recent ones,
proposed in the past decade, share an hybrid heritage highlighting the
multiscale and oriented nature of edges and patterns in images. This paper
presents a panorama of the aforementioned literature on decompositions in
multiscale, multi-orientation bases or dictionaries. They typically exhibit
redundancy to improve sparsity in the transformed domain and sometimes its
invariance with respect to simple geometric deformations (translation,
rotation). Oriented multiscale dictionaries extend traditional wavelet
processing and may offer rotation invariance. Highly redundant dictionaries
require specific algorithms to simplify the search for an efficient (sparse)
representation. We also discuss the extension of multiscale geometric
decompositions to non-Euclidean domains such as the sphere or arbitrary meshed
surfaces. The etymology of panorama suggests an overview, based on a choice of
partially overlapping "pictures". We hope that this paper will contribute to
the appreciation and apprehension of a stream of current research directions in
image understanding.Comment: 65 pages, 33 figures, 303 reference
Multivariate Dirichlet Moments and a Polychromatic Ewens Sampling Formula
We present an elementary non-recursive formula for the multivariate moments
of the Dirichlet distribution on the standard simplex, in terms of the pattern
inventory of the moments' exponents. We obtain analog formulas for the
multivariate moments of the Dirichlet-Ferguson and Gamma measures. We further
introduce a polychromatic analogue of Ewens sampling formula on colored integer
partitions, discuss its relation with suitable extensions of Hoppe's urn model
and of the Chinese restaurant process, and prove that it satisfies an adapted
notion of consistency in the sense of Kingman.Comment: 22 page
Adaptive Edge-guided Block-matching and 3D filtering (BM3D) Image Denoising Algorithm
Image denoising is a well studied field, yet reducing noise from images is still a valid challenge. Recently proposed Block-matching and 3D filtering (BM3D) is the current state of the art algorithm for denoising images corrupted by Additive White Gaussian noise (AWGN). Though BM3D outperforms all existing methods for AWGN denoising, still its performance decreases as the noise level increases in images, since it is harder to find proper match for reference blocks in the presence of highly corrupted pixel values. It also blurs sharp edges and textures. To overcome these problems we proposed an edge guided BM3D with selective pixel restoration. For higher noise levels it is possible to detect noisy pixels form its neighborhoods gray level statistics. We exploited this property to reduce noise as much as possible by applying a pre-filter. We also introduced an edge guided pixel restoration process in the hard-thresholding step of BM3D to restore the sharpness of edges and textures. Experimental results confirm that our proposed method is competitive and outperforms the state of the art BM3D in all considered subjective and objective quality measurements, particularly in preserving edges, textures and image contrast
Scene and Video Understanding
There have been significant improvements in the accuracy of scene understanding due to a shift from recognizing objects ``in isolation'' to context based recognition systems. Such systems improve recognition rates by augmenting appearance based models of individual objects with contextual information based on pairwise relationships between objects. These pairwise relations incorporate common sense world knowledge such as co-occurrences and spatial arrangements of objects, temporal consistency, scene layout, etc. However, these relations, even though consistent in the 3D world, change due to viewpoint of the scene. In this thesis, we investigate incorporating contextual information from three different perspectives for scene and video understanding (a) ``what'' contextual relations are useful and ``how'' they should be incorporated into Markov network during inference, (b) jointly solving the segmentation and recognition problem using a multiple segmentation framework based on contextual information in conjunction with appearance matching, and (c) proposing a discriminative spatio-temporal patch based representation for videos which incorporates contextual information for video understanding.
Our work departs from traditional view of incorporating context into scene understanding where a fixed model for context is learned. We argue that context is scene dependent and propose a data-driven approach to predict the importance of relationships and construct a Markov network for image analysis based on statistical models of global and local image features. Since all contextual information is not equally important, we also address the related problem of predicting the feature weights associated with each edge of a Markov network for evaluation of context. We then address the problem of fixed segmentation while modeling context by using a multiple segmentation framework and formulating the problem as ``a jigsaw puzzle''. We formulate the labeling problem as segment selection from a pool of segments (jigsaws), assigning each selected segment a class label. Previous multiple segmentation approaches used local appearance matching to select segments in a greedy manner. In contrast, our approach is based on a cost function that combines contextual information with appearance matching. A relaxed form of the cost function is minimized using an efficient quadratic programming solver.
Lastly, we propose a new representation for videos based on mid-level discriminative spatio-temporal patches. These patches might correspond to a primitive human action, a semantic object, or perhaps a random but informative spatiotemporal patch in the video. What define these spatiotemporal patches are their discriminative and representative properties. We automatically mine these patches from hundreds of training videos and experimentally demonstrate that these patches establish correspondence across videos. We propose a cost function that incorporates co-occurrence statistics and temporal context along with appearance matching to select subset of these patches for label transfer. Furthermore, these patches can be used as a discriminative vocabulary for action classification
- …