Multiscale Discriminant Saliency for Visual Attention
The bottom-up saliency, an early stage of humans' visual attention, can be
considered as a binary classification problem between center and surround
classes. The discriminant power of features for the classification is measured
as the mutual information between the features and the distribution of the two
classes. The estimated discrepancy between the two feature classes depends
strongly on the scale levels considered; multi-scale structure and discriminant
power are therefore integrated by employing discrete wavelet features and a
hidden Markov tree (HMT). With wavelet coefficients and HMT parameters,
quad-tree-like label structures are constructed and used for maximum a
posteriori (MAP) estimation of hidden class variables at the corresponding
dyadic sub-squares. Then, the saliency value for each dyadic square at each
scale level is computed with the discriminant power principle and the MAP.
Finally, the final saliency map is integrated across multiple scales by an
information maximization rule. Both standard quantitative
tools such as NSS, LCC, and AUC and qualitative assessments are used for
evaluating the proposed multiscale discriminant saliency method (MDIS) against
the well-known information-based saliency method AIM on its Bruce database with
eye-tracking data. Simulation results are presented and analyzed to verify the
validity of MDIS and to point out its disadvantages as directions for further
research.
Comment: 16 pages, ICCSA 2013 - BIOCA sessio
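As an illustration of one of the quantitative tools named in the abstract above, the following is a minimal sketch of the normalized scanpath saliency (NSS) score in its commonly used form: the saliency map is z-scored and averaged at the fixated pixels. The function name `nss`, the nested-list map layout, and the `(row, column)` fixation format are assumptions made for this example and are not taken from the paper.

```python
import math


def nss(saliency, fixations):
    """Mean z-scored saliency at fixated pixels (common NSS definition).

    saliency  -- 2D list of numbers (hypothetical layout: rows of pixel values)
    fixations -- list of (row, column) pairs from eye-tracking data
    """
    if not fixations:
        raise ValueError("at least one fixation is required")
    values = [v for row in saliency for v in row]
    mean = sum(values) / len(values)
    # Population standard deviation of the whole map.
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if std == 0:
        raise ValueError("saliency map is constant; NSS is undefined")
    return sum((saliency[r][c] - mean) / std for r, c in fixations) / len(fixations)
```

A score above zero means the fixated pixels are, on average, more salient than the map as a whole; a score near zero means the map does no better than chance at those locations.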
Learning Gaussian Graphical Models with Observed or Latent FVSs
Gaussian Graphical Models (GGMs) or Gauss Markov random fields are widely
used in many applications, and the trade-off between the modeling capacity and
the efficiency of learning and inference has been an important research
problem. In this paper, we study the family of GGMs with small feedback vertex
sets (FVSs), where an FVS is a set of nodes whose removal breaks all the
cycles. Exact inference such as computing the marginal distributions and the
partition function has complexity $O(k^2 n)$ using message-passing algorithms,
where k is the size of the FVS, and n is the total number of nodes. We propose
efficient structure learning algorithms for two cases: 1) All nodes are
observed, which is useful in modeling social or flight networks where the FVS
nodes often correspond to a small number of high-degree nodes, or hubs, while
the rest of the network is modeled by a tree. Regardless of the maximum
degree, without knowing the full graph structure, we can exactly compute the
maximum likelihood estimate in $O(kn^2 + n^2 \log n)$ if the FVS is known or in
polynomial time if the FVS is unknown but has bounded size. 2) The FVS nodes
are latent variables, where structure learning is equivalent to decomposing an
inverse covariance matrix (exactly or approximately) into the sum of a
tree-structured matrix and a low-rank matrix. By incorporating efficient
inference into the learning steps, we can obtain a learning algorithm using
alternating low-rank correction with complexity $O(kn^2 + n^2 \log n)$ per
iteration. We also perform experiments using both synthetic data and
real data of flight delays to demonstrate the modeling capacity with FVSs of
various sizes.
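The definition of a feedback vertex set used in the abstract above (a set of nodes whose removal breaks all cycles) can be checked directly. The sketch below is only an illustration of that definition, not the paper's learning algorithm; the function name `is_feedback_vertex_set` and the edge-list representation are assumptions for this example. It removes the candidate nodes and uses union-find to test whether the remaining undirected graph is a forest.

```python
def is_feedback_vertex_set(nodes, edges, fvs):
    """Return True if deleting `fvs` leaves an undirected graph with no cycles.

    nodes -- iterable of hashable node labels
    edges -- iterable of (u, v) pairs; repeated edges and self-loops count as cycles
    fvs   -- iterable of candidate feedback nodes
    """
    removed = set(fvs)
    parent = {v: v for v in nodes if v not in removed}

    def find(v):
        # Union-find root lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u, v in edges:
        if u in removed or v in removed:
            continue  # the edge disappears together with a removed node
        root_u, root_v = find(u), find(v)
        if root_u == root_v:
            return False  # this edge closes a cycle among the remaining nodes
        parent[root_u] = root_v
    return True
```

For a hub-like graph in which node 0 sits on every cycle, `is_feedback_vertex_set(range(5), [(0, 1), (1, 2), (2, 0), (0, 3), (3, 4), (4, 0)], {0})` holds, which mirrors the abstract's remark that FVS nodes often correspond to a few high-degree hubs.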
Local Color Voxel and Spatial Pattern for 3D Textured Recognition
Retrieval of textured 3D models, which takes shape, color, and pattern into account, remains a challenging research problem. Several approaches have been proposed, but voxel-based approaches have received little attention, even though a voxel representation preserves both geometry and texture information and maps all 3D models into the same dimension. Based on this observation, a novel voxel-based descriptor called the local color voxel pattern (LCVP) is proposed, which considers the local pattern around a voxel. The texture of a voxel is observed by comparing the voxel with its neighbors, and LCVP is computed around each voxel from those neighbors. The LCVP value indicates a unique pattern for each 3D model. LCVP also quantizes the color of each voxel to generate a specific pattern. Circular shift and reflection are also applied. In addition, inspired by promising recent results from image processing, this paper implements a spatial pattern that uses Weber and oriented-gradient features to extract a global spatial descriptor. Finally, a combination of local spectral and spatial features with an established global feature approach, called the multi Fourier descriptor, is proposed. For optimal retrieval, rank combination is performed between the local and global approaches. Experiments on the SHREC'13 and SHREC'14 datasets showed that the proposed method could outperform some state-of-the-art methods.
Graph Signal Processing: Overview, Challenges and Applications
Research in Graph Signal Processing (GSP) aims to develop tools for
processing data defined on irregular graph domains. In this paper we first
provide an overview of core ideas in GSP and their connection to conventional
digital signal processing. We then summarize recent advances in developing
basic GSP tools, including methods for sampling, filtering, and graph learning.
Next, we review progress in several application areas using GSP, including
processing and analysis of sensor network data, biological data, and
applications to image processing and machine learning. We finish by providing a
brief historical perspective to highlight how concepts recently developed in
GSP build on top of prior research in other areas.
Comment: To appear, Proceedings of the IEE
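To make the notion of filtering a graph signal concrete, here is a minimal sketch of a first-order graph filter, y = x - alpha * L x, where L is the combinatorial graph Laplacian. This is a generic textbook construction chosen for illustration, not a method taken from the paper; the function names, the dense adjacency-matrix layout, and the default `alpha` are assumptions for this example.

```python
def laplacian_apply(adjacency, signal):
    """Compute L x for L = D - A, i.e. (L x)_i = sum_j w_ij * (x_i - x_j)."""
    return [
        sum(w * (signal[i] - signal[j]) for j, w in enumerate(row))
        for i, row in enumerate(adjacency)
    ]


def smooth(adjacency, signal, alpha=0.25):
    """Apply the low-pass graph filter y = x - alpha * L x once."""
    lx = laplacian_apply(adjacency, signal)
    return [x - alpha * d for x, d in zip(signal, lx)]
```

On a three-node path graph, an impulse on the middle node spreads to its neighbors, while a constant signal passes through unchanged because the Laplacian annihilates constants. This is the graph analogue of a smoothing filter in conventional digital signal processing.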
Combined Mutual Information of Intensity and Gradient for Multi-modal Medical Image Registration
In this thesis, registration methods for multi-modal medical images are reviewed, with mutual information-based methods discussed in detail. Since it was proposed, mutual information has been studied intensively and has become very popular; however, its robustness is questionable, and it may fail in some cases. A possible reason is that it does not consider the spatial information in the image pair. To improve this measure, the thesis proposes using the combined mutual information of intensity and gradient for multi-modal medical image registration. The proposed measure utilizes both the intensity and the gradient information of an image pair. Maximizing this measure is assumed to register the image pair correctly. Optimization of the registration measure in a multi-dimensional space is another major issue in multi-modal medical image registration. The thesis first briefly reviews the commonly used optimization techniques and then discusses in detail Powell's conjugate direction set method, which is implemented to find the maximum of the combined mutual information of an image pair. In the experiments, we first use the proposed method to register slice images of a single patient scanned in the same or in different scanning sessions. Then 20 pairs of co-registered CT and PET slice images at three different resolutions are used to study the performance of the proposed measure and four other measures discussed in this thesis. Experimental results indicate that the proposed combined measure produces reliable registrations and outperforms the intensity- and gradient-based measures at all three resolutions.
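As a point of reference for the abstract above, the sketch below computes the standard intensity-based mutual information of two equally sized images from their joint histogram. It covers only the plain intensity term and does not reproduce the thesis's combined intensity-and-gradient measure; the function name and the nested-list image layout are assumptions for this example.

```python
import math
from collections import Counter


def mutual_information(image_a, image_b):
    """Mutual information (in bits) between the intensities of two aligned images.

    image_a, image_b -- 2D lists of discrete intensity values with the same shape
    """
    pairs = [
        (a, b)
        for row_a, row_b in zip(image_a, image_b)
        for a, b in zip(row_a, row_b)
    ]
    n = len(pairs)
    joint = Counter(pairs)                 # joint histogram of intensity pairs
    marg_a = Counter(a for a, _ in pairs)  # marginal histogram of image_a
    marg_b = Counter(b for _, b in pairs)  # marginal histogram of image_b
    mi = 0.0
    for (a, b), count in joint.items():
        # p(a,b) * log2(p(a,b) / (p(a) p(b))), written with raw counts.
        mi += (count / n) * math.log2(count * n / (marg_a[a] * marg_b[b]))
    return mi
```

A registration loop would evaluate such a measure for each candidate transform and keep the transform that maximizes it. The measure is largest when the intensities of one image predict those of the other and drops to zero when they are statistically independent.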
A DWT based perceptual video coding framework: concepts, issues and techniques
The work in this thesis explores DWT-based video coding by introducing a novel DWT (Discrete Wavelet Transform) / MC (Motion Compensation) / DPCM (Differential Pulse Code Modulation) video coding framework, which adopts EBCOT as the coding engine for both the intra- and the inter-frame coder. An adaptive switching mechanism between the frame and field coding modes is investigated for this coding framework. The Low-Band-Shift (LBS) method is employed for MC in the DWT domain. LBS-based MC is proven to provide a consistent improvement in the Peak Signal-to-Noise Ratio (PSNR) of the coded video over simple Wavelet Tree (WT) based MC. Adaptive Arithmetic Coding (AAC) is adopted to code the motion information. The context set of the Adaptive Binary Arithmetic Coding (ABAC) for the inter-frame data is redesigned based on statistical analysis. To further improve the perceived picture quality, a Perceptual Distortion Measure (PDM) based on a human vision model is used for the EBCOT of the intra-frame coder. A visibility assessment of the quantization error of various subbands in the DWT domain is performed through subjective tests. In summary, these findings address the issues arising from the proposed perceptual video coding framework. They include: a working DWT/MC/DPCM video coding framework with superior coding efficiency on sequences with translational or head-and-shoulder motion; an adaptive switching mechanism between frame and field coding modes; an effective LBS-based MC scheme in the DWT domain; a methodology for the context design for entropy coding of the inter-frame data; a PDM that replaces the MSE inside the EBCOT coding engine for the intra-frame coder and improves the perceived quality of intra-frames; and a visibility assessment of the quantization errors in the DWT domain.
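The PSNR figure used in the abstract above to compare motion-compensation schemes follows the usual definition, 10 * log10(peak^2 / MSE). The sketch below is a minimal illustration of that definition; the function name, the nested-list frame layout, and the default 8-bit peak value of 255 are assumptions for this example and are not details from the thesis.

```python
import math


def psnr(reference, coded, peak=255.0):
    """Peak signal-to-noise ratio in dB between a reference and a coded frame.

    reference, coded -- 2D lists of sample values with the same shape
    peak             -- maximum possible sample value (255 assumed for 8-bit video)
    """
    squared_errors = [
        (r - c) ** 2
        for row_r, row_c in zip(reference, coded)
        for r, c in zip(row_r, row_c)
    ]
    mse = sum(squared_errors) / len(squared_errors)
    if mse == 0:
        return math.inf  # identical frames: no distortion
    return 10.0 * math.log10(peak ** 2 / mse)
```

Because PSNR depends only on the mean squared error, it cannot distinguish visible from invisible distortion, which is the motivation the abstract gives for replacing the MSE with a perceptual distortion measure.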