Multi-scale Discriminant Saliency with Wavelet-based Hidden Markov Tree Modelling
Bottom-up saliency, an early stage of human visual attention, can be
considered as a binary classification problem between centre and surround
classes. The discriminant power of features for this classification is measured
as the mutual information between the distributions of image features and the
corresponding classes. Because the estimated discrepancy depends strongly on the
scale level considered, multi-scale structure and discriminant power are
integrated by employing discrete wavelet features and a Hidden Markov Tree (HMT).
With the wavelet coefficients and HMT parameters, quad-tree-like label
structures are constructed and used in maximum a posteriori (MAP) estimation of
the hidden class variables at the corresponding dyadic sub-squares. A saliency
value for each square block at each scale level is then computed with the
discriminant power principle. Finally, the block saliencies are integrated
across multiple scales into the final saliency map by an information
maximization rule. Both standard quantitative tools, such as NSS, LCC and AUC,
and qualitative assessments are used to evaluate the proposed multi-scale
discriminant saliency (MDIS) method against the well-known information-based
approach AIM on its released image collection with eye-tracking data.
Simulation results are presented and analysed to verify the validity of MDIS
and to point out its limitations as directions for further research.
Comment: arXiv admin note: substantial text overlap with arXiv:1301.396
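The discriminant power principle above scores a feature by the mutual information I(X; C) between the feature X and the binary class C (centre vs. surround). The following is a minimal sketch, assuming equal class priors and simple histogram density estimates on a shared binning; the function name `discriminant_power` and the `bins` parameter are hypothetical, and this is not the authors' HMT-based estimator.

```python
import numpy as np

def discriminant_power(center_feats, surround_feats, bins=16):
    """Mutual information I(X; C), in bits, between a scalar feature X and a
    binary class C in {centre, surround}, estimated from histograms.

    Assumptions (not from the paper): equal priors P(C) = 0.5 and a shared
    histogram binning over both samples.
    """
    x_c = np.asarray(center_feats, dtype=float)
    x_s = np.asarray(surround_feats, dtype=float)
    lo = min(x_c.min(), x_s.min())
    hi = max(x_c.max(), x_s.max())
    if hi == lo:
        return 0.0  # a constant feature carries no class information
    edges = np.linspace(lo, hi, bins + 1)
    # Class-conditional histograms, one row per class.
    counts = np.vstack([
        np.histogram(x_c, bins=edges)[0],
        np.histogram(x_s, bins=edges)[0],
    ]).astype(float)
    p_x_given_c = counts / counts.sum(axis=1, keepdims=True)
    p_c = np.array([0.5, 0.5])
    p_xc = p_x_given_c * p_c[:, None]        # joint P(X, C)
    p_x = p_xc.sum(axis=0)                   # marginal P(X)
    denom = p_c[:, None] * p_x[None, :]      # P(X) P(C)
    mask = p_xc > 0                          # 0 * log 0 is taken as 0
    return float(np.sum(p_xc[mask] * np.log2(p_xc[mask] / denom[mask])))
```

Perfectly separated centre and surround samples give 1 bit (the maximum for a binary class with equal priors), while identical distributions give 0, so a larger value marks a block whose centre stands out from its surround.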
3D medical volume segmentation using hybrid multiresolution statistical approaches
This article is available through the Brunel Open Access Publishing Fund. Copyright © 2010 S AlZu’bi and A Amira. 3D volume segmentation is the process of partitioning voxels into 3D regions (subvolumes) that represent meaningful physical entities, which are easier to analyze and more usable in later applications. Multiresolution Analysis (MRA) enables an image to be represented at different levels of resolution or blurring. Because of this multiresolution property, wavelets have been deployed in image compression, denoising and classification. This paper focuses on the implementation of efficient medical volume segmentation techniques. Multiresolution analysis, including the 3D wavelet and ridgelet transforms, has been used for feature extraction, and the extracted features can be modeled using Hidden Markov Models (HMMs) to segment the volume slices. A comparison study has been carried out to evaluate 2D and 3D techniques, and it reveals that 3D methodologies can accurately detect the Region Of Interest (ROI). Automatic segmentation has been achieved using HMMs, where the ROI is detected accurately but at the cost of long computation times.
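To make the 3D wavelet feature-extraction step concrete, here is a minimal sketch of a single-level separable 3D transform. It assumes Haar filters and even volume dimensions; the abstract does not state which wavelet the authors use, and the names `haar_1d` and `haar_3d` are hypothetical. The eight resulting sub-volumes are the kind of multiresolution coefficients that could then serve as observations for an HMM; the HMM itself is not shown.

```python
import numpy as np

def haar_1d(x, axis):
    """One level of the orthonormal Haar transform along `axis`.
    Returns (approximation, detail); the axis length must be even."""
    n = x.shape[axis]
    if n % 2:
        raise ValueError("axis length must be even")
    even = np.take(x, np.arange(0, n, 2), axis=axis)
    odd = np.take(x, np.arange(1, n, 2), axis=axis)
    return (even + odd) / np.sqrt(2), (even - odd) / np.sqrt(2)

def haar_3d(volume):
    """Single-level separable 3D Haar transform (assumed filter choice).

    Returns a dict of eight sub-volumes keyed 'LLL' ... 'HHH', where each
    letter gives the low-pass (L) or high-pass (H) filter applied along
    axes 0, 1 and 2 in that order."""
    bands = {"": np.asarray(volume, dtype=float)}
    for axis in range(3):
        split = {}
        for key, arr in bands.items():
            lo, hi = haar_1d(arr, axis)
            split[key + "L"] = lo
            split[key + "H"] = hi
        bands = split
    return bands
```

Because the Haar filters are orthonormal, the total energy of the eight sub-volumes equals that of the input volume; the 'LLL' band is a half-resolution approximation that can be decomposed again for further levels.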
Interpretable Structure-Evolving LSTM
This paper develops a general framework for learning interpretable data
representations via Long Short-Term Memory (LSTM) recurrent neural networks over
hierarchical graph structures. Instead of learning LSTM models over predefined
structures, we propose to also learn the intermediate interpretable
multi-level graph structures from data, in a progressive and stochastic way,
during LSTM network optimization. We therefore call this model the
structure-evolving LSTM. In particular, starting with an initial element-level
graph representation where each node is a small data element, the
structure-evolving LSTM gradually evolves the multi-level graph representations
by stochastically merging graph nodes with high compatibilities along the
stacked LSTM layers. In each LSTM layer, we estimate the compatibility of two
connected nodes from their corresponding LSTM gate outputs, which is used to
generate a merging probability. Candidate graph structures are then generated
in which the nodes are grouped into cliques according to their merging
probabilities. We then produce the new graph structure with a
Metropolis-Hastings algorithm, which alleviates the risk of getting stuck in
local optima by stochastic sampling with an acceptance probability. Once a
graph structure is accepted, a higher-level graph is constructed by taking
the partitioned cliques as its nodes. During the evolving process, the
representation becomes more abstract at higher levels, where redundant
information is filtered out, allowing more efficient propagation of long-range
data dependencies. We evaluate the effectiveness of the structure-evolving LSTM
on semantic object parsing and demonstrate its advantage over
state-of-the-art LSTM models on standard benchmarks.
Comment: To appear in CVPR 2017 as a spotlight paper
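The acceptance step mentioned above follows the standard Metropolis-Hastings rule: a candidate structure is accepted with probability min(1, [p(new) q(old|new)] / [p(old) q(new|old)]), where p is the target score and q the proposal distribution. The abstract does not give the paper's scoring function for graph structures, so in this minimal sketch `log_target_new` and `log_target_old` are placeholders for any unnormalized log score, and the function name `mh_accept` is hypothetical.

```python
import math
import random

def mh_accept(log_target_new, log_target_old,
              log_q_old_given_new, log_q_new_given_old, rng=random):
    """Metropolis-Hastings acceptance test, computed in log space.

    log_target_*: unnormalized log score of the candidate / current state
                  (placeholder for the paper's unspecified structure score).
    log_q_a_given_b: log probability of proposing state a from state b.
    Returns True if the candidate should replace the current state."""
    log_alpha = (log_target_new + log_q_old_given_new) - (
        log_target_old + log_q_new_given_old)
    if log_alpha >= 0:
        return True  # acceptance probability is min(1, alpha) = 1
    # Otherwise accept with probability alpha = exp(log_alpha) < 1.
    return rng.random() < math.exp(log_alpha)
```

Because a worse candidate is still accepted with nonzero probability, the sampler can leave a poor merging decision behind instead of committing greedily, which is the property the abstract credits for avoiding local optima. Passing a seeded `random.Random` as `rng` makes runs reproducible.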
Word Recognition with Deep Conditional Random Fields
Recognition of handwritten words continues to be an important problem in
document analysis and recognition. Existing approaches extract hand-engineered
features from word images, and such features can perform poorly on new data
sets. Recently, deep learning has attracted great attention because of its
ability to learn features from raw data. Moreover, deep models have yielded
state-of-the-art results in classification tasks including character
recognition and scene recognition. Word recognition, however, is a sequential
problem in which the correlation between characters must be modeled. In this
paper, we propose using deep Conditional Random Fields (deep CRFs) for word
recognition. We combine CRFs with deep learning so that deep features are
learned and sequences are labeled in a unified framework. We pre-train the deep
structure with stacked restricted Boltzmann machines (RBMs) for feature
learning and optimize the entire network with an online learning algorithm. The
proposed model was evaluated on two datasets and found to perform significantly
better than competitive baseline models. The source code is available at
https://github.com/ganggit/deepCRFs.
Comment: 5 pages, published in ICIP 2016. arXiv admin note: substantial text
overlap with arXiv:1412.339
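Labeling a word as a character sequence with a linear-chain CRF ends in a decoding step that finds the highest-scoring label sequence. The following is a minimal sketch of standard Viterbi decoding, assuming the deep network supplies per-position unary scores and the CRF supplies a label-to-label transition matrix. The function name and array layout are hypothetical; this is not the code from the linked repository, and RBM pre-training and online learning are not shown.

```python
import numpy as np

def viterbi(unary, transition):
    """Highest-scoring label sequence for a linear-chain CRF.

    unary:      (T, K) score of label k at position t (e.g. deep features).
    transition: (K, K) score of label i at t-1 followed by label j at t.
    Returns (best_path, best_score)."""
    T, K = unary.shape
    score = unary[0].copy()               # best score ending in each label
    back = np.zeros((T, K), dtype=int)    # backpointers
    for t in range(1, T):
        # cand[i, j]: best path ending in i at t-1, then i -> j, plus unary at t.
        cand = score[:, None] + transition + unary[t][None, :]
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    # Trace the backpointers from the best final label.
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    path.reverse()
    return path, float(score.max())
```

The transition term is what lets the model capture correlations between neighbouring characters: with `unary = [[2, 0], [0, 1]]` and a transition of -5 from label 0 to label 1, choosing the best label independently at each position would give `[0, 1]`, but Viterbi returns `[0, 0]` with score 2.0.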
Grounding the Lexical Semantics of Verbs in Visual Perception using Force Dynamics and Event Logic
This paper presents an implemented system for recognizing the occurrence of
events described by simple spatial-motion verbs in short image sequences. The
semantics of these verbs is specified with event-logic expressions that
describe changes in the state of force-dynamic relations between the
participants of the event. An efficient finite representation is introduced for
the infinite sets of intervals that occur when describing liquid and
semi-liquid events. Additionally, an efficient procedure using this
representation is presented for inferring occurrences of compound events,
described with event-logic expressions, from occurrences of primitive events.
Using force dynamics and event logic to specify the lexical semantics of events
allows the system to be more robust than prior systems based on motion profile
- …
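The abstract mentions a finite representation for the infinite sets of intervals that arise with liquid events, but the truncated text does not define it. The following sketch shows one possible encoding, assumed here purely for illustration and not taken from the paper: a set of closed real-valued intervals [s, e] is described by a range for the start, a range for the end, and the constraint s <= e. The class `IntervalSet` and the helper `liquid` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class IntervalSet:
    """Hypothetical finite description of an infinite set of closed
    intervals [s, e]: all intervals with s_lo <= s <= s_hi,
    e_lo <= e <= e_hi and s <= e."""
    s_lo: float
    s_hi: float
    e_lo: float
    e_hi: float

    def contains(self, s: float, e: float) -> bool:
        return (self.s_lo <= s <= self.s_hi
                and self.e_lo <= e <= self.e_hi and s <= e)

    def intersect(self, other: "IntervalSet") -> Optional["IntervalSet"]:
        """Intervals described by both sets, or None if there are none."""
        r = IntervalSet(max(self.s_lo, other.s_lo), min(self.s_hi, other.s_hi),
                        max(self.e_lo, other.e_lo), min(self.e_hi, other.e_hi))
        # Empty if a range is empty or no admissible start precedes an end.
        if r.s_lo > r.s_hi or r.e_lo > r.e_hi or r.s_lo > r.e_hi:
            return None
        return r

def liquid(a: float, b: float) -> IntervalSet:
    """A liquid event that holds over [a, b] also holds over every
    subinterval, i.e. all [s, e] with a <= s <= e <= b."""
    return IntervalSet(a, b, a, b)
```

Four numbers stand in for uncountably many intervals, and intersection stays within the same form, which is the kind of closure a procedure for combining event occurrences would need.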