390 research outputs found
Multi-scale Discriminant Saliency with Wavelet-based Hidden Markov Tree Modelling
The bottom-up saliency, an early stage of humans' visual attention, can be
considered as a binary classification problem between centre and surround
classes. Discriminant power of features for the classification is measured as
mutual information between distributions of image features and corresponding
classes. As the estimated discrepancy depends strongly on the considered scale
level, multi-scale structure and discriminant power are integrated by employing
discrete wavelet features and Hidden Markov Tree (HMT). With wavelet
coefficients and Hidden Markov Tree parameters, quad-tree-like label structures
are constructed and utilized in maximum a posteriori (MAP) estimation of hidden
class variables at corresponding dyadic sub-squares. Then, a saliency value for
each square block at each scale level is computed with the discriminant power
principle. Finally, the final saliency map is integrated across multiple scales
by an information maximization rule. Both standard quantitative tools such as
NSS, LCC, AUC and qualitative assessments are used for evaluating the proposed
multi-scale discriminant saliency (MDIS) method against the well-known
information-based approach AIM on its released image collection with
eye-tracking data. Simulation results are presented and analysed to verify the
validity of MDIS as well as to point out its limitations as directions for
further research.
Comment: arXiv admin note: substantial text overlap with arXiv:1301.396
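The discriminant-power measure described in this abstract, mutual information between feature values and centre/surround class labels, can be illustrated with a minimal sketch. This is not the MDIS implementation: the function name `mutual_information` is hypothetical, features are assumed to be already quantised into discrete bins, and the wavelet and Hidden Markov Tree stages are omitted entirely.

```python
import math
from collections import Counter


def mutual_information(features, labels):
    """Mutual information I(X; Y) in bits between discrete feature values
    and class labels (e.g. 'centre' vs 'surround').

    Assumption: features are already quantised into discrete bins.
    """
    n = len(features)
    joint = Counter(zip(features, labels))  # joint counts of (x, y)
    count_x = Counter(features)             # marginal counts of x
    count_y = Counter(labels)               # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        p_xy = c / n
        # p(x, y) / (p(x) p(y)) simplifies to c * n / (count_x * count_y)
        mi += p_xy * math.log2(c * n / (count_x[x] * count_y[y]))
    return mi


# A feature that separates the two classes perfectly carries one bit.
separable = mutual_information(
    [0, 0, 1, 1], ["centre", "centre", "surround", "surround"]
)
# A feature that is independent of the class carries no information.
independent = mutual_information(
    [0, 1, 0, 1], ["centre", "centre", "surround", "surround"]
)
```

Under this reading, a block whose features discriminate well between centre and surround receives a high value, which is the intuition behind using the measure as a saliency score.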
Automatic region-of-interest extraction in low depth-of-field images
PhD Thesis
Automatic extraction of focused regions from images with low depth-of-field
(DOF) is a problem that does not yet have an efficient solution. The capability of
extracting focused regions can help to bridge the semantic gap by integrating
image regions which are meaningfully relevant and generally do not exhibit
uniform visual characteristics. There exist two main difficulties for extracting
focused regions from low DOF images using high-frequency based techniques:
computational complexity and performance.
A novel unsupervised segmentation approach based on ensemble clustering is
proposed to extract the focused regions from low DOF images in two stages.
The first stage is to cluster image blocks in a joint contrast-energy feature space
into three constituent groups. To achieve this, we make use of a normal
mixture-based model along with the standard expectation-maximization (EM)
algorithm at two consecutive levels of block size. To avoid the common
problem of local optima experienced in many models, an ensemble EM
clustering algorithm is proposed. As a result, relevant blocks, i.e., block-based
region-of-interest (ROI), closely conforming to image objects are extracted.
In stage two, two different approaches have been developed to extract
pixel-based ROI. In the first approach, a binary saliency map is constructed
from the relevant blocks at the pixel level, based on difference of
Gaussians (DOG) and binarization methods. Then, a set of morphological
operations is employed to create the pixel-based ROI from the map.
Experimental results demonstrate that the proposed approach achieves an
average segmentation performance of 91.3% and is computationally 3 times
faster than the best existing approach. In the second approach, a minimal graph
cut is constructed by using the max-flow method and also by using
object/background seeds provided by the ensemble clustering algorithm.
Experimental results demonstrate an average segmentation performance of 91.7%
and an approximately 50% reduction in the average computational time for the
proposed colour-based approach compared with existing unsupervised
approaches.
Real-time object detection using monocular vision for low-cost automotive sensing systems
This work addresses the problem of real-time object detection in automotive environments
using monocular vision. The focus is on real-time feature detection,
tracking, depth estimation using monocular vision and, finally, object detection by
fusing visual saliency and depth information.
Firstly, a novel feature detection approach is proposed for extracting stable and
dense features even in images with very low signal-to-noise ratio. This methodology
is based on image gradients, which are redefined to take account of noise as
part of their mathematical model. Each gradient is based on a vector connecting a
negative to a positive intensity centroid, where both centroids are symmetric about
the centre of the area for which the gradient is calculated. Multiple gradient vectors
define a feature with its strength being proportional to the underlying gradient
vector magnitude. The evaluation of the Dense Gradient Features (DeGraF) shows
superior performance over other contemporary detectors in terms of keypoint density,
tracking accuracy, illumination invariance, rotation invariance, noise resistance
and detection time.
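The centroid-based gradient described above can be sketched as follows. This is only an illustration under stated assumptions, not the DeGraF implementation: the function name `centroid_gradient` is hypothetical, the positive centroid is assumed to be the intensity-weighted mean position measured from the window centre, and the negative centroid is assumed to be its mirror image about that centre, so the connecting vector is twice the positive offset.

```python
def centroid_gradient(window):
    """Return (gx, gy), the vector from the negative to the positive centroid.

    Assumptions (not taken from the thesis): the positive centroid is the
    intensity-weighted mean position relative to the window centre, and the
    negative centroid is its reflection about the centre.
    """
    rows = len(window)
    cols = len(window[0])
    cy = (rows - 1) / 2.0
    cx = (cols - 1) / 2.0
    total = sum(sum(row) for row in window)
    if total == 0:
        # A completely dark window has no defined centroid.
        return (0.0, 0.0)
    px = sum(v * (x - cx) for row in window for x, v in enumerate(row)) / total
    py = sum(v * (y - cy) for y, row in enumerate(window) for v in row) / total
    # Negative centroid is (-px, -py); the connecting vector is twice the offset.
    return (2.0 * px, 2.0 * py)


# A window that is bright only in its right-hand column points along +x.
edge = centroid_gradient([[0, 0, 1],
                          [0, 0, 1],
                          [0, 0, 1]])
# A uniform window has coincident centroids and therefore a zero gradient.
flat = centroid_gradient([[5, 5, 5],
                          [5, 5, 5],
                          [5, 5, 5]])
```

Because every pixel in the window contributes to the centroid, zero-mean noise tends to average out, which is one plausible reading of why the abstract describes the gradients as noise resistant.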
The DeGraF features form the basis for two new approaches that perform dense
3D reconstruction from a single vehicle-mounted camera. The first approach tracks
DeGraF features in real-time while performing image stabilisation with minimal
computational cost. This means that despite camera vibration the algorithm can
accurately predict the real-world coordinates of each image pixel in real-time by comparing
each motion-vector to the ego-motion vector of the vehicle. The performance
of this approach has been compared to different 3D reconstruction methods in order
to determine their accuracy, depth-map density, noise-resistance and computational
complexity. The second approach proposes the use of local frequency analysis of
gradient features for estimating relative depth. This novel method is based on the
fact that DeGraF gradients can accurately measure local image variance with subpixel
accuracy. It is shown that the local frequency at which the centroid oscillates
around the gradient window centre is proportional to the depth of each gradient
centroid in the real world. The lower computational complexity of this methodology
comes at the expense of depth map accuracy as the camera velocity increases, but
it is at least five times faster than the other evaluated approaches.
This work also proposes a novel technique for deriving visual saliency maps by
using Division of Gaussians (DIVoG). In this context, saliency maps express how
different each image pixel is from its surrounding pixels across multiple pyramid
levels. This approach is shown to be both fast and accurate when evaluated against
other state-of-the-art approaches. Subsequently, the saliency information is combined
with depth information to identify salient regions close to the host vehicle.
The fused map allows faster detection of high-risk areas where obstacles are likely
to exist. As a result, existing object detection algorithms, such as the Histogram of
Oriented Gradients (HOG), can execute at least five times faster.
In conclusion, through a step-wise approach, computationally expensive algorithms
have been optimised or replaced by novel methodologies to produce a fast object
detection system that is aligned to the requirements of the automotive domain.
Low complexity in-loop perceptual video coding
The tradition of broadcast video is today complemented with user generated content, as portable devices support video coding. Similarly, computing is becoming ubiquitous, where the Internet of Things (IoT) incorporates heterogeneous networks to communicate with personal and/or infrastructure devices. Irrespective, the emphasis is on bandwidth and processor efficiencies, which means increasing the signalling options in video encoding. Consequently, assessment of pixel differences applies a uniform cost to be processor efficient; in contrast, the Human Visual System (HVS) has non-uniform sensitivity based upon lighting, edges and textures. Existing perceptual assessments are natively incompatible and processor demanding, making perceptual video coding (PVC) unsuitable for these environments. This research allows existing perceptual assessment at the native level using low complexity techniques, before producing new pixel-based image quality assessments (IQAs). To manage these IQAs, a framework was developed and implemented in the high efficiency video coding (HEVC) encoder. This resulted in bit-redistribution, where more bits and smaller partitioning were allocated to perceptually significant regions. Using an HEVC optimised processor, the timing increase was < +4% and < +6% for video streaming and recording applications respectively, 1/3 of that of an existing low complexity PVC solution. Future work should be directed towards perceptual quantisation, which offers the potential for perceptual coding gain.
Advances in Robotics, Automation and Control
The book presents an excellent overview of recent developments in the different areas of Robotics, Automation and Control. Through its 24 chapters, this book presents topics related to control and robot design; it also introduces new mathematical tools and techniques devoted to improving system modeling and control. An important point is the use of rational agents and heuristic techniques to cope with the computational complexity required for controlling complex systems. The book also covers navigation and vision algorithms, automatic handwriting comprehension and speech recognition systems that will be included in the next generation of productive systems developed by man.
Visual Saliency Estimation Via HEVC Bitstream Analysis
Since information technology began to develop dramatically in the 1950s, digital images and video have become ubiquitous. In the last decade, image and video processing have become increasingly popular in biomedical, industrial, artistic and other fields. Progress has been made in the display, storage and transmission of visual information such as images and video. The attendant problem is that video processing tasks in the time domain become particularly arduous.
Based on a study of existing compressed-domain video saliency detection models, a new saliency estimation model for video based on High Efficiency Video Coding (HEVC) is presented. First, the relevant features are extracted from the HEVC encoded bitstream. A naive Bayesian model is then trained and tested on these features, using the original YUV videos and ground truth. The intra-frame saliency map is obtained after training and testing on the intra features, and inter-frame saliency is obtained from the intra saliency together with motion vectors. The ROC of our proposed intra mode is 0.9561. Other classification methods such as support vector machine (SVM), k nearest neighbors (KNN) and the decision tree are presented to compare the experimental outcomes. The effect of varying the compression ratio on the saliency has also been analysed.
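The classification step in this abstract can be illustrated with a small Gaussian naive Bayes sketch. Everything specific here is an assumption: the abstract does not say which naive Bayes variant or which bitstream features were used, so the two per-block features in the usage example are hypothetical, as are the function names `fit_gaussian_nb` and `predict`.

```python
import math


def fit_gaussian_nb(samples, labels):
    """Estimate a prior and per-feature (mean, variance) for each class."""
    model = {}
    n = len(samples)
    for cls in set(labels):
        rows = [s for s, l in zip(samples, labels) if l == cls]
        stats = []
        for col in zip(*rows):  # iterate over feature columns
            mean = sum(col) / len(col)
            # Small floor keeps the variance strictly positive.
            var = sum((v - mean) ** 2 for v in col) / len(col) + 1e-6
            stats.append((mean, var))
        model[cls] = (len(rows) / n, stats)
    return model


def predict(model, sample):
    """Return the class with the highest log-posterior for one sample."""
    best_cls, best_score = None, -math.inf
    for cls, (prior, stats) in model.items():
        score = math.log(prior)
        for v, (mean, var) in zip(sample, stats):
            score += -0.5 * math.log(2 * math.pi * var) - (v - mean) ** 2 / (2 * var)
        if score > best_score:
            best_cls, best_score = cls, score
    return best_cls


# Hypothetical per-block features: [bits spent on the block, partition depth].
# Label 1 marks blocks that ground truth considers salient, 0 otherwise.
train_x = [[1.0, 1.0], [1.2, 0.9], [5.0, 4.8], [5.1, 5.2]]
train_y = [0, 0, 1, 1]
nb_model = fit_gaussian_nb(train_x, train_y)
low_block = predict(nb_model, [1.1, 1.0])
high_block = predict(nb_model, [5.0, 5.0])
```

Applying `predict` to every block of a frame would give a binary map; a graded saliency map would instead use the per-class posterior, which this sketch does not compute.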
Scalable visualization of spatial data in 3D terrain
Designing visualizations of spatial data in 3D terrain is challenging because various heterogeneous data aspects need to be considered, including the terrain itself, multiple data attributes, and data uncertainty. It is hardly possible to visualize these data at full detail in a single image. Therefore, this thesis devises a scalable visualization approach that focuses on relevant information to be emphasized, while less-relevant information can be attenuated. In this context, a novel concept of visualizing spatial data in 3D terrain and different soft- and hardware solutions are proposed.
Creating visualizations of spatial data in 3D terrain is difficult because many heterogeneous data aspects have to be taken into account, such as the terrain itself, the various data attributes, and uncertainties in the depiction. In general, it is not possible to show these data aspects simultaneously in a single visualization. Therefore, the thesis develops scalable visualization strategies that emphasize the important information while still providing context information. To this end, new systematizations and concepts are presented.