Improving Performance of Object Detection using the Mechanisms of Visual Recognition in Humans
Object recognition systems are usually trained and evaluated on high-resolution images. However, in real-world applications, images often have low resolutions or small sizes. In this study, we first track the performance of a state-of-the-art deep object recognition network, Faster-RCNN, as a function of image resolution. The results reveal the negative effects of low-resolution images on recognition performance. They also show that different spatial frequencies convey different information about the objects during recognition. This means a multi-resolution recognition system can provide better insight into the optimal selection of features, resulting in better recognition of objects. This is similar to the human visual system, which can build multi-scale representations of a visual scene simultaneously. We therefore propose a multi-resolution object recognition framework rather than a single-resolution network. The proposed framework is evaluated on the PASCAL VOC2007 database. The experimental results show that our adapted multi-resolution Faster-RCNN framework outperforms the single-resolution Faster-RCNN on input images of various resolutions, with an increase in mean Average Precision (mAP) of 9.14% across all resolutions and 1.2% on full-spectrum images. Furthermore, the proposed model yields robust performance over a wide range of spatial frequencies.
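The resolution sweep described above can be sketched as follows. Here `detector` is a hypothetical stand-in for Faster-RCNN inference (the abstract does not expose an API), and the average-pooling downsampler is an assumption, since the resampling method is not specified:

```python
import numpy as np

def downsample(image, factor):
    """Reduce resolution by average-pooling non-overlapping factor x factor blocks.
    NOTE: a simplifying assumption; the study's actual resampling method is not given."""
    h, w = image.shape[0], image.shape[1]
    # trim so the image tiles evenly into factor x factor blocks
    trimmed = image[: h - h % factor, : w - w % factor]
    new_shape = (h // factor, factor, w // factor, factor) + trimmed.shape[2:]
    return trimmed.reshape(new_shape).mean(axis=(1, 3))

def resolution_sweep(image, detector, factors=(1, 2, 4, 8)):
    """Run a detector on progressively lower-resolution versions of one image,
    mirroring the paper's idea of tracking performance as a function of resolution."""
    results = {}
    for f in factors:
        low = image if f == 1 else downsample(image, f)
        results[f] = detector(low)  # e.g. boxes/scores from a real detector
    return results
```

In the study proper, `detector` would be Faster-RCNN and the per-factor outputs would be scored against VOC2007 ground truth to obtain mAP per resolution.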
Technology assessment of advanced automation for space missions
Six general classes of technology requirements derived during the mission definition phase of the study were identified as having maximum importance and urgency: autonomous world-model-based information systems; learning and hypothesis formation; natural language and other man-machine communication; space manufacturing; teleoperators and robot systems; and computer science and technology.
Meaning maps and saliency models based on deep convolutional neural networks are insensitive to image meaning when predicting human fixations
Eye movements are vital for human vision, and it is therefore important to understand how observers decide where to look. Meaning maps (MMs), a technique to capture the distribution of semantic information across an image, have recently been proposed to support the hypothesis that meaning rather than image features guides human gaze. MMs have the potential to be an important tool far beyond eye-movement research. Here, we examine central assumptions underlying MMs. First, we compared the performance of MMs in predicting fixations to saliency models, showing that DeepGaze II – a deep neural network trained to predict fixations based on high-level features rather than meaning – outperforms MMs. Second, we show that whereas human observers respond to changes in meaning induced by manipulating object-context relationships, MMs and DeepGaze II do not. Together, these findings challenge central assumptions underlying the use of MMs to measure the distribution of meaning in images.
Fractal methods in image analysis and coding
In this thesis we present an overview of image processing techniques which use fractal methods in some way. We show how these fields relate to each other, and examine various aspects of fractal methods in each area.
The three principal fields of image processing and analysis that we examine are texture classification, image segmentation and image coding.
In the area of texture classification, we examine fractal dimension estimators, comparing these methods to other methods in use, and to each other. We attempt to explain why differences arise between various estimators of the same quantity. We also examine texture generation methods which use fractal dimension to generate textures of varying complexity.
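A common fractal dimension estimator of the kind compared in this work is box counting. A minimal sketch for a binary image follows; the function name and box sizes are illustrative choices, not taken from the thesis:

```python
import numpy as np

def box_counting_dimension(image, box_sizes=(2, 4, 8, 16, 32)):
    """Estimate the fractal (box-counting) dimension of a binary image.

    For each box size s, count the s x s boxes containing at least one
    foreground pixel; the slope of log(count) against log(1/s) is the
    dimension estimate.
    """
    counts = []
    for s in box_sizes:
        h, w = image.shape
        # trim so the image tiles evenly into s x s boxes
        trimmed = image[: h - h % s, : w - w % s]
        blocks = trimmed.reshape(trimmed.shape[0] // s, s, trimmed.shape[1] // s, s)
        occupied = blocks.any(axis=(1, 3))  # box holds any foreground pixel?
        counts.append(occupied.sum())
    # least-squares slope of log N(s) vs log(1/s)
    slope, _ = np.polyfit(np.log(1.0 / np.array(box_sizes)), np.log(counts), 1)
    return slope
```

A filled region yields an estimate near 2 and a straight line near 1, which is the sanity check typically used when comparing estimators of the same quantity.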
We examine how fractal dimension can contribute to image segmentation methods. We also present an in-depth analysis of a novel segmentation scheme based on fractal coding.
Finally, we present an overview of fractal and wavelet image coding, and the links between the two. We examine a possible scheme involving both fractal and wavelet methods.
Advanced Automation for Space Missions
The feasibility of using machine intelligence, including automation and robotics, in future space missions was studied.
Investigating the role of image meaning and prior knowledge in human eye movements control
Humans sample visual information by making eye movements towards different parts of their
surroundings. Understanding what guides this sampling process is an important goal of vision
science, and the present thesis is a contribution to this endeavour. Chapter One provides an
overview of factors influencing human eye movements, which are typically divided into
bottom-up (stimulus-dependent) and top-down (observer-dependent) processes. One of the
challenges in studying these factors stems from the fact that they are often difficult to
operationalize in a precise, unambiguous way. This is particularly problematic for semantic
information contained in visual scenes (‘image meaning’), a top-down factor which is the
backbone of the recently proposed framework for understanding human eye movements: the
meaning maps approach. Chapter Two evaluates this approach and demonstrates that
meaning maps – a crowd-sourced method designed to quantify the distribution of meaning in
natural scenes – might be sensitive to complex visual features, rather than meaning. Chapter
Three builds on that finding and shows that contextualized meaning maps, the most recent
variant of the original meaning maps, share the limitations of their predecessors. Chapter Four
adopts a novel perspective on eye-movement control and focuses on the interactions between
image features (a bottom-up factor) and prior object-knowledge possessed by an observer (a
top-down factor). Specifically, it shows that the same stimuli – black and white, Mooney-style
two-tone images – are looked at differently depending on whether the observer possesses
object-knowledge that enables them to bind images into coherent percepts of objects. The
final chapter summarizes the thesis and maps the future directions for studies on eye
movements. Taken together, findings reported here indicate that while top-down factors such
as prior object-knowledge play a crucial role in guiding human gaze, the tools to study them
offered by the meaning maps approach still need to be improved.