1,464 research outputs found
DeepFuse: A Deep Unsupervised Approach for Exposure Fusion with Extreme Exposure Image Pairs
We present a novel deep learning architecture for fusing static
multi-exposure images. Current multi-exposure fusion (MEF) approaches use
hand-crafted features to fuse input sequence. However, the weak hand-crafted
representations are not robust to varying input conditions. Moreover, they
perform poorly for extreme exposure image pairs. Thus, it is highly desirable
to have a method that is robust to varying input conditions and capable of
handling extreme exposure without artifacts. Deep representations have known to
be robust to input conditions and have shown phenomenal performance in a
supervised setting. However, the stumbling block in using deep learning for MEF
was the lack of sufficient training data and an oracle to provide the
ground-truth for supervision. To address the above issues, we have gathered a
large dataset of multi-exposure image stacks for training and to circumvent the
need for ground truth images, we propose an unsupervised deep learning
framework for MEF utilizing a no-reference quality metric as loss function. The
proposed approach uses a novel CNN architecture trained to learn the fusion
operation without reference ground truth image. The model fuses a set of common
low level features extracted from each image to generate artifact-free
perceptually pleasing results. We perform extensive quantitative and
qualitative evaluation and show that the proposed technique outperforms
existing state-of-the-art approaches for a variety of natural images.Comment: ICCV 201
Fruit Detection and Tree Segmentation for Yield Mapping in Orchards
Accurate information gathering and processing is critical for precision horticulture, as growers aim to optimise their farm management practices. An accurate inventory of the crop that details its spatial distribution along with health and maturity, can help farmers efficiently target processes such as chemical and fertiliser spraying, crop thinning, harvest management, labour planning and marketing. Growers have traditionally obtained this information by using manual sampling techniques, which tend to be labour intensive, spatially sparse, expensive, inaccurate and prone to subjective biases. Recent advances in sensing and automation for field robotics allow for key measurements to be made for individual plants throughout an orchard in a timely and accurate manner. Farmer operated machines or unmanned robotic platforms can be equipped with a range of sensors to capture a detailed representation over large areas. Robust and accurate data processing techniques are therefore required to extract high level information needed by the grower to support precision farming. This thesis focuses on yield mapping in orchards using image and light detection and ranging (LiDAR) data captured using an unmanned ground vehicle (UGV). The contribution is the framework and algorithmic components for orchard mapping and yield estimation that is applicable to different fruit types and orchard configurations. The framework includes detection of fruits in individual images and tracking them over subsequent frames. The fruit counts are then associated to individual trees, which are segmented from image and LiDAR data, resulting in a structured spatial representation of yield. The first contribution of this thesis is the development of a generic and robust fruit detection algorithm. Images captured in the outdoor environment are susceptible to highly variable external factors that lead to significant appearance variations. Specifically in orchards, variability is caused by changes in illumination, target pose, tree types, etc. The proposed techniques address these issues by using state-of-the-art feature learning approaches for image classification, while investigating the utility of orchard domain knowledge for fruit detection. Detection is performed using both pixel-wise classification of images followed instance segmentation, and bounding-box regression approaches. The experimental results illustrate the versatility of complex deep learning approaches over a multitude of fruit types. The second contribution of this thesis is a tree segmentation approach to detect the individual trees that serve as a standard unit for structured orchard information systems. The work focuses on trellised trees, which present unique challenges for segmentation algorithms due to their intertwined nature. LiDAR data are used to segment the trellis face, and to generate proposals for individual trees trunks. Additional trunk proposals are provided using pixel-wise classification of the image data. The multi-modal observations are fine-tuned by modelling trunk locations using a hidden semi-Markov model (HSMM), within which prior knowledge of tree spacing is incorporated. The final component of this thesis addresses the visual occlusion of fruit within geometrically complex canopies by using a multi-view detection and tracking approach. Single image fruit detections are tracked over a sequence of images, and associated to individual trees or farm rows, with the spatial distribution of the fruit counting forming a yield map over the farm. The results show the advantage of using multi-view imagery (instead of single view analysis) for fruit counting and yield mapping. This thesis includes extensive experimentation in almond, apple and mango orchards, with data captured by a UGV spanning a total of 5 hectares of farm area, over 30 km of vehicle traversal and more than 7,000 trees. The validation of the different processes is performed using manual annotations, which includes fruit and tree locations in image and LiDAR data respectively. Additional evaluation of yield mapping is performed by comparison against fruit counts on trees at the farm and counts made by the growers post-harvest. The framework developed in this thesis is demonstrated to be accurate compared to ground truth at all scales of the pipeline, including fruit detection and tree mapping, leading to accurate yield estimation, per tree and per row, for the different crops. Through the multitude of field experiments conducted over multiple seasons and years, the thesis presents key practical insights necessary for commercial development of an information gathering system in orchards
Spatial and temporal background modelling of non-stationary visual scenes
PhDThe prevalence of electronic imaging systems in everyday life has become increasingly apparent
in recent years. Applications are to be found in medical scanning, automated manufacture, and
perhaps most significantly, surveillance. Metropolitan areas, shopping malls, and road traffic
management all employ and benefit from an unprecedented quantity of video cameras for monitoring
purposes. But the high cost and limited effectiveness of employing humans as the final
link in the monitoring chain has driven scientists to seek solutions based on machine vision techniques.
Whilst the field of machine vision has enjoyed consistent rapid development in the last
20 years, some of the most fundamental issues still remain to be solved in a satisfactory manner.
Central to a great many vision applications is the concept of segmentation, and in particular,
most practical systems perform background subtraction as one of the first stages of video
processing. This involves separation of ‘interesting foreground’ from the less informative but
persistent background. But the definition of what is ‘interesting’ is somewhat subjective, and
liable to be application specific. Furthermore, the background may be interpreted as including
the visual appearance of normal activity of any agents present in the scene, human or otherwise.
Thus a background model might be called upon to absorb lighting changes, moving trees and
foliage, or normal traffic flow and pedestrian activity, in order to effect what might be termed in
‘biologically-inspired’ vision as pre-attentive selection. This challenge is one of the Holy Grails
of the computer vision field, and consequently the subject has received considerable attention.
This thesis sets out to address some of the limitations of contemporary methods of background
segmentation by investigating methods of inducing local mutual support amongst pixels
in three starkly contrasting paradigms: (1) locality in the spatial domain, (2) locality in the shortterm
time domain, and (3) locality in the domain of cyclic repetition frequency.
Conventional per pixel models, such as those based on Gaussian Mixture Models, offer no
spatial support between adjacent pixels at all. At the other extreme, eigenspace models impose
a structure in which every image pixel bears the same relation to every other pixel. But Markov
Random Fields permit definition of arbitrary local cliques by construction of a suitable graph, and
3
are used here to facilitate a novel structure capable of exploiting probabilistic local cooccurrence
of adjacent Local Binary Patterns. The result is a method exhibiting strong sensitivity to multiple
learned local pattern hypotheses, whilst relying solely on monochrome image data.
Many background models enforce temporal consistency constraints on a pixel in attempt to
confirm background membership before being accepted as part of the model, and typically some
control over this process is exercised by a learning rate parameter. But in busy scenes, a true
background pixel may be visible for a relatively small fraction of the time and in a temporally
fragmented fashion, thus hindering such background acquisition. However, support in terms of
temporal locality may still be achieved by using Combinatorial Optimization to derive shortterm
background estimates which induce a similar consistency, but are considerably more robust
to disturbance. A novel technique is presented here in which the short-term estimates act as
‘pre-filtered’ data from which a far more compact eigen-background may be constructed.
Many scenes entail elements exhibiting repetitive periodic behaviour. Some road junctions
employing traffic signals are among these, yet little is to be found amongst the literature regarding
the explicit modelling of such periodic processes in a scene. Previous work focussing on gait
recognition has demonstrated approaches based on recurrence of self-similarity by which local
periodicity may be identified. The present work harnesses and extends this method in order
to characterize scenes displaying multiple distinct periodicities by building a spatio-temporal
model. The model may then be used to highlight abnormality in scene activity. Furthermore, a
Phase Locked Loop technique with a novel phase detector is detailed, enabling such a model to
maintain correct synchronization with scene activity in spite of noise and drift of periodicity.
This thesis contends that these three approaches are all manifestations of the same broad
underlying concept: local support in each of the space, time and frequency domains, and furthermore,
that the support can be harnessed practically, as will be demonstrated experimentally
A review on deep learning techniques for 3D sensed data classification
Over the past decade deep learning has driven progress in 2D image
understanding. Despite these advancements, techniques for automatic 3D sensed
data understanding, such as point clouds, is comparatively immature. However,
with a range of important applications from indoor robotics navigation to
national scale remote sensing there is a high demand for algorithms that can
learn to automatically understand and classify 3D sensed data. In this paper we
review the current state-of-the-art deep learning architectures for processing
unstructured Euclidean data. We begin by addressing the background concepts and
traditional methodologies. We review the current main approaches including;
RGB-D, multi-view, volumetric and fully end-to-end architecture designs.
Datasets for each category are documented and explained. Finally, we give a
detailed discussion about the future of deep learning for 3D sensed data, using
literature to justify the areas where future research would be most valuable.Comment: 25 pages, 9 figures. Review pape
3D Ground Truth Generation Using Pre-Trained Deep Neural Networks
Training 3D object detectors on publicly available data has been limited to small datasets
due to the large amount of effort required to generate annotations. The difficulty of labeling
in 3D using 2.5D sensors, such as LIDAR, is attributed to the high spatial reasoning skills
required to deal with occlusion and partial viewpoints. Additionally, the current methods
to label 3D objects are cognitively demanding due to frequent task switching. Reducing
both task complexity and the amount of task switching done by annotators is key to
reducing the effort and time required to generate 3D bounding box annotations. We
therefore seek to reduce the burden on the annotators by leveraging existing 3D object
detectors using deep neural networks.
This work introduces a novel ground truth generation method that combines human
supervision with pre-trained neural networks to generate per-instance 3D point cloud seg-
mentation, 3D bounding boxes, and class annotations. The annotators provide object
anchor clicks which behave as a seed to generate instance segmentation results in 3D. The
points belonging to each instance are then used to regress object centroids, bounding box
dimensions, and object orientation. The deep neural network model used to generate the
segmentation masks and bounding box parameters is based on the PointNet architecture.
We develop our approach with reliance on the KITTI dataset to analyze the quality
of the generated ground truth. The neural network model is trained on KITTI training
split and the 3D bounding box outputs are generated using annotation clicks collected
from the validation split. The validation split of KITTI detection dataset contains 3712
frames of pointcloud and image scenes and it took 16.35 hours to label with the following
method. Based on these results, our approach is 19 times faster than the latest published
3D object annotation scheme. Additionally, it is found that the annotators spent less
time per object as the number of objects in the scenes increase, making it a very efficient
for multi-object labeling. Furthermore, the quality of the generated 3D bounding boxes,
using the labeling method, is compared against the KITTI ground truth. It is shown that
the model performs on par with the current state-of-the-art 3D detectors and the labeling
procedure does not negatively impact the output quality of the bounding boxes. Lastly, the
proposed scheme is applied to previously unseen data from the Autonomoose self-driving
vehicle to demonstrate generalization capabilities of the network
Deep Probabilistic Models for Camera Geo-Calibration
The ultimate goal of image understanding is to transfer visual images into numerical or symbolic descriptions of the scene that are helpful for decision making. Knowing when, where, and in which direction a picture was taken, the task of geo-calibration makes it possible to use imagery to understand the world and how it changes in time. Current models for geo-calibration are mostly deterministic, which in many cases fails to model the inherent uncertainties when the image content is ambiguous. Furthermore, without a proper modeling of the uncertainty, subsequent processing can yield overly confident predictions. To address these limitations, we propose a probabilistic model for camera geo-calibration using deep neural networks. While our primary contribution is geo-calibration, we also show that learning to geo-calibrate a camera allows us to implicitly learn to understand the content of the scene
- …