Single-Shot Direct Block Address Encoding for Learning Screen Geometry
3D surface reconstruction has many applications in different domains, such as projection mapping, virtual reality, robot navigation, human-computer interaction and manufacturing inspection, to name a few. Among the different methods of 3D reconstruction, structured light is widely used as it is comparatively cheap and accessible, and it solves the main problem of traditional stereo vision systems: finding accurate pixel correspondences between two or more views.
Structured light techniques can be most fundamentally categorized in terms of the number of projected images over time, whether a single image (single-shot) or multiple images (multi-shot).
Multi-shot structured light methods take advantage of multiple images that are projected sequentially over time, allowing simple encoding/decoding of projector pixel addresses.
In contrast, single-shot structured light is preferred in contexts of dynamically moving cameras, projectors or surfaces, and in scenarios where short projection time is important.
In this thesis, a new framework for designing single-shot structured light images using tag embedding, called Direct Block Address Encoding, is presented which, unlike previous methods, results in efficient encoding, decoding and 3D reconstruction. Error detection and correction mechanisms are also designed to detect pixel codewords with errors and find their correspondences in the projector image. In addition, the relationships between the different design parameters (alphabet size, encoding scheme, tag size, block size) are derived to cover projectors with different resolutions.
Experimental results demonstrate that the proposed scheme obtains projector-camera pixel correspondences at higher speed than previous tag embedding methods, allowing screen geometry to be learnt from a single shot with high-resolution projectors and dynamic cameras and projectors. The proposed Direct Block Address Encoding scheme offers a 2-3 times speed-up for 3D reconstruction and a 5-6 times speed-up for the encoding/decoding stages because it requires neither a look-up table nor an exhaustive search, something not achieved by other methods.
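The "direct" address idea can be sketched as follows. This is an illustrative reconstruction, not the thesis's actual codeword design: each projector block's linear address is written as a fixed-length word of base-k symbols, so decoding is plain positional arithmetic with no look-up table or search.

```python
# Illustrative sketch (not the thesis's exact scheme): encode each
# projector block's linear address directly as a fixed-length base-k
# codeword, so decoding is positional-number conversion.

def encode_address(block_index, alphabet_size, tag_length):
    """Write a block's linear address as base-k digits (most significant first)."""
    digits = []
    for _ in range(tag_length):
        digits.append(block_index % alphabet_size)
        block_index //= alphabet_size
    return digits[::-1]

def decode_address(digits, alphabet_size):
    """Invert encode_address by positional-number conversion."""
    index = 0
    for d in digits:
        index = index * alphabet_size + d
    return index

# A projector with R x C blocks needs tag_length >= ceil(log_k(R * C)),
# which is the kind of design-parameter relationship the thesis derives.
```

Because the codeword itself *is* the address, the decoder never has to match an observed tag against a stored codebook, which is where the reported encoding/decoding speed-up comes from.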
Manifold Learning Approaches to Compressing Latent Spaces of Unsupervised Feature Hierarchies
Field robots encounter dynamic unstructured environments containing a vast array of unique objects. In order to make sense of the world in which they are placed, they collect large quantities of unlabelled data with a variety of sensors. Producing robust and reliable applications depends entirely on the ability of the robot to understand the unlabelled data it obtains. Deep Learning techniques have had a high level of success in learning powerful unsupervised representations for a variety of discriminative and generative models. Applying these techniques to problems encountered in field robotics remains a challenging endeavour. Modern Deep Learning methods are typically trained with a substantial labelled dataset, while datasets produced in a field robotics context contain limited labelled training data. The primary motivation for this thesis stems from the problem of applying large-scale Deep Learning models to field robotics datasets that are label poor. While the lack of labelled ground truth data drives the desire for unsupervised methods, the need for improved model scaling is driven by two factors: performance and computational requirements. When utilising unsupervised layer outputs as representations for classification, the classification performance increases with layer size. Scaling up models with multiple large layers of features is problematic, as the size of each subsequent hidden layer scales with the size of the previous layer. This quadratic scaling, and the associated time required to train such networks, has prevented the adoption of large Deep Learning models beyond cluster computing. The contributions in this thesis are developed from the observation that the parameters or filter elements learnt in Deep Learning systems are typically highly structured and contain related elements. Firstly, the structure of unsupervised filters is utilised to construct a mapping from the high-dimensional filter space to a low-dimensional manifold.
This creates a significantly smaller representation for subsequent feature learning. This mapping, and its effect on the resulting encodings, highlights the need for the ability to learn highly overcomplete sets of convolutional features. Driven by this need, the unsupervised pretraining of Deep Convolutional Networks is developed to include a number of modern training and regularisation methods. These pretrained models are then used to provide initialisations for supervised convolutional models trained on small quantities of labelled data. By utilising pretraining, a significant increase in classification performance on a number of publicly available datasets is achieved. In order to apply these techniques to outdoor 3D Laser Illuminated Detection And Ranging data, we develop a set of resampling techniques to provide uniform input to Deep Learning models. The features learnt in these systems outperform the high-effort hand-engineered features developed specifically for 3D data. The representation of a given signal is then reinterpreted as a combination of modes that exist on the learnt low-dimensional filter manifold. From this, we develop an encoding technique that allows the high-dimensional layer output to be represented as a combination of low-dimensional components. This allows the growth of subsequent layers to depend only on the intrinsic dimensionality of the filter manifold and not on the number of elements contained in the previous layer. Finally, the resulting unsupervised convolutional model, the encoding frameworks and the embedding methodology are used to produce a new unsupervised learning strategy that is able to encode images in terms of overcomplete filter spaces without producing an explosion in the size of the intermediate parameter spaces. This model produces classification results on par with state-of-the-art models, yet requires significantly less computational resources and is suitable for use in the constrained computational environment of a field robot.
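The filter-manifold idea can be illustrated with a linear stand-in. The thesis learns a (generally nonlinear) manifold; the sketch below uses plain PCA, and all function names are hypothetical, but it shows the mechanism: a large filter bank is described by a few coordinates on a low-dimensional subspace, so downstream layer size scales with the intrinsic dimensionality rather than the filter count.

```python
import numpy as np

# Linear (PCA) stand-in for the learnt filter manifold, for
# illustration only: project a bank of flattened filters onto a
# low-dimensional subspace and keep only the coordinates.

def fit_filter_subspace(filters, n_components):
    """filters: (n_filters, filter_dim) array of flattened filters."""
    mean = filters.mean(axis=0)
    centered = filters - mean
    # Rows of vt are principal directions in filter space.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]

def embed_filters(filters, mean, basis):
    """Low-dimensional coordinates of each filter on the subspace."""
    return (filters - mean) @ basis.T

# Subsequent layers now grow with n_components (the intrinsic
# dimensionality), not with the number of filters in the previous layer.
```

When the filter bank genuinely lies near a low-dimensional structure, as the thesis observes for learnt filters, almost nothing is lost in this compression.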
A simple way to estimate similarity between pairs of eye movement sequences
We propose a novel algorithm to estimate the similarity between a pair of eye movement sequences. The proposed algorithm relies on a straightforward geometric representation of eye movement data. It is considerably simpler to implement and apply than existing similarity measures, and is particularly suited for exploratory analyses. To validate the algorithm, we conducted a benchmark experiment using realistic artificial eye movement data. Based on similarity ratings obtained from the proposed algorithm, we defined two clusters in an unlabelled set of eye movement sequences. As a measure of the algorithm's sensitivity, we quantified the extent to which these data-driven clusters matched two pre-defined groups (i.e., the 'real' clusters). The same analysis was performed using two other, commonly used similarity measures. The results show that the proposed algorithm is a viable similarity measure.
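A toy measure in the same geometric spirit can be written in a few lines. This is not the authors' exact algorithm, whose details are not given in the abstract; it simply illustrates how a purely geometric scanpath comparison can work: resample both gaze sequences to a common length and average the pointwise distance.

```python
import numpy as np

# Toy geometric scanpath similarity (illustrative, not the paper's
# algorithm): resample both sequences to n points, then average the
# pointwise Euclidean distance.

def resample_path(points, n):
    """Linearly resample an (m, 2) gaze path to n points."""
    points = np.asarray(points, dtype=float)
    old = np.linspace(0.0, 1.0, len(points))
    new = np.linspace(0.0, 1.0, n)
    return np.column_stack([np.interp(new, old, points[:, i]) for i in range(2)])

def path_distance(a, b, n=50):
    """0 for identical paths; larger values mean less similar."""
    ra, rb = resample_path(a, n), resample_path(b, n)
    return float(np.linalg.norm(ra - rb, axis=1).mean())
```

Such distances can feed directly into standard clustering, which is how the benchmark in the abstract recovers the two pre-defined groups from unlabelled sequences.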
TIDE: Temporally Incremental Disparity Estimation via Pattern Flow in Structured Light System
We introduce the Temporally Incremental Disparity Estimation Network (TIDE-Net), a learning-based technique for disparity computation in mono-camera structured light systems. In our hardware setting, a static pattern is projected onto a dynamic scene and captured by a monocular camera. Unlike most former disparity estimation methods, which operate in a frame-wise manner, our network acquires disparity maps in a temporally incremental way. Specifically, we exploit the deformation of projected patterns (named pattern flow) on captured image sequences to model the temporal information. Notably, this newly proposed pattern flow formulation reflects the disparity changes along the epipolar line and is a special form of optical flow. Tailored for pattern flow, TIDE-Net, a recurrent architecture, is proposed and implemented. For each incoming frame, our model fuses correlation volumes (from the current frame) and disparity (from the former frame) warped by pattern flow. From the fused features, the final stage of TIDE-Net estimates the residual disparity rather than the full disparity, as done by many previous methods. This design brings clear empirical advantages in terms of efficiency and generalization ability. Using only synthetic data for training, our extensive evaluation (w.r.t. both accuracy and efficiency metrics) shows superior performance over several SOTA models on unseen real data. The code is available at https://github.com/CodePointer/TIDENet
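The per-frame recurrence described above can be sketched in dense-array form. This is a simplified stand-in for the learned network, with hypothetical function names: the previous disparity is warped along the epipolar (x) axis by the pattern flow, and only a residual needs to be added on top.

```python
import numpy as np

# Simplified sketch of the incremental update (not the TIDE-Net code):
# d_t = warp(d_{t-1} by pattern flow along x) + residual.

def warp_disparity(prev_disp, pattern_flow):
    """Backward-warp each row's disparity by the horizontal pattern flow."""
    h, w = prev_disp.shape
    xs = np.arange(w, dtype=float)
    warped = np.empty_like(prev_disp)
    for y in range(h):
        # Sample the previous disparity at x - flow.
        warped[y] = np.interp(xs - pattern_flow[y], xs, prev_disp[y])
    return warped

def incremental_update(prev_disp, pattern_flow, residual):
    """One temporal step: warped previous disparity plus a residual."""
    return warp_disparity(prev_disp, pattern_flow) + residual
```

Predicting only the residual keeps the per-frame regression problem small, which is consistent with the efficiency and generalization advantages the abstract reports.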
Information recovery from rank-order encoded images
The time to detection of a visual stimulus by the primate eye is recorded at 100–150 ms. This near-instantaneous recognition occurs in spite of the considerable processing required by the several stages of the visual pathway to recognise and react to a visual scene. How this is achieved is still a matter of speculation. Rank-order codes have been proposed as a means of encoding by the primate eye in the rapid transmission of the initial burst of information from the sensory neurons to the brain. We study the efficiency of rank-order codes in encoding perceptually important information in an image. VanRullen and Thorpe built a model of the ganglion cell layers of the retina to simulate and study the viability of rank-order as a means of encoding by retinal neurons. We validate their model and quantify the information retrieved from rank-order encoded images in terms of the visually important information recovered. Towards this goal, we apply the 'perceptual information preservation algorithm' proposed by Petrovic and Xydeas, after slight modification. We observe a low information recovery due to losses suffered during the rank-order encoding and decoding processes. We propose to minimise these losses to recover maximum information in minimum time from rank-order encoded images. We first maximise information recovery by using the pseudo-inverse of the filter-bank matrix to minimise losses during rank-order decoding. We then apply the biological principle of lateral inhibition to minimise losses during rank-order encoding. In doing so, we propose the Filter-overlap Correction algorithm. To test the performance of rank-order codes in a biologically realistic model, we design and simulate a model of the foveal-pit ganglion cells of the retina, keeping close to biological parameters. We use this as a rank-order encoder and analyse its performance relative to VanRullen and Thorpe's retinal model.
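The core of rank-order coding can be shown in miniature. The sketch below is a heavily simplified illustration of the idea attributed to VanRullen and Thorpe, not their retinal model: only the order in which units fire is transmitted, and the decoder assigns each rank a geometrically decaying weight.

```python
import numpy as np

# Minimal rank-order coding sketch (simplified illustration, not the
# VanRullen and Thorpe model): keep only the firing ORDER of units,
# and decode each rank with a geometrically decaying weight.

def rank_order_encode(responses):
    """Return unit indices sorted by response strength (largest first)."""
    return np.argsort(-np.asarray(responses, dtype=float))

def rank_order_decode(order, n_units, decay=0.9):
    """Reconstruct relative magnitudes from the firing order alone."""
    weights = np.zeros(n_units)
    for rank, unit in enumerate(order):
        weights[unit] = decay ** rank
    return weights

# Exact magnitudes are lost in encoding and decoding, the losses the
# thesis sets out to minimise, but the ordering survives.
```

The thesis's pseudo-inverse decoding and Filter-overlap Correction address precisely the gap between these decoded weights and the original responses when the encoding filters overlap.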
Seismic Ray Impedance Inversion
This thesis investigates a prestack seismic inversion scheme implemented in the ray
parameter domain. Conventionally, most prestack seismic inversion methods are
performed in the incidence-angle domain. However, inversion using the concept of ray impedance, which honours ray-path variation with elastic-parameter variation according to Snell's law, shows a greater capacity to discriminate lithologies than conventional elastic impedance inversion.
The procedure starts with data transformation into the ray-parameter domain and then
implements the ray impedance inversion along constant ray-parameter profiles. With
different constant-ray-parameter profiles, mixed-phase wavelets are initially estimated
based on the high-order statistics of the data and further refined after a proper well-to-seismic
tie. With the estimated wavelets ready, a Cauchy inversion method is used to invert for seismic reflectivity sequences suitable for blocky impedance inversion. The impedance inversion from reflectivity
sequences adopts a standard generalised linear inversion scheme, whose results are
utilised to identify rock properties and facilitate quantitative interpretation. It has also
been demonstrated that we can further invert elastic parameters from ray impedance values, without needing to eliminate an extra density term or introduce a Gardner relation to absorb it.
Ray impedance inversion is extended to P-S converted waves by introducing the
definition of converted-wave ray impedance. This quantity shows some advantages in
connecting prestack converted-wave data with well logs, compared with the shear-wave elastic impedance derived from the Aki and Richards approximation to the
Zoeppritz equations. An analysis of P-P and P-S wave data under the framework of
ray impedance is conducted on a real multicomponent dataset, which can reduce the uncertainty in lithology identification.
Inversion is the key method used in generating the examples throughout this thesis,
as we believe it can render robust solutions to geophysical problems. Apart from the
reflectivity sequence, ray impedance and elastic parameter inversion mentioned above,
inversion methods are also adopted in transforming the prestack data from the offset
domain to the ray-parameter domain, mixed-phase wavelet estimation, as well as the
registration of P-P and P-S waves for the joint analysis.
The ray impedance inversion methods are successfully applied to different types of datasets. For each individual step towards achieving the ray impedance inversion, the advantages, disadvantages and limitations of the algorithms adopted are detailed. In conclusion, the ray impedance analyses demonstrated in this thesis compare favourably with the classical elastic impedance methods, and the author recommends them for wider application.
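The Cauchy inversion step for reflectivity can be sketched on a toy problem. This is illustrative of the approach named in the abstract, not the thesis code, and all settings below are assumptions: given a known wavelet w and a noiseless trace d = w * r, a sparse reflectivity r is found by iteratively reweighted least squares with a Cauchy penalty.

```python
import numpy as np

# Toy Cauchy-regularised reflectivity inversion (illustrative only):
#   min_r ||W r - d||^2 + lam * sum_i log(1 + r_i^2 / sig^2)
# solved by iteratively reweighted least squares; the Cauchy prior
# favours sparse, spiky reflectivity suited to blocky impedance.

def convolution_matrix(wavelet, n):
    """Toeplitz matrix W such that W @ r == np.convolve(wavelet, r)."""
    W = np.zeros((len(wavelet) + n - 1, n))
    for j in range(n):
        W[j:j + len(wavelet), j] = wavelet
    return W

def cauchy_invert(wavelet, trace, n, lam=1e-3, sig=0.1, iters=10):
    """Recover an n-sample reflectivity sequence from a trace."""
    W = convolution_matrix(wavelet, n)
    r = np.zeros(n)
    for _ in range(iters):
        q = 1.0 / (sig ** 2 + r ** 2)  # Cauchy reweighting term
        r = np.linalg.solve(W.T @ W + lam * np.diag(q), W.T @ trace)
    return r
```

Because the reweighting term penalises small coefficients more strongly than large ones, spikes survive while background samples are driven towards zero, which is the behaviour a blocky impedance model requires.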
Development and Testing of a Fractal Analysis Algorithm for Face Recognition
Following an earlier development for fingerprints by Deal (1) and Stoffa (2), it was suggested that this algorithm might work on faces (or, more precisely, face images). First, this work transformed a 2-D electronic image file of a human face into a numeric system via a random walk process similar to that of Deal and Stoffa. Second, the numeric system was analyzed, and it could then be tested against a database of similarly converted images. The testing determined whether the subject of the image was part of the database. Finally, the efficiency, speed, and accuracy of the algorithm were tested, and conclusions about its general effectiveness were drawn.
The algorithm employed a random walk analysis of digital photographs of human faces over a fixed number of binary images generated from the source photograph using a Boolean conversion scheme. The random walk generated a series of transition probabilities for a particular scale. In short, the numeric system used to describe a face consisted of two dimensions of data: scale and binary image. The numeric system for a particular source photograph was tested against a database of similarly constructed systems to determine whether the subject of the source photograph was in the database.
For the purposes of this work, a database of 400 images was constructed from 167 individual subjects using the FERET database. The 400 images were then analyzed and tested against the database to determine whether the algorithm could find the subjects. In its best configuration, the algorithm correctly identified the subjects of 168 of the 400 photographs. However, the total time from image capture by a digital camera to database comparison was only 62 seconds, which represents a substantial improvement over previous systems.
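The two-dimensional (threshold, scale) signature can be sketched as follows. The Boolean conversion and walk details below are assumptions for illustration, not Deal and Stoffa's exact procedure: the image is thresholded into binary maps, and for each scale we estimate the probability that a random step of that length lands on a pixel of the same value.

```python
import numpy as np

# Illustrative random-walk signature (details assumed, not Deal and
# Stoffa's exact procedure): threshold the image into binary maps,
# then estimate per-scale same-value transition probabilities.

def binary_images(gray, thresholds):
    """One Boolean image per threshold level."""
    return [gray >= t for t in thresholds]

def transition_probability(binary, scale, n_steps=2000, seed=0):
    """P(same value after a random step of length `scale`)."""
    rng = np.random.default_rng(seed)
    h, w = binary.shape
    ys = rng.integers(0, h - scale, n_steps)
    xs = rng.integers(0, w - scale, n_steps)
    # Step right or down by `scale` pixels with equal probability.
    down = rng.integers(0, 2, n_steps).astype(bool)
    y2 = ys + np.where(down, scale, 0)
    x2 = xs + np.where(down, 0, scale)
    return float(np.mean(binary[ys, xs] == binary[y2, x2]))

# The face signature is the matrix of these probabilities over
# (threshold, scale) pairs, compared against database entries.
```

Because the signature is a small matrix of probabilities rather than the image itself, database comparison reduces to cheap matrix distances, consistent with the 62-second end-to-end time reported above.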