    Single-Shot Direct Block Address Encoding for Learning Screen Geometry

    3D surface reconstruction has many applications in different domains such as projection mapping, virtual reality, robot navigation, human-computer interaction and manufacturing inspection, to name a few. Among different methods of 3D reconstruction, structured light is widely used as it is comparatively cheap and accessible, and it solves the main problem of traditional stereo vision systems, which is finding accurate pixel correspondences between two or multiple views. Structured light techniques can be most fundamentally categorized by the number of projected images over time, whether a single image (single-shot) or multiple images (multi-shot). Multi-shot structured light methods take advantage of multiple images that are projected sequentially over time, allowing simple encoding/decoding of projector pixel addresses. In contrast, single-shot structured light is preferred in contexts of dynamically moving cameras, projectors or surfaces, and in scenarios where short projection time is important. In this thesis, a new framework for designing single-shot structured light images using tag embedding, called Direct Block Address Encoding, is presented which, unlike previous methods, results in efficient encoding, decoding and 3D reconstruction. Error detection and correction mechanisms are also designed to detect pixel codewords with errors and find their correspondences in the projector image. In addition, the relationships between different design parameters (alphabet size, encoding scheme, tag size, block size) are derived to cover projectors with different resolutions. Experimental results demonstrate that the proposed scheme obtains projector-camera pixel correspondences at higher speed than previous tag embedding methods, allowing screen geometry to be learned from a single shot with high-resolution projectors and dynamic cameras and projectors.
The proposed Direct Block Address Encoding scheme offers a 2-3 times speed-up for 3D reconstruction and a 5-6 times speed-up for the encoding/decoding stages because it requires neither a look-up table nor an exhaustive search, something not achieved by other methods.
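The claimed speed-up comes from the block address being recoverable by arithmetic alone rather than by table look-up. A minimal sketch of that idea (our own illustration; the function names and digit layout are assumptions, not the thesis's actual scheme):

```python
# Hypothetical sketch: each projector block's (row, col) index is written
# directly as fixed-length base-k digits in its embedded tag, so decoding is
# pure arithmetic with no look-up table or exhaustive search.

def encode_block_address(row, col, cols_per_row, alphabet_size, tag_len):
    """Convert a block's linear address into a fixed-length base-k tag."""
    address = row * cols_per_row + col
    digits = []
    for _ in range(tag_len):
        digits.append(address % alphabet_size)
        address //= alphabet_size
    if address != 0:
        raise ValueError("tag too short for this projector resolution")
    return digits[::-1]  # most-significant digit first

def decode_block_address(digits, cols_per_row, alphabet_size):
    """Recover (row, col) from a decoded tag by direct arithmetic."""
    address = 0
    for d in digits:
        address = address * alphabet_size + d
    return divmod(address, cols_per_row)
```

For example, with a 4-symbol alphabet and 5-digit tags, block (3, 5) in a 40-block-wide grid round-trips through a single tag with no search.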

    Feature Extraction Methods for Character Recognition


    Manifold Learning Approaches to Compressing Latent Spaces of Unsupervised Feature Hierarchies

    Field robots encounter dynamic unstructured environments containing a vast array of unique objects. In order to make sense of the world in which they are placed, they collect large quantities of unlabelled data with a variety of sensors. Producing robust and reliable applications depends entirely on the ability of the robot to understand the unlabelled data it obtains. Deep Learning techniques have had a high level of success in learning powerful unsupervised representations for a variety of discriminative and generative models. Applying these techniques to problems encountered in field robotics remains a challenging endeavour. Modern Deep Learning methods are typically trained with a substantial labelled dataset, while datasets produced in a field robotics context contain limited labelled training data. The primary motivation for this thesis stems from the problem of applying large-scale Deep Learning models to field robotics datasets that are label poor. While the lack of labelled ground truth data drives the desire for unsupervised methods, the need to improve model scaling is driven by two factors: performance and computational requirements. When utilising unsupervised layer outputs as representations for classification, classification performance increases with layer size. Scaling up models with multiple large layers of features is problematic, as the size of each subsequent hidden layer scales with the size of the previous layer. This quadratic scaling, and the associated time required to train such networks, has prevented the adoption of large Deep Learning models beyond cluster computing. The contributions in this thesis are developed from the observation that parameters or filter elements learnt in Deep Learning systems are typically highly structured and contain related elements. Firstly, the structure of unsupervised filters is utilised to construct a mapping from the high-dimensional filter space to a low-dimensional manifold.
This creates a significantly smaller representation for subsequent feature learning. This mapping, and its effect on the resulting encodings, highlights the need for the ability to learn highly overcomplete sets of convolutional features. Driven by this need, the unsupervised pretraining of Deep Convolutional Networks is developed to include a number of modern training and regularisation methods. These pretrained models are then used to provide initialisations for supervised convolutional models trained on low quantities of labelled data. By utilising pretraining, a significant increase in classification performance on a number of publicly available datasets is achieved. In order to apply these techniques to outdoor 3D Laser Illuminated Detection And Ranging data, we develop a set of resampling techniques to provide uniform input to Deep Learning models. The features learnt in these systems outperform the high-effort hand-engineered features developed specifically for 3D data. The representation of a given signal is then reinterpreted as a combination of modes that exist on the learnt low-dimensional filter manifold. From this, we develop an encoding technique that allows the high-dimensional layer output to be represented as a combination of low-dimensional components. This allows the growth of subsequent layers to depend only on the intrinsic dimensionality of the filter manifold and not on the number of elements contained in the previous layer. Finally, the resulting unsupervised convolutional model, the encoding frameworks and the embedding methodology are used to produce a new unsupervised learning strategy that is able to encode images in terms of overcomplete filter spaces without producing an explosion in the size of the intermediate parameter spaces.
This model produces classification results on par with state-of-the-art models, yet requires far fewer computational resources and is suitable for use in the constrained computation environment of a field robot.
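The central observation, that structured filters lie near a low-dimensional manifold whose intrinsic dimensionality (not the filter count) can govern subsequent layer growth, can be sketched with a plain linear embedding. This is our own toy illustration, not the thesis's actual (nonlinear) manifold method:

```python
import numpy as np

# Toy sketch: 256 filters of dimension 75 secretly generated from only 8
# latent modes. A linear embedding recovers coordinates whose dimension is
# the manifold's intrinsic dimensionality, not the number of filters.
rng = np.random.default_rng(0)
modes = rng.normal(size=(8, 75))              # hidden low-dimensional modes
filters = rng.normal(size=(256, 8)) @ modes   # 256 structured filters

centred = filters - filters.mean(axis=0)
intrinsic_dim = np.linalg.matrix_rank(centred)  # recovers 8, not 256

# Project each filter onto the leading right-singular vectors; these small
# coordinate vectors are all a subsequent layer needs to grow with.
_, _, vt = np.linalg.svd(centred, full_matrices=False)
coords = centred @ vt[:intrinsic_dim].T       # shape (256, intrinsic_dim)
```

A nonlinear manifold learner would replace the SVD step, but the scaling argument is the same: layer growth tracks `intrinsic_dim` rather than the previous layer's element count.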

    A simple way to estimate similarity between pairs of eye movement sequences

    We propose a novel algorithm to estimate the similarity between a pair of eye movement sequences. The proposed algorithm relies on a straightforward geometric representation of eye movement data. The algorithm is considerably simpler to implement and apply than existing similarity measures, and is particularly suited for exploratory analyses. To validate the algorithm, we conducted a benchmark experiment using realistic artificial eye movement data. Based on similarity ratings obtained from the proposed algorithm, we defined two clusters in an unlabelled set of eye movement sequences. As a measure of the algorithm's sensitivity, we quantified the extent to which these data-driven clusters matched two pre-defined groups (i.e., the 'real' clusters). The same analysis was performed using two other, commonly used similarity measures. The results show that the proposed algorithm is a viable similarity measure.
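One simple geometric similarity measure in the spirit described above (our own sketch under assumptions; the paper's exact algorithm may differ) is to resample both sequences of gaze points to a common length and average the point-to-point distances:

```python
import math

# Sketch: represent each eye-movement sequence as (x, y) points, resample
# both to a common length, and score similarity as the mean pairwise
# Euclidean distance (0 means identical paths; larger means less similar).

def resample(path, n):
    """Linearly interpolate a sequence of (x, y) points to n samples."""
    if len(path) == 1:
        return path * n
    out = []
    for i in range(n):
        t = i * (len(path) - 1) / (n - 1)
        j = min(int(t), len(path) - 2)
        f = t - j
        (x0, y0), (x1, y1) = path[j], path[j + 1]
        out.append((x0 + f * (x1 - x0), y0 + f * (y1 - y0)))
    return out

def scanpath_distance(a, b, n=50):
    """Mean distance between corresponding resampled points of two paths."""
    ra, rb = resample(a, n), resample(b, n)
    return sum(math.dist(p, q) for p, q in zip(ra, rb)) / n
```

Pairwise distances computed this way can then feed any standard clustering routine, matching the exploratory use case the abstract describes.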

    TIDE: Temporally Incremental Disparity Estimation via Pattern Flow in Structured Light System

    We introduce the Temporally Incremental Disparity Estimation Network (TIDE-Net), a learning-based technique for disparity computation in mono-camera structured light systems. In our hardware setting, a static pattern is projected onto a dynamic scene and captured by a monocular camera. Unlike most former disparity estimation methods, which operate in a frame-wise manner, our network acquires disparity maps in a temporally incremental way. Specifically, we exploit the deformation of projected patterns (named pattern flow) on captured image sequences to model the temporal information. Notably, this newly proposed pattern flow formulation reflects the disparity changes along the epipolar line and is a special form of optical flow. Tailored for pattern flow, the TIDE-Net, a recurrent architecture, is proposed and implemented. For each incoming frame, our model fuses correlation volumes (from the current frame) and disparity (from the former frame) warped by pattern flow. From the fused features, the final stage of TIDE-Net estimates the residual disparity rather than the full disparity estimated by many previous methods. This design brings clear empirical advantages in terms of efficiency and generalization ability. Using only synthetic data for training, our extensive evaluation (w.r.t. both accuracy and efficiency metrics) shows superior performance over several SOTA models on unseen real data. The code is available at https://github.com/CodePointer/TIDENet.
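The incremental update at the heart of this design can be sketched without the learned network: warp the previous frame's disparity along the epipolar (horizontal) axis by the pattern flow, then add a small predicted residual. This is our own numpy illustration of the data flow, not TIDE-Net itself:

```python
import numpy as np

# Sketch of the temporally incremental update: pattern flow is a 1-D flow
# along the epipolar line, so warping is a per-pixel horizontal shift.

def warp_by_pattern_flow(disparity, flow):
    """Shift each pixel's disparity along the epipolar (x) axis by the flow."""
    h, w = disparity.shape
    xs = np.clip(np.arange(w)[None, :] - np.round(flow).astype(int), 0, w - 1)
    return np.take_along_axis(disparity, xs, axis=1)

def incremental_disparity(prev_disparity, flow, predicted_residual):
    """Full disparity = warped previous disparity + predicted residual."""
    return warp_by_pattern_flow(prev_disparity, flow) + predicted_residual
```

In the real system the residual comes from the recurrent network's final stage; predicting only this residual is what keeps per-frame computation small.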

    Information recovery from rank-order encoded images

    The time to detection of a visual stimulus by the primate eye is recorded at 100–150 ms. This near-instantaneous recognition occurs in spite of the considerable processing required by the several stages of the visual pathway to recognise and react to a visual scene. How this is achieved is still a matter of speculation. Rank-order codes have been proposed as a means of encoding by the primate eye in the rapid transmission of the initial burst of information from the sensory neurons to the brain. We study the efficiency of rank-order codes in encoding perceptually important information in an image. VanRullen and Thorpe built a model of the ganglion cell layers of the retina to simulate and study the viability of rank-order as a means of encoding by retinal neurons. We validate their model and quantify the information retrieved from rank-order encoded images in terms of the visually important information recovered. Towards this goal, we apply the 'perceptual information preservation algorithm' proposed by Petrovic and Xydeas, after slight modification. We observe a low information recovery due to losses suffered during the rank-order encoding and decoding processes. We propose to minimise these losses to recover maximum information in minimum time from rank-order encoded images. We first maximise information recovery by using the pseudo-inverse of the filter-bank matrix to minimise losses during rank-order decoding. We then apply the biological principle of lateral inhibition to minimise losses during rank-order encoding. In doing so, we propose the Filter-overlap Correction algorithm. To test the performance of rank-order codes in a biologically realistic model, we design and simulate a model of the foveal-pit ganglion cells of the retina, keeping close to biological parameters. We use this as a rank-order encoder and analyse its performance relative to VanRullen and Thorpe's retinal model.
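The two ingredients above, transmitting only the firing order and decoding through the pseudo-inverse of the filter-bank matrix, can be sketched in a few lines. This is our own toy with arbitrary sizes, not the retinal model itself:

```python
import numpy as np

# Toy rank-order code: each "neuron" is one row of a filter bank; the code
# transmitted is the order in which neurons fire. Decoding with the
# pseudo-inverse compensates for filter overlap (non-orthogonality).
rng = np.random.default_rng(1)
filters = rng.normal(size=(32, 32))   # 32 neurons viewing a 32-pixel signal
stimulus = rng.normal(size=32)
responses = filters @ stimulus

# Sanity check: given the exact responses, pseudo-inverse decoding recovers
# the stimulus despite the filters overlapping.
exact = np.linalg.pinv(filters) @ responses

# Rank-order code: keep only the firing order, replacing magnitudes with a
# fixed geometrically decaying weight per rank (sign retained for clarity).
order = np.argsort(-np.abs(responses))
coded = np.zeros(32)
coded[order] = np.sign(responses[order]) * 0.9 ** np.arange(32)
approx = np.linalg.pinv(filters) @ coded  # lossy reconstruction, ranks only
```

The gap between `exact` and `approx` is precisely the encoding loss the thesis attacks with lateral inhibition and the Filter-overlap Correction algorithm.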

    Seismic Ray Impedance Inversion

    This thesis investigates a prestack seismic inversion scheme implemented in the ray-parameter domain. Conventionally, most prestack seismic inversion methods are performed in the incidence-angle domain. However, inversion using the concept of ray impedance, as it honours ray path variation following the elastic parameter variation according to Snell's law, shows the capacity to discriminate different lithologies better than conventional elastic impedance inversion. The procedure starts with data transformation into the ray-parameter domain and then implements the ray impedance inversion along constant-ray-parameter profiles. For the different constant-ray-parameter profiles, mixed-phase wavelets are initially estimated based on the high-order statistics of the data and further refined after a proper well-to-seismic tie. With the estimated wavelets ready, a Cauchy inversion method is used to invert for seismic reflectivity sequences, aiming at recovering seismic reflectivity sequences for blocky impedance inversion. The impedance inversion from reflectivity sequences adopts a standard generalised linear inversion scheme, whose results are utilised to identify rock properties and facilitate quantitative interpretation. It has also been demonstrated that we can further invert elastic parameters from ray impedance values without eliminating an extra density term or introducing a Gardner's relation to absorb this term. Ray impedance inversion is extended to P-S converted waves by introducing the definition of converted-wave ray impedance. This quantity shows some advantages in connecting prestack converted-wave data with well logs, compared with the shear-wave elastic impedance derived from the Aki and Richards approximation to the Zoeppritz equations.
An analysis of P-P and P-S wave data under the framework of ray impedance is conducted on a real multicomponent dataset, which can reduce the uncertainty in lithology identification. Inversion is the key method in generating the examples throughout the entire thesis, as we believe it can render robust solutions to geophysical problems. Apart from the reflectivity sequence, ray impedance and elastic parameter inversion mentioned above, inversion methods are also adopted in transforming the prestack data from the offset domain to the ray-parameter domain, in mixed-phase wavelet estimation, and in the registration of P-P and P-S waves for joint analysis. The ray impedance inversion methods are successfully applied to different types of datasets. For each individual step towards achieving the ray impedance inversion, the advantages, disadvantages and limitations of the algorithms adopted are detailed. In conclusion, the ray impedance analyses demonstrated in this thesis are highly competitive with the classical elastic impedance methods, and the author would recommend them for wider application.
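The blocky impedance step above turns an inverted reflectivity sequence into an impedance profile. A minimal sketch of that relationship, using the standard normal-incidence recursion r_i = (Z_{i+1} - Z_i)/(Z_{i+1} + Z_i) rather than the thesis's full generalised linear scheme:

```python
import numpy as np

def reflectivity_from_impedance(z):
    """Forward model: reflection coefficients between adjacent layers."""
    z = np.asarray(z, dtype=float)
    return (z[1:] - z[:-1]) / (z[1:] + z[:-1])

def impedance_from_reflectivity(z0, reflectivity):
    """Recursively rebuild an impedance profile from reflection coefficients,
    given the impedance z0 of the top layer: Z_{i+1} = Z_i (1 + r)/(1 - r)."""
    z = [z0]
    for r in reflectivity:
        z.append(z[-1] * (1 + r) / (1 - r))
    return np.array(z)
```

In practice the reflectivity comes from the Cauchy inversion of the seismic trace and is band-limited and noisy, which is why the thesis wraps this recursion in a regularised generalised linear inversion rather than applying it directly.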

    Development and Testing of a Fractal Analysis Algorithm for Face Recognition

    Following an earlier development for fingerprints by Deal (1) and Stoffa (2), it was suggested that this algorithm may work on faces (or, more precisely, face images). First, this work transformed a 2-D electronic image file of a human face into a numeric system via a random walk process similar to that of Deal and Stoffa. Second, the numeric system was analyzed, and it could then be tested against a database of similarly converted images. The testing determined whether the subject of the image is part of the database. Finally, the efficiency, speed, and accuracy of the algorithm were tested and conclusions about its general effectiveness were made. The algorithm employed a Random Walk analysis of digital photographs of human faces over a fixed number of binary images, which were generated from the source photograph using a Boolean conversion scheme. The Random Walk generated a series of transition probabilities for a particular scale. In short, the numeric system used to describe the face consisted of two dimensions of data: scale and binary image. The numeric system for a particular source photograph was tested against a database of similarly constructed systems to determine whether the subject of the source photograph was in the database. For the purpose of this work, a database of 400 images was constructed from 167 individual subjects using the FERET database. The 400 images were then analyzed and tested against the database to determine whether the algorithm could find the subjects in the database. The algorithm was able, in its best configuration, to correctly identify the subjects of 168 of the 400 photographs. However, the total time to run an image (after capture by a digital camera) through a database comparison was only 62 seconds, which represents a substantial improvement over previous systems.
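The described signature, binary planes from a Boolean conversion crossed with per-scale random-walk transition probabilities, can be sketched as follows. This is our own simplification for illustration, not Deal and Stoffa's exact algorithm:

```python
import numpy as np

# Sketch: threshold the greyscale face into binary planes, then estimate, for
# each plane and step scale, the probability that a random step of that
# length lands on a like-valued pixel. The (plane, scale) grid of
# probabilities is the face's numeric signature.

def binary_planes(image, n_planes=4):
    """Boolean conversion: one binary image per intensity threshold."""
    thresholds = np.linspace(image.min(), image.max(), n_planes + 2)[1:-1]
    return [image >= t for t in thresholds]

def transition_probability(plane, scale, n_steps=10000, seed=0):
    """Fraction of random horizontal steps of length `scale` that connect
    two pixels of equal value."""
    rng = np.random.default_rng(seed)
    h, w = plane.shape
    y = rng.integers(0, h, n_steps)
    x = rng.integers(0, w - scale, n_steps)
    return np.mean(plane[y, x] == plane[y, x + scale])

def face_signature(image, scales=(1, 2, 4, 8)):
    """Two-dimensional signature: one probability per (plane, scale) pair."""
    return np.array([[transition_probability(p, s) for s in scales]
                     for p in binary_planes(image)])
```

Matching a probe face against the database then reduces to comparing these small signature arrays, which is why the per-image comparison time reported above can be short.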