629 research outputs found

    Agent and object aware tracking and mapping methods for mobile manipulators

    Get PDF
    The age of the intelligent machine is upon us. They exist in our factories, our warehouses, our military, our hospitals, on our roads, and on the moon. Most of these things we call robots. When placed in a controlled or known environment such as an automotive factory or a distribution warehouse they perform their given roles with exceptional efficiency, achieving far more than is within reach of a humble human being. Despite the remarkable success of intelligent machines in such domains, they have yet to make a full-hearted deployment into our homes. The missing link between the robots we have now and the robots that are soon to come to our houses is perception. Perception as we mean it here refers to a level of understanding beyond the collection and aggregation of sensory data. Much of the available sensory information is noisy and unreliable, our homes contain many reflective surfaces, repeating textures on large flat surfaces, and many disruptive moving elements, including humans. These environments change over time, with objects frequently moving within and between rooms. This idea of change in an environment is fundamental to robotic applications, as in most cases we expect them to be effectors of such change. We can identify two particular challenges1 that must be solved for robots to make the jump to less structured environments - how to manage noise and disruptive elements in observational data, and how to understand the world as a set of changeable elements (objects) which move over time within a wider environment. In this thesis we look at one possible approach to solving each of these problems. For the first challenge we use proprioception aboard a robot with an articulated arm to handle difficult and unreliable visual data caused both by the robot and the environment. We use sensor data aboard the robot to improve the pose tracking of a visual system when the robot moves rapidly, with high jerk, or when observing a scene with little visual variation. For the second challenge, we build a model of the world on the level of rigid objects, and relocalise them both as they change location between different sequences and as they move. We use semantics, image keypoints, and 3D geometry to register and align objects between sequences, showing how their position has moved between disparate observations.Open Acces

    Neighborhood Defined Feature Selection Strategy for Improved Face Recognition in Different Sensor Modalitie

    Get PDF
    A novel feature selection strategy for improved face recognition in images with variations due to illumination conditions, facial expressions, and partial occlusions is presented in this dissertation. A hybrid face recognition system that uses feature maps of phase congruency and modular kernel spaces is developed. Phase congruency provides a measure that is independent of the overall magnitude of a signal, making it invariant to variations in image illumination and contrast. A novel modular kernel spaces approach is developed and implemented on the phase congruency feature maps. Smaller sub-regions from a predefined neighborhood within the phase congruency images of the training samples are merged to obtain a large set of features. These features are then projected into higher dimensional spaces using kernel methods. The unique modularization procedure developed in this research takes into consideration that the facial variations in a real world scenario are confined to local regions. The additional pixel dependencies that are considered based on their importance help in providing additional information for classification. This procedure also helps in robust localization of the variations, further improving classification accuracy. The effectiveness of the new feature selection strategy has been demonstrated by employing it in two specific applications via face authentication in low resolution cameras and face recognition using multiple sensors (visible and infrared). The face authentication system uses low quality images captured by a web camera. The optical sensor of the web camera is very sensitive to environmental illumination variations. It is observed that the feature selection policy overcomes the facial and environmental variations. A methodology based on multiple training images and clustering is also incorporated to overcome the additional challenges of computational efficiency and the subject\u27s non involvement. A multi-sensor image fusion based face recognition methodology that uses the proposed feature selection technique is presented in this dissertation. Research studies have indicated that complementary information from different sensors helps in improving the recognition accuracy compared to individual modalities. A decision level fusion methodology is also developed which provides better performance compared to individual as well as data level fusion modalities. The new decision level fusion technique is also robust to registration discrepancies, which is a very important factor in operational scenarios. Research work is progressing to use the new face recognition technique in multi-view images by employing independent systems for separate views and integrating the results with an appropriate voting procedure

    Towards Robust, Interpretable and Scalable Visual Representations

    Get PDF
    Visual representation is one of the central problems in computer vision. The essential problem is to develop a unified representation that effectively encodes both visual appearance and spatial information so that it can be easily applied to various vision applications such as face recognition, image matching, and multimodal image retrieval. Along with the history of computer vision research, there are four major levels of visual representations, i.e., geometric, low-level, mid-level and high-level. The dissertation comprises four works studying effective visual representations in the four different levels. Multiple approaches are proposed with the aim of improving the robustness, interpretability, and scalability of visual representations. Geometric features are effective in matching images under spatial transformations however their performance is sensitive to the noises. In the first part, we propose to model the uncertainty of geometric representation based on line segments and propose to equip these features with uncertainty modeling so that they could be robustly applied in the image-based geolocation application. We study in the second part the robustness of feature encoding to noisy keypoints. We show that traditional feature encoding is sensitive to background or noisy features. We propose the Selective Encoding framework which learns the relevance distribution of each codeword and incorporate such information with the original codebook model. Our approach is more robust to the localization errors or uncertainty in the active face authentication application. The mission of visual understanding is to express and describe the image content which is essentially relating images to human language. That typically involves finding a common representation inferable from both domains of data. In the third part, we propose a framework to extract a mid-level spatial representation directly from language descriptions and match such spatial layouts to the detected object bounding boxes for retrieving indoor scene images from user text queries. Modern high-level visual features are typically learned from supervised datasets, whose scalability is largely limited by the requirement of dedicated human annotation. In the last part, we propose to learn visual representations from large-scale weakly supervised data for a large number of natural language-based concepts, i.e., n-gram phrases. We propose the differentiable Jelinek-Mercer smoothing loss and train a deep convolutional neural network from images with associated user comments. We show that the learned model can predict a large number of phrase-based concepts from images, can be effectively applied to image-caption applications and transfers well to other visual recognition datasets

    Robust facial expression recognition in the presence of rotation and partial occlusion

    Get PDF
    >Magister Scientiae - MScThis research proposes an approach to recognizing facial expressions in the presence of rotations and partial occlusions of the face. The research is in the context of automatic machine translation of South African Sign Language (SASL) to English. The proposed method is able to accurately recognize frontal facial images at an average accuracy of 75%. It also achieves a high recognition accuracy of 70% for faces rotated to 60â—¦. It was also shown that the method is able to continue to recognize facial expressions even in the presence of full occlusions of the eyes, mouth and left/right sides of the face. The accuracy was as high as 70% for occlusion of some areas. An additional finding was that both the left and the right sides of the face are required for recognition. As an addition, the foundation was laid for a fully automatic facial expression recognition system that can accurately segment frontal or rotated faces in a video sequence

    Proceedings of the 2009 Joint Workshop of Fraunhofer IOSB and Institute for Anthropomatics, Vision and Fusion Laboratory

    Get PDF
    The joint workshop of the Fraunhofer Institute of Optronics, System Technologies and Image Exploitation IOSB, Karlsruhe, and the Vision and Fusion Laboratory (Institute for Anthropomatics, Karlsruhe Institute of Technology (KIT)), is organized annually since 2005 with the aim to report on the latest research and development findings of the doctoral students of both institutions. This book provides a collection of 16 technical reports on the research results presented on the 2009 workshop

    Conditional Image Synthesis by Generative Adversarial Modeling

    Get PDF
    Recent years, image synthesis has attracted more interests. This work explores the recovery of details (low-level information) from high-level features. The generative adversarial nets (GAN) has led to the explosion of image synthesis. Moving away from those application-oriented alternatives, this work investigates its intrinsic drawbacks and derives corresponding improvements in a theoretical manner.Based on GAN, this work further investigates the conditional image synthesis by incorporating an autoencoder (AE) to GAN. The GAN+AE structure has been demonstrated to be an effective framework for image manipulation. This work emphasizes the effectiveness of GAN+AE structure by proposing the conditional adversarial autoencoder (CAAE) for human facial age progression and regression. Instead of editing on the image level, i.e., explicitly changing the shape of face, adding wrinkle, etc., this work edits the high-level features which implicitly guide the recovery of images towards expected appearance.While GAN+AE being prevalent in image manipulation, its drawbacks lack exploration. For example, GAN+AE requires a weight to balance the effects of GAN and AE. An inappropriate weight would generate unstable results. This work provides an insight to such instability, which is due to the interaction between GAN and AE. Therefore, this work proposes the decoupled learning (GAN//AE) to avoid the interaction between them and achieve a robust and effective framework for image synthesis. Most existing works used GAN+AE structure could be easily adapted to the proposed GAN//AE structure to boost their robustness. Experimental results demonstrate the correctness and effectiveness of the provided derivation and proposed methods, respectively.In addition, this work extends the conditional image synthesis to the traditional area of image super-resolution, which recovers the high-resolution image according the low-resolution counterpart. Diverting from such traditional routine, this work explores a new research direction | reference-conditioned super-resolution, in which a reference image containing desired high-resolution texture details is used besides the low-resolution image. We focus on transferring the high-resolution texture from reference images to the super-resolution process without the constraint of content similarity between reference and target images, which is a key difference from previous example-based methods

    Efficient implementation of video processing algorithms on FPGA

    Get PDF
    The work contained in this portfolio thesis was carried out as part of an Engineering Doctorate (Eng.D) programme from the Institute for System Level Integration. The work was sponsored by Thales Optronics, and focuses on issues surrounding the implementation of video processing algorithms on field programmable gate arrays (FPGA). A description is given of FPGA technology and the currently dominant methods of designing and verifying firmware. The problems of translating a description of behaviour into one of structure are discussed, and some of the latest methodologies for tackling this problem are introduced. A number of algorithms are then looked at, including methods of contrast enhancement, deconvolution, and image fusion. Algorithms are characterised according to the nature of their execution flow, and this is used as justification for some of the design choices that are made. An efficient method of performing large two-dimensional convolutions is also described. The portfolio also contains a discussion of an FPGA implementation of a PID control algorithm, an overview of FPGA dynamic reconfigurability, and the development of a demonstration platform for rapid deployment of video processing algorithms in FPGA hardware

    Real-time synthetic primate vision

    Get PDF
    • …
    corecore