14 research outputs found

    Real-Time, Multiple Pan/Tilt/Zoom Computer Vision Tracking and 3D Positioning System for Unmanned Aerial System Metrology

    Get PDF
    The study of structural characteristics of Unmanned Aerial Systems (UASs) continues to be an important field of research for developing state of the art nano/micro systems. Development of a metrology system using computer vision (CV) tracking and 3D point extraction would provide an avenue for making these theoretical developments. This work provides a portable, scalable system capable of real-time tracking, zooming, and 3D position estimation of a UAS using multiple cameras. Current state-of-the-art photogrammetry systems use retro-reflective markers or single point lasers to obtain object poses and/or positions over time. Using a CV pan/tilt/zoom (PTZ) system has the potential to circumvent their limitations. The system developed in this paper exploits parallel-processing and the GPU for CV-tracking, using optical flow and known camera motion, in order to capture a moving object using two PTU cameras. The parallel-processing technique developed in this work is versatile, allowing the ability to test other CV methods with a PTZ system using known camera motion. Utilizing known camera poses, the object\u27s 3D position is estimated and focal lengths are estimated for filling the image to a desired amount. This system is tested against truth data obtained using an industrial system

    DESIGNING EYE TRACKING ALGORITHM FOR PARTNER-ASSISTED EYE SCANNING KEYBOARD FOR PHYSICALLY CHALLENGED PEOPLE

    Get PDF
    The proposed research work focuses on building a keyboard through designing an algorithm for eye movement detection using the partner-assisted scanning technique. The study covers all stages of gesture recognition, from data acquisition to eye detection and tracking, and finally classification. With the presence of many techniques to implement the gesture recognition stages, the main objective of this research work is implementing the simple and less expensive technique that produces the best possible results with a high level of accuracy. The results, finally, are compared with similar works done recently to prove the efficiency in implementation of the proposed algorithm. The system starts with the calibration phase, where a face detection algorithm is designed to detect the user‟s face by a trained support vector machine. Then, features are extracted, after which tracking of the eyes is possible by skin-colour segmentation. A couple of other operations were performed. The overall system is a keyboard that works by eye movement, through the partner-assisted scanning technique. A good level of accuracy was achieved, and a couple of alternative methods were implemented and compared. This keyboard adds to the research field, with a new and novel combination of techniques for eye detection and tracking. Also, the developed keyboard helps bridge the gap between physical paralysis and leading a normal life. This system can be used as comparison with other proposed algorithms for eye detection, and might be used as a proof for the efficiency of combining a number of different techniques into one algorithm. Also, it strongly supports the effectiveness of machine learning and appearance-based algorithms

    APPLICATION OF AUGMENTED REALITY IN MANUAL ASSEMBLY DESIGN AND PLANNING

    Get PDF
    Ph.DDOCTOR OF PHILOSOPH

    Patterns and Pattern Languages for Mobile Augmented Reality

    Get PDF
    Mixed Reality is a relatively new field in computer science which uses technology as a medium to provide modified or enhanced views of reality or to virtually generate a new reality. Augmented Reality is a branch of Mixed Reality which blends the real-world as viewed through a computer interface with virtual objects generated by a computer. The 21st century commodification of mobile devices with multi-core Central Processing Units, Graphics Processing Units, high definition displays and multiple sensors controlled by capable Operating Systems such as Android and iOS means that Mobile Augmented Reality applications have become increasingly feasible. Mobile Augmented Reality is a multi-disciplinary field requiring a synthesis of many technologies such as computer graphics, computer vision, machine learning and mobile device programming while also requiring theoretical knowledge of diverse fields such as Linear Algebra, Projective and Differential Geometry, Probability and Optimisation. This multi-disciplinary nature has led to a fragmentation of knowledge into various specialisations, making it difficult to integrate different solution components into a coherent architecture. Software design patterns provide a solution space of tried and tested best practices for a specified problem within a given context. The solution space is non-prescriptive and is described in terms of relationships between roles that can be assigned to software components. Architectural patterns are used to specify high level designs of complete systems, as opposed to domain or tactical level patterns that address specific lower level problem areas. Pattern Languages comprise multiple software patterns combining in multiple possible sequences to form a language with the individual patterns forming the language vocabulary while the valid sequences through the patterns define the grammar. Pattern Languages provide flexible generalised solutions within a particular domain that can be customised to solve problems of differing characteristics and levels of iii complexity within the domain. The specification of one or more Pattern Languages tailored to the Mobile Augmented Reality domain can therefore provide a generalised guide for the design and architecture of Mobile Augmented Reality applications from an architectural level down to the ”nuts-and-bolts” implementation level. While there is a large body of research into the technical specialisations pertaining to Mobile Augmented Reality, there is a dearth of up-to-date literature covering Mobile Augmented Reality design. This thesis fills this vacuum by: 1. Providing architectural patterns that provide the spine on which the design of Mobile Augmented Reality artefacts can be based; 2. Documenting existing patterns within the context of Mobile Augmented Reality; 3. Identifying new patterns specific to Mobile Augmented Reality; and 4. Combining the patterns into Pattern Languages for Detection & Tracking, Rendering & Interaction and Data Access for Mobile Augmented Reality. The resulting Pattern Languages support design at multiple levels of complexity from an object-oriented framework down to specific one-off Augmented Reality applications. The practical contribution of this thesis is the specification of architectural patterns and Pattern Language that provide a unified design approach for both the overall architecture and the detailed design of Mobile Augmented Reality artefacts. The theoretical contribution is a design theory for Mobile Augmented Reality gleaned from the extraction of patterns and creation of a pattern language or languages

    Development of a text reading system on video images

    Get PDF
    Since the early days of computer science researchers sought to devise a machine which could automatically read text to help people with visual impairments. The problem of extracting and recognising text on document images has been largely resolved, but reading text from images of natural scenes remains a challenge. Scene text can present uneven lighting, complex backgrounds or perspective and lens distortion; it usually appears as short sentences or isolated words and shows a very diverse set of typefaces. However, video sequences of natural scenes provide a temporal redundancy that can be exploited to compensate for some of these deficiencies. Here we present a complete end-to-end, real-time scene text reading system on video images based on perspective aware text tracking. The main contribution of this work is a system that automatically detects, recognises and tracks text in videos of natural scenes in real-time. The focus of our method is on large text found in outdoor environments, such as shop signs, street names and billboards. We introduce novel efficient techniques for text detection, text aggregation and text perspective estimation. Furthermore, we propose using a set of Unscented Kalman Filters (UKF) to maintain each text region¿s identity and to continuously track the homography transformation of the text into a fronto-parallel view, thereby being resilient to erratic camera motion and wide baseline changes in orientation. The orientation of each text line is estimated using a method that relies on the geometry of the characters themselves to estimate a rectifying homography. This is done irrespective of the view of the text over a large range of orientations. We also demonstrate a wearable head-mounted device for text reading that encases a camera for image acquisition and a pair of headphones for synthesized speech output. Our system is designed for continuous and unsupervised operation over long periods of time. It is completely automatic and features quick failure recovery and interactive text reading. It is also highly parallelised in order to maximize the usage of available processing power and to achieve real-time operation. We show comparative results that improve the current state-of-the-art when correcting perspective deformation of scene text. The end-to-end system performance is demonstrated on sequences recorded in outdoor scenarios. Finally, we also release a dataset of text tracking videos along with the annotated ground-truth of text regions

    Vision systems for autonomous aircraft guidance

    Get PDF

    RECOGNITION OF FACES FROM SINGLE AND MULTI-VIEW VIDEOS

    Get PDF
    Face recognition has been an active research field for decades. In recent years, with videos playing an increasingly important role in our everyday life, video-based face recognition has begun to attract considerable research interest. This leads to a wide range of potential application areas, including TV/movies search and parsing, video surveillance, access control etc. Preliminary research results in this field have suggested that by exploiting the abundant spatial-temporal information contained in videos, we can greatly improve the accuracy and robustness of a visual recognition system. On the other hand, as this research area is still in its infancy, developing an end-to-end face processing pipeline that can robustly detect, track and recognize faces remains a challenging task. The goal of this dissertation is to study some of the related problems under different settings. We address the video-based face association problem, in which one attempts to extract face tracks of multiple subjects while maintaining label consistency. Traditional tracking algorithms have difficulty in handling this task, especially when challenging nuisance factors like motion blur, low resolution or significant camera motions are present. We demonstrate that contextual features, in addition to face appearance itself, play an important role in this case. We propose principled methods to combine multiple features using Conditional Random Fields and Max-Margin Markov networks to infer labels for the detected faces. Different from many existing approaches, our algorithms work in online mode and hence have a wider range of applications. We address issues such as parameter learning, inference and handling false positves/negatives that arise in the proposed approach. Finally, we evaluate our approach on several public databases. We next propose a novel video-based face recognition framework. We address the problem from two different aspects: To handle pose variations, we learn a Structural-SVM based detector which can simultaneously localize the face fiducial points and estimate the face pose. By adopting a different optimization criterion from existing algorithms, we are able to improve localization accuracy. To model other face variations, we use intra-personal/extra-personal dictionaries. The intra-personal/extra-personal modeling of human faces has been shown to work successfully in the Bayesian face recognition framework. It has additional advantages in scalability and generalization, which are of critical importance to real-world applications. Combining intra-personal/extra-personal models with dictionary learning enables us to achieve state-of-arts performance on unconstrained video data, even when the training data come from a different database. Finally, we present an approach for video-based face recognition using camera networks. The focus is on handling pose variations by applying the strength of the multi-view camera network. However, rather than taking the typical approach of modeling these variations, which eventually requires explicit knowledge about pose parameters, we rely on a pose-robust feature that eliminates the needs for pose estimation. The pose-robust feature is developed using the Spherical Harmonic (SH) representation theory. It is extracted using the surface texture map of a spherical model which approximates the subject's head. Feature vectors extracted from a video are modeled as an ensemble of instances of a probability distribution in the Reduced Kernel Hilbert Space (RKHS). The ensemble similarity measure in RKHS improves both robustness and accuracy of the recognition system. The proposed approach outperforms traditional algorithms on a multi-view video database collected using a camera network
    corecore