6 research outputs found

    A Comparison and Evaluation of Three Different Pose Estimation Algorithms In Detecting Low Texture Manufactured Objects

    Get PDF
    This thesis examines pose estimation: the problem of determining an object's pose, i.e., its position and orientation, in some coordinate system. In particular, it examines pose estimation techniques using either monocular or binocular vision systems. Generally, when trying to find the pose of an object the objective is to generate a set of matching features, which may be points or lines, between a model of the object and the current image of the object. These matches can then be used to determine the pose of the imaged object. The algorithms presented in this thesis all generate possible matches and then use these matches to generate poses. The two monocular pose estimation techniques examined are two versions of SoftPOSIT: the traditional approach using point features, and a more recent approach using line features. The algorithms function in much the same way, the only difference being the features they use. Both are started with a random initial guess of the object's pose. Using this pose a set of possible feature matches is generated, and these matches are then used to refine the pose so that the distances between matched features are reduced. Once the pose is refined, a new set of matches is generated, and the process is repeated until convergence, i.e., minimal or no change in the pose. Because the matched features depend on the initial pose, the algorithm's output depends on the initially guessed pose; by starting the algorithm from a variety of different poses, the goal is to determine the correct correspondences and thereby the correct pose. The binocular pose estimation technique presented attempts to match 3-D point data from a model of an object to 3-D point data generated from the current view of the object; in both cases the point data is generated using a stereo camera. This algorithm matches 3-D point triplets in the model to 3-D point triplets from the current view, and then uses these matched triplets to obtain the pose parameters that describe the object's location and orientation in space. The results of attempting to determine the pose of three different low-texture manufactured objects across a sample set of 95 images are presented for each algorithm. The results of the two monocular methods are directly compared and examined; the results of the binocular method are examined as well, and then all three algorithms are compared. Of the three methods, the best performing, by a significant margin, was found to be the binocular method. The objects searched for all had low feature counts, low surface texture variation, and multiple degrees of symmetry, and the results indicate that it is generally hard to robustly determine the pose of such objects. Finally, suggestions are made for improvements to the algorithms which may lead to better pose results.
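
    The loop described above (guess a pose, generate feature matches under it, refine the pose so matched distances shrink, repeat until the pose stops changing) can be made concrete. Below is a minimal, runnable sketch of that match-then-refine structure, illustrated with 2D rigid point registration using nearest-neighbour matching and a Procrustes update; it is a simplified stand-in, not SoftPOSIT itself, which uses soft assignment and a perspective camera model.

```python
import numpy as np

def estimate_rigid(src, dst):
    # Least-squares rotation + translation aligning src onto dst (Procrustes).
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:           # guard against a reflection solution
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return R, dst_c - R @ src_c

def match_refine_loop(model, image, max_iters=50, tol=1e-9):
    # Alternate between generating correspondences under the current pose and
    # refining the pose so matched distances shrink, until convergence.
    R, t = np.eye(2), np.zeros(2)
    prev_err = np.inf
    for _ in range(max_iters):
        moved = model @ R.T + t
        # Match step: nearest image point for each transformed model point.
        dists = np.linalg.norm(moved[:, None] - image[None, :], axis=2)
        idx = dists.argmin(axis=1)
        # Refine step: re-estimate the pose from the current matches.
        R, t = estimate_rigid(model, image[idx])
        err = np.linalg.norm(model @ R.T + t - image[idx])
        if abs(prev_err - err) < tol:  # convergence: minimal change
            break
        prev_err = err
    return R, t
```

    As in the thesis's description, the output depends on the initial guess: a poor start can converge to wrong correspondences, which is why the algorithms are restarted from many different initial poses.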

    Learning Pose Invariant and Covariant Classifiers from Image Sequences

    Get PDF
    Object tracking and detection over a wide range of viewpoints is a long-standing problem in Computer Vision. Despite significant advances in wide-baseline sparse interest point matching and the development of robust dense feature models, it remains a largely open problem. Moreover, the abundance of low-cost mobile platforms and novel application areas, such as real-time Augmented Reality, constantly push the performance limits of existing methods, which must be modified and adapted to meet more stringent speed and capacity requirements. In this thesis, we aim to overcome the difficulties due to the multi-view nature of the object detection task. We significantly improve upon existing statistical keypoint matching algorithms to perform fast and robust recognition of image patches independently of object pose, and demonstrate this on various 2D and 3D datasets. Statistical keypoint matching approaches require massive amounts of training data covering a wide range of viewpoints; we have developed a weakly supervised algorithm that greatly simplifies their training for 3D objects, and we integrate this algorithm in a 3D tracking-by-detection system to perform real-time Augmented Reality. Finally, we extend the use of a large training set with smooth viewpoint variation to category-level object detection. We introduce a new dataset with continuous pose annotations which we use to train pose estimators for objects of a single category. By using these estimators' output to select pose-specific classifiers, our framework can simultaneously localize objects in an image and recover their pose; these decoupled pose estimation and classification steps yield improved detection rates. Overall, we rely on image and video sequences to train classifiers that either operate independently of the object pose or recover the pose parameters explicitly. We show that in both cases our approaches mitigate the effects of viewpoint changes and improve recognition performance.
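
    The decoupled pipeline described in this abstract (a pose estimator whose output selects a pose-specific classifier) can be sketched as follows. Everything here is an illustrative stand-in, assuming a scalar viewpoint angle and four classifiers over 90-degree bins, not the thesis's actual components.

```python
import numpy as np

def detect_with_pose(patch, pose_estimator, pose_classifiers, bin_edges):
    # Decoupled steps: estimate the pose first, then hand the patch to the
    # classifier that was trained for that pose bin.
    pose = pose_estimator(patch)                 # e.g. a viewpoint angle in degrees
    bin_idx = int(np.digitize(pose, bin_edges))  # map continuous pose to a bin
    score = pose_classifiers[bin_idx](patch)     # pose-specific classifier
    return score, pose

# Toy usage with stand-in components.
pose_estimator = lambda p: (float(p.mean()) * 360.0) % 360.0
classifiers = [lambda p, k=k: k + float(p.std()) for k in range(4)]
edges = np.array([90.0, 180.0, 270.0])
score, pose = detect_with_pose(np.random.rand(16, 16), pose_estimator, classifiers, edges)
```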

    Image-Model Alignment Based on Stereo Vision of Arbitrary Planar Regions

    Get PDF
    Advisor: Clesio Luis Tozzi. Doctoral thesis (doutorado), Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação.

    Model-Based Localization and Tracking for Vision-System-Based Control [online]

    Get PDF

    Object tracking in augmented reality remote access laboratories without fiducial markers

    Get PDF
    Remote Access Laboratories provide students with access to learning resources without the need to be in situ with the assets. The technology gives users access to physical experiments anywhere and anytime, while also minimising or distributing the cost of operating expensive laboratory equipment. Augmented Reality is a technology which provides interactive sensory feedback to users: the user experiences reality through a computer-based interface, with additional computer-generated information presented in a form suited to the targeted senses. Recent advances in high-definition video capture devices, video screens and mobile computers have driven a resurgence in mainstream Augmented Reality technologies. The lower cost and greater processing power of microprocessors and memory place these resources in the hands of developers and users alike, allowing educational institutions to invest in technologies that enhance the delivery of course content. This increase in pedagogical resources has already allowed education at a distance to reach students from a wide range of demographics, improving access and outcomes in multiple disciplines. Incorporating Augmented Reality into Remote Access Laboratory resources improves the user's immersion in the remote experiment, and thus student engagement with and understanding of the delivered material. Visual implementations of Augmented Reality rely on seamlessly integrating the current environment (viewed through a mobile device, desktop PC, or heads-up display) with computer-generated visual artefacts. Virtual objects must appear in context with the current environment and respond within a realistic period, or the user suffers a disjointed and confusing blend of real and virtual information. Understanding and interacting with the visual scene is handled by Computer Vision algorithms, which are crucial in ensuring that the AR system cooperates with the data it discovers. While Augmented Reality has begun to expand in the educational environment, there is currently still very little overlap between Augmented Reality technologies and Remote Access Laboratories. This research investigated Computer Vision models that support Augmented Reality technologies such that live video streams from Remote Laboratories are enhanced by synthetic overlays pertinent to the experiments. Orienting synthetic visual overlays requires knowledge of key reference points, a task often performed with fiducial markers; removing the need for fiducial markers and a priori knowledge simplifies and accelerates the uptake and expansion of the technology. This work develops hybrid Computer Vision models which require no prior knowledge of the laboratory environment, including no fiducial markers or tags to track important objects and references. The developed models derive all relevant data from the live video stream and require no previous knowledge of the configuration of the physical scene. The new image analysis paradigms (Two-Dimensional Colour Histograms and Neighbourhood Gradient Signature) improve the current state of markerless tracking through unique attributes discovered within sequential video frames. Novel methods are also established with which to assess and measure the performance of Computer Vision models: objective ground-truth images minimise the level of subjective interference in measuring the efficacy of CV edge and corner detectors, and an effective method for comparing the detected attributes of an image or object provides a means to measure the likelihood of an image match between video frames. In combination with existing material and new contributions, this research demonstrates effective object detection and tracking for Augmented Reality systems within a Remote Access Laboratory environment, with no requirement for fiducial markers or prior knowledge of the environment. The models proposed in this work can be generalised to any cyber-physical environment that supports peripherals such as cameras and other sensors.
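
    The abstract names the Two-Dimensional Colour Histogram but does not specify it; purely as an illustration of the underlying idea (describing a region by the joint distribution of two colour channels, and matching regions across frames by histogram similarity), a minimal sketch might look like this.

```python
import numpy as np

def colour_histogram_2d(region, bins=32):
    # Joint histogram over two colour channels of an image region, normalised
    # so it can be compared across frames. `region` is an (H, W, 3) float
    # array with channel values in [0, 1]; the channel choice is illustrative.
    c0 = region[..., 0].ravel()
    c1 = region[..., 1].ravel()
    hist, _, _ = np.histogram2d(c0, c1, bins=bins, range=[[0.0, 1.0], [0.0, 1.0]])
    return hist / max(hist.sum(), 1e-12)

def histogram_intersection(h1, h2):
    # Similarity in [0, 1]; higher means the two regions' colour content is
    # more alike, a simple cue for matching an object between video frames.
    return float(np.minimum(h1, h2).sum())
```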

    Tracking Objects with a Recognition Algorithm

    No full text
    In this paper, we propose an efficient method for tracking 3D modelled objects in cluttered scenes. Rather than tracking objects in the image, our approach relies on the object recognition aspect of tracking. Possible matches between image and model features define volumes within a transformation space; the model is best aligned with the image in volumes satisfying the greatest number of correspondences. Object motion defines a trajectory in the transformation space, and we propose an efficient algorithm to compute these transformations.
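
    The correspondence-voting idea can be shown in miniature. The sketch below assumes 2D point features and restricts the transformation space to translations on a grid of accumulator cells; the paper itself handles richer transformations and volumes, so this is an analogy, not the authors' algorithm.

```python
import numpy as np
from itertools import product

def vote_translations(model_pts, image_pts, cell=5.0):
    # Every tentative model/image pairing votes for the translation that would
    # align it; the accumulator cell gathering the most votes is the region of
    # transformation space supported by the greatest number of correspondences.
    votes = {}
    for m, i in product(model_pts, image_pts):
        key = tuple(np.round((i - m) / cell).astype(int))
        votes[key] = votes.get(key, 0) + 1
    best, count = max(votes.items(), key=lambda kv: kv[1])
    return np.array(best, dtype=float) * cell, count
```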