
    Making Deep Heatmaps Robust to Partial Occlusions for 3D Object Pose Estimation

    We introduce a novel method for robust and accurate 3D object pose estimation from a single color image under large occlusions. Following recent approaches, we first predict the 2D projections of 3D points related to the target object and then compute the 3D pose from these correspondences using a geometric method. Unfortunately, as the results of our experiments show, predicting these 2D projections using a regular CNN or a Convolutional Pose Machine is highly sensitive to partial occlusions, even when these methods are trained with partially occluded examples. Our solution is to predict heatmaps from multiple small patches independently and to accumulate the results to obtain accurate and robust predictions. Training subsequently becomes challenging because patches with similar appearances but different positions on the object correspond to different heatmaps. However, we provide a simple yet effective solution to deal with such ambiguities. We show that our approach outperforms existing methods on two challenging datasets: the Occluded LineMOD dataset and the YCB-Video dataset, both exhibiting cluttered scenes with highly occluded objects. Project website: https://www.tugraz.at/institute/icg/research/team-lepetit/research-projects/robust-object-pose-estimation
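The patch-wise voting idea in the abstract can be sketched in a few lines. This is an illustrative assumption, not the authors' network: `predict_heatmap` stands in for the trained per-patch predictor, and the accumulator simply sums each patch's heatmap into a full-image map before taking the argmax as the 2D projection estimate.

```python
import numpy as np

def predict_heatmap(patch, out_size):
    # Placeholder for the per-patch CNN (hypothetical): a real model would
    # return a per-pixel confidence map for the target 2D projection.
    return np.full(out_size, 1.0 / (out_size[0] * out_size[1]))

def accumulate_heatmaps(image, patch_size=32, stride=16):
    """Slide small patches over the image, predict a heatmap from each
    patch independently, and accumulate the votes into one map."""
    h, w = image.shape[:2]
    acc = np.zeros((h, w))
    for y in range(0, h - patch_size + 1, stride):
        for x in range(0, w - patch_size + 1, stride):
            patch = image[y:y + patch_size, x:x + patch_size]
            acc[y:y + patch_size, x:x + patch_size] += predict_heatmap(
                patch, (patch_size, patch_size))
    # The 2D projection estimate is the peak of the accumulated votes.
    return np.unravel_index(np.argmax(acc), acc.shape)
```

Because an occluder corrupts only the patches that overlap it, the votes from unoccluded patches still dominate the accumulated map, which is the source of the robustness claimed above.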

    An intelligent robotic vision system with environment perception

    Ever since the dawn of computer vision [1, 2], 3D environment reconstruction and object 6D pose estimation have been core problems. This thesis develops a novel 3D intelligent robotic vision system that integrates environment reconstruction and object detection techniques to solve practical problems. Chapter 2 reviews the current state of the art in 3D vision, covering environment reconstruction and 6D pose estimation. In Chapter 3, a novel environment reconstruction system using coloured point clouds is proposed. The evaluation experiment indicates that the proposed algorithm (Algorithm 2) is effective for small-scale, large-scale, and textureless scenes. Chapter 4 presents Image-6D (Section 4.2), a learning-based object pose estimation algorithm operating on a single RGB image. Contour alignment is introduced as an efficient algorithm for pose refinement in an RGB image. This new method is evaluated on two widely used benchmark image databases, LINEMOD and Occlusion-LINEMOD; experiments show that the proposed method surpasses other state-of-the-art RGB-based prediction approaches. Chapter 5 describes Point-6D (defined in Section 5.2), a novel 6D pose estimation method that takes coloured point clouds as input. The performance of this new method is demonstrated on the LineMOD [3] and YCB-Video [4] datasets. Chapter 6 summarizes the contributions and discusses potential future research directions. In addition, Appendix B presents an intelligent 3D robotic vision system deployed in a simulated/laboratory nuclear waste disposal scenario; to verify the results, a simulated nuclear waste handling experiment was successfully completed with the proposed robotic system

    Discriminative Appearance Models for Face Alignment

    The proposed face alignment algorithm uses local gradient features as its appearance representation. These features are obtained by pixel value comparison, which provides robustness against changes in illumination, as well as against partial occlusion and local deformation, owing to the features' locality. The adopted features are modeled by three discriminative methods, which correspond to different alignment cost functions. The discriminative appearance modeling alleviates the generalization problem to some extent
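Features built by pixel value comparison can be sketched as below. This is an illustrative example of the general technique, not the thesis's actual feature design: each feature bit records whether one pixel in a local patch is brighter than another, so any monotonic illumination change (for example, scaling all intensities) leaves the descriptor unchanged.

```python
import numpy as np

def pixel_comparison_features(patch, pairs):
    """Binary descriptor from pixel value comparisons.

    pairs: list of ((y1, x1), (y2, x2)) coordinate pairs inside the patch;
    each comparison contributes one bit to the descriptor.
    """
    return np.array([1 if patch[a] > patch[b] else 0 for a, b in pairs])
```

Doubling all intensities, or adding a constant offset, changes no comparison outcome, which is the illumination robustness the abstract refers to; because every bit depends only on two nearby pixels, an occluder corrupts only the bits whose pixels it covers.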

    Structure-aware image denoising, super-resolution, and enhancement methods

    Denoising, super-resolution and structure enhancement are classical image processing applications. Their purpose is to aid our visual analysis of raw digital images. Despite tremendous progress in these fields, certain difficult problems remain open to research. For example, denoising and super-resolution techniques that possess all of the following properties are very scarce: they must preserve critical structures like corners, be robust to the type of noise distribution, avoid undesirable artefacts, and also be fast. The area of structure enhancement also has an unresolved issue: very little effort has been put into designing models that can tackle anisotropic deformations in the image acquisition process. In this thesis, we design novel methods in the form of partial differential equations, patch-based approaches and variational models to overcome the aforementioned obstacles. In most cases, our methods outperform the existing approaches in both quality and speed, despite being applicable to a broader range of practical situations
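The thesis's own PDE models are not given in the abstract; as a hedged illustration of the family it builds on, the classical Perona-Malik diffusion step below smooths noise while an edge-stopping function slows diffusion across strong gradients, which is how such PDEs preserve structures like edges and corners.

```python
import numpy as np

def perona_malik_step(u, dt=0.15, kappa=10.0):
    """One explicit step of Perona-Malik edge-stopping diffusion
    (illustrative; periodic boundaries via np.roll)."""
    # Finite differences toward the four neighbours.
    dN = np.roll(u, -1, axis=0) - u
    dS = np.roll(u, 1, axis=0) - u
    dE = np.roll(u, -1, axis=1) - u
    dW = np.roll(u, 1, axis=1) - u
    # Edge-stopping function: diffusivity decays where gradients are large.
    g = lambda d: np.exp(-(d / kappa) ** 2)
    return u + dt * (g(dN) * dN + g(dS) * dS + g(dE) * dE + g(dW) * dW)
```

Iterating this step reduces noise variance while the exponential weight suppresses smoothing across contrasts much larger than `kappa`, so sharp structures survive far longer than the noise does.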

    Characterizing Objects in Images using Human Context

    Humans have an unmatched capability of interpreting detailed information about existent objects by just looking at an image. Particularly, they can effortlessly perform the following tasks: 1) localizing various objects in the image and 2) assigning functionalities to the parts of localized objects. This dissertation addresses the problem of helping vision systems accomplish these two goals. The first part of the dissertation concerns object detection in a Hough-based framework. To this end, the independence assumption between features is addressed by grouping them in a local neighborhood. We study the complementary nature of individual and grouped features and combine them to achieve improved performance. Further, we consider the challenging case of detecting small and medium sized household objects under human-object interactions. We first evaluate appearance-based star and tree models. While the tree model is slightly better, appearance-based methods continue to suffer due to deficiencies caused by human interactions. To this end, we successfully incorporate automatically extracted human pose as a form of context for object detection. The second part of the dissertation addresses the tedious process of manually annotating objects to train fully supervised detectors. We observe that videos of human-object interactions with activity labels can serve as weakly annotated examples of household objects. Since such objects cannot be localized only through appearance or motion, we propose a framework that includes human-centric functionality to retrieve the common object. Designed to maximize data utility by detecting multiple instances of an object per video, the framework achieves performance comparable to its fully supervised counterpart. The final part of the dissertation concerns localizing functional regions or affordances within objects by casting the problem as that of semantic image segmentation. To this end, we introduce a dataset involving human-object interactions with strong (i.e., pixel-level) and weak (i.e., click-point and image-level) affordance annotations. We propose a framework that utilizes both forms of weak labels and demonstrate that efforts for weak annotation can be further optimized using human context

    Machine Learning to Determine Type of Vehicle

    Generally, the present disclosure is directed to determining the vehicle type of a vehicle (e.g. for use in vehicle navigation). In particular, in some implementations, the systems and methods of the present disclosure can include or otherwise leverage one or more machine-learned models to predict a vehicle type of a vehicle based on vehicle data relating to the operation and/or capacity of a vehicle

    Automating Bridge Inspection Procedures: Real-Time UAS-Based Detection and Tracking of Concrete Bridge Element

    Bridge inspections are necessary to maintain the safety, health, and welfare of the public. All bridges in the United States are federally mandated to undergo routine evaluations to confirm their structural integrity throughout their lifetime. The traditional process relies on a bridge inspection team to conduct the inspection, depending heavily on visual measurements and subjective estimates of the existing state of the structure. Conducting unmanned automated bridge inspections would allow for a more efficient, accurate, and safer alternative to traditional bridge inspection procedures. Optimizing bridge inspections in this manner would enable frequent inspections, comprehensively monitoring the health of bridges and quickly recognizing minor problems that could be easily corrected before turning into more critical issues. To create an unmanned data acquisition procedure, unmanned aerial vehicles with high-resolution cameras will be employed to collect videos of the bridge under inspection. To automate the bridge inspection procedure, machine learning methods, such as neural networks, and machine vision methods, such as the Hough transform and Canny edge detection, will assist in identifying the entire beam. These methods, along with future work in damage detection and assessment, will be the main steps toward creating an unmanned automated bridge inspection

    Automatic Scaffolding Productivity Measurement through Deep Learning

    This study developed a method to automatically measure scaffolding productivity by extracting and analysing semantic information from onsite vision data