490 research outputs found

    The Discriminative Generalized Hough Transform for Localization of Highly Variable Objects and its Application for Surveillance Recordings

    Get PDF
    This work is about the localization of arbitrary objects in 2D images in general and the localization of persons in video surveillance recordings in particular. More precisely, it is about localizing specific landmarks. Thereby the possibilities and limitations of localization approaches based on the Generalized Hough Transform (GHT), especially of the Discriminative Generalized Hough Transform (DGHT) will be evaluated. GHT-based approaches determine the number of matching model and feature points and the most likely target point position is given by the highest number of matching model and feature points. Additionally, the DGHT comprises a statistical learning approach to generate optimal DGHT-models achieving good results on medical images. This work will show that the DGHT is not restricted to medical tasks but has issues with large target object variabilities, which are frequent in video surveillance tasks. As all GHT-based approaches also the DGHT only considers the number of matching model-feature-point-combinations, which means that all model points are treated independently. This work will show that model points are not independent of each other and considering them independently will result in high error rates. This drawback is analyzed and a universal solution, which is not only applicable for the DGHT but all GHT-based approaches, is presented. This solution is based on an additional classifier that takes the whole set of matching model-feature-point-combinations into account to estimate a confidence score. On all tested databases, this approach could reduce the error rates drastically by up to 94.9%. Furthermore, this work presents a general approach for combining multiple GHT-models into a deeper model. This can be used to combine the localization results of different object landmarks such as mouth, nose, and eyes. Similar to Convolutional Neural Networks (CNNs) this will split the target object variability into multiple and smaller variabilities. A comparison of GHT-based approaches with CNNs and a description of the advantages, disadvantages, and potential application of both approaches will conclude this work.Diese Arbeit beschäftigt sich im Allgemeinen mit der Lokalisierung von Objekten in 2D Bilddaten und im Speziellen mit der Lokalisierung von Personen in Videoüberwachungsaufnahmen. Genauer gesagt handelt es sich hierbei um die Lokalisierung spezieller Landmarken. Dabei werden die Möglichkeiten und Limiterungen von Lokalisierungsverfahren basierend auf der Generalisierten Hough Transformation (GHT) untersucht, insbesondere die der Diskriminativen Generalisierten Hough Transformation (DGHT). Bei GHT-basierten Ansätze wird die Anzahl an übereinstimmenden Modelpunkten und Merkmalspunkten ermittelt und die wahrscheinlicheste Objekt-Position ergibt sich aus der höchsten Anzahl an übereinstimmenden Model- und Merkmalspunkte. Die DGHT umfasst darüber hinaus noch ein statistisches Lernverfahren, um optimale DGHT-Modele zu erzeugen und erzielte damit auf medizinischen Bilder und Anwendungen sehr gute Erfolge. Wie sich in dieser Arbeit zeigen wird, ist die DGHT nicht auf medizinische Anwendungen beschränkt, hat allerdings Schwierigkeiten große Variabilität der Ziel-Objekte abzudecken, wie sie in Überwachungsszenarien zu erwarten sind. Genau wie alle GHT-basierten Ansätze leidet auch die DGHT unter dem Problem, dass lediglich die Anzahl an übereinstimmenden Model- und Merkmalspunkten ermittelt wird, was bedeutet, dass alle Modelpunkte unabhängig voneinander betrachtet werden. Dass Modelpunkte nicht unabhängig voneinander sind, wird im Laufe dieser Arbeit gezeigt werden, und die unabhängige Betrachtung führt gerade bei sehr variablen Zielobjekten zu einer hohen Fehlerrate. Dieses Problem wird in dieser Arbeit grundlegend untersucht und ein allgemeiner Lösungsansatz vorgestellt, welcher nicht nur für die DGHT sondern grundsätzlich für alle GHT-basierten Verfahren Anwendung finden kann. Die Lösung basiert auf der Integration eines zusätzlichen Klassifikators, welcher die gesamte Menge an übereinstimmenden Model- und Merkmalspunkten betrachtet und anhand dessen ein zusätzliches Konfidenzmaß vergibt. Dadurch konnte auf allen getesteten Datenbanken eine deutliche Reduktion der Fehlerrate erzielt werden von bis zu 94.9%. Darüber hinaus umfasst die Arbeit einen generellen Ansatz zur Kombination mehrere GHT-Model in einem tieferen Model. Dies kann dazu verwendet werden, um die Lokalisierungsergebnisse verschiedener Objekt-Landmarken zu kombinieren, z. B. die von Mund, Nase und Augen. Ähnlich wie auch bei Convolutional Neural Networks (CNNs) ist es damit möglich über mehrere Ebenen unterschiedliche Bereiche zu lokalisieren und somit die Variabilität des Zielobjektes in mehrere, leichter zu handhabenden Variabilitäten aufzuspalten. Abgeschlossen wird die Arbeit durch einen Vergleich von GHT-basierten Ansätzen mit CNNs und einer Beschreibung der Vor- und Nachteile und mögliche Einsatzfelder beider Verfahren

    Timely-automatic procedure for estimating the endocardial limits of the left ventricle assessed echocardiographically in clinical practice

    Get PDF
    In this paper, we propose an analytical rapid method to estimate the endocardial borders of the left ventricular walls on echocardiographic images for prospective clinical integration. The procedure was created as a diagnostic support tool for the clinician and it is based on the use of the anisotropic generalized Hough transform. Its application is guided by a Gabor-like filtering for the approximate delimitation of the region of interest without the need for computing further anatomical characteristics. The algorithm is applying directly a deformable template on the predetermined filtered region and therefore it is responsive and straightforward implementable. For accuracy considerations, we have employed a support vector machine classifier to determine the confidence level of the automated marking. The clinical tests were performed at the Cardiology Clinic of the County Emergency Hospital Timisoara and they improved the physicians perception in more than 50% of the cases. The report is concluded with medical discussions.European Union (UE)Ministerio de Economía y Competitividad (MINECO). Españ

    Automatic Multi-Scale and Multi-Object Pedestrian and Car Detection in Digital Images Based on the Discriminative Generalized Hough Transform and Deep Convolutional Neural Networks

    Get PDF
    Many approaches have been suggested for automatic pedestrian and car detection to cope with the large variability regarding object size, occlusion, background variability, aspect and so forth. Current state-of-the-art deep learning-based frameworks rely either on a proposal generation mechanism (e.g., "Faster R-CNN") or on the inspection of image quadrants / octants (e.g., "YOLO" or "SSD"), which are then further processed with deep convolutional neural networks (CNN). In this thesis, the Discriminative Generalized Hough Transform (DGHT), which operates on edge images, is analyzed for the application to automatic multi-scale and multi-object pedestrian and car detection in 2D digital images. The analysis motivates to use the DGHT as an efficient proposal generation mechanism, followed by a proposal (bounding box) refinement and proposal acceptance or rejection based on a deep CNN. The impact of the different components of the resulting DGHT object detection pipeline as well as the amount of DGHT training data on the detection performance are analyzed in detail. Due to the low false negative rate and the low number of candidates of the DGHT as well as the high classification accuracy of the CNN, competitive performance to the state-of-the-art in pedestrian and car detection is obtained on the IAIR database with much less generated proposals than other proposal-generating algorithms, being outperformed only by YOLOv2 fine-tuned to IAIR cars. By evaluations on further databases (without retraining or adaptation) the generalization capability of the DGHT object detection pipeline is shown

    Characterizing Objects in Images using Human Context

    Get PDF
    Humans have an unmatched capability of interpreting detailed information about existent objects by just looking at an image. Particularly, they can effortlessly perform the following tasks: 1) Localizing various objects in the image and 2) Assigning functionalities to the parts of localized objects. This dissertation addresses the problem of aiding vision systems accomplish these two goals. The first part of the dissertation concerns object detection in a Hough-based framework. To this end, the independence assumption between features is addressed by grouping them in a local neighborhood. We study the complementary nature of individual and grouped features and combine them to achieve improved performance. Further, we consider the challenging case of detecting small and medium sized household objects under human-object interactions. We first evaluate appearance based star and tree models. While the tree model is slightly better, appearance based methods continue to suffer due to deficiencies caused by human interactions. To this end, we successfully incorporate automatically extracted human pose as a form of context for object detection. The second part of the dissertation addresses the tedious process of manually annotating objects to train fully supervised detectors. We observe that videos of human-object interactions with activity labels can serve as weakly annotated examples of household objects. Since such objects cannot be localized only through appearance or motion, we propose a framework that includes human centric functionality to retrieve the common object. Designed to maximize data utility by detecting multiple instances of an object per video, the framework achieves performance comparable to its fully supervised counterpart. The final part of the dissertation concerns localizing functional regions or affordances within objects by casting the problem as that of semantic image segmentation. To this end, we introduce a dataset involving human-object interactions with strong i.e. pixel level and weak i.e. clickpoint and image level affordance annotations. We propose a framework that utilizes both forms of weak labels and demonstrate that efforts for weak annotation can be further optimized using human context

    Going Further with Point Pair Features

    Full text link
    Point Pair Features is a widely used method to detect 3D objects in point clouds, however they are prone to fail in presence of sensor noise and background clutter. We introduce novel sampling and voting schemes that significantly reduces the influence of clutter and sensor noise. Our experiments show that with our improvements, PPFs become competitive against state-of-the-art methods as it outperforms them on several objects from challenging benchmarks, at a low computational cost.Comment: Corrected post-print of manuscript accepted to the European Conference on Computer Vision (ECCV) 2016; https://link.springer.com/chapter/10.1007/978-3-319-46487-9_5

    Monocular 3d Object Recognition

    Get PDF
    Object recognition is one of the fundamental tasks of computer vision. Recent advances in the field enable reliable 2D detections from a single cluttered image. However, many challenges still remain. Object detection needs timely response for real world applications. Moreover, we are genuinely interested in estimating the 3D pose and shape of an object or human for the sake of robotic manipulation and human-robot interaction. In this thesis, a suite of solutions to these challenges is presented. First, Active Deformable Part Models (ADPM) is proposed for fast part-based object detection. ADPM dramatically accelerates the detection by dynamically scheduling the part evaluations and efficiently pruning the image locations. Second, we unleash the power of marrying discriminative 2D parts with an explicit 3D geometric representation. Several methods of such scheme are proposed for recovering rich 3D information of both rigid and non-rigid objects from monocular RGB images. (1) The accurate 3D pose of an object instance is recovered from cluttered images using only the CAD model. (2) A global optimal solution for simultaneous 2D part localization, 3D pose and shape estimation is obtained by optimizing a unified convex objective function. Both appearance and geometric compatibility are jointly maximized. (3) 3D human pose estimation from an image sequence is realized via an Expectation-Maximization algorithm. The 2D joint location uncertainties are marginalized out during inference and 3D pose smoothness is enforced across frames. By bridging the gap between 2D and 3D, our methods provide an end-to-end solution to 3D object recognition from images. We demonstrate a range of interesting applications using only a single image or a monocular video, including autonomous robotic grasping with a single image, 3D object image pop-up and a monocular human MoCap system. We also show empirical start-of-art results on a number of benchmarks on 2D detection and 3D pose and shape estimation
    corecore