
    Statistical/Geometric Techniques for Object Representation and Recognition

    Object modeling and recognition are key areas of research in computer vision and graphics, with a wide range of applications. Though research in these areas is not new, traditionally most of it has focused on problems in controlled environments. The challenges posed by real-life applications demand more general and robust solutions. The wide variety of objects with large intra-class variability makes the task very challenging. The difficulty of modeling and matching objects also varies with the input modality. In addition, the easy availability of sensors and storage has led to a tremendous increase in the amount of data to be processed, which requires efficient algorithms suitable for large databases. In this dissertation, we address some of the challenges involved in modeling and matching objects in realistic scenarios. Object matching in images requires accounting for large variability in appearance due to changes in illumination and viewpoint. Any real-world object is characterized by its underlying shape and albedo, which, unlike the image intensity, are insensitive to changes in illumination conditions. We propose a stochastic filtering framework for estimating object albedo from a single intensity image, formulating albedo estimation as an image estimation problem. We also show how this albedo estimate can be used for illumination-insensitive object matching and for more accurate shape recovery from a single image using a standard shape-from-shading formulation. We start with the simpler problem where the pose of the object is known and only the illumination varies, and then extend the proposed approach to handle unknown pose in addition to illumination variations. We also use the estimated albedo maps for another important application: recognizing faces across age progression.
    Many approaches to modeling and recognizing objects from images assume that the underlying objects have diffuse texture, but most real-world objects exhibit a combination of diffuse and specular properties. We propose an approach for separating the diffuse and specular reflectance in a given color image, so that algorithms designed for diffuse objects become applicable to a much wider range of real-world objects. Representing and matching the 2D and 3D geometry of objects is also an integral part of object matching, with applications in gesture recognition, activity classification, trademark and logo recognition, etc. The challenge in matching 2D/3D shapes lies in accounting for rigid and non-rigid deformations, large intra-class variability, noise, and outliers. In addition, since shapes are usually represented as collections of landmark points, a shape matching algorithm also has to deal with missing or unknown correspondences across these data points. We propose an efficient shape indexing approach in which the different feature vectors representing a shape are mapped to a hash table. For a query shape, we show how similar shapes in the database can be retrieved efficiently without establishing correspondences, making the algorithm extremely fast and scalable. We also propose an approach for matching and registration of 3D point cloud data with unknown or missing correspondences using an implicit surface representation. Finally, we discuss possible future directions of this research.
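    The correspondence-free retrieval idea above can be illustrated with a small sketch. This is not the dissertation's exact algorithm; the quantization cell size and feature vectors are assumptions for the example. Each shape contributes a set of quantized feature vectors to a hash table, and a query is matched by voting over colliding keys, so no point-to-point correspondence is ever established.

```python
# Sketch of a correspondence-free shape index via hashing (illustrative).
from collections import defaultdict

def quantize(feature, cell=0.5):
    """Map a real-valued feature vector to a discrete hash key."""
    return tuple(round(x / cell) for x in feature)

class ShapeIndex:
    def __init__(self):
        self.table = defaultdict(set)   # hash key -> shape ids

    def add(self, shape_id, features):
        for f in features:
            self.table[quantize(f)].add(shape_id)

    def query(self, features):
        votes = defaultdict(int)        # shape id -> number of shared keys
        for f in features:
            for sid in self.table[quantize(f)]:
                votes[sid] += 1
        # Rank database shapes by vote count, highest first.
        return sorted(votes, key=votes.get, reverse=True)

index = ShapeIndex()
index.add("square", [(0.0, 1.0), (1.0, 0.0), (1.0, 1.0)])
index.add("circle", [(0.7, 0.7), (0.0, 1.0)])
print(index.query([(0.02, 0.98), (0.98, 0.03)])[0])  # "square" gets most votes
```

    Retrieval cost depends only on the number of query features and bucket sizes, not on database size, which is what makes the scheme fast and scalable.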

    A Multispectral Bidirectional Reflectance Distribution Function Study of Human Skin for Improved Dismount Detection

    In 2008, the Sensors Exploitation Research Group at the Air Force Institute of Technology began using the spectral properties of skin for the detection and classification of humans. Since then, a multispectral skin detection system has been developed to exploit the optical properties of human skin at wavelengths in the visible and near-infrared regions of the electromagnetic spectrum. A rules-based detector, analyzing an image spectrally, currently bases its skin-pixel selection criteria on a diffuse skin reflectance model. However, when observing skin in direct view of the sun, a glint of light off the skin is common and indicates specularity. Areas of skin with a high degree of specular reflectance result in misdetections. We show that skin reflectance combines diffuse and specular components, both dependent on the scene configuration. Since we cannot rely on a person always facing the camera directly or on constant illumination conditions, the rules-based detector must remain flexible as the scene changes. Our research better characterizes skin reflectance as a function of source and detector angular locations to improve the rules-based detector.
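    A rules-based spectral skin test of the kind described can be sketched as follows. The band choices and thresholds here are illustrative assumptions, not the detector's actual rules: the sketch compares reflectance at a visible and a near-infrared band and accepts pixels whose normalized difference falls in an empirically chosen interval.

```python
# Illustrative rules-based skin-pixel test (assumed bands and thresholds).
def normalized_difference(a, b):
    """Standard normalized-difference index in [-1, 1]."""
    return (a - b) / (a + b) if (a + b) else 0.0

def is_skin(r_vis, r_nir, lo=0.1, hi=0.6):
    """Flag a pixel as skin when the NIR/VIS normalized difference
    falls inside a hedged, empirically chosen band."""
    nd = normalized_difference(r_nir, r_vis)
    return lo < nd < hi

print(is_skin(0.35, 0.65))  # True: clear NIR rise, consistent with diffuse skin
print(is_skin(0.90, 0.95))  # False: near-saturated values, e.g. a specular glint
```

    The second case shows why specularity breaks a purely diffuse rule: a glint drives both bands toward saturation and flattens the spectral contrast the rule depends on.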

    Data driven analysis of faces from images

    This thesis proposes three new data-driven approaches to detect, analyze, or modify faces in images. All presented contributions are inspired by the use of prior knowledge, and they derive information about facial appearance from pre-collected databases of images or 3D face models. First, we contribute an approach that extends a widely used monocular face detector with an additional classifier that evaluates disparity maps from a passive stereo camera. The algorithm runs in real time and significantly reduces the number of false positives compared to the monocular approach. Next, with a many-core implementation of the detector, we train view-dependent face detectors based on tailored views which guarantee that the statistical variability is fully covered. These detectors are superior to the state of the art on a challenging dataset and can be trained in an automated procedure. Finally, we contribute a model describing the relation between facial appearance and makeup. The approach extracts makeup from before/after images of faces and allows faces in images to be modified. Applications such as machine-suggested makeup can improve perceived attractiveness, as shown in a perceptual study. In summary, the presented methods help improve the outcome of face detection algorithms and ease and automate both their training procedures and the modification of faces in images. Moreover, their data-driven nature enables new and powerful applications arising from the use of prior knowledge and statistical analyses.
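    The stereo verification step can be sketched in miniature. The camera parameters and face-size limits below are assumed values for illustration, not the thesis's configuration: a monocular detection is kept only if the median disparity inside its bounding box implies a physically plausible face width.

```python
# Sketch: reject monocular face detections with implausible metric size,
# using stereo disparity. All constants are assumptions for the example.
FOCAL_PX = 800.0       # focal length in pixels (assumed)
BASELINE_M = 0.12      # stereo baseline in metres (assumed)

def face_width_m(box_width_px, median_disparity_px):
    """Metric width of the detection, via depth = f * B / disparity."""
    depth = FOCAL_PX * BASELINE_M / median_disparity_px
    return box_width_px * depth / FOCAL_PX

def verify(box_width_px, median_disparity_px, lo=0.10, hi=0.25):
    """Accept only detections whose implied width is face-sized."""
    return lo <= face_width_m(box_width_px, median_disparity_px) <= hi

print(verify(100, 60))   # ~0.20 m wide -> accepted
print(verify(100, 10))   # ~1.20 m wide -> rejected as a false positive
```

    A poster-sized face on a billboard produces the same pixels as a real face but a very different disparity, which is why this simple geometric check removes false positives the monocular classifier cannot.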

    A Temporal Learning Approach to Inpainting Endoscopic Specularities and Its effect on Image Correspondence

    Video streams are used to guide minimally invasive surgery and diagnostic procedures across a wide range of interventions, and many computer-assisted techniques have been developed to analyse them automatically. These approaches can provide additional information to the surgeon, such as lesion detection, instrument navigation, or 3D modeling of anatomical shape. However, the image features needed to recognise these patterns are not always reliably detected due to irregular light patterns such as specular highlight reflections. In this paper, we aim to remove specular highlights from endoscopic videos using machine learning. We propose a temporal generative adversarial network (GAN) that inpaints the anatomy hidden under specularities, inferring its appearance both spatially and from neighbouring frames where the highlights are not present in the same location. This is achieved using in-vivo gastric endoscopy data (Hyper-Kvasir) in a fully unsupervised manner that relies on automatic detection of specular highlights. System evaluations show significant improvements over traditional methods through direct comparison, and over other machine learning techniques through an ablation study that demonstrates the importance of the network's temporal and transfer-learning components. The generalizability of our system to different surgical setups and procedures was also evaluated qualitatively on in-vivo gastric endoscopy data and ex-vivo porcine data (SERV-CT, SCARED). We also assess the effect of our method on computer vision tasks that underpin 3D reconstruction and camera motion estimation, namely stereo disparity, optical flow, and sparse point feature matching. These are evaluated quantitatively and qualitatively, and the results of this comprehensive analysis show a positive effect of specular highlight inpainting on these tasks.
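    The automatic specular-highlight detection the pipeline relies on before inpainting is commonly implemented as a brightness/saturation rule; the thresholds below are illustrative assumptions, not the paper's values. Highlights are both very bright and nearly colorless, so pixels with high HSV value and low HSV saturation are flagged.

```python
# Sketch of specular-highlight mask extraction (assumed thresholds).
def specular_mask(rgb_image, v_thresh=0.85, s_thresh=0.15):
    """rgb_image: rows of (r, g, b) floats in [0, 1]. Returns a boolean mask."""
    mask = []
    for row in rgb_image:
        mask_row = []
        for (r, g, b) in row:
            v = max(r, g, b)                              # HSV value
            s = 0.0 if v == 0 else (v - min(r, g, b)) / v  # HSV saturation
            mask_row.append(v > v_thresh and s < s_thresh)
        mask.append(mask_row)
    return mask

frame = [[(0.95, 0.93, 0.96), (0.70, 0.30, 0.25)]]
print(specular_mask(frame))  # [[True, False]]: glint flagged, tissue kept
```

    The resulting mask marks the pixels the GAN must fill in, either from surrounding tissue in the same frame or from neighbouring frames where the highlight has moved.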

    Application of Multi-Sensor Fusion Technology in Target Detection and Recognition

    The application of multi-sensor fusion technology has drawn considerable industrial and academic interest in recent years. Multi-sensor fusion methods are widely used in many applications, such as autonomous systems, remote sensing, video surveillance, and the military. By considering multiple sensors, these methods can capture the complementary properties of targets; they can also achieve a detailed description of the environment and accurate detection of targets of interest based on information from the different sensors. This book collects novel developments in the field of multi-sensor, multi-source, and multi-process information fusion. Articles emphasize one or more of three facets: architectures, algorithms, and applications. The published papers deal with fundamental theoretical analyses as well as demonstrations of their application to real-world problems.
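    A minimal example of measurement-level fusion, the building block behind many of the architectures surveyed here: independent sensor readings of the same quantity are combined by inverse-variance weighting, so the more reliable sensor dominates. The sensors and numbers are illustrative, not from the book.

```python
# Inverse-variance weighted fusion of independent measurements (sketch).
def fuse(measurements):
    """measurements: list of (value, variance) pairs.
    Returns the fused (value, variance); the fused variance is always
    smaller than any single sensor's variance."""
    inv_total = sum(1.0 / var for _, var in measurements)
    value = sum(v / var for v, var in measurements) / inv_total
    return value, 1.0 / inv_total

# A radar and a camera estimate the same target range (illustrative numbers).
val, var = fuse([(10.2, 4.0), (9.8, 1.0)])
print(round(val, 2), round(var, 2))  # 9.88 0.8 -- pulled toward the better sensor
```

    This is the same update rule that underlies the measurement step of a Kalman filter, which is why it generalizes naturally to the tracking and detection applications listed above.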

    Image-Based Bidirectional Reflectance Distribution Function of Human Skin in the Visible and Near Infrared

    Human detection is an important first step in locating and tracking people in many missions, including SAR and ISR operations. Recent detection systems use hyperspectral and multispectral technology to increase the spectral content acquired in imagery and thereby better identify targets. This research demonstrates human detection through a multispectral skin detection system that exploits the unique optical properties of human skin. At wavelengths in the VIS and NIR regions of the electromagnetic spectrum, an individual can be identified by their unique skin parameters. Current detection methods base the skin-pixel selection criteria on a diffuse skin reflectance model; however, human skin exhibits a combination of specular and diffuse reflectance. The objective of this effort is to better characterize human skin reflectance by collecting image-based BRDF skin measurements for future incorporation into the existing multispectral skin detection system. Integrating multispectral BRDF data should reduce misdetections and better describe skin reflectance as a function of illumination source, target, and detector orientation.
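    A single image-based BRDF sample is formed from a measurement as reflected radiance divided by incident irradiance times the cosine of the illumination angle. The numbers below are placeholders, not measured skin values; the sketch just shows the arithmetic and the Lambertian baseline against which specularity is judged.

```python
# One BRDF sample: f_r = L_r / (E_i * cos(theta_i)), in units of 1/sr.
import math

def brdf_sample(radiance, irradiance, theta_i_deg):
    """BRDF value for one source/detector configuration."""
    return radiance / (irradiance * math.cos(math.radians(theta_i_deg)))

# A perfectly diffuse (Lambertian) surface of albedo rho gives f_r = rho / pi
# at every angle; angle-dependent deviations from this constant reveal the
# specular component. Synthesize a consistent Lambertian measurement:
rho = 0.45
radiance = rho / math.pi * 100.0 * math.cos(math.radians(30.0))
print(round(brdf_sample(radiance, 100.0, 30.0) * math.pi, 2))  # 0.45
```

    Sweeping the source and detector over many angle pairs yields the tabulated BRDF that a future detector model could incorporate in place of the purely diffuse assumption.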

    Image Segmentation and Object Tracking in Videos: Approaches Based on Estimation, Feature Selection, and Active Contours

    This thesis addresses two of the most important and most complex problems in computer vision: image segmentation and object tracking in videos. We propose several approaches to these two problems, based on variational (active contour) and statistical modeling. These approaches aim to overcome various theoretical and practical (algorithmic) limitations of the two problems. First, we address the automation of level-set active contour segmentation and its generalization to multiple regions. To this end, we propose a model that estimates region information automatically and adaptively to the image content. This model uses no prior information about the regions, handles both color and texture images, and supports an arbitrary number of regions. We then introduce a statistical approach for estimating and integrating feature relevance and semantics into the segmentation of objects of interest. Second, we address object tracking in videos using active contours, for which we propose two different models. The first assumes that the photometric properties of the tracked objects are invariant over time, but it can track objects in the presence of noise and against non-static, cluttered backgrounds; this is achieved by integrating region, boundary, and shape information about the tracked objects. The second model handles photometric variations of the tracked objects using a statistical model that adapts to their appearance. Finally, we propose a new statistical model, based on the generalized Gaussian distribution, for an efficient representation of noisy, high-dimensional data in segmentation. This model is used to ensure robust segmentation of color images containing noise, as well as of moving objects in videos (acquired by static cameras) containing shadows and/or sudden illumination changes.
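    The generalized Gaussian density at the heart of the final model has the form p(x) ∝ exp(-(|x - mu| / alpha)^beta): beta = 2 recovers the ordinary Gaussian, beta = 1 the Laplacian, and smaller beta gives the heavier tails that make the fit robust to noise. A minimal sketch of the normalized density (parameter names are the standard ones, not necessarily the thesis's notation):

```python
# Generalized Gaussian density: p(x) = beta / (2*alpha*Gamma(1/beta))
#                                      * exp(-(|x - mu| / alpha)**beta)
import math

def gen_gaussian_pdf(x, mu=0.0, alpha=1.0, beta=2.0):
    """Normalized generalized Gaussian; alpha is the scale, beta the shape."""
    norm = beta / (2.0 * alpha * math.gamma(1.0 / beta))
    return norm * math.exp(-(abs(x - mu) / alpha) ** beta)

# Sanity check: with beta = 2 and alpha = sqrt(2)*sigma this is exactly
# the normal density N(mu, sigma^2).
sigma = 1.0
p = gen_gaussian_pdf(0.0, alpha=math.sqrt(2) * sigma, beta=2.0)
print(round(p, 4))  # 0.3989 = 1/sqrt(2*pi)
```

    In a segmentation energy, each region's intensities are modeled by one such density with its own fitted (mu, alpha, beta), and the freedom in beta is what absorbs noise and shadow-induced outliers that a plain Gaussian would misassign.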