2,128 research outputs found

    A FLEXIBLE METHODOLOGY FOR OUTDOOR/INDOOR BUILDING RECONSTRUCTION FROM OCCLUDED POINT CLOUDS

    Terrestrial Laser Scanning data are increasingly used in building surveys, not only in the cultural heritage domain but also for as-built modelling of large and medium-size civil structures. However, raw point clouds derived from laser scanning are generally not directly usable for generating such models: a time-consuming manual modelling phase has to be taken into account. In addition, the widespread presence of occlusion and clutter may result in low-quality building models when state-of-the-art automatic modelling procedures are applied. This paper presents an automated procedure to convert raw point clouds into semantically enriched building models. The developed method mainly targets the geometrical complexity typical of modern buildings, with a clear prevalence of planar features. A characteristic of this methodology is its ability to work with both outdoor and indoor building environments. To operate under severe occlusion and clutter, a pair of completion algorithms was designed to generate a plausible and reliable model. Finally, some examples of the developed modelling procedure are presented and discussed.
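
    The planar prevalence this method exploits is commonly extracted with RANSAC-style plane fitting. Below is a minimal sketch of that generic step (not the authors' procedure; the iteration count and inlier tolerance are illustrative assumptions):

```python
import numpy as np

def ransac_plane(points, n_iters=200, inlier_tol=0.05, rng=None):
    """Fit a dominant plane to an (N, 3) point cloud with RANSAC.

    Returns (normal, d, inlier_mask) for the plane n.x + d = 0.
    """
    rng = np.random.default_rng(rng)
    best_mask, best_n, best_d, best_count = None, None, None, -1
    for _ in range(n_iters):
        # Sample 3 distinct points and derive the plane through them.
        idx = rng.choice(len(points), size=3, replace=False)
        p0, p1, p2 = points[idx]
        n = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(n)
        if norm < 1e-9:                  # degenerate (collinear) sample
            continue
        n = n / norm
        d = -np.dot(n, p0)
        # Point-to-plane distances; inliers lie within the tolerance band.
        dist = np.abs(points @ n + d)
        mask = dist < inlier_tol
        if mask.sum() > best_count:
            best_count, best_mask, best_n, best_d = mask.sum(), mask, n, d
    return best_n, best_d, best_mask
```

On a cloud that is mostly one noisy plane plus clutter, the recovered normal and inlier mask identify the dominant planar feature; real pipelines would repeat this to peel off successive planes.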

    Robust pedestrian detection and tracking in crowded scenes

    In this paper, a robust computer vision approach to detecting and tracking pedestrians in unconstrained crowded scenes is presented. Pedestrian detection is performed via a 3D clustering process within a region-growing framework. The clustering process avoids hard thresholds by using biometrically inspired constraints and a number of plan-view statistics. Pedestrian tracking is achieved by formulating the track matching process as a weighted bipartite graph and using a Weighted Maximum Cardinality Matching scheme. The approach is evaluated on both indoor and outdoor sequences, captured with a variety of camera placements and orientations, that pose significant challenges in terms of the number of pedestrians present, their interactions and the scene lighting conditions. The evaluation is performed against a manually generated ground truth for all sequences. Results point to the highly accurate performance of the proposed approach in all cases.
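
    The track-to-detection association step can be sketched as a small weighted bipartite matching. The version below brute-forces the assignment (feasible only for a handful of tracks), rather than implementing the paper's Weighted Maximum Cardinality Matching scheme, and the gating distance is an assumed value:

```python
import itertools
import math

def match_tracks(tracks, detections, gate=2.0):
    """Match tracks to detections by maximising match count, then
    total similarity, over a weighted bipartite graph (brute force).

    tracks, detections: lists of (x, y) positions.
    gate: maximum distance for a pairing to count (assumed value).
    Simplification: when there are more tracks than detections, only
    the first len(detections) tracks are considered.
    Returns a list of (track_idx, det_idx) pairs.
    """
    n_t, n_d = len(tracks), len(detections)
    best_pairs, best_score = [], -math.inf
    for perm in itertools.permutations(range(n_d), min(n_t, n_d)):
        pairs, score = [], 0.0
        for t_idx, d_idx in enumerate(perm):
            dx = tracks[t_idx][0] - detections[d_idx][0]
            dy = tracks[t_idx][1] - detections[d_idx][1]
            dist = math.hypot(dx, dy)
            if dist <= gate:                 # only gated pairs count
                pairs.append((t_idx, d_idx))
                score += gate - dist         # closer pairs score higher
        # Prefer more matches (cardinality), then higher total weight.
        if (len(pairs), score) > (len(best_pairs), best_score):
            best_pairs, best_score = pairs, score
    return best_pairs
```

A production tracker would swap the permutation loop for a polynomial-time matching algorithm; the scoring and gating logic stay the same.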

    Video and Imaging, 2013-2016


    A survey on generative adversarial networks for imbalance problems in computer vision tasks

    Any computer vision application starts by acquiring images and data, followed by preprocessing and pattern-recognition steps to perform a task. When the acquired images are highly imbalanced and inadequate, the desired task may not be achievable. Unfortunately, the occurrence of imbalance problems in acquired image datasets for certain complex real-world problems, such as anomaly detection, emotion recognition, medical image analysis, fraud detection, metallic surface defect detection and disaster prediction, is inevitable. The performance of computer vision algorithms can deteriorate significantly when the training dataset is imbalanced. In recent years, Generative Adversarial Networks (GANs) have gained immense attention from researchers across a variety of application domains due to their capability to model complex real-world image data. Notably, GANs can not only generate synthetic images; their adversarial learning idea has also shown good potential for restoring balance in imbalanced datasets. In this paper, we examine the most recent developments in GAN-based techniques for addressing imbalance problems in image data. The real-world challenges and implementations of GAN-based synthetic image generation are extensively covered in this survey. We first introduce various imbalance problems in computer vision tasks and their existing solutions, and then examine key concepts such as deep generative image models and GANs. We then propose a taxonomy that groups GAN-based techniques for addressing imbalance problems into three major categories: (1) image-level imbalances in classification, (2) object-level imbalances in object detection, and (3) pixel-level imbalances in segmentation tasks. We elaborate on the imbalance problems of each group and provide GAN-based solutions for each. Readers will understand how GAN-based techniques can handle imbalance problems and boost the performance of computer vision algorithms.
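
    As a back-of-the-envelope illustration of the imbalance problem the survey addresses, the snippet below computes how many synthetic samples a class-conditional generator would need to produce per class to level a dataset; the GAN itself is out of scope here, and the labels are invented:

```python
from collections import Counter

def synthetic_budget(labels):
    """How many synthetic samples each class needs to reach the
    majority-class count: the generation target a class-conditional
    GAN would be given in an image-level oversampling scheme."""
    counts = Counter(labels)
    target = max(counts.values())
    return {cls: target - n for cls, n in counts.items()}
```

For a hypothetical surface-defect set with 500 "ok" images and 20 "defect" images, the budget is 480 synthetic "defect" images; whether a generator can produce 480 useful ones is exactly the question the surveyed methods address.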

    Object recognition in infrared imagery using appearance-based methods

    Abstract unavailable; please refer to the PDF.

    Multi-modal RGB–Depth–Thermal Human Body Segmentation


    Contributions to region-based image and video analysis: feature aggregation, background subtraction and description constraining

    Unpublished doctoral thesis, read at the Universidad Autónoma de Madrid, Escuela Politécnica Superior, Departamento de Tecnología Electrónica y de las Comunicaciones. Date of defence: 22-01-2016. Full-text access is embargoed until 22-07-2017.

    The use of regions for image and video analysis has traditionally been motivated by their ability to diminish the number of processed units and, hence, the number of required decisions. However, as we explore in this thesis, this is just one of the potential advantages that regions may provide. When dealing with regions, two description spaces may be differentiated: the decision space, in which regions are shaped (region segmentation), and the feature space, in which regions are used for analysis (region-based applications). These two spaces are highly related: choices made in the decision space strongly affect performance in the feature space. Accordingly, this thesis proposes contributions in both spaces. The contributions to region segmentation are twofold. First, we give a twist to a classical region segmentation technique, Mean-Shift, by exploring new solutions to automatically set the spectral kernel bandwidth. Second, we propose a method to describe the micro-texture of a pixel neighbourhood using an easily customisable filter-bank methodology based on the discrete cosine transform (DCT). The rest of the thesis is devoted to region-based approaches to several highly topical issues in computer vision; two broad tasks are explored: background subtraction (BS) and local descriptors (LD). Concerning BS, regions are used as complementary cues to refine pixel-based BS algorithms: by providing illumination-robust cues and by storing the background dynamics in a region-driven background model. Concerning LD, the region is used to reshape the description area, which is usually fixed for local descriptors. Region-masked versions of classical two-dimensional and three-dimensional local descriptions are designed, and the resulting descriptions are applied to object identification under a novel neural-oriented strategy. Furthermore, a local description scheme based on a fuzzy use of region membership is derived. This characterisation scheme has been geometrically adapted to account for projective deformations, providing a suitable tool for finding corresponding points in wide-baseline scenarios. Experiments have been conducted for every contribution, discussing the potential benefits and limitations of the proposed schemes. Overall, the obtained results suggest that the region, conditioned on a successful aggregation process, is a reliable and useful tool to extrapolate pixel-level results, diminish semantic noise, isolate significant object cues and constrain local descriptions. The methods and approaches described throughout this thesis present alternative or complementary solutions to pixel-based image processing.

    This work was partially supported by the Spanish Government through its FPU grant program and the projects TEC2007-65400 (SemanticVideo), TEC2011-25995 (Event Video) and TEC2014-53176-R (HAVideo); the European Commission (IST-FP6-027685, Mesh); the Comunidad de Madrid (S-0505/TIC-0223, ProMultiDis-CM); and the Spanish Administration Agency CENIT 2007-1007 (VISION).
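
    The DCT-based filter-bank idea mentioned in the abstract can be illustrated with a minimal micro-texture descriptor. This is a generic sketch, not the thesis implementation; the square-patch assumption and the choice to discard the DC term are illustrative:

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n)
    basis = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    basis[0] *= 1 / np.sqrt(2)        # DC row scaling for orthonormality
    return basis * np.sqrt(2 / n)

def micro_texture(patch):
    """Describe a square n x n neighbourhood by its 2D DCT
    coefficients, discarding the DC term so the descriptor ignores
    mean intensity and responds only to local texture."""
    d = dct_matrix(patch.shape[0])
    coeffs = d @ patch @ d.T          # separable 2D DCT
    coeffs[0, 0] = 0.0                # drop DC (mean intensity)
    return coeffs.ravel()
```

Each DCT basis function acts as one filter of the bank; customising the bank amounts to selecting or reweighting coefficients before flattening.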

    Dimensionality reduction and sparse representations in computer vision

    The proliferation of camera-equipped devices, such as netbooks, smartphones and game stations, has led to a significant increase in the production of visual content. This visual information could be used for understanding the environment and offering a natural interface between users and their surroundings. However, the massive amounts of data and the high computational cost associated with them encumber the transfer of sophisticated vision algorithms to real-life systems, especially ones with resource limitations such as restricted memory, processing power and bandwidth. One approach to tackling these issues is to generate compact and descriptive representations of image data by exploiting inherent redundancies. We propose the investigation of dimensionality reduction and sparse representations to accomplish this task. In dimensionality reduction, the aim is to reduce the dimensions of the space in which image data reside, so that resource-constrained systems can handle them and, ideally, a more insightful description is obtained. This goal is achieved by exploiting the inherent redundancies exhibited by many classes of images, such as faces under different illumination conditions and objects seen from different viewpoints. We explore the description of natural images by low-dimensional non-linear models called image manifolds and investigate the performance of computer vision tasks such as recognition and classification using these low-dimensional models. In addition to dimensionality reduction, we study a novel approach to representing images as sparse linear combinations of dictionary examples. We investigate how sparse image representations can be used for a variety of tasks, including low-level image modeling and higher-level semantic information extraction.

    Using tools from dimensionality reduction and sparse representation, we apply these methods in three hierarchical image layers: low-level features, mid-level structures and high-level attributes. Low-level features are image descriptors that can be extracted directly from the raw image pixels, including pixel intensities, histograms and gradients. In the first part of this work, we explore how various dimensionality reduction techniques, ranging from traditional image compression to the recently proposed Random Projections method, affect the performance of computer vision algorithms such as face detection and face recognition. In addition, we discuss a method that is able to increase the spatial resolution of a single image, without using any training examples, within the sparse representations framework. In the second part, we explore mid-level structures, including image manifolds and sparse models, which are produced by abstracting information from low-level features and offer compact modeling of high-dimensional data. We propose novel techniques for generating more descriptive image representations and investigate their application in face recognition and object tracking. In the third part of this work, we propose a novel framework for representing the semantic content of images. This framework employs high-level semantic attributes that aim to bridge the gap between the visual information of an image and its textual description, by utilizing low-level features and mid-level structures. This innovative paradigm offers revolutionary possibilities, including recognizing the category of an object from purely textual information without any explicit visual example.
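
    The Random Projections method mentioned in the first part lends itself to a compact sketch; the output dimensionality below is an arbitrary choice, and a real system would pick it from the desired distortion via the Johnson-Lindenstrauss bound:

```python
import numpy as np

def random_projection(data, out_dim, seed=0):
    """Project (N, D) data to (N, out_dim) with a Gaussian random
    matrix. By the Johnson-Lindenstrauss lemma, pairwise Euclidean
    distances are approximately preserved when out_dim is large
    enough relative to log(N)."""
    rng = np.random.default_rng(seed)
    # Scaling by 1/sqrt(out_dim) keeps expected norms unchanged.
    proj = rng.standard_normal((data.shape[1], out_dim)) / np.sqrt(out_dim)
    return data @ proj
```

Unlike PCA, the projection matrix is data-independent, so it can be generated once on a resource-constrained device and applied to any incoming image features.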

    Reconnaissance de visage robuste aux occultations

    Face recognition is an important technology in computer vision, often acting as an essential component in biometric systems, HCI systems, access control systems, multimedia indexing applications, etc. Partial occlusion, which significantly changes the appearance of part of a face, can not only cause a large deterioration in face recognition performance, but can also raise severe security issues. In this thesis, we focus on the occlusion problem in automatic face recognition in non-controlled environments. Toward this goal, we propose a framework that applies explicit occlusion analysis and processing to improve face recognition under different occlusion conditions. We demonstrate that the proposed framework is more effective than methods from the literature based on non-explicit occlusion treatment. We identify two new types of facial occlusion, namely sparse occlusion and dynamic occlusion, and present solutions to handle them in a more advanced surveillance context. Recently, the emerging Kinect sensor has been successfully applied in many computer vision fields. We introduce this new sensor in the context of face recognition, particularly in the presence of occlusions, and demonstrate its efficiency compared with traditional 2D cameras. Finally, we propose two approaches, based on 2D and 3D data, to improve baseline face recognition techniques. Improving the baseline methods also has a positive impact on recognition results when partial occlusion occurs.
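
    The explicit-occlusion idea, scoring faces only on the parts an occlusion-analysis stage marked as visible, can be caricatured in a few lines; the block-wise descriptors and the mask here are hypothetical stand-ins for the thesis' actual features:

```python
import numpy as np

def masked_face_distance(probe, gallery, occluded):
    """Distance between two block-wise face descriptors of shape
    (n_blocks, n_features), ignoring blocks flagged as occluded in
    the probe. An upstream occlusion-analysis stage is assumed to
    have produced the boolean mask."""
    valid = ~occluded
    if not valid.any():
        return np.inf                    # fully occluded: no evidence
    diff = probe[valid] - gallery[valid]
    # Mean per-block distance, so scores computed over different
    # numbers of visible blocks remain comparable.
    return np.mean(np.linalg.norm(diff, axis=-1))
```

The point of the explicit treatment is visible here: corrupted blocks (a scarf, sunglasses) never contaminate the score, instead of being averaged in as in a holistic comparison.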