121 research outputs found

    An UM-based silhouette-crease edge enhancement for noisy images

    Get PDF
    [[abstract]]This paper presents an improved unsharp masking (UM) technique that enhances the quality and suppresses noises for the images acquired from a noisy environment such as taken during night time. Our approach employs noise smoothing and the idea that important edges should be enhanced more than minor edges. Edges are classified as silhouette and crease edges (major and minor edges) according to their lengths. An adaptive weighting method is used to enhance the edges. In this way, the major edges (silhouette) are sharpened more than minor edges (crease). The proposed method is examined on night images as well as noisy images. It is also compared to existing UM-based methods with satisfying results.[[notice]]補正完

    Error propagation in pattern recognition systems: Impact of quality on fingerprint categorization

    Get PDF
    The aspect of quality in pattern classification has recently been explored in the context of biometric identification and authentication systems. The results presented in the literature indicate that incorporating information about quality of the input pattern leads to improved classification performance. The quality itself, however, can be defined in a number of ways, and its role in the various stages of pattern classification is often ambiguous or ad hoc. In this dissertation a more systematic approach to the incorporation of localized quality metrics into the pattern recognition process is developed for the specific task of fingerprint categorization. Quality is defined not as an intrinsic property of the image, but rather in terms of a set of defects introduced to it. A number of fingerprint images have been examined and the important quality defects have been identified and modeled in a mathematically tractable way. The models are flexible and can be used to generate synthetic images that can facilitate algorithm development and large scale, less time consuming performance testing. The effect of quality defects on various stages of the fingerprint recognition process are examined both analytically and empirically. For these defect models, it is shown that the uncertainty of parameter estimates, i.e. extracted fingerprint features, is the key quantity that can be calculated and propagated forward through the stages of the fingerprint classification process. Modified image processing techniques that explicitly utilize local quality metrics in the extraction of features useful in fingerprint classification, such as ridge orientation flow field, are presented and their performance is investigated

    Of assembling small sculptures and disassembling large geometry

    Get PDF
    This thesis describes the research results and contributions that have been achieved during the author’s doctoral work. It is divided into two independent parts, each of which is devoted to a particular research aspect. The first part covers the true-to-detail creation of digital pieces of art, so-called relief sculptures, from given 3D models. The main goal is to limit the depth of the contained objects with respect to a certain perspective without compromising the initial three-dimensional impression. Here, the preservation of significant features and especially their sharpness is crucial. Therefore, it is necessary to overemphasize fine surface details to ensure their perceptibility in the more complanate relief. Our developments are aimed at amending the flexibility and user-friendliness during the generation process. The main focus is on providing real-time solutions with intuitive usability that make it possible to create precise, lifelike and aesthetic results. These goals are reached by a GPU implementation, the use of efficient filtering techniques, and the replacement of user defined parameters by adaptive values. Our methods are capable of processing dynamic scenes and allow the generation of seamless artistic reliefs which can be composed of multiple elements. The second part addresses the analysis of repetitive structures, so-called symmetries, within very large data sets. The automatic recognition of components and their patterns is a complex correspondence problem which has numerous applications ranging from information visualization over compression to automatic scene understanding. Recent algorithms reach their limits with a growing amount of data, since their runtimes rise quadratically. Our aim is to make even massive data sets manageable. Therefore, it is necessary to abstract features and to develop a suitable, low-dimensional descriptor which ensures an efficient, robust, and purposive search. A simple inspection of the proximity within the descriptor space helps to significantly reduce the number of necessary pairwise comparisons. Our method scales quasi-linearly and allows a rapid analysis of data sets which could not be handled by prior approaches because of their size.Die vorgelegte Arbeit beschreibt die wissenschaftlichen Ergebnisse und Beiträge, die während der vergangenen Promotionsphase entstanden sind. Sie gliedert sich in zwei voneinander unabhängige Teile, von denen jeder einem eigenen Forschungsschwerpunkt gewidmet ist. Der erste Teil beschäftigt sich mit der detailgetreuen Erzeugung digitaler Kunstwerke, sogenannter Reliefplastiken, aus gegebenen 3D-Modellen. Das Ziel ist es, die Objekte, abhängig von der Perspektive, stark in ihrer Tiefe zu limitieren, ohne dass der Eindruck der räumlichen Ausdehnung verloren geht. Hierbei kommt dem Aufrechterhalten der Schärfe signifikanter Merkmale besondere Bedeutung zu. Dafür ist es notwendig, die feinen Details der Objektoberfläche überzubetonen, um ihre Sichtbarkeit im flacheren Relief zu gewährleisten. Unsere Weiterentwicklungen zielen auf die Verbesserung der Flexibilität und Benutzerfreundlichkeit während des Enstehungsprozesses ab. Der Fokus liegt dabei auf dem Bereitstellen intuitiv bedienbarer Echtzeitlösungen, die die Erzeugung präziser, naturgetreuer und visuell ansprechender Resultate ermöglichen. Diese Ziele werden durch eine GPU-Implementierung, den Einsatz effizienter Filtertechniken sowie das Ersetzen benutzergesteuerter Parameter durch adaptive Werte erreicht. Unsere Methoden erlauben das Verarbeiten dynamischer Szenen und die Erstellung nahtloser, kunstvoller Reliefs, die aus mehreren Elementen und Perspektiven zusammengesetzt sein können. Der zweite Teil behandelt die Analyse wiederkehrender Stukturen, sogenannter Symmetrien, innerhalb sehr großer Datensätze. Das automatische Erkennen von Komponenten und deren Muster ist ein komplexes Korrespondenzproblem mit zahlreichen Anwendungen, von der Informationsvisualisierung über Kompression bis hin zum automatischen Verstehen. Mit zunehmender Datenmenge geraten die etablierten Algorithmen an ihre Grenzen, da ihre Laufzeiten quadratisch ansteigen. Unser Ziel ist es, auch massive Datensätze handhabbar zu machen. Dazu ist es notwendig, Merkmale zu abstrahieren und einen passenden niedrigdimensionalen Deskriptor zu entwickeln, der eine effiziente, robuste und zielführende Suche erlaubt. Eine simple Betrachtung der Nachbarschaft innerhalb der Deskriptoren hilft dabei, die Anzahl notwendiger paarweiser Vergleiche signifikant zu reduzieren. Unser Verfahren skaliert quasi-linear und ermöglicht somit eine rasche Auswertung auch auf Daten, die für bisherige Methoden zu groß waren

    Space Carving multi-view video plus depth sequences for representation and transmission of 3DTV and FTV contents

    Get PDF
    La vidéo 3D a suscité un intérêt croissant durant ces dernières années. Grâce au développement récent des écrans stéréoscopiques et auto-stéréoscopiques, la vidéo 3D fournit une sensation réaliste de profondeur à l'utilisateur et une navigation virtuelle autour de la scène observée. Cependant de nombreux défis techniques existent encore. Ces défis peuvent être liés à l'acquisition de la scène et à sa représentation d'une part ou à la transmission des données d'autre part. Dans le contexte de la représentation de scènes naturelles, de nombreux efforts ont été fournis afin de surmonter ces difficultés. Les méthodes proposées dans la littérature peuvent être basées image, géométrie ou faire appel à des représentations combinant image et géométrie. L'approche adoptée dans cette thèse consiste en une méthode hybride s'appuyant sur l'utilisation des séquences multi-vues plus profondeur MVD (Multi-view Video plus Depth) afin de conserver le photo-réalisme de la scène observée, combinée avec un modèle géométrique, à base de maillage triangulaire, renforçant ainsi la compacité de la représentation. Nous supposons que les cartes de profondeur des données MVD fournies sont fiables et que les caméras utilisées durant l'acquisition sont calibrées, les paramètres caméras sont donc connus, mais les images correspondantes ne sont pas nécessairement rectifiées. Nous considérerons ainsi le cas général où les caméras peuvent être parallèles ou convergentes. Les contributions de cette thèse sont les suivantes. D'abord, un schéma volumétrique dédié à la fusion des cartes de profondeur en une surface maillée est proposé. Ensuite, un nouveau schéma de plaquage de texture multi-vues est proposé. Finalement, nous abordons à l'issue ce ces deux étapes de modélisation, la transmission proprement dite et comparons les performances de notre schéma de modélisation avec un schéma basé sur le standard MPEG-MVC, état de l'art dans la compression de vidéos multi-vues.3D videos have witnessed a growing interest in the last few years. Due to the recent development ofstereoscopic and auto-stereoscopic displays, 3D videos provide a realistic depth perception to the user and allows a virtual navigation around the scene. Nevertheless, several technical challenges are still remaining. Such challenges are either related to scene acquisition and representation on the one hand or to data transmission on the other hand. In the context of natural scene representation, research activities have been strengthened worldwide in order to handle these issues. The proposed methods for scene representation can be image-based, geometry based or methods combining both image and geometry. In this thesis, we take advantage of image based representations, thanks to the use of Multi-view Video plus Depth representation, in order to preserve the photorealism of the observed scene, and geometric based representations in order to enforce the compactness ofthe proposed scene representation. We assume the provided depth maps to be reliable.Besides, the considered cameras are calibrated so that the cameras parameters are known but thecorresponding images are not necessarily rectified. We consider, therefore, the general framework where cameras can be either convergent or parallel. The contributions of this thesis are the following. First, a new volumetric framework is proposed in order to mergethe input depth maps into a single and compact surface mesh. Second, a new algorithm for multi-texturing the surface mesh is proposed. Finally, we address the transmission issue and compare the performance of the proposed modeling scheme with the current standard MPEG-MVC, that is the state of the art of multi-view video compression.RENNES-INSA (352382210) / SudocSudocFranceF

    To err is human? A functional comparison of human and machine decision-making

    Get PDF
    It is hard to imagine what a world without objects would look like. While being able to rapidly recognise objects seems deceptively simple to humans, it has long proven challenging for machines, constituting a major roadblock towards real-world applications. This has changed with recent advances in deep learning: Today, modern deep neural networks (DNNs) often achieve human-level object recognition performance. However, their complexity makes it notoriously hard to understand how they arrive at a decision, which carries the risk that machine learning applications outpace our understanding of machine decisions - without knowing when machines will fail, and why; when machines will be biased, and why; when machines will be successful, and why. We here seek to develop a better understanding of machine decision-making by comparing it to human decision-making. Most previous investigations have compared intermediate representations (such as network activations to neural firing patterns), but ultimately, a machine's behaviour (or output decision) has the most direct relevance: humans are affected by machine decisions, not by "machine thoughts". Therefore, the focus of this thesis and its six constituent projects (P1-P6) is a functional comparison of human and machine decision-making. This is achieved by transferring methods from human psychophysics - a field with a proven track record of illuminating complex visual systems - to modern machine learning. The starting point of our investigations is a simple question: How do DNNs recognise objects, by texture or by shape? Following behavioural experiments with cue-conflict stimuli, we show that the textbook explanation of machine object recognition - an increasingly complex hierarchy based on object parts and shapes - is inaccurate. Instead, standard DNNs simply exploit local image textures (P1). Intriguingly, this difference between humans and DNNs can be overcome through data augmentation: Training DNNs on a suitable dataset induces a human-like shape bias and leads to emerging human-level distortion robustness in DNNs, enabling them to cope with unseen types of image corruptions much better than any previously tested model. Motivated by the finding that texture bias is pervasive throughout object classification and object detection (P2), we then develop "error consistency". Error consistency is an analysis to understand how machine decisions differ from one another depending on, for instance, model architecture or training objective. This analysis reveals remarkable similarities between feedforward vs. recurrent (P3) and supervised vs. self-supervised models (P4). At the same time, DNNs show little consistency with human observers, reinforcing our finding of fundamentally different decision-making between humans and machines. In the light of these results, we then take a step back, asking where these differences may originate from. We find that many DNN shortcomings can be seen as symptoms of the same underlying pattern: "shortcut learning", a tendency to exploit unintended patterns that fail to generalise to unexpected input (P5). While shortcut learning accounts for many functional differences between human and machine perception, some of them can be overcome: In our last investigation, a large-scale behavioural comparison, toolbox and benchmark (P6), we report partial success in closing the gap between human and machine vision. Taken together our findings indicate that our understanding of machine decision-making is riddled with (often untested) assumptions. Putting these on a solid empirical footing, as done here through rigorous quantitative experiments and functional comparisons with human decision-making, is key: for when humans better understand machines, we will be able to build machines that better understand humans - and the world we all share

    Interlacing Self-Localization, Moving Object Tracking and Mapping for 3D Range Sensors

    Get PDF
    This work presents a solution for autonomous vehicles to detect arbitrary moving traffic participants and to precisely determine the motion of the vehicle. The solution is based on three-dimensional images captured with modern range sensors like e.g. high-resolution laser scanners. As result, objects are tracked and a detailed 3D model is built for each object and for the static environment. The performance is demonstrated in challenging urban environments that contain many different objects

    Multi-scale active shape description in medical imaging

    Get PDF
    Shape description in medical imaging has become an increasingly important research field in recent years. Fast and high-resolution image acquisition methods like Magnetic Resonance (MR) imaging produce very detailed cross-sectional images of the human body - shape description is then a post-processing operation which abstracts quantitative descriptions of anatomically relevant object shapes. This task is usually performed by clinicians and other experts by first segmenting the shapes of interest, and then making volumetric and other quantitative measurements. High demand on expert time and inter- and intra-observer variability impose a clinical need of automating this process. Furthermore, recent studies in clinical neurology on the correspondence between disease status and degree of shape deformations necessitate the use of more sophisticated, higher-level shape description techniques. In this work a new hierarchical tool for shape description has been developed, combining two recently developed and powerful techniques in image processing: differential invariants in scale-space, and active contour models. This tool enables quantitative and qualitative shape studies at multiple levels of image detail, exploring the extra image scale degree of freedom. Using scale-space continuity, the global object shape can be detected at a coarse level of image detail, and finer shape characteristics can be found at higher levels of detail or scales. New methods for active shape evolution and focusing have been developed for the extraction of shapes at a large set of scales using an active contour model whose energy function is regularized with respect to scale and geometric differential image invariants. The resulting set of shapes is formulated as a multiscale shape stack which is analysed and described for each scale level with a large set of shape descriptors to obtain and analyse shape changes across scales. This shape stack leads naturally to several questions in regard to variable sampling and appropriate levels of detail to investigate an image. The relationship between active contour sampling precision and scale-space is addressed. After a thorough review of modem shape description, multi-scale image processing and active contour model techniques, the novel framework for multi-scale active shape description is presented and tested on synthetic images and medical images. An interesting result is the recovery of the fractal dimension of a known fractal boundary using this framework. Medical applications addressed are grey-matter deformations occurring for patients with epilepsy, spinal cord atrophy for patients with Multiple Sclerosis, and cortical impairment for neonates. Extensions to non-linear scale-spaces, comparisons to binary curve and curvature evolution schemes as well as other hierarchical shape descriptors are discussed

    QUIS-CAMPI: Biometric Recognition in Surveillance Scenarios

    Get PDF
    The concerns about individuals security have justified the increasing number of surveillance cameras deployed both in private and public spaces. However, contrary to popular belief, these devices are in most cases used solely for recording, instead of feeding intelligent analysis processes capable of extracting information about the observed individuals. Thus, even though video surveillance has already proved to be essential for solving multiple crimes, obtaining relevant details about the subjects that took part in a crime depends on the manual inspection of recordings. As such, the current goal of the research community is the development of automated surveillance systems capable of monitoring and identifying subjects in surveillance scenarios. Accordingly, the main goal of this thesis is to improve the performance of biometric recognition algorithms in data acquired from surveillance scenarios. In particular, we aim at designing a visual surveillance system capable of acquiring biometric data at a distance (e.g., face, iris or gait) without requiring human intervention in the process, as well as devising biometric recognition methods robust to the degradation factors resulting from the unconstrained acquisition process. Regarding the first goal, the analysis of the data acquired by typical surveillance systems shows that large acquisition distances significantly decrease the resolution of biometric samples, and thus their discriminability is not sufficient for recognition purposes. In the literature, diverse works point out Pan Tilt Zoom (PTZ) cameras as the most practical way for acquiring high-resolution imagery at a distance, particularly when using a master-slave configuration. In the master-slave configuration, the video acquired by a typical surveillance camera is analyzed for obtaining regions of interest (e.g., car, person) and these regions are subsequently imaged at high-resolution by the PTZ camera. Several methods have already shown that this configuration can be used for acquiring biometric data at a distance. Nevertheless, these methods failed at providing effective solutions to the typical challenges of this strategy, restraining its use in surveillance scenarios. Accordingly, this thesis proposes two methods to support the development of a biometric data acquisition system based on the cooperation of a PTZ camera with a typical surveillance camera. The first proposal is a camera calibration method capable of accurately mapping the coordinates of the master camera to the pan/tilt angles of the PTZ camera. The second proposal is a camera scheduling method for determining - in real-time - the sequence of acquisitions that maximizes the number of different targets obtained, while minimizing the cumulative transition time. In order to achieve the first goal of this thesis, both methods were combined with state-of-the-art approaches of the human monitoring field to develop a fully automated surveillance capable of acquiring biometric data at a distance and without human cooperation, designated as QUIS-CAMPI system. The QUIS-CAMPI system is the basis for pursuing the second goal of this thesis. The analysis of the performance of the state-of-the-art biometric recognition approaches shows that these approaches attain almost ideal recognition rates in unconstrained data. However, this performance is incongruous with the recognition rates observed in surveillance scenarios. Taking into account the drawbacks of current biometric datasets, this thesis introduces a novel dataset comprising biometric samples (face images and gait videos) acquired by the QUIS-CAMPI system at a distance ranging from 5 to 40 meters and without human intervention in the acquisition process. This set allows to objectively assess the performance of state-of-the-art biometric recognition methods in data that truly encompass the covariates of surveillance scenarios. As such, this set was exploited for promoting the first international challenge on biometric recognition in the wild. This thesis describes the evaluation protocols adopted, along with the results obtained by the nine methods specially designed for this competition. In addition, the data acquired by the QUIS-CAMPI system were crucial for accomplishing the second goal of this thesis, i.e., the development of methods robust to the covariates of surveillance scenarios. The first proposal regards a method for detecting corrupted features in biometric signatures inferred by a redundancy analysis algorithm. The second proposal is a caricature-based face recognition approach capable of enhancing the recognition performance by automatically generating a caricature from a 2D photo. The experimental evaluation of these methods shows that both approaches contribute to improve the recognition performance in unconstrained data.A crescente preocupação com a segurança dos indivíduos tem justificado o crescimento do número de câmaras de vídeo-vigilância instaladas tanto em espaços privados como públicos. Contudo, ao contrário do que normalmente se pensa, estes dispositivos são, na maior parte dos casos, usados apenas para gravação, não estando ligados a nenhum tipo de software inteligente capaz de inferir em tempo real informações sobre os indivíduos observados. Assim, apesar de a vídeo-vigilância ter provado ser essencial na resolução de diversos crimes, o seu uso está ainda confinado à disponibilização de vídeos que têm que ser manualmente inspecionados para extrair informações relevantes dos sujeitos envolvidos no crime. Como tal, atualmente, o principal desafio da comunidade científica é o desenvolvimento de sistemas automatizados capazes de monitorizar e identificar indivíduos em ambientes de vídeo-vigilância. Esta tese tem como principal objetivo estender a aplicabilidade dos sistemas de reconhecimento biométrico aos ambientes de vídeo-vigilância. De forma mais especifica, pretende-se 1) conceber um sistema de vídeo-vigilância que consiga adquirir dados biométricos a longas distâncias (e.g., imagens da cara, íris, ou vídeos do tipo de passo) sem requerer a cooperação dos indivíduos no processo; e 2) desenvolver métodos de reconhecimento biométrico robustos aos fatores de degradação inerentes aos dados adquiridos por este tipo de sistemas. No que diz respeito ao primeiro objetivo, a análise aos dados adquiridos pelos sistemas típicos de vídeo-vigilância mostra que, devido à distância de captura, os traços biométricos amostrados não são suficientemente discriminativos para garantir taxas de reconhecimento aceitáveis. Na literatura, vários trabalhos advogam o uso de câmaras Pan Tilt Zoom (PTZ) para adquirir imagens de alta resolução à distância, principalmente o uso destes dispositivos no modo masterslave. Na configuração master-slave um módulo de análise inteligente seleciona zonas de interesse (e.g. carros, pessoas) a partir do vídeo adquirido por uma câmara de vídeo-vigilância e a câmara PTZ é orientada para adquirir em alta resolução as regiões de interesse. Diversos métodos já mostraram que esta configuração pode ser usada para adquirir dados biométricos à distância, ainda assim estes não foram capazes de solucionar alguns problemas relacionados com esta estratégia, impedindo assim o seu uso em ambientes de vídeo-vigilância. Deste modo, esta tese propõe dois métodos para permitir a aquisição de dados biométricos em ambientes de vídeo-vigilância usando uma câmara PTZ assistida por uma câmara típica de vídeo-vigilância. O primeiro é um método de calibração capaz de mapear de forma exata as coordenadas da câmara master para o ângulo da câmara PTZ (slave) sem o auxílio de outros dispositivos óticos. O segundo método determina a ordem pela qual um conjunto de sujeitos vai ser observado pela câmara PTZ. O método proposto consegue determinar em tempo-real a sequência de observações que maximiza o número de diferentes sujeitos observados e simultaneamente minimiza o tempo total de transição entre sujeitos. De modo a atingir o primeiro objetivo desta tese, os dois métodos propostos foram combinados com os avanços alcançados na área da monitorização de humanos para assim desenvolver o primeiro sistema de vídeo-vigilância completamente automatizado e capaz de adquirir dados biométricos a longas distâncias sem requerer a cooperação dos indivíduos no processo, designado por sistema QUIS-CAMPI. O sistema QUIS-CAMPI representa o ponto de partida para iniciar a investigação relacionada com o segundo objetivo desta tese. A análise do desempenho dos métodos de reconhecimento biométrico do estado-da-arte mostra que estes conseguem obter taxas de reconhecimento quase perfeitas em dados adquiridos sem restrições (e.g., taxas de reconhecimento maiores do que 99% no conjunto de dados LFW). Contudo, este desempenho não é corroborado pelos resultados observados em ambientes de vídeo-vigilância, o que sugere que os conjuntos de dados atuais não contêm verdadeiramente os fatores de degradação típicos dos ambientes de vídeo-vigilância. Tendo em conta as vulnerabilidades dos conjuntos de dados biométricos atuais, esta tese introduz um novo conjunto de dados biométricos (imagens da face e vídeos do tipo de passo) adquiridos pelo sistema QUIS-CAMPI a uma distância máxima de 40m e sem a cooperação dos sujeitos no processo de aquisição. Este conjunto permite avaliar de forma objetiva o desempenho dos métodos do estado-da-arte no reconhecimento de indivíduos em imagens/vídeos capturados num ambiente real de vídeo-vigilância. Como tal, este conjunto foi utilizado para promover a primeira competição de reconhecimento biométrico em ambientes não controlados. Esta tese descreve os protocolos de avaliação usados, assim como os resultados obtidos por 9 métodos especialmente desenhados para esta competição. Para além disso, os dados adquiridos pelo sistema QUIS-CAMPI foram essenciais para o desenvolvimento de dois métodos para aumentar a robustez aos fatores de degradação observados em ambientes de vídeo-vigilância. O primeiro é um método para detetar características corruptas em assinaturas biométricas através da análise da redundância entre subconjuntos de características. O segundo é um método de reconhecimento facial baseado em caricaturas automaticamente geradas a partir de uma única foto do sujeito. As experiências realizadas mostram que ambos os métodos conseguem reduzir as taxas de erro em dados adquiridos de forma não controlada
    • …
    corecore