9 research outputs found

    Three-dimensional block matching using orthonormal tree-structured haar transform for multichannel images

    Multichannel images, i.e., images of the same object or scene taken in different spectral bands or with different imaging modalities/settings, are common in many applications. For example, multispectral images contain several wavelength bands and hence carry richer information than color images. Multichannel magnetic resonance imaging and multichannel computed tomography images are common in medical imaging diagnostics, and multimodal images are routinely used in art investigation. Any method for grayscale images can be applied to multichannel images by processing each channel/band separately, but this requires substantial computational time, especially when searching for overlapping patches similar to a given query patch. To address this problem, we propose a three-dimensional orthonormal tree-structured Haar transform (3D-OTSHT) targeting fast full-search-equivalent three-dimensional block matching in multichannel images. The use of a three-dimensional integral image significantly reduces the time needed to obtain the 3D-OTSHT coefficients. We demonstrate the superior performance of the proposed block matching.
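
    The key enabling structure here is the three-dimensional integral image (summed volume table), which yields the sum over any axis-aligned block of a multichannel image in constant time; the 3D-OTSHT coefficients are assembled from such block sums. Below is a minimal NumPy sketch of that building block, with function names of our own choosing rather than the paper's:

```python
import numpy as np

def integral_volume(vol):
    """3D integral image (summed volume table) with a zero-padded border.

    vol: H x W x C array (rows, columns, channels/bands).
    Returns an (H+1) x (W+1) x (C+1) table so box sums need no boundary checks.
    """
    ivol = np.zeros(tuple(s + 1 for s in vol.shape), dtype=np.float64)
    ivol[1:, 1:, 1:] = vol.cumsum(0).cumsum(1).cumsum(2)
    return ivol

def box_sum(ivol, y0, y1, x0, x1, c0, c1):
    """Sum of vol[y0:y1, x0:x1, c0:c1] in O(1) via 3D inclusion-exclusion."""
    return (ivol[y1, x1, c1] - ivol[y0, x1, c1] - ivol[y1, x0, c1] - ivol[y1, x1, c0]
            + ivol[y0, x0, c1] + ivol[y0, x1, c0] + ivol[y1, x0, c0]
            - ivol[y0, x0, c0])
```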

    Robust Visual Correspondence: Theory and Applications

    Visual correspondence is one of the most important tasks in computer vision. Given two sets of pixels (i.e., two images), it aims at finding corresponding pixel pairs belonging to the two sets (homologous pixels). Visual correspondence is commonly employed in fields such as stereo correspondence, change detection, image registration, motion estimation, pattern matching, and image vector quantization. The task can be extremely challenging in the presence of the disturbance factors that typically affect images. A common source of disturbance is photometric distortion between the images under comparison. These distortions can be ascribed to the camera sensors employed in the image acquisition process (due to dynamic variations of camera parameters such as auto-exposure and auto-gain, or to the use of different cameras), or can be induced by external factors such as changes in the amount of light emitted by the sources or the viewing of non-Lambertian surfaces at different angles. All of these factors tend to produce brightness changes in corresponding pixels of the two images that cannot be neglected in real applications involving visual correspondence between images acquired from different spatial points (e.g., stereo vision) and/or different time instants (e.g., pattern matching, change detection). In addition to photometric distortions, differences between corresponding pixels can also be due to the noise introduced by camera sensors. Finally, acquiring images from different spatial points or at different time instants can also induce occlusions. Evaluation assessments have also been proposed that compare visual correspondence approaches for tasks such as stereo correspondence (Chambon & Crouzil, 2003), image registration (Zitova & Flusser, 2003), and image motion (Giachetti, 2000).
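
    To make the photometric-distortion point concrete, a matching measure such as zero-mean normalized cross-correlation (ZNCC) is invariant to affine (gain and offset) brightness changes between the compared patches. The following minimal sketch is offered as an illustration of that idea, not as code from the work itself:

```python
import numpy as np

def zncc(patch_a, patch_b, eps=1e-9):
    """Zero-mean normalized cross-correlation between two equally sized patches.

    Subtracting the mean cancels additive offsets; dividing by the norms of the
    zero-mean patches cancels multiplicative gain, so affine brightness changes
    do not affect the score (range roughly [-1, 1]).
    """
    a = patch_a.astype(np.float64).ravel()
    b = patch_b.astype(np.float64).ravel()
    a -= a.mean()
    b -= b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum()) + eps
    return float((a * b).sum() / denom)
```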

    Matchability prediction for full-search template matching algorithms

    While recent approaches have shown that it is possible to do template matching by exhaustively scanning the parameter space, the resulting algorithms are still quite demanding. In this paper we alleviate the computational load of these algorithms by proposing an efficient approach for predicting the matchability of a template before the matching is actually performed. This avoids large amounts of unnecessary computation. We learn the matchability of templates by using dense convolutional neural network descriptors that do not require ad-hoc criteria to characterize a template. By using deep-learned descriptions of patches we are able to predict matchability over the whole image quite reliably. We also show that no scene-specific training data is required to solve problems such as panorama stitching, which usually require data from the scene in question. Due to the highly parallelizable nature of this task, we offer an efficient technique with negligible computational cost at test time.
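
    The pipeline implied above can be sketched as a prefilter around a standard exhaustive matcher: a learned predictor scores each template, and the full search runs only for templates deemed matchable. In the sketch below, predict_matchability is a hypothetical stand-in for the descriptor-based predictor described in the paper, and OpenCV's matchTemplate stands in for the exhaustive matching stage:

```python
import cv2

def match_if_promising(image, template, predict_matchability, threshold=0.5):
    """Run exhaustive template matching only for templates predicted to be matchable.

    `predict_matchability` is a placeholder for a learned predictor mapping a
    template to a score in [0, 1]. Templates scoring below `threshold` are
    skipped, avoiding the expensive full search.
    """
    score = predict_matchability(template)
    if score < threshold:
        return None  # predicted unmatchable: skip the exhaustive search
    response = cv2.matchTemplate(image, template, cv2.TM_CCOEFF_NORMED)
    _, max_val, _, max_loc = cv2.minMaxLoc(response)
    return max_loc, max_val, score
```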

    Full search-equivalent pattern matching with Incremental Dissimilarity Approximations

    This paper proposes a novel method for fast pattern matching based on dissimilarity functions derived from the Lp norm, such as the Sum of Squared Differences (SSD) and the Sum of Absolute Differences (SAD). The proposed method is full-search equivalent, i.e., it yields the same results as the Full Search (FS) algorithm. In order to pursue computational savings, the method deploys a succession of increasingly tighter lower bounds of the adopted Lp-norm-based dissimilarity function. These bounding functions allow for establishing a hierarchy of pruning conditions aimed at rapidly skipping those candidates that cannot satisfy the matching criterion. The paper includes an experimental comparison between the proposed method and other full-search-equivalent approaches known in the literature, which demonstrates the remarkable computational efficiency of our proposal.
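
    A minimal sketch of the bounding-and-pruning idea follows, using partitioned-norm lower bounds on the SSD derived from the reverse triangle inequality; this illustrates the scheme of checking cheap bounds before the exact dissimilarity, not the paper's exact bounding functions:

```python
import numpy as np

def best_match_ssd_pruned(template, candidates):
    """Full-search-equivalent SSD matching with norm-based pruning.

    By the reverse triangle inequality, for any partition of a patch into
    disjoint regions r: sum_r (||x_r|| - ||y_r||)^2 <= SSD(x, y). Finer
    partitions give tighter lower bounds, so cheap bounds are evaluated first
    and the exact SSD only for candidates that survive pruning.
    """
    t = template.astype(np.float64).ravel()
    t_norm = np.linalg.norm(t)
    t_parts = np.array([np.linalg.norm(p) for p in np.array_split(t, 4)])

    best_ssd, best_idx = np.inf, None
    for i, cand in enumerate(candidates):
        c = cand.astype(np.float64).ravel()
        # Bound 1: whole-patch norms (cheapest check).
        if (t_norm - np.linalg.norm(c)) ** 2 >= best_ssd:
            continue
        # Bound 2: partitioned norms (tighter, slightly more expensive).
        c_parts = np.array([np.linalg.norm(p) for p in np.array_split(c, 4)])
        if ((t_parts - c_parts) ** 2).sum() >= best_ssd:
            continue
        ssd = ((t - c) ** 2).sum()  # exact dissimilarity, computed last
        if ssd < best_ssd:
            best_ssd, best_idx = ssd, i
    return best_idx, best_ssd
```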

    Deep learning for object detection in robotic grasping contexts

    In the last decade, deep convolutional neural networks have become the standard for computer vision applications. As opposed to classical methods, which are based on rules and hand-designed features, neural networks are optimized and learned directly from a set of labeled training data specific to a given task. In practice, both obtaining sufficient labeled training data and interpreting network outputs can be problematic. Additionally, a neural network has to be retrained for new tasks or new sets of objects. Overall, while they perform really well, deep neural network approaches can be challenging to deploy. In this thesis, we propose strategies aimed at solving or working around these limitations for object detection. First, we propose a cascade approach in which a neural network is used as a prefilter to a template matching approach, allowing increased performance while keeping the interpretability of the matching method. Second, we propose another cascade approach in which a weakly-supervised network generates object-specific heatmaps that can be used to infer object positions in an image. This approach simplifies the training process and decreases the number of training images required to obtain state-of-the-art performance. Finally, we propose a neural network architecture and a training procedure allowing the detection of objects that were not seen during training, thus removing the need to retrain networks for new objects.
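
    The first cascade described above can be sketched as follows: a network produces a score map, and template matching is run only inside windows around the top-scoring locations. In this sketch, heatmap_fn is a hypothetical stand-in for the trained network, the inputs are assumed to be single-channel 8-bit images, and OpenCV's matchTemplate stands in for the matching stage:

```python
import cv2
import numpy as np

def cascade_detect(image, template, heatmap_fn, keep_top=5, win=64):
    """Cascade sketch: a network prefilters locations, template matching refines them.

    `heatmap_fn` maps the image to a per-pixel score map of the same height and
    width. Template matching runs only inside windows around the top-scoring
    pixels, keeping the interpretability of the matching stage.
    """
    heat = heatmap_fn(image)
    top = np.argsort(heat.ravel())[::-1][:keep_top]  # highest-scoring pixels
    best = None
    for idx in top:
        y, x = np.unravel_index(idx, heat.shape)
        y0, y1 = max(0, y - win), min(image.shape[0], y + win)
        x0, x1 = max(0, x - win), min(image.shape[1], x + win)
        roi = image[y0:y1, x0:x1]
        if roi.shape[0] < template.shape[0] or roi.shape[1] < template.shape[1]:
            continue  # window too small to contain the template
        res = cv2.matchTemplate(roi, template, cv2.TM_CCOEFF_NORMED)
        _, val, _, loc = cv2.minMaxLoc(res)
        if best is None or val > best[0]:
            best = (val, (x0 + loc[0], y0 + loc[1]))
    return best  # (score, top-left corner in image coordinates) or None
```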

    Real-time stereo matching system for high-resolution images using GPUs (GPUを用いた高解像度画像の実時間ステレオマッチングシステムの研究)

    University of Tsukuba (筑波大学), 201