129 research outputs found

    SFNet: Learning Object-aware Semantic Correspondence

    We address the problem of semantic correspondence, that is, establishing a dense flow field between images depicting different instances of the same object or scene category. We propose to use images annotated with binary foreground masks and subjected to synthetic geometric deformations to train a convolutional neural network (CNN) for this task. Using these masks as part of the supervisory signal offers a good compromise between semantic flow methods, where the amount of training data is limited by the cost of manually selecting point correspondences, and semantic alignment ones, where the regression of a single global geometric transformation between images may be sensitive to image-specific details such as background clutter. We propose a new CNN architecture, dubbed SFNet, which implements this idea. It leverages a new and differentiable version of the argmax function for end-to-end training, with a loss that combines mask and flow consistency with smoothness terms. Experimental results demonstrate the effectiveness of our approach, which significantly outperforms the state of the art on standard benchmarks. Comment: CVPR 2019 oral paper.
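
    The abstract above does not spell out how its differentiable argmax is defined. As a rough illustration only, one common differentiable surrogate is the soft-argmax, a softmax-weighted average of indices. The sketch below assumes a 1-D list of matching scores; the function name `soft_argmax` and the temperature `beta` are hypothetical and are not taken from SFNet.

```python
import math


def soft_argmax(scores, beta=10.0):
    """Differentiable surrogate for argmax over a 1-D list of scores.

    Returns the softmax-weighted average of the indices. `beta` is an
    assumed temperature: larger values approach the hard argmax.
    """
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(beta * (s - m)) for s in scores]
    total = sum(weights)
    return sum(i * w for i, w in enumerate(weights)) / total


scores = [0.1, 0.2, 3.0, 0.3]
print(soft_argmax(scores))            # close to 2, the index of the peak
print(soft_argmax(scores, beta=0.0))  # uniform weights give the mean index
```

    With a large `beta` the output approaches the index of the largest score, while `beta = 0` weights every index equally. Unlike a hard argmax, the output varies smoothly with the scores, which is what makes gradient-based end-to-end training possible.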

    DCTM: Discrete-Continuous Transformation Matching for Semantic Flow

    Techniques for dense semantic correspondence have provided limited ability to deal with the geometric variations that commonly exist between semantically similar images. While variations due to scale and rotation have been examined, practical solutions are lacking for more complex deformations such as affine transformations because of the tremendous size of the associated solution space. To address this problem, we present a discrete-continuous transformation matching (DCTM) framework where dense affine transformation fields are inferred through a discrete label optimization in which the labels are iteratively updated via continuous regularization. In this way, our approach draws solutions from the continuous space of affine transformations in a manner that can be computed efficiently through constant-time edge-aware filtering and a proposed affine-varying CNN-based descriptor. Experimental results show that this model outperforms the state-of-the-art methods for dense semantic correspondence on various benchmarks.

    Towards Robust and Accurate Image Registration by Incorporating Anatomical and Appearance Priors

    Ph.D. Doctor of Philosophy

    Bending invariant correspondence matching on 3D models with feature descriptor.

    Li, Sai Man. Thesis (M.Phil.)--Chinese University of Hong Kong, 2010. Includes bibliographical references (leaves 91-96). Abstracts in English and Chinese.

    Contents:
    Abstract
    List of Figures
    Acknowledgement
    Chapter 1. Introduction
    1.1 Problem definition
    1.2 Proposed algorithm
    1.3 Main features
    Chapter 2. Literature Review
    2.1 Local Feature Matching techniques
    2.2 Global Iterative alignment techniques
    2.3 Other Approaches
    Chapter 3. Correspondence Matching
    3.1 Fundamental Techniques
    3.1.1 Geodesic Distance Approximation
    3.1.1.1 Dijkstra's algorithm
    3.1.1.2 Wavefront Propagation
    3.1.2 Farthest Point Sampling
    3.1.3 Curvature Estimation
    3.1.4 Radial Basis Function (RBF)
    3.1.5 Multi-dimensional Scaling (MDS)
    3.1.5.1 Classical MDS
    3.1.5.2 Fast MDS
    3.2 Matching Processes
    3.2.1 Posture Alignment
    3.2.1.1 Sign Flip Correction
    3.2.1.2 Input model Alignment
    3.2.2 Surface Fitting
    3.2.2.1 Optimizing Surface Fitness
    3.2.2.2 Optimizing Surface Smoothness
    3.2.3 Feature Matching Refinement
    3.2.3.1 Feature descriptor
    3.2.3.3 Feature Descriptor matching
    Chapter 4. Experimental Result
    4.1 Result of the Fundamental Techniques
    4.1.1 Geodesic Distance Approximation
    4.1.2 Farthest Point Sampling (FPS)
    4.1.3 Radial Basis Function (RBF)
    4.1.4 Curvature Estimation
    4.1.5 Multi-Dimensional Scaling (MDS)
    4.2 Result of the Core Matching Processes
    4.2.1 Posture Alignment Step
    4.2.2 Surface Fitting Step
    4.2.3 Feature Matching Refinement
    4.2.4 Application of the proposed algorithm
    4.2.4.1 Design Automation in Garment Industry
    4.3 Analysis
    4.3.1 Performance
    4.3.2 Accuracy
    4.3.3 Approach Comparison
    Chapter 5. Conclusion
    5.1 Strength and contributions
    5.2 Limitation and future works
    References

    Learning to Generate and Refine Object Proposals

    Visual object recognition is a fundamental and challenging problem in computer vision. To build a practical recognition system, one is first confronted with high computational complexity due to the enormous search space of an image, which is caused by large variations in object appearance, pose and mutual occlusion, as well as other environmental factors. To reduce the search complexity, modern object recognition systems usually first generate a moderate set of image regions that are likely to contain an object, regardless of its category. These possible object regions are called object proposals, object hypotheses or object candidates, and can be used for downstream classification or global reasoning in many different vision tasks such as object detection, segmentation and tracking. This thesis addresses the problem of object proposal generation, including bounding box and segment proposal generation, in real-world scenarios. In particular, we investigate representation learning in object proposal generation with 3D cues and contextual information, aiming to propose higher-quality object candidates that achieve higher object recall and better boundary coverage with fewer candidates. We focus on three main issues: 1) how can we incorporate additional geometric and high-level semantic context information into proposal generation for stereo images? 2) how do we generate object segment proposals for stereo images with learned representations and a learned grouping process? and 3) how can we learn a context-driven representation to refine segment proposals efficiently? In this thesis, we propose a series of solutions to address each of the raised problems. We first propose a semantic context and depth-aware object proposal generation method. We design a set of new cues to encode objectness, and then train an efficient random forest classifier to re-rank the initial proposals and linear regressors to fine-tune their locations. Next, we extend the task to segment proposal generation in the same setting and develop a learning-based segment proposal generation method for stereo images. Our method makes use of learned deep features and designed geometric features to represent a region and learns a similarity network to guide the superpixel grouping process. We also learn a ranking network to predict the objectness score for each segment proposal. To address the third problem, we take a transformation-based approach to improve the quality of a given segment candidate pool based on context information. We propose an efficient deep network that learns affine transformations to warp an initial object mask towards a nearby object region, based on a novel feature pooling strategy. Finally, we extend our affine warping approach to address the object-mask alignment problem and particularly the problem of refining a set of segment proposals. We design an end-to-end deep spatial transformer network that learns free-form deformations (FFDs) to non-rigidly warp the shape mask towards the ground truth, based on a multi-level dual mask feature pooling strategy. We evaluate all our approaches on several publicly available object recognition datasets and show superior performance.

    Deformable Medical Image Registration: A Survey

    Deformable image registration is a fundamental task in medical image processing. Among its most important applications, one may cite: i) multi-modality fusion, where information acquired by different imaging devices or protocols is fused to facilitate diagnosis and treatment planning; ii) longitudinal studies, where temporal structural or anatomical changes are investigated; and iii) population modeling and statistical atlases used to study normal anatomical variability. In this technical report, we attempt to give an overview of deformable registration methods, putting emphasis on the most recent advances in the domain. Additional emphasis has been given to techniques applied to medical images. In order to study image registration methods in depth, their main components are identified and studied independently. The most recent techniques are presented in a systematic fashion. The contribution of this technical report is to provide an extensive account of registration techniques in a systematic manner.

    Indexing and Retrieval of 3D Articulated Geometry Models

    In this PhD research study, we focus on building a content-based search engine for 3D articulated geometry models. 3D models are essential components in today's graphics applications, and are widely used in the game, animation and movie production industries. With the increasing number of these models, a search engine not only provides an entrance to explore such a huge dataset, it also facilitates sharing and reuse among different users. In general, it reduces the cost and time needed to develop these 3D models. Though a lot of retrieval systems have been proposed in recent years, search engines for 3D articulated geometry models are still in their infancy. Among all the works that we have surveyed, reliability and efficiency are the two main issues that hinder the popularity of such systems. In this research, we have focused mainly on addressing these two issues. We have discovered that most existing works design features and matching algorithms in order to reflect the intrinsic properties of these 3D models. For instance, to handle 3D articulated geometry models, it is common to extract skeletons and use graph matching algorithms to compute the similarity. However, since this kind of feature representation is complex, it leads to high complexity of the matching algorithms. As an example, sub-graph isomorphism can be NP-hard for model graph matching. Our solution is based on the understanding that skeletal matching seeks correspondences between the two models being compared. If we can define descriptive features, the correspondence problem can be solved by bag-based matching, for which fast algorithms are available. In the first part of the research, we propose a feature extraction algorithm to extract such descriptive features. We then convert the skeletal matching problems into bag-based matching. We further define a metric similarity measure so as to support fast search. We demonstrate the advantages of this idea in our experiments. 
Precision improves by 12% at high recall. Indexed search of 3D models is 24 times faster than the state of the art if only the first relevant result is returned. However, improving the quality of descriptive features comes at the price of high dimensionality. The curse of dimensionality is a notorious problem on large multimedia databases. The computation time scales exponentially as the dimension increases, and indexing techniques may not be useful in such situations. In the second part of the research, we focus on developing an embedding retrieval framework to solve the high dimensionality problem. We first argue that our proposed matching method projects 3D models on manifolds. We then use a manifold learning technique to reduce dimensionality and maximize intra-class distances. We further propose a numerical method to sub-sample and quickly search databases. To preserve retrieval accuracy using fewer landmark objects, we propose an alignment method which is also beneficial to existing works for fast search. Our experiments demonstrate that the retrieval framework alleviates the curse of dimensionality. It also improves the efficiency (3.4 times faster) and accuracy (30% more accurate) of our matching algorithm proposed above. In the third part of the research, we also study a closely related area, 3D motions. 3D motions are captured by attaching sensors to human subjects. These captured data are real human motions that are used to animate 3D articulated geometry models. Creating realistic 3D motions is an expensive and tedious task. Although 3D motions are very different from 3D articulated geometry models, we observe that existing works also suffer from the problem of temporal structure matching. This also leads to low efficiency in the matching algorithms. We apply the same idea of bag-based matching to 3D motions. 
From our experiments, the proposed method has a 13% improvement in precision at high recall and is 12 times faster than existing works. In summary, we have developed algorithms for 3D articulated geometry models and 3D motions, covering feature extraction, feature matching, indexing and fast search methods. Through various experiments, our idea of converting restricted matching to bag-based matching improves matching efficiency and reliability. This has been shown for both 3D articulated geometry models and 3D motions. We have also connected 3D matching to the area of manifold learning. The embedding retrieval framework not only improves efficiency and accuracy, but has also opened a new area of research.

    Content Recognition and Context Modeling for Document Analysis and Retrieval

    The nature and scope of available documents are changing significantly in many areas of document analysis and retrieval as complex, heterogeneous collections become accessible to virtually everyone via the web. The increasing level of diversity presents a great challenge for document image content categorization, indexing, and retrieval. Meanwhile, the processing of documents with unconstrained layouts and complex formatting often requires effective leveraging of broad contextual knowledge. In this dissertation, we first present a novel approach for document image content categorization, using a lexicon of shape features. Each lexical word corresponds to a scale and rotation invariant local shape feature that is generic enough to be detected repeatably and is segmentation free. A concise, structurally indexed shape lexicon is learned by clustering and partitioning feature types through graph cuts. Our idea finds successful application in several challenging tasks, including content recognition of diverse web images and language identification on documents composed of mixed machine printed text and handwriting. Second, we address two fundamental problems in signature-based document image retrieval. Facing continually increasing volumes of documents, detecting and recognizing unique, evidentiary visual entities (e.g., signatures and logos) provides a practical and reliable supplement to the OCR recognition of printed text. We propose a novel multi-scale framework to detect and segment signatures jointly from document images, based on the structural saliency under a signature production model. We formulate the problem of signature retrieval in the unconstrained setting of geometry-invariant deformable shape matching and demonstrate state-of-the-art performance in signature matching and verification. Third, we present a model-based approach for extracting relevant named entities from unstructured documents. 
In a wide range of applications that require structured information from diverse, unstructured document images, processing OCR text does not give satisfactory results due to the absence of linguistic context. Our approach enables learning of inference rules collectively based on contextual information from both page layout and text features. Finally, we demonstrate the importance of mining general web user behavior data for improving document ranking and other aspects of the web search experience. The context of web user activities reveals their preferences and intents, and we emphasize the analysis of individual user sessions for creating aggregate models. We introduce a novel algorithm for estimating web page and web site importance, and discuss its theoretical foundation based on an intentional surfer model. We demonstrate that our approach significantly improves large-scale document retrieval performance.

    Image Registration for Quantitative Parametric Response Mapping of Cancer Treatment Response

    Imaging biomarkers capable of early quantification of tumor response to therapy would provide an opportunity to individualize patient care. Image registration of longitudinal scans provides a method of detecting treatment-associated changes within heterogeneous tumors by monitoring alterations in the quantitative value of individual voxels over time, which is unattainable by traditional volumetric-based histogram methods. The concepts involved in the use of image registration for tracking and quantifying breast cancer treatment response using parametric response mapping (PRM), a voxel-based analysis of diffusion-weighted magnetic resonance imaging (DW-MRI) scans, are presented. Application of PRM to breast tumor response detection is described, wherein robust registration solutions for tracking small changes in water diffusivity in breast tumors during therapy are required. Methodologies that employ simulations are presented for measuring the expected statistical accuracy of PRM for response assessment. Test-retest clinical scans are used to yield estimates of system noise to indicate significant voxel-based changes in water diffusivity. Overall, registration-based PRM image analysis provides significant opportunities for voxel-based image analysis to provide the required accuracy for early assessment of response to treatment in breast cancer patients receiving neoadjuvant chemotherapy.
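
    The voxel-wise idea described above can be sketched in a few lines, under strong simplifying assumptions: the two scans are already registered to the same voxel grid, each voxel holds a single quantitative value, and a single symmetric noise threshold (for example, one estimated from test-retest scans) separates significant from insignificant change. The function names, labels and numbers below are hypothetical and are not taken from the paper.

```python
def classify_voxels(pre, post, threshold):
    """Label each co-registered voxel by its change between two scans.

    `pre` and `post` are equal-length lists of voxel values from registered
    scans; `threshold` is an assumed noise bound on the per-voxel change.
    """
    if len(pre) != len(post):
        raise ValueError("scans must be registered to the same voxel grid")
    labels = []
    for before, after in zip(pre, post):
        delta = after - before
        if delta > threshold:
            labels.append("increased")
        elif delta < -threshold:
            labels.append("decreased")
        else:
            labels.append("unchanged")
    return labels


def fraction(labels, name):
    """Fraction of voxels carrying the given label."""
    return labels.count(name) / len(labels)


# Illustrative values only, not measured data.
pre = [1.0, 1.2, 0.9, 1.1]
post = [1.5, 1.25, 0.5, 1.1]
labels = classify_voxels(pre, post, threshold=0.3)
print(labels)
print(fraction(labels, "increased"))
```

    Summarizing the tumor by the fraction of voxels in each class keeps the spatial heterogeneity that a whole-volume histogram would average away, which is the motivation the abstract gives for a voxel-based analysis.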

    Variational methods for shape and image registrations.

    The estimation and analysis of deformation, either rigid or non-rigid, is an active area of research in various medical imaging and computer vision applications. Its importance stems from the inherent inter- and intra-variability in biological and biomedical object shapes and from the dynamic nature of the scenes usually dealt with in computer vision research. For instance, quantifying the growth of a tumor, recognizing a person's face, tracking a facial expression, or retrieving an object inside a database requires the estimation of some sort of motion or deformation undergone by the object of interest. To solve these and other similar problems, registration comes into play. This is the process of bringing two or more data sets into correspondence. Depending on the application at hand, these data sets can be, for instance, gray scale/color images or objects' outlines. In the latter case, one talks about shape registration, while in the former case, one talks about image/volume registration. In some situations, the combination of different types of data can be used complementarily to establish point correspondences. One of the most important image analysis tools that greatly benefits from the process of registration, and which will be addressed in this dissertation, is image segmentation. This process consists of localizing objects in images. Several challenges are encountered in image segmentation, including noise, gray scale inhomogeneities, and occlusions. To cope with such issues, the shape information is often incorporated as a statistical model into the segmentation process. Building such statistical models requires a good and accurate shape alignment approach. In addition, segmenting anatomical structures can be accurately solved through the registration of the input data set with a predefined anatomical atlas. Variational approaches for shape/image registration and segmentation have received huge interest in the past few years. 
Unlike traditional discrete approaches, the variational methods are based on continuous modelling of the input data through the use of Partial Differential Equations (PDEs). This makes it possible to draw on the extensive literature on theory and numerical methods proposed to solve PDEs. This dissertation addresses the registration problem from a variational point of view, with more focus on shape registration. First, a novel variational framework for global-to-local shape registration is proposed. The input shapes are implicitly represented through their signed distance maps. A new Sum-of-Squared-Differences (SSD) criterion, which measures the disparity between the implicit representations of the input shapes, is introduced to recover the global alignment parameters. This new criterion has the advantage over some existing ones of accurately handling scale variations. In addition, the proposed alignment model is less expensive computationally. Complementary to the global registration field, the local deformation field is explicitly established between the two globally aligned shapes, by minimizing a new energy functional. This functional incrementally and simultaneously updates the displacement field while keeping the corresponding implicit representation of the globally warped source shape as close to a signed distance function as possible. This is done under some regularization constraints that enforce the smoothness of the recovered deformations. The overall process leads to a set of coupled equations that are simultaneously solved through a gradient descent scheme. Several applications, where the developed tools play a major role, are addressed throughout this dissertation. For instance, some insight is given as to how one can solve the challenging problem of three-dimensional face recognition in the presence of facial expressions. Statistical modelling of shapes will be presented as a way of benefiting from the proposed shape registration framework. 
Second, this dissertation will visit th