1,952 research outputs found

    Object detection and activity recognition in digital image and video libraries

    This thesis is a comprehensive study of object-based image and video retrieval, specifically for car and human detection and activity recognition. The thesis focuses on the problem of connecting low-level features to high-level semantics by developing relational object and activity representations. With the rapid growth of multimedia information in the form of digital image and video libraries, there is an increasing need for intelligent database management tools. Traditional text-based query systems, which rely on manual annotation, are impractical for today's large libraries, which require an efficient information retrieval system. For this purpose, a hierarchical information retrieval system is proposed in which the shape, color and motion characteristics of objects of interest are captured in the compressed and uncompressed domains. The proposed retrieval method provides object detection and activity recognition at different resolution levels, from low complexity to low false rates. The thesis first examines the extraction of low-level features from images and videos using the intensity, color and motion of pixels and blocks. Local consistency based on these features and on the geometrical characteristics of the regions is used to group object parts. The problem of managing the segmentation process is solved by a new approach that uses object-based knowledge to group the regions according to a global consistency. A new model-based segmentation algorithm is introduced that uses feedback from the relational representation of the object. The selected unary and binary attributes are further extended for application-specific algorithms. Object detection is achieved by matching the relational graphs of objects with the reference model. The major advantages of the algorithm are that it improves object extraction by reducing the dependence on the low-level segmentation process and that it combines boundary and region properties.
The thesis then addresses the problem of object detection and activity recognition in the compressed domain in order to reduce computational complexity. New algorithms for object detection and activity recognition in JPEG images and MPEG videos are developed. It is shown that significant information can be obtained from the compressed domain in order to connect to high-level semantics. Since our aim is to retrieve information from images and videos compressed using standard algorithms such as JPEG and MPEG, our approach differs from previous compressed-domain object detection techniques, in which the compression algorithms are governed by the characteristics of the object of interest to be retrieved. An algorithm is developed that uses principal component analysis of MPEG motion vectors to detect human activities, namely walking, running, and kicking. Object detection in JPEG compressed still images and MPEG I frames is achieved by using the DC-DCT coefficients of the luminance and chrominance values in the graph-based object detection algorithm. The thesis finally addresses the problem of object detection in lower-resolution and monochrome images. Specifically, it is demonstrated that the structural information of human silhouettes can be captured from AC-DCT coefficients
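
The reason DC-DCT coefficients are useful for detection without full decompression can be illustrated with a small sketch. For the 8x8 DCT used by JPEG, the DC term equals the sum of the block's samples divided by 8, so the grid of DC values is a 1/8-scale approximation of the image. The function names below are hypothetical, and the sketch ignores JPEG's level shift, quantization and entropy coding; it is not the thesis's detection algorithm.

```python
import math

def dct2_coeff(block, u, v):
    """Coefficient (u, v) of the orthonormal 2-D DCT-II of a square block."""
    n = len(block)
    cu = math.sqrt(1 / n) if u == 0 else math.sqrt(2 / n)
    cv = math.sqrt(1 / n) if v == 0 else math.sqrt(2 / n)
    total = 0.0
    for x in range(n):
        for y in range(n):
            total += (block[x][y]
                      * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                      * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
    return cu * cv * total

def dc_thumbnail(image, n=8):
    """One DC value per n x n block (assumes image dimensions are multiples of n)."""
    height, width = len(image), len(image[0])
    thumb = []
    for by in range(0, height, n):
        row = []
        for bx in range(0, width, n):
            block = [image[y][bx:bx + n] for y in range(by, by + n)]
            row.append(dct2_coeff(block, 0, 0))
        thumb.append(row)
    return thumb

# Hypothetical 8x16 test image: a dark block next to a brighter block.
image = [[16] * 8 + [32] * 8 for _ in range(8)]
thumb = dc_thumbnail(image)  # each entry is 8 times the mean of its block
```

A real decoder would read the DC terms directly from the bitstream rather than recomputing them from pixels; the recomputation here only demonstrates what those terms represent.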

    WarpNet: Weakly Supervised Matching for Single-view Reconstruction

    We present an approach to matching images of objects in fine-grained datasets without using part annotations, with an application to the challenging problem of weakly supervised single-view reconstruction. This is in contrast to prior works that require part annotations, since matching objects across class and pose variations is challenging with appearance features alone. We overcome this challenge through a novel deep learning architecture, WarpNet, that aligns an object in one image with a different object in another. We exploit the structure of the fine-grained dataset to create artificial data for training this network in an unsupervised-discriminative learning approach. The output of the network acts as a spatial prior that allows generalization at test time to match real images across variations in appearance, viewpoint and articulation. On the CUB-200-2011 dataset of bird categories, we improve the AP over an appearance-only network by 13.6%. We further demonstrate that our WarpNet matches, together with the structure of fine-grained datasets, allow single-view reconstructions with quality comparable to using annotated point correspondences. Comment: to appear in IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 201

    Bending invariant correspondence matching on 3D models with feature descriptor.

    Li, Sai Man. Thesis (M.Phil.)--Chinese University of Hong Kong, 2010. Includes bibliographical references (leaves 91-96). Abstracts in English and Chinese.
    Abstract
    List of Figures
    Acknowledgement
    Chapter 1 Introduction
    1.1 Problem definition
    1.2 Proposed algorithm
    1.3 Main features
    Chapter 2 Literature Review
    2.1 Local Feature Matching techniques
    2.2 Global Iterative alignment techniques
    2.3 Other Approaches
    Chapter 3 Correspondence Matching
    3.1 Fundamental Techniques
    3.1.1 Geodesic Distance Approximation
    3.1.1.1 Dijkstra's algorithm
    3.1.1.2 Wavefront Propagation
    3.1.2 Farthest Point Sampling
    3.1.3 Curvature Estimation
    3.1.4 Radial Basis Function (RBF)
    3.1.5 Multi-dimensional Scaling (MDS)
    3.1.5.1 Classical MDS
    3.1.5.2 Fast MDS
    3.2 Matching Processes
    3.2.1 Posture Alignment
    3.2.1.1 Sign Flip Correction
    3.2.1.2 Input model Alignment
    3.2.2 Surface Fitting
    3.2.2.1 Optimizing Surface Fitness
    3.2.2.2 Optimizing Surface Smoothness
    3.2.3 Feature Matching Refinement
    3.2.3.1 Feature descriptor
    3.2.3.3 Feature Descriptor matching
    Chapter 4 Experimental Result
    4.1 Result of the Fundamental Techniques
    4.1.1 Geodesic Distance Approximation
    4.1.2 Farthest Point Sampling (FPS)
    4.1.3 Radial Basis Function (RBF)
    4.1.4 Curvature Estimation
    4.1.5 Multi-Dimensional Scaling (MDS)
    4.2 Result of the Core Matching Processes
    4.2.1 Posture Alignment Step
    4.2.2 Surface Fitting Step
    4.2.3 Feature Matching Refinement
    4.2.4 Application of the proposed algorithm
    4.2.4.1 Design Automation in Garment Industry
    4.3 Analysis
    4.3.1 Performance
    4.3.2 Accuracy
    4.3.3 Approach Comparison
    Chapter 5 Conclusion
    5.1 Strength and contributions
    5.2 Limitation and future works
    References

    Towards precise completion of deformable shapes

    According to Aristotle, “the whole is greater than the sum of its parts”. This statement was adopted to explain human perception by the Gestalt psychology school of thought in the twentieth century. Here, we claim that when observing a part of an object which was previously acquired as a whole, one could deal with both partial correspondence and shape completion in a holistic manner. More specifically, given the geometry of a full, articulated object in a given pose, as well as a partial scan of the same object in a different pose, we address the new problem of matching the part to the whole while simultaneously reconstructing the new pose from its partial observation. Our approach is data-driven and takes the form of a Siamese autoencoder without the requirement of a consistent vertex labeling at inference time; as such, it can be used on unorganized point clouds as well as on triangle meshes. We demonstrate the practical effectiveness of our model in the applications of single-view deformable shape completion and dense shape correspondence, both on synthetic and real-world geometric data, where we outperform prior work by a large margin

    Indexing and Retrieval of 3D Articulated Geometry Models

    In this PhD research study, we focus on building a content-based search engine for 3D articulated geometry models. 3D models are essential components of today's graphics applications and are widely used in the game, animation and movie production industries. With the increasing number of these models, a search engine not only provides an entry point for exploring such a huge dataset but also facilitates sharing and reuse among different users. In general, it reduces the cost and time of developing these 3D models. Although many retrieval systems have been proposed in recent years, search engines for 3D articulated geometry models are still in their infancy. Among all the works that we have surveyed, reliability and efficiency are the two main issues that hinder the popularity of such systems. In this research, we focus mainly on addressing these two issues. We have found that most existing works design features and matching algorithms to reflect the intrinsic properties of these 3D models. For instance, to handle 3D articulated geometry models, it is common to extract skeletons and use graph matching algorithms to compute the similarity. However, since this kind of feature representation is complex, it leads to high complexity of the matching algorithms. As an example, sub-graph isomorphism can be NP-hard for model graph matching. Our solution is based on the understanding that skeletal matching seeks correspondences between the two models being compared. If we can define descriptive features, the correspondence problem can be solved by bag-based matching, for which fast algorithms are available. In the first part of the research, we propose a feature extraction algorithm to extract such descriptive features. We then convert the skeletal matching problem into bag-based matching. We further define a metric similarity measure so as to support fast search. We demonstrate the advantages of this idea in our experiments.
Precision improves by 12% at high recall. Indexed search of 3D models is 24 times faster than the state of the art if only the first relevant result is returned. However, improving the quality of descriptive features comes at the price of high dimensionality. The curse of dimensionality is a notorious problem in large multimedia databases. The computation time scales exponentially as the dimension increases, and indexing techniques may not be useful in such situations. In the second part of the research, we focus on developing an embedding retrieval framework to solve the high dimensionality problem. We first argue that our proposed matching method projects 3D models onto manifolds. We then use a manifold learning technique to reduce dimensionality and maximize intra-class distances. We further propose a numerical method to sub-sample databases and search them quickly. To preserve retrieval accuracy using fewer landmark objects, we propose an alignment method that is also beneficial to existing works for fast search. Our experiments demonstrate that the retrieval framework alleviates the curse of dimensionality. It also improves the efficiency (3.4 times faster) and accuracy (30% more accurate) of the matching algorithm proposed above. In the third part of the research, we also study a closely related area, 3D motions. 3D motions are captured by attaching sensors to human subjects. The captured data are real human motions that are used to animate 3D articulated geometry models. Creating realistic 3D motions is an expensive and tedious task. Although 3D motions are very different from 3D articulated geometry models, we observe that existing works also suffer from the problem of temporal structure matching, which likewise leads to low efficiency in the matching algorithms. We apply the same idea of bag-based matching to 3D motions.
In our experiments, the proposed method achieves a 13% improvement in precision at high recall and is 12 times faster than existing works. In summary, we have developed algorithms for 3D articulated geometry models and 3D motions, covering feature extraction, feature matching, indexing and fast search methods. Through various experiments, we show that our idea of converting restricted matching to bag-based matching improves matching efficiency and reliability for both 3D articulated geometry models and 3D motions. We have also connected 3D matching to the area of manifold learning. The embedding retrieval framework not only improves efficiency and accuracy, but has also opened a new area of research
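
The idea of replacing structural (graph) matching with bag-based matching can be sketched in a few lines. This is one simple, generic instance and not the thesis's own features or similarity measure: descriptors are assigned to the nearest entry of an assumed codebook, each model becomes a normalised histogram, and histograms are compared with the L1 distance, which satisfies the triangle inequality and so is compatible with metric indexing. All names and the toy data are hypothetical.

```python
from collections import Counter

def nearest_word(descriptor, codebook):
    """Index of the codebook entry closest to the descriptor (squared Euclidean)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(descriptor, codebook[i])),
    )

def bag_histogram(descriptors, codebook):
    """Normalised histogram of codebook assignments; descriptor order is ignored.

    Assumes a non-empty descriptor list.
    """
    counts = Counter(nearest_word(d, codebook) for d in descriptors)
    total = len(descriptors)
    return [counts[i] / total for i in range(len(codebook))]

def bag_distance(h1, h2):
    """L1 distance between two histograms of equal length."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

# Toy data: a two-word codebook and three "models" given as descriptor bags.
codebook = [(0.0, 0.0), (1.0, 1.0)]
model_a = [(0.1, 0.0), (0.9, 1.0), (1.1, 0.9)]
model_b = [(1.1, 0.9), (0.1, 0.0), (0.9, 1.0)]   # same bag, different order
model_c = [(0.0, 0.1), (0.1, 0.1), (0.0, 0.0)]

hist_a = bag_histogram(model_a, codebook)
hist_b = bag_histogram(model_b, codebook)
hist_c = bag_histogram(model_c, codebook)
```

Because the comparison costs time linear in the codebook size and needs no correspondence search between skeleton nodes, it avoids the graph-matching cost the abstract describes, at the price of discarding the structural relations between features.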

    Analysis and Manipulation of Repetitive Structures of Varying Shape

    Self-similarity and repetitions are ubiquitous in man-made and natural objects. Such structural regularities often relate to form, function, aesthetics, and design considerations. Discovering structural redundancies along with their dominant variations from 3D geometry not only allows us to better understand the underlying objects, but is also beneficial for several geometry processing tasks including compact representation, shape completion, and intuitive shape manipulation. To identify these repetitions, we present a novel detection algorithm based on analyzing a graph of surface features. We combine general feature detection schemes with a RANSAC-based randomized subgraph searching algorithm in order to reliably detect recurring patterns of locally unique structures. A subsequent segmentation step based on a simultaneous region growing is applied to verify that the actual data supports the patterns detected in the feature graphs. We introduce our graph based detection algorithm on the example of rigid repetitive structure detection. Then we extend the approach to allow more general deformations between the detected parts. We introduce subspace symmetries whereby we characterize similarity by requiring the set of repeating structures to form a low dimensional shape space. We discover these structures based on detecting linearly correlated correspondences among graphs of invariant features. The found symmetries along with the modeled variations are useful for a variety of applications including non-local and non-rigid denoising. Employing subspace symmetries for shape editing, we introduce a morphable part model for smart shape manipulation. The input geometry is converted to an assembly of deformable parts with appropriate boundary conditions. 
Our method uses self-similarities from a single model or corresponding parts of shape collections as training input and also allows the user to reassemble the identified parts in new configurations, thus exploiting both the discrete and continuous learned variations while ensuring appropriate boundary conditions across part boundaries. We obtain an interactive yet intuitive shape deformation framework that produces realistic deformations on classes of objects that are difficult to edit using repetition-unaware deformation techniques
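
The hypothesize-and-verify principle behind RANSAC-style repetition detection can be shown on a much simpler problem than the feature-graph search described above. The sketch below assumes 2D feature points and a pure translation as the repetition model: a random pair of points proposes a translation, and the proposal is scored by how many points land on another point when shifted. The function names, the tolerance and the toy point set are all assumptions for illustration.

```python
import random

def count_inliers(points, t, tol):
    """Number of points that coincide with some point after shifting by t."""
    hits = 0
    for (x, y) in points:
        sx, sy = x + t[0], y + t[1]
        if any(abs(sx - qx) <= tol and abs(sy - qy) <= tol for (qx, qy) in points):
            hits += 1
    return hits

def ransac_translation(points, iterations=200, tol=1e-6, seed=0):
    """Return the sampled translation with the most inliers and its inlier count."""
    rng = random.Random(seed)          # fixed seed keeps the sketch reproducible
    best_t, best_hits = None, 0
    for _ in range(iterations):
        p, q = rng.sample(points, 2)   # two distinct points give one hypothesis
        t = (q[0] - p[0], q[1] - p[1])
        hits = count_inliers(points, t, tol)
        if hits > best_hits:
            best_t, best_hits = t, hits
    return best_t, best_hits

# Four features repeating every 2 units along x, plus two unrelated features.
points = [(0, 0), (2, 0), (4, 0), (6, 0), (1, 3), (5, -2)]
translation, support = ransac_translation(points)
```

The recovered translation may point in either direction along the row, since a shift of 2 and a shift of -2 are supported equally well. Handling rotations, deformations or graph structure, as the abstract describes, would require a richer hypothesis model than this.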