
    Robust and Efficient Graph Correspondence Transfer for Person Re-identification

    Spatial misalignment caused by variations in pose and viewpoint is one of the most critical issues hindering performance improvement in existing person re-identification (Re-ID) algorithms. To address this problem, we present a robust and efficient graph correspondence transfer (REGCT) approach for explicit spatial alignment in Re-ID. Specifically, we propose to establish the patch-wise correspondences of positive training pairs via graph matching. By exploiting both the spatial and visual contexts of human appearance in graph matching, meaningful semantic correspondences can be obtained. To circumvent cumbersome on-line graph matching in the testing phase, we propose to transfer the off-line learned patch-wise correspondences from the positive training pairs to test pairs. In detail, for each test pair, the training pairs with similar pose-pair configurations are selected as references. The matching patterns (i.e., the correspondences) of the selected references are then used to calculate the patch-wise feature distances of this test pair. To enhance the robustness of correspondence transfer, we design a novel pose context descriptor to accurately model human body configurations, and present an approach to measure the similarity between a pair of pose context descriptors. Meanwhile, to improve testing efficiency, we propose a correspondence template ensemble method using a voting mechanism, which significantly reduces the number of patch-wise matchings involved in distance calculation. With the aforementioned strategies, the REGCT model can effectively and efficiently handle the spatial misalignment problem in Re-ID. Extensive experiments on five challenging benchmarks, including VIPeR, Road, PRID450S, 3DPES and CUHK01, demonstrate the superior performance of REGCT over other state-of-the-art approaches.
    Comment: Tech. Report. The source code is available at http://www.dabi.temple.edu/~hbling/code/gct.htm. arXiv admin note: text overlap with arXiv:1804.0024
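
    The voting idea in the abstract above can be illustrated with a minimal sketch. It assumes each selected reference pair contributes a list of (probe_patch, gallery_patch) index pairs and that the template keeps the most-voted pairs. The function names, data layout, and top_k cutoff are hypothetical and are not taken from the REGCT source code.

```python
from collections import Counter

def build_template(reference_matchings, top_k):
    """Keep the top_k patch correspondences that receive the most votes.

    reference_matchings: list of matchings, one per selected reference pair;
    each matching is a list of (probe_patch, gallery_patch) index pairs.
    (Hypothetical data layout, chosen for illustration.)
    """
    votes = Counter()
    for matching in reference_matchings:
        votes.update(set(matching))  # one vote per reference pair
    # Sort by descending vote count, then by pair for a deterministic order.
    ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return [pair for pair, _ in ranked[:top_k]]

def template_distance(template, probe_feats, gallery_feats):
    """Sum squared Euclidean distances over the template's patch pairs only."""
    total = 0.0
    for i, j in template:
        total += sum((a - b) ** 2 for a, b in zip(probe_feats[i], gallery_feats[j]))
    return total
```

    Restricting the distance to the voted template is what reduces the number of patch-wise comparisons in this sketch: only top_k pairs are evaluated per test pair, regardless of how many references were selected.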

    Geometric Hypergraph Learning for Visual Tracking

    Graph-based representation is widely used in visual tracking to find correct correspondences between target parts in consecutive frames. However, most graph-based trackers consider only pairwise geometric relations between local parts. They do not make full use of the target's intrinsic structure, so the representation is easily disturbed by errors in pairwise affinities when large deformation and occlusion occur. In this paper, we propose a geometric hypergraph learning based tracking method, which fully exploits high-order geometric relations among multiple correspondences of parts in consecutive frames. Visual tracking is then formulated as a mode-seeking problem on the hypergraph, in which vertices represent correspondence hypotheses and hyperedges describe high-order geometric relations. In addition, a confidence-aware sampling method is developed to select representative vertices and hyperedges to construct the geometric hypergraph for greater robustness and scalability. Experiments on two challenging datasets (VOT2014 and Deform-SOT) demonstrate that the proposed method performs favorably against other existing trackers.

    DASC: Robust Dense Descriptor for Multi-modal and Multi-spectral Correspondence Estimation

    Establishing dense correspondences between multiple images is a fundamental task in many applications. However, finding reliable correspondences in multi-modal or multi-spectral images remains unsolved because of their challenging photometric and geometric variations. In this paper, we propose a novel dense descriptor, called dense adaptive self-correlation (DASC), to estimate multi-modal and multi-spectral dense correspondences. Based on the observation that self-similarity within images is robust to imaging modality variations, we define the descriptor as a series of adaptive self-correlation similarity measures between patches sampled by randomized receptive field pooling, in which the sampling pattern is obtained using discriminative learning. The computational redundancy of dense descriptors is dramatically reduced by applying fast edge-aware filtering. Furthermore, to address geometric variations including scale and rotation, we propose a geometry-invariant DASC (GI-DASC) descriptor that effectively leverages the DASC through a superpixel-based representation. For a quantitative evaluation of the GI-DASC, we build a novel multi-modal benchmark with varying photometric and geometric conditions. Experimental results demonstrate the outstanding performance of the DASC and GI-DASC in many cases of multi-modal and multi-spectral dense correspondences.

    NM-Net: Mining Reliable Neighbors for Robust Feature Correspondences

    Feature correspondence selection is pivotal to many feature-matching based tasks in computer vision. Searching for spatially k-nearest neighbors is a common strategy for extracting local information in many previous works. However, there is no guarantee that the spatially k-nearest neighbors of correspondences are consistent, because the spatial distribution of false correspondences is often irregular. To address this issue, we present a compatibility-specific mining method to search for consistent neighbors. Moreover, to extract and aggregate more reliable features from neighbors, we propose a hierarchical network named NM-Net with a series of convolution layers taking the generated graph as input, which is insensitive to the order of correspondences. Our experimental results show that the proposed method achieves state-of-the-art performance on four datasets with various inlier ratios and varying numbers of feature consistencies.
    Comment: IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2019) (oral)
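
    For reference, the spatially k-nearest-neighbor strategy that the abstract describes as common (and argues is unreliable) might look like the sketch below. Representing each correspondence by a single (x, y) keypoint position is an assumption, and spatial_knn is a hypothetical name. This is the baseline neighbor search, not NM-Net's compatibility-specific mining.

```python
import math

def spatial_knn(points, k):
    """For each correspondence, return indices of its k spatially nearest others.

    points: list of (x, y) keypoint positions, one per correspondence
    (an assumed representation, for illustration only).
    """
    neighbors = []
    for i, p in enumerate(points):
        # Distance from correspondence i to every other correspondence.
        dists = [(math.dist(p, q), j) for j, q in enumerate(points) if j != i]
        dists.sort()
        neighbors.append([j for _, j in dists[:k]])
    return neighbors
```

    Because the search looks only at image positions, a false match that happens to land near true matches is still returned as a neighbor, which is the inconsistency the abstract points out.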

    The Video Genome

    The fast evolution of Internet technologies has led to explosive growth of video data available in the public domain and created unprecedented challenges in the analysis, organization, management, and control of such content. The problems encountered in video analysis, such as identifying a video in a large database (e.g. detecting pirated content on YouTube), putting together video fragments, and finding similarities and common ancestry between different versions of a video, have counterparts in genetic research and the analysis of DNA and protein sequences. In this paper, we exploit the analogy between genetic sequences and videos and propose an approach to video analysis motivated by genomic research. Representing video information as video DNA sequences and applying bioinformatic algorithms makes it possible to search, match, and compare videos in large-scale databases. We show an application for content-based metadata mapping between versions of annotated video.
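
    One bioinformatic algorithm that fits the analogy above is local sequence alignment. The sketch below computes a Smith-Waterman score between two symbol sequences, assuming each video has already been reduced to a sequence of discrete symbols (for example, one quantized descriptor per frame). The abstract does not say which algorithms the authors use, so the choice of Smith-Waterman and the scoring values are illustrative assumptions.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-1):
    """Smith-Waterman local alignment score between two symbol sequences.

    The scoring parameters are illustrative defaults, not values from the paper.
    """
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming table
    best = 0
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            # Clamp at 0 so an alignment may start anywhere in either sequence.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best
```

    A high score between a short clip and a long video suggests the clip appears inside it, possibly with inserted or dropped frames, which mirrors the fragment-matching problem the abstract mentions.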

    Semi-dense Stereo Matching using Dual CNNs

    A robust solution for semi-dense stereo matching is presented. It uses two CNN models, one for computing the stereo matching cost and one for performing confidence-based filtering. Compared to existing CNN-based matching cost generation approaches, our method feeds additional global information into the network so that the learned model can better handle challenging cases, such as lighting changes and lack of texture. By using non-parametric transforms, our method is also more self-reliant than most existing semi-dense stereo approaches, which rely heavily on parameter tuning. Experimental results on the Middlebury Stereo dataset demonstrate that the proposed approach outperforms state-of-the-art semi-dense stereo approaches.
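
    The abstract does not name the non-parametric transforms it uses. As an illustration only, the sketch below shows the census transform, a well-known non-parametric transform in stereo matching, together with a Hamming-distance matching cost. Images are assumed to be 2D lists of intensities, and the function names are hypothetical.

```python
def census_3x3(img, r, c):
    """Census bit string for the 3x3 window centered at (r, c).

    Each bit is 1 when the neighbor is darker than the center pixel, so the
    code depends only on intensity ordering, not on absolute brightness.
    """
    center = img[r][c]
    bits = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue  # skip the center pixel itself
            bits = (bits << 1) | (1 if img[r + dr][c + dc] < center else 0)
    return bits

def census_cost(left, right, r, c, disparity):
    """Hamming distance between census codes of left (r, c) and right (r, c - d)."""
    diff = census_3x3(left, r, c) ^ census_3x3(right, r, c - disparity)
    return bin(diff).count("1")
```

    Because the code records only which neighbors are darker than the center, any monotonically increasing change in brightness leaves it unchanged, which is why such transforms need little parameter tuning under lighting changes.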

    SafeDrive: Enhancing Lane Appearance for Autonomous and Assisted Driving Under Limited Visibility

    Autonomous detection of lane markers improves road safety, and purely visual tracking is desirable for widespread vehicle compatibility and for reducing sensor intrusion, cost, and energy consumption. However, visual approaches are often ineffective because of factors such as occlusion, poor weather conditions, and paint wear-off. We present an approach to enhance lane marker appearance for assisted and autonomous driving, particularly under poor visibility. Our method, named SafeDrive, attempts to improve visual lane detection approaches in drastically degraded visual conditions. SafeDrive finds lane markers in alternate imagery of the road at the vehicle's location and reconstructs a sparse 3D model of the surroundings. By estimating the geometric relationship between this 3D model and the current view, the lane markers are projected onto the visual scene; any lane detection algorithm can subsequently be used to detect lanes in the resulting image. SafeDrive requires no sensors other than vision and location data. We demonstrate the effectiveness of our approach on a number of test cases obtained from actual driving data recorded in urban settings.
    Comment: arXiv admin note: text overlap with arXiv:1701.0844
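
    The projection step described above can be sketched with a standard pinhole camera model, assuming the geometric relationship has already been estimated as a rotation matrix and a translation vector. The function name, argument layout, and intrinsic parameters (fx, fy, cx, cy) are assumptions for illustration and do not come from the SafeDrive paper.

```python
def project_point(point_world, rotation, translation, fx, fy, cx, cy):
    """Project a 3D world point into pixel coordinates with a pinhole model.

    rotation is a 3x3 nested list and translation a length-3 list
    (a hypothetical interface, for illustration only).
    """
    # Camera-frame coordinates: X_cam = R * X_world + t
    x, y, z = (
        sum(rotation[i][k] * point_world[k] for k in range(3)) + translation[i]
        for i in range(3)
    )
    if z <= 0:
        return None  # behind the camera, not visible
    return (fx * x / z + cx, fy * y / z + cy)
```

    Applying this to every reconstructed lane-marker point would give the pixel locations at which markers could be drawn into the current view before a lane detector runs.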

    Comparative evaluation of 2D feature correspondence selection algorithms

    Correspondence selection, which aims to find correct feature correspondences among raw feature matches, is pivotal for a number of feature-matching-based tasks. Various 2D (image) correspondence selection algorithms have been presented over decades of progress. Unfortunately, the lack of an in-depth evaluation makes it difficult for developers to choose a proper algorithm for a specific application. This paper fills this gap by evaluating eight 2D correspondence selection algorithms, ranging from classical methods to the most recent ones, on four standard datasets. The diversity of the experimental datasets introduces various nuisances, including zoom, rotation, blur, viewpoint change, JPEG compression, light change, different rendering styles and multi-structures, for a comprehensive test. To further create different distributions of initial matches, a set of detector and descriptor combinations is also taken into consideration. We measure the quality of a correspondence selection algorithm from four perspectives, i.e., precision, recall, F-measure and efficiency. Based on the evaluation results, the current advantages and limitations of all considered algorithms are summarized, which can serve as a "user guide" for developers.
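
    Three of the four measures named above follow the usual definitions and can be computed as in the sketch below. It assumes the selected correspondences and the ground-truth inliers are given as collections of match indices; the function name is hypothetical, and the paper's exact evaluation protocol may differ.

```python
def selection_metrics(selected, ground_truth_inliers):
    """Precision, recall, and F-measure for a set of selected correspondences."""
    selected = set(selected)
    inliers = set(ground_truth_inliers)
    true_pos = len(selected & inliers)  # selected matches that are correct
    precision = true_pos / len(selected) if selected else 0.0
    recall = true_pos / len(inliers) if inliers else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure
```

    For example, selecting matches 0 to 3 when the true inliers are matches 1 to 6 gives a precision of 0.75 and a recall of 0.5.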

    Photo Stylistic Brush: Robust Style Transfer via Superpixel-Based Bipartite Graph

    With the rapid development of social networks and multimedia technology, customized image and video stylization has been widely used in various social-media applications. In this paper, we explore the problem of exemplar-based photo style transfer, which provides a flexible and convenient way to create striking visual impressions. Rather than investigating fixed artistic patterns to represent certain styles, as was done in some previous works, our work emphasizes styles related to a series of visual effects in the photograph, e.g. color, tone, and contrast. We propose a photo stylistic brush, an automatic and robust style transfer approach based on a Superpixel-based BIpartite Graph (SuperBIG). A two-step bipartite graph algorithm with different granularity levels is employed to aggregate pixels into superpixels and find their correspondences. In the first step, using the extracted hierarchical features, a bipartite graph is constructed to describe the content similarity for pixel partition to produce superpixels. In the second step, superpixels in the input and reference images are rematched to form a new superpixel-based bipartite graph, and superpixel-level correspondences are generated by bipartite matching. Finally, the refined correspondence guides SuperBIG to perform the transformation in a decorrelated color space. Extensive experimental results demonstrate the effectiveness and robustness of the proposed method for transferring various styles of exemplar images, even in challenging cases such as night images.

    Automated Tracking and Estimation for Control of Non-rigid Cloth

    This report is a summary of research conducted on cloth tracking for automated textile manufacturing during a two-semester-long research course at Georgia Tech. This work was completed in 2009. Advances in current sensing technology such as the Microsoft Kinect would now allow me to relax certain assumptions and generally improve the tracking performance. This is because a major part of my approach described in this paper was to track features in a 2D image and use these to estimate the cloth deformation. Innovations such as the Kinect would improve estimation because depth information is obtained automatically when tracking 2D pixel locations. Additionally, higher-resolution camera images would probably give better-quality feature tracking. However, although I would use different technology now to implement this tracker, the algorithm described and implemented in this paper is still a viable approach, which is why I am publishing this as a tech report for reference. In addition, although the related work is a bit exhaustive, it will be useful to a reader who is new to methods for tracking and estimation as well as modeling of cloth.