214 research outputs found

    Entropy-Based Maximally Stable Extremal Regions for Robust Feature Detection

    Maximally stable extremal regions (MSER) is a state-of-the-art method in local feature detection. However, the method is sensitive to blurring: in blurred images, intensity values at region boundaries vary more slowly, which undermines the stability criterion that MSER relies on. In this paper, we propose a method that makes MSER more robust to image blurring. To recover the regions MSER misses in blurred images, we exploit the fact that the entropy of the probability distribution of intensity values increases rapidly as a local region expands across the boundary, while the entropy in the central part remains small. We use the entropy averaged over the region's area as a measure to re-estimate regions missed by MSER. Experiments show that, on blurred images, the proposed method outperforms the original MSER with little extra computational effort.
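    The entropy-per-area measure described above can be sketched as follows. This is an illustrative assumption of the idea, not the paper's exact formulation: `entropy_per_area` and its 256-bin histogram are hypothetical choices; a uniform region scores near zero, while a region spanning the boundary (many intensity values) scores higher.

    ```python
    import numpy as np

    def entropy_per_area(region_pixels):
        """Shannon entropy of the intensity distribution inside a region,
        normalized by the region's area (hypothetical sketch of the
        paper's entropy-per-area measure)."""
        hist, _ = np.histogram(region_pixels, bins=256, range=(0, 256))
        p = hist / hist.sum()
        p = p[p > 0]  # drop empty bins so log2 is well-defined
        entropy = -np.sum(p * np.log2(p))
        return entropy / region_pixels.size
    ```

    A flat region (single intensity) yields zero entropy per area; expanding the region across a boundary mixes in new intensities and raises the score.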

    Generic Tubelet Proposals for Action Localization

    We develop a novel framework for action localization in videos. We propose the Tube Proposal Network (TPN), which generates generic, class-independent, video-level tubelet proposals. The generated tubelet proposals can be used in various video analysis tasks, including recognizing and localizing actions in videos. In particular, we integrate these generic tubelet proposals into a unified temporal deep network for action classification. Compared with other methods, our generic tubelet proposal method is accurate, general, and fully differentiable under a smooth L1 loss function. We demonstrate the performance of our algorithm on the standard UCF-Sports, J-HMDB21, and UCF-101 datasets. Our class-independent TPN outperforms other tubelet generation methods, and our unified temporal deep network achieves state-of-the-art localization results on all three datasets.
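    The smooth L1 loss mentioned above has a standard closed form: quadratic near zero, linear beyond a threshold, so it is differentiable everywhere. A generic sketch (the `beta` threshold of 1.0 is an assumed default, not taken from the paper):

    ```python
    import numpy as np

    def smooth_l1(x, beta=1.0):
        """Smooth L1 (Huber-style) loss on a residual x:
        0.5*x^2/beta for |x| < beta, |x| - 0.5*beta otherwise."""
        x = np.abs(x)
        return np.where(x < beta, 0.5 * x**2 / beta, x - 0.5 * beta)
    ```

    The quadratic part keeps gradients small near zero; the linear part limits the influence of outlier residuals during proposal regression.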

    Learning and Using Taxonomies For Fast Visual Categorization

    The computational complexity of current visual categorization algorithms scales, at best, linearly with the number of categories. The goal of classifying N_cat = 10^4 - 10^5 visual categories simultaneously requires sub-linear classification costs. We explore algorithms for automatically building classification trees which have, in principle, log N_cat complexity. We find that a greedy algorithm that recursively splits the set of categories into the two minimally confused subsets achieves 5-20 fold speedups at a small cost in classification performance. Our approach is independent of the specific classification algorithm used. A welcome by-product of our algorithm is a very reasonable taxonomy of the Caltech-256 dataset.
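    The greedy split into two minimally confused subsets can be illustrated with a simple local search over a confusion matrix. `split_min_confusion` below is a hypothetical sketch under that assumption, not the authors' exact procedure; it flips one category at a time whenever doing so reduces the cross-subset confusion mass:

    ```python
    import numpy as np

    def split_min_confusion(C):
        """Partition categories 0..n-1 into two non-empty subsets so that
        the confusion mass between the subsets (off-partition entries of
        confusion matrix C) is locally minimal. Greedy single-flip search."""
        n = C.shape[0]
        assign = np.arange(n) % 2  # arbitrary non-empty initial split

        def cross(a):
            # sum of confusion between categories assigned to different subsets
            mask = a[:, None] != a[None, :]
            return C[mask].sum()

        best = cross(assign)
        improved = True
        while improved:
            improved = False
            for i in range(n):
                assign[i] ^= 1  # tentatively move category i
                c = cross(assign)
                if c < best and 0 < assign.sum() < n:
                    best = c  # keep the flip: cross-confusion decreased
                    improved = True
                else:
                    assign[i] ^= 1  # revert
        return assign
    ```

    Applying the split recursively to each subset yields a binary classification tree with, in principle, logarithmic depth in the number of categories.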

    Multi-View Dynamic Shape Refinement Using Local Temporal Integration

    We consider 4D shape reconstruction in multi-view environments and investigate how to exploit temporal redundancy to refine precision. Besides benefiting many dynamic multi-view scenarios, this also enables larger scenes, where the increased precision can compensate for the reduced spatial resolution per image frame. With precision and scalability in mind, we propose a symmetric (non-causal) local time-window geometric integration scheme over temporal sequences, in which shape reconstructions are refined frame-wise by warping reliable local geometric regions of neighboring frames onto them. This contrasts with recent comparable approaches that target a different context of more compact scenes and real-time applications. These approaches usually use a single dense volumetric update space or a geometric template, which they causally track and update globally frame by frame, limiting scalability for larger scenes and, in the template-based case, topology and precision. Our template-less, local approach is a first step towards temporal shape super-resolution, and we show that it improves reconstruction accuracy by considering multiple frames. To this end, in addition to real data examples, we introduce a multi-camera synthetic dataset that provides ground-truth data for mid-scale dynamic scenes.

    Automatic Segmentation for Plant Leaves via Multiview Stereo Reconstruction
