
    COMPUTER VISION AND DEEP LEARNING WITH APPLICATIONS TO OBJECT DETECTION, SEGMENTATION, AND DOCUMENT ANALYSIS

    There are three works on signature matching for document analysis. In the first work, we propose a large-scale signature matching method based on locality sensitive hashing (LSH). Shape Context features are used to describe the structure of signatures. Two stages of hashing are performed to find the nearest neighbors for query signatures. We show that our algorithm achieves high accuracy even when only a few signatures are collected from the same person, and that it performs fast matching on a large dataset. In the second work, we present a novel signature matching method based on supervised topic models. Shape Context features are extracted from signature shape contours which capture the local variations in signature properties. We then use topic models to learn the shape context features that correspond to individual authors. We demonstrate considerable improvement over state-of-the-art methods. In the third work, we present a partial signature matching method using graphical models. Extending the second work, modified shape context features are extracted from the contour of signatures to describe both full and partial signatures. Hierarchical Dirichlet processes are implemented to infer the number of salient regions needed. The results show the effectiveness of the approach for both partial and full signature matching.
    There are three works on deep learning for object detection and segmentation. In the first work, we propose a deep neural network fusion architecture for fast and robust pedestrian detection. The proposed network fusion architecture allows for parallel processing of multiple networks for speed. A single shot deep convolutional network is trained as an object detector to generate all possible pedestrian candidates of different sizes and occlusions. Next, multiple deep neural networks are used in parallel for further refinement of these pedestrian candidates. We introduce a soft-rejection based network fusion method to fuse the soft metrics from all networks together to generate the final confidence scores. Our method performs better than existing state-of-the-art methods, especially when detecting small-size and occluded pedestrians. Furthermore, we propose a method for integrating a pixel-wise semantic segmentation network into the network fusion architecture as a reinforcement to the pedestrian detector. In the second work, extending the first, a fusion network is trained to fuse the multiple classification networks. Furthermore, a novel soft-label method is devised to assign floating-point labels to the pedestrian candidates. The label for each candidate detection is derived from the percentage of overlap of its bounding box with those of other ground truth classes. In the third work, we propose a boundary-sensitive deep neural network architecture for portrait segmentation. A residual network and atrous convolution based framework is trained as the base portrait segmentation network. To better solve boundary segmentation, three techniques are introduced. First, an individual boundary-sensitive kernel is introduced by labeling the boundary pixels as a separate class and using the soft-label strategy to assign floating-point label vectors to pixels in the boundary class. Each pixel contributes to multiple classes when the loss is updated, based on its relative position to the contour. Second, a global boundary-sensitive kernel is used when updating the loss function to assign different weights to pixel locations in an image, constraining the global shape of the resulting segmentation map. Third, we add multiple binary classifiers to classify boundary-sensitive portrait attributes, so as to refine the learning process of our model.
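The soft-rejection fusion mentioned in the abstract can be illustrated with a small sketch. The abstract does not give the fusion formula, so everything here is an assumption made for illustration: the function name `soft_rejection_fuse`, the multiplicative form, and the `threshold` and `floor` parameters. The idea shown is only that each refinement network scales the detector's score up or down rather than discarding a candidate outright.

```python
def soft_rejection_fuse(detector_score, classifier_probs, threshold=0.7, floor=0.1):
    """Fuse one detector score with several classifier probabilities.

    Hypothetical form (not taken from the abstract): each classifier
    contributes a scaling factor p / threshold, clipped from below by floor.
    """
    fused = detector_score
    for p in classifier_probs:
        # A factor above 1 boosts the candidate, a factor below 1 dampens it.
        # The floor keeps one sceptical network from zeroing the candidate,
        # which is what makes the rejection "soft" rather than hard.
        fused *= max(p / threshold, floor)
    return fused


if __name__ == "__main__":
    # One confident and one doubtful classifier acting on the same candidate.
    print(soft_rejection_fuse(0.6, [0.9, 0.3]))
```

With this form, a classifier whose probability equals the threshold leaves the score unchanged, and a candidate rejected by every network still keeps a small non-zero score.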

    A Replica Inference Approach to Unsupervised Multi-Scale Image Segmentation

    We apply a replica inference based Potts model method to unsupervised image segmentation on multiple scales. This approach was inspired by the statistical mechanics problem of "community detection" and its phase diagram. Specifically, the problem is cast as identifying tightly bound clusters ("communities" or "solutes") against a background or "solvent". Within our multiresolution approach, we compute information theory based correlations among multiple solutions ("replicas") of the same graph over a range of resolutions. Significant multiresolution structures are identified by replica correlations as manifest in information theory overlaps. With the aid of these correlations as well as thermodynamic measures, the phase diagram of the corresponding Potts model is analyzed both at zero and finite temperatures. Optimal parameters for a sensible unsupervised segmentation correspond to the "easy phase" of the Potts model. Our algorithm is fast and shown to be at least as accurate as the best algorithms to date and to be especially suited to the detection of camouflaged images. Comment: 26 pages, 22 figures
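The "information theory overlaps" between replicas can be made concrete with a short sketch. The abstract does not say which measure is used, so normalized mutual information (NMI) is assumed here as one common choice. The function names are hypothetical, and each replica is represented as a list of cluster labels, one per node.

```python
import math
from collections import Counter
from itertools import combinations


def entropy(labels):
    """Shannon entropy (in nats) of a partition given as a list of labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def normalized_mutual_information(a, b):
    """NMI = 2 I(A;B) / (H(A) + H(B)) for two partitions of the same nodes."""
    if len(a) != len(b):
        raise ValueError("partitions must label the same nodes")
    n = len(a)
    count_a, count_b, joint = Counter(a), Counter(b), Counter(zip(a, b))
    mutual_info = sum(
        (c / n) * math.log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in joint.items()
    )
    total_entropy = entropy(a) + entropy(b)
    # Two trivial one-cluster partitions are treated as identical.
    return 1.0 if total_entropy == 0 else 2 * mutual_info / total_entropy


def mean_replica_overlap(replicas):
    """Average NMI over all pairs of replicas at one resolution."""
    pairs = list(combinations(replicas, 2))
    return sum(normalized_mutual_information(a, b) for a, b in pairs) / len(pairs)
```

Under this assumption, a resolution at which independent replicas agree closely (mean overlap near 1) would be a candidate for a significant structure, while low agreement suggests the solver is not finding a stable partition.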

    PortraitNet: Real-time portrait segmentation network for mobile device

    Real-time portrait segmentation plays a significant role in many applications on mobile devices, such as background replacement in video chat or teleconferencing. In this paper, we propose a real-time portrait segmentation model, called PortraitNet, that can run effectively and efficiently on mobile devices. PortraitNet is based on a lightweight U-shape architecture with two auxiliary losses at the training stage, while no additional cost is required at the testing stage for portrait inference. The two auxiliary losses are a boundary loss and a consistency constraint loss. The former improves the accuracy of boundary pixels, and the latter enhances robustness in complex lighting environments. We evaluate PortraitNet on the portrait segmentation datasets EG1800 and Supervise-Portrait. Compared with state-of-the-art methods, our approach achieves remarkable performance in terms of both accuracy and efficiency, especially in generating results with sharper boundaries and under severe illumination conditions. Meanwhile, PortraitNet is capable of processing 224 × 224 RGB images at 30 FPS on an iPhone 7.