19,685 research outputs found

    Minimizing Supervision in Multi-label Categorization

    Full text link
    Multiple categories of objects are present in most images. Treating this as a multi-class classification is not justified. We treat this as a multi-label classification problem. In this paper, we further aim to minimize the supervision required for providing supervision in multi-label classification. Specifically, we investigate an effective class of approaches that associate a weak localization with each category either in terms of the bounding box or segmentation mask. Doing so improves the accuracy of multi-label categorization. The approach we adopt is one of active learning, i.e., incrementally selecting a set of samples that need supervision based on the current model, obtaining supervision for these samples, retraining the model with the additional set of supervised samples and proceeding again to select the next set of samples. A crucial concern is the choice of the set of samples. In doing so, we provide a novel insight, and no specific measure succeeds in obtaining a consistently improved selection criterion. We, therefore, provide a selection criterion that consistently improves the overall baseline criterion by choosing the top k set of samples for a varied set of criteria. Using this criterion, we are able to show that we can retain more than 98% of the fully supervised performance with just 20% of samples (and more than 96% using 10%) of the dataset on PASCAL VOC 2007 and 2012. Also, our proposed approach consistently outperforms all other baseline metrics for all benchmark datasets and model combinations.Comment: Accepted in CVPR-W 202

    VITON: An Image-based Virtual Try-on Network

    Full text link
    We present an image-based VIirtual Try-On Network (VITON) without using 3D information in any form, which seamlessly transfers a desired clothing item onto the corresponding region of a person using a coarse-to-fine strategy. Conditioned upon a new clothing-agnostic yet descriptive person representation, our framework first generates a coarse synthesized image with the target clothing item overlaid on that same person in the same pose. We further enhance the initial blurry clothing area with a refinement network. The network is trained to learn how much detail to utilize from the target clothing item, and where to apply to the person in order to synthesize a photo-realistic image in which the target item deforms naturally with clear visual patterns. Experiments on our newly collected Zalando dataset demonstrate its promise in the image-based virtual try-on task over state-of-the-art generative models
    • …
    corecore