    JigsawNet: Shredded Image Reassembly using Convolutional Neural Network and Loop-based Composition

    This paper proposes a novel algorithm to reassemble an arbitrarily shredded image to its original state. Existing reassembly pipelines commonly consist of a local matching stage and a global composition stage. In the local stage, a key challenge in fragment reassembly is to reliably compute and identify correct pairwise matches, for which most existing algorithms use handcrafted features and hence cannot reliably handle complicated puzzles. We build a deep convolutional neural network to detect the compatibility of a pairwise stitching, and use it to prune computed pairwise matches. To improve the network's efficiency and accuracy, we focus the CNN computation on the stitching region and apply a boost training strategy. In the global composition stage, we replace the commonly adopted greedy edge selection strategies with two new loop-closure-based search algorithms. Extensive experiments show that our algorithm significantly outperforms existing methods on solving various puzzles, especially challenging ones with many fragment pieces.
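
The abstract does not say how the loop-closure test is carried out, so the following is only a minimal sketch of the general idea: pairwise alignments chained around a closed loop of fragments should compose to (approximately) the identity. The `(theta, tx, ty)` rigid-transform representation, the function names, and the tolerances are all assumptions made for illustration, not the paper's implementation.

```python
import math

def compose(a, b):
    """Return the rigid 2-D transform that applies b first, then a.

    Each transform is a hypothetical (theta, tx, ty) tuple: a rotation by
    theta radians followed by a translation (tx, ty).
    """
    ta, xa, ya = a
    tb, xb, yb = b
    c, s = math.cos(ta), math.sin(ta)
    return (ta + tb, xa + c * xb - s * yb, ya + s * xb + c * yb)

def loop_is_consistent(transforms, angle_tol=1e-2, dist_tol=1e-2):
    """Check that the alignments around a closed loop compose to the identity.

    The tolerances are illustrative defaults, not values from the paper.
    """
    total = (0.0, 0.0, 0.0)  # identity transform
    for t in transforms:
        total = compose(total, t)
    theta, tx, ty = total
    # Wrap the accumulated angle into (-pi, pi] before comparing with zero.
    theta = math.atan2(math.sin(theta), math.cos(theta))
    return abs(theta) <= angle_tol and math.hypot(tx, ty) <= dist_tol

# Four fragments arranged in a square: each step moves one unit and turns 90 degrees.
good_loop = [(math.pi / 2, 1.0, 0.0)] * 4
# Same loop with one wrong pairwise match (an extra unit of translation).
bad_loop = good_loop[:3] + [(math.pi / 2, 2.0, 0.0)]

good_ok = loop_is_consistent(good_loop)
bad_ok = loop_is_consistent(bad_loop)
```

In a sketch like this, a candidate pairwise match that breaks every loop it takes part in would be a natural one to reject, which is the intuition behind preferring loop closure over purely greedy edge selection.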

    Solving Visual Madlibs with Multiple Cues

    This paper focuses on answering fill-in-the-blank style multiple choice questions from the Visual Madlibs dataset. Previous approaches to Visual Question Answering (VQA) have mainly used generic image features from networks trained on the ImageNet dataset, despite the wide scope of questions. In contrast, our approach employs features derived from networks trained for specialized tasks of scene classification, person activity prediction, and person and object attribute prediction. We also present a method for selecting sub-regions of an image that are relevant for evaluating the appropriateness of a putative answer. Visual features are computed both from the whole image and from local regions, while sentences are mapped to a common space using a simple normalized canonical correlation analysis (CCA) model. Our results show a significant improvement over the previous state of the art, and indicate that answering different question types benefits from examining a variety of image cues and carefully choosing informative image sub-regions.

    TransNFCM: Translation-Based Neural Fashion Compatibility Modeling

    Identifying mix-and-match relationships between fashion items is an urgent task in a fashion e-commerce recommender system. It will significantly enhance user experience and satisfaction. However, due to the challenges of inferring the rich yet complicated set of compatibility patterns in a large e-commerce corpus of fashion items, this task is still underexplored. Inspired by the recent advances in multi-relational knowledge representation learning and deep neural networks, this paper proposes a novel Translation-based Neural Fashion Compatibility Modeling (TransNFCM) framework, which jointly optimizes fashion item embeddings and category-specific complementary relations in a unified space in an end-to-end manner. TransNFCM places items in a unified embedding space where a category-specific relation (category-comp-category) is modeled as a vector translation operating on the embeddings of compatible items from the corresponding categories. In this way, we not only capture the specific notion of compatibility conditioned on a specific pair of complementary categories, but also preserve the global notion of compatibility. We also design a deep fashion item encoder which exploits the complementary characteristics of visual and textual features to represent the fashion products. To the best of our knowledge, this is the first work that uses category-specific complementary relations to model the category-aware compatibility between items in a translation-based embedding space. Extensive experiments demonstrate the effectiveness of TransNFCM over the state of the art on two real-world datasets. Comment: Accepted at the AAAI 2019 conference.
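
The "vector translation" idea can be illustrated with a small sketch: two items are scored as compatible when the first item's embedding, shifted by the relation vector for their category pair, lands close to the second item's embedding. The Euclidean distance, the function names, and the hand-picked 2-D embeddings below are assumptions for illustration; the abstract does not specify the distance function, the encoder, or the training loss.

```python
import math

def translation_distance(head, relation, tail):
    """Euclidean distance between (head + relation) and tail."""
    return math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))

def compatibility_score(head, relation, tail):
    """Higher means more compatible: the negative translation distance."""
    return -translation_distance(head, relation, tail)

# Hypothetical 2-D embeddings, chosen by hand purely for illustration.
top = [1.0, 0.0]
matching_bottom = [1.0, 1.0]
clashing_bottom = [-2.0, 3.0]
# Hypothetical relation vector for the (top, bottom) category pair.
top_comp_bottom = [0.0, 1.0]

good = compatibility_score(top, top_comp_bottom, matching_bottom)
bad = compatibility_score(top, top_comp_bottom, clashing_bottom)
```

Because each category pair has its own relation vector while all items share one embedding space, the same item embedding can be reused across category pairs, which matches the abstract's point about keeping both category-specific and global notions of compatibility.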

    Constrained tGAP for generalisation between scales: the case of Dutch topographic data

    This article presents the results of integrating large- and medium-scale data into a unified data structure. This structure can be used as a single non-redundant representation for the input data, which can be queried at any arbitrary scale between the source scales. The solution is based on the constrained topological Generalized Area Partition (tGAP), which stores the results of a generalization process applied to the large-scale dataset, and is controlled by the objects of the medium-scale dataset, which act as constraints on the large-scale objects. The result contains the accurate geometry of the large-scale objects enriched with the generalization knowledge of the medium-scale data, stored as references in the constrained tGAP structure. The advantage of this constrained approach over the original tGAP is the higher quality of the aggregated maps. The idea was implemented with real topographic datasets from The Netherlands for the large- (1:1000) and medium-scale (1:10,000) data. The approach is expected to be equally valid for any categorical map and for other scales as well.