60,445 research outputs found

    Textual Query Based Image Retrieval

    Get PDF
    As digital cameras becoming popular and mobile phones are increased very fast so that consumers photos are increased. So that retrieving the appropriate image depending on content or text based image retrieval techniques has become very vast. Content-based image retrieval, a technique which uses visual contents to search images from large scale image databases according to users interests, has been an active and fast advancing research area semantic gap between the low-level visual features and the high-level semantic concepts. Real-time textual query-based personal photo retrieval system by leveraging millions of Web images and their associated rich textual descriptions. Then user provides a textual query. Our system generates the inverted file to automatically find the positive Web images that are related to the textual query as well as the negative Web images that are irrelevant to the textual query. For that purpose we use k-Nearest Neighbor (kNN), Decision stumps, and linear SVM, to rank personal photos. For improvement of the photo retrieval performance, we have used two relevance feedback methods via cross-domain learning, which effectively utilize both the Web images and personal images. DOI: 10.17762/ijritcc2321-8169.15032

    MCAD: Multi-teacher Cross-modal Alignment Distillation for efficient image-text retrieval

    Full text link
    With the success of large-scale visual-language pretraining models and the wide application of image-text retrieval in industry areas, reducing the model size and streamlining their terminal-device deployment have become urgently necessary. The mainstream model structures for image-text retrieval are single-stream and dual-stream, both aiming to close the semantic gap between visual and textual modalities. Dual-stream models excel at offline indexing and fast inference, while single-stream models achieve more accurate cross-model alignment by employing adequate feature fusion. We propose a multi-teacher cross-modality alignment distillation (MCAD) technique to integrate the advantages of single-stream and dual-stream models. By incorporating the fused single-stream features into the image and text features of the dual-stream model, we formulate new modified teacher features and logits. Then, we conduct both logit and feature distillation to boost the capability of the student dual-stream model, achieving high retrieval performance without increasing inference complexity. Extensive experiments demonstrate the remarkable performance and high efficiency of MCAD on image-text retrieval tasks. Furthermore, we implement a mobile CLIP model on Snapdragon clips with only 93M running memory and 30ms search latency, without apparent performance degradation of the original large CLIP

    Leveraging Deep Visual Descriptors for Hierarchical Efficient Localization

    Full text link
    Many robotics applications require precise pose estimates despite operating in large and changing environments. This can be addressed by visual localization, using a pre-computed 3D model of the surroundings. The pose estimation then amounts to finding correspondences between 2D keypoints in a query image and 3D points in the model using local descriptors. However, computational power is often limited on robotic platforms, making this task challenging in large-scale environments. Binary feature descriptors significantly speed up this 2D-3D matching, and have become popular in the robotics community, but also strongly impair the robustness to perceptual aliasing and changes in viewpoint, illumination and scene structure. In this work, we propose to leverage recent advances in deep learning to perform an efficient hierarchical localization. We first localize at the map level using learned image-wide global descriptors, and subsequently estimate a precise pose from 2D-3D matches computed in the candidate places only. This restricts the local search and thus allows to efficiently exploit powerful non-binary descriptors usually dismissed on resource-constrained devices. Our approach results in state-of-the-art localization performance while running in real-time on a popular mobile platform, enabling new prospects for robotics research.Comment: CoRL 2018 Camera-ready (fix typos and update citations
    • …
    corecore