
    Hierarchy-based Image Embeddings for Semantic Image Retrieval

    Deep neural networks trained for classification have been found to learn powerful image representations, which are often also used for other tasks such as comparing images w.r.t. their visual similarity. However, visual similarity does not imply semantic similarity. To learn semantically discriminative features, we propose to map images onto class embeddings whose pair-wise dot products correspond to a measure of semantic similarity between classes. Such an embedding not only improves image retrieval results but could also facilitate integrating semantics into other tasks, e.g., novelty detection or few-shot learning. We introduce a deterministic algorithm for computing the class centroids directly from prior world knowledge encoded in a hierarchy of classes such as WordNet. Experiments on CIFAR-100, NABirds, and ImageNet show that our learned semantic image embeddings improve the semantic consistency of image retrieval results by a large margin.
    Comment: Accepted at WACV 2019. Source code: https://github.com/cvjena/semantic-embedding
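    To make the embedding idea concrete, the sketch below builds a toy class hierarchy (the class names are hypothetical), derives pairwise semantic similarities from the height of the lowest common ancestor, and factors the similarity matrix so that dot products of the resulting class embeddings reproduce it. The eigendecomposition step is a classical-MDS-style stand-in, not the paper's deterministic construction, which places all embeddings on the unit hypersphere.

```python
import numpy as np

# Hypothetical toy hierarchy: each leaf class lists its ancestors, root last.
hierarchy = {
    "oak":     ["tree", "plant", "entity"],
    "pine":    ["tree", "plant", "entity"],
    "rose":    ["flower", "plant", "entity"],
    "sparrow": ["bird", "animal", "entity"],
}
classes = list(hierarchy)
max_height = max(len(p) for p in hierarchy.values())

def lca_height(a, b):
    """Height of the lowest common ancestor of two leaf classes."""
    if a == b:
        return 0
    for h, anc in enumerate(hierarchy[a], start=1):
        if anc in hierarchy[b]:
            return h
    return max_height

# Semantic similarity of two classes: 1 - normalized LCA height.
S = np.array([[1.0 - lca_height(a, b) / max_height for b in classes]
              for a in classes])

# Factor S so that pairwise dot products of the class embeddings reproduce it.
eigval, eigvec = np.linalg.eigh(S)
embeddings = eigvec * np.sqrt(np.clip(eigval, 0.0, None))  # one row per class

print(np.abs(embeddings @ embeddings.T - S).max())  # reconstruction error
```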

    SPSIM: A Superpixel-Based Similarity Index for Full-Reference Image Quality Assessment

    Full-reference image quality assessment algorithms usually compare features extracted from square patches, which carry no intrinsic visual meaning. In contrast, a superpixel is a set of image pixels that share similar visual characteristics and is thus perceptually meaningful. Features from superpixels may therefore improve the performance of image quality assessment. Inspired by this, we propose a new superpixel-based similarity index built on perceptually meaningful features and revised similarity measures. The proposed method evaluates image quality on the basis of three measurements: superpixel luminance similarity, superpixel chrominance similarity, and pixel gradient similarity. The first two assess the overall visual impression of local image regions; the third quantifies structural variations. The impact of superpixel-based regional gradient consistency on image quality is also analyzed: distorted images that show high regional gradient consistency with their reference images are perceived as visually pleasing. The three measurements are therefore further revised by incorporating regional gradient consistency into their computation. A weighting function that reflects superpixel-based texture complexity is applied in the pooling stage to obtain the final quality score. Experiments on several benchmark databases demonstrate that the proposed method is competitive with state-of-the-art metrics.
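    A rough sketch of the superpixel pooling idea for grayscale inputs (so the chrominance term is omitted), with placeholder constants and a placeholder texture-complexity weight; this illustrates the scheme, not the published SPSIM.

```python
import numpy as np
from skimage.segmentation import slic  # SLIC superpixel segmentation

def ssim_style(a, b, c=1e-3):
    """Elementwise SSIM-style comparison: 1 where a == b, smaller otherwise."""
    return (2.0 * a * b + c) / (a**2 + b**2 + c)

def spsim_sketch(ref, dist, n_segments=200):
    """ref, dist: grayscale float images in [0, 1]."""
    labels = slic(ref, n_segments=n_segments, channel_axis=None)
    lum = ssim_style(ref, dist)                        # luminance similarity map
    gy, gx = np.gradient(ref)
    gy2, gx2 = np.gradient(dist)
    grad = ssim_style(np.hypot(gx, gy), np.hypot(gx2, gy2))  # gradient similarity map
    scores, weights = [], []
    for sp in np.unique(labels):
        m = labels == sp
        scores.append(lum[m].mean() * grad[m].mean())  # per-superpixel score
        weights.append(ref[m].std() + 1e-6)            # texture-complexity weight
    return float(np.average(scores, weights=weights))  # weighted pooling
```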

    Learning the Relation between Similarity Loss and Clustering Loss in Self-Supervised Learning

    Self-supervised learning enables networks to learn discriminative features from massive amounts of unlabeled data. Most state-of-the-art methods maximize the similarity between two augmentations of one image based on contrastive learning; by exploiting the consistency of the two augmentations, the burden of manual annotation is removed. Contrastive learning uses instance-level information to learn robust features, but the learned information is largely confined to different views of the same instance. In this paper, we attempt to leverage the similarity between two distinct images to boost representation learning in self-supervised learning, since such cross-image similarity may provide more useful information than instance-level cues alone. We further analyze the relation between similarity loss and feature-level cross-entropy loss. Both losses are essential to most deep learning methods, yet the relation between them has been unclear: similarity loss helps obtain instance-level representations, while feature-level cross-entropy loss helps mine the similarity between two distinct images. We provide theoretical analyses and experiments showing that a suitable combination of the two losses achieves state-of-the-art results. Code is available at https://github.com/guijiejie/ICCL.
    Comment: This paper is accepted by IEEE Transactions on Image Processing
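    A minimal sketch of combining an instance-level similarity loss with a feature-level cross-entropy loss, assuming a hypothetical weighting alpha and temperature; the paper's exact formulation may differ.

```python
import torch
import torch.nn.functional as F

def combined_loss(z1, z2, alpha=0.5, temperature=0.1):
    """z1, z2: (batch, dim) embeddings of two augmentations of the same batch.
    alpha and temperature are placeholder hyperparameters."""
    # Instance-level similarity loss: pull the two views of each image together.
    sim_loss = 1.0 - F.cosine_similarity(z1, z2, dim=1).mean()

    # Feature-level cross-entropy: each feature dimension defines a softmax
    # distribution over the batch; matching the two views' distributions ties
    # together *different* images that activate the same features.
    p1 = F.softmax(z1.t() / temperature, dim=1)         # (dim, batch)
    log_p2 = F.log_softmax(z2.t() / temperature, dim=1)
    feat_ce = -(p1 * log_p2).sum(dim=1).mean()

    return alpha * sim_loss + (1.0 - alpha) * feat_ce
```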

    Addressing Challenging Place Recognition Tasks using Generative Adversarial Networks

    Place recognition is an essential component of Simultaneous Localization And Mapping (SLAM). Under severe appearance change, reliable place recognition is a difficult perception task, since the same place can look perceptually very different in the morning, at night, or across seasons. This work addresses place recognition as a domain translation task. Using a pair of coupled Generative Adversarial Networks (GANs), we show that it is possible to generate the appearance of one domain (such as summer) from another (such as winter) without requiring image-to-image correspondences across the domains. The mapping between domains is learned from sets of images in each domain, without knowledge of instance-to-instance correspondences, by enforcing a cyclic consistency constraint. In the process, a meaningful feature space is learned for each domain, and distances in that space can be used for place recognition. Experiments show that the learned features correspond to visual similarity and can be used effectively for place recognition across seasons.
    Comment: Accepted for publication in IEEE International Conference on Robotics and Automation (ICRA), 201
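    The cyclic consistency constraint can be sketched as follows, with G_sw and G_ws as placeholder generators mapping between the two domains; the full training objective would also include an adversarial loss per domain.

```python
import torch
import torch.nn.functional as F

def cycle_consistency_loss(G_sw, G_ws, summer, winter, lam=10.0):
    """G_sw, G_ws: placeholder generators (summer -> winter and winter -> summer).
    summer, winter: unpaired image batches; no correspondence is assumed,
    since each image only has to survive the round trip to its own domain."""
    return lam * (F.l1_loss(G_ws(G_sw(summer)), summer) +
                  F.l1_loss(G_sw(G_ws(winter)), winter))
```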

    LayerDiffusion: Layered Controlled Image Editing with Diffusion Models

    Text-guided image editing has recently developed rapidly. However, simultaneously performing multiple editing actions on a single image, such as background replacement and attribute changes to a specific subject, while maintaining consistency between the subject and the background, remains challenging. In this paper, we propose LayerDiffusion, a semantic-based layered controlled image editing method. Our method enables non-rigid editing and attribute modification of specific subjects while preserving their unique characteristics and seamlessly integrating them into new backgrounds. We leverage a large-scale text-to-image model and employ a layered controlled optimization strategy combined with layered diffusion training. During the diffusion process, an iterative guidance strategy is used to generate a final image that aligns with the textual description. Experimental results demonstrate the effectiveness of our method in generating highly coherent images that closely align with the given textual descriptions. The edited images retain high similarity to the features of the input image and surpass the performance of current leading image editing methods. LayerDiffusion opens up new possibilities for controllable image editing.
    Comment: 17 pages, 14 figures
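    The abstract does not spell out the iterative guidance strategy; the sketch below shows one common form of iterative text guidance in a diffusion sampling loop (classifier-free guidance), with a hypothetical noise predictor and scheduler standing in for the real components.

```python
import torch

@torch.no_grad()
def guided_sampling(model, scheduler, x_t, timesteps, text_emb, null_emb, scale=7.5):
    """Classifier-free guidance sketch. `model(x, t, cond)` is a hypothetical
    noise predictor and `scheduler.step` a hypothetical denoising update."""
    for t in timesteps:
        eps_cond = model(x_t, t, text_emb)    # text-conditioned noise estimate
        eps_uncond = model(x_t, t, null_emb)  # unconditional estimate
        # Push the estimate toward the text description at each step.
        eps = eps_uncond + scale * (eps_cond - eps_uncond)
        x_t = scheduler.step(eps, t, x_t)     # one denoising step
    return x_t
```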

    Tensor singular spectral analysis for 3D feature extraction in hyperspectral images

    Owing to the cube structure of a hyperspectral image (HSI), characterizing its spectral and spatial properties in three dimensions is challenging. Conventional spectral-spatial methods usually extract spectral and spatial information separately, ignoring their intrinsic correlations. Recently, some 3D feature extraction methods have been developed to extract spectral and spatial features simultaneously, but they rely on local spatial-spectral regions and thus ignore global spectral similarity and spatial consistency; moreover, some of these methods contain huge numbers of model parameters and therefore require a large number of training samples. In this paper, a novel Tensor Singular Spectral Analysis (TensorSSA) method is proposed to extract global and low-rank features from an HSI. In TensorSSA, an adaptive embedding operation is first proposed to construct a trajectory tensor from the entire HSI; this takes full advantage of spatial similarity and yields an adequate representation of the global low-rank properties of the HSI. The resulting trajectory tensor, which contains the global and local spatial and spectral information of the HSI, is then decomposed by the tensor singular value decomposition (t-SVD) to extract its low-rank intrinsic features. Finally, the efficacy of the extracted features is evaluated via the accuracy of image classification with a support vector machine (SVM) classifier. Experimental results on three publicly available datasets fully demonstrate the superiority of the proposed TensorSSA over several state-of-the-art 2D/3D feature extraction and deep learning algorithms, even with a limited number of training samples.
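    The t-SVD step can be sketched generically: transform along the third mode, truncate the matrix SVD of every frontal slice in the Fourier domain, and transform back. This is a standard t-SVD low-rank approximation, not the full TensorSSA pipeline, which first builds the trajectory tensor via the adaptive embedding.

```python
import numpy as np

def tsvd_lowrank(T, rank):
    """Tubal-rank-`rank` approximation of a 3D tensor via the t-SVD:
    FFT along the third mode, truncated SVD of each frontal slice in the
    Fourier domain, then inverse FFT back to the original domain."""
    Tf = np.fft.fft(T, axis=2)
    out = np.empty_like(Tf)
    for k in range(T.shape[2]):
        U, s, Vh = np.linalg.svd(Tf[:, :, k], full_matrices=False)
        out[:, :, k] = (U[:, :rank] * s[:rank]) @ Vh[:rank, :]
    return np.real(np.fft.ifft(out, axis=2))

# Toy usage on a random stand-in for a trajectory tensor.
T_low = tsvd_lowrank(np.random.rand(64, 64, 30), rank=5)
```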