Semantic-Enhanced Image Clustering
Image clustering is an important and challenging open task in computer
vision. Although many methods have been proposed to solve the image clustering
task, they only explore images and uncover clusters according to the image
features, thus being unable to distinguish visually similar but semantically
different images. In this paper, we propose to investigate the task of image
clustering with the help of a visual-language pre-training model. Different
from the zero-shot setting, in which the class names are known, we only know
the number of clusters in this setting. Therefore, how to map images to a
proper semantic space and how to cluster images from both image and semantic
spaces are two key problems. To solve the above problems, we propose a novel
image clustering method guided by the visual-language pre-training model CLIP,
named Semantic-Enhanced Image Clustering (SIC). In this new method, we
first propose a method to map the given images to a proper semantic space and
then efficient methods to generate pseudo-labels according to the relationships
between images and semantics. Finally, we propose performing clustering with
consistency learning in both image space and semantic space, in a
self-supervised learning fashion. Our convergence analysis shows that the
proposed method converges at a sublinear rate. Our theoretical analysis of
the expected risk also shows that we can reduce the
expected risk by improving neighborhood consistency, increasing prediction
confidence, or reducing neighborhood imbalance. Experimental results on five
benchmark datasets clearly show the superiority of our new method.
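
The pseudo-labeling idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes image embeddings and one embedding per semantic center have already been computed (for example by CLIP), and the function name `pseudo_labels`, the `temperature` parameter, and the toy arrays are all hypothetical. Each image is simply assigned to the semantic center with the highest cosine similarity.

```python
import numpy as np

def pseudo_labels(image_emb, semantic_emb, temperature=0.1):
    """Assign each image to its most similar semantic center (illustrative only).

    image_emb:    (n_images, dim) array of image embeddings (assumed precomputed).
    semantic_emb: (n_clusters, dim) array of semantic-center embeddings (assumed).
    Returns hard labels of shape (n_images,) and soft assignments (n_images, n_clusters).
    """
    # L2-normalize so that the dot product equals cosine similarity.
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    sem = semantic_emb / np.linalg.norm(semantic_emb, axis=1, keepdims=True)
    logits = img @ sem.T / temperature
    # Numerically stable softmax over the semantic centers.
    logits = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    probs = probs / probs.sum(axis=1, keepdims=True)
    return probs.argmax(axis=1), probs

# Toy data: three orthogonal semantic centers and four image embeddings.
semantic_emb = np.eye(3)
image_emb = np.array([
    [0.9, 0.1, 0.0],
    [0.1, 0.8, 0.1],
    [0.0, 0.2, 0.9],
    [1.0, 0.0, 0.1],
])
labels, probs = pseudo_labels(image_emb, semantic_emb)
print(labels)
```

The soft assignments in `probs` could serve as prediction confidences; how SIC actually builds its semantic space and filters pseudo-labels is not specified in the abstract.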
Deep Unsupervised Hashing with Latent Semantic Components
Deep unsupervised hashing has been widely adopted for image
retrieval. However, most prior methods fail to detect the semantic components
behind images and the relationships among them, which limits their discriminative
power. To remedy this defect, we propose a novel method, Deep Semantic Components
Hashing (DSCH), which builds on the common-sense prior that an image normally contains a
set of semantic components with homology and co-occurrence relationships.
Based on this prior, DSCH regards the semantic components as latent variables
under the Expectation-Maximization framework and designs a two-step iterative
algorithm with the objective of maximum likelihood of training data. Firstly,
DSCH constructs a semantic component structure by uncovering the fine-grained
semantic components of images with a Gaussian Mixture Model (GMM), where an
image is represented as a mixture of multiple components and the co-occurrence
of semantics is exploited. In addition, coarse-grained semantic components are
discovered by considering the homology relationships between fine-grained
components, and a hierarchical organization is then constructed. Secondly, DSCH
pulls the images close to their semantic component centers at both the fine-grained
and coarse-grained levels, and also pulls images that share similar semantic
components close to each other. Extensive experiments on three benchmark
datasets demonstrate that the proposed hierarchical semantic components indeed
help the hashing model achieve superior performance.
Comment: 9 pages, 15 figures
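
As a rough illustration of representing an image as a mixture of components, the sketch below fits a Gaussian mixture with diagonal covariances by Expectation-Maximization and returns per-sample responsibilities. It is a generic EM routine, not the DSCH algorithm: the function name `fit_gmm`, the diagonal-covariance choice, the initialization, and the synthetic `features` array are all assumptions made for this example.

```python
import numpy as np

def fit_gmm(x, k, n_iter=50, seed=0):
    """Fit a diagonal-covariance GMM with EM (generic sketch, not DSCH itself).

    x: (n, d) feature matrix. Returns (weights, means, variances, resp), where
    resp[i, j] is the responsibility of component j for sample i.
    """
    rng = np.random.default_rng(seed)
    n, d = x.shape
    # Initialize means from random samples, variances from the global variance.
    means = x[rng.choice(n, size=k, replace=False)]
    variances = np.ones((k, d)) * x.var(axis=0)
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: log-density of every sample under every component.
        diff = x[:, None, :] - means[None, :, :]              # (n, k, d)
        log_pdf = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                          + np.sum(np.log(2 * np.pi * variances), axis=1))
        log_resp = np.log(weights) + log_pdf
        log_resp = log_resp - log_resp.max(axis=1, keepdims=True)
        resp = np.exp(log_resp)
        resp = resp / resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, and variances.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / n
        means = (resp.T @ x) / nk[:, None]
        diff = x[:, None, :] - means[None, :, :]
        variances = np.einsum("nk,nkd->kd", resp, diff ** 2) / nk[:, None] + 1e-6
    return weights, means, variances, resp

# Synthetic stand-in for image features: two well-separated groups.
rng = np.random.default_rng(0)
features = np.vstack([
    rng.normal(0.0, 1.0, size=(50, 2)),
    rng.normal(10.0, 1.0, size=(50, 2)),
])
weights, means, variances, resp = fit_gmm(features, k=2)
print(resp[:3])
```

Each row of `resp` is a soft membership over components, which is one way to read "an image is represented as a mixture of multiple components"; the hierarchy construction and hashing objectives described in the abstract are not covered here.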
Facile Synthesis of Two Dimensional (2D) V2O5 Nanosheets Film towards Photodetectors
Most studies of V2O5 have been devoted to obtaining a specific morphology and microstructure for its intended applications. Two-dimensional (2D) V2O5 has the most valuable structure because its unique planar configuration can offer more active sites. In this study, a low-cost bottom-up method that combines hydrothermal synthesis with spin-coating and subsequent annealing was developed to prepare 2D V2O5 nanosheet films on quartz substrates. First, VOOH nanosheets were prepared by the hydrothermal method using V2O5 powders and EG as raw materials. V2O5 nanosheets with an average lateral size over 500 nm and a thickness of less than 10 nm were then obtained from the parent VOOH nanosheets by annealing at 350 °C for 15 min in air. The resulting V2O5 film was assembled from multiple nanosheets. The structural, morphological, microstructural and optical properties of the films were investigated by XRD, SEM, TEM and UV-Vis, respectively. The photodetector based on the V2O5 nanosheet film shows a good photoresponse, with a response time of 2.4 s and a recovery time of 4.7 s.