Multi Focus Image Fusion with variable size windows
In this paper we present the Linear Image Combination Algorithm with Variable Windows (CLI-VV) for the fusion of multi-focus images. Unlike the CLI-S algorithm presented in a previous work, the CLI-VV algorithm automatically determines the optimal window size at each pixel for segmenting the regions with the highest sharpness. We also present a generalized CLI-VV algorithm, called Variable Windows Multi-focus Fusion (FM-VV), for fusing sets of more than two multi-focus images. The CLI-VV algorithm was tested on 21 pairs of synthetic images and 29 pairs of real multi-focus images, and the FM-VV algorithm on 5 trios of multi-focus images. All tests achieved competitive accuracy, with execution times lower than those reported in the literature. Calderon, F.; Garnica-Carrillo, A.; Flores, J. J. (2018). Fusión de Imágenes Multi-Foco con Ventanas Variables. Revista Iberoamericana de Automática e Informática Industrial 15(3):262-276.
https://doi.org/10.4995/riai.2017.8852
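The abstract above describes selecting, per pixel, the window size that best separates in-focus regions, then combining the sources. As a rough illustration of the general idea (not the authors' CLI-VV algorithm: the variance-based focus measure, the candidate window sizes, and all function names are assumptions), a two-image fusion sketch could look like:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def local_variance(img, size):
    """Local variance over a size x size window (a common focus measure)."""
    mean = uniform_filter(img, size)
    mean_sq = uniform_filter(img * img, size)
    return mean_sq - mean * mean

def fuse_two_focus(img_a, img_b, sizes=(3, 5, 9)):
    """Fuse two multi-focus images: at each pixel, keep the source whose
    neighborhood is sharper; the window size that maximizes the sharpness
    gap is chosen per pixel (a stand-in for the variable-window idea)."""
    best_gap = np.full(img_a.shape, -np.inf)
    mask = np.zeros(img_a.shape, dtype=bool)
    for s in sizes:
        va, vb = local_variance(img_a, s), local_variance(img_b, s)
        gap = np.abs(va - vb)
        upd = gap > best_gap            # pixels where this window separates better
        mask[upd] = (va > vb)[upd]      # True: take the pixel from img_a
        best_gap[upd] = gap[upd]
    return np.where(mask, img_a, img_b)
```

Selecting the window per pixel, rather than globally, is what lets small in-focus details survive next to large defocused regions.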
Effective Image Retrieval via Multilinear Multi-index Fusion
Multi-index fusion has demonstrated impressive performance in retrieval tasks
by integrating different visual representations in a unified framework.
However, previous works mainly propagate similarities via the neighborhood
structure, ignoring the high-order information among different visual
representations. In this paper, we propose a new multi-index fusion scheme for
image retrieval. By formulating this procedure as a multilinear based
optimization problem, the complementary information hidden in different indexes
can be explored more thoroughly. Specifically, we first build our multiple indexes
from various visual representations. Then a so-called index-specific functional
matrix, which aims to propagate similarities, is introduced for updating the
original index. The functional matrices are then optimized in a unified tensor
space to achieve a refinement, such that relevant images can be pushed
closer together. The optimization problem can be efficiently solved by the
augmented Lagrangian method with a theoretical convergence guarantee. Unlike the
traditional multi-index fusion scheme, our approach embeds the multi-index
subspace structure into the new indexes with a sparse constraint, so it incurs
little additional memory consumption in the online query stage. Experimental
evaluation on three benchmark datasets reveals that the proposed approach
achieves the state-of-the-art performance, i.e., N-score 3.94 on UKBench, mAP
94.1% on Holiday, and 62.39% on Market-1501. Comment: 12 pages
Scale-Invariant Structure Saliency Selection for Fast Image Fusion
In this paper, we present a fast yet effective method for pixel-level
scale-invariant image fusion in spatial domain based on the scale-space theory.
Specifically, we propose a scale-invariant structure saliency selection scheme
based on the difference-of-Gaussian (DoG) pyramid of images to build the
weights or activity map. Due to the scale-invariant structure saliency
selection, our method keeps both the details of small objects and the
structural integrity of large objects in images. In addition, our method is
very efficient: it involves no complex operations and is easy to implement,
so it can be used for fast fusion of high-resolution images.
Experimental results demonstrate that the proposed method yields competitive
or even better results compared to state-of-the-art image fusion methods, both in terms
of visual quality and objective evaluation metrics. Furthermore, the proposed
method is very fast and can be used to fuse the high resolution images in
real-time. Code is available at https://github.com/yiqingmy/Fusion
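The DoG-based activity-map idea can be sketched in a few lines (this is an illustration of the general approach, not the released code; the scale set, the 1.6-sigma ratio, and the max-over-scales rule are assumptions):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_saliency(img, sigmas=(1.0, 2.0, 4.0)):
    """Structure saliency as the maximum absolute difference-of-Gaussians
    response across scales, giving a scale-aware activity map."""
    sal = np.zeros_like(img)
    for s in sigmas:
        dog = gaussian_filter(img, s) - gaussian_filter(img, 1.6 * s)
        sal = np.maximum(sal, np.abs(dog))
    return sal

def fuse(img_a, img_b):
    """Pixel-wise fusion: take each pixel from the source with the larger
    DoG saliency (the sharper, more structured source wins locally)."""
    return np.where(dog_saliency(img_a) >= dog_saliency(img_b), img_a, img_b)
```

Because the saliency takes a maximum over scales, fine texture and large smooth structures can both drive the selection, which is the scale-invariance property the abstract emphasizes.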
Boosting in Image Quality Assessment
In this paper, we analyze the effect of boosting in image quality assessment
through multi-method fusion. Existing multi-method studies focus on proposing a
single quality estimator. On the contrary, we investigate the generalizability
of multi-method fusion as a framework. In addition to support vector machines
that are commonly used in the multi-method fusion, we propose using neural
networks in the boosting. To span different types of image quality assessment
algorithms, we use quality estimators based on fidelity, perceptually-extended
fidelity, structural similarity, spectral similarity, color, and learning. In
the experiments, we perform k-fold cross validation using the LIVE, the
multiply distorted LIVE, and the TID 2013 databases, and the performance of
image quality assessment algorithms is measured via accuracy-, linearity-, and
ranking-based metrics. Based on the experiments, we show that boosting methods
generally improve the performance of image quality assessment and the level of
improvement depends on the type of the boosting algorithm. Our experimental
results also indicate that boosting the worst performing quality estimator with
two or more additional methods leads to statistically significant performance
enhancements independent of the boosting technique and neural network-based
boosting outperforms support vector machine-based boosting when two or more
methods are fused. Comment: Paper: 6 pages, 5 tables, 1 figure; Presentation: 16 slides
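As a simplified stand-in for the SVR/NN regressors used in such multi-method fusion, a plain least-squares combiner of several quality estimators' scores against subjective scores can illustrate the setup (all names and the linear model are assumptions, not the paper's method):

```python
import numpy as np

def fit_fusion_weights(scores, mos):
    """Least-squares score fusion: scores is (n_images, n_methods),
    mos is (n_images,) subjective quality scores. Returns per-method
    weights plus a bias term."""
    X = np.column_stack([scores, np.ones(len(scores))])  # append bias column
    w, *_ = np.linalg.lstsq(X, mos, rcond=None)
    return w

def fused_quality(scores, w):
    """Apply the learned weights to produce a fused quality estimate."""
    X = np.column_stack([scores, np.ones(len(scores))])
    return X @ w
```

Even this linear combiner typically correlates better with subjective scores than any single estimator, which is the basic premise that boosting with nonlinear regressors then builds on.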
Orientation Driven Bag of Appearances for Person Re-identification
Person re-identification (re-id) consists of associating individuals across a
camera network, which is valuable for intelligent video surveillance and has
drawn wide attention. Although person re-identification research is making
progress, it still faces some challenges such as varying poses, illumination
and viewpoints. For feature representation in re-identification, existing works
usually use low-level descriptors which do not take full advantage of body
structure information, resulting in low representation ability. To solve this
problem, this paper proposes the mid-level
body-structure based feature representation (BSFR) which introduces body
structure pyramid for codebook learning and feature pooling in the vertical
direction of human body. Besides, varying viewpoints in the horizontal
direction of the human body usually cause a data-missing problem, i.e., the
appearances obtained in different orientations of the identical person could
vary significantly. To address this problem, the orientation driven bag of
appearances (ODBoA) is proposed to utilize person orientation information
extracted by an orientation estimation technique. To properly evaluate the proposed
approach, we introduce a new re-identification dataset (Market-1203) based on
the Market-1501 dataset and propose a new re-identification dataset (PKU-Reid).
Both datasets contain multiple images captured in different body orientations
for each person. Experimental results on three public datasets and two proposed
datasets demonstrate the superiority of the proposed approach, indicating the
effectiveness of body structure and orientation information for improving
re-identification performance.Comment: 13 pages, 15 figures, 3 tables, submitted to IEEE Transactions on
Circuits and Systems for Video Technology
CGGAN: A Context Guided Generative Adversarial Network For Single Image Dehazing
Image haze removal is highly desired for the application of computer vision.
This paper proposes a novel Context Guided Generative Adversarial Network
(CGGAN) for single image dehazing, in which a novel encoder-decoder is
employed as the generator. It consists of a feature-extraction-net, a
context-extraction-net, and a fusion-net in sequence. The feature-extraction-net
acts as an encoder and is used for extracting haze features. The
context-extraction-net is a multi-scale parallel pyramid decoder used
for extracting the deep features of the encoder and generating a coarse
dehazing image. The fusion-net is a decoder used for obtaining the final
haze-free image. To obtain better results, multi-scale information
obtained during the decoding process of the context extraction decoder is used
for guiding the fusion decoder. By introducing an extra coarse decoder to the
original encoder-decoder, the CGGAN can make better use of the deep feature
information extracted by the encoder. To ensure our CGGAN works effectively for
different haze scenarios, different loss functions are employed for the two
decoders. Experimental results show the advantage and effectiveness of our
proposed CGGAN: evident improvements over existing state-of-the-art methods
are obtained. Comment: 12 pages, 7 figures, 3 tables
A novel hybrid score level and decision level fusion scheme for cancelable multi-biometric verification
In spite of the benefits of biometric-based authentication systems, several
concerns remain: the sensitivity of biometric data to outliers, low
performance due to intra-class variations, and privacy invasion caused by
information leakage. To address these issues, we propose a hybrid
fusion framework where only the protected modalities are combined to fulfill
the requirement of secrecy and performance improvement. This paper presents a
method to integrate cancelable modalities utilizing mean-closure weighting
(MCW) score level and Dempster-Shafer (DS) theory based decision level fusion
for iris and fingerprint to mitigate the limitations in the individual score or
decision fusion mechanisms. The proposed hybrid fusion scheme incorporates the
similarity scores from different matchers corresponding to each protected
modality. The individual scores obtained from different matchers for each
modality are combined using MCW score fusion method. The MCW technique achieves
the optimal weight for each matcher involved in the score computation. Further,
DS theory is applied to the induced scores to output the final decision. The
rigorous experimental evaluations on three virtual databases indicate that the
proposed hybrid fusion framework outperforms the individual component-level
fusion methods (score-level and decision-level fusion). As a result,
we achieve (48%,66%), (72%,86%) and (49%,38%) of performance improvement over
unimodal cancelable iris and unimodal cancelable fingerprint verification
systems for Virtual_A, Virtual_B and Virtual_C databases, respectively. Also,
the proposed method is robust enough to the variability of scores and outliers
satisfying the requirement of secure authentication.
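The decision-level stage relies on Dempster-Shafer combination. A minimal sketch for a two-class frame {genuine, impostor} with an explicit uncertainty mass "theta" (the representation and mass values are illustrative assumptions, not the paper's exact formulation):

```python
def ds_combine(m1, m2):
    """Dempster's rule of combination for two mass functions over the frame
    {genuine, impostor}; 'theta' holds the mass assigned to uncertainty.
    Conflicting mass (one matcher says genuine, the other impostor) is
    discarded and the rest renormalized."""
    conflict = m1["genuine"] * m2["impostor"] + m1["impostor"] * m2["genuine"]
    k = 1.0 - conflict  # normalization constant
    return {
        "genuine": (m1["genuine"] * m2["genuine"]
                    + m1["genuine"] * m2["theta"]
                    + m1["theta"] * m2["genuine"]) / k,
        "impostor": (m1["impostor"] * m2["impostor"]
                     + m1["impostor"] * m2["theta"]
                     + m1["theta"] * m2["impostor"]) / k,
        "theta": (m1["theta"] * m2["theta"]) / k,
    }
```

When both modalities lean toward "genuine", the combined belief in "genuine" exceeds either input mass, which is the evidence-reinforcement behavior decision-level fusion exploits.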
Hybrid Distortion Aggregated Visual Comfort Assessment for Stereoscopic Image Retargeting
Visual comfort is a crucial factor in 3D media services. Few research
efforts have been carried out in this area, especially for 3D content
retargeting, which may introduce more complicated visual distortions. In this
paper, we propose a Hybrid Distortion Aggregated Visual Comfort Assessment
(HDA-VCA) scheme for stereoscopic retargeted images (SRI), considering
aggregation of hybrid distortions including structure distortion, information
loss, binocular incongruity and semantic distortion. Specifically, a Local-SSIM
feature is proposed to reflect the local structural distortion of SRI, and
information loss is represented by Dual Natural Scene Statistics (D-NSS)
feature extracted from the binocular summation and difference channels.
Regarding binocular incongruity, visual comfort zone, window violation,
binocular rivalry, and accommodation-vergence conflict of human visual system
(HVS) are evaluated. Finally, the semantic distortion is represented by the
correlation distance of paired feature maps extracted from original
stereoscopic image and its retargeted image by using trained deep neural
network. We validate the effectiveness of HDA-VCA on published Stereoscopic
Image Retargeting Database (SIRD) and two stereoscopic image databases IEEE-SA
and NBU 3D-VCA. The results demonstrate HDA-VCA's superior performance in
handling hybrid distortions compared to state-of-the-art VCA schemes. Comment: 13 pages, 11 figures, 4 tables
Modality-specific Cross-modal Similarity Measurement with Recurrent Attention Network
Nowadays, cross-modal retrieval plays an indispensable role in flexibly
finding information across different modalities of data. Effectively measuring the
similarity between different modalities of data is the key to cross-modal
retrieval. Different modalities such as image and text have imbalanced and
complementary relationships, which contain unequal amounts of information when
describing the same semantics. For example, images often contain more details
that cannot be demonstrated by textual descriptions and vice versa. Existing
works based on Deep Neural Network (DNN) mostly construct one common space for
different modalities to find the latent alignments between them, which lose
their exclusive modality-specific characteristics. Different from the existing
works, we propose modality-specific cross-modal similarity measurement (MCSM)
approach by constructing independent semantic space for each modality, which
adopts end-to-end framework to directly generate modality-specific cross-modal
similarity without an explicit common representation. For each semantic space,
modality-specific characteristics within one modality are fully exploited by a
recurrent attention network, while data from the other modality are projected
into this space via an attention-based joint embedding; the learned attention
weights guide fine-grained cross-modal correlation learning, which can capture
the imbalanced and complementary relationships between different modalities.
Finally, the complementarity between the semantic
spaces for different modalities is explored by adaptive fusion of the
modality-specific cross-modal similarities to perform cross-modal retrieval.
Experiments on the widely-used Wikipedia and Pascal Sentence datasets as well
as our constructed large-scale XMediaNet dataset verify the effectiveness of
our proposed approach, outperforming 9 state-of-the-art methods. Comment: 13 pages, submitted to IEEE Transactions on Image Processing
Multi-feature Fusion for Image Retrieval Using Constrained Dominant Sets
Aggregating different image features for image retrieval has recently shown
its effectiveness. Yet the question of how to amplify the impact of the
best features for a specific query image remains an open computer vision
problem. In this paper, we propose a computationally
efficient approach to fuse several hand-crafted and deep features, based on the
probabilistic distribution of a given membership score of a constrained cluster
in an unsupervised manner. First, we introduce an incremental nearest neighbor
(NN) selection method, whereby we dynamically select k-NN to the query. We then
build several graphs from the obtained NN sets and employ constrained dominant
sets (CDS) on each graph G to assign edge weights which consider the intrinsic
manifold structure of the graph, and detect false matches to the query.
Finally, we compute a feature positive-impact weight (PIW) based on the
dispersion of the feature's membership-score vector; to this end, we
exploit the entropy of the cluster membership-score distribution. In addition,
the final NN set bypasses a heuristic voting scheme. Experiments on several
retrieval benchmark datasets show that our method can improve the
state-of-the-art results.
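The entropy-based PIW idea can be sketched directly (the normalization and the exact weight formula below are assumptions for illustration, not the paper's definition):

```python
import numpy as np

def positive_impact_weight(membership):
    """Entropy-based positive-impact weight for one feature: a peaked
    cluster membership-score distribution (low entropy, confident matches)
    earns a high weight; a dispersed one (high entropy) a low weight."""
    p = np.asarray(membership, dtype=float)
    p = p / p.sum()                       # normalize to a distribution
    h = -np.sum(p * np.log(p + 1e-12))    # Shannon entropy (epsilon avoids log 0)
    h_max = np.log(len(p))                # entropy of the uniform distribution
    return 1.0 - h / h_max                # 1 = fully peaked, 0 = fully dispersed
```

A feature whose constrained-cluster membership scores concentrate on a few neighbors thus gets more say in the fused ranking than one whose scores are spread uniformly.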