4 research outputs found

    Shape similarity analysis by self-tuning locally constrained mixed-diffusion

    Get PDF
    Similarity analysis is a powerful tool for shape matching/retrieval and other computer vision tasks. In the literature, various shape (dis)similarity measures have been introduced. Different measures specialize on different aspects of the data. In this paper, we consider the problem of improving retrieval accuracy by systematically fusing several different measures. To this end, we propose the locally constrained mixeddiffusion method, which partly fuses the given measures into one and propagates on the resulted locally dense data space. Furthermore, we advocate the use of self-adaptive neighborhoods to automatically determine the appropriate size of the neighborhoods in the diffusion process, with which the retrieval performance is comparable to the best manually tuned kNNs. The superiority of our approach is empirically demonstrated on both shape and image datasets. Our approach achieves a score of 100% in the bull’s eye test on the MPEG-7 shape dataset, which is the best reported result to date.Lei Luo, Chunhua Shen, Chunyuan Zhang and Anton van den Henge

    Visual saliency prediction based on deep learning

    Get PDF
    The Human Visual System (HVS) has the ability to focus on specific parts of a scene, rather than the whole image. Human eye movement is also one of the primary functions used in our daily lives that helps us understand our surroundings. This phenomenon is one of the most active research topics in the computer vision and neuroscience fields. The outcomes that have been achieved by neural network methods in a variety of tasks have highlighted their ability to predict visual saliency. In particular, deep learning models have been used for visual saliency prediction. In this thesis, a deep learning method based on a transfer learning strategy is proposed (Chapter 2), wherein visual features in the convolutional layers are extracted from raw images to predict visual saliency (e.g., saliency map). Specifically, the proposed model uses the VGG-16 network (i.e., Pre-trained CNN model) for semantic segmentation. The proposed model is applied to several datasets, including TORONTO, MIT300, MIT1003, and DUT-OMRON, to illustrate its efficiency. The results of the proposed model are then quantitatively and qualitatively compared to classic and state-of-the-art deep learning models. In Chapter 3, I specifically investigate the performance of five state-of-the-art deep neural networks (VGG-16, ResNet-50, Xception, InceptionResNet-v2, and MobileNet-v2) for the task of visual saliency prediction. Five deep learning models were trained over the SALICON dataset and used to predict visual saliency maps using four standard datasets, namely TORONTO, MIT300, MIT1003, and DUT-OMRON. The results indicate that the ResNet-50 model outperforms the other four and provides a visual saliency map that is very close to human performance. In Chapter 4, a novel deep learning model based on a Fully Convolutional Network (FCN) architecture is proposed. The proposed model is trained in an end-to-end style and designed to predict visual saliency. The model is based on the encoder-decoder structure and includes two types of modules. The first has three stages of inception modules to improve multi-scale derivation and enhance contextual information. The second module includes one stage of the residual module to provide a more accurate recovery of information and to simplify optimization. The entire proposed model is fully trained from scratch to extract distinguishing features and to use a data augmentation technique to create variations in the images. The proposed model is evaluated using several benchmark datasets, including MIT300, MIT1003, TORONTO, and DUT-OMRON. The quantitative and qualitative experiment analyses demonstrate that the proposed model achieves superior performance for predicting visual saliency. In Chapter 5, I study the possibility of using deep learning techniques for Salient Object Detection (SOD) because this work is slightly related to the problem of Visual saliency prediction. Therefore, in this work, the capability of ten well-known pre-trained models for semantic segmentation, including FCNs, VGGs, ResNets, MobileNet-v2, Xception, and InceptionResNet-v2, are investigated. These models have been trained over an ImageNet dataset, fine-tuned on a MSRA-10K dataset, and evaluated using other public datasets, such as ECSSD, MSRA-B, DUTS, and THUR15k. The results illustrate the superiority of ResNet50 and ResNet18, which have Mean Absolute Errors (MAE) of approximately 0.93 and 0.92, respectively, compared to other well-known FCN models. Finally, conclusions are drawn, and possible future works are discussed in chapter 6
    corecore