1,314 research outputs found

    How is Gaze Influenced by Image Transformations? Dataset and Model

    Full text link
    Data size is the bottleneck for developing deep saliency models, because collecting eye-movement data is very time consuming and expensive. Most of current studies on human attention and saliency modeling have used high quality stereotype stimuli. In real world, however, captured images undergo various types of transformations. Can we use these transformations to augment existing saliency datasets? Here, we first create a novel saliency dataset including fixations of 10 observers over 1900 images degraded by 19 types of transformations. Second, by analyzing eye movements, we find that observers look at different locations over transformed versus original images. Third, we utilize the new data over transformed images, called data augmentation transformation (DAT), to train deep saliency models. We find that label preserving DATs with negligible impact on human gaze boost saliency prediction, whereas some other DATs that severely impact human gaze degrade the performance. These label preserving valid augmentation transformations provide a solution to enlarge existing saliency datasets. Finally, we introduce a novel saliency model based on generative adversarial network (dubbed GazeGAN). A modified UNet is proposed as the generator of the GazeGAN, which combines classic skip connections with a novel center-surround connection (CSC), in order to leverage multi level features. We also propose a histogram loss based on Alternative Chi Square Distance (ACS HistLoss) to refine the saliency map in terms of luminance distribution. Extensive experiments and comparisons over 3 datasets indicate that GazeGAN achieves the best performance in terms of popular saliency evaluation metrics, and is more robust to various perturbations. Our code and data are available at: https://github.com/CZHQuality/Sal-CFS-GAN

    Binary Patterns Encoded Convolutional Neural Networks for Texture Recognition and Remote Sensing Scene Classification

    Full text link
    Designing discriminative powerful texture features robust to realistic imaging conditions is a challenging computer vision problem with many applications, including material recognition and analysis of satellite or aerial imagery. In the past, most texture description approaches were based on dense orderless statistical distribution of local features. However, most recent approaches to texture recognition and remote sensing scene classification are based on Convolutional Neural Networks (CNNs). The d facto practice when learning these CNN models is to use RGB patches as input with training performed on large amounts of labeled data (ImageNet). In this paper, we show that Binary Patterns encoded CNN models, codenamed TEX-Nets, trained using mapped coded images with explicit texture information provide complementary information to the standard RGB deep models. Additionally, two deep architectures, namely early and late fusion, are investigated to combine the texture and color information. To the best of our knowledge, we are the first to investigate Binary Patterns encoded CNNs and different deep network fusion architectures for texture recognition and remote sensing scene classification. We perform comprehensive experiments on four texture recognition datasets and four remote sensing scene classification benchmarks: UC-Merced with 21 scene categories, WHU-RS19 with 19 scene classes, RSSCN7 with 7 categories and the recently introduced large scale aerial image dataset (AID) with 30 aerial scene types. We demonstrate that TEX-Nets provide complementary information to standard RGB deep model of the same network architecture. Our late fusion TEX-Net architecture always improves the overall performance compared to the standard RGB network on both recognition problems. Our final combination outperforms the state-of-the-art without employing fine-tuning or ensemble of RGB network architectures.Comment: To appear in ISPRS Journal of Photogrammetry and Remote Sensin

    Quantitative Multidimensional Stress Assessment from Facial Videos

    Get PDF
    Stress has a significant impact on the physical and mental health of an individual and is a growing concern for society, especially during the COVID-19 pandemic. Facial video-based stress evaluation from non-invasive cameras has proven to be a significantly more efficient method to evaluate stress in comparison to approaches that use questionnaires or wearable sensors. Plenty of classification models have been built for stress detection. However, most do not consider individual differences. Also, the results for such models are limited by a uni-dimensional definition of stress levels lacking a comprehensive quantitative definition of stress. The dissertation focuses on building a framework that utilizes the multilevel video frame representations from deep learning and the remote photoplethysmography signals extracted from the facial videos for stress assessment. The fusion model takes the inputs of a baseline video and a target video of the subject. The physiological features such as heart rate and heart rate variability are used with the initial stress scores generated from deep learning are used to predict the stress scores in cognitive anxiety, somatic anxiety, and self-confidence. To generate stress scores with better accuracy, the signal extraction method is improved by introducing the CWT-SNR method that uses the signal-to-noise ratio to assist the adaptive bandpass filtering in the post-processing of the signals. A study on phase space reconstruction features is performed and the results show the potential for additional accuracy improvement for the heart rate variability detection. To select the best deep learning architecture, multiple deep learning architectures are tested to build the deep learning model. Support Vector Regression is used to generate the output stress score results. Testing with the data from the UBFC-Phys dataset, the fusion model shows a strong correlation between ground truth and the predicted results

    Predicting the amount of coke deposition on catalyst through image analysis and soft computing

    Get PDF
    The amount of coke deposition on catalyst pellets is one of the most important indexes of catalytic property and service life. As a result, it is essential to measure this and analyze the active state of the catalysts during a continuous production process. This paper proposes a new method to predict the amount of coke deposition on catalyst pellets based on image analysis and soft computing. An image acquisition system consisting of a flatbed scanner and an opaque cover is used to obtain catalyst images. After imaging processing and feature extraction, twelve effective features are selected and two best feature sets are determined by the prediction tests. A neural network optimized by a particle swarm optimization algorithm is used to establish the prediction model of the coke amount based on various datasets. The root mean square error of the prediction values are all below 0.021 and the coefficient of determination R 2, for the model, are all above 78.71%. Therefore, a feasible, effective and precise method is demonstrated, which may be applied to realize the real-time measurement of coke deposition based on on-line sampling and fast image analysis
    • …
    corecore