9 research outputs found

    Quasi Real-Time Apple Defect Segmentation Using Deep Learning

    Get PDF
    Defect segmentation of apples is an important task in the agriculture industry for quality control and food safety. In this paper, we propose a deep learning approach for the automated segmentation of apple defects using convolutional neural networks (CNNs) based on a U-shaped architecture with skip-connections only within the noise reduction block. An ad-hoc data synthesis technique has been designed to increase the number of samples and at the same time to reduce neural network overfitting. We evaluate our model on a dataset of multi-spectral apple images with pixel-wise annotations for several types of defects. In this paper, we show that our proposal outperforms in terms of segmentation accuracy general-purpose deep learning architectures commonly used for segmentation tasks. From the application point of view, we improve the previous methods for apple defect segmentation. A measure of the computational cost shows that our proposal can be employed in real-time (about 100 frame-per-second on GPU) and in quasi-real-time (about 7/8 frame-per-second on CPU) visual-based apple inspection. To further improve the applicability of the method, we investigate the potential of using only RGB images instead of multi-spectral images as input images. The results prove that the accuracy in this case is almost comparable with the multi-spectral case

    Semi-supervised cross-lingual speech emotion recognition

    Get PDF
    Speech emotion recognition (SER) on a single language has achieved remarkable results through deep learning approaches over the last decade. However, cross-lingual SER remains a challenge in real-world applications due to (i) a large difference between the source and target domain distributions, (ii) the availability of few labeled and many unlabeled utterances for the new language. Taking into account previous aspects, we propose a Semi-Supervised Learning (SSL) method for cross-lingual emotion recognition when a few labels from the new language are available. Based on a Convolutional Neural Network (CNN), our method adapts to a new language by exploiting a pseudo-labeling strategy for the unlabeled utterances. In particular, the use of a hard and soft pseudo-labels approach is investigated. We thoroughly evaluate the performance of the method in a speaker-independent setup on both the source and the new language and show its robustness across five languages belonging to different linguistic strains

    NTIRE 2023 Quality Assessment of Video Enhancement Challenge

    Get PDF
    This paper reports on the NTIRE 2023 Quality Assessment of Video Enhancement Challenge, which will be held in conjunction with the New Trends in Image Restoration and Enhancement Workshop (NTIRE) at CVPR 2023. This challenge is to address a major challenge in the field of video processing, namely, video quality assessment (VQA) for enhanced videos. The challenge uses the VQA Dataset for Perceptual Video Enhancement (VDPVE), which has a total of 1211 enhanced videos, including 600 videos with color, brightness, and contrast enhancements, 310 videos with deblurring, and 301 deshaked videos. The challenge has a total of 167 registered participants. 61 participating teams submitted their prediction results during the development phase, with a total of 3168 submissions. A total of 176 submissions were submitted by 37 participating teams during the final testing phase. Finally, 19 participating teams submitted their models and fact sheets, and detailed the methods they used. Some methods have achieved better results than baseline methods, and the winning methods have demonstrated superior prediction performance

    On the Semantic Dependency of Video Quality Assessment Methods

    No full text

    No-Reference Quality Assessment of In-Capture Distorted Videos

    Get PDF
    We introduce a no-reference method for the assessment of the quality of videos affected by in-capture distortions due to camera hardware and processing software. The proposed method encodes both quality attributes and semantic content of each video frame by using two Convolutional Neural Networks (CNNs) and then estimates the quality score of the whole video by using a Recurrent Neural Network (RNN), which models the temporal information. The extensive experiments conducted on four benchmark databases (CVD2014, KoNViD-1k, LIVE-Qualcomm, and LIVE-VQC) containing in-capture distortions demonstrate the effectiveness of the proposed method and its ability to generalize in cross-database setup

    An Efficient Method for No-Reference Video Quality Assessment

    Get PDF
    Methods for No-Reference Video Quality Assessment (NR-VQA) of consumer-produced video content are largely investigated due to the spread of databases containing videos affected by natural distortions. In this work, we design an effective and efficient method for NR-VQA. The proposed method exploits a novel sampling module capable of selecting a predetermined number of frames from the whole video sequence on which to base the quality assessment. It encodes both the quality attributes and semantic content of video frames using two lightweight Convolutional Neural Networks (CNNs). Then, it estimates the quality score of the entire video using a Support Vector Regressor (SVR). We compare the proposed method against several relevant state-of-the-art methods using four benchmark databases containing user generated videos (CVD2014, KoNViD-1k, LIVE-Qualcomm, and LIVE-VQC). The results show that the proposed method at a substantially lower computational cost predicts subjective video quality in line with the state of the art methods on individual databases and generalizes better than existing methods in cross-database setup

    Quality assessment of enhanced videos guided by aesthetics and technical quality attributes

    No full text
    In this work we propose a novel method to evaluate the quality of enhanced videos. Perceived quality of a video depends on both technical aspects, such as the presence of distortions like noise and blur, and non-technical factors, such as content preference and recommendation. Our approach involves the use of three deep learning based models that encode video sequences in terms of their overall technical quality, quality-related attributes, and aesthetic quality. The resulting feature vectors are adaptively combined and used as input to a Support Vector Regressor to estimate the video quality score. Quantitative results on the recently released VQA Dataset for Perceptual Video Enhancement (VDPVE) introduced for the NTIRE 2023 Quality Assessment of Video Enhancement Challenge demonstrates the effectiveness of the proposed method

    NTIRE 2022 Spectral Recovery Challenge and Data Set

    Get PDF
    This paper reviews the third biennial challenge on spectral reconstruction from RGB images, i.e., the recovery of whole-scene hyperspectral (HS) information from a 3-channel RGB image. This challenge presents the "ARAD_1K" data set: a new, larger-than-ever natural hyperspectral image data set containing 1,000 images. Challenge participants were required to recover hyper-spectral information from synthetically generated JPEG-compressed RGB images simulating capture by a known calibrated camera, operating under partially known parameters, in a setting which includes acquisition noise. The challenge was attended by 241 teams, with 60 teams com-peting in the final testing phase, 12 of which provided de-tailed descriptions of their methodology which are included in this report. The performance of these submissions is re-viewed and provided here as a gauge for the current state-of-the-art in spectral reconstruction from natural RGB images
    corecore