FSS-1000: A 1000-Class Dataset for Few-Shot Segmentation
Over the past few years, we have witnessed the success of deep learning in
image recognition thanks to the availability of large-scale human-annotated
datasets such as PASCAL VOC, ImageNet, and COCO. Although these datasets have
covered a wide range of object categories, there are still a significant number
of objects that are not included. Can we perform the same task without a lot of
human annotations? In this paper, we are interested in few-shot object
segmentation, where the number of annotated training examples is limited to only
5. To evaluate and validate the performance of our approach, we have built a
few-shot segmentation dataset, FSS-1000, which consists of 1000 object classes
with pixelwise annotation of ground-truth segmentation. Unique to FSS-1000, our
dataset contains a significant number of objects that have never been seen or
annotated in previous datasets, such as small everyday objects, merchandise,
cartoon characters, and logos. We build our baseline model using standard
backbone networks such as VGG-16, ResNet-101, and Inception. To our surprise,
we found that training our model from scratch on FSS-1000 achieves results
comparable to, and even better than, training with weights pre-trained on
ImageNet, which is more than 100 times larger than FSS-1000. Both our approach
and dataset are simple, effective, and easily extensible to learn segmentation
of new object classes given very few annotated training examples. The dataset is
available at https://github.com/HKUSTCV/FSS-1000
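The 5-shot setting described above is commonly evaluated in episodes: a handful of annotated support images define a novel class, and the model must segment that class in a query image. As a minimal NumPy sketch, one widely used baseline is prototype matching via masked average pooling; the feature shapes, the cosine-similarity threshold, and the random features below are illustrative assumptions, not the paper's actual relation module.

```python
import numpy as np

def masked_average_pooling(features, masks):
    """Average foreground feature vectors over the k support images.

    features: (k, H, W, C) support feature maps
    masks:    (k, H, W) binary ground-truth masks
    Returns a single (C,) class prototype.
    """
    fg = features * masks[..., None]          # zero out background features
    total = masks.sum() + 1e-8                # number of foreground pixels
    return fg.sum(axis=(0, 1, 2)) / total

def segment_query(query_features, prototype, threshold=0.5):
    """Label each query pixel by cosine similarity to the class prototype."""
    q = query_features / (np.linalg.norm(query_features, axis=-1, keepdims=True) + 1e-8)
    p = prototype / (np.linalg.norm(prototype) + 1e-8)
    similarity = q @ p                        # (H, W) cosine-similarity map
    return (similarity > threshold).astype(np.uint8)

# One 5-shot episode: 5 support feature maps with masks, one query feature map.
rng = np.random.default_rng(0)
support_feats = rng.normal(size=(5, 8, 8, 16))
support_masks = (rng.random((5, 8, 8)) > 0.5).astype(np.float32)
proto = masked_average_pooling(support_feats, support_masks)
pred = segment_query(rng.normal(size=(8, 8, 16)), proto)
```

In practice the feature maps would come from a VGG-16 or ResNet-101 backbone rather than random arrays, and the threshold would be learned rather than fixed.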
Inspection and evaluation of artifacts in digital video sources
Streaming digital video content providers such as YouTube, Amazon, Hulu, and Netflix collaborate with production teams to obtain new and old video content. These collaborations lead to an accumulation of video sources, some of which may contain unacceptable visual artifacts. Artifacts can inadvertently enter the video master at any point in the production pipeline, due to any of a number of equipment and user failures. Unfortunately, these artifacts are difficult to detect since no pristine reference exists for comparison. As of now, few automated tools exist that can effectively capture the most common forms of these artifacts. This work studies no-reference video source inspection for generalized artifact detection and subjective quality prediction, which will ultimately inform decisions related to the acquisition of new content.
Automatically identifying the locations and severities of video artifacts is a difficult problem. We have developed a general method for detecting local artifacts by learning differences in the statistics between distorted and pristine video frames. Our model, which we call the Video Impairment Mapper (VID-MAP), produces a full-resolution map of artifact detection probabilities based on comparisons of excitatory and inhibitory convolutional responses. Validation on a large database shows that our method outperforms the previous state of the art, even distortion-specific detectors.
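The excitatory/inhibitory comparison underlying VID-MAP can be illustrated compactly: filter the frame with two kernels and squash their difference into a per-pixel probability. This is a minimal sketch only; the center/surround kernels and the sigmoid readout are illustrative stand-ins for the model's learned convolutional responses.

```python
import numpy as np

def conv2d_same(image, kernel):
    """Naive 'same'-size 2-D convolution, sufficient for a small sketch."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(image, ((ph, ph), (pw, pw)), mode="edge")
    out = np.empty_like(image, dtype=np.float64)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

def impairment_map(frame, excitatory, inhibitory):
    """Full-resolution artifact-probability map: sigmoid of the difference
    between an excitatory and an inhibitory filter response."""
    e = conv2d_same(frame, excitatory)
    i = conv2d_same(frame, inhibitory)
    return 1.0 / (1.0 + np.exp(-(e - i)))     # per-pixel probability in (0, 1)

# Toy filter pair: a sharp center response vs. a local-average surround.
center = np.zeros((3, 3)); center[1, 1] = 1.0
surround = np.full((3, 3), 1.0 / 9.0)
frame = np.random.default_rng(1).random((16, 16))
probs = impairment_map(frame, center, surround)
```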
A variety of powerful picture quality predictors are available that rely on neuro-statistical models of distortion perception. We extend these principles to video source inspection by coupling spatial divisive normalization with a series of filterbanks tuned for artifact detection, implemented in a common convolutional framework. We developed the Video Impairment Detection by SParse Error CapTure (VIDSPECT) model, which leverages discriminative sparse dictionaries that are tuned to detect specific artifacts. VIDSPECT is simple, highly generalizable, and yields better accuracy than competing methods.
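Spatial divisive normalization, the neuro-statistical front end mentioned above, divides each pixel's mean-subtracted value by its local standard deviation. A minimal sketch, with an illustrative window size and stabilizing constant (the actual model's parameters are not specified here):

```python
import numpy as np

def divisive_normalization(frame, window=3, c=0.01):
    """Mean-subtract and divide by the local standard deviation within a
    small window; c is a small constant that prevents division by zero.
    Window size and c are illustrative choices."""
    pad = window // 2
    padded = np.pad(frame, pad, mode="reflect")
    out = np.empty_like(frame, dtype=np.float64)
    for i in range(frame.shape[0]):
        for j in range(frame.shape[1]):
            patch = padded[i:i + window, j:j + window]
            out[i, j] = (frame[i, j] - patch.mean()) / (patch.std() + c)
    return out

coeffs = divisive_normalization(np.random.default_rng(2).random((12, 12)))
```

The resulting coefficients are approximately decorrelated and Gaussian-like for pristine content, which is what makes deviations from that statistical model a useful artifact cue.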
To evaluate the perceived quality of video sources containing artifacts, we built a new digital video database, called the LIVE Video Masters Database, which contains 384 videos affected by the types of artifacts encountered in otherwise pristine digital video sources. We find that VIDSPECT delivers top performance on this database for most artifacts tested, and competitive performance otherwise, using the same basic architecture in all cases.
Compensation Through Prediction for Atmospheric Turbulence Effects on Target Imaging and High Energy Laser Beam
Atmospheric turbulence significantly degrades the performance of High Energy Laser (HEL) beams. The three key undesirable effects are: (1) degraded target images used for target tracking; (2) inaccurate HEL pointing; and (3) reduction in HEL power during propagation to the target. The current approach to compensating for these turbulence effects uses adaptive optics to measure atmospheric turbulence and compensate for the aberration in the optical beam. However, an adaptive optics system has limited performance in strong turbulence, and an optical system makes the HEL system more complex. With improvements in deep learning algorithms and further development in artificial intelligence, we used deep learning and convolutional neural networks to predict atmospheric turbulence and compensate for its negative effects on laser beams. The predicted turbulence can be used for image correction, and for HEL beam correction using a deformable mirror to reduce turbulence effects during propagation.
Military Expert 5, Republic of Singapore Navy. Approved for public release; distribution is unlimited.
Let's Enhance: A Deep Learning Approach to Extreme Deblurring of Text Images
This work presents a novel deep-learning-based pipeline for the inverse
problem of image deblurring, leveraging augmentation and pre-training with
synthetic data. Our results build on our winning submission to the recent
Helsinki Deblur Challenge 2021, whose goal was to explore the limits of
state-of-the-art deblurring algorithms in a real-world data setting. The task
of the challenge was to deblur out-of-focus images of random text, thereby
maximizing an optical-character-recognition-based score function in a
downstream task. A key step of our solution is the data-driven estimation of the
physical forward model describing the blur process. This enables a stream of
synthetic data, generating pairs of ground-truth and blurry images on-the-fly,
which is used for an extensive augmentation of the small amount of challenge
data provided. The actual deblurring pipeline consists of an approximate
inversion of the radial lens distortion (determined by the estimated forward
model) and a U-Net architecture, which is trained end-to-end. Our algorithm was
the only one passing the hardest challenge level, achieving a high character
recognition accuracy. Our findings are well in line with the paradigm
of data-centric machine learning, and we demonstrate its effectiveness in the
context of inverse problems. Apart from a detailed presentation of our
methodology, we also analyze the importance of several design choices in a
series of ablation studies. The code of our challenge submission is available
under https://github.com/theophil-trippe/HDC_TUBerlin_version_1.
Comment: This article has been published in a revised form in Inverse Problems
and Imaging.
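The on-the-fly generation of (ground-truth, blurry) pairs from an estimated forward model can be sketched as a simple generator: draw a synthetic sharp image, push it through the blur model, and yield the pair. Everything below is illustrative, assuming a toy uniform defocus kernel and random binary "text"; the challenge pipeline's actual forward model was estimated from calibration data.

```python
import numpy as np

def forward_model(sharp, kernel, rng):
    """Stand-in for the estimated physical forward model: convolve the
    sharp image with a defocus-like kernel and add mild sensor noise."""
    h, w = sharp.shape
    kh, kw = kernel.shape
    padded = np.pad(sharp, ((kh // 2,) * 2, (kw // 2,) * 2), mode="edge")
    blurred = np.empty_like(sharp)
    for i in range(h):
        for j in range(w):
            blurred[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    noise = rng.normal(0.0, 0.01, sharp.shape)
    return np.clip(blurred + noise, 0.0, 1.0)

def synthetic_pair_stream(n_pairs, size=16):
    """Yield (ground-truth, blurry) training pairs on the fly."""
    rng = np.random.default_rng(4)
    kernel = np.full((5, 5), 1.0 / 25.0)      # toy uniform defocus kernel
    for _ in range(n_pairs):
        sharp = (rng.random((size, size)) > 0.7).astype(np.float64)
        yield sharp, forward_model(sharp, kernel, rng)

pairs = list(synthetic_pair_stream(2))
```

A U-Net trained end-to-end on such a stream never sees the same pair twice, which is the augmentation effect the abstract describes.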
Learning to Interpret Fluid Type Phenomena via Images
Learning to interpret fluid-type phenomena via images is a long-standing and challenging problem in computer vision. The problem becomes even more challenging when the fluid medium is highly dynamic and refractive due to its transparent nature. Here, we consider imaging through such refractive fluid media as water and air. For water, we design novel supervised learning-based algorithms to recover its 3D surface as well as the highly distorted underwater patterns beneath it. For air, we design a state-of-the-art unsupervised learning algorithm to predict the distortion-free image given a short sequence of turbulent images.

Specifically, we design a deep neural network that estimates the depth and normal maps of a fluid surface by analyzing the refractive distortion of a reference background pattern. To recover underwater images severely degraded by the refractive distortions caused by water-surface fluctuations, we present the distortion-guided network (DG-Net) for restoring distortion-free underwater images. The key idea is to use a distortion map, which models the pixel displacement caused by water refraction, to guide network training. Furthermore, we present a novel unsupervised network to recover the latent distortion-free image. The key idea is to model non-rigid distortions as deformable grids: our network consists of a grid deformer that estimates the distortion field and an image generator that outputs the distortion-free image. By leveraging the positional encoding operator, we can simplify the network structure while maintaining fine spatial details in the recovered images. We also develop a combinational deep neural network that can simultaneously recover the latent distortion-free image and reconstruct the 3D shape of the transparent, dynamic fluid surface.
Through extensive experiments on simulated and real captured fluid images, we demonstrate that our proposed deep neural networks outperform the current state of the art on these tasks.
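The distortion map and deformable-grid ideas above both reduce to one core operation: resampling an image through a per-pixel displacement field. A minimal NumPy sketch of that warp, using nearest-neighbour sampling for brevity (the field values and image here are illustrative, not the networks' learned outputs):

```python
import numpy as np

def warp_by_distortion_field(image, field):
    """Resample an image through a per-pixel displacement field.

    image: (H, W) array; field: (H, W, 2) displacements (dy, dx) in pixels.
    Each output pixel reads from its displaced source location, clamped
    to the image bounds; nearest-neighbour sampling keeps the sketch short.
    """
    h, w = image.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    src_y = np.clip(np.rint(ys + field[..., 0]).astype(int), 0, h - 1)
    src_x = np.clip(np.rint(xs + field[..., 1]).astype(int), 0, w - 1)
    return image[src_y, src_x]

rng = np.random.default_rng(5)
img = rng.random((10, 10))
zero_field = np.zeros((10, 10, 2))
assert np.array_equal(warp_by_distortion_field(img, zero_field), img)

wavy = np.zeros((10, 10, 2)); wavy[..., 1] = 1.0   # read from one pixel right
warped = warp_by_distortion_field(img, wavy)
```

In the unsupervised setting, the grid deformer predicts `field` and the image generator predicts the latent image, and the warp of the latent image must reproduce the observed distorted frames.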