Visual Recognition with Deep Nearest Centroids
We devise deep nearest centroids (DNC), a conceptually elegant yet
surprisingly effective network for large-scale visual recognition, by
revisiting Nearest Centroids, one of the most classic and simple classifiers.
Current deep models learn the classifier in a fully parametric manner, ignoring
the latent data structure and lacking simplicity and explainability. DNC
instead conducts nonparametric, case-based reasoning; it utilizes sub-centroids
of training samples to describe class distributions and clearly explains the
classification as the proximity of test data and the class sub-centroids in the
feature space. Due to the distance-based nature, the network output
dimensionality is flexible, and all the learnable parameters are only for data
embedding. That means all the knowledge learnt for ImageNet classification can
be completely transferred for pixel recognition learning, under the
"pre-training and fine-tuning" paradigm. Apart from its nested simplicity and
intuitive decision-making mechanism, DNC can even possess ad-hoc explainability
when the sub-centroids are selected as actual training images that humans can
view and inspect. Compared with parametric counterparts, DNC performs better on
image classification (CIFAR-10, ImageNet) and greatly boosts pixel recognition
(ADE20K, Cityscapes), with improved transparency and fewer learnable
parameters, using various network architectures (ResNet, Swin) and segmentation
models (FCN, DeepLabV3, Swin). We feel this work brings fundamental insights
into related fields.
Comment: 23 pages, 8 figures
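The paper's DNC learns a deep embedding; the decision rule itself is the classic nearest-centroids one it revisits. A minimal sketch of that rule in plain NumPy, using fixed hand-made sub-centroids rather than learned ones (all names and values here are illustrative, not from the paper):

```python
import numpy as np

def nearest_subcentroid_predict(x, subcentroids, labels):
    """Classify x as the class of its nearest sub-centroid (Euclidean)."""
    dists = np.linalg.norm(subcentroids - x, axis=1)
    return int(labels[np.argmin(dists)])

# Two classes, each summarized by two sub-centroids in a 2-D feature space.
subcentroids = np.array([[0.0, 0.0], [0.5, 0.0],   # class 0
                         [3.0, 3.0], [3.5, 3.0]])  # class 1
labels = np.array([0, 0, 1, 1])
print(nearest_subcentroid_predict(np.array([0.2, 0.1]), subcentroids, labels))  # 0
print(nearest_subcentroid_predict(np.array([3.2, 2.9]), subcentroids, labels))  # 1
```

Because the output is a distance comparison rather than a fixed-width logit vector, swapping in a different label set (e.g., pixel classes instead of image classes) changes nothing about the embedding parameters — which is the transfer property the abstract highlights.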
Unsupervised Superpixel Generation using Edge-Sparse Embedding
Partitioning an image into superpixels based on the similarity of pixels with
respect to features such as colour or spatial location can significantly reduce
data complexity and improve subsequent image processing tasks. Initial
algorithms for unsupervised superpixel generation solely relied on local cues
without prioritizing significant edges over arbitrary ones. On the other hand,
more recent methods based on unsupervised deep learning either fail to properly
address the trade-off between superpixel edge adherence and compactness or lack
control over the generated number of superpixels. By using random images with
strong spatial correlation as input, i.e., blurred noise images, in a
non-convolutional image decoder we can reduce the expected number of contrasts
and enforce smooth, connected edges in the reconstructed image. We generate
edge-sparse pixel embeddings by encoding additional spatial information into
the piece-wise smooth activation maps from the decoder's last hidden layer and
use a standard clustering algorithm to extract high quality superpixels. Our
proposed method reaches state-of-the-art performance on the BSDS500,
PASCAL-Context, and a microscopy dataset.
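The paper clusters learned edge-sparse embeddings with a standard clustering algorithm. As a minimal stand-in for that last step, here is k-means over hand-built pixel embeddings (intensity concatenated with weighted, normalized coordinates) — a toy sketch, not the paper's decoder-based embedding:

```python
import numpy as np

def kmeans_superpixels(image, n_segments=2, pos_weight=0.2, iters=10, seed=0):
    """Toy superpixel extraction: k-means over per-pixel embeddings that
    concatenate intensity with weighted, normalized (y, x) coordinates."""
    h, w = image.shape
    ys, xs = np.mgrid[0:h, 0:w]
    feats = np.stack([image.ravel().astype(float),
                      pos_weight * ys.ravel() / h,
                      pos_weight * xs.ravel() / w], axis=1)
    rng = np.random.default_rng(seed)
    centers = feats[rng.choice(len(feats), n_segments, replace=False)]
    for _ in range(iters):
        # Assign each pixel to its nearest center, then recompute centers.
        assign = np.argmin(((feats[:, None, :] - centers) ** 2).sum(-1), axis=1)
        for k in range(n_segments):
            if np.any(assign == k):
                centers[k] = feats[assign == k].mean(axis=0)
    return assign.reshape(h, w)

image = np.zeros((8, 8))
image[:, 4:] = 1.0          # two flat regions split down the middle
seg = kmeans_superpixels(image, n_segments=2)
```

The `pos_weight` knob plays the role of the compactness trade-off the abstract mentions: larger values make spatial proximity dominate the embedding distance, yielding more compact but less edge-adherent segments.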
CORE: Cooperative Reconstruction for Multi-Agent Perception
This paper presents CORE, a conceptually simple, effective and
communication-efficient model for multi-agent cooperative perception. It
addresses the task from a novel perspective of cooperative reconstruction,
based on two key insights: 1) cooperating agents together provide a more
holistic observation of the environment, and 2) the holistic observation can
serve as valuable supervision to explicitly guide the model in learning how to
reconstruct the ideal observation based on collaboration. CORE instantiates the
idea with three major components: a compressor for each agent to create more
compact feature representation for efficient broadcasting, a lightweight
attentive collaboration component for cross-agent message aggregation, and a
reconstruction module to reconstruct the observation based on aggregated
feature representations. This learning-to-reconstruct idea is task-agnostic,
and offers clear and reasonable supervision to inspire more effective
collaboration, eventually promoting perception tasks. We validate CORE on
OPV2V, a large-scale multi-agent perception dataset, on two tasks, i.e., 3D
object detection and semantic segmentation. Results demonstrate that the model
achieves state-of-the-art performance on both tasks, and is more
communication-efficient.
Comment: Accepted to ICCV 2023; Code: https://github.com/zllxot/COR
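CORE's three components are learned neural modules; the NumPy sketch below only mirrors the three-stage dataflow (compress for broadcast, attentively aggregate across agents, reconstruct the holistic observation). Every function here is an illustrative stand-in with made-up internals, not the paper's architecture:

```python
import numpy as np

def compress(feat, ratio=2):
    """Stand-in compressor: average adjacent channels to shrink the message."""
    return feat.reshape(-1, ratio).mean(axis=1)

def aggregate(ego_msg, other_msgs):
    """Attention-style aggregation: weight each message by its dot-product
    similarity to the ego message, then take a softmax-weighted sum."""
    msgs = np.stack([ego_msg] + list(other_msgs))
    scores = msgs @ ego_msg
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ msgs

def reconstruct(agg, ratio=2):
    """Stand-in reconstruction decoder: expand back to the original size."""
    return np.repeat(agg, ratio)

ego = np.arange(8, dtype=float)       # ego agent's feature
neighbor = np.ones(8)                 # cooperating agent's feature
msg_ego, msg_nb = compress(ego), compress(neighbor)
out = reconstruct(aggregate(msg_ego, [msg_nb]))
```

The point of the structure is that only the compressed messages cross the network (halving bandwidth at `ratio=2`), while the reconstruction target supplies the supervision signal the abstract describes.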
Quality-Aware Memory Network for Interactive Volumetric Image Segmentation
Despite recent progress of automatic medical image segmentation techniques,
fully automatic results often fail to meet clinical requirements and typically
require further refinement. In this work, we propose a quality-aware memory
network for interactive segmentation of 3D medical images. Given user
guidance on an arbitrary slice, an interaction network is first employed to
obtain an initial 2D segmentation. The quality-aware memory network
subsequently propagates the initial segmentation estimation bidirectionally
over the entire volume. Subsequent refinement based on additional user guidance
on other slices can be incorporated in the same manner. To further facilitate
interactive segmentation, a quality assessment module is introduced to suggest
the next slice to segment based on the current segmentation quality of each
slice. The proposed network has two appealing characteristics: 1) The
memory-augmented network offers the ability to quickly encode past segmentation
information, which will be retrieved for the segmentation of other slices; 2)
The quality assessment module enables the model to directly estimate the
qualities of segmentation predictions, which allows an active learning paradigm
where users preferentially label the lowest-quality slice for multi-round
refinement. The proposed network leads to a robust interactive segmentation
engine, which can generalize well to various types of user annotations (e.g.,
scribbles, boxes). Experimental results on various medical datasets demonstrate
the superiority of our approach in comparison with existing techniques.
Comment: MICCAI 2021. Code: https://github.com/0liliulei/Mem3
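The active-learning loop the abstract describes reduces, at each round, to suggesting the unannotated slice with the lowest estimated quality. A minimal sketch of that selection rule, with hypothetical quality scores (in the paper these come from the learned quality assessment module):

```python
import numpy as np

def next_slice_to_annotate(quality_scores, annotated=()):
    """Suggest the not-yet-annotated slice with the lowest estimated quality."""
    scores = np.array(quality_scores, dtype=float)
    scores[list(annotated)] = np.inf   # never re-suggest annotated slices
    return int(np.argmin(scores))

quality = [0.9, 0.4, 0.7, 0.3, 0.8]   # hypothetical per-slice quality estimates
print(next_slice_to_annotate(quality))                 # 3 (lowest quality)
print(next_slice_to_annotate(quality, annotated={3}))  # 1 (next lowest)
```

Each user correction then re-enters the memory network, the volume is re-propagated, and the quality estimates are refreshed before the next suggestion.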
Multi-nighttime-light data comparison analysis based on image quality values and lit fishing vessel identification effect
Fisheries provide high-quality protein for many people, and their sustainable use is of global concern. Light trapping is a widely used fishing method that takes advantage of the phototropism of fish. Remote sensing technology allows for the monitoring of lit fishing vessels at sea from the air at night, which supports the sustainable management of fisheries. To investigate the potential of different nighttime light remote sensing data for lit fishing vessel identification and applications, we used the fuzzy evaluation method to quantitatively assess images in terms of their radiometric and geometric quality, and Otsu’s method to compare the effects of lit fishing vessel identification. Three kinds of nighttime light data, from the Defense Meteorological Satellite Program/Operational Linescan System (DMSP/OLS), the Visible Infrared Imaging Radiometer Suite Day/Night Band (VIIRS/DNB), and Luojia1-01 (LJ1-01), were analyzed and compared, and application pointers were constructed. The results are as follows. ① In the image radiation quality evaluation, the information entropy, clarity, and noise performance of the LJ1-01 image are higher than those of the DMSP/OLS and VIIRS/DNB images: the information entropy of the LJ1-01 image is nearly 10 times that of VIIRS/DNB and 23 times that of DMSP/OLS; its average gradient is 14 times that of VIIRS/DNB and 1,600 times that of DMSP/OLS; and its noise is only about 2/3 that of the VIIRS/DNB image and 1/3 that of the DMSP/OLS image. In the geometric quality assessment, the geometric positioning accuracy and ground sampling accuracy of the VIIRS/DNB image are the best among the three images, with a relative difference percentage of 100.1%, while the LJ1-01 and DMSP/OLS images are relatively lower, at 96.9% and 92.3%, respectively. ② The detection of squid fishing vessels in the Northwest Pacific is taken as an example to compare the identification effects of the three types of data: DMSP/OLS, VIIRS/DNB, and LJ1-01.
Among these, DMSP/OLS can effectively identify the positions of lit fishing boats, and VIIRS/DNB images can accurately estimate the spatial position and number of lit fishing boats that are far apart. However, when fishing boats gather or cluster, the number of vessels cannot be resolved, so the detected number of lit fishing vessels falls below the real value. For VIIRS/DNB and LJ1-01 images with a 5′×8′ span covering the same batch of pelagic squid fishing vessels in the same spatiotemporal range, LJ1-01 extracted 18 fishing vessels while VIIRS/DNB extracted 15, indicating that LJ1-01 can distinguish multiple fishing vessels in overlapping lighted areas and thus accurately identify the number of fishing vessels. The application pointing table generated from the results of the three data analyses can provide a reference for sensor/image selection in nighttime light remote sensing fishery applications and a basis for more refined fishing vessel identification, extraction, and monitoring.
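Otsu’s method, used above to separate lit pixels from the dark sea background, is a standard global-thresholding algorithm. A textbook implementation on synthetic 1-D radiance values (this is the generic algorithm, not the authors' processing chain; the data are invented):

```python
import numpy as np

def otsu_threshold(values, bins=256):
    """Otsu's method: pick the threshold maximizing between-class variance."""
    hist, edges = np.histogram(values, bins=bins)
    p = hist / hist.sum()
    centers = (edges[:-1] + edges[1:]) / 2
    best_t, best_var = centers[0], -1.0
    for i in range(1, bins):
        w0, w1 = p[:i].sum(), p[i:].sum()   # class probabilities
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (p[:i] * centers[:i]).sum() / w0   # class means
        mu1 = (p[i:] * centers[i:]).sum() / w1
        var = w0 * w1 * (mu0 - mu1) ** 2         # between-class variance
        if var > best_var:
            best_var, best_t = var, centers[i - 1]
    return best_t

# Synthetic radiance values: dark sea background plus a few bright vessels.
pixels = np.concatenate([np.full(100, 10.0), np.full(20, 200.0)])
t = otsu_threshold(pixels)
lit = pixels > t   # pixels above the threshold are treated as lit vessels
```

On strongly bimodal data like nighttime lights over water, the maximizing threshold falls between the two modes, so connected bright regions above it can then be counted as candidate vessels.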
A Survey on Deep Learning Technique for Video Segmentation
Video segmentation -- partitioning video frames into multiple segments or
objects -- plays a critical role in a broad range of practical applications,
from enhancing visual effects in movies, to understanding scenes in autonomous
driving, to creating virtual backgrounds in video conferencing. Recently, with
the renaissance of connectionism in computer vision, there has been an influx
of deep learning based approaches for video segmentation that have delivered
compelling performance. In this survey, we comprehensively review two basic
lines of research -- generic object segmentation (of unknown categories) in
videos, and video semantic segmentation -- by introducing their respective task
settings, background concepts, perceived need, development history, and main
challenges. We also offer a detailed overview of representative literature on
both methods and datasets. We further benchmark the reviewed methods on several
well-known datasets. Finally, we point out open issues in this field, and
suggest opportunities for further research. We also provide a public website to
continuously track developments in this fast advancing field:
https://github.com/tfzhou/VS-Survey.
Comment: Accepted by TPAMI. Website: https://github.com/tfzhou/VS-Surve