
    Visual Recognition with Deep Nearest Centroids

    We devise deep nearest centroids (DNC), a conceptually elegant yet surprisingly effective network for large-scale visual recognition, by revisiting Nearest Centroids, one of the most classic and simple classifiers. Current deep models learn the classifier in a fully parametric manner, ignoring the latent data structure and lacking simplicity and explainability. DNC instead conducts nonparametric, case-based reasoning; it utilizes sub-centroids of training samples to describe class distributions and clearly explains the classification as the proximity of test data to the class sub-centroids in the feature space. Due to its distance-based nature, the network output dimensionality is flexible, and all the learnable parameters are used only for data embedding. That means all the knowledge learnt for ImageNet classification can be completely transferred to pixel recognition learning, under the "pre-training and fine-tuning" paradigm. Apart from its nested simplicity and intuitive decision-making mechanism, DNC can even possess ad-hoc explainability when the sub-centroids are selected as actual training images that humans can view and inspect. Compared with parametric counterparts, DNC performs better on image classification (CIFAR-10, ImageNet) and greatly boosts pixel recognition (ADE20K, Cityscapes), with improved transparency and fewer learnable parameters, using various network architectures (ResNet, Swin) and segmentation models (FCN, DeepLabV3, Swin). We feel this work brings fundamental insights into related fields. Comment: 23 pages, 8 figures
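
    A minimal sketch of the distance-based decision rule described above: each class is summarized by a few sub-centroids of its embedded training samples, and a test embedding takes the label of its nearest sub-centroid. The use of k-means to pick sub-centroids and the Euclidean metric are illustrative assumptions, not the authors' exact training procedure.

```python
import numpy as np
from sklearn.cluster import KMeans

def compute_sub_centroids(features, labels, n_sub=4):
    """Summarize each class by a few sub-centroids of its embedded training samples."""
    centroids, centroid_labels = [], []
    for c in np.unique(labels):
        class_feats = features[labels == c]
        k = min(n_sub, len(class_feats))
        km = KMeans(n_clusters=k, n_init=10).fit(class_feats)
        centroids.append(km.cluster_centers_)
        centroid_labels.extend([c] * k)
    return np.vstack(centroids), np.array(centroid_labels)

def classify(test_features, centroids, centroid_labels):
    """Assign each test embedding the label of its nearest sub-centroid."""
    # Pairwise Euclidean distances, shape (n_test, n_centroids).
    dists = np.linalg.norm(test_features[:, None, :] - centroids[None, :, :], axis=-1)
    return centroid_labels[np.argmin(dists, axis=1)]
```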

    Unsupervised Superpixel Generation using Edge-Sparse Embedding

    Partitioning an image into superpixels based on the similarity of pixels with respect to features such as colour or spatial location can significantly reduce data complexity and improve subsequent image processing tasks. Initial algorithms for unsupervised superpixel generation relied solely on local cues without prioritizing significant edges over arbitrary ones. On the other hand, more recent methods based on unsupervised deep learning either fail to properly address the trade-off between superpixel edge adherence and compactness or lack control over the number of generated superpixels. By using random images with strong spatial correlation, i.e., blurred noise images, as input to a non-convolutional image decoder, we can reduce the expected number of contrasts and enforce smooth, connected edges in the reconstructed image. We generate edge-sparse pixel embeddings by encoding additional spatial information into the piece-wise smooth activation maps from the decoder's last hidden layer and use a standard clustering algorithm to extract high-quality superpixels. Our proposed method reaches state-of-the-art performance on BSDS500, PASCAL-Context, and a microscopy dataset.
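
    A rough sketch of the final clustering step described above: per-pixel embeddings (for example, the decoder's last hidden activation maps) are augmented with spatial coordinates and fed to a standard clustering algorithm to obtain superpixels. The spatial weighting and the choice of k-means are assumptions for illustration, not the paper's exact pipeline.

```python
import numpy as np
from sklearn.cluster import KMeans

def superpixels_from_embeddings(embeddings, n_superpixels=200, spatial_weight=0.5):
    """embeddings: (H, W, C) per-pixel features -> (H, W) map of superpixel labels."""
    H, W, C = embeddings.shape
    ys, xs = np.mgrid[0:H, 0:W]
    # Encode additional spatial information alongside the activation maps.
    coords = np.stack([ys / H, xs / W], axis=-1) * spatial_weight
    feats = np.concatenate([embeddings, coords], axis=-1).reshape(-1, C + 2)
    # Standard clustering of the augmented pixel embeddings.
    labels = KMeans(n_clusters=n_superpixels, n_init=4).fit_predict(feats)
    return labels.reshape(H, W)
```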

    CORE: Cooperative Reconstruction for Multi-Agent Perception

    This paper presents CORE, a conceptually simple, effective, and communication-efficient model for multi-agent cooperative perception. It addresses the task from a novel perspective of cooperative reconstruction, based on two key insights: 1) cooperating agents together provide a more holistic observation of the environment, and 2) the holistic observation can serve as valuable supervision to explicitly guide the model in learning how to reconstruct the ideal observation from collaboration. CORE instantiates the idea with three major components: a compressor for each agent to create more compact feature representations for efficient broadcasting, a lightweight attentive collaboration component for cross-agent message aggregation, and a reconstruction module to reconstruct the observation from the aggregated feature representations. This learning-to-reconstruct idea is task-agnostic and offers clear and reasonable supervision that encourages more effective collaboration, ultimately benefiting perception tasks. We validate CORE on OPV2V, a large-scale multi-agent perception dataset, on two tasks, i.e., 3D object detection and semantic segmentation. Results demonstrate that the model achieves state-of-the-art performance on both tasks while being more communication-efficient. Comment: Accepted to ICCV 2023; Code: https://github.com/zllxot/COR
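
    A hedged PyTorch sketch of the three components named above (per-agent compression, lightweight attentive cross-agent aggregation, reconstruction). The layer sizes, the 1x1-convolution compressor, and the per-pixel attention weights are illustrative assumptions; the actual CORE architecture is defined in the paper and its repository.

```python
import torch
import torch.nn as nn

class CooperativeReconstruction(nn.Module):
    def __init__(self, in_ch=256, compressed_ch=32):
        super().__init__()
        self.compressor = nn.Conv2d(in_ch, compressed_ch, kernel_size=1)    # shrink features before broadcasting
        self.decompressor = nn.Conv2d(compressed_ch, in_ch, kernel_size=1)  # expand received features again
        self.query = nn.Conv2d(in_ch, in_ch, kernel_size=1)
        self.key = nn.Conv2d(in_ch, in_ch, kernel_size=1)
        self.reconstruct = nn.Sequential(
            nn.Conv2d(in_ch, in_ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(in_ch, in_ch, 3, padding=1),
        )

    def forward(self, ego_feat, neighbor_feats):
        """ego_feat: (B, C, H, W); neighbor_feats: list of (B, C, H, W) already warped to the ego frame."""
        # 1) Compress neighbor features (this is what would be broadcast), then decompress on arrival.
        received = [self.decompressor(self.compressor(f)) for f in neighbor_feats]
        # 2) Lightweight attentive aggregation: per-pixel weights from query/key similarity.
        fused = ego_feat
        q = self.query(ego_feat)
        for f in received:
            w = torch.sigmoid((q * self.key(f)).sum(dim=1, keepdim=True))  # (B, 1, H, W)
            fused = fused + w * f
        # 3) Reconstruct a more holistic observation from the aggregated features.
        return self.reconstruct(fused)
```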

    Quality-Aware Memory Network for Interactive Volumetric Image Segmentation

    Despite recent progress in automatic medical image segmentation techniques, fully automatic results often fail to meet clinical requirements and typically need further refinement. In this work, we propose a quality-aware memory network for interactive segmentation of 3D medical images. Given user guidance on an arbitrary slice, an interaction network is first employed to obtain an initial 2D segmentation. The quality-aware memory network then propagates this initial segmentation bidirectionally over the entire volume. Subsequent refinement based on additional user guidance on other slices can be incorporated in the same manner. To further facilitate interactive segmentation, a quality assessment module is introduced to suggest the next slice to segment based on the current segmentation quality of each slice. The proposed network has two appealing characteristics: 1) the memory-augmented network can quickly encode past segmentation information, which is retrieved for segmenting other slices; 2) the quality assessment module enables the model to directly estimate the quality of its segmentation predictions, which allows an active learning paradigm in which users preferentially label the lowest-quality slice for multi-round refinement. The proposed network yields a robust interactive segmentation engine that generalizes well to various types of user annotations (e.g., scribbles, boxes). Experimental results on various medical datasets demonstrate the superiority of our approach in comparison with existing techniques. Comment: MICCAI 2021. Code: https://github.com/0liliulei/Mem3
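
    An illustrative sketch of the quality-guided interaction loop described above: segment from one annotated slice, propagate through the volume, then repeatedly ask the user to refine the slice that the quality module scores lowest. The callables `interaction_net`, `propagate`, `quality_net`, and `get_user_hint` are placeholders standing in for the paper's networks and user interface, not its actual API.

```python
import numpy as np

def interactive_segmentation(volume, get_user_hint, interaction_net, propagate,
                             quality_net, rounds=3):
    """volume: (D, H, W) image; returns a (D, H, W) segmentation mask."""
    slice_idx = volume.shape[0] // 2                    # start anywhere, e.g. the middle slice
    masks = None
    for _ in range(rounds):
        hint = get_user_hint(slice_idx)                 # scribbles / boxes on the chosen slice
        init_mask = interaction_net(volume[slice_idx], hint)
        # Propagate the slice segmentation bidirectionally over the whole volume.
        masks = propagate(volume, init_mask, slice_idx, masks)
        # Score every slice and ask the user to refine the lowest-quality one next.
        quality = np.array([quality_net(volume[d], masks[d]) for d in range(volume.shape[0])])
        slice_idx = int(np.argmin(quality))
    return masks
```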

    Multi-nighttime-light data comparison analysis based on image quality values and lit fishing vessel identification effect

    Fisheries provide high-quality protein for many people, and their sustainable use is of global concern. Light trapping is a widely used fishing method that takes advantage of the phototropism of fish. Remote sensing technology allows lit fishing vessels at sea to be monitored from the air at night, which supports the sustainable management of fisheries. To investigate the potential of different nighttime light remote sensing data for lit fishing vessel identification and applications, we used the fuzzy evaluation method to quantitatively assess images in terms of their radiometric and geometric quality, and Otsu's method to compare the effects of lit fishing vessel identification. Three kinds of nighttime light data, from the Defense Meteorological Satellite Program/Operational Linescan System (DMSP/OLS), the Visible Infrared Imaging Radiometer Suite day/night band (VIIRS/DNB), and Luojia1-01 (LJ1-01), were analyzed and compared, and an application guidance table was constructed. The results are as follows. ① In the image radiation quality evaluation, the information entropy, clarity, and noise performance of the LJ1-01 image are higher than those of the DMSP/OLS and VIIRS/DNB images: the information entropy of the LJ1-01 image is nearly 10 times that of VIIRS/DNB and 23 times that of DMSP/OLS, its average gradient is 14 times that of the VIIRS/DNB image and 1,600 times that of DMSP/OLS, and its noise is only about 2/3 that of the VIIRS/DNB image and 1/3 that of the DMSP/OLS image. In the geometric quality assessment, the geometric positioning accuracy and ground sampling accuracy of the VIIRS/DNB image are the best among the three images, with a relative difference percentage of 100.1%, while the LJ1-01 and DMSP/OLS images are relatively lower, at 96.9% and 92.3%, respectively. ② The detection of squid fishing vessels in the Northwest Pacific is taken as an example to compare the identification effects of the three types of data: DMSP/OLS, VIIRS/DNB, and LJ1-01. Among these, DMSP/OLS can effectively identify the positions of lit fishing vessels, and VIIRS/DNB images can accurately estimate the spatial positions and number of lit fishing vessels when they are far apart. However, when fishing vessels gather or cluster, the number of vessels cannot be distinguished, so the detected number of lit fishing vessels is lower than the true value. For VIIRS/DNB and LJ1-01 images covering the same 5′×8′ span in the same spatiotemporal range and the same batch of pelagic squid fishing vessels, LJ1-01 extracted 18 fishing vessels while VIIRS/DNB extracted 15, indicating that LJ1-01 can distinguish multiple fishing vessels within overlapping light areas and thus accurately identify the number of vessels. The application guidance table generated from the results of the three data analyses can provide a reference for sensor/image selection in nighttime light remote sensing fishery applications and a basis for more refined fishing vessel identification, extraction, and monitoring.
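
    A small sketch of the kind of per-image statistics used in the comparison (information entropy, average gradient as a clarity measure) and of Otsu's method for separating lit vessels from the dark sea background. The bin count and the connected-component counting step are illustrative assumptions; the paper's evaluation is more involved.

```python
import numpy as np
from scipy import ndimage

def information_entropy(img, bins=256):
    """Shannon entropy of the intensity histogram (higher = more information)."""
    hist, _ = np.histogram(img, bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def average_gradient(img):
    """Mean local gradient magnitude, commonly used as a clarity measure."""
    gy, gx = np.gradient(img.astype(float))
    return np.mean(np.sqrt((gx ** 2 + gy ** 2) / 2.0))

def otsu_threshold(img, bins=256):
    """Threshold that maximizes between-class variance (Otsu's method)."""
    hist, edges = np.histogram(img, bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2
    p = hist / hist.sum()
    omega = np.cumsum(p)          # class probability below each candidate threshold
    mu = np.cumsum(p * centers)   # cumulative class mean
    mu_t = mu[-1]
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b2 = (mu_t * omega - mu) ** 2 / (omega * (1 - omega))
    return centers[np.nanargmax(sigma_b2)]

def count_lit_vessels(img):
    """Count bright connected regions above the Otsu threshold as lit vessels."""
    mask = img > otsu_threshold(img)
    _, n_vessels = ndimage.label(mask)
    return n_vessels
```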