DiffuseExpand: Expanding dataset for 2D medical image segmentation using diffusion models

Author: Huang Zhen
Qiu Ziming
Shao Shitong
Wang Shuai
Yuan Xiaohan
Zhou Kevin
Publication date: 26/04/2023

Dataset expansion can effectively alleviate the data scarcity in medical image segmentation that arises from privacy concerns and labeling difficulties. However, existing expansion algorithms still face great challenges because they cannot guarantee the diversity of synthesized images with paired segmentation masks. In recent years, Diffusion Probabilistic Models (DPMs) have shown powerful image synthesis performance, even better than that of Generative Adversarial Networks. Based on this insight, we propose an approach called DiffuseExpand for expanding datasets for 2D medical image segmentation using DPMs, which first samples a variety of masks from Gaussian noise to ensure diversity, and then synthesizes images to ensure the alignment of images and masks. After that, DiffuseExpand selects high-quality samples to further enhance the effectiveness of data expansion. Our comparison and ablation experiments on the COVID-19 and CGMH Pelvis datasets demonstrate the effectiveness of DiffuseExpand. Our code is released at https://anonymous.4open.science/r/DiffuseExpand. Comment: 10 pages, 5 figures

arXiv.org e-Print Archive

Advancing Vision Transformers with Group-Mix Attention

Author: Ding Xiaohan
Ge Chongjian
Luo Ping
Song Yibing
Tong Zhan
Wang Jiangliu
Yuan Li
Publication date: 25/11/2023

Vision Transformers (ViTs) have been shown to enhance visual recognition by modeling long-range dependencies with multi-head self-attention (MHSA), which is typically formulated as a Query-Key-Value computation. However, the attention map generated from the Query and Key captures only token-to-token correlations at a single granularity. In this paper, we argue that self-attention should have a more comprehensive mechanism to capture correlations among tokens and groups (i.e., multiple adjacent tokens) for higher representational capacity. We therefore propose Group-Mix Attention (GMA) as an advanced replacement for traditional self-attention, which can simultaneously capture token-to-token, token-to-group, and group-to-group correlations with various group sizes. To this end, GMA splits the Query, Key, and Value uniformly into segments and performs different group aggregations to generate group proxies. The attention map is computed from the mixtures of tokens and group proxies and is used to recombine the tokens and groups in the Value. Based on GMA, we introduce a powerful backbone, namely GroupMixFormer, which achieves state-of-the-art performance in image classification, object detection, and semantic segmentation with fewer parameters than existing models. For instance, GroupMixFormer-L (with 70.3M parameters and 384^2 input) attains 86.2% Top-1 accuracy on ImageNet-1K without external data, while GroupMixFormer-B (with 45.8M parameters) attains 51.2% mIoU on ADE20K.
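The segment-and-aggregate idea in the abstract above can be illustrated with a minimal sketch. It is not the GroupMixFormer implementation: it assumes a 1D token sequence, uses mean pooling over a sliding window as the group aggregator, and omits the learned projections and multiple heads. The names `split_channels`, `group_proxy`, `mix`, and `group_mix_attention` are hypothetical.

```python
import math

def split_channels(x, n_segments):
    """Split each token vector into n_segments equal channel slices."""
    width = len(x[0]) // n_segments
    return [[tok[s * width:(s + 1) * width] for tok in x] for s in range(n_segments)]

def group_proxy(seg, size):
    """Assumed aggregator: average each token with its neighbours in a
    window of `size` adjacent tokens (indices clamped at the edges)."""
    n, half = len(seg), size // 2
    out = []
    for i in range(n):
        window = [seg[min(max(j, 0), n - 1)] for j in range(i - half, i - half + size)]
        out.append([sum(col) / size for col in zip(*window)])
    return out

def mix(x, group_sizes):
    """Keep size-1 segments as plain tokens, turn the others into group
    proxies, then re-join the channel slices of every token."""
    segs = split_channels(x, len(group_sizes))
    mixed = [group_proxy(seg, k) if k > 1 else seg for seg, k in zip(segs, group_sizes)]
    return [sum((m[i] for m in mixed), []) for i in range(len(x))]

def softmax(row):
    m = max(row)
    e = [math.exp(v - m) for v in row]
    s = sum(e)
    return [v / s for v in e]

def group_mix_attention(q, k, v, group_sizes=(1, 3)):
    """Scaled dot-product attention over mixtures of tokens and group proxies."""
    qm, km, vm = (mix(t, group_sizes) for t in (q, k, v))
    scale = math.sqrt(len(qm[0]))
    attn = [softmax([sum(a * b for a, b in zip(qi, kj)) / scale for kj in km]) for qi in qm]
    out = [[sum(w * vj[c] for w, vj in zip(row, vm)) for c in range(len(vm[0]))] for row in attn]
    return attn, out

# Four tokens with four channels each; Q, K and V share the same values here.
tokens = [[1.0, 0.0, 2.0, 1.0], [0.0, 1.0, 0.0, 3.0],
          [2.0, 2.0, 1.0, 0.0], [1.0, 3.0, 0.0, 1.0]]
attn, out = group_mix_attention(tokens, tokens, tokens)
```

With `group_sizes=(1, 3)`, the first half of the channels contributes token-level terms to each dot product and the second half contributes group-level terms, so a single attention map mixes both granularities.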

arXiv.org e-Print Archive

Beyond Object Recognition: A New Benchmark towards Object Concept Learning

Author: Li Yong-Lu
Liu Siqi
Lu Cewu
Mao Xiaohan
Xu Xinyu
Xu Yue
Yao Yuan
Publication date: 20/08/2023

Understanding objects is a central building block of artificial intelligence, especially for embodied AI. Even though object recognition excels with deep learning, current machines still struggle to learn higher-level knowledge, e.g., what attributes an object has and what we can do with an object. In this work, we propose a challenging Object Concept Learning (OCL) task to push the envelope of object understanding. It requires machines to reason out object affordances and simultaneously give the reason: what attributes make an object possess these affordances. To support OCL, we build a densely annotated knowledge base including extensive labels for three levels of object concept (category, attribute, affordance) and the causal relations among the three levels. By analyzing the causal structure of OCL, we present a baseline, Object Concept Reasoning Network (OCRN). It leverages causal intervention and concept instantiation to infer the three levels following their causal relations. In experiments, OCRN effectively infers the object knowledge while following the causalities well. Our data and code are available at https://mvig-rhos.com/ocl. Comment: ICCV 2023. Webpage: https://mvig-rhos.com/oc

arXiv.org e-Print Archive

RISE-based adaptive control of electro-hydraulic servo system with uncertain compensation

Author: Jie Hang
Xiaohan Yang
Yinghao Cui
Zhanhang Yuan
Publication venue: American Institute of Mathematical Sciences (AIMS)
Publication date: 01/03/2023

The electro-hydraulic servo system (EHSS) plays an important role in many industrial and military applications. However, high-performance tracking control of the EHSS remains challenging because of its nonlinear dynamics and model uncertainties. In this paper, a novel adaptive robust integral of the sign of the error (ARISE) method with an extended state observer (ESO) is proposed. First, a nonlinear mathematical model of a typical EHSS with modeling uncertainties and uncertain nonlinearities is established. Then, the ESO is used to estimate the state and the lumped disturbance, while the unknown parameter estimates are updated by the novel adaptive law. Results show that the novel controller achieves better tracking performance in terms of maximum tracking error, average tracking error, and standard deviation of the tracking error.
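To illustrate what an extended state observer does, here is a minimal sketch of a generic linear ESO on a toy double-integrator plant. It is not the paper's EHSS model or its ARISE controller: the plant, the constant disturbance, the bandwidth `omega`, and the function name `simulate_eso` are all assumptions chosen for illustration.

```python
def simulate_eso(omega=20.0, dt=0.001, steps=2000, disturbance=2.0):
    """Euler-simulate the toy plant x1' = x2, x2' = d + u together with a
    third-order linear ESO that sees only the output x1."""
    # Gains from (s + omega)^3: all observer error poles sit at -omega.
    b1, b2, b3 = 3 * omega, 3 * omega ** 2, omega ** 3
    x1 = x2 = 0.0        # true plant state (position, velocity)
    z1 = z2 = z3 = 0.0   # estimates: position, velocity, lumped disturbance
    u = 0.0              # control input held at zero for this open-loop demo
    for _ in range(steps):
        e = x1 - z1      # output estimation error drives all three estimates
        dx1, dx2 = x2, disturbance + u
        dz1 = z2 + b1 * e
        dz2 = z3 + u + b2 * e
        dz3 = b3 * e     # the extended state integrates the error
        x1, x2 = x1 + dt * dx1, x2 + dt * dx2
        z1, z2, z3 = z1 + dt * dz1, z2 + dt * dz2, z3 + dt * dz3
    return (x1, x2), (z1, z2, z3)

(x1, x2), (z1, z2, z3) = simulate_eso()
print("velocity error:", abs(x2 - z2))
print("disturbance estimate:", z3)
```

In this sketch the extended state `z3` should settle close to the unknown constant disturbance, which is the quantity a controller could then cancel. A larger `omega` speeds up convergence but amplifies measurement noise in practice.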

Directory of Open Access Journals