DiffuseExpand: Expanding dataset for 2D medical image segmentation using diffusion models
Dataset expansion can effectively alleviate data scarcity in medical image
segmentation, which arises from privacy concerns and labeling difficulties.
However, existing expansion algorithms still face great challenges due to their
inability to guarantee the diversity of synthesized images with paired
segmentation masks. In recent years, Diffusion Probabilistic Models (DPMs) have
shown powerful image synthesis performance, even better than Generative
Adversarial Networks. Based on this insight, we propose an approach called
DiffuseExpand for expanding datasets for 2D medical image segmentation using
DPM, which first samples a variety of masks from Gaussian noise to ensure
diversity, and then synthesizes images to ensure the alignment of images and
masks. After that, DiffuseExpand chooses high-quality samples to further
enhance the effectiveness of data expansion. Our comparison and ablation
experiments on COVID-19 and CGMH Pelvis datasets demonstrate the effectiveness
of DiffuseExpand. Our code is released at
https://anonymous.4open.science/r/DiffuseExpand.
Comment: 10 pages, 5 figures
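The abstract does not say how DiffuseExpand chooses high-quality samples. As a rough illustration only, the sketch below assumes one plausible criterion: keep a synthesized (image, mask) pair when a segmentation model's prediction on the image agrees with the sampled mask, measured by the Dice coefficient. The `segment` callable, the `threshold` value, and the function names are hypothetical and are not taken from the paper or its code.

```python
def dice(pred, target):
    """Dice coefficient between two binary masks given as lists of rows of 0/1."""
    intersection = sum(p * t for row_p, row_t in zip(pred, target)
                       for p, t in zip(row_p, row_t))
    total = sum(map(sum, pred)) + sum(map(sum, target))
    # Two empty masks are treated as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def select_samples(pairs, segment, threshold=0.8):
    """Keep (image, mask) pairs whose predicted mask matches the sampled mask.

    `segment` is a hypothetical stand-in for a trained segmentation model:
    it maps an image to a binary mask of the same shape.
    """
    return [(image, mask) for image, mask in pairs
            if dice(segment(image), mask) >= threshold]
```

A pair whose image yields a prediction far from its sampled mask is dropped, so only well-aligned pairs would be added to the training set under this assumed criterion.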
Advancing Vision Transformers with Group-Mix Attention
Vision Transformers (ViTs) have been shown to enhance visual recognition
through modeling long-range dependencies with multi-head self-attention (MHSA),
which is typically formulated as Query-Key-Value computation. However, the
attention map generated from the Query and Key captures only token-to-token
correlations at one single granularity. In this paper, we argue that
self-attention should have a more comprehensive mechanism to capture
correlations among tokens and groups (i.e., multiple adjacent tokens) for
higher representational capacity. Therefore, we propose Group-Mix Attention (GMA)
as an advanced replacement for traditional self-attention, which can
simultaneously capture token-to-token, token-to-group, and group-to-group
correlations with various group sizes. To this end, GMA splits the Query, Key,
and Value into segments uniformly and performs different group aggregations to
generate group proxies. The attention map is computed based on the mixtures of
tokens and group proxies and used to re-combine the tokens and groups in Value.
Based on GMA, we introduce a powerful backbone, namely GroupMixFormer, which
achieves state-of-the-art performance in image classification, object
detection, and semantic segmentation with fewer parameters than existing
models. For instance, GroupMixFormer-L (with 70.3M parameters and 384^2 input)
attains 86.2% Top-1 accuracy on ImageNet-1K without external data, while
GroupMixFormer-B (with 45.8M parameters) attains 51.2% mIoU on ADE20K.
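The split-aggregate-attend idea described in this abstract can be sketched in a few lines. This is a simplified 1D illustration under stated assumptions, not the GroupMixFormer implementation: tokens form a 1D sequence, each channel segment is aggregated by a plain sliding-window mean (window size 1 leaves individual tokens unchanged, larger windows produce group proxies), and a single attention head is used. The function names and the `group_sizes` parameter are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def window_mean(tokens, size):
    """Replace each token by the mean of its centred window of adjacent tokens."""
    n, dim, half = len(tokens), len(tokens[0]), size // 2
    out = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        out.append([sum(tokens[j][d] for j in range(lo, hi)) / (hi - lo)
                    for d in range(dim)])
    return out


def mix(x, group_sizes):
    """Split channels uniformly into segments, aggregate each segment with its
    own window size, and concatenate the results back along the channel axis."""
    seg = len(x[0]) // len(group_sizes)
    parts = [window_mean([row[k * seg:(k + 1) * seg] for row in x], size)
             for k, size in enumerate(group_sizes)]
    return [sum((part[i] for part in parts), []) for i in range(len(x))]


def group_mix_attention(q, k, v, group_sizes=(1, 3)):
    """Scaled dot-product attention computed on mixtures of tokens and group proxies."""
    qm, km, vm = mix(q, group_sizes), mix(k, group_sizes), mix(v, group_sizes)
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for qi in qm:
        weights = softmax([scale * sum(a * b for a, b in zip(qi, kj)) for kj in km])
        out.append([sum(w * vj[d] for w, vj in zip(weights, vm))
                    for d in range(len(vm[0]))])
    return out
```

Because the mixed Query and Key each contain both token-level and group-level channels, their dot product combines token and group information in one attention map. The channel count is assumed to be divisible by the number of segments.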
Beyond Object Recognition: A New Benchmark towards Object Concept Learning
Understanding objects is a central building block of artificial intelligence,
especially for embodied AI. Even though object recognition excels with deep
learning, current machines still struggle to learn higher-level knowledge,
e.g., what attributes an object has, and what we can do with an object. In this
work, we propose a challenging Object Concept Learning (OCL) task to push the
envelope of object understanding. It requires machines to reason out object
affordances and simultaneously give the reason: what attributes make an object
possess these affordances. To support OCL, we build a densely annotated
knowledge base including extensive labels for three levels of object concept
(category, attribute, affordance), and the causal relations among the three levels. By
analyzing the causal structure of OCL, we present a baseline, Object Concept
Reasoning Network (OCRN). It leverages causal intervention and concept
instantiation to infer the three levels following their causal relations. In
experiments, OCRN effectively infers the object knowledge while following the
causalities well. Our data and code are available at https://mvig-rhos.com/ocl.
Comment: ICCV 2023. Webpage: https://mvig-rhos.com/oc
RISE-based adaptive control of electro-hydraulic servo system with uncertain compensation
Electro-hydraulic servo systems (EHSS) play an important role in many industrial and military applications. However, high-performance tracking control remains challenging because of nonlinear system dynamics and model uncertainties. In this paper, a novel adaptive robust integral of the sign of the error (ARISE) method with an extended state observer (ESO) is proposed. First, the nonlinear mathematical model of a typical EHSS with modeling uncertainties and uncertain nonlinearities is established. Then, the ESO is used to estimate the state and lumped disturbance, while the unknown parameter estimates are updated by the novel adaptive law. Results show that the novel controller achieves better tracking performance in terms of maximum tracking error, average tracking error, and standard deviation of the tracking error.
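To illustrate how an extended state observer can estimate a state together with a lumped disturbance, here is a minimal sketch of a generic second-order linear ESO on a toy first-order plant. It is not the paper's EHSS model or its ARISE controller: the plant x' = f + b*u with f = -x + 0.5, the bandwidth `omega`, the gains 2*omega and omega**2, and the forward-Euler integration are all assumptions chosen for brevity.

```python
def simulate_eso(omega=50.0, b=1.0, dt=0.001, steps=5000):
    """Simulate a toy plant and a linear ESO; return (x, z1, z2, f).

    z1 estimates the measured state x, and z2 estimates the lumped
    disturbance f, which the observer never sees directly.
    """
    x = 0.0             # true plant state (the measured output)
    z1, z2 = 0.0, 0.0   # observer estimates of x and of f
    u = 1.0             # constant control input (assumed known)
    for _ in range(steps):
        f = -x + 0.5                 # lumped disturbance, unknown to the observer
        e = x - z1                   # output estimation error
        dz1 = z2 + b * u + 2.0 * omega * e   # state estimate dynamics
        dz2 = omega ** 2 * e                 # disturbance estimate dynamics
        dx = f + b * u                       # toy plant dynamics
        # Forward-Euler step for plant and observer
        x += dt * dx
        z1 += dt * dz1
        z2 += dt * dz2
    return x, z1, z2, -x + 0.5
```

With both observer poles placed at -omega, the estimation error decays much faster than the toy plant evolves, so z2 tracks the disturbance closely. A controller could then subtract z2 / b from its command to cancel the estimated disturbance.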