DETRs with Collaborative Hybrid Assignments Training
In this paper, we observe that assigning too few queries as positive samples
in DETR's one-to-one set matching leads to sparse supervision on the
encoder's output, which considerably hurts the discriminative feature
learning of the encoder, and vice versa for attention learning in the
decoder. To alleviate this, we present a novel collaborative hybrid
assignments training scheme, namely Co-DETR, to learn more efficient and
effective DETR-based detectors from versatile label assignment manners. This
new training scheme can easily enhance the encoder's learning ability in
end-to-end detectors by training the multiple parallel auxiliary heads
supervised by one-to-many label assignments such as ATSS and Faster RCNN. In
addition, we construct extra customized positive queries by extracting the
positive coordinates from these auxiliary heads to improve the training
efficiency of positive samples in the decoder. During inference, these auxiliary
heads are discarded and thus our method introduces no additional parameters and
computational cost to the original detector while requiring no hand-crafted
non-maximum suppression (NMS). We conduct extensive experiments to evaluate the
effectiveness of the proposed approach on DETR variants, including DAB-DETR,
Deformable-DETR, and DINO-Deformable-DETR. The state-of-the-art
DINO-Deformable-DETR with Swin-L can be improved from 58.5% to 59.5% AP on COCO
val. Surprisingly, when incorporating a ViT-L backbone, we achieve 66.0% AP on
COCO test-dev and 67.9% AP on LVIS val, outperforming previous methods by clear
margins with much smaller model sizes. Codes are available at
\url{https://github.com/Sense-X/Co-DETR}.
Comment: ICCV 2023.
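The contrast between one-to-one and one-to-many label assignment that motivates Co-DETR can be illustrated with a toy sketch. The boxes, the greedy matching, and the IoU threshold below are all illustrative assumptions; this shows the idea, not the released Co-DETR code:

```python
# Toy contrast between one-to-one (DETR-style) and one-to-many
# (ATSS/Faster-RCNN-style) label assignment. Boxes are (x1, y1, x2, y2);
# Hungarian matching is simplified to a greedy best-IoU pick.

def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def one_to_one(preds, gts):
    """Each ground-truth box gets exactly one positive query."""
    positives = set()
    for g in gts:
        best = max((i for i in range(len(preds)) if i not in positives),
                   key=lambda i: iou(preds[i], g))
        positives.add(best)
    return positives

def one_to_many(preds, gts, thr=0.5):
    """Every query whose IoU with some ground truth exceeds thr is positive."""
    return {i for i, p in enumerate(preds) for g in gts if iou(p, g) >= thr}

preds = [(0, 0, 10, 10), (1, 1, 11, 11), (2, 0, 12, 10), (50, 50, 60, 60)]
gts = [(0, 0, 10, 10)]
print(len(one_to_one(preds, gts)))   # 1 positive -> sparse supervision
print(len(one_to_many(preds, gts)))  # 3 positives -> denser supervision
```

With several near-duplicate predictions around one object, one-to-one matching marks a single query as positive, while the threshold rule marks every sufficiently overlapping query as positive; the latter is the denser supervision the auxiliary heads contribute during training.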
Decoupled DETR: Spatially Disentangling Localization and Classification for Improved End-to-End Object Detection
The introduction of DETR represents a new paradigm for object detection.
However, its decoder conducts classification and box localization using shared
queries and cross-attention layers, leading to suboptimal results. We observe
that different regions of interest in the visual feature map are suitable for
performing query classification and box localization tasks, even for the same
object. Salient regions provide vital information for classification, while the
boundaries around them are more favorable for box regression. Unfortunately,
such spatial misalignment between these two tasks greatly hinders DETR's
training. Therefore, in this work, we focus on decoupling localization and
classification tasks in DETR. To achieve this, we introduce a new design scheme
called spatially decoupled DETR (SD-DETR), which includes a task-aware query
generation module and a disentangled feature learning process. We elaborately
design the task-aware query initialization process and divide the
cross-attention block in the decoder to allow the task-aware queries to match
different visual regions. Meanwhile, we also observe that the prediction
misalignment problem for high classification confidence and precise
localization exists, so we propose an alignment loss to further guide the
spatially decoupled DETR training. Through extensive experiments, we
demonstrate that our approach achieves a significant improvement on the
MSCOCO dataset compared to previous work. For instance, we improve the
performance of Conditional DETR by 4.5 AP. By spatially disentangling the two
tasks, our method overcomes the misalignment problem and greatly improves the
performance of DETR for object detection.
Comment: accepted by ICCV 2023
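The core decoupling idea can be sketched with a minimal scaled dot-product attention in plain Python. The two-position "feature map", the query vectors, and the interior/boundary labels below are invented for illustration; this is not SD-DETR's actual architecture:

```python
# Sketch: one object query is split into a classification query and a
# localization query, each running its own cross-attention over the feature
# map, so the two tasks can attend to different spatial regions.
import math

def attention(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    out = [sum(w * v[j] for w, v in zip(weights, values))
           for j in range(len(values[0]))]
    return out, weights

# Toy feature map: position 0 = salient interior, position 1 = boundary.
keys = [[1.0, 0.0], [0.0, 1.0]]
values = keys
cls_query = [4.0, 0.0]   # learned to favor salient regions
loc_query = [0.0, 4.0]   # learned to favor boundary regions

_, w_cls = attention(cls_query, keys, values)
_, w_loc = attention(loc_query, keys, values)
print(w_cls[0] > w_cls[1])  # True: classification attends to the interior
print(w_loc[1] > w_loc[0])  # True: localization attends to the boundary
```

Because the two queries produce different attention distributions over the same features, classification and box regression are no longer forced to share one spatial focus, which is the misalignment the abstract describes.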
Review of the Resources and Utilization of Bamboo in China
China has made breakthroughs in the development and scientific cultivation of bamboo. At present, China ranks first in bamboo research worldwide, owing to its numerous research units and strong technical force. This chapter focuses on the utilization of bamboo resources for food; roofs and walls of houses; fences; and domestic and agricultural implements such as water containers, food and drink containers, hats, arrows, and quivers. A total of 861 species and infraspecific taxa belonging to 43 genera have been reported, including 707 species, 52 varieties, 98 formae, and 4 hybrids, naturally distributed across 21 provinces. The national bamboo forest covers 6.01 million ha, including 4.43 million ha of Moso bamboo and 1.58 million ha of other bamboo species. As the country develops and new economic activities emerge, bamboo production has shifted from rough processing, such as bamboo baskets, to finished products, such as bamboo flooring. The bamboo industry has attracted new opportunities as a source of new, particularly renewable, energy, and bamboo may be considered a lignocellulosic substrate for bioethanol production because of its environmental benefits and high annual biomass yield.
Mixed leaf litter decomposition and N, P release with a focus on Phyllostachys edulis (Carrière) J. Houz. forest in subtropical southeastern China
As an important non-wood forest product and wood substitute, Moso bamboo grows extremely rapidly and hence acquires large quantities of nutrients from the soil. With regard to litter decomposition, N and P release in Moso bamboo forests is undoubtedly important; however, to date, no comprehensive analysis has been conducted. Here, in addition to Moso bamboo, we chose two dominant species (Cunninghamia lanceolata and Phoebe bournei) that are widely distributed in subtropical southeastern China, and created five leaf litter mixtures (PE100, PE80PB20, PE80CL20, PE50PB50, and PE50CL50) to investigate species effects on leaf litter decomposition and nutrient (N and P) release via the litterbag method. Over a one-year incubation experiment, mass loss varied significantly with litter type (P < 0.001). N and P showed different release patterns: overall, N showed great temporal variation, while P was released from the litter continually. The mixtures of Moso bamboo and Phoebe bournei (PE80PB20 and PE50PB50) showed significantly faster P release than the other three types, but there was no significant difference in N release. Litter decomposition and P release were related to initial litter C/N ratio, C/P ratio, and/or C content, while no significant relationship was found between N release and initial stoichiometric ratios. The Moso bamboo–Phoebe bournei (i.e., bamboo–broadleaved) mixture appeared to be the best choice for nutrient return and thus for the productivity and maintenance of Moso bamboo forests in this region.
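Litterbag mass-loss series of this kind are commonly summarised with Olson's single negative-exponential model, m_t / m_0 = exp(-k t). The abstract does not name the model it fits, so the following is a generic worked example with made-up numbers, not the authors' analysis:

```python
# Annual decomposition constant k from Olson's negative-exponential model,
# m_t / m_0 = exp(-k t), given initial and remaining litter dry mass.
import math

def decay_constant(m0, mt, t_years):
    """Solve m_t / m_0 = exp(-k t) for k (per year)."""
    return -math.log(mt / m0) / t_years

# E.g. a litterbag retaining 60% of its dry mass after a one-year incubation:
k = decay_constant(m0=10.0, mt=6.0, t_years=1.0)
print(round(k, 3))  # 0.511
```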
Teach-DETR: Better Training DETR with Teachers
In this paper, we present a novel training scheme, namely Teach-DETR, to
learn better DETR-based detectors from versatile teacher detectors. We show
that the predicted boxes from teacher detectors are an effective medium to
transfer knowledge of teacher detectors, which could be either RCNN-based or
DETR-based detectors, to train a more accurate and robust DETR model. This new
training scheme can easily incorporate the predicted boxes from multiple
teacher detectors, each of which provides parallel supervision to the student
DETR. Our strategy introduces no additional parameters and adds negligible
computational cost to the original detector during training. During inference,
Teach-DETR brings zero additional overhead and maintains the merit of requiring
no non-maximum suppression. Extensive experiments show that our method leads to
consistent improvement for various DETR-based detectors. Specifically, we
improve the state-of-the-art detector DINO with Swin-Large backbone, 4 scales
of feature maps and 36-epoch training schedule, from 57.8% to 58.9% in terms of
mean average precision on the MSCOCO 2017 validation set. Code will be available at
https://github.com/LeonHLJ/Teach-DETR
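The training scheme described above can be sketched as a combined loss. The L1 box loss, the weighting, and the zip-based matching below are illustrative simplifications (real DETR training uses Hungarian matching and richer losses), not the released Teach-DETR implementation:

```python
# Sketch of Teach-DETR-style training: the student's boxes are supervised by
# the ground truth plus, in parallel, the predicted boxes of each teacher
# detector. Teachers add loss terms only; at inference they are discarded.

def l1_box_loss(pred, target):
    """Sum of absolute coordinate differences between two boxes."""
    return sum(abs(p - t) for p, t in zip(pred, target))

def training_loss(student_boxes, gt_boxes, teacher_box_sets, aux_weight=0.5):
    """Ground-truth loss plus one auxiliary term per teacher detector."""
    # Main loss against ground truth (matching simplified to positional zip).
    loss = sum(l1_box_loss(p, g) for p, g in zip(student_boxes, gt_boxes))
    # Each teacher supervises the student in parallel via its predicted boxes.
    for teacher_boxes in teacher_box_sets:
        loss += aux_weight * sum(
            l1_box_loss(p, t) for p, t in zip(student_boxes, teacher_boxes))
    return loss

student = [(0.1, 0.1, 0.9, 0.9)]
gt = [(0.0, 0.0, 1.0, 1.0)]
teachers = [[(0.05, 0.05, 0.95, 0.95)],   # e.g. an RCNN-based teacher
            [(0.0, 0.1, 1.0, 0.9)]]       # e.g. a DETR-based teacher
print(round(training_loss(student, gt, teachers), 2))  # 0.6
```

Because the teacher terms exist only inside the loss, dropping them at inference leaves the student's parameters and runtime unchanged, matching the zero-overhead claim in the abstract.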
RAPHAEL: Text-to-Image Generation via Large Mixture of Diffusion Paths
Text-to-image generation has recently witnessed remarkable achievements. We
introduce a text-conditional image diffusion model, termed RAPHAEL, to generate
highly artistic images, which accurately portray the text prompts, encompassing
multiple nouns, adjectives, and verbs. This is achieved by stacking tens of
mixture-of-experts (MoEs) layers, i.e., space-MoE and time-MoE layers, enabling
billions of diffusion paths (routes) from the network input to the output. Each
path intuitively functions as a "painter" for depicting a particular textual
concept onto a specified image region at a diffusion timestep. Comprehensive
experiments reveal that RAPHAEL outperforms recent cutting-edge models, such as
Stable Diffusion, ERNIE-ViLG 2.0, DeepFloyd, and DALL-E 2, in terms of both
image quality and aesthetic appeal. Firstly, RAPHAEL exhibits superior
performance in switching images across diverse styles, such as Japanese comics,
realism, cyberpunk, and ink illustration. Secondly, a single model with three
billion parameters, trained on 1,000 A100 GPUs for two months, achieves a
state-of-the-art zero-shot FID score of 6.61 on the COCO dataset. Furthermore,
RAPHAEL significantly surpasses its counterparts in human evaluation on the
ViLG-300 benchmark. We believe that RAPHAEL holds the potential to propel the
frontiers of image generation research in both academia and industry, paving
the way for future breakthroughs in this rapidly evolving field. More details
can be found on the project webpage: https://raphael-painter.github.io/.
Comment: Technical Report
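The "token as painter" routing described above can be sketched with a toy gate. The argmax gating, the 1-D "image", and the expert/region layout below are invented for illustration and are not the RAPHAEL architecture:

```python
# Toy space-MoE-style routing: each text token is gated to one expert, and
# that expert edits only its assigned image region, so a token acts as a
# "painter" for a particular region along a diffusion path.

def route(token_scores):
    """Pick the expert with the highest gate score for each token."""
    return [max(range(len(s)), key=s.__getitem__) for s in token_scores]

def apply_experts(image, assignments, experts):
    """Each routed expert edits only the region it owns."""
    for expert_id in assignments:
        for pos in experts[expert_id]["region"]:
            image[pos] += experts[expert_id]["delta"]
    return image

# Two tokens, two experts owning disjoint regions of a 1-D "image".
scores = [[0.9, 0.1],   # token 0 -> expert 0
          [0.2, 0.8]]   # token 1 -> expert 1
experts = [{"region": [0, 1], "delta": 1.0},
           {"region": [2, 3], "delta": -1.0}]
image = [0.0, 0.0, 0.0, 0.0]
print(apply_experts(image, route(scores), experts))  # [1.0, 1.0, -1.0, -1.0]
```

Stacking many such gated layers over timesteps is what yields the combinatorially many "diffusion paths" the abstract refers to: each layer's routing decision selects a different expert, and the sequence of selections forms a path from input to output.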