104 research outputs found

    DETRs with Collaborative Hybrid Assignments Training

    In this paper, we observe that assigning too few queries as positive samples in DETR with one-to-one set matching leads to sparse supervision on the encoder's output, which considerably hurts the discriminative feature learning of the encoder, and vice versa for attention learning in the decoder. To alleviate this, we present a novel collaborative hybrid assignments training scheme, namely Co-DETR, to learn more efficient and effective DETR-based detectors from versatile label assignment manners. This new training scheme can easily enhance the encoder's learning ability in end-to-end detectors by training multiple parallel auxiliary heads supervised by one-to-many label assignments such as ATSS and Faster RCNN. In addition, we construct extra customized positive queries by extracting the positive coordinates from these auxiliary heads to improve the training efficiency of positive samples in the decoder. In inference, these auxiliary heads are discarded, and thus our method introduces no additional parameters or computational cost to the original detector while requiring no hand-crafted non-maximum suppression (NMS). We conduct extensive experiments to evaluate the effectiveness of the proposed approach on DETR variants, including DAB-DETR, Deformable-DETR, and DINO-Deformable-DETR. The state-of-the-art DINO-Deformable-DETR with Swin-L can be improved from 58.5% to 59.5% AP on COCO val. Surprisingly, incorporated with a ViT-L backbone, we achieve 66.0% AP on COCO test-dev and 67.9% AP on LVIS val, outperforming previous methods by clear margins with much smaller model sizes. Code is available at https://github.com/Sense-X/Co-DETR. Comment: ICCV 2023.
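
    The sparse-supervision observation in this abstract can be illustrated with a toy sketch. It is not the authors' code: the helper names (`iou`, `one_to_one`, `one_to_many`), the brute-force matcher, and the plain IoU threshold of 0.5 are all assumptions made for illustration, and the threshold rule is only a simplified stand-in for assigners such as ATSS or Faster RCNN. The point is that one-to-one set matching yields exactly one positive per ground-truth box, while a one-to-many rule marks every sufficiently overlapping prediction as positive.

```python
from itertools import permutations


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def one_to_one(preds, gts):
    """Hypothetical brute-force set matching: each ground truth gets one prediction.

    Tries every injective assignment and keeps the one with the highest total IoU.
    Only practical for toy inputs.
    """
    best = max(
        permutations(range(len(preds)), len(gts)),
        key=lambda p: sum(iou(preds[i], g) for i, g in zip(p, gts)),
    )
    return set(best)


def one_to_many(preds, gts, thr=0.5):
    """Simplified one-to-many rule (assumption): positive if IoU with any ground truth >= thr."""
    return {i for i, p in enumerate(preds) if any(iou(p, g) >= thr for g in gts)}


# Toy data: one ground-truth box and four predictions (three overlap it, one does not).
gts = [(0, 0, 10, 10)]
preds = [(0, 0, 10, 10), (1, 1, 10, 10), (0, 0, 9, 10), (20, 20, 30, 30)]

print(sorted(one_to_one(preds, gts)))   # indices of positives under one-to-one matching
print(sorted(one_to_many(preds, gts)))  # indices of positives under the threshold rule
```

    On this toy input, one-to-one matching supervises a single prediction per object, whereas the threshold rule supervises all three overlapping predictions, which is the denser signal the auxiliary heads are described as providing.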

    Decoupled DETR: Spatially Disentangling Localization and Classification for Improved End-to-End Object Detection

    The introduction of DETR represents a new paradigm for object detection. However, its decoder conducts classification and box localization using shared queries and cross-attention layers, leading to suboptimal results. We observe that different regions of interest in the visual feature map are suitable for performing query classification and box localization tasks, even for the same object. Salient regions provide vital information for classification, while the boundaries around them are more favorable for box regression. Unfortunately, this spatial misalignment between the two tasks greatly hinders DETR's training. Therefore, in this work, we focus on decoupling the localization and classification tasks in DETR. To achieve this, we introduce a new design scheme called spatially decoupled DETR (SD-DETR), which includes a task-aware query generation module and a disentangled feature learning process. We carefully design the task-aware query initialization process and divide the cross-attention block in the decoder to allow the task-aware queries to match different visual regions. Meanwhile, we also observe a prediction misalignment problem between high classification confidence and precise localization, so we propose an alignment loss to further guide the training of spatially decoupled DETR. Through extensive experiments, we demonstrate that our approach achieves a significant improvement on the MSCOCO dataset compared to previous work. For instance, we improve the performance of Conditional DETR by 4.5 AP. By spatially disentangling the two tasks, our method overcomes the misalignment problem and greatly improves the performance of DETR for object detection. Comment: accepted by ICCV202

    Review of the Resources and Utilization of Bamboo in China

    China has made a breakthrough in the development and scientific cultivation of bamboo. At present, China ranks first in bamboo research worldwide, owing to its numerous research units and strong technical capacity. This chapter focuses on the utilization of bamboo resources for food, roofs and walls of houses, fences, and domestic and agricultural implements such as water containers, food and drink containers, hats, arrows, quivers, etc. A total of 861 species and infraspecific taxa belonging to 43 genera have been reported, comprising 707 species, 52 varieties, 98 forms, and 4 hybrids, which are naturally distributed in 21 provinces. The national bamboo forest covers 6.01 million ha, including 4.43 million ha of Moso bamboo and 1.58 million ha of other bamboo species. As the country develops and new economic activities emerge, bamboo production has shifted from rough processing, such as bamboo baskets, to fine processing, such as bamboo flooring. The bamboo industry has also gained new opportunities as a source of renewable energy, and bamboo may be considered a lignocellulosic substrate for bioethanol production because of its environmental benefits and high annual biomass yield.

    Mixed leaf litter decomposition and N, P release with a focus on Phyllostachys edulis (Carrière) J. Houz. forest in subtropical southeastern China

    As an important non-wood forest product and wood substitute, Moso bamboo grows extremely rapidly and hence acquires large quantities of nutrients from the soil. With regard to litter decomposition, N and P release in Moso bamboo forests is undoubtedly important; however, to date, no comprehensive analysis has been conducted. Here, we chose two dominant species (i.e., Cunninghamia lanceolata and Phoebe bournei), in addition to Moso bamboo, which are widely distributed in subtropical southeastern China, and created five leaf litter mixtures (PE100, PE80PB20, PE80CL20, PE50PB50 and PE50CL50) to investigate species effects on leaf litter decomposition and nutrient release (N and P) via the litterbag method. Over a one-year incubation experiment, mass loss varied significantly with litter type (P 0.94, P < 0.001). N and P had different patterns of release; overall, N showed great temporal variation, while P was released from the litter continually. The mixture of Moso bamboo and Phoebe bournei (PE80PB20 and PE50PB50) showed significantly faster P release compared to the other three types, but there was no significant difference in N release. Litter decomposition and P release were related to initial litter C/N ratio, C/P ratio, and/or C content, while no significant relationship between N release and initial stoichiometric ratios was found. The Moso bamboo–Phoebe bournei (i.e., bamboo–broadleaved) mixture appeared to be the best choice for nutrient return and thus productivity and maintenance of Moso bamboo in this region

    Teach-DETR: Better Training DETR with Teachers

    In this paper, we present a novel training scheme, namely Teach-DETR, to learn better DETR-based detectors from versatile teacher detectors. We show that the predicted boxes from teacher detectors are an effective medium for transferring the knowledge of teacher detectors, which can be either RCNN-based or DETR-based, to train a more accurate and robust DETR model. This new training scheme can easily incorporate the predicted boxes from multiple teacher detectors, each of which provides parallel supervision to the student DETR. Our strategy introduces no additional parameters and adds negligible computational cost to the original detector during training. During inference, Teach-DETR brings zero additional overhead and maintains the merit of requiring no non-maximum suppression. Extensive experiments show that our method leads to consistent improvement for various DETR-based detectors. Specifically, we improve the state-of-the-art detector DINO with a Swin-Large backbone, 4 scales of feature maps, and a 36-epoch training schedule from 57.8% to 58.9% in terms of mean average precision on the MSCOCO 2017 validation set. Code will be available at https://github.com/LeonHLJ/Teach-DETR

    RAPHAEL: Text-to-Image Generation via Large Mixture of Diffusion Paths

    Text-to-image generation has recently witnessed remarkable achievements. We introduce a text-conditional image diffusion model, termed RAPHAEL, to generate highly artistic images that accurately portray text prompts encompassing multiple nouns, adjectives, and verbs. This is achieved by stacking tens of mixture-of-experts (MoE) layers, i.e., space-MoE and time-MoE layers, enabling billions of diffusion paths (routes) from the network input to the output. Each path intuitively functions as a "painter" for depicting a particular textual concept onto a specified image region at a diffusion timestep. Comprehensive experiments reveal that RAPHAEL outperforms recent cutting-edge models, such as Stable Diffusion, ERNIE-ViLG 2.0, DeepFloyd, and DALL-E 2, in terms of both image quality and aesthetic appeal. Firstly, RAPHAEL exhibits superior performance in switching images across diverse styles, such as Japanese comics, realism, cyberpunk, and ink illustration. Secondly, a single model with three billion parameters, trained on 1,000 A100 GPUs for two months, achieves a state-of-the-art zero-shot FID score of 6.61 on the COCO dataset. Furthermore, RAPHAEL significantly surpasses its counterparts in human evaluation on the ViLG-300 benchmark. We believe that RAPHAEL holds the potential to propel the frontiers of image generation research in both academia and industry, paving the way for future breakthroughs in this rapidly evolving field. More details can be found on the project webpage: https://raphael-painter.github.io/. Comment: Technical Report