
    Semi-Supervised and Long-Tailed Object Detection with CascadeMatch

    This paper focuses on long-tailed object detection in the semi-supervised learning setting, which poses realistic challenges but has rarely been studied in the literature. We propose a novel pseudo-labeling-based detector called CascadeMatch. Our detector features a cascade network architecture with multi-stage detection heads and progressive confidence thresholds. To avoid manually tuning the thresholds, we design a new adaptive pseudo-label mining mechanism that automatically identifies suitable values from data. To mitigate confirmation bias, where a model is negatively reinforced by incorrect pseudo-labels it produced itself, each detection head is trained with the ensemble pseudo-labels of all detection heads. Experiments on two long-tailed datasets, LVIS and COCO-LT, demonstrate that CascadeMatch surpasses existing state-of-the-art semi-supervised approaches in handling long-tailed object detection across a wide range of detection architectures. For instance, CascadeMatch outperforms Unbiased Teacher by 1.9 AP Fix on LVIS when using a ResNet50-based Cascade R-CNN structure, and by 1.7 AP Fix when using Sparse R-CNN with a Transformer encoder. We also show that CascadeMatch can even handle the challenging sparsely annotated object detection problem. Comment: International Journal of Computer Vision (IJCV), 202
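    The abstract names two mechanisms: cascade heads with progressively stricter confidence thresholds, and training every head on pseudo-labels ensembled from all heads. Below is a minimal sketch of that pseudo-labeling step for a single proposal, assuming the ensemble is a plain average of per-class scores and that the stage thresholds are fixed placeholders (the paper's adaptive pseudo-label mining is not reproduced). The function names are hypothetical and not taken from the CascadeMatch code.

```python
from typing import List, Optional, Sequence, Tuple

PseudoLabel = Tuple[int, float]  # (class index, ensemble confidence)


def ensemble_scores(head_scores: Sequence[Sequence[float]]) -> List[float]:
    """Average the per-class confidence vectors that every detection head
    assigns to the same proposal (assumed ensembling rule)."""
    num_heads = len(head_scores)
    num_classes = len(head_scores[0])
    return [sum(scores[c] for scores in head_scores) / num_heads
            for c in range(num_classes)]


def pseudo_labels_per_stage(
    head_scores: Sequence[Sequence[float]],
    stage_thresholds: Sequence[float],
) -> List[Optional[PseudoLabel]]:
    """Return one pseudo-label (or None) per cascade stage.

    Every stage sees the same ensemble prediction, but each filters it with
    its own threshold; thresholds are meant to increase with stage depth.
    """
    avg = ensemble_scores(head_scores)
    best_cls = max(range(len(avg)), key=avg.__getitem__)
    best_conf = avg[best_cls]
    return [(best_cls, best_conf) if best_conf >= t else None
            for t in stage_thresholds]


if __name__ == "__main__":
    # Three heads scoring one unlabeled proposal over three classes.
    scores = [[0.1, 0.7, 0.2], [0.2, 0.6, 0.2], [0.0, 0.8, 0.2]]
    # Hypothetical fixed thresholds; CascadeMatch mines these from data instead.
    print(pseudo_labels_per_stage(scores, [0.5, 0.65, 0.8]))
```

    Because each head is supervised by the averaged prediction rather than its own, a single overconfident head cannot reinforce its own mistakes, which is the confirmation-bias argument the abstract makes.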

    Contextual Object Detection with Multimodal Large Language Models

    Recent Multimodal Large Language Models (MLLMs) excel at vision-language tasks such as image captioning and question answering, but lack an essential perception ability: object detection. In this work, we address this limitation by introducing a novel research problem, contextual object detection, that is, understanding visible objects within different human-AI interactive contexts. Three representative scenarios are investigated: the language cloze test, visual captioning, and question answering. Moreover, we present ContextDET, a unified multimodal model capable of end-to-end differentiable modeling of visual-language contexts, so as to locate, identify, and associate visual objects with language inputs for human-AI interaction. ContextDET involves three key submodels: (i) a visual encoder for extracting visual representations, (ii) a pre-trained LLM for multimodal context decoding, and (iii) a visual decoder for predicting bounding boxes given contextual object words. The new generate-then-detect framework enables us to detect object words within human vocabulary. Extensive experiments show the advantages of ContextDET on our proposed CODE benchmark, open-vocabulary detection, and referring image segmentation. Comment: Github: https://github.com/yuhangzang/ContextDET, Project Page: https://www.mmlab-ntu.com/project/contextdet/index.htm
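    The generate-then-detect ordering can be illustrated as plain control flow: the language model produces words first, and only the words that name objects are passed to the box decoder. This is a sketch of the data flow only, under the assumption that both steps can be treated as callables; in ContextDET they are learned modules conditioned on visual features, and every name below (`generate_then_detect`, `ContextualDetection`, the stub vocabulary and boxes) is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1), assumed format


@dataclass
class ContextualDetection:
    word: str  # object word produced by the language model
    box: Box   # box predicted for that word by the visual decoder


def generate_then_detect(
    generated_tokens: Sequence[str],
    is_object_word: Callable[[str], bool],
    decode_box: Callable[[str], Box],
) -> List[ContextualDetection]:
    """Keep the generated words that name objects, then localize each one.

    Both callables are stand-ins for learned components."""
    return [ContextualDetection(tok, decode_box(tok))
            for tok in generated_tokens if is_object_word(tok)]


if __name__ == "__main__":
    tokens = ["a", "dog", "chasing", "a", "ball"]  # stand-in LLM output
    vocab = {"dog", "ball"}                          # stand-in object lexicon
    fake_boxes = {"dog": (10.0, 20.0, 80.0, 90.0), "ball": (85.0, 60.0, 95.0, 70.0)}
    for det in generate_then_detect(tokens, vocab.__contains__, fake_boxes.__getitem__):
        print(det)
```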

    KPNet: Towards Minimal Face Detector

    The small receptive field and limited capacity of minimal neural networks restrict their performance when they serve as detector backbones. In this work, we find that the appearance features of a generic face are discriminative enough for a tiny, shallow neural network to distinguish it from the background. The essential barriers are 1) the vague definition of the face bounding box and 2) the tricky design of anchor boxes or receptive fields. Unlike most top-down methods for joint face detection and alignment, the proposed KPNet detects small facial keypoints instead of the whole face, in a bottom-up manner. It first predicts the facial landmarks from a low-resolution image via a well-designed fine-grained scale approximation and a scale-adaptive soft-argmax operator. The precise face bounding boxes, however they are defined, can then be inferred from the keypoints. Without any complex head architecture or meticulous network design, KPNet achieves state-of-the-art accuracy on generic face detection and alignment benchmarks with only ∼1M parameters, runs at 1000 fps on GPU, and can easily run in real time on most modern front-end chips. Comment: AAAI 202
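    The two steps the abstract relies on, reading a landmark position off a heatmap with soft-argmax and deriving a face box from landmarks, can be sketched as below. This is the plain soft-argmax, not KPNet's scale-adaptive variant; the sharpness `beta` and the `margin` box definition are hypothetical parameters chosen for illustration.

```python
import math
from typing import Sequence, Tuple


def soft_argmax_2d(heatmap: Sequence[Sequence[float]], beta: float = 1.0) -> Tuple[float, float]:
    """Differentiable peak location: the softmax-weighted mean (x, y) over all
    cells. Larger beta pushes the result toward a hard argmax."""
    peak = max(max(row) for row in heatmap)  # subtracted for numerical stability
    total = sum_x = sum_y = 0.0
    for y, row in enumerate(heatmap):
        for x, value in enumerate(row):
            w = math.exp(beta * (value - peak))
            total += w
            sum_x += w * x
            sum_y += w * y
    return sum_x / total, sum_y / total


def box_from_keypoints(
    points: Sequence[Tuple[float, float]], margin: float = 0.0
) -> Tuple[float, float, float, float]:
    """Tight (x0, y0, x1, y1) box around the landmarks, optionally enlarged by
    a fraction of its width and height (one of many possible face-box definitions)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x0, x1, y0, y1 = min(xs), max(xs), min(ys), max(ys)
    dw, dh = margin * (x1 - x0), margin * (y1 - y0)
    return (x0 - dw, y0 - dh, x1 + dw, y1 + dh)


if __name__ == "__main__":
    heat = [[0.0, 0.0, 5.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    print(soft_argmax_2d(heat, beta=10.0))  # close to (2, 0)
    print(box_from_keypoints([(1, 1), (3, 2), (2, 5)], margin=0.5))
```

    Because the box is computed from landmark coordinates after the fact, changing the box definition only changes `box_from_keypoints`, which matches the abstract's point that the boxes can be inferred however they are defined.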

    On-Device Domain Generalization

    We present a systematic study of domain generalization (DG) for tiny neural networks. This problem is critical to on-device machine learning applications but has been overlooked in the literature, where research has focused mainly on large models. Tiny neural networks have far fewer parameters and lower complexity, and therefore should not be trained the same way as their large counterparts for DG applications. Through extensive experiments, we find that knowledge distillation (KD), a well-known technique for model compression, is much better at tackling the on-device DG problem than conventional DG methods. Another interesting observation is that the teacher-student gap on out-of-distribution data is bigger than that on in-distribution data, which highlights the capacity mismatch issue as well as a shortcoming of KD. We further propose a method called out-of-distribution knowledge distillation (OKD), where the idea is to teach the student how the teacher handles out-of-distribution data synthesized via disruptive data augmentation. Without adding any extra parameters to the model, and hence keeping the deployment cost unchanged, OKD significantly improves DG performance for tiny neural networks in a variety of on-device DG scenarios for image and speech applications. We also contribute a scalable approach for synthesizing visual domain shifts, along with a new suite of DG datasets to complement existing testbeds. Comment: Preprint
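    The distillation objective behind the idea of teaching the student how the teacher handles shifted data can be sketched with the standard Hinton-style KD loss applied to augmented inputs. This assumes OKD uses the usual temperature-softened KL divergence, which the abstract does not state; `augment` is a hypothetical placeholder for the paper's disruptive data augmentation, and the model callables are stand-ins.

```python
import math
from typing import Callable, List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-softened softmax with max-subtraction for stability."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits: Sequence[float], teacher_logits: Sequence[float],
            temperature: float = 4.0) -> float:
    """KL(teacher || student) on softened distributions, scaled by T^2 so
    gradient magnitudes stay comparable across temperatures."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(t * (math.log(t) - math.log(s)) for t, s in zip(p_t, p_s) if t > 0)
    return temperature ** 2 * kl


def ood_distillation_loss(
    student: Callable[[object], Sequence[float]],
    teacher: Callable[[object], Sequence[float]],
    inputs: Sequence[object],
    augment: Callable[[object], object],  # hypothetical domain-shifting augmentation
    temperature: float = 4.0,
) -> float:
    """Average KD loss over augmented copies of the inputs, so the student is
    matched to the teacher on synthetic out-of-distribution samples."""
    total = 0.0
    for x in inputs:
        x_aug = augment(x)  # the same shifted sample goes to both networks
        total += kd_loss(student(x_aug), teacher(x_aug), temperature)
    return total / len(inputs)
```

    Only the loss changes, not the student's architecture, which is consistent with the abstract's claim that OKD keeps the deployment cost unchanged.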

    Efficient and Accurate Arbitrary-Shaped Text Detection with Pixel Aggregation Network

    Scene text detection, an important step of scene text reading systems, has witnessed rapid development with convolutional neural networks. Nonetheless, two main challenges still exist and hamper its deployment to real-world applications. The first is the trade-off between speed and accuracy. The second is modeling arbitrary-shaped text instances. Some recent methods tackle arbitrary-shaped text detection, but they rarely take the speed of the entire pipeline into consideration, so they may fall short in practical applications. In this paper, we propose an efficient and accurate arbitrary-shaped text detector, termed Pixel Aggregation Network (PAN), which is equipped with a low-computational-cost segmentation head and a learnable post-processing step. More specifically, the segmentation head is made up of a Feature Pyramid Enhancement Module (FPEM) and a Feature Fusion Module (FFM). FPEM is a cascadable U-shaped module that introduces multi-level information to guide better segmentation. FFM gathers the features given by FPEMs of different depths into a final feature for segmentation. The learnable post-processing is implemented by Pixel Aggregation (PA), which precisely aggregates text pixels using predicted similarity vectors. Experiments on several standard benchmarks validate the superiority of the proposed PAN. Notably, our method achieves a competitive F-measure of 79.9% at 84.2 FPS on CTW1500. Comment: Accepted by ICCV 201
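    The Pixel Aggregation idea, attaching text pixels to a text kernel when their predicted similarity vectors are close, can be sketched as follows. This is a simplification under stated assumptions: kernels are taken as already labeled, the distance cutoff `threshold` is a hypothetical parameter, and spatial adjacency is ignored (every text pixel is compared with every kernel), whereas a real implementation would grow each kernel through neighboring pixels.

```python
import math
from typing import Dict, Hashable, Iterable, List, Sequence, Tuple

Pixel = Tuple[int, int]  # (row, col)


def aggregate_pixels(
    text_pixels: Iterable[Pixel],
    kernel_of: Dict[Pixel, Hashable],
    similarity: Dict[Pixel, Sequence[float]],
    threshold: float,
) -> Dict[Pixel, Hashable]:
    """Assign each text pixel to the kernel whose mean similarity vector is
    nearest, provided the Euclidean distance is below `threshold`."""
    # 1) Mean similarity vector of every kernel.
    sums: Dict[Hashable, List[float]] = {}
    counts: Dict[Hashable, int] = {}
    for pixel, kernel in kernel_of.items():
        vec = similarity[pixel]
        acc = sums.setdefault(kernel, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[kernel] = counts.get(kernel, 0) + 1
    centers = {k: [v / counts[k] for v in acc] for k, acc in sums.items()}

    # 2) Kernel pixels keep their label; other text pixels join the nearest kernel.
    labels = dict(kernel_of)
    for pixel in text_pixels:
        if pixel in labels:
            continue
        best, best_dist = None, math.inf
        for kernel, center in centers.items():
            dist = math.dist(similarity[pixel], center)
            if dist < best_dist:
                best, best_dist = kernel, dist
        if best is not None and best_dist < threshold:
            labels[pixel] = best
    return labels
```

    Pixels whose vectors are far from every kernel stay unlabeled, which is what lets adjacent text instances remain separate even when their pixels touch.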

    Variations in species diversity patterns and community assembly rules among vegetation types in the karst landscape

    The various vegetation types in the karst landscape have been considered the result of heterogeneous habitats. However, the lack of a comprehensive understanding of regional biodiversity patterns and the underlying ecological processes limits further research on ecological management. This study established forest dynamic plots (FDPs) of the dominant vegetation types (shrubland, SL; mixed tree and shrub forest, MTSF; coniferous forest, CF; coniferous broadleaf mixed forest, CBMF; and broadleaf forest, BF) in the karst landscape and quantified their species diversity patterns and potential ecological processes. The results showed that, in terms of diversity patterns, the evenness and species richness of the CF community were significantly lower than those of the other vegetation types, while the BF community had the highest species richness. The other three vegetation types showed no significant variation in species richness and evenness. However, when controlling for the number of individuals in the FDPs, rarefied species richness differed significantly and ranked as BF > SL > MTSF > CBMF > CF, highlighting the importance of accounting for the effects of abundance. Additionally, the community assembly of climax communities (CF or BF) was dominated by stochastic processes such as species dispersal or speciation, whereas deterministic processes (habitat filtering) dominated the secondary forests (SL, MTSF, and CBMF). These findings indicate that community assembly differs mainly between the climax communities and the other communities. Hence, it is crucial to consider biodiversity and its potential underlying ecological processes together when studying regional ecology and management, particularly in heterogeneous ecosystems.
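    Rarefied richness is what allows plots with different numbers of individuals to be compared. The abstract does not say which rarefaction procedure was used, so the sketch below uses Hurlbert's individual-based formula, one standard choice: the expected number of species in a random subsample of n individuals is the sum over species i of 1 - C(N - N_i, n) / C(N, n), where N is the total number of individuals and N_i the abundance of species i. The example abundances are made up for illustration.

```python
from math import comb
from typing import Sequence


def rarefied_richness(abundances: Sequence[int], n: int) -> float:
    """Expected species count in a random subsample of n individuals
    (Hurlbert's individual-based rarefaction)."""
    total = sum(abundances)
    if not 0 < n <= total:
        raise ValueError("n must be between 1 and the total number of individuals")
    denom = comb(total, n)
    # Species i is missed only if all n draws come from the other N - N_i individuals.
    return sum(1 - comb(total - a, n) / denom for a in abundances if a > 0)


if __name__ == "__main__":
    plot_a = [50, 30, 10, 5, 5]            # hypothetical abundances per species
    plot_b = [20, 20, 20, 20, 10, 5, 5]
    n = min(sum(plot_a), sum(plot_b))      # rarefy both plots to a common size
    print(rarefied_richness(plot_a, n), rarefied_richness(plot_b, n))
```

    Rarefying every plot to the smallest shared sample size removes the advantage that plots with more individuals would otherwise have in raw species counts.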

    Actively implementing an evidence-based feeding guideline for critically ill patients (NEED): a multicenter, cluster-randomized, controlled trial

    Background: Previous cluster-randomized controlled trials evaluating the impact of implementing evidence-based guidelines for nutrition therapy in critical illness have not consistently demonstrated patient benefits. A large-scale, sufficiently powered study is therefore warranted to ascertain the effects of guideline implementation on patient-centered outcomes. Methods: We developed an evidence-based feeding guideline and conducted a multicenter, cluster-randomized, parallel-controlled trial in intensive care units (ICUs) across China. ICUs randomly allocated to the guideline group formed a local "intervention team", which actively implemented the guideline using standardized educational materials, a graphical feeding protocol, and live online education outreach meetings conducted by members of the study management committee. ICUs assigned to the control group remained unaware of the guideline content. All ICUs enrolled patients who were expected to stay in the ICU longer than seven days. The primary outcome was all-cause mortality within 28 days of enrollment. Results: Forty-eight ICUs were randomized to the guideline group and 49 to the control group. From March 2018 to July 2019, the guideline ICUs enrolled 1399 patients, and the control ICUs enrolled 1373 patients. Implementation of the guideline resulted in significantly earlier initiation of enteral nutrition (EN) (1.20 vs. 1.55 mean days to initiation of EN; difference −0.40 [95% CI −0.71 to −0.09]; P = 0.01) and delayed initiation of parenteral nutrition (PN) (1.29 vs. 0.80 mean days to start of PN; difference 1.06 [95% CI 0.44 to 1.67]; P = 0.001). There was no significant difference in 28-day mortality between groups (14.2% vs. 15.2%; difference −1.6% [95% CI −4.3% to 1.2%]; P = 0.42). Conclusions: In this large-scale, multicenter trial, active implementation of an evidence-based feeding guideline reduced the time to commencement of EN and overall PN use but did not translate to a reduction in mortality from critical illness.
Trial registration: ISRCTN, ISRCTN12233792. Registered November 20th, 2017
