Semi-Supervised and Long-Tailed Object Detection with CascadeMatch
This paper focuses on long-tailed object detection in the semi-supervised
learning setting, which poses realistic challenges, but has rarely been studied
in the literature. We propose a novel pseudo-labeling-based detector called
CascadeMatch. Our detector features a cascade network architecture, which has
multi-stage detection heads with progressive confidence thresholds. To avoid
manually tuning the thresholds, we design a new adaptive pseudo-label mining
mechanism to automatically identify suitable values from data. To mitigate
confirmation bias, where a model is negatively reinforced by incorrect
pseudo-labels produced by itself, each detection head is trained by the
ensemble pseudo-labels of all detection heads. Experiments on two long-tailed
datasets, i.e., LVIS and COCO-LT, demonstrate that CascadeMatch surpasses
existing state-of-the-art semi-supervised approaches -- across a wide range of
detection architectures -- in handling long-tailed object detection. For
instance, CascadeMatch outperforms Unbiased Teacher by 1.9 AP^Fix on LVIS when
using a ResNet50-based Cascade R-CNN structure, and by 1.7 AP^Fix when using
Sparse R-CNN with a Transformer encoder. We also show that CascadeMatch can
even handle the challenging sparsely annotated object detection problem.
Comment: International Journal of Computer Vision (IJCV), 202
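The interplay of ensemble pseudo-labels and progressive thresholds described in the CascadeMatch abstract can be illustrated with a minimal sketch. It assumes that head outputs are fused by simple averaging and that thresholds are fixed constants; the abstract states neither the fusion rule nor the threshold values (the paper mines thresholds adaptively), so `ensemble_scores`, `pseudo_labels_per_stage`, and all numbers below are hypothetical.

```python
from statistics import mean

def ensemble_scores(head_scores):
    """Average per-class confidences over all detection heads (assumed fusion rule)."""
    num_classes = len(head_scores[0])
    return [mean(head[c] for head in head_scores) for c in range(num_classes)]

def pseudo_labels_per_stage(head_scores, thresholds):
    """For each stage, return the ensemble class index, or None if it is below that stage's threshold."""
    avg = ensemble_scores(head_scores)
    best = max(range(len(avg)), key=avg.__getitem__)
    return [best if avg[best] >= t else None for t in thresholds]

# Three heads scoring one proposal over three classes (made-up numbers).
scores = [[0.10, 0.70, 0.20], [0.05, 0.80, 0.15], [0.15, 0.60, 0.25]]
# Hypothetical progressive thresholds: later stages demand higher confidence.
labels = pseudo_labels_per_stage(scores, thresholds=[0.5, 0.65, 0.8])
print(labels)
```

Every stage sees the same ensemble prediction, so no head is trained only on its own guesses; the stricter later stages simply discard the less confident labels.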
Contextual Object Detection with Multimodal Large Language Models
Recent Multimodal Large Language Models (MLLMs) are remarkable in
vision-language tasks, such as image captioning and question answering, but
lack the essential perception ability, i.e., object detection. In this work, we
address this limitation by introducing a novel research problem of contextual
object detection -- understanding visible objects within different human-AI
interactive contexts. Three representative scenarios are investigated,
including the language cloze test, visual captioning, and question answering.
Moreover, we present ContextDET, a unified multimodal model that is capable of
end-to-end differentiable modeling of visual-language contexts, so as to
locate, identify, and associate visual objects with language inputs for
human-AI interaction. Our ContextDET involves three key submodels: (i) a visual
encoder for extracting visual representations, (ii) a pre-trained LLM for
multimodal context decoding, and (iii) a visual decoder for predicting bounding
boxes given contextual object words. The new generate-then-detect framework
enables us to detect object words within human vocabulary. Extensive
experiments show the advantages of ContextDET on our proposed CODE benchmark,
open-vocabulary detection, and referring image segmentation. Github:
https://github.com/yuhangzang/ContextDET.
Comment: Project Page: https://www.mmlab-ntu.com/project/contextdet/index.htm
KPNet: Towards Minimal Face Detector
The small receptive field and capacity of minimal neural networks limit their
performance when they are used as detector backbones. In this work, we
find that the appearance features of a generic face are discriminative enough
for a tiny, shallow neural network to distinguish it from the background. The
essential barriers are 1) the vague definition of the face bounding
box and 2) the tricky design of anchor boxes or receptive fields. Unlike most
top-down methods for joint face detection and alignment, the proposed KPNet
detects small facial keypoints instead of the whole face in a bottom-up
manner. It first predicts the facial landmarks from a low-resolution image via
the well-designed fine-grained scale approximation and scale adaptive
soft-argmax operator. Finally, precise face bounding boxes, however they are
defined, can be inferred from the keypoints. Without any complex head
architecture or meticulous network design, KPNet achieves
state-of-the-art accuracy on generic face detection and alignment benchmarks
with only parameters, which runs at 1000 fps on GPU and can easily
run in real time on most modern front-end chips.
Comment: AAAI 202
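The two steps named in the KPNet abstract, locating keypoints with a soft-argmax and then inferring a box from them, can be sketched as follows. This is a plain soft-argmax with a fixed sharpness parameter, not the paper's scale-adaptive operator, and the margin-based box rule is only one possible box definition. The names `soft_argmax_2d`, `box_from_keypoints`, `beta`, and `margin` are hypothetical.

```python
import math

def soft_argmax_2d(heatmap, beta=1.0):
    """Return the expected (x, y) position under a softmax over the heatmap.

    beta is an assumed sharpness parameter: larger values approach a hard argmax.
    """
    peak = max(max(row) for row in heatmap)
    # Subtract the peak before exponentiating for numerical stability.
    weights = [[math.exp(beta * (v - peak)) for v in row] for row in heatmap]
    total = sum(sum(row) for row in weights)
    x = sum(w * j for row in weights for j, w in enumerate(row)) / total
    y = sum(w * i for i, row in enumerate(weights) for w in row) / total
    return x, y

def box_from_keypoints(points, margin=0.0):
    """Tight box around the keypoints, expanded by a fraction of its width and height."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    w, h = max(xs) - min(xs), max(ys) - min(ys)
    return (min(xs) - margin * w, min(ys) - margin * h,
            max(xs) + margin * w, max(ys) + margin * h)

# A 3x3 heatmap with a single central peak (made-up values).
x, y = soft_argmax_2d([[0, 0, 0], [0, 5, 0], [0, 0, 0]], beta=2.0)
box = box_from_keypoints([(2, 3), (6, 3), (4, 7)], margin=0.25)
```

Because the soft-argmax returns a weighted average rather than an integer index, it can yield sub-pixel coordinates from a low-resolution heatmap, and changing `margin` changes the box definition without retraining anything.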
On-Device Domain Generalization
We present a systematic study of domain generalization (DG) for tiny neural
networks. This problem is critical to on-device machine learning applications
but has been overlooked in the literature, where research has focused solely
on large models. Tiny neural networks have far fewer parameters and
lower complexity and therefore should not be trained the same way as their
large counterparts for DG applications. By conducting extensive experiments, we
find that knowledge distillation (KD), a well-known technique for model
compression, is much better for tackling the on-device DG problem than
conventional DG methods. Another interesting observation is that the
teacher-student gap on out-of-distribution data is bigger than that on
in-distribution data, which highlights the capacity mismatch issue as well as
the shortcoming of KD. We further propose a method called out-of-distribution
knowledge distillation (OKD) where the idea is to teach the student how the
teacher handles out-of-distribution data synthesized via disruptive data
augmentation. Without adding any extra parameter to the model -- hence keeping
the deployment cost unchanged -- OKD significantly improves DG performance for
tiny neural networks in a variety of on-device DG scenarios for image and
speech applications. We also contribute a scalable approach for synthesizing
visual domain shifts, along with a new suite of DG datasets to complement
existing testbeds.
Comment: Preprint
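The distillation signal behind OKD can be sketched as a standard temperature-scaled KD loss, here assumed to be evaluated on logits that teacher and student produce for the same augmented (out-of-distribution) input. The abstract does not give the loss form, so the KL formulation, the temperature value, and the names `softmax` and `kd_loss` are assumptions borrowed from conventional knowledge distillation.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by temperature squared."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl

# Made-up logits for one augmented sample.
loss = kd_loss([2.0, 0.5, -1.0], [1.0, 1.0, 0.0])
```

The loss acts only on the student's outputs during training, which is consistent with the abstract's point that no parameters are added and the deployment cost stays unchanged.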
Efficient and Accurate Arbitrary-Shaped Text Detection with Pixel Aggregation Network
Scene text detection, an important step of scene text reading systems, has
witnessed rapid development with convolutional neural networks. Nonetheless,
two main challenges still exist and hamper its deployment to real-world
applications. The first problem is the trade-off between speed and accuracy.
The second one is to model the arbitrary-shaped text instance. Recently, some
methods have been proposed to tackle arbitrary-shaped text detection, but they
rarely take the speed of the entire pipeline into consideration, which may fall
short in practical applications. In this paper, we propose an efficient and
accurate arbitrary-shaped text detector, termed Pixel Aggregation Network
(PAN), which is equipped with a low computational-cost segmentation head and a
learnable post-processing. More specifically, the segmentation head is made up
of Feature Pyramid Enhancement Module (FPEM) and Feature Fusion Module (FFM).
FPEM is a cascadable U-shaped module, which can introduce multi-level
information to guide better segmentation. FFM can gather the features given
by the FPEMs of different depths into a final feature for segmentation. The
learnable post-processing is implemented by Pixel Aggregation (PA), which can
precisely aggregate text pixels by predicted similarity vectors. Experiments on
several standard benchmarks validate the superiority of the proposed PAN. It is
worth noting that our method can achieve a competitive F-measure of 79.9% at
84.2 FPS on CTW1500.
Comment: Accepted by ICCV 201
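The idea of aggregating text pixels by predicted similarity vectors can be illustrated with a deliberately simplified sketch: each text pixel joins the text kernel whose similarity vector is closest, provided the distance is within a threshold. This ignores spatial adjacency and the learned components of PAN, and the names `aggregate_pixels` and `max_distance`, as well as all values, are hypothetical.

```python
import math

def aggregate_pixels(kernel_vectors, pixel_vectors, max_distance):
    """Assign each pixel to the nearest kernel in similarity-vector space.

    Returns one kernel index per pixel, or -1 if no kernel is within max_distance.
    """
    labels = []
    for vec in pixel_vectors:
        dists = [math.dist(vec, k) for k in kernel_vectors]
        best = min(range(len(dists)), key=dists.__getitem__)
        labels.append(best if dists[best] <= max_distance else -1)
    return labels

# Two kernels and three text pixels with 2-D similarity vectors (made-up values).
kernels = [(0.0, 0.0), (10.0, 10.0)]
pixels = [(1.0, 0.0), (9.0, 10.0), (5.0, 5.0)]
assignment = aggregate_pixels(kernels, pixels, max_distance=3.0)
print(assignment)
```

Grouping by vector distance rather than by shape is what allows text instances of arbitrary shape to be recovered from a segmentation map.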
Variations in species diversity patterns and community assembly rules among vegetation types in the karst landscape
The various vegetation types in the karst landscape have been considered the result of heterogeneous habitats. However, the lack of a comprehensive understanding of regional biodiversity patterns and the underlying ecological processes limits further research on ecological management. This study established forest dynamic plots (FDPs) of the dominant vegetation types (shrubland, SL; mixed tree and shrub forest, MTSF; coniferous forest, CF; coniferous broadleaf mixed forest, CBMF; and broadleaf forest, BF) in the karst landscape and quantified the species diversity patterns and potential ecological processes. The results showed that, in terms of diversity patterns, the evenness and species richness of the CF community were significantly lower than those of the other vegetation types, while the BF community had the highest species richness. The other three vegetation types showed no significant variation in species richness and evenness. However, when controlling for the number of individuals in the FDPs, the rarefied species richness showed significant differences and ranked as BF > SL > MTSF > CBMF > CF, highlighting the importance of considering the impacts of abundance. Additionally, the community assembly of climax communities (CF or BF) was dominated by stochastic processes such as species dispersal or species formation, whereas deterministic processes (habitat filtering) dominated the secondary forests (SL, MTSF, and CBMF). These findings show that community assembly differs mainly between the climax community and other communities. Hence, it is crucial to consider biodiversity and the potential underlying ecological processes together when studying regional ecology and management, particularly in heterogeneous ecosystems.
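Rarefied species richness, used above to compare plots with different numbers of individuals, can be illustrated with the classic individual-based rarefaction formula E[S_n] = sum_i (1 - C(N - N_i, n) / C(N, n)). The abstract does not say which estimator the study used, so this formula is an assumption, and the function name and abundance counts below are hypothetical.

```python
from math import comb

def rarefied_richness(abundances, n):
    """Expected number of species in a random subsample of n individuals.

    abundances: individuals counted per species; n: subsample size (0 <= n <= total).
    """
    total = sum(abundances)
    if not 0 <= n <= total:
        raise ValueError("subsample size must be between 0 and the total count")
    denom = comb(total, n)
    # comb(total - a, n) counts the subsamples that miss a species with a individuals.
    return sum(1 - comb(total - a, n) / denom for a in abundances if a > 0)

# Two hypothetical communities with the same species count but different evenness.
even = rarefied_richness([5, 5], 2)
uneven = rarefied_richness([9, 1], 2)
```

At the same subsample size, the even community retains more expected species than the uneven one, which is why rankings based on raw richness can change once abundance is controlled.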
Actively implementing an evidence-based feeding guideline for critically ill patients (NEED): a multicenter, cluster-randomized, controlled trial
Background: Previous cluster-randomized controlled trials evaluating the impact of implementing evidence-based guidelines for nutrition therapy in critical illness do not consistently demonstrate patient benefits. A large-scale, sufficiently powered study is therefore warranted to ascertain the effects of guideline implementation on patient-centered outcomes.
Methods: We conducted a multicenter, cluster-randomized, parallel-controlled trial in intensive care units (ICUs) across China. We developed an evidence-based feeding guideline. ICUs randomly allocated to the guideline group formed a local "intervention team", which actively implemented the guideline using standardized educational materials, a graphical feeding protocol, and live online education outreach meetings conducted by members of the study management committee. ICUs assigned to the control group remained unaware of the guideline content. All ICUs enrolled patients who were expected to stay in the ICU longer than seven days. The primary outcome was all-cause mortality within 28 days of enrollment.
Results: Forty-eight ICUs were randomized to the guideline group and 49 to the control group. From March 2018 to July 2019, the guideline ICUs enrolled 1399 patients, and the control ICUs enrolled 1373 patients. Implementation of the guideline resulted in significantly earlier enteral nutrition (EN) initiation (1.20 vs. 1.55 mean days to initiation of EN; difference −0.40 [95% CI −0.71 to −0.09]; P = 0.01) and delayed parenteral nutrition (PN) initiation (1.29 vs. 0.80 mean days to start of PN; difference 1.06 [95% CI 0.44 to 1.67]; P = 0.001). There was no significant difference in 28-day mortality (14.2% vs. 15.2%; difference −1.6% [95% CI −4.3% to 1.2%]; P = 0.42) between groups.
Conclusions: In this large-scale, multicenter trial, active implementation of an evidence-based feeding guideline reduced the time to commencement of EN and overall PN use but did not translate to a reduction in mortality from critical illness. Trial registration: ISRCTN, ISRCTN12233792. Registered November 20th, 2017.