Structure fusion based on graph convolutional networks for semi-supervised classification
Confronted with the diversity and complexity of multi-view data in
semi-supervised classification, most existing graph convolutional networks
focus on network architecture construction or salient graph structure
preservation, and ignore the contribution of the complete graph structure to
semi-supervised classification. To mine a more complete distribution structure
from multi-view data while considering both specificity and commonality, we
propose structure fusion based on graph convolutional networks (SF-GCN) to
improve the performance of semi-supervised classification. SF-GCN not only
retains the specific characteristics of each view through spectral embedding,
but also captures the common style of multi-view data through a distance metric
between multi-graph structures. Assuming a linear relationship between the
multi-graph structures, we construct the optimization function of the structure
fusion model by balancing the specificity loss and the commonality loss. By
solving this function, we simultaneously obtain the fused spectral embedding of
the multi-view data and the fused structure, which serves as the adjacency
matrix input to graph convolutional networks for semi-supervised
classification. Experiments demonstrate that SF-GCN outperforms
state-of-the-art methods on three challenging citation network datasets: Cora,
Citeseer, and Pubmed.
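The final step above, feeding a fused structure to a GCN as its adjacency matrix, can be sketched as follows. This is a minimal NumPy sketch under stated assumptions: the per-view graphs are fused with hand-picked weights (SF-GCN instead obtains the fused structure by solving its specificity/commonality objective, which is not reproduced here), and the propagation rule is the standard symmetric-normalized GCN layer. The function names, toy graphs, and features are hypothetical.

```python
import numpy as np


def fuse_structures(adjs, weights):
    """Fuse per-view adjacency matrices as a weighted linear combination.

    ASSUMPTION: the weights are supplied by hand. SF-GCN instead derives the
    fused structure by solving its specificity/commonality objective, which
    this sketch does not reproduce.
    """
    weights = np.asarray(weights, dtype=float)
    fused = np.zeros_like(adjs[0], dtype=float)
    for w, a in zip(weights, adjs):
        fused += w * a
    return fused


def normalize_adjacency(adj):
    """Symmetric GCN normalization D^{-1/2} (A + I) D^{-1/2}."""
    a_hat = adj + np.eye(adj.shape[0])       # add self-loops
    deg = a_hat.sum(axis=1)                  # node degrees (>= 1 with self-loops)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(a_norm, h, w):
    """One graph convolution: ReLU(A_norm @ H @ W)."""
    return np.maximum(a_norm @ h @ w, 0.0)


# Toy example: 4 nodes observed under two views (two graph structures).
a_view1 = np.array([[0, 1, 0, 0],
                    [1, 0, 1, 0],
                    [0, 1, 0, 1],
                    [0, 0, 1, 0]], dtype=float)
a_view2 = np.array([[0, 0, 0, 1],
                    [0, 0, 1, 0],
                    [0, 1, 0, 0],
                    [1, 0, 0, 0]], dtype=float)

rng = np.random.default_rng(0)
features = rng.normal(size=(4, 3))           # node features (hypothetical)
weight = rng.normal(size=(3, 2))             # layer weights (hypothetical)

a_norm = normalize_adjacency(fuse_structures([a_view1, a_view2], [0.6, 0.4]))
hidden = gcn_layer(a_norm, features, weight)
print(hidden.shape)  # (4, 2)
```

Because the fused matrix is a convex combination of symmetric graphs, it stays symmetric, so the usual GCN normalization applies to it unchanged.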
Semi-Supervised and Long-Tailed Object Detection with CascadeMatch
This paper focuses on long-tailed object detection in the semi-supervised
learning setting, which poses realistic challenges, but has rarely been studied
in the literature. We propose a novel pseudo-labeling-based detector called
CascadeMatch. Our detector features a cascade network architecture, which has
multi-stage detection heads with progressive confidence thresholds. To avoid
manually tuning the thresholds, we design a new adaptive pseudo-label mining
mechanism to automatically identify suitable values from data. To mitigate
confirmation bias, where a model is negatively reinforced by incorrect
pseudo-labels produced by itself, each detection head is trained by the
ensemble pseudo-labels of all detection heads. Experiments on two long-tailed
datasets, i.e., LVIS and COCO-LT, demonstrate that CascadeMatch surpasses
existing state-of-the-art semi-supervised approaches -- across a wide range of
detection architectures -- in handling long-tailed object detection. For
instance, CascadeMatch outperforms Unbiased Teacher by 1.9 AP Fix on LVIS when
using a ResNet50-based Cascade R-CNN structure, and by 1.7 AP Fix when using
Sparse R-CNN with a Transformer encoder. We also show that CascadeMatch can
even handle the challenging sparsely annotated object detection problem.
Comment: International Journal of Computer Vision (IJCV), 202
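The pseudo-labeling scheme described above (progressive thresholds across cascade stages, with every head supervised by the ensemble of all heads) can be sketched roughly as below. This is an illustrative NumPy sketch, not CascadeMatch's implementation: the threshold rule, a rising quantile of the ensemble confidence, is a hypothetical stand-in for the paper's adaptive pseudo-label mining mechanism, and the scores are random toy data.

```python
import numpy as np


def ensemble_scores(head_scores):
    """Average class probabilities over all detection heads.

    head_scores: array of shape (num_heads, num_proposals, num_classes).
    """
    return head_scores.mean(axis=0)


def progressive_thresholds(confidence, num_heads, start=0.5, step=0.15):
    """HYPOTHETICAL stand-in for adaptive pseudo-label mining.

    Each head's threshold is a quantile of the ensemble confidence, with the
    quantile rising stage by stage so later heads are stricter. This is not
    the mining rule used by CascadeMatch.
    """
    quantiles = np.clip(start + step * np.arange(num_heads), 0.0, 1.0)
    return np.quantile(confidence, quantiles)


def select_pseudo_labels(head_scores):
    """Return, per head, (proposal indices, class labels) kept as pseudo-labels."""
    probs = ensemble_scores(head_scores)
    confidence = probs.max(axis=1)           # ensemble confidence per proposal
    labels = probs.argmax(axis=1)            # ensemble class per proposal
    thresholds = progressive_thresholds(confidence, head_scores.shape[0])
    selections = []
    for t in thresholds:
        keep = np.flatnonzero(confidence >= t)
        # Every head is supervised by the same ensemble labels; only the
        # confidence threshold differs between stages.
        selections.append((keep, labels[keep]))
    return selections


# Toy example: 3 cascade heads scoring 20 unlabeled proposals over 5 classes.
rng = np.random.default_rng(0)
logits = rng.normal(size=(3, 20, 5))
head_scores = np.exp(logits) / np.exp(logits).sum(axis=2, keepdims=True)

for stage, (keep, labels) in enumerate(select_pseudo_labels(head_scores)):
    print(f"head {stage}: {len(keep)} pseudo-labeled proposals")
```

Using one shared ensemble label per proposal is what keeps any single head from being reinforced only by its own mistakes; the stage-wise thresholds then control how many of those labels each head sees.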
Contextual Object Detection with Multimodal Large Language Models
Recent Multimodal Large Language Models (MLLMs) are remarkable in
vision-language tasks, such as image captioning and question answering, but
lack the essential perception ability, i.e., object detection. In this work, we
address this limitation by introducing a novel research problem of contextual
object detection -- understanding visible objects within different human-AI
interactive contexts. Three representative scenarios are investigated,
including the language cloze test, visual captioning, and question answering.
Moreover, we present ContextDET, a unified multimodal model that is capable of
end-to-end differentiable modeling of visual-language contexts, so as to
locate, identify, and associate visual objects with language inputs for
human-AI interaction. Our ContextDET involves three key submodels: (i) a visual
encoder for extracting visual representations, (ii) a pre-trained LLM for
multimodal context decoding, and (iii) a visual decoder for predicting bounding
boxes given contextual object words. The new generate-then-detect framework
enables us to detect object words within human vocabulary. Extensive
experiments show the advantages of ContextDET on our proposed CODE benchmark,
open-vocabulary detection, and referring image segmentation. GitHub:
https://github.com/yuhangzang/ContextDET.
Comment: Project Page: https://www.mmlab-ntu.com/project/contextdet/index.htm
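The generate-then-detect data flow through the three submodels can be illustrated with a small wiring sketch. Everything here is hypothetical: each submodel is a placeholder callable rather than a network, the `is_object_word` tagger is an assumption introduced to pick object words out of the generated text, and none of the names come from the ContextDET code base.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

import numpy as np

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2); hypothetical convention


@dataclass
class GenerateThenDetect:
    """Hypothetical wiring of the three submodels named in the abstract.

    Every component is a placeholder callable; none of these names come from
    the ContextDET code base.
    """
    visual_encoder: Callable[[np.ndarray], np.ndarray]       # (i) image -> visual features
    llm_decode: Callable[[np.ndarray, str], List[str]]       # (ii) features + prompt -> tokens
    visual_decoder: Callable[[np.ndarray, str], Box]         # (iii) features + object word -> box
    is_object_word: Callable[[str], bool]                    # assumed object-word tagger

    def __call__(self, image: np.ndarray, prompt: str) -> Dict[str, Box]:
        features = self.visual_encoder(image)
        tokens = self.llm_decode(features, prompt)            # generate ...
        object_words = [t for t in tokens if self.is_object_word(t)]
        # ... then detect: one box query per generated object word.
        return {word: self.visual_decoder(features, word) for word in object_words}


# Toy usage with stub components standing in for real networks.
model = GenerateThenDetect(
    visual_encoder=lambda img: img.mean(axis=(0, 1)),
    llm_decode=lambda feats, prompt: ["a", "cat", "on", "a", "sofa"],
    visual_decoder=lambda feats, word: (0.0, 0.0, 1.0, 1.0),
    is_object_word=lambda tok: tok in {"cat", "sofa"},
)
print(model(np.zeros((8, 8, 3)), "Describe the image."))
# {'cat': (0.0, 0.0, 1.0, 1.0), 'sofa': (0.0, 0.0, 1.0, 1.0)}
```

The point of the ordering is that the set of detectable objects is decided by the language side first, so the detector is not tied to a fixed label list.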
Domain Generalization in Vision: A Survey
Generalization to out-of-distribution (OOD) data is a capability natural to
humans yet challenging for machines to reproduce. This is because most learning
algorithms strongly rely on the i.i.d. assumption on source/target data, which
is often violated in practice due to domain shift. Domain generalization (DG)
aims to achieve OOD generalization by using only source data for model
learning. Since DG was first introduced in 2011, research in the area has made
great progress. In particular, intensive research on this topic has led to a broad
spectrum of methodologies, e.g., those based on domain alignment,
meta-learning, data augmentation, or ensemble learning, just to name a few; and
has covered various vision applications such as object recognition,
segmentation, action recognition, and person re-identification. In this paper,
we provide, for the first time, a comprehensive literature review summarizing
the developments in DG for computer vision over the past decade. Specifically,
we first cover the background by formally defining DG and relating it to other
research fields like domain adaptation and transfer learning. Second, we
conduct a thorough review of existing methods and present a categorization
based on their methodologies and motivations. Finally, we conclude this survey
with insights and discussions on future research directions.
Comment: v4: includes the word "vision" in the title; improves the
organization and clarity in Sections 2-3; adds future directions; and mor
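The DG setting defined above (train on source domains only, evaluate on a domain never seen in training) is commonly operationalized with a leave-one-domain-out protocol. The sketch below shows that split logic only; the function names are hypothetical, and the PACS domain names are used purely as an illustration, since the survey abstract itself does not name a benchmark.

```python
from typing import Iterator, List, Sequence, Tuple


def leave_one_domain_out(domains: Sequence[str]) -> Iterator[Tuple[List[str], str]]:
    """Yield (source_domains, target_domain) pairs.

    The model is trained only on the source domains; the held-out target
    domain is used solely for evaluation, matching the DG setting in which
    target data is unavailable during training.
    """
    for target in domains:
        sources = [d for d in domains if d != target]
        yield sources, target


def split_samples(samples, target):
    """Partition (input, label, domain) tuples into source-train and target-test sets."""
    train = [s for s in samples if s[2] != target]
    test = [s for s in samples if s[2] == target]
    return train, test


# Example domain names from PACS, a common DG benchmark (illustrative only).
for sources, target in leave_one_domain_out(["photo", "art_painting", "cartoon", "sketch"]):
    print(f"train on {sources} -> test on {target}")
```

This contrasts with domain adaptation, where unlabeled target data would be available during training; here the target split is touched only at evaluation time.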
Desertification Reversal Promotes the Complexity of Plant Community by Increasing Plant Species Diversity of Each Plant Functional Type
Desertification reversal is globally significant for the sustainable development of land resources. However, the mechanisms of desertification reversal at the plant community level are still unclear. We hypothesized that desertification reversal has clear effects on plant community composition, plant functional types (PFTs), and other vegetation characteristics, including plant diversity and biomass, and that these changes are more dramatic in the early stages of reversal than in later stages. We investigated the vegetation at four to five different stages of desertification reversal at each of seven large study sites in southwestern Mu Us Sandy Land, China. The results show that the dominant species in very severe desertification areas were replaced by perennial grasses in potential desertification areas. With the reversal of desertification, the importance values of annual forbs and perennial sub-shrubs decreased dramatically (from 42.59 and 32.98 to 22.13 and 5.54, respectively), whereas those of perennial grasses and perennial forbs increased markedly (from 13.26 and 2.71 to 53.94 and 11.79, respectively). Desertification reversal increased the complexity of plant community composition by increasing the number of plant species in each PFT, and C3 plants replaced C4 plants as the dominant PFT as reversal progressed. Plant species richness and species diversity rose overall, and aboveground plant biomass increased significantly (p < 0.05) with the reversal of desertification. Most vegetation characteristics changed more strikingly in the early stages of desertification reversal than in later stages. Our results indicate that the type and composition of the plant community were dramatically affected by desertification reversal. Anthropogenic measures are better suited to the early stages than to later stages, and planting Amaranthaceae C4 plants on mobile dunes is suggested to accelerate desertification reversal.
This study is useful for designing land management and ecological restoration strategies in arid and semiarid regions.
On-Device Domain Generalization
We present a systematic study of domain generalization (DG) for tiny neural
networks. This problem is critical to on-device machine learning applications
but has been overlooked in the literature, where research has focused mainly
on large models. Tiny neural networks have far fewer parameters and lower
complexity and therefore should not be trained the same way as their
large counterparts for DG applications. By conducting extensive experiments, we
find that knowledge distillation (KD), a well-known technique for model
compression, is much better for tackling the on-device DG problem than
conventional DG methods. Another interesting observation is that the
teacher-student gap on out-of-distribution data is bigger than that on
in-distribution data, which highlights the capacity mismatch issue as well as
the shortcoming of KD. We further propose a method called out-of-distribution
knowledge distillation (OKD), in which the idea is to teach the student how the
teacher handles out-of-distribution data synthesized via disruptive data
augmentation. Without adding any extra parameters to the model (hence keeping
the deployment cost unchanged), OKD significantly improves DG performance for
tiny neural networks in a variety of on-device DG scenarios for image and
speech applications. We also contribute a scalable approach for synthesizing
visual domain shifts, along with a new suite of DG datasets to complement
existing testbeds.
Comment: Preprint
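The core idea of OKD, distilling the teacher's behaviour on synthesized out-of-distribution inputs into the student, can be sketched with a standard temperature-scaled distillation loss. This is a minimal NumPy sketch under explicit assumptions: `disruptive_augment` is a hypothetical stand-in (random per-feature scaling plus Gaussian noise), not the augmentation used by OKD; the loss is the conventional Hinton-style KL term; and the teacher and student are toy linear maps. Only the distillation term is shown, not how it is weighted against any other training loss.

```python
import numpy as np


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over the last axis at a given temperature."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def kd_loss(student_logits, teacher_logits, temperature=4.0):
    """Standard (Hinton-style) distillation loss: T^2 * KL(p_teacher || p_student)."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = (p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12))).sum(axis=-1)
    return float(temperature ** 2 * kl.mean())


def disruptive_augment(x, rng, noise=0.5):
    """HYPOTHETICAL disruptive augmentation: random per-feature scaling plus
    Gaussian noise. OKD's actual augmentation is not reproduced here."""
    scale = rng.uniform(0.5, 1.5, size=(1, x.shape[1]))
    return x * scale + noise * rng.normal(size=x.shape)


def okd_term(student, teacher, x, rng, temperature=4.0):
    """Distill the teacher's behaviour on synthesized OOD inputs into the student."""
    x_ood = disruptive_augment(x, rng)
    return kd_loss(student(x_ood), teacher(x_ood), temperature)


# Toy linear models standing in for teacher and student (hypothetical).
rng = np.random.default_rng(0)
w_teacher = rng.normal(size=(8, 3))
w_student = rng.normal(size=(8, 3))
x = rng.normal(size=(16, 8))

loss = okd_term(lambda v: v @ w_student, lambda v: v @ w_teacher, x, rng)
print(f"OKD distillation term: {loss:.4f}")
```

Since only the training objective changes, the deployed student keeps its original parameter count, which is consistent with the abstract's point that deployment cost is unchanged.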