    Structure fusion based on graph convolutional networks for semi-supervised classification

    Multi-view data are diverse and complex, which makes semi-supervised classification difficult. Most existing graph convolutional networks focus on constructing the network architecture or preserving the salient graph structure, and ignore the contribution of the complete graph structure to semi-supervised classification. To mine a more complete distribution structure from multi-view data while accounting for both specificity and commonality, we propose structure fusion based on graph convolutional networks (SF-GCN) to improve semi-supervised classification. SF-GCN not only retains the specific characteristics of each view through spectral embedding, but also captures the common style of multi-view data through a distance metric between multi-graph structures. Assuming a linear relationship between the multi-graph structures, we construct the optimization function of the structure fusion model by balancing the specificity loss and the commonality loss. Solving this function simultaneously yields the fused spectral embedding of the multi-view data and the fused structure, which serves as the adjacency matrix input to graph convolutional networks for semi-supervised classification. Experiments demonstrate that SF-GCN outperforms state-of-the-art methods on three challenging citation-network datasets: Cora, Citeseer, and Pubmed.
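    The fused-adjacency idea can be illustrated with a minimal NumPy sketch. It rests on several assumptions. The view weights are fixed by hand, whereas SF-GCN obtains the fusion by solving its specificity/commonality objective. The helper names `fuse_graphs`, `normalize_adjacency` and `gcn_layer` are hypothetical. The propagation rule is the standard GCN renormalization D^{-1/2}(A + I)D^{-1/2}. The sketch only shows how a linearly fused structure would feed a GCN layer.

```python
import numpy as np

def fuse_graphs(adjs, weights):
    # HYPOTHETICAL: weights are fixed here; SF-GCN instead derives the fusion
    # from its specificity/commonality optimization.
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()            # convex combination of views
    return sum(w * a for w, a in zip(weights, adjs))

def normalize_adjacency(adj):
    # Standard GCN renormalization: D^{-1/2} (A + I) D^{-1/2}
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(a_norm, features, weight):
    # One propagation step followed by ReLU
    return np.maximum(a_norm @ features @ weight, 0.0)

# Two toy views (graph structures) over the same 4 nodes
view_a = np.array([[0, 1, 0, 0],
                   [1, 0, 1, 0],
                   [0, 1, 0, 1],
                   [0, 0, 1, 0]], dtype=float)
view_b = np.array([[0, 0, 1, 0],
                   [0, 0, 0, 1],
                   [1, 0, 0, 0],
                   [0, 1, 0, 0]], dtype=float)

fused = fuse_graphs([view_a, view_b], weights=[0.7, 0.3])
rng = np.random.default_rng(0)
features = rng.normal(size=(4, 3))   # node features
weight = rng.normal(size=(3, 2))     # layer weights
hidden = gcn_layer(normalize_adjacency(fused), features, weight)
print(hidden.shape)  # (4, 2)
```

    A convex combination keeps the fused matrix symmetric and non-negative whenever every view is, so the usual GCN normalization still applies unchanged.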

    Semi-Supervised and Long-Tailed Object Detection with CascadeMatch

    This paper focuses on long-tailed object detection in the semi-supervised learning setting, which poses realistic challenges but has rarely been studied in the literature. We propose a novel pseudo-labeling-based detector called CascadeMatch. Our detector features a cascade network architecture with multi-stage detection heads and progressive confidence thresholds. To avoid manually tuning the thresholds, we design a new adaptive pseudo-label mining mechanism that automatically identifies suitable values from data. To mitigate confirmation bias, where a model is negatively reinforced by incorrect pseudo-labels it produced itself, each detection head is trained on the ensembled pseudo-labels of all detection heads. Experiments on two long-tailed datasets, LVIS and COCO-LT, demonstrate that CascadeMatch surpasses existing state-of-the-art semi-supervised approaches in handling long-tailed object detection across a wide range of detection architectures. For instance, CascadeMatch outperforms Unbiased Teacher by 1.9 AP Fix on LVIS when using a ResNet50-based Cascade R-CNN structure, and by 1.7 AP Fix when using Sparse R-CNN with a Transformer encoder. We also show that CascadeMatch can even handle the challenging problem of sparsely annotated object detection. Comment: International Journal of Computer Vision (IJCV), 202
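    The ensemble-then-threshold idea can be shown in a small NumPy sketch. The ensembling step averages class probabilities over all heads, following the abstract. The thresholding rule is a hypothetical stand-in, because the abstract does not specify CascadeMatch's adaptive mining mechanism. Each class threshold is taken as a quantile of that class's ensemble confidences, and later stages use higher quantiles to mimic progressive thresholds. The function names and quantile values are illustrative assumptions.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def ensemble_scores(head_probs):
    # head_probs: (num_heads, num_boxes, num_classes); average over all heads
    return head_probs.mean(axis=0)

def adaptive_thresholds(conf, labels, num_classes, quantile):
    # HYPOTHETICAL mining rule: per-class threshold = given quantile of the
    # ensemble confidences of boxes predicted as that class.
    thr = np.ones(num_classes)
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            thr[c] = np.quantile(conf[mask], quantile)
    return thr

def mine_pseudo_labels(head_probs, stage_quantiles):
    scores = ensemble_scores(head_probs)
    labels = scores.argmax(axis=1)
    conf = scores.max(axis=1)
    keep_masks = []
    for q in stage_quantiles:  # increasing quantiles -> stricter later stages
        thr = adaptive_thresholds(conf, labels, scores.shape[1], q)
        keep_masks.append(conf >= thr[labels])
    return labels, keep_masks

rng = np.random.default_rng(0)
head_probs = softmax(rng.normal(size=(3, 50, 5)))  # 3 heads, 50 boxes, 5 classes
labels, keep_masks = mine_pseudo_labels(head_probs, stage_quantiles=[0.3, 0.5, 0.7])
print([int(m.sum()) for m in keep_masks])  # pseudo-labels kept per stage
```

    Every head sees the same ensembled scores, so no single head labels its own training data. Because the quantiles increase, each stage keeps a subset of the previous stage's pseudo-labels.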

    GW25-e1472 A Comparative Study of Right Adrenal Venous Sampling with and without 3 Dimensional Reconstruction

    Contextual Object Detection with Multimodal Large Language Models

    Recent Multimodal Large Language Models (MLLMs) are remarkable at vision-language tasks, such as image captioning and question answering, but lack an essential perception ability, namely object detection. In this work, we address this limitation by introducing a novel research problem, contextual object detection: understanding visible objects within different human-AI interactive contexts. Three representative scenarios are investigated: the language cloze test, visual captioning, and question answering. Moreover, we present ContextDET, a unified multimodal model capable of end-to-end differentiable modeling of visual-language contexts, so as to locate, identify, and associate visual objects with language inputs for human-AI interaction. ContextDET involves three key submodels: (i) a visual encoder for extracting visual representations, (ii) a pre-trained LLM for multimodal context decoding, and (iii) a visual decoder for predicting bounding boxes given contextual object words. The new generate-then-detect framework enables us to detect object words within human vocabulary. Extensive experiments show the advantages of ContextDET on our proposed CODE benchmark, open-vocabulary detection, and referring image segmentation. Comment: Github: https://github.com/yuhangzang/ContextDET, Project Page: https://www.mmlab-ntu.com/project/contextdet/index.htm
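    The generate-then-detect data flow through the three submodels can be sketched structurally in Python. Every component here is a hypothetical toy stand-in, not the ContextDET implementation. The "image" is a dict of object names to boxes, the "LLM" fills `[MASK]` slots of a cloze prompt in order, and the "visual decoder" looks up a box for each generated word. The sketch shows only how object words are generated first and then localized.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]

@dataclass
class GenerateThenDetect:
    visual_encoder: Callable[[dict], list]              # (i) image -> visual tokens
    language_decoder: Callable[[list, str], List[str]]  # (ii) tokens + prompt -> object words
    box_decoder: Callable[[list, str], Box]             # (iii) tokens + word -> bounding box

    def __call__(self, image: dict, prompt: str) -> List[Tuple[str, Box]]:
        tokens = self.visual_encoder(image)
        words = self.language_decoder(tokens, prompt)               # generate ...
        return [(w, self.box_decoder(tokens, w)) for w in words]    # ... then detect

# HYPOTHETICAL toy stand-ins: the "image" is a dict of object label -> box
def toy_encoder(image):
    return list(image.items())

def toy_cloze_llm(tokens, prompt):
    # Fill each [MASK] with the next object label, in order
    return [label for label, _ in tokens][: prompt.count("[MASK]")]

def toy_box_decoder(tokens, word):
    return dict(tokens)[word]

model = GenerateThenDetect(toy_encoder, toy_cloze_llm, toy_box_decoder)
print(model({"cat": (10, 20, 60, 80), "sofa": (0, 40, 120, 100)}, "A [MASK] sits on a [MASK]."))
# [('cat', (10, 20, 60, 80)), ('sofa', (0, 40, 120, 100))]
```

    In the real model each callable would be a neural network trained end to end. The dataclass only makes the interface between the three submodels explicit.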

    Domain Generalization in Vision: A Survey

    Generalization to out-of-distribution (OOD) data is a capability natural to humans yet challenging for machines to reproduce. This is because most learning algorithms strongly rely on the i.i.d. assumption on source/target data, which is often violated in practice due to domain shift. Domain generalization (DG) aims to achieve OOD generalization by using only source data for model learning. Since DG was first introduced in 2011, research in the area has made great progress. In particular, intensive research on this topic has led to a broad spectrum of methodologies, e.g., those based on domain alignment, meta-learning, data augmentation, or ensemble learning, to name just a few, and has covered various vision applications such as object recognition, segmentation, action recognition, and person re-identification. In this paper, we provide, for the first time, a comprehensive literature review summarizing the developments in DG for computer vision over the past decade. Specifically, we first cover the background by formally defining DG and relating it to other research fields like domain adaptation and transfer learning. Second, we conduct a thorough review of existing methods and present a categorization based on their methodologies and motivations. Finally, we conclude this survey with insights and discussions on future research directions. Comment: v4: includes the word "vision" in the title; improves the organization and clarity in Sections 2-3; adds future directions; and more
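    The DG setting trains "using only source data", and one commonly used way to turn that into an evaluation protocol is leave-one-domain-out. Each domain in turn is held out as the unseen target while the model learns from the rest. The minimal sketch below assumes a multi-domain benchmark. The domain names are illustrative (borrowed from the PACS benchmark) and the helper name is hypothetical.

```python
from typing import Iterator, List, Tuple

def leave_one_domain_out(domains: List[str]) -> Iterator[Tuple[List[str], str]]:
    """Yield (source_domains, held_out_target) pairs, one per domain."""
    for target in domains:
        # In DG the target domain is never seen during training
        sources = [d for d in domains if d != target]
        yield sources, target

splits = list(leave_one_domain_out(["photo", "art_painting", "cartoon", "sketch"]))
for sources, target in splits:
    print(f"train on {sources}, evaluate on {target}")
```

    Reported DG numbers are then typically averaged over all held-out targets. This differs from domain adaptation, where unlabeled target data would be available during training.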

    Desertification Reversal Promotes the Complexity of Plant Community by Increasing Plant Species Diversity of Each Plant Functional Type

    Desertification reversal is globally significant for the sustainable development of land resources. However, the mechanisms of desertification reversal at the plant-community level are still unclear. We hypothesized that desertification reversal has clear effects on plant community composition, plant functional types (PFTs), and other vegetation characteristics, including plant diversity and biomass, and that these changes are more dramatic in the early stages of reversal than in later stages. We investigated the vegetation at four to five different stages of desertification reversal at each of seven large study sites in the southwestern Mu Us Sandy Land, China. The results show that the dominant species in very severe desertification areas were replaced by perennial grasses in potential desertification areas. With the reversal of desertification, the importance values of annual forbs and perennial sub-shrubs decreased dramatically (from 42.59 and 32.98 to 22.13 and 5.54, respectively), whereas those of perennial grasses and perennial forbs increased markedly (from 13.26 and 2.71 to 53.94 and 11.79, respectively). Desertification reversal increased the complexity of plant community composition by increasing the number of plant species in each PFT, and C3 plants replaced C4 plants as the dominant PFT with reversal. Plant species richness and species diversity rose overall, and aboveground plant biomass increased significantly (p < 0.05) with the reversal of desertification. Most vegetation characteristics changed more strikingly in the early stages of desertification reversal than in later stages. Our results indicate that the type and composition of the plant community were dramatically affected by desertification reversal. Anthropogenic measures are more applicable in early stages than in later stages, and planting Amaranthaceae C4 plants on mobile dunes is suggested to accelerate desertification reversal. This study is useful for designing land management and ecological restoration strategies in arid and semiarid regions.

    On-Device Domain Generalization

    We present a systematic study of domain generalization (DG) for tiny neural networks. This problem is critical to on-device machine learning applications but has been overlooked in the literature, where research has focused solely on large models. Tiny neural networks have far fewer parameters and lower complexity, and therefore should not be trained the same way as their large counterparts for DG applications. Through extensive experiments, we find that knowledge distillation (KD), a well-known technique for model compression, is much better at tackling the on-device DG problem than conventional DG methods. Another interesting observation is that the teacher-student gap on out-of-distribution data is bigger than that on in-distribution data, which highlights the capacity mismatch issue as well as a shortcoming of KD. We further propose a method called out-of-distribution knowledge distillation (OKD), whose idea is to teach the student how the teacher handles out-of-distribution data synthesized via disruptive data augmentation. Without adding any extra parameters to the model, and hence keeping the deployment cost unchanged, OKD significantly improves DG performance for tiny neural networks in a variety of on-device DG scenarios for image and speech applications. We also contribute a scalable approach for synthesizing visual domain shifts, along with a new suite of DG datasets to complement existing testbeds. Comment: Preprint
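    The core idea of distilling on synthesized OOD inputs can be made concrete with a NumPy sketch that computes only the loss terms. The KD loss is the standard temperature-softened KL divergence from teacher to student, scaled by T². Several parts are hypothetical: teacher and student are plain linear maps, and `disruptive_augment` (feature shuffling plus strong noise) only stands in for the paper's augmentation. How OKD weights or combines the terms is not specified here, and no training loop is shown.

```python
import numpy as np

def log_softmax(logits, temperature=1.0):
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def kd_loss(student_logits, teacher_logits, temperature=4.0):
    # Hinton-style KD: KL(teacher || student) on softened outputs, scaled by T^2
    log_p_t = log_softmax(teacher_logits, temperature)
    log_p_s = log_softmax(student_logits, temperature)
    kl = (np.exp(log_p_t) * (log_p_t - log_p_s)).sum(axis=-1)
    return temperature ** 2 * kl.mean()

def disruptive_augment(x, rng):
    # HYPOTHETICAL stand-in for "disruptive data augmentation":
    # shuffle feature order and add strong Gaussian noise
    perm = rng.permutation(x.shape[-1])
    return x[:, perm] + rng.normal(scale=0.5, size=x.shape)

rng = np.random.default_rng(0)
teacher_w = rng.normal(size=(16, 10))   # hypothetical large-model weights
student_w = rng.normal(size=(16, 10))   # hypothetical tiny-model weights
x = rng.normal(size=(32, 16))           # a batch of in-distribution inputs

x_ood = disruptive_augment(x, rng)      # synthesized OOD inputs
loss_id = kd_loss(x @ student_w, x @ teacher_w)
loss_ood = kd_loss(x_ood @ student_w, x_ood @ teacher_w)  # OKD-style term
print(round(loss_id, 4), round(loss_ood, 4))
```

    Only the student's weights would be updated when minimizing these terms, so the deployed model gains no extra parameters. That matches the abstract's point about unchanged deployment cost.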