
    Towards Omni-supervised Referring Expression Segmentation

    Referring Expression Segmentation (RES) is an emerging task in computer vision that segments target instances in images based on text descriptions. However, its development is hindered by expensive segmentation labels. To address this issue, we propose a new learning task for RES called Omni-supervised Referring Expression Segmentation (Omni-RES), which aims to make full use of unlabeled, fully labeled, and weakly labeled data, e.g., referring points or grounding boxes, for efficient RES training. To accomplish this task, we also propose a novel yet strong baseline method for Omni-RES based on the recently popular teacher-student learning paradigm, where weak labels are not directly transformed into supervision signals but instead used as a yardstick to select and refine high-quality pseudo-masks for teacher-student learning. To validate the proposed Omni-RES method, we apply it to a set of state-of-the-art RES models and conduct extensive experiments on several RES datasets. The experimental results show the clear merits of Omni-RES over fully supervised and semi-supervised training schemes. For instance, with only 10% fully labeled data, Omni-RES helps the base model reach 100% of its fully supervised performance, and it also outperforms the semi-supervised alternative by a large margin, e.g., +14.93% on RefCOCO and +14.95% on RefCOCO+, respectively. More importantly, Omni-RES also enables the use of large-scale vision-language datasets such as Visual Genome to facilitate low-cost RES training, achieving new state-of-the-art RES performance, e.g., 80.66 on RefCOCO.
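    The yardstick idea above lends itself to a compact filtering step. Below is a minimal sketch, assuming box-style weak labels and one teacher pseudo-mask per expression; the function names, the IoU threshold of 0.5, and the one-to-one mask/box pairing are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def mask_to_box(mask: np.ndarray) -> np.ndarray:
    """Tight bounding box (x1, y1, x2, y2) of a binary mask."""
    ys, xs = np.nonzero(mask)
    return np.array([xs.min(), ys.min(), xs.max(), ys.max()])

def box_iou(a: np.ndarray, b: np.ndarray) -> float:
    """IoU of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-6)

def select_pseudo_masks(teacher_masks, weak_boxes, iou_thresh=0.5):
    """Keep teacher pseudo-masks whose tight box agrees with the weak box label."""
    selected = []
    for mask, box in zip(teacher_masks, weak_boxes):
        if mask.any() and box_iou(mask_to_box(mask), box) >= iou_thresh:
            selected.append(mask)  # high-quality pseudo-label for the student
    return selected
```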

    iFAN: Image-Instance Full Alignment Networks for Adaptive Object Detection

    Training an object detector on a data-rich domain and applying it to a data-poor one with limited performance drop is highly attractive in industry, because it saves substantial annotation cost. Recent research on unsupervised domain-adaptive object detection has verified that aligning data distributions between source and target images through adversarial learning is very useful; the key is when, where, and how to use it to achieve best practice. We propose Image-Instance Full Alignment Networks (iFAN) to tackle this problem by precisely aligning feature distributions on both the image and instance levels: 1) Image-level alignment: multi-scale features are roughly aligned by training adversarial domain classifiers in a hierarchically nested fashion. 2) Full instance-level alignment: deep semantic information and elaborate instance representations are fully exploited to establish a strong relationship among categories and domains. Establishing these correlations is formulated as a metric learning problem by carefully constructing instance pairs. The above adaptations can be integrated into an object detector (e.g., Faster R-CNN), resulting in an end-to-end trainable framework where multiple alignments work collaboratively in a coarse-to-fine manner. In two domain adaptation tasks, synthetic-to-real (SIM10K -> Cityscapes) and normal-to-foggy weather (Cityscapes -> Foggy Cityscapes), iFAN outperforms state-of-the-art methods with a boost of 10%+ AP over the source-only baseline.
    Comment: AAAI 202
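    Image-level adversarial alignment of this kind is commonly realized with domain classifiers trained through a gradient-reversal layer. The following PyTorch sketch shows one such classifier for a single feature scale; the layer widths, the per-location logit design, and the loss wiring in the trailing comment are illustrative assumptions rather than the paper's exact architecture.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; reversed, scaled gradient in the backward pass."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad):
        return -ctx.lam * grad, None

class DomainClassifier(nn.Module):
    """Patch-wise domain classifier applied to one feature scale."""
    def __init__(self, in_ch):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 256, 1), nn.ReLU(inplace=True),
            nn.Conv2d(256, 1, 1))  # one domain logit per spatial location

    def forward(self, feat, lam=1.0):
        # Reversing gradients makes the backbone learn domain-invariant features
        # while the classifier learns to separate source from target.
        return self.net(GradReverse.apply(feat, lam))

# Hierarchically nested alignment would apply one classifier per pyramid scale and
# sum BCEWithLogitsLoss(classifier(feat_s), domain_label) over the scales.
```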

    X-Mesh: Towards Fast and Accurate Text-driven 3D Stylization via Dynamic Textual Guidance

    Text-driven 3D stylization is a complex and crucial task in the fields of computer vision (CV) and computer graphics (CG), aimed at transforming a bare mesh to fit a target text. Prior methods adopt text-independent multilayer perceptrons (MLPs) to predict the attributes of the target mesh under the supervision of a CLIP loss. However, such a text-independent architecture lacks textual guidance when predicting attributes, leading to unsatisfactory stylization and slow convergence. To address these limitations, we present X-Mesh, an innovative text-driven 3D stylization framework that incorporates a novel Text-guided Dynamic Attention Module (TDAM). The TDAM dynamically integrates the guidance of the target text by applying text-relevant spatial and channel-wise attention during vertex feature extraction, resulting in more accurate attribute prediction and faster convergence. Furthermore, existing works lack standard benchmarks and automated metrics for evaluation, often relying on subjective and non-reproducible user studies to assess the quality of stylized 3D assets. To overcome this limitation, we introduce a new standard text-mesh benchmark, namely MIT-30, and two automated metrics, which will enable future research to make fair and objective comparisons. Our extensive qualitative and quantitative experiments demonstrate that X-Mesh outperforms previous state-of-the-art methods.
    Comment: Technical report
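    The abstract does not spell out the internals of the TDAM, but text-conditioned channel and spatial gating can be sketched as follows; the feature dimensions, the sigmoid gating, and the dot-product relevance score are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class TextGuidedAttention(nn.Module):
    """Gate per-vertex features with text-conditioned channel and spatial attention.

    A sketch in the spirit of a text-guided dynamic attention module: the
    target-text embedding produces channel weights and a per-vertex spatial
    weight that modulate the vertex features.
    """
    def __init__(self, feat_dim=256, text_dim=512):
        super().__init__()
        self.to_channel = nn.Linear(text_dim, feat_dim)  # channel attention
        self.to_spatial = nn.Linear(text_dim, feat_dim)  # query for spatial attention

    def forward(self, vert_feats, text_emb):
        # vert_feats: (V, feat_dim) per-vertex features; text_emb: (text_dim,)
        ch = torch.sigmoid(self.to_channel(text_emb))           # (feat_dim,) channel gate
        q = self.to_spatial(text_emb)                           # (feat_dim,) text query
        sp = torch.sigmoid(vert_feats @ q / q.shape[0] ** 0.5)  # (V,) relevance per vertex
        return vert_feats * ch * sp.unsqueeze(-1)               # modulated features
```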

    Detection and localization of citrus fruit based on improved You Only Look Once v5s and binocular vision in the orchard

    Intelligent detection and localization of mature citrus fruits is a critical challenge in developing an automatic harvesting robot. Variable illumination conditions and different occlusion states are among the essential issues that must be addressed for accurate detection and localization of citrus in the orchard environment. In this paper, a novel method for the detection and localization of mature citrus using an improved You Only Look Once (YOLO) v5s with binocular vision is proposed. First, a new loss function (polarity binary cross-entropy with logit loss) for YOLO v5s is designed to calculate the loss value of the class probability and objectness score, so that a large penalty for false and missing detections is applied during the training process. Second, to recover the missing depth information caused by randomly overlapping background objects, Cr-Cb chromatic mapping, the Otsu thresholding algorithm, and morphological processing are successively used to extract the complete shape of the citrus, and the kriging method is applied to obtain the best linear unbiased estimator of the missing depth values. Finally, the citrus spatial position and posture information are obtained according to the camera imaging model and the geometric features of the citrus. The experimental results show that the recall rates of citrus detection under non-uniform, weak, and good illumination conditions are 99.55%, 98.47%, and 98.48%, respectively, approximately 2–9% higher than those of the original YOLO v5s network. The average error of the distance between the citrus fruit and the camera is 3.98 mm, and the average errors of the citrus diameters in the three dimensions are less than 2.75 mm. The average detection time per frame is 78.96 ms. These results indicate that our method can detect and localize citrus fruits in the complex orchard environment with high accuracy and speed. Our dataset and code are available at https://github.com/AshesBen/citrus-detection-localization
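    The abstract names a polarity binary cross-entropy with logit loss but does not give its formula, so the following is only a plausible sketch of an asymmetrically weighted BCE that penalizes missing and false detections more heavily; the weighting scheme and the fn_weight/fp_weight parameters are invented for illustration, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def polarity_bce_with_logits(logits, targets, fn_weight=2.0, fp_weight=2.0):
    """BCE-with-logits that up-weights missed positives and confident false positives.

    A sketch only: the paper's exact polarity weighting is not specified by
    the abstract; fn_weight and fp_weight are illustrative hyperparameters.
    """
    p = torch.sigmoid(logits)
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    # Larger penalty where a positive target meets a low score (missing detection)
    # or a negative target meets a high score (false detection).
    weight = torch.where(targets > 0.5,
                         1.0 + (fn_weight - 1.0) * (1.0 - p),
                         1.0 + (fp_weight - 1.0) * p)
    return (weight * bce).mean()
```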

    Spatial-temporal clustering of an outbreak of SARS-CoV-2 Delta VOC in Guangzhou, China in 2021

    Background: In May 2021, the SARS-CoV-2 Delta variant led to the first local outbreak in China, in Guangzhou City. We explored the epidemiological characteristics and spatial-temporal clustering of this outbreak.
    Methods: Based on the 153 cases in the SARS-CoV-2 Delta variant outbreak, the Knox test was used to analyze the spatial-temporal clustering of the outbreak. We further explored the spatial-temporal clustering by gender and age group, and compared the changes in clustering strength (S) between the two outbreaks in Guangzhou.
    Results: The Knox analysis showed that the risk was relatively high at short distances and over brief periods. The strength of clustering of male-male pairs was higher. By age group, clustering was concentrated in case pairs of those aged ≤ 18 years matched to those aged 18–59 years, and in cases aged 60+ years. The strength of clustering of the outbreak declined after the implementation of public health measures. At time intervals of 1–5 days, the strength of clustering decreased more in 2021 (S = 129.19, change rate 38.87%) than in 2020 (S = 83.81, change rate 30.02%).
    Conclusions: The outbreak of the SARS-CoV-2 Delta VOC in Guangzhou showed obvious spatial-temporal clustering. Timely intervention measures played an essential role in containing this highly transmissible outbreak.
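    The Knox test used here is a standard statistic: count case pairs that are close in both space and time, then assess significance by permuting onset times. A minimal sketch follows; the Euclidean distance metric, thresholds, and permutation count are generic choices, not the study's exact settings.

```python
import numpy as np

def knox_statistic(coords, times, d_max, t_max):
    """Count case pairs close in both space (<= d_max) and time (<= t_max).

    coords: (n, 2) array of case locations; times: length-n array of onset times.
    """
    n = len(times)
    count = 0
    for i in range(n):
        for j in range(i + 1, n):
            if (np.linalg.norm(coords[i] - coords[j]) <= d_max
                    and abs(times[i] - times[j]) <= t_max):
                count += 1
    return count

def knox_test(coords, times, d_max, t_max, n_perm=999, seed=0):
    """Monte Carlo Knox test: permute onset times to build the null distribution."""
    rng = np.random.default_rng(seed)
    observed = knox_statistic(coords, times, d_max, t_max)
    null = [knox_statistic(coords, rng.permutation(times), d_max, t_max)
            for _ in range(n_perm)]
    p = (1 + sum(s >= observed for s in null)) / (n_perm + 1)
    return observed, p
```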

    The biochemical sensor based on liquid-core photonic crystal fiber filled with gold, silver and aluminum

    A highly sensitive SPR-PCF-based biochemical sensor is proposed based on finite-element-method simulations. Two metal wires are assumed to fill two air holes along the y direction, and a liquid analyte with a refractive index higher than that of the background material is injected into the central air hole. The liquid analyte supports a liquid-core mode that couples to the SPP mode when the phase-matching condition is satisfied. High sensitivity is achieved through the direct interaction between the transmitted light and the liquid analyte. The sensor achieves sensitivities of −8383.9 nm/RIU, −8428.6 nm/RIU, and −8776.8 nm/RIU when the air holes of the PCF are filled with gold, silver, and aluminum, respectively, as the refractive index of the liquid analyte varies from 1.454 to 1.478. The influences of the structural parameters of the PCF on the resonance wavelength and confinement loss are also analyzed.
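    The reported figures are spectral sensitivities, S = Δλ_peak/Δn_a in nm/RIU, i.e., the shift of the resonance wavelength per unit change of the analyte refractive index. A minimal sketch of the finite-difference estimate follows; the example wavelengths are invented to reproduce the order of magnitude, not taken from the paper's measured curve.

```python
import numpy as np

def wavelength_sensitivity(n_analyte, peak_wavelength_nm):
    """Spectral sensitivity S = d(lambda_peak)/d(n_analyte) in nm/RIU,
    estimated by finite differences between adjacent analyte indices."""
    n = np.asarray(n_analyte, dtype=float)
    lam = np.asarray(peak_wavelength_nm, dtype=float)
    return np.diff(lam) / np.diff(n)

# Illustrative numbers only: a resonance peak that blue-shifts by ~201 nm as
# the analyte index rises by 0.024 gives S ≈ -8383 nm/RIU, matching the
# order of magnitude reported for the gold-filled fiber.
example_S = wavelength_sensitivity([1.454, 1.478], [1500.0, 1298.8])
print(example_S)  # ≈ [-8383.3]
```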