11 research outputs found

    Co-Teaching for Unsupervised Domain Adaptation and Expansion

    Unsupervised Domain Adaptation (UDA) is known to trade a model's performance on a source domain for improved performance on a target domain. To resolve this issue, Unsupervised Domain Expansion (UDE) has recently been proposed to adapt the model to the target domain as UDA does, while maintaining its performance on the source domain. For both UDA and UDE, a model tailored to a given domain, be it the source or the target domain, is assumed to handle samples from that domain well. We question this assumption by reporting the existence of cross-domain visual ambiguity: due to the lack of a crystal-clear boundary between the two domains, samples from one domain can be visually close to the other domain. We exploit this finding and accordingly propose Co-Teaching (CT), which consists of knowledge distillation based CT (kdCT) and mixup based CT (miCT). Specifically, kdCT transfers knowledge from a leader-teacher network and an assistant-teacher network to a student network, so that the student handles the cross-domain visual ambiguity better. Meanwhile, miCT further enhances the generalization ability of the student. Comprehensive experiments on two image-classification benchmarks and two driving-scene-segmentation benchmarks justify the viability of the proposed method.
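    The abstract names mixup as the basis of miCT but does not spell out the procedure. As a rough illustration only, the following is a minimal sketch of generic mixup (blending two samples and their one-hot labels with a Beta-distributed weight), not the paper's miCT. The function name `mixup`, the `alpha` default, and the list-based feature representation are assumptions made for this example.

```python
import random


def mixup(x_a, x_b, y_a, y_b, alpha=1.0, rng=None):
    """Generic mixup sketch (an assumption, not the paper's miCT).

    x_a, x_b: feature vectors (lists of floats) of two samples.
    y_a, y_b: their one-hot (or soft) label vectors.
    alpha:    Beta(alpha, alpha) shape parameter; hypothetical default.
    """
    rng = rng or random.Random()
    # Mixing weight in (0, 1), drawn from a symmetric Beta distribution.
    lam = rng.betavariate(alpha, alpha)
    # Convex combination of the two inputs and of the two label vectors.
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y, lam


if __name__ == "__main__":
    x, y, lam = mixup([1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0],
                      rng=random.Random(0))
    print(lam, x, y)
```

    In the UDA/UDE setting one would presumably mix a source-domain sample with a target-domain sample, but which pairs the paper mixes and how the labels are obtained is not stated in the abstract.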

    Renmin University of China at TRECVID 2022: Improving Video Search by Feature Fusion and Negation Understanding

    We summarize our TRECVID 2022 Ad-hoc Video Search (AVS) experiments. Our solution is built on two new techniques, namely Lightweight Attentional Feature Fusion (LAFF) for combining diverse visual / textual features and Bidirectional Negation Learning (BNL) for addressing queries that contain negation cues. In particular, LAFF performs feature fusion at both early and late stages and at both the text and video ends to exploit diverse (off-the-shelf) features. Compared to multi-head self-attention, LAFF is much more compact yet more effective. Its attentional weights can also be used to select fewer features, with the retrieval performance mostly preserved. BNL trains a negation-aware video retrieval model by minimizing a bidirectionally constrained loss per triplet, where a triplet consists of a given training video, its original description and a partially negated description. For video feature extraction, we use pre-trained CLIP, BLIP, BEiT, ResNeXt-101 and irCSN. For text features, we adopt bag-of-words, word2vec, CLIP and BLIP. Our training data consists of MSR-VTT, TGIF and VATEX, which were used in our previous participation. In addition, we automatically caption the V3C1 collection for pre-training. The 2022 edition of the TRECVID benchmark has again been a fruitful participation for the RUCMM team. Our best run, with an infAP of 0.262, ranks second teamwise.

    TeachCLIP: Multi-Grained Teaching for Efficient Text-to-Video Retrieval

    For text-to-video retrieval (T2VR), which aims to retrieve unlabeled videos by ad-hoc textual queries, CLIP-based methods dominate. Compared to CLIP4Clip, which is efficient and compact, state-of-the-art models tend to compute video-text similarity by fine-grained cross-modal feature interaction and matching, putting their scalability for large-scale T2VR into doubt. For efficient T2VR, we propose TeachCLIP with multi-grained teaching to let a CLIP4Clip based student network learn from more advanced yet computationally heavy models such as X-CLIP, TS2-Net and X-Pool. To improve the student's learning capability, we add an Attentional frame-Feature Aggregation (AFA) block, which by design adds no extra storage/computation overhead at the retrieval stage. While attentive weights produced by AFA are commonly used for combining frame-level features, we propose a novel use of the weights to let them imitate frame-text relevance estimated by the teacher network. As such, AFA provides a fine-grained learning (teaching) channel for the student (teacher). Extensive experiments on multiple public datasets justify the viability of the proposed method.
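    The abstract notes that attentive weights are commonly used for combining frame-level features. The following is a minimal sketch of that general idea (softmax weights over per-frame scores, then a weighted sum of frame features); it is not the TeachCLIP AFA block itself. The function names and the assumption that a scalar score per frame is already available are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_frames(frame_features, frame_scores):
    """Combine frame-level features into one video-level feature.

    frame_features: list of equal-length feature vectors, one per frame.
    frame_scores:   one scalar attention score per frame (assumed given;
                    how AFA computes them is not described in the abstract).
    Returns the weighted-sum feature and the attention weights.
    """
    weights = softmax(frame_scores)
    dim = len(frame_features[0])
    video = [
        sum(w * f[d] for w, f in zip(weights, frame_features))
        for d in range(dim)
    ]
    return video, weights


if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.0, 1.0]]
    video, weights = aggregate_frames(feats, [0.0, 0.0])
    print(video, weights)  # equal scores give a plain average of the frames
```

    Because the weights form a distribution over frames, they can in principle be compared against a teacher's frame-text relevance distribution, which is the kind of supervision the abstract describes; the exact loss used is not given there.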

    An enhanced personal photo recommendation system by fusing contextual and textual features on mobile device

    No full text

    ChinaOpen: A Dataset for Open-world Multimodal Learning

    This paper introduces ChinaOpen, a dataset sourced from Bilibili, a popular Chinese video-sharing website, for open-world multimodal learning. While state-of-the-art multimodal learning networks have shown impressive performance in automated video annotation and cross-modal video retrieval, their training and evaluation have primarily been conducted on YouTube videos with English text. Their effectiveness on Chinese data remains to be verified. To support multimodal learning in the new context, we construct ChinaOpen-50k, a webly annotated training set of 50k Bilibili videos associated with user-generated titles and tags. Both text-based and content-based data cleaning are performed to remove low-quality videos in advance. For a multi-faceted evaluation, we build ChinaOpen-1k, a manually labeled test set of 1k videos, wherein each video is accompanied by a manually checked user title and a manually written caption. In addition, each test video is manually tagged to describe which visual entities / actions / scenes are present in the visual content. The original user tags are also manually checked. Moreover, with all the Chinese text translated into English, ChinaOpen-1k is also suited for evaluating models trained on English data. In addition to ChinaOpen, we propose Generative Video-to-text Transformer (GVT) for Chinese video captioning. We conduct an extensive evaluation of state-of-the-art single-task / multi-task models on the new dataset, resulting in a number of novel findings and insights.

    Implementation of international society guidelines on chorionicity determination in twins: A multi-Center cohort study in mainland China

    Objective: Ultrasound determination of chorionicity is poor in early pregnancy in China. In an effort to increase the accuracy rate of prompt chorionicity determination, clinical training was provided to primary care physicians. This study assesses the effects of implementing clinical guidelines on chorionicity determination. Methods: A multi-center cohort study was conducted between January 2014 and June 2017 in 12 hospitals without fetal medicine centers. In 2014, the obstetricians and ultrasound physicians were trained in clinical practice and ultrasound examination relating to chorionicity determination. Linear and binary regression analyses were conducted to identify the effects of introducing the new protocols, including the diagnosis rate of chorionicity and perinatal outcomes, taking the data from 2014 as a baseline. Pregnancy outcomes were additionally adjusted for maternal age. Results: During the period of this study, 3,599 twin pregnancies from 12 centers were enrolled, and a total of 2,998 twin pregnancies were extracted. The rate of overall chorionicity determination, including antenatal and postpartum diagnosis, increased successively from 49.5% in 2014 to 93.5% in 2017 (P < 0.0001). The rate of ultrasonic chorionicity diagnosis before 14 weeks increased from 25.2% in 2014 to 65.0% in 2017 (P < 0.0001). These changes were associated with a decreasing incidence of preterm birth, a lower risk of stillbirth, whether for one (P = 0.0456 in 2016) or two fetuses (P = 0.0470 in 2016; P = 0.0042 in 2017), and a decreased rate of admission to the neonatal intensive care unit (43.0% in 2014, 37.4% in 2017; P = 0.0032). Conclusions: The implementation of a clinical practice guideline improved both overall and early chorionicity determinations. Regular antenatal care training workshops are recommended to further promote capability in clinical diagnosis and treatment.

    Synthesis and Characterization of Novel 2-Acyl-3-trifluoromethylquinoxaline 1,4-Dioxides as Potential Antimicrobial Agents

    No full text
    The emergence of drug resistance in pathogens leads to a loss of effectiveness of antimicrobials and complicates the treatment of bacterial infections. Quinoxaline 1,4-dioxides represent a prospective scaffold in the search for new compounds with improved chemotherapeutic characteristics. Novel 2-acyl-3-trifluoromethylquinoxaline 1,4-dioxides with altered substituents at positions 2 and 6 were synthesized via nucleophilic substitution with a piperazine moiety and evaluated against a broad panel of bacteria and fungi by measuring their minimal inhibitory concentrations. Their mode of action was assessed by whole-genome sequencing of spontaneous drug-resistant Mycobacterium smegmatis mutants, followed by comparative genomic analysis, and on an original pDualrep2 system. Most of the 2-acyl-3-trifluoromethylquinoxaline 1,4-dioxides showed high antibacterial activity against Gram-positive strains, including mycobacteria, and the introduction of a halogen atom at position 6 of the quinoxaline ring further increased their activity, with 13c being the most active compound. The mode of action studies confirmed the DNA-damaging nature of the obtained quinoxaline 1,4-dioxides, while drug resistance may be conferred by mutations in redox homeostasis genes encoding enzymes potentially involved in the activation of the compounds. This study extends the understanding of the antimicrobial and antifungal activities of the quinoxaline 1,4-dioxides and can potentially lead to the discovery of new antibacterial drugs.

    DataSheet_1_Comparative genome analysis reveals high-level drug resistance markers in a clinical isolate of Mycobacterium fortuitum subsp. fortuitum MF GZ001.zip

    No full text
    Introduction: Infections caused by non-tuberculosis mycobacteria are worsening significantly across the globe. M. fortuitum complex is a rapidly growing pathogenic species that is of clinical relevance to both humans and animals. This pathogen has the potential to create adverse effects on human healthcare. Methods: The MF GZ001 clinical strain was collected from the sputum of a 45-year-old male patient with a pulmonary infection. Morphological studies, comparative genomic analysis, and drug resistance profiling along with variant detection were performed in this study. In addition, comparative analysis of virulence genes helped us understand the pathogenicity of this organism. Results: Bacterial growth kinetics and morphology confirmed that MF GZ001 is a rapidly growing species with a rough morphotype. MF GZ001 has a genome of 6,413,573 bp with a high G+C content of 66.18%. MF GZ001 possesses a larger genome than other related mycobacteria and includes 6,156 protein-coding genes. Molecular phylogenetic tree, collinearity, and comparative genomic analysis suggested that MF GZ001 is a novel member of the M. fortuitum complex. We carried out the drug resistance profile analysis and found single nucleotide polymorphism (SNP) mutations in key drug resistance genes such as rpoB, katG, AAC(2')-Ib, gyrA, gyrB, embB, pncA, blaF, thyA, embC, embR, and iniA. In addition, the MF GZ001 strain contains mutations in iniA, iniC, pncA, and ribD, which conferred resistance to isoniazid, ethambutol, pyrazinamide, and para-aminosalicylic acid, respectively, and which are not frequently observed in rapidly growing mycobacteria. A wide variety of predicted putative virulence genes were found in MF GZ001, most of which are shared with well-recognized mycobacterial species with high pathogenic profiles such as M. tuberculosis and M. abscessus. Discussion: The novel features we identified in a pathogenic member of the M. fortuitum complex will provide the foundation for further investigation of mycobacterial pathogenicity and effective treatment.