
    DiffDis: Empowering Generative Diffusion Model with Cross-Modal Discrimination Capability

    Recently, large-scale diffusion models, e.g., Stable Diffusion and DALL-E 2, have shown remarkable results on image synthesis. On the other hand, large-scale cross-modal pre-trained models (e.g., CLIP, ALIGN, and FILIP) are competent for various downstream tasks by learning to align vision and language embeddings. In this paper, we explore the possibility of jointly modeling generation and discrimination. Specifically, we propose DiffDis to unify cross-modal generative and discriminative pre-training into a single framework under the diffusion process. DiffDis first formulates the image-text discriminative problem as a generative diffusion process of the text embedding from the text encoder, conditioned on the image. Then, we propose a novel dual-stream network architecture, which fuses the noisy text embedding with the knowledge of latent images at different scales for image-text discriminative learning. Moreover, the generative and discriminative tasks can efficiently share the image-branch network structure in the multi-modality model. Benefiting from diffusion-based unified training, DiffDis achieves both better generation ability and cross-modal semantic alignment in one architecture. Experimental results show that DiffDis outperforms single-task models on both image generation and image-text discriminative tasks, e.g., a 1.65% improvement in average accuracy on zero-shot classification over 12 datasets and a 2.42 improvement in FID for zero-shot image synthesis. Comment: ICCV202
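
    The abstract's core idea, casting image-text discrimination as a conditional diffusion process over text embeddings, can be illustrated with a minimal sketch. The module names, dimensions, and the simple noise-prediction scoring rule below are assumptions for illustration, not the paper's actual architecture.

        # Minimal sketch (PyTorch assumed): score image-text alignment by how well a
        # denoiser conditioned on the image predicts the noise added to the text embedding.
        import torch
        import torch.nn as nn

        class TextEmbeddingDenoiser(nn.Module):
            """Hypothetical denoiser: predicts the noise in a noisy text embedding,
            conditioned on image features (stands in for the paper's dual-stream network)."""
            def __init__(self, dim=512):
                super().__init__()
                self.net = nn.Sequential(nn.Linear(dim * 2 + 1, 1024), nn.SiLU(), nn.Linear(1024, dim))

            def forward(self, noisy_text, image_feat, t):
                t_embed = t.float().unsqueeze(-1) / 1000.0          # crude timestep encoding
                return self.net(torch.cat([noisy_text, image_feat, t_embed], dim=-1))

        def alignment_score(denoiser, text_emb, image_feat, t, alphas_cumprod):
            """Lower noise-prediction error => better image-text alignment (illustrative criterion)."""
            noise = torch.randn_like(text_emb)
            a = alphas_cumprod[t].unsqueeze(-1)
            noisy_text = a.sqrt() * text_emb + (1 - a).sqrt() * noise  # forward diffusion step
            pred = denoiser(noisy_text, image_feat, t)
            return -(pred - noise).pow(2).mean(dim=-1)                 # negative MSE as score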

    P³OVD: Fine-grained Visual-Text Prompt-Driven Self-Training for Open-Vocabulary Object Detection

    Inspired by the success of vision-language models (VLMs) in zero-shot classification, recent works attempt to extend this line of work to object detection by leveraging the localization ability of pre-trained VLMs and generating pseudo labels for unseen classes in a self-training manner. However, since current VLMs are usually pre-trained by aligning sentence embeddings with global image embeddings, using them directly lacks the fine-grained alignment for object instances that lies at the core of detection. In this paper, we propose a simple but effective Pretrain-adaPt-Pseudo labeling paradigm for Open-Vocabulary Detection (P³OVD) that introduces a fine-grained visual-text prompt adapting stage to enhance the current self-training paradigm with more powerful fine-grained alignment. During the adapting stage, we enable the VLM to obtain fine-grained alignment by using learnable text prompts to solve an auxiliary dense pixel-wise prediction task. Furthermore, we propose a visual prompt module to provide prior task information (i.e., the categories that need to be predicted) to the vision branch, so that the pre-trained VLM better adapts to the downstream task. Experiments show that our method achieves state-of-the-art performance for open-vocabulary object detection, e.g., 31.5% mAP on unseen classes of COCO.
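
    A hedged sketch of the fine-grained alignment idea: dense image features are compared pixel-wise against per-category text (prompt) embeddings to produce a dense prediction or pseudo-label map. The feature shapes, the argmax rule, and the confidence threshold are assumptions, not the paper's exact procedure.

        # Illustrative sketch: pixel-wise image-text similarity for pseudo-labeling (assumed shapes).
        import torch
        import torch.nn.functional as F

        def dense_pseudo_labels(dense_image_feats, class_text_embs, threshold=0.3):
            """dense_image_feats: (B, C, H, W) from the vision branch.
            class_text_embs: (K, C) embeddings of (possibly learnable) text prompts, one per category.
            Returns per-pixel class indices, with low-confidence pixels marked as background (-1)."""
            B, C, H, W = dense_image_feats.shape
            img = F.normalize(dense_image_feats.flatten(2), dim=1)       # (B, C, H*W)
            txt = F.normalize(class_text_embs, dim=1)                    # (K, C)
            sim = torch.einsum('kc,bcn->bkn', txt, img)                  # (B, K, H*W) cosine similarities
            score, label = sim.max(dim=1)                                # best class per pixel
            label[score < threshold] = -1                                # hypothetical confidence filter
            return label.view(B, H, W)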

    Boosting Visual-Language Models by Exploiting Hard Samples

    Contrastive Language-Image Pre-training (CLIP) has become the standard for learning cross-modal representations between images and text. Efforts to improve its capabilities typically demand the collection of additional data and retraining with new loss functions. While effective, the added requirements limit their practical use due to the increased resource and time investments needed. In this work, we present HELIP, a cost-effective strategy tailored to enhance the performance of existing CLIP models without training a model from scratch or collecting additional data. Our method allows for effortless integration into existing models' training pipelines, providing an instant boost by training them with selected challenging text-image pairs from their original training datasets. HELIP treats each text-image pair as a single point in the joint vision-language space, identifying those in close proximity as hard pairs. By incorporating the challenging data, pre-trained CLIP models are refined using both the traditional contrastive loss and the newly introduced hard negative margin loss, ensuring the challenging data is fully utilized. On comprehensive benchmarks, HELIP consistently boosts existing models to leading performance. In particular, it improves the zero-shot classification accuracy on ImageNet for SLIP models pre-trained on the CC3M, CC12M, and YFCC15M datasets by 3.05%, 4.47%, and 10.1%, respectively, within two epochs of training. In addition, across fine-grained classification datasets, HELIP improves the zero-shot performance of pre-trained CLIP and SLIP by an average of 8.4% and 18.6%, and their linear probe performance by an average of 9.5% and 3.0%. Comment: The code is publicly available at https://github.com/haonan3/HELI
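
    The hard-pair selection and hard negative margin loss described above can be sketched roughly as follows; the neighbor count, margin value, and the simple concatenation used to place a pair in the joint space are assumptions rather than HELIP's exact definitions.

        # Rough sketch: mine "hard pairs" as nearest neighbors in a joint image-text space,
        # then apply a margin loss against them (assumed formulation, not HELIP's exact loss).
        import torch
        import torch.nn.functional as F

        def mine_hard_pairs(image_embs, text_embs, k=5):
            """Each pair is represented by the concatenation of its (normalized) image and text
            embeddings; its k nearest other pairs are treated as hard pairs."""
            joint = torch.cat([F.normalize(image_embs, dim=-1), F.normalize(text_embs, dim=-1)], dim=-1)
            sim = joint @ joint.t()
            sim.fill_diagonal_(-float('inf'))                      # exclude the pair itself
            return sim.topk(k, dim=-1).indices                     # (N, k) indices of hard pairs

        def hard_negative_margin_loss(image_embs, text_embs, hard_idx, margin=0.2):
            """Push each image away from the texts of its hard pairs by at least `margin`
            relative to its own (positive) text."""
            img = F.normalize(image_embs, dim=-1)
            txt = F.normalize(text_embs, dim=-1)
            pos = (img * txt).sum(dim=-1, keepdim=True)            # (N, 1) positive similarities
            neg = (img.unsqueeze(1) * txt[hard_idx]).sum(dim=-1)   # (N, k) hard-negative similarities
            return F.relu(margin + neg - pos).mean()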

    Exploration of the Minimum Necessary FVIII Level at Different Physical Activity Levels in Pediatric Patients with Hemophilia A

    BACKGROUND: Physical activity can increase joint stability and reduce the risk of injury in hemophilia patients. There are limited clinical data on target trough FVIII levels during physical activity in hemophilia A patients. Hence, this study aimed to explore the target trough FVIII level required to avoid bleeding during different physical activities in hemophilia A patients. METHODS: Patients with severe or moderate hemophilia A who underwent pharmacokinetic (PK) testing at our center were enrolled in this study. Physical activities and clinical information such as bleeding were recorded. The FVIII level during physical activity was calculated with WAPPS-Hemo. RESULTS: A total of 105 patients were enrolled in this study. A total of 373 physical activities were recorded, of which 57.6% (215/373) were low-risk activities and the remaining 42.4% (158/373) were medium-risk activities. The most common physical activities were bicycling (59.0%), swimming (43.8%), running (48.6%), and jumping rope (41.0%). The FVIII trough level for low-risk physical activity was 3.8 IU/dl (AUC = 0.781, …). CONCLUSION: The minimum necessary FVIII level increased with higher-risk physical activity, irrespective of arthropathy.
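
    The trough FVIII level at the time of an activity is, in essence, a pharmacokinetic decay estimate. The one-compartment model, half-life, and recovery figures below are illustrative assumptions only; the study itself used the WAPPS-Hemo service for these calculations.

        # Illustrative one-compartment estimate of FVIII activity at a given time post-infusion.
        # The 12-hour half-life and the dose/recovery figures are placeholder assumptions,
        # not patient-specific WAPPS-Hemo results.
        import math

        def fviii_level(dose_iu_per_kg, hours_since_infusion, half_life_h=12.0,
                        recovery_iu_dl_per_iu_kg=2.0, baseline_iu_dl=0.0):
            """Return estimated FVIII activity (IU/dl): peak = dose * incremental recovery,
            then first-order exponential decay toward the patient's baseline."""
            peak = dose_iu_per_kg * recovery_iu_dl_per_iu_kg
            decay = math.exp(-math.log(2) * hours_since_infusion / half_life_h)
            return baseline_iu_dl + peak * decay

        # Example: 25 IU/kg infused 48 h before a low-risk activity.
        print(round(fviii_level(25, 48), 1))   # ~3.1 IU/dl, near the 3.8 IU/dl threshold reported above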

    GrowCLIP: Data-aware Automatic Model Growing for Large-scale Contrastive Language-Image Pre-training

    Cross-modal pre-training has shown impressive performance on a wide range of downstream tasks, benefiting from massive image-text pairs collected from the Internet. In practice, online data grow constantly, highlighting the importance of a pre-trained model's ability to learn from continuously growing data. Existing works on cross-modal pre-training mainly focus on training a network with a fixed architecture. However, it is impractical to limit the model capacity when considering the continuously growing nature of pre-training data in real-world applications. On the other hand, it is important to utilize the knowledge in the current model to obtain efficient training and better performance. To address the above issues, in this paper we propose GrowCLIP, a data-driven automatic model growing algorithm for contrastive language-image pre-training with continuous image-text pairs as input. Specifically, we adopt a dynamic growth space and seek out the optimal architecture at each growth step to adapt to online learning scenarios. A shared encoder is proposed in our growth space to enhance the degree of cross-modal fusion. Besides, we explore the effect of growth along different dimensions, which could provide future references for the design of cross-modal model architectures. Finally, we employ parameter inheriting with momentum (PIM) to maintain the previous knowledge and address the local-minimum dilemma. Compared with existing methods, GrowCLIP improves average top-1 accuracy by 2.3% on zero-shot image classification over 9 downstream tasks. As for zero-shot image retrieval, GrowCLIP improves top-1 image-to-text recall by 1.2% on the Flickr30K dataset. Comment: Accepted by ICCV202
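
    The idea of parameter inheriting with momentum (PIM), reusing the previous model's weights when the architecture grows, can be sketched as below. The width-expansion scheme and the momentum blend are assumptions for illustration, not GrowCLIP's exact rule.

        # Sketch of parameter inheritance when growing a linear layer's width (assumed scheme):
        # copy old weights into the enlarged weight matrix and blend with a momentum term.
        import torch
        import torch.nn as nn

        def grow_linear(old_layer, new_out, new_in, momentum=0.9):
            """Create a larger nn.Linear whose top-left block inherits the old weights,
            blended with the fresh random initialization via a momentum coefficient."""
            new_layer = nn.Linear(new_in, new_out)
            o, i = old_layer.weight.shape
            with torch.no_grad():
                new_layer.weight[:o, :i] = momentum * old_layer.weight + (1 - momentum) * new_layer.weight[:o, :i]
                new_layer.bias[:o] = momentum * old_layer.bias + (1 - momentum) * new_layer.bias[:o]
            return new_layer

        # Example: grow a 512->512 projection to 768->768 while keeping prior knowledge.
        grown = grow_linear(nn.Linear(512, 512), new_out=768, new_in=768)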

    Low-Dose Immune Tolerance Induction in Children With Severe Hemophilia A With High-Titer Inhibitors: Type of Factor 8 Mutation and Outcomes

    BACKGROUND: No studies have evaluated the role of … OBJECTIVES: To explore the association between … METHODS: Children with severe hemophilia A (SHA) and high-titer inhibitors who received low-dose ITI therapy for at least 1 year were included in this study. Based on the risk of inhibitor development, … RESULTS: Of the 104 children included, 101 had … CONCLUSIONS: Types of …

    The Electrical Properties and Microstructure of an Ion Beam Mixed Metal-Polymer System

    Ion beam mixing of polymer surfaces with metal films is performed by ion implanting through a thin metal film. The metal layers are evaporated onto the polymer substrate surface before ion implantation. A 50 keV nitrogen ion beam with a dose on the order of 10¹⁶ ions/cm² and beam currents of 100 and 200 µA mixes the metal and polymer at the interface. Polytetrafluoroethylene (PTFE) samples were mixed with Cr metal layers. The optical properties indicate the surface and near-surface structure. SEM images show a smooth surface after implantation as well as cracks in the metal layer; thermal damage results in gas evolution during implantation. The resistivity is reduced relative to that of ion-implanted polymers without a metal layer. The DC conductivity increases with increasing temperature, and the temperature coefficient of resistance decreases with increasing metal layer thickness. Coulomb-gap and thermal hopping theories are used to fit the experimental data. The results show that samples with a thin Cr layer behave more like Coulomb-gap conducting materials, whereas samples with a thick Cr layer fall more into the hopping conduction regime. A possible model of the ion-implanted metal/polymer system is presented at the end of this thesis. Varying the metal layer thickness results in different microstructures, and the mixing conditions and metal layer thickness directly affect the conduction mechanism.
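
    To make the model comparison concrete: Mott variable-range hopping predicts resistivity of the form rho(T) = rho0 * exp[(T0/T)^(1/4)], whereas the Efros-Shklovskii Coulomb-gap form uses an exponent of 1/2. A hedged fitting sketch is given below; the synthetic data and parameter values are placeholders, not the thesis measurements.

        # Sketch: fit resistivity-vs-temperature data to hopping (T^-1/4) and Coulomb-gap (T^-1/2)
        # laws and compare. Synthetic placeholder data; real samples come from the thesis measurements.
        import numpy as np
        from scipy.optimize import curve_fit

        def hopping(T, rho0, T0):        # Mott variable-range hopping (3D)
            return rho0 * np.exp((T0 / T) ** 0.25)

        def coulomb_gap(T, rho0, T0):    # Efros-Shklovskii Coulomb-gap form
            return rho0 * np.exp((T0 / T) ** 0.5)

        T = np.linspace(80, 300, 30)                               # placeholder temperature range (K)
        rho = coulomb_gap(T, 1e-2, 400) * np.random.normal(1.0, 0.02, T.size)

        for name, model in [("hopping", hopping), ("coulomb_gap", coulomb_gap)]:
            p, _ = curve_fit(model, T, rho, p0=(1e-2, 100), maxfev=10000)
            rss = np.sum((model(T, *p) - rho) ** 2)                # lower residual => better-matching regime
            print(f"{name}: rho0={p[0]:.3g}, T0={p[1]:.3g}, RSS={rss:.3g}")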

    Comparison of the Uptake of Polycyclic Aromatic Hydrocarbons and Organochlorine Pesticides by Semipermeable Membrane Devices and Caged Fish (Carassius carassius) in Taihu Lake, China

    Uptake of polycyclic aromatic hydrocarbons (PAHs) and organochlorine pesticides (OCPs) by triolein-containing semipermeable membrane devices (SPMDs) and by crucian carp (Carassius carassius) was studied in Taihu Lake, a shallow freshwater lake in China. Crucian carp and SPMDs were deployed side by side for 32 d. The first-order uptake rate constants of individual PAHs and OCPs for the two matrices were calculated and compared to relate the amounts of chemicals accumulated by the matrices to dissolved water concentrations. On a wet-weight basis, total concentrations of PAHs and OCPs in crucian carp fillets averaged 49.5 and 13.6 ng/g, respectively, after the 32-d exposure, whereas concentrations in whole SPMDs averaged 716.9 and 62.3 ng/g, respectively. The uptake rate constants of PAHs and OCPs by SPMDs averaged seven- and fivefold higher, respectively, than those for crucian carp; however, the patterns of uptake rate constants derived from test chemical concentrations in the crucian carp and SPMDs were similar. Although equilibrium was not reached for some PAHs and OCPs during the 32-d exposure period, a reasonably good correlation between the concentration factors (CFs) and octanol/water partition coefficient (K_OW) values of PAHs and OCPs in SPMDs (r = 0.86, p < 0.001) was observed when potential sorption to dissolved organic carbon was taken into account. Similar efforts to correlate the CFs and K_OW values of PAHs and OCPs in crucian carp (r = 0.75, p < 0.001) were less successful, likely because of PAH metabolism by finfish. Overall, the present results suggest that SPMDs may serve as a surrogate for contaminant monitoring with fish in freshwater lake environments.
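
    Under a linear (integrative) uptake assumption, a first-order uptake rate constant can be estimated from the accumulated concentration, the dissolved water concentration, and the exposure time, and the CF-versus-K_OW relationship examined on a log scale. The sketch below is illustrative; the numbers are placeholders, not the study's data.

        # Illustrative sketch: estimate first-order uptake rate constants under linear uptake
        # (C_matrix ~ C_water * k_u * t) and correlate log concentration factors with log K_OW.
        # All numbers are placeholders, not measurements from the Taihu Lake study.
        import numpy as np

        def uptake_rate_constant(c_matrix_ng_g, c_water_ng_ml, days):
            """k_u in ml/(g*d), assuming uptake stayed in the linear phase over the exposure."""
            return c_matrix_ng_g / (c_water_ng_ml * days)

        c_spmd = np.array([120.0, 85.0, 40.0])        # ng/g in SPMDs after 32 d (placeholder)
        c_water = np.array([0.8, 0.5, 0.2])           # dissolved ng/ml (placeholder)
        log_kow = np.array([4.5, 5.2, 6.1])           # literature log K_OW values (placeholder)

        k_u = uptake_rate_constant(c_spmd, c_water, days=32)
        log_cf = np.log10(c_spmd / c_water)           # concentration factor on a log scale
        r = np.corrcoef(log_cf, log_kow)[0, 1]        # analogous to the reported r = 0.86 correlation
        print(k_u, round(r, 2))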

    Inbreeding in Chinese Fir: Insight into the Rare Self-Fertilizing Event from a Genetic View

    Chinese fir (Cunninghamia lanceolata (Lamb.) Hook.) is a fast-growing conifer of great afforestation value that prefers outcrossing and shows a strong inbreeding depression effect. Previously, we identified a special Chinese fir parent clone, named 'cx569', that lacks early inbreeding depression. Given that very little has been published about this rare self-fertilizing event in Chinese fir from a genetic view, we conduct an SSR-based study of the variation in open- and self-pollinated offspring of this parent. The results indicated that the genetic diversity of self-pollinated offspring was significantly reduced, by roughly half (Ho: 0.302 vs. 0.595, p = 0.001; He: 0.274 vs. 0.512, p = 0.002), compared to the open-pollinated set. Self-pollinated offspring also had significantly positive FIS values (FIS = 0.057, p = 0.034) and a much higher proportion of common alleles (20.59% vs. 0), reflecting their heterozygote deficiency. Clustering analysis further indicated a separation of the self- and open-pollinated groups, implying a natural preference for outcrossing in cx569. However, cx569 still showed 6% acceptance of selfing, and when its own pollen was fully accepted, it produced a genetically unique selfing group. Additionally, this selfing group seemed to be consistently homozygous at seven particular loci. These findings provide further genetic clues for understanding the rare self-fertilizing event in this conifer (Chinese fir).
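
    The diversity statistics cited above (Ho, He, and FIS) follow standard population-genetics definitions; a minimal sketch of how they could be computed from SSR genotypes is given below. The genotype coding and sample data are illustrative assumptions, not the study's dataset.

        # Minimal sketch: observed heterozygosity (Ho), expected heterozygosity (He) and the
        # inbreeding coefficient FIS = 1 - Ho/He for one SSR locus. Genotypes are illustrative.
        from collections import Counter

        def locus_stats(genotypes):
            """genotypes: list of (allele1, allele2) tuples for one locus across individuals."""
            n = len(genotypes)
            ho = sum(a != b for a, b in genotypes) / n                  # fraction of heterozygotes
            counts = Counter(allele for g in genotypes for allele in g)
            freqs = [c / (2 * n) for c in counts.values()]
            he = 1 - sum(p * p for p in freqs)                          # expected (Nei) heterozygosity
            fis = 1 - ho / he if he > 0 else 0.0                        # positive FIS => heterozygote deficit
            return ho, he, fis

        # Example with hypothetical self-pollinated offspring (note the excess of homozygotes):
        print(locus_stats([(1, 1), (1, 1), (1, 2), (2, 2), (2, 2), (1, 2)]))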