134 research outputs found

    NExT-Chat: An LMM for Chat, Detection and Segmentation

    The development of large language models (LLMs) has greatly advanced the field of multimodal understanding, leading to the emergence of large multimodal models (LMMs). In order to enhance the level of visual comprehension, recent studies have equipped LMMs with region-level understanding capabilities by representing object bounding box coordinates as a series of text sequences (pix2seq). In this paper, we introduce a novel paradigm for object location modeling called the pix2emb method, where we ask the LMM to output the location embeddings and then decode them with different decoders. This paradigm allows us to use different location formats (such as bounding boxes and masks) in multimodal conversations. Leveraging the proposed pix2emb method, we train an LMM named NExT-Chat and demonstrate its capability of handling multiple tasks like visual grounding, region captioning, and grounded reasoning. Comprehensive experiments show the effectiveness of our NExT-Chat on various tasks, e.g., NExT-Chat (87.7) vs. Shikra (86.9) on POPE-Random, NExT-Chat (68.9) vs. LISA (67.9) on the referring expression segmentation task, and NExT-Chat (79.6) vs. Kosmos-2 (62.3) on the region caption task. The code and model are released at https://github.com/NExT-ChatV/NExT-Chat. Comment: Technical Report (https://next-chatv.github.io/

    Fine-Grained Scene Graph Generation with Data Transfer

    Scene graph generation (SGG) is designed to extract (subject, predicate, object) triplets in images. Recent works have made steady progress on SGG and provide useful tools for high-level vision and language understanding. However, due to data distribution problems, including long-tail distribution and semantic ambiguity, the predictions of current SGG models tend to collapse to several frequent but uninformative predicates (e.g., on, at), which limits practical application of these models in downstream tasks. To deal with the problems above, we propose a novel Internal and External Data Transfer (IETrans) method, which can be applied in a plug-and-play fashion and expanded to large SGG with 1,807 predicate classes. Our IETrans tries to relieve the data distribution problem by automatically creating an enhanced dataset that provides more sufficient and coherent annotations for all predicates. By training on the enhanced dataset, a Neural Motif model doubles the macro performance while maintaining competitive micro performance. The code and data are publicly available at https://github.com/waxnkw/IETrans-SGG.pytorch. Comment: ECCV 2022 (Oral

    Transfer Visual Prompt Generator across LLMs

    While developing a new vision-language LLM (VL-LLM) by pre-training on tremendous image-text pairs from scratch can be exceedingly resource-consuming, connecting an existing LLM with a comparatively lightweight visual prompt generator (VPG) becomes a feasible paradigm. However, further tuning the VPG part of the VL-LLM still suffers from indispensable computational costs, i.e., requiring thousands of GPU hours and millions of training data. One alternative solution is to transfer an existing VPG from any existing VL-LLMs for the target VL-LLM. In this work, we for the first time investigate the VPG transferability across LLMs, and explore a solution to reduce the cost of VPG transfer. We first study the VPG transfer across different LLM sizes (e.g., small-to-large), and across different LLM types, through which we diagnose the key factors to maximize the transfer efficiency. Based on our observation, we design a two-stage transfer framework named VPGTrans, which is simple yet highly effective. Through extensive experiments, we demonstrate that VPGTrans helps significantly speed up the transfer learning process without compromising performance. Remarkably, it helps achieve the VPG transfer from BLIP-2 OPT$_\text{2.7B}$ to BLIP-2 OPT$_\text{6.7B}$ with over 10 times speed-up and 10.7% training data compared with connecting a VPG to OPT$_\text{6.7B}$ from scratch. Further, a series of intriguing findings and potential rationales behind them are provided and discussed. Finally, we showcase the practical value of our VPGTrans approach, by customizing two novel VL-LLMs, including VL-LLaMA and VL-Vicuna, with recently released LLaMA and Vicuna LLMs. Comment: Project Website: https://vpgtrans.github.io Code: https://github.com/VPGTrans/VPGTran

    Visually Grounded Commonsense Knowledge Acquisition

    Large-scale commonsense knowledge bases empower a broad range of AI applications, where the automatic extraction of commonsense knowledge (CKE) is a fundamental and challenging problem. CKE from text is known for suffering from the inherent sparsity and reporting bias of commonsense in text. Visual perception, on the other hand, contains rich commonsense knowledge about real-world entities, e.g., (person, can_hold, bottle), which can serve as promising sources for acquiring grounded commonsense knowledge. In this work, we present CLEVER, which formulates CKE as a distantly supervised multi-instance learning problem, where models learn to summarize commonsense relations from a bag of images about an entity pair without any human annotation on image instances. To address the problem, CLEVER leverages vision-language pre-training models for deep understanding of each image in the bag, and selects informative instances from the bag to summarize commonsense entity relations via a novel contrastive attention mechanism. Comprehensive experimental results in held-out and human evaluation show that CLEVER can extract commonsense knowledge in promising quality, outperforming pre-trained language model-based methods by 3.9 AUC and 6.4 mAUC points. The predicted commonsense scores show strong correlation with human judgment with a 0.78 Spearman coefficient. Moreover, the extracted commonsense can also be grounded into images with reasonable interpretability. The data and codes can be obtained at https://github.com/thunlp/CLEVER. Comment: Accepted by AAAI 202

    ADC Histograms from Routine DWI for Longitudinal Studies in Cerebral Small Vessel Disease: A Field Study in CADASIL.

    Diffusion tensor imaging (DTI) histogram metrics are correlated with clinical parameters in cerebral small vessel diseases (cSVD). Whether ADC histogram parameters derived from simple diffusion weighted imaging (DWI) can provide relevant markers for long term studies of cSVD remains unknown. CADASIL patients were evaluated by DWI and DTI in a large cohort study over a 6-year period. ADC histogram parameters were compared to those derived from mean diffusivity (MD) histograms in 280 patients using intra-class correlation and Bland-Altman plots. Impact of image corrections applied to ADC maps was assessed and a mixed effect model was used for analyzing the effects of scanner upgrades. The results showed that ADC histogram parameters are strongly correlated to MD histogram parameters and that image corrections have only limited influence on these results. Unexpectedly, scanner upgrades were found to have major effects on diffusion measures with DWI or DTI that can be even larger than those related to patients' characteristics. These data support that ADC histograms from daily used DWI can provide relevant parameters for assessing cSVD, but the variability related to scanner upgrades as regularly performed in clinical centers should be determined precisely for longitudinal and multicentric studies using diffusion MRI in cSVD.

    IL-12 RB1 Genetic Variants Contribute to Human Susceptibility to Severe Acute Respiratory Syndrome Infection among Chinese

    BACKGROUND: Cytokines play important roles in antiviral action. We examined whether polymorphisms of interleukin (IL)-12 receptor B1 (IL-12RB1) affect the susceptibility to and outcome of severe acute respiratory syndrome (SARS). METHODS: A case-control study was carried out in Chinese SARS patients and healthy controls. The genotypes of 4 SNPs on the IL-12 RB1 gene, +705 A/G, +1158 T/C, +1196 G/C and +1664 C/T, were determined by PCR-RFLP. Haplotypes were estimated from the genotype data using the expectation-maximisation algorithm. RESULTS: Comparison between patients and close contacts showed that individuals with the +1664 C/T (CT and TT) genotype had a 2.09-fold (95% confidence interval [CI], 1.90-7.16) and 2.34-fold (95% CI, 1.79-13.37) increased risk of developing SARS, respectively. For any of the other three polymorphisms, however, no significant difference could be detected in allele or genotype frequencies between patients and controls. Additionally, estimation of the frequencies of multiple-locus haplotypes revealed potential risk haplotypes (GCCT) for SARS infection. CONCLUSIONS: Our data indicate that genetic variants of IL12RB1 confer genetic susceptibility to SARS infection, but are not necessarily associated with the progression of the disease in the Chinese population.

    Redundant Mechanisms Prevent Mitotic Entry Following Replication Arrest in the Absence of Cdc25 Hyper-Phosphorylation in Fission Yeast

    Following replication arrest, the Cdc25 phosphatase is phosphorylated and inhibited by Cds1. It has previously been reported that expressing Cdc25 where 9 putative amino-terminal Cds1 phosphorylation sites have been substituted with alanine results in bypass of the DNA replication checkpoint. However, these results were acquired by expression of the phosphorylation mutant using a multicopy expression vector in a genetic background where the DNA replication checkpoint is intact. In order to clarify these results, we constructed a Cdc25(9A)-GFP native promoter integrant and examined its effect on the replication checkpoint at endogenous expression levels. In this strain the replication checkpoint operates normally, conditional on the presence of the Mik1 kinase. In response to replication arrest the Cdc25(9A)-GFP protein is degraded, suggesting the presence of a backup mechanism to eliminate the phosphatase when it cannot be inhibited through phosphorylation.

    Prevalence, Distribution and Functional Significance of the −237C to T Polymorphism in the IL-12Rβ2 Promoter in Indian Tuberculosis Patients

    Cytokine/cytokine receptor gene polymorphisms related to structure/expression could impact immune response. Hence, the −237 polymorphic site in the 5′ promoter region of the IL-12Rβ2 (SNP ID: rs11810249) gene, associated with the AP-4 transcription motif GAGCTG, was examined. Amplicons encompassing the polymorphism were generated from 46 pulmonary tuberculosis patients, 35 family contacts and 28 miscellaneous volunteers and sequenced. The C allele predominated among patients (93.4%, 43/46) and in all volunteers and contacts screened, but the T allele was exclusively limited to patients (6.5%, 3/46). The functional impact of this polymorphism on transcriptional activity was assessed by Luciferase-reporter and electrophoretic mobility shift assays (EMSA). Luciferase-reporter assays showed a significant reduction in transcriptional efficiency with the T compared to the C allele. The reductions in transcriptional efficiency with the T allele construct (pGIL-12Rb2-T) in U-87MG, THP-1 and Jurkat cell lines were 53, 37.6, and 49.8% respectively, compared to the C allele construct (pGIL-12Rb2-C). Similarly, densitometric analysis of the EMSA assay showed reduced binding of the AP-4 transcription factor to the T compared to the C nucleotide probe. Reduced mRNA expression in all patients (3/3) harboring the T allele was seen, whereas individuals with the C allele exhibited high mRNA expression (17/25; 68%, p = 0.05). These observations were in agreement with the in vitro assessment of the promoter activity by Luciferase-reporter and EMSA assays. The reduced expression of IL-12Rβ2 transcripts in 8 patients despite having the C allele was attributed to the predominant overexpression of the suppressors (IL-4 and GATA-3) and reduced expression of enhancers (IFN-α) of IL-12Rβ2 transcripts. The 17 high IL-12Rβ2 mRNA expressers had significantly elevated IFN-α mRNA levels compared to low expressers and volunteers. Notwithstanding the presence of high levels of IL-12Rβ2 mRNA in these patients, elevated IFN-α expression could modulate their immune responses to Mycobacterium tuberculosis.

    The significance of the complement system for the pathogenesis of age-related macular degeneration – current evidence and translation into clinical application

    BACKGROUND: Dysregulation of the complement system has been shown to play a major role in the pathogenesis of age-related macular degeneration (AMD). METHODS: The current evidence from human studies derives from immunohistochemical and proteomic studies in donor eyes, genetic association studies, and studies of blood complement protein levels. These lines of evidence are corroborated by in vitro and animal studies. RESULTS: In AMD donor eyes, detection of complement proteins in drusen suggested local inflammatory processes involving the complement system. Moreover, higher levels of complement proteins in the Bruch's membrane/choroid complex could be detected in AMD donor eyes compared to controls. A large number of independent genetic studies have consistently confirmed the association of AMD with risk or protective variants in genes coding for complement proteins, including complement factor H (CFH), CFH-related proteins 1 and 3, factor B/C2, C3 and factor I. Another set of independent studies detected increased levels of complement activation products in plasma of AMD patients, suggesting that AMD may be a systemic disease and the macula a vulnerable anatomic site of minimal resistance to complement activation. Genotype-phenotype correlations, including the impact of genetic variants on disease progression, gene-environment and pharmacogenetic interactions, have been investigated. There is evidence that complement gene variants may be associated with the progression from early to late forms of AMD, whereas they do not appear to play a significant role when late atrophic AMD has already developed. There are indications for an interaction between genetic variants and supplementation and dietary factors. Also, there is some evidence that variants in the CFH gene influence treatment effects in patients with neovascular AMD. CONCLUSIONS: Such data suggest that the complement system may have a significant role for developing new prophylactic and therapeutic interventions in AMD. In fact, several compounds acting on the complement pathway are currently in clinical trials. Therapeutics that modulate the complement system need to balance inhibition with preservation of sufficient functional activity in order to maintain adequate immune responses and tissue homeostasis. Specifically, targeting the dysfunction appears more adequate than a global suppression of complement activation in chronic diseases such as AMD.