12 research outputs found

    A Challenger to GPT-4V? Early Explorations of Gemini in Visual Expertise

    The surge of interest in Multi-modal Large Language Models (MLLMs), e.g., GPT-4V(ision) from OpenAI, has marked a significant trend in both academia and industry. They endow Large Language Models (LLMs) with powerful capabilities in visual understanding, enabling them to tackle diverse multi-modal tasks. Very recently, Google released Gemini, its newest and most capable MLLM built from the ground up for multi-modality. In light of the superior reasoning capabilities, can Gemini challenge GPT-4V's leading position in multi-modal learning? In this paper, we present a preliminary exploration of Gemini Pro's visual understanding proficiency, which comprehensively covers four domains: fundamental perception, advanced cognition, challenging vision tasks, and various expert capacities. We compare Gemini Pro with the state-of-the-art GPT-4V to evaluate its upper limits, along with the latest open-sourced MLLM, Sphinx, which reveals the gap between manual efforts and black-box systems. The qualitative samples indicate that, while GPT-4V and Gemini showcase different answering styles and preferences, they can exhibit comparable visual reasoning capabilities, and Sphinx still trails behind them concerning domain generalizability. Specifically, GPT-4V tends to elaborate detailed explanations and intermediate steps, and Gemini prefers to output a direct and concise answer. The quantitative evaluation on the popular MME benchmark also demonstrates the potential of Gemini to be a strong challenger to GPT-4V. Our early investigation of Gemini also observes some common issues of MLLMs, indicating that there still remains a considerable distance towards artificial general intelligence. Our project for tracking the progress of MLLM is released at https://github.com/BradyFU/Awesome-Multimodal-Large-Language-Models. Comment: Total 120 pages.

    Fourth-order coherence-function theory of laser-induced molecular reorientational grating and population grating

    We have employed fourth-order coherence-function theory to study the influence of the partial-coherence properties of pump beams on laser-induced gratings. First, we examine the formation of the molecular reorientational grating. The different roles of phase fluctuation and amplitude fluctuation are pointed out. A time-delayed method is proposed to distinguish the molecular reorientational grating from the thermal grating. We then apply the fourth-order theory to study the Bragg reflection from a population grating. We obtain an analytic solution which enables us to investigate in detail the temporal behaviour of the Bragg reflection signal. This study is especially helpful for elucidating the generation mechanism of the population grating.

    Population-specific GSTM1 copy number variation

    As one of the major glutathione conjugation enzymes, GSTM1 detoxifies a number of drugs and xenobiotics. Its expression and activity have been shown to correlate with both cancer risk and drug resistance. Through a genome-wide association study, we identified a significant association between HapMap SNP rs366631 and GSTM1 expression. In this study, utilizing lymphoblastoid cell lines derived from International HapMap Consortium CEU and YRI populations, we designed and performed site-specific genotyping assays for both rs366631 and a highly homologous GSTM1 upstream site. Copy number variation (CNV) assays were performed for three different regions of the GSTM1 gene. We demonstrated that HapMap SNP rs366631 is a non-polymorphic site. The false genotyping call arises from sequence homology, a common GSTM1 region deletion and a non-specific genotyping platform used to identify the SNP. However, the HapMap call for rs366631 genotype is an indicator of GSTM1 upstream region deletion. Furthermore, this upstream deletion can be used as a marker of GSTM1 gene deletion. Using a novel GSTM1 CNV assay, we showed a population-specific CNV in this region upstream of the gene. More than 75% of the Caucasian (CEU) samples exhibit GSTM1 deletion and none contain two copies of GSTM1. In contrast, up to 25% of African (YRI) samples were found to have two copies of GSTM1. In conclusion, HapMap rs366631 is a pseudo-SNP that can be used as a GSTM1 deletion marker. Both the pseudo-SNP allele frequency and GSTM1 upstream region CNV show population-specific patterns between CEU and YRI samples.