58 research outputs found

    MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models

    A Multimodal Large Language Model (MLLM) relies on a powerful LLM to perform multimodal tasks and has shown amazing emergent abilities in recent studies, such as writing poems based on an image. However, such case studies can hardly reflect the full performance of MLLMs, and a comprehensive evaluation has been lacking. In this paper, we fill this gap by presenting MME, the first MLLM evaluation benchmark. It measures both perception and cognition abilities on a total of 14 subtasks. To avoid the data leakage that may arise from directly using public datasets for evaluation, all instruction-answer pair annotations are manually designed. The concise instruction design allows us to compare MLLMs fairly, rather than struggling with prompt engineering. Such instructions also make it easy to carry out quantitative statistics. A total of 10 advanced MLLMs are comprehensively evaluated on MME; the results not only suggest that existing MLLMs still have substantial room for improvement, but also reveal potential directions for subsequent model optimization.
    Comment: https://github.com/BradyFU/Awesome-Multimodal-Large-Language-Model
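    The abstract notes that concise instructions make quantitative statistics easy. Below is a minimal Python sketch of how such scoring could work. It assumes MME's yes/no answer format, with two questions per image, and the accuracy / accuracy+ metrics as we understand them from the MME paper. The first-word parsing rule, the function names and the sample records are hypothetical.

```python
from collections import defaultdict


def parse_answer(response: str) -> str:
    """Map a free-form reply to 'yes', 'no' or 'other'.
    Hypothetical rule: only the first word of the reply counts."""
    words = response.strip().lower().split()
    first = words[0].strip(".,!") if words else ""
    return first if first in ("yes", "no") else "other"


def subtask_score(records):
    """records: iterable of (image_id, ground_truth, model_response).
    Returns (accuracy, accuracy_plus, score), all in percent:
      accuracy  = share of questions answered correctly
      accuracy+ = share of images whose questions are ALL answered correctly
      score     = accuracy + accuracy+ (so at most 200 per subtask)"""
    per_image = defaultdict(list)
    for image_id, truth, response in records:
        per_image[image_id].append(parse_answer(response) == truth)
    n_questions = sum(len(v) for v in per_image.values())
    n_correct = sum(sum(v) for v in per_image.values())
    accuracy = 100.0 * n_correct / n_questions
    accuracy_plus = 100.0 * sum(all(v) for v in per_image.values()) / len(per_image)
    return accuracy, accuracy_plus, accuracy + accuracy_plus


# Hypothetical replies: the last one does not start with yes/no, so it counts as wrong.
records = [
    ("img1", "yes", "Yes, there is a dog."),
    ("img1", "no", "No."),
    ("img2", "yes", "Yes"),
    ("img2", "no", "I think so."),
]
print(subtask_score(records))  # (75.0, 50.0, 125.0)
```

    Accuracy+ is the stricter metric. A model that answers "yes" to everything gets 50% accuracy on balanced yes/no pairs but 0% accuracy+, so the combined score penalizes that kind of answer bias.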

    A Challenger to GPT-4V? Early Explorations of Gemini in Visual Expertise

    The surge of interest in Multi-modal Large Language Models (MLLMs), e.g., GPT-4V(ision) from OpenAI, has marked a significant trend in both academia and industry. MLLMs endow Large Language Models (LLMs) with powerful visual understanding capabilities, enabling them to tackle diverse multi-modal tasks. Very recently, Google released Gemini, its newest and most capable MLLM built from the ground up for multi-modality. Given its superior reasoning capabilities, can Gemini challenge GPT-4V's leading position in multi-modal learning? In this paper, we present a preliminary exploration of Gemini Pro's visual understanding proficiency, comprehensively covering four domains: fundamental perception, advanced cognition, challenging vision tasks, and various expert capacities. We compare Gemini Pro with the state-of-the-art GPT-4V to evaluate its upper limits, and with the latest open-source MLLM, Sphinx, to reveal the gap between manual efforts and black-box systems. The qualitative samples indicate that, while GPT-4V and Gemini showcase different answering styles and preferences, they can exhibit comparable visual reasoning capabilities, and Sphinx still trails behind them in domain generalizability. Specifically, GPT-4V tends to give detailed explanations and intermediate steps, whereas Gemini prefers to output a direct and concise answer. The quantitative evaluation on the popular MME benchmark also demonstrates Gemini's potential to be a strong challenger to GPT-4V. Our early investigation of Gemini also reveals some common issues of MLLMs, indicating that a considerable distance remains towards artificial general intelligence. Our project for tracking the progress of MLLMs is released at https://github.com/BradyFU/Awesome-Multimodal-Large-Language-Models.
    Comment: 120 pages in total. See our project at https://github.com/BradyFU/Awesome-Multimodal-Large-Language-Model

    CDK5-dependent BAG3 degradation modulates synaptic protein turnover

    Alzheimer’s disease (AD) is a major neurological disorder that seriously threatens human health. Its onset and progression are closely related to aging, and clinical treatment options are currently very limited. There is therefore an urgent need to start from the early stages of AD pathogenesis and to discover and identify the mechanisms and targets that cause neurological dysfunction in AD, laying a foundation for its early prevention and treatment. Starting from high-throughput phosphoproteomics, Professor Zhang Jie (张杰) and his team systematically studied the phosphorylation substrates of CDK5 in neurons and identified BAG3, a protein with important functions in protein quality control, as a novel CDK5 substrate. Building on this phosphoproteomic approach, the group discovered and elucidated the mechanism by which cyclin-dependent kinase 5 (CDK5) regulates the maintenance of synaptic protein levels through BAG3, and its role in the onset and progression of AD. The study was completed over eight years through the collaboration of multiple teams, with contributions from Professor Zhou Xiwen (周熙文) of The Chinese University of Hong Kong, Professor Karl Herrup of the University of Pittsburgh (USA), Professor Xu Huaxi (许华曦) of the Sanford-Burnham Institute (USA), Professor Bu Guojun (卜国军) of the Mayo Clinic (USA), Professors Wen Lei (文磊), Zhang Yunwu (张云武), Zhao Yingjun (赵颖俊) and Xue Maoqiang (薛茂强) of Xiamen University School of Medicine, and Professor Yuan Zengqiang (袁增强) of the Academy of Military Medical Sciences. Zhou Jiechao (周杰超), a doctoral student of the class of 2012 at Xiamen University School of Medicine, and colleagues are the first authors of the article, and Professor Zhang Jie is the corresponding author.
    Background Synaptic protein dyshomeostasis and functional loss are early, invariant features of Alzheimer’s disease (AD), yet the unifying etiological pathway remains largely unknown. Given that cyclin-dependent kinase 5 (CDK5) plays critical roles in synaptic formation and degeneration, its phosphorylation targets were re-examined in search of candidates with direct global impacts on synaptic protein dynamics, and the associated regulatory network was analyzed. Methods Quantitative phospho-proteomics and bioinformatics analyses were performed to identify top-ranked candidates. A series of biochemical assays was used to investigate the associated regulatory signaling networks. Histological, electrochemical and behavioral assays were performed in conditional knockout, shRNA-mediated knockdown and AD-related mouse models to evaluate its relevance to synaptic homeostasis and function. Results Among candidates with known implications in synaptic modulation, BCL2-associated athanogene-3 (BAG3) ranked the highest. CDK5-mediated phosphorylation on Ser297/Ser291 (mouse/human) destabilized BAG3. Loss of BAG3 unleashed the selective protein-degradative function of the HSP70 machinery; in neurons, this resulted in enhanced degradation of a number of glutamatergic synaptic proteins. Conditional neuronal knockout of Bag3 in vivo impaired learning and memory. In human AD and related mouse models, aberrant CDK5-mediated loss of BAG3 had similar effects on synaptic homeostasis. The detrimental effects of BAG3 loss on learning and memory were confirmed in these mice and were reversed by ectopic BAG3 re-expression. Conclusions Our results highlight that the neuronal CDK5-BAG3-HSP70 signaling axis plays a critical role in modulating synaptic homeostasis, and that dysregulation of this pathway directly contributes to synaptic dysfunction and AD pathogenesis.
    This work was supported by the National Science Foundation in China (Grant: 31571055, 81522016, 81271421 to J.Z.; 81801337 to L.L.; 81774377 and 81373999 to L.W.); Fundamental Research Funds for the Central Universities of China-Xiamen University (Grant: 20720150062, 20720180049 and 20720160075 to J.Z.); Fundamental Research Funds for Fujian Province University Leading Talents (Grant JAT170003 to L.L.); Hong Kong Research Grants Council (HKUST12/CRF/13G, GRF660813, GRF16101315, AoE/M-05/12 to K.H.; GRF16103317, GRF16100718 and GRF16100219 to H.-M.C.); Offices of Provost, VPRG and Dean of Science, HKUST (VPRGO12SC02 to K.H.); Chinese University of Hong Kong (CUHK) Improvement on Competitiveness in Hiring New Faculty Funding Scheme (Ref. 133), CUHK Faculty Startup Fund and Alzheimer’s Association Research Fellowship (AARF-17-531566) to H.-M.C.
    The study was supported by the National Natural Science Foundation of China, the Xiamen University President’s Fund, the Fujian Province joint research fund for health and education, and other sources.

    Mengdan Sun's Quick Files

    The Quick Files feature was discontinued and its files were migrated into this Project on March 11, 2022. The file URLs will still resolve properly, and the Quick Files logs are available in the Project’s Recent Activity.

    Competitive dendrite growth during directional solidification of a transparent alloy: Modeling and experiment

    A two-dimensional (2-D) cellular automaton-finite difference method (CA-FDM) model and in situ observation experiments of directional solidification using a transparent alloy, SCN-2wt.% ACE, are employed to investigate the microstructural evolution of columnar dendrites during directional solidification. In the present model, the growth of columnar dendrites is simulated using a CA technique, and solute diffusion is solved using the FDM. The model can visualize how dendrite arrays with identical or different growth orientations form and interact with the evolving solute concentration field. Several competitive growth modes between two converging or diverging dendrite arrays are reproduced, and the simulation results agree well with the experimental observations. Simulations are also performed to study the effects of temperature gradient and cooling rate on the growth morphology of diverging dendrites. As the temperature gradient and cooling rate increase, the tertiary branches produced from the well-developed side branches of the unfavorably oriented grain at the divergent grain boundaries are more likely to become new primary dendrite arms.
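    In a CA-FDM scheme, the CA part decides which cells solidify, and the FDM part updates the solute field between CA steps. The Python sketch below shows only the FDM half. It assumes an explicit (FTCS) scheme on a uniform square grid with zero-flux walls. The abstract does not state the paper's discretization, boundary conditions or material parameters, so the function name and all numerical values here are hypothetical.

```python
def diffuse_step(c, D, dx, dt):
    """Advance a 2-D solute field c (list of rows) by one explicit FTCS step of
    dC/dt = D * (d2C/dx2 + d2C/dy2) on a uniform grid with zero-flux walls."""
    r = D * dt / dx**2
    if r > 0.25:  # the explicit 2-D scheme is only stable for r <= 1/4
        raise ValueError(f"unstable time step: D*dt/dx^2 = {r:.3f} > 0.25")
    ny, nx = len(c), len(c[0])

    def at(i, j, di, dj):
        # Neighbour value; outside the grid, reuse the cell itself (no flux).
        ii, jj = i + di, j + dj
        return c[ii][jj] if 0 <= ii < ny and 0 <= jj < nx else c[i][j]

    return [
        [
            c[i][j] + r * (at(i, j, -1, 0) + at(i, j, 1, 0)
                           + at(i, j, 0, -1) + at(i, j, 0, 1) - 4 * c[i][j])
            for j in range(nx)
        ]
        for i in range(ny)
    ]


# Hypothetical solute spike, e.g. solute rejected ahead of a growing dendrite tip.
c0 = [[0.0] * 5 for _ in range(5)]
c0[2][2] = 1.0
c1 = diffuse_step(c0, D=1e-9, dx=1e-6, dt=2e-4)  # r = 0.2
```

    With zero-flux walls, the total solute in the grid is conserved from step to step. In a full CA-FDM loop, the CA step would typically redistribute rejected solute from newly solidified cells into neighbouring liquid cells before the next diffusion step. That coupling is not shown here.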

    A More Realistic Markov Process Model for Explaining the Disjunction Effect in One-Shot Prisoner’s Dilemma Game

    The quantum model has been considered advantageous over the Markov model in explaining irrational behaviors (e.g., the disjunction effect) during decision making. Here, we reviewed and re-examined the ability of the quantum belief–action entanglement (BAE) model and the Markov belief–action (BA) model to explain the disjunction effect in a more realistic setting. The results indicate that neither model truly represents the underlying cognitive mechanism. We therefore proposed a more realistic Markov model to explain the disjunction effect in the prisoner’s dilemma game. In this model, the probability transition pattern of a decision maker (DM) depends on the information about the opponent’s action. In addition, the relationship between the cognitive components in the evolution dynamics is moderated by the DM’s degree of subjective uncertainty (DSN). The results show that the disjunction effect can be well predicted by the proposed Markov model. Model comparison suggests that the proposed Markov model is superior to the quantum BAE model in terms of absolute model performance, relative model performance, and model flexibility. We therefore suggest that the key to successfully explaining the disjunction effect is to properly consider the underlying cognitive mechanism.
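    The abstract's central point can be illustrated numerically. A Markov model reusing the known-condition dynamics must obey the law of total probability, but one whose transitions depend on what the DM knows need not. The Python sketch below is a deliberately simplified discrete-time, two-state action chain. It is not the BA, BAE or proposed model from the paper. All matrices, the uncertainty weight `u` and the function names are hypothetical, chosen only to show the mechanism.

```python
# Hypothetical 2x2 action-transition matrices over states [defect, cooperate];
# row i gives P(next action | current action i). Values are illustrative only.
T_BELIEF_D = [[0.9, 0.1], [0.4, 0.6]]   # DM believes the opponent defected
T_BELIEF_C = [[0.8, 0.2], [0.3, 0.7]]   # DM believes the opponent cooperated
T_UNCERTAIN = [[0.5, 0.5], [0.2, 0.8]]  # hypothetical pull toward cooperation


def evolve(T, steps=50, start=(0.5, 0.5)):
    """Propagate an action distribution through `steps` Markov transitions."""
    p = list(start)
    for _ in range(steps):
        p = [p[0] * T[0][j] + p[1] * T[1][j] for j in range(2)]
    return p


def blend(T, U, u):
    """Mix two transition matrices; u stands in for subjective uncertainty (hypothetical)."""
    return [[(1 - u) * T[i][j] + u * U[i][j] for j in range(2)] for i in range(2)]


def p_defect_unknown(p_belief_d, u):
    """P(defect) when the opponent's move is unknown: a belief-weighted mixture
    of the two belief branches, each evolved with a (possibly blended) matrix."""
    pd_d = evolve(blend(T_BELIEF_D, T_UNCERTAIN, u))[0]
    pd_c = evolve(blend(T_BELIEF_C, T_UNCERTAIN, u))[0]
    return p_belief_d * pd_d + (1 - p_belief_d) * pd_c


known_d = evolve(T_BELIEF_D)[0]          # ≈ 0.8
known_c = evolve(T_BELIEF_C)[0]          # ≈ 0.6
classic = p_defect_unknown(0.5, u=0.0)   # same dynamics as the known conditions
informed = p_defect_unknown(0.5, u=0.5)  # dynamics depend on the missing information
print(f"known D: {known_d:.3f}, known C: {known_c:.3f}")
print(f"unknown (classic): {classic:.3f}, unknown (info-dependent): {informed:.3f}")
```

    With `u = 0`, P(defect | unknown) is a weighted average of the two known-condition values, so it must lie between them (0.7 here). Letting the transitions depend on the missing information (`u > 0`) allows it to fall below both (about 0.458), which is the disjunction pattern.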

    Providing depth information in the display for pursuit and compensatory tracking and optimization in 3‐D space

    Objective The format of a tracking display has an important influence on tracking performance, yet few previous studies have explored 3‐D tracking display formats. The present study aimed to construct 3‐D formats for manual pursuit and compensatory tracking displays by adding depth information and, based on the tracking performance results, to further optimize the preferred format. Method Three experiments were conducted. Experiment 1 was a confirmatory experiment that compared the effects of the two display formats on 2‐D manual tracking performance against previous studies. Experiment 2 extended the investigation to a 3‐D display by adding a depth cue indicating the relative size of the control marker and the target. Experiment 3 was an optimisation experiment in which the 3‐D tracking display was improved by adding an extra depth cue that clearly signified the relative position of the target and the control marker. Results Pursuit tracking performance was better than compensatory tracking performance in both 2‐D (Experiment 1) and 3‐D space (Experiment 2). The extra depth cue also significantly improved the tracking success rate and the subjective satisfaction of the pursuit display format in 3‐D space (Experiment 3). Conclusions These findings indicate that depth cues can be used in tracking displays in 3‐D space and have important implications for the design of motor training and tracking systems.
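    The Python sketch below makes the two display formats and the size-based depth cue concrete. It follows the standard distinction between the formats. A pursuit display draws the target and the control marker at their own positions, while a compensatory display shows only the error relative to a fixed reference. The abstract does not define the perspective scaling rule or the tolerance-based "success" criterion, so both are hypothetical, as are all names.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Frame:
    target: Vec3  # target position (x, y, z)
    cursor: Vec3  # control-marker position (x, y, z)


def pursuit_view(f: Frame) -> Dict[str, Vec3]:
    """Pursuit format: target and control marker are drawn at their own positions."""
    return {"target": f.target, "marker": f.cursor}


def compensatory_view(f: Frame) -> Dict[str, Vec3]:
    """Compensatory format: only the error is shown; the target sits at a fixed origin."""
    error = tuple(c - t for c, t in zip(f.cursor, f.target))
    return {"target": (0.0, 0.0, 0.0), "marker": error}


def depth_size(z: float, base: float = 20.0, focal: float = 100.0) -> float:
    """Size-based depth cue (hypothetical perspective rule): larger z is drawn smaller."""
    return base * focal / (focal + z)


def success_rate(frames: List[Frame], tol: float) -> float:
    """Share of samples whose 3-D tracking error is within tol (hypothetical definition)."""
    hits = sum(math.dist(f.cursor, f.target) <= tol for f in frames)
    return hits / len(frames)


frames = [Frame((0.0, 0.0, 0.0), (0.5, 0.0, 0.0)),
          Frame((0.0, 0.0, 0.0), (3.0, 4.0, 0.0))]
print(success_rate(frames, tol=1.0))        # 0.5
print(depth_size(0.0), depth_size(100.0))   # 20.0 10.0
```

    The comparison between the drawn sizes of the marker and the target is what carries the depth information. A second, explicit cue for relative position, as in Experiment 3, would be an additional rendering element on top of this.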

    The applicability of eye‐controlled highlighting to the field of visual searching

    Objective With the increasing amount of information presented on current human–computer interfaces, eye‐controlled highlighting has been proposed as a new display technique to optimise users’ task performance. However, it is unknown to what extent an eye‐controlled highlighting display facilitates visual search. The current study examined the facilitative effect of the eye‐controlled highlighting display technique on visual search with respect to two major attributes of visual stimuli: stimulus type and the visual similarity between targets and distractors. Method In Experiment 1, we used digits and Chinese words as materials to explore how far the facilitative effect of eye‐controlled highlighting generalises. In Experiment 2, we used Chinese words to examine the effect of target‐distractor similarity on the facilitation provided by the eye‐controlled highlighting display. Results The eye‐controlled highlighting display improved visual search performance when words were used as search targets and when target‐distractor similarity was high. No facilitative effect was found when digits were used as search targets or when target‐distractor similarity was low. Conclusions The effectiveness of eye‐controlled highlighting on a visual task was influenced by both stimulus type and target‐distractor similarity. These findings provide guidelines for designing modern interfaces that implement eye‐based displays.
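    The abstract does not say how the highlight follows the gaze. The minimal Python sketch below assumes that the item nearest to a smoothed gaze point is highlighted whenever it lies within a fixed radius. The smoothing window, the radius, the screen coordinates and all names are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]


def smoothed_gaze(samples: List[Point], window: int = 5) -> Point:
    """Average the last `window` gaze samples to damp eye-tracker jitter."""
    recent = samples[-window:]
    return (sum(x for x, _ in recent) / len(recent),
            sum(y for _, y in recent) / len(recent))


def highlighted_item(gaze: Point, items: Dict[str, Point], radius: float) -> Optional[str]:
    """Return the item nearest to the gaze point if it lies within `radius`;
    otherwise None, so nothing is highlighted."""
    best, best_dist = None, radius
    for name, pos in items.items():
        d = math.dist(gaze, pos)
        if d <= best_dist:
            best, best_dist = name, d
    return best


# Hypothetical screen layout (pixels) and a short burst of gaze samples.
items = {"target": (100.0, 40.0), "distractor": (160.0, 40.0)}
gaze = smoothed_gaze([(98.0, 41.0), (103.0, 39.0), (101.0, 40.0)])
print(highlighted_item(gaze, items, radius=25.0))  # target
```

    When target and distractors look alike, a highlight that tracks the current fixation marks which item is being inspected. This is one plausible reason the facilitation appeared only under high target-distractor similarity, though the abstract itself does not propose a mechanism.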