
58 research outputs found

MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models

Author: Chen Peixian
Fu Chaoyou
Ji Rongrong
Li Ke
Lin Wei
Lin Xu
Qin Yulei
Qiu Zhenyu
Shen Yunhang
Sun Xing
Yang Jinrui
Zhang Mengdan
Zheng Xiawu
Publication date: 23/06/2023

Multimodal Large Language Models (MLLMs) rely on a powerful LLM to perform multimodal tasks and have shown striking emergent abilities in recent studies, such as writing poems based on an image. However, such case studies cannot fully reflect the performance of MLLMs, and a comprehensive evaluation is lacking. In this paper, we fill this gap by presenting MME, the first MLLM evaluation benchmark. It measures both perception and cognition abilities across a total of 14 subtasks. To avoid the data leakage that may arise from direct use of public datasets for evaluation, the annotations of instruction-answer pairs are all manually designed. The concise instruction design allows us to compare MLLMs fairly, without struggling with prompt engineering. With such instructions, we can also easily carry out quantitative statistics. A total of 10 advanced MLLMs are comprehensively evaluated on MME, which not only suggests that existing MLLMs still have considerable room for improvement, but also reveals potential directions for subsequent model optimization. Comment: https://github.com/BradyFU/Awesome-Multimodal-Large-Language-Model

arXiv.org e-Print Archive

A Challenger to GPT-4V? Early Explorations of Gemini in Visual Expertise

Author: Chen Peixian
Fu Chaoyou
Gao Peng
Huang Yubo
Jiang Deqiang
Li Hongsheng
Li Ke
Lin Shaohui
Qiu Longtian
Shen Yunhang
Sun Xing
Wang Zihan
Ye Gaoxiang
Yin Di
Zhang Mengdan
Zhang Renrui
Zhang Zhengye
Zhao Sirui
Publication date: 20/12/2023

The surge of interest in Multi-modal Large Language Models (MLLMs), e.g., GPT-4V(ision) from OpenAI, has marked a significant trend in both academia and industry. They endow Large Language Models (LLMs) with powerful capabilities in visual understanding, enabling them to tackle diverse multi-modal tasks. Very recently, Google released Gemini, its newest and most capable MLLM, built from the ground up for multi-modality. In light of its superior reasoning capabilities, can Gemini challenge GPT-4V's leading position in multi-modal learning? In this paper, we present a preliminary exploration of Gemini Pro's visual understanding proficiency, which comprehensively covers four domains: fundamental perception, advanced cognition, challenging vision tasks, and various expert capacities. We compare Gemini Pro with the state-of-the-art GPT-4V to evaluate its upper limits, along with the latest open-sourced MLLM, Sphinx, which reveals the gap between manual efforts and black-box systems. The qualitative samples indicate that, while GPT-4V and Gemini showcase different answering styles and preferences, they can exhibit comparable visual reasoning capabilities, and Sphinx still trails behind them in domain generalizability. Specifically, GPT-4V tends to elaborate detailed explanations and intermediate steps, whereas Gemini prefers to output a direct and concise answer. The quantitative evaluation on the popular MME benchmark also demonstrates the potential of Gemini to be a strong challenger to GPT-4V. Our early investigation of Gemini also observes some common issues of MLLMs, indicating that a considerable distance remains to artificial general intelligence. Our project for tracking the progress of MLLMs is released at https://github.com/BradyFU/Awesome-Multimodal-Large-Language-Models. Comment: Total 120 pages. See our project at https://github.com/BradyFU/Awesome-Multimodal-Large-Language-Model

arXiv.org e-Print Archive

Direct Modification of Multiple Gene Homoeologs in Brassica oleracea and Brassica napus Using Doubled Haploid Inducer-Mediated Genome-Editing System

Author: Cheng HongTao
Chu Wen
Fu Li
Fu ShaoHong
Hao MengYu
Hu Qiong
Hu XueZhi
Li Chao
Li Yun
Liu Jia
Mei DeSheng
Sang Shifei
Shi YuQin
Sun MengDan
Wang Hui
Wang WenXiang
Yang Jin
Zhang BaoHong
Zhang HaiYan
Publication venue: 'Wiley'
Publication date: 01/01/2021

Crossref

ScholarShip

CDK5-dependent BAG3 degradation modulates synaptic protein turnover

Author: Di Wu
Guanyun Zhang
Guimiao Chen
Guojun Bu
Hao Sun
Hei-Man Chow
Huaxi Xu
Hui Lin
Huifang Li
Jie Zhang
Jiechao Zhou
Jieyin Li
Kai Zhuang
Karl Herrup
Lei Wen
Lige Leng
Maoqiang Xue
Meng Shi
Mengdan Wang
Naizhen Zheng
Timothy Y. Huang
Wenting Xie
Yan Liu
Yingjun Zhao
Yuehong Gao
Yunwu Zhang
Zengqiang Yuan
Publication venue: 'Elsevier BV'
Publication date: 01/11/2019

Alzheimer's disease (AD) is a major neurological disorder that seriously threatens human health. Its onset and progression are closely linked to aging, and current clinical treatments are very limited. There is therefore an urgent need to start from the early stages of AD pathogenesis and to discover and identify the mechanisms and targets that lead to neural dysfunction in AD, providing a basis for early prevention and treatment. Starting from high-throughput phosphoproteomics, Professor Jie Zhang's team systematically studied the phosphorylation substrates of CDK5 in neural cells and identified BAG3, a protein with an important function in protein quality control, as a novel substrate of CDK5. The group discovered and elucidated the mechanism by which cyclin-dependent kinase 5 (CDK5) regulates BAG3 to maintain synaptic protein levels, as well as its role in the onset and progression of AD. The study was completed through eight years of collaboration among several teams, including Professor Hei-Man Chow (周熙文) of the Chinese University of Hong Kong, Professor Karl Herrup of the University of Pittsburgh, Professor Huaxi Xu of the Sanford-Burnham institute, Professor Guojun Bu of the Mayo Clinic, Professors Lei Wen, Yunwu Zhang, Yingjun Zhao and Maoqiang Xue of the Xiamen University School of Medicine, and Professor Zengqiang Yuan of the Academy of Military Medical Sciences. Jiechao Zhou, a doctoral student in the 2012 cohort at the Xiamen University School of Medicine, and colleagues are the first authors of the article; Professor Jie Zhang is the corresponding author.
Background: Synaptic protein dyshomeostasis and functional loss are an early invariant feature of Alzheimer’s disease (AD), yet the unifying etiological pathway remains largely unknown. Knowing that cyclin-dependent kinase 5 (CDK5) plays critical roles in synaptic formation and degeneration, its phosphorylation targets were re-examined in search of candidates with direct global impacts on synaptic protein dynamics, and the associated regulatory network was also analyzed.
Methods: Quantitative phospho-proteomics and bioinformatics analyses were performed to identify top-ranked candidates. A series of biochemical assays were used to investigate the associated regulatory signaling networks. Histological, electrochemical and behavioral assays were performed in conditional knockout, shRNA-mediated knockdown and AD-related mouse models to evaluate its relevance to synaptic homeostasis and functions.
Results: Among candidates with known implications in synaptic modulation, BCL2-associated athanogene-3 (BAG3) ranked the highest. CDK5-mediated phosphorylation on Ser297/Ser291 (Mouse/Human) destabilized BAG3. Loss of BAG3 unleashed the selective protein degradative function of the HSP70 machinery. In neurons, this resulted in enhanced degradation of a number of glutamatergic synaptic proteins. Conditional neuronal knockout of Bag3 in vivo led to impairment of learning and memory functions. In human AD and related mouse models, aberrant CDK5-mediated loss of BAG3 yielded similar effects on synaptic homeostasis. Detrimental effects of BAG3 loss on learning and memory functions were confirmed in these mice, and these effects were reversed by ectopic BAG3 re-expression.
Conclusions: Our results highlight that the neuronal CDK5-BAG3-HSP70 signaling axis plays a critical role in modulating synaptic homeostasis. Dysregulation of the signaling pathway directly contributes to synaptic dysfunction and AD pathogenesis.
This work was supported by the National Science Foundation in China (Grant: 31571055, 81522016, 81271421 to J.Z.; 81801337 to L.L.; 81774377 and 81373999 to L.W.); Fundamental Research Funds for the Central Universities of China-Xiamen University (Grant: 20720150062, 20720180049 and 20720160075 to J.Z.); Fundamental Research Funds for Fujian Province University Leading Talents (Grant JAT170003 to L.L.); Hong Kong Research Grants Council (HKUST12/CRF/13G, GRF660813, GRF16101315, AoE/M-05/12 to K.H.; GRF16103317, GRF16100718 and GRF16100219 to H.-M.C.); Offices of Provost, VPRG and Dean of Science, HKUST (VPRGO12SC02 to K.H.); Chinese University of Hong Kong (CUHK) Improvement on Competitiveness in Hiring New Faculty Funding Scheme (Ref. 133), CUHK Faculty Startup Fund and Alzheimer’s Association Research Fellowship (AARF-17-531566) to H.-M.C. The research was also funded by the National Natural Science Foundation of China, the Xiamen University President's Fund, and the Fujian Province Health and Education Joint Research Fund, among others.

Xiamen University Institutional Repository

Mengdan Sun's Quick Files

Author: Mengdan Sun
Publication venue: 'Center for Open Science'
Publication date: 23/03/2021

The Quick Files feature was discontinued and its files were migrated into this Project on March 11, 2022. The file URLs will still resolve properly, and the Quick Files logs are available in the Project’s Recent Activity.

OSF Preprints

Competitive dendrite growth during directional solidification of a transparent alloy: Modeling and experiment

Author: Chang Sun
Hui Fang
Mengdan Hu
Mingfang Zhu
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 02/03/2020

A two-dimensional (2-D) cellular automaton-finite difference method (CA-FDM) model and in situ observation experiments using a transparent alloy of SCN-2wt.% ACE are employed to investigate the microstructural evolution of columnar dendrites during directional solidification. In the present model, the growth of columnar dendrites is simulated using a CA technique, and the solute diffusion is solved using the FDM. The model is capable of visualizing the interaction between the formation of dendrite arrays with identical or different growth orientations and the evolving solute concentration field. Several competitive dendritic growth modes between two converging and diverging dendrite arrays are reproduced. The simulation results agree well with the experimental observations. Simulations are also performed to study the effects of temperature gradient and cooling rate on the growth morphology of diverging dendrites. It is found that, as the temperature gradient and cooling rate increase, the tertiary branches produced from the well-developed side branches of the unfavorably oriented grain at the divergent grain boundaries are more likely to become the new primary dendrite arms.
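The abstract states only that solute diffusion is solved with a finite difference method; it does not give the scheme. As a rough illustration, the sketch below shows one explicit FDM update of a 2-D concentration field for the plain diffusion equation ∂C/∂t = D∇²C. The function name `diffuse_step`, the uniform diffusivity, the square grid, and the no-flux boundaries are all assumptions made for this example and are not taken from the paper, whose model also couples diffusion to the CA growth step.

```python
import numpy as np

def diffuse_step(c, d, dx, dt):
    """One explicit finite-difference step of dC/dt = D * laplacian(C).

    c  : 2-D array of solute concentration (hypothetical field)
    d  : diffusivity, assumed uniform here
    dx : grid spacing (same in both directions)
    dt : time step
    """
    r = d * dt / dx**2
    # The explicit 5-point scheme is stable in 2-D only if r <= 1/4.
    if r > 0.25:
        raise ValueError("unstable step: need D*dt/dx^2 <= 0.25")
    # Edge padding copies boundary values into ghost cells (no-flux walls).
    p = np.pad(c, 1, mode="edge")
    # 5-point Laplacian stencil (still to be scaled by 1/dx^2, folded into r).
    lap = p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4.0 * c
    return c + r * lap

# Usage: spread a single solute spike over a small grid.
field = np.zeros((21, 21))
field[10, 10] = 1.0
for _ in range(50):
    field = diffuse_step(field, d=1.0, dx=1.0, dt=0.2)
```

With no-flux boundaries the total solute in the grid is conserved, which is a convenient sanity check for a sketch like this.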

EDP Sciences OAI-PMH repository (1.2.0)

A More Realistic Markov Process Model for Explaining the Disjunction Effect in One-Shot Prisoner’s Dilemma Game

Author: Bo Liu
Mengdan Sun
Xiaoqing Gao
Xiaoyang Xin
Ying Li
Publication venue: 'MDPI AG'
Publication date: 06/03/2022

The quantum model has been considered to be advantageous over the Markov model in explaining irrational behaviors (e.g., the disjunction effect) during decision making. Here, we reviewed and re-examined the ability of the quantum belief–action entanglement (BAE) model and the Markov belief–action (BA) model to explain the disjunction effect in a more realistic setting. The results indicate that neither of the two models can truly represent the underlying cognitive mechanism. Thus, we proposed a more realistic Markov model to explain the disjunction effect in the prisoner’s dilemma game. In this model, the probability transition pattern of a decision maker (DM) depends on the information about the opponent’s action. Also, the relationship between the cognitive components in the evolution dynamics is moderated by the DM’s degree of subjective uncertainty (DSN). The results show that the disjunction effect can be well predicted by the more realistic Markov model. Model comparison suggests the superiority of the proposed Markov model over the quantum BAE model in terms of absolute model performance, relative model performance, and model flexibility. Therefore, we suggest that the key to successfully explaining the disjunction effect is to consider the underlying cognitive mechanism properly.
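To make the idea of an information-dependent transition pattern concrete, here is a toy discrete-time Markov chain over the two actions (defect, cooperate). It is not the authors' model: the two-state space, the transition matrices, the step count, and the names `TRANSITIONS` and `defect_probability` are all hypothetical, chosen only to show how giving the "unknown" condition its own transition matrix can produce a defection rate below both known-opponent conditions, which is the pattern called the disjunction effect.

```python
import numpy as np

# Hypothetical row-stochastic matrices over the actions [defect, cooperate].
# Each row gives the probabilities of moving from that action to each action.
TRANSITIONS = {
    "known_defect": np.array([[0.95, 0.05], [0.60, 0.40]]),
    "known_cooperate": np.array([[0.90, 0.10], [0.50, 0.50]]),
    "unknown": np.array([[0.80, 0.20], [0.30, 0.70]]),
}

def defect_probability(condition, steps=10, start=(0.5, 0.5)):
    """Probability of defecting after `steps` transitions under a condition."""
    dist = np.asarray(start, dtype=float)
    matrix = TRANSITIONS[condition]
    for _ in range(steps):
        dist = dist @ matrix  # propagate the action distribution one step
    return float(dist[0])

for name in TRANSITIONS:
    print(name, round(defect_probability(name), 3))
```

In this toy setup the defection probability under "unknown" is not a weighted average of the two known conditions, because the transition matrix itself changes with the available information.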

Multidisciplinary Digital Publishing Institute

A More Realistic Markov Process Model for Explaining the Disjunction Effect in One-Shot Prisoner’s Dilemma Game

Author: Bo Liu
Mengdan Sun
Xiaoqing Gao
Xiaoyang Xin
Ying Li
Publication venue: MDPI AG
Publication date: 01/03/2022

Directory of Open Access Journals

Providing depth information in the display for pursuit and compensatory tracking and optimization in 3‐D space

Author: Duming Wang
Haili Ye
Hongyan Liu
Liezhong Ge
Mengdan Sun
Publication venue: Taylor & Francis Group
Publication date: 01/03/2017

Objective: The formats of tracking displays exert important influences on tracking performance. Few previous studies have explored 3‐D tracking display formats. The present study aimed to construct 3‐D formats for manual pursuit and compensatory tracking displays by adding depth information. Based on the tracking performance results, we further optimised the preferable tracking format. Method: Three experiments were conducted. Experiment 1 was a confirmatory experiment comparing the effects of the two display formats on 2‐D manual tracking performance against previous studies. Experiment 2 extended the investigation to a 3‐D display by adding a depth cue indicating the relative size of the control marker and target. Experiment 3 was an optimisation experiment in which the 3‐D tracking display was improved by adding an extra depth cue that clearly signified the relative position of the target and the control marker. Results: Pursuit tracking performance was better than compensatory tracking performance in both 2‐D (Experiment 1) and 3‐D space (Experiment 2). It was also found that the extra depth cue significantly improved the tracking success rate and the subjective satisfaction of the pursuit display format in 3‐D space (Experiment 3). Conclusions: These findings indicate that depth cues can be used in tracking displays in 3‐D space and have important implications for the design of some motor training and tracking systems.

Directory of Open Access Journals

The applicability of eye‐controlled highlighting to the field of visual searching

Author: Hongyan Liu
Li Wang
Liezhong Ge
Mengdan Sun
Qijun Wang
Yunxian Pan
Publication venue: Taylor & Francis Group
Publication date: 01/09/2018

Objective: With the increasing amount of information presented on current human–computer interfaces, eye‐controlled highlighting has been proposed as a new display technique to optimise users’ task performance. However, it is unknown to what extent the eye‐controlled highlighting display facilitates visual search performance. The current study examined the facilitative effect of the eye‐controlled highlighting display technique on visual search with two major attributes of visual stimuli: stimulus type and the visual similarity between targets and distractors. Method: In Experiment 1, we used digits and Chinese words as materials to explore the generalisation of the facilitative effect of eye‐controlled highlighting. In Experiment 2, we used Chinese words to examine the effect of target‐distractor similarity on the facilitation of the eye‐controlled highlighting display. Results: The eye‐controlled highlighting display improved visual search performance when words were used as search targets and when the target‐distractor similarity was high. No facilitative effect was found when digits were used as search targets or when target‐distractor similarity was low. Conclusions: The effectiveness of eye‐controlled highlighting on a visual task was influenced by both stimulus type and target‐distractor similarity. These findings provide guidelines for modern interface design with eye‐based displays implemented.

Directory of Open Access Journals