Knowledge Distillation for Closed-Source Language Models
Closed-source language models such as GPT-4 have achieved remarkable
performance. Many recent studies focus on enhancing the capabilities of smaller
models through knowledge distillation from closed-source language models.
However, because the weights, hidden states, and output distributions of these
closed-source models cannot be accessed directly, distillation can only be
performed by fine-tuning smaller models on data samples generated by
closed-source language models, which constrains its effectiveness. In this
paper, we propose to estimate the output distributions of closed-source
language models within a Bayesian estimation framework involving both prior
and posterior estimation. The prior estimation derives a prior distribution
from the corpus generated by closed-source language models, while the
posterior estimation employs a proxy model to update the prior distribution
into a posterior distribution. By leveraging the estimated output distribution
of closed-source language models, traditional knowledge distillation can then
be executed. Experimental results demonstrate that our method surpasses
current models fine-tuned directly on data generated by closed-source language
models.
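The prior-then-posterior idea can be pictured with a minimal sketch: a prior over next tokens from counts in the teacher-generated corpus, a conjugate-style update using a proxy model's distribution, and a standard distillation loss against the estimate. The Dirichlet-smoothed prior, the pseudo-observation update, and all function names are illustrative assumptions, not the paper's exact formulation.

```python
# Sketch: Bayesian estimation of a closed-source model's next-token
# distribution for knowledge distillation. The conjugate-style update and
# all names here are assumptions for illustration only.
from collections import Counter
import math

def prior_from_corpus(corpus_tokens, vocab, alpha=1.0):
    """Prior over next tokens from counts in the teacher-generated corpus
    (Dirichlet smoothing with pseudo-count alpha)."""
    counts = Counter(corpus_tokens)
    total = sum(counts.values()) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

def posterior_update(prior, proxy_probs, strength=10.0):
    """Update the prior with a proxy model's distribution, treating the
    proxy output as `strength` pseudo-observations."""
    post = {w: prior[w] + strength * proxy_probs.get(w, 0.0) for w in prior}
    z = sum(post.values())
    return {w: p / z for w, p in post.items()}

def kd_loss(teacher_probs, student_probs, eps=1e-12):
    """Forward KL(teacher || student), the usual distillation objective."""
    return sum(p * (math.log(p + eps) - math.log(student_probs[w] + eps))
               for w, p in teacher_probs.items() if p > 0)

vocab = ["yes", "no", "maybe"]
prior = prior_from_corpus(["yes", "yes", "no"], vocab)
posterior = posterior_update(prior, {"yes": 0.7, "no": 0.2, "maybe": 0.1})
student = {"yes": 0.5, "no": 0.3, "maybe": 0.2}
print(posterior, kd_loss(posterior, student))
```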
Coordination control and analysis of TCSC devices to protect electrical power systems against disruptive disturbances
In this work, we study the coordination control and effective deployment of thyristor-controlled series compensation (TCSC) devices to protect power grids against disruptive disturbances. The power grid comprises flexible alternating current transmission system (FACTS) devices for regulating power flow, phasor measurement units (PMUs) for detecting system states, and a control station for generating regulation signals. We propose a novel coordination control approach for TCSC devices that changes branch impedance and regulates the power flow against unexpected disturbances on buses or branches. More significantly, a numerical method is developed to estimate a gradient vector for generating the regulation signals of TCSC devices while reducing computational costs. To describe the degree of power system stress, a performance index is designed based on the error between the desired and actual power flows. Moreover, technical analysis is presented to ensure the convergence of the proposed coordination control algorithm. Numerical simulations substantiate that the coordination control approach can effectively alleviate the stress caused by contingencies on the IEEE 24-bus system, compared to classic PID control. It is also demonstrated that the deployment of TCSCs can greatly alleviate system stress when both impedance magnitude and active power on branches are considered.
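The gradient-vector estimation over a flow-error performance index can be sketched as follows. The quadratic index matches the abstract's description; the toy linear flow model, the central-difference scheme, and the step sizes are assumptions, standing in for a real AC power-flow solver fed by PMU measurements.

```python
# Sketch: coordination control of TCSC reactance settings by gradient
# descent on a flow-error performance index, with the gradient estimated
# numerically. The toy flow model and constants are illustrative only.
import numpy as np

def branch_flows(x_tcsc):
    """Stand-in for a power-flow solve: maps TCSC reactance settings to
    branch active-power flows (arbitrary fixed linear model)."""
    A = np.array([[1.0, 0.3], [0.2, 1.1], [0.5, 0.4]])
    return A @ (1.0 / (1.0 + x_tcsc))

def performance_index(x_tcsc, p_desired):
    """Stress index: squared error between desired and actual flows."""
    return float(np.sum((branch_flows(x_tcsc) - p_desired) ** 2))

def estimate_gradient(x_tcsc, p_desired, h=1e-4):
    """Central finite differences: two flow evaluations per TCSC device."""
    g = np.zeros_like(x_tcsc)
    for i in range(len(x_tcsc)):
        e = np.zeros_like(x_tcsc)
        e[i] = h
        g[i] = (performance_index(x_tcsc + e, p_desired)
                - performance_index(x_tcsc - e, p_desired)) / (2 * h)
    return g

x = np.array([0.1, 0.1])            # TCSC reactance settings
p_star = np.array([0.9, 0.8, 0.6])  # desired branch flows
for _ in range(200):                # coordination control loop
    x -= 0.05 * estimate_gradient(x, p_star)
print(performance_index(x, p_star))
```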
Small LLMs Are Weak Tool Learners: A Multi-LLM Agent
Large Language Model (LLM) agents significantly extend the capabilities of
standalone LLMs, empowering them to interact with external tools (e.g., APIs,
functions) and complete various tasks in a self-directed fashion. The challenge
of tool use demands that LLMs not only understand user queries and generate
answers accurately but also excel in task planning, tool invocation, and result
summarization. While traditional works focus on training a single LLM with all
these capabilities, performance limitations become apparent, particularly with
smaller models. To overcome these challenges, we propose a novel approach that
decomposes the aforementioned capabilities into a planner, caller, and
summarizer. Each component is implemented by a single LLM that focuses on a
specific capability and collaborates with others to accomplish the task. This
modular framework facilitates individual updates and the potential use of
smaller LLMs for building each capability. To effectively train this framework,
we introduce a two-stage training paradigm. First, we fine-tune a backbone LLM
on the entire dataset without discriminating sub-tasks, providing the model
with a comprehensive understanding of the task. Second, the fine-tuned LLM is
used to instantiate the planner, caller, and summarizer respectively, which are
continually fine-tuned on respective sub-tasks. Evaluation across various
tool-use benchmarks illustrates that our proposed multi-LLM framework surpasses
the traditional single-LLM approach, highlighting its efficacy and advantages
in tool learning.
Comment: Work in progress; GitHub repo: https://github.com/X-PLUG/Multi-LLM-Agen
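The planner/caller/summarizer decomposition can be pictured as a small control loop in which each role is a separate model call. This is a minimal sketch: the chat functions, prompts, tool registry, and stop condition are placeholders, not the paper's implementation or training setup.

```python
# Sketch: a planner/caller/summarizer agent loop. The three callables stand
# in for three separately fine-tuned LLMs; prompts and the FINISH protocol
# are invented for illustration.
from typing import Callable, Dict

def run_agent(query: str,
              planner: Callable[[str], str],
              caller: Callable[[str], str],
              summarizer: Callable[[str], str],
              tools: Dict[str, Callable[[str], str]],
              max_steps: int = 5) -> str:
    history = f"User: {query}"
    for _ in range(max_steps):
        plan = planner(history)                # decide the next action or finish
        if plan.startswith("FINISH"):
            break
        tool_call = caller(history + "\nPlan: " + plan)  # e.g. "search: llm agents"
        name, _, arg = tool_call.partition(":")
        result = tools.get(name.strip(), lambda a: "unknown tool")(arg.strip())
        history += f"\nPlan: {plan}\nCall: {tool_call}\nResult: {result}"
    return summarizer(history)                 # compose the final answer

# Toy instantiation with stub models and a single tool.
tools = {"search": lambda q: f"top hit for '{q}'"}
answer = run_agent(
    "What is tool learning?",
    planner=lambda h: "FINISH" if "Result:" in h else "look it up",
    caller=lambda h: "search: tool learning",
    summarizer=lambda h: "Summary based on: " + h.splitlines()[-1],
    tools=tools,
)
print(answer)
```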
ModelScope-Agent: Building Your Customizable Agent System with Open-source Large Language Models
Large language models (LLMs) have recently demonstrated remarkable
capabilities to comprehend human intentions, engage in reasoning, and exhibit
planning-like behavior. To further unleash the power of LLMs to accomplish
complex tasks, there is a growing trend to build agent frameworks that equip
LLMs, such as ChatGPT, with tool-use abilities to connect with massive external
APIs. In this work, we introduce ModelScope-Agent, a general and customizable
agent framework for real-world applications, based on open-source LLMs as
controllers. It provides a user-friendly system library, with customizable
engine design to support model training on multiple open-source LLMs, while
also enabling seamless integration with both model APIs and common APIs in a
unified way. To equip the LLMs with tool-use abilities, a comprehensive
framework has been proposed, spanning tool-use data collection, tool
retrieval, tool registration, memory control, customized model training, and
evaluation for practical real-world applications. Finally, we showcase
ModelScopeGPT, a real-world intelligent assistant of ModelScope Community based
on the ModelScope-Agent framework, which is able to connect open-source LLMs
with more than 1000 public AI models and localized community knowledge in
ModelScope. The ModelScope-Agent library
(https://github.com/modelscope/modelscope-agent) and online demo
(https://modelscope.cn/studios/damo/ModelScopeGPT/summary) are now publicly
available.
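The register/retrieve/invoke pattern the abstract describes can be sketched generically. This is NOT the modelscope-agent library's actual API; the class, method names, and naive word-overlap retriever below are invented for illustration, where a real system would use an embedding-based tool retriever.

```python
# Sketch: generic tool registration, retrieval, and invocation for an
# LLM-as-controller agent. All names are hypothetical; see the linked
# repository for the framework's real interfaces.
from typing import Callable, Dict, List

class ToolRegistry:
    def __init__(self):
        self._tools: Dict[str, dict] = {}

    def register(self, name: str, description: str, fn: Callable[[str], str]):
        self._tools[name] = {"description": description, "fn": fn}

    def retrieve(self, query: str, k: int = 3) -> List[str]:
        """Naive retrieval: rank tools by word overlap with the query."""
        q = set(query.lower().split())
        scored = sorted(
            self._tools,
            key=lambda n: -len(q & set(self._tools[n]["description"].lower().split())),
        )
        return scored[:k]

    def invoke(self, name: str, arg: str) -> str:
        return self._tools[name]["fn"](arg)

registry = ToolRegistry()
registry.register("translate", "translate text between languages",
                  lambda s: f"<translated {s}>")
registry.register("tts", "synthesize speech audio from text",
                  lambda s: f"<audio for {s}>")
candidates = registry.retrieve("please translate this sentence to English")
print(candidates, registry.invoke(candidates[0], "ni hao"))
```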
Construction and Applications of Billion-Scale Pre-trained Multimodal Business Knowledge Graph
Business Knowledge Graphs (KGs) are important to many enterprises today,
providing factual knowledge and structured data that steer many products and
make them more intelligent. Despite their promising benefits, building a
business KG requires overcoming prohibitive issues of deficient structure and
multiple modalities. In this paper, we advance the understanding of the practical
challenges related to building KG in non-trivial real-world systems. We
introduce the process of building an open business knowledge graph (OpenBG)
derived from a well-known enterprise, Alibaba Group. Specifically, we define a
core ontology to cover various abstract products and consumption demands, with
fine-grained taxonomy and multimodal facts in deployed applications. OpenBG is
an open business KG of unprecedented scale: 2.6 billion triples with more than
88 million entities covering over 1 million core classes/concepts and 2,681
types of relations. We release all the open resources (OpenBG benchmarks)
derived from it for the community and report experimental results of KG-centric
tasks. We also ran an online competition based on the OpenBG benchmarks, which
attracted thousands of teams. We further pre-train OpenBG and apply it to many
KG-enhanced downstream tasks in business scenarios, demonstrating the
effectiveness of billion-scale multimodal knowledge for e-commerce. All the
resources and code have been released at
https://github.com/OpenBGBenchmark/OpenBG.
Comment: OpenBG. Work in progress.
mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality
Large language models (LLMs) have demonstrated impressive zero-shot abilities
on a variety of open-ended tasks, while recent research has also explored the
use of LLMs for multi-modal generation. In this study, we introduce mPLUG-Owl,
a novel training paradigm that equips LLMs with multi-modal abilities through
modularized learning of foundation LLM, a visual knowledge module, and a visual
abstractor module. This approach can support multiple modalities and facilitate
diverse unimodal and multimodal abilities through modality collaboration. The
training paradigm of mPLUG-Owl involves a two-stage method for aligning image
and text, which learns visual knowledge with the assistance of LLM while
maintaining and even improving the generation abilities of LLM. In the first
stage, the visual knowledge module and abstractor module are trained with a
frozen LLM module to align the image and text. In the second stage,
language-only and multi-modal supervised datasets are used to jointly fine-tune
a low-rank adaptation (LoRA) module on the LLM and the abstractor module while freezing
the visual knowledge module. We carefully build a visually-related instruction
evaluation set OwlEval. Experimental results show that our model outperforms
existing multi-modal models, demonstrating mPLUG-Owl's impressive instruction
and visual understanding ability, multi-turn conversation ability, and
knowledge reasoning ability. In addition, we observe some unexpected and
exciting abilities, such as multi-image correlation and scene text
understanding, which make it possible to leverage the model in harder
real-world scenarios, such as vision-only document comprehension. Our code,
pre-trained model, instruction-tuned models,
and evaluation set are available at https://github.com/X-PLUG/mPLUG-Owl. The
online demo is available at https://www.modelscope.cn/studios/damo/mPLUG-Owl.
Comment: Work in progress.
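The two-stage freezing schedule can be sketched in PyTorch terms: stage one trains the visual modules against a frozen LLM, and stage two freezes the visual knowledge module while tuning the LoRA adapter and abstractor. The module classes here are stand-ins; the linked repository contains the actual architecture and training code.

```python
# Sketch: mPLUG-Owl's two-stage modularized training as requires_grad
# toggles. All modules are stand-in Linear layers, not the real model.
import torch.nn as nn

class MultimodalModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.visual_knowledge = nn.Linear(768, 768)   # stand-in visual encoder
        self.abstractor = nn.Linear(768, 768)         # stand-in visual abstractor
        self.llm = nn.Linear(768, 768)                # stand-in foundation LLM
        self.lora = nn.Linear(768, 768, bias=False)   # stand-in LoRA adapter

def set_trainable(module: nn.Module, flag: bool):
    for p in module.parameters():
        p.requires_grad = flag

model = MultimodalModel()

# Stage 1: align image and text -- train visual modules, freeze the LLM.
set_trainable(model.visual_knowledge, True)
set_trainable(model.abstractor, True)
set_trainable(model.llm, False)
set_trainable(model.lora, False)

# Stage 2: joint instruction tuning -- freeze the visual knowledge module,
# train the LoRA adapter on the LLM plus the abstractor.
set_trainable(model.visual_knowledge, False)
set_trainable(model.abstractor, True)
set_trainable(model.lora, True)

print([n for n, p in model.named_parameters() if p.requires_grad])
```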
Direct enzymatic ethanolysis of potential Nannochloropsis biomass for co-production of sustainable biodiesel and nutraceutical eicosapentaenoic acid
Background: The marine microalga Nannochloropsis is a promising source for the production of renewable and sustainable biodiesel to replace depleting petroleum. Beyond biodiesel, Nannochloropsis is a green and promising resource for the commercial production of the nutraceutical eicosapentaenoic acid (EPA, C20:5). Recent studies have shown that low-value biodiesel can be obtained by transesterification of Nannochloropsis biomass. However, from both nutritional and economic standpoints, it is wasteful to produce microalgal biodiesel that contains EPA. A new strategy was therefore developed to produce low-value bulk biodiesel along with EPA enrichment via enzymatic ethanolysis of Nannochloropsis biomass with a specific lipase.
Results: Cellulase pretreatment of Nannochloropsis sp. biomass significantly improved biodiesel conversion by direct ethanolysis with five enzymes from Candida antarctica (CALA and CALB), Thermomyces lanuginosus (TL), Rhizomucor miehei (RM), and Aspergillus oryzae (PLA). Among these five biocatalysts, CALA was the most suitable enzyme for achieving high biodiesel conversion while effectively enriching EPA. After optimization, the maximum biodiesel conversion (46.53–48.57%) was attained by CALA at an 8:1 ethanol/biomass ratio (v/w), 10–15% water content, and 10% lipase weight at 35 °C for 72 h. Meanwhile, EPA (60.81%) was highly enriched in the microalgal NPLs (neutral lipids and polar lipids), a 1.51-fold increase over the original EPA level. This process was then re-evaluated with two Nannochloropsis species (IMET1 and Salina 537). Under the optimized conditions, the biodiesel conversions of IMET1 and Salina 537 by CALA were 63.41% and 54.33%, respectively, and the EPA contents of the microalgal NPLs were 50.06% for IMET1 and 53.73% for Salina 537.
Conclusion: CALA is a promising biocatalyst for discriminating against EPA in the ethanolysis of Nannochloropsis biomass. Its biodiesel conversion and EPA-enrichment efficiency depend strongly on the lipid classes and fatty acid composition of the Nannochloropsis biomass. CALA-catalyzed ethanolysis of Nannochloropsis biomass is a promising approach for the co-production of low-value biodiesel and high-value EPA-rich microalgae products.
Rice Black-streaked Dwarf Virus Preparation and Infection on Rice
Rice black-streaked dwarf virus (RBSDV), a member of the genus Fijivirus in the family Reoviridae, infects rice, maize, barley, and wheat, and can seriously affect crop yields. RBSDV is transmitted by the small brown planthopper (Laodelphax striatellus, SBPH) in a persistent manner. RBSDV has 10 linear dsRNA genomic segments, making it difficult to construct infectious clones for functional studies in plants. Here we describe a method for inoculating and maintaining RBSDV on rice in a greenhouse for use in laboratory research. The protocol uses SBPHs mass-reared in the laboratory. We also describe in detail the propagation of a healthy planthopper population, the preparation of plant material, RBSDV inoculation, and the evaluation of rice plants after inoculation.
A Rice Receptor-like Protein Negatively Regulates Rice Resistance to Southern Rice Black-Streaked Dwarf Virus Infection
Plants rely on various receptor-like proteins and receptor-like kinases to recognize and defend against invading pathogens. However, research on the role of receptor-like proteins in plant antiviral defense, particularly in rice–virus interactions, is limited. In this study, we identified a receptor-like gene, OsBAP1, which was significantly induced upon southern rice black-streaked dwarf virus (SRBSDV) infection. A viral inoculation assay showed that the OsBAP1 knockout mutant exhibited enhanced resistance to SRBSDV infection, indicating that OsBAP1 negatively regulates rice resistance to viral infection. Transcriptome analysis revealed that genes involved in plant–pathogen interactions, plant hormone signal transduction, oxidation–reduction reactions, and protein phosphorylation pathways were significantly enriched in the OsBAP1 knockout mutant plants (osbap1-cas). Quantitative real-time PCR (RT-qPCR) analysis further demonstrated that some defense-related genes were significantly induced during SRBSDV infection in osbap1-cas mutants. Our findings provide new insights into the role of receptor-like proteins in plant immune signaling pathways and demonstrate that OsBAP1 negatively regulates rice resistance to SRBSDV infection.
Genome-Wide Identification and Gene Expression Analysis of the OTU DUB Family in Oryza sativa
Ovarian tumor domain (OTU)-containing deubiquitinating enzymes (DUBs) are essential DUBs that maintain protein stability in plants and play important roles in plant growth, development, and stress response. However, there has been little genome-wide identification and analysis of the OTU gene family in rice. In this study, we identified 20 OTU family genes in the rice genome and classified them into four groups based on phylogenetic analysis. Their gene structures, conserved motifs and domains, chromosomal distribution, and cis elements in promoters were further studied. In addition, OTU gene expression patterns in response to plant hormone treatments, including SA, MeJA, NAA, BL, and ABA, were investigated by RT-qPCR analysis. The results showed that OsOTU genes exhibited hormone-specific expression profiles. Expression levels of most rice OTU genes were significantly changed in response to rice stripe virus (RSV), rice black-streaked dwarf virus (RBSDV), southern rice black-streaked dwarf virus (SRBSDV), and rice stripe mosaic virus (RSMV). These results suggest that the rice OTU genes are involved in diverse hormone signaling pathways and varied responses to virus infection, providing new insights for further functional study of OsOTU genes.