Physicochemical Attributes of Nano-scale V2O5/TiO2 Catalyst and Its Effect on Soot Oxidation
V2O5 catalysts supported on nano-scale TiO2 with varying vanadium contents (5%, 10%, 20% and 40%) were prepared by an incipient-wetness impregnation method. The phase structures of the nano-scale V2O5/TiO2 catalysts at different loading rates were characterized by scanning electron microscopy (SEM), X-ray diffraction (XRD) and Fourier transform infrared (FT-IR) spectroscopy. The oxidation activities of the catalysts toward diesel soot were measured in a thermogravimetric analysis (TGA) system, and the kinetics of the catalytic oxidation process were analyzed with the Flynn-Wall-Ozawa method. The characterization results show that the phase structure of V2O5 supported on TiO2 depends heavily on the vanadium content, which in turn strongly affects the catalytic performance for soot oxidation. At low vanadium loading rates (V5-V20), the active species exist in monomeric and polymeric states; at a high loading rate (V40), crystalline bulk V2O5 covers the TiO2 surface, and the resulting crystal structure occupies the active sites and reduces the catalytic effect. Comparing the characteristic temperatures of soot oxidation over the V2O5 catalysts, the catalytic activities at the different loading rates rank as V5 < V10 < V40 < V20. Pyrolysis kinetics analysis reveals that the activation energy of soot oxidation is lowest at a vanadium loading rate of 20%, in good agreement with the TG experimental results. This consistency confirms that V20, the catalyst closest to the monolayer-dispersion saturation state of V2O5/TiO2, is the most active of the catalysts discussed in this paper, and it convincingly demonstrates the threshold effect in V2O5 catalysts.
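As a worked illustration of the Flynn-Wall-Ozawa analysis named above, the sketch below fits ln(beta) against 1/T at a fixed conversion across several heating rates; the slope gives the apparent activation energy. The heating rates and temperatures are made-up placeholders, not data from the paper.

```python
import numpy as np

R = 8.314  # gas constant, J/(mol*K)

# Hypothetical TGA runs: heating rates (K/min) and the temperature (K)
# at which a fixed conversion alpha is reached in each run.
beta = np.array([5.0, 10.0, 20.0, 40.0])          # heating rates
T_alpha = np.array([750.0, 768.0, 787.0, 807.0])  # illustrative values only

# FWO (Doyle approximation): ln(beta) = const - 1.052 * Ea / (R * T_alpha),
# so a linear fit of ln(beta) vs 1/T_alpha has slope = -1.052 * Ea / R.
slope, intercept = np.polyfit(1.0 / T_alpha, np.log(beta), 1)
Ea = -slope * R / 1.052  # apparent activation energy, J/mol

print(f"Apparent activation energy: {Ea / 1000:.1f} kJ/mol")
```

Repeating the fit at several conversions yields the activation-energy profile used to compare the catalysts.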
Analysis of High Frequency Noise of Inverter Rotary Compressor
An inverter-driven compressor generates high-frequency noise, which adversely affects both the overall noise level and the sound quality. To address this problem, an existing compact rotary inverter compressor is studied in this paper. The influence of the inverter carrier wave of the space vector pulse width modulation (SVPWM) technique on motor vibration and compressor noise is analyzed and summarized. Combining order analysis with motor modal analysis, the results show that the high-order harmonic current induced by the inverter carrier wave produces a high-frequency electromagnetic force that excites stator resonance, which in turn causes the high-frequency noise of the compressor. Through optimization of the motor structure, the high-frequency noise is reduced by more than 5 dB(A), and the sound quality is improved as well.
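A minimal sketch of why a PWM carrier creates high-frequency excitation: compare a sinusoidal reference against a triangular carrier (plain sine-triangle PWM here, a simplification of SVPWM) and inspect the spectrum of the switched output, where sidebands cluster around the carrier frequency. All frequencies are illustrative, not the compressor's.

```python
import numpy as np

fs = 200_000   # sample rate, Hz
f0 = 50.0      # fundamental (motor) frequency, Hz
fc = 5_000.0   # carrier (switching) frequency, Hz
t = np.arange(0, 0.2, 1 / fs)

reference = 0.8 * np.sin(2 * np.pi * f0 * t)
# Triangular carrier in [-1, 1]
carrier = 2 * np.abs(2 * ((fc * t) % 1) - 1) - 1
switched = np.where(reference > carrier, 1.0, -1.0)  # inverter leg output

spectrum = np.abs(np.fft.rfft(switched)) / len(t)
freqs = np.fft.rfftfreq(len(t), 1 / fs)

# Harmonic content near n*fc +/- k*f0 is the high-frequency current
# component that drives the electromagnetic force on the stator.
band = (freqs > fc - 500) & (freqs < fc + 500)
print("dominant components near the carrier (Hz):",
      np.sort(freqs[band][np.argsort(spectrum[band])[-5:]]))
```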
AlpaCare: Instruction-tuned Large Language Models for Medical Application
Large Language Models (LLMs) have demonstrated significant enhancements in
instruction-following abilities through instruction tuning, achieving notable
performance across various tasks. Previous research has focused on fine-tuning
medical domain-specific LLMs using an extensive array of medical-specific data,
incorporating millions of pieces of biomedical literature to augment their
medical capabilities. However, existing medical instruction-tuned LLMs have
been constrained by the limited scope of tasks and instructions available,
restricting the efficacy of instruction tuning and adversely affecting
performance in the general domain. In this paper, we fine-tune LLaMA-series
models on MedInstruct-52k, a diverse, machine-generated dataset of 52k medical
instruction-following examples, resulting in the model AlpaCare. Comprehensive
experimental results on both general and medical-specific domain free-form
instruction evaluations showcase AlpaCare's strong medical proficiency and
generalizability compared to previous instruction-tuned models in both medical
and general domains. We provide public access to our MedInstruct-52k dataset
and a clinician-crafted free-form instruction test set, MedInstruct-test, along
with our codebase, to foster further research and development. Our project page
is available at https://github.com/XZhang97666/AlpaCare
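A minimal sketch of the kind of supervised instruction tuning described above, masking the prompt so the loss falls only on the response. The base model, prompt template, and toy example are assumptions for illustration, not AlpaCare's released recipe.

```python
import torch
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "gpt2"  # small stand-in; the paper tunes LLaMA-series models
tok = AutoTokenizer.from_pretrained(model_name)
tok.pad_token = tok.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

examples = [  # toy records in the usual instruction/response format
    {"instruction": "List two common symptoms of dehydration.",
     "response": "Thirst and dark-colored urine."},
]

def encode(ex):
    prompt = f"### Instruction:\n{ex['instruction']}\n\n### Response:\n"
    full = prompt + ex["response"] + tok.eos_token
    ids = tok(full, truncation=True, max_length=512)["input_ids"]
    labels = list(ids)
    n_prompt = len(tok(prompt)["input_ids"])
    labels[:n_prompt] = [-100] * n_prompt  # compute loss on the response only
    return {"input_ids": ids, "labels": labels,
            "attention_mask": [1] * len(ids)}

dataset = [encode(ex) for ex in examples]

def collate(batch):
    # Pad every field to the longest sequence in the batch.
    maxlen = max(len(b["input_ids"]) for b in batch)
    pad = lambda seq, val: seq + [val] * (maxlen - len(seq))
    return {
        "input_ids": torch.tensor([pad(b["input_ids"], tok.pad_token_id) for b in batch]),
        "labels": torch.tensor([pad(b["labels"], -100) for b in batch]),
        "attention_mask": torch.tensor([pad(b["attention_mask"], 0) for b in batch]),
    }

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="sft-out", num_train_epochs=1,
                           per_device_train_batch_size=1),
    train_dataset=dataset,
    data_collator=collate,
)
trainer.train()
```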
Reflection-Tuning: Data Recycling Improves LLM Instruction-Tuning
Recent advancements in Large Language Models (LLMs) have expanded the
horizons of natural language understanding and generation. Notably, the output
control and alignment with the input of LLMs can be refined through instruction
tuning. However, as highlighted in several studies, low-quality data in the
training set are usually detrimental to instruction tuning, resulting in
inconsistent or even misleading LLM outputs. We propose a novel method, termed
"reflection-tuning," which addresses the problem by leveraging the
self-improvement and judging capabilities of LLMs. This approach utilizes an
oracle LLM to recycle
the original training data by introspecting and enhancing the quality of
instructions and responses in the data. Extensive experiments on widely used
evaluation benchmarks show that LLMs trained with our recycled data outperform
those trained with existing datasets.
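A minimal sketch of the recycling idea, assuming access to an oracle LLM: critique each instruction-response pair, then rewrite it under the critique. The `oracle_chat` stub and both prompts are hypothetical stand-ins, not the authors' exact procedure.

```python
def oracle_chat(prompt: str) -> str:
    """Placeholder for a call to an oracle LLM (e.g., an API client)."""
    raise NotImplementedError("wire up your LLM client here")

def recycle_pair(instruction: str, response: str) -> tuple[str, str]:
    # Step 1: the oracle introspects on the pair's quality.
    critique = oracle_chat(
        "Critique the following instruction-response pair for clarity, "
        "complexity, and answer correctness.\n"
        f"Instruction: {instruction}\nResponse: {response}"
    )
    # Step 2: the oracle rewrites the pair guided by its own critique.
    improved = oracle_chat(
        "Using this critique, rewrite the pair. Return the new instruction "
        "on the first line and the new response after it.\n"
        f"Critique: {critique}\n"
        f"Instruction: {instruction}\nResponse: {response}"
    )
    new_instruction, _, new_response = improved.partition("\n")
    return new_instruction.strip(), new_response.strip()

# recycled = [recycle_pair(ex["instruction"], ex["response"]) for ex in data]
```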
From Quantity to Quality: Boosting LLM Performance with Self-Guided Data Selection for Instruction Tuning
In the realm of Large Language Models, the balance between instruction data
quality and quantity has become a focal point. Recognizing this, we introduce a
self-guided methodology for LLMs to autonomously discern and select cherry
samples from vast open-source datasets, effectively minimizing manual curation
and the potential cost of instruction tuning an LLM. Our key innovation, the
Instruction-Following Difficulty (IFD) metric, emerges as a pivotal tool to
identify discrepancies between a model's expected responses and its autonomous
generation prowess. Through the adept application of IFD, cherry samples are
pinpointed, leading to a marked uptick in model training efficiency. Empirical
validations on renowned datasets like Alpaca and WizardLM underpin our
findings; with a mere 10% of conventional data input, our strategy showcases
improved results. This synthesis of self-guided cherry-picking and the IFD
metric signifies a transformative leap in the optimization of LLMs, promising
both efficiency and resource-conscious advancements. Codes, data, and models
are available: https://github.com/MingLiiii/Cherry_LL
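A minimal sketch of an IFD-style score: the ratio of the model's loss on the answer when conditioned on the instruction to its loss on the answer alone. A ratio near 1 means the instruction barely helps, marking a harder sample. The base model and prompt joining are assumptions for illustration.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def answer_loss(prefix: str, answer: str) -> float:
    """Average cross-entropy over the answer tokens given a prefix."""
    prefix_ids = tok(prefix, return_tensors="pt")["input_ids"]
    full_ids = tok(prefix + answer, return_tensors="pt")["input_ids"]
    labels = full_ids.clone()
    labels[:, : prefix_ids.shape[1]] = -100  # score only the answer span
    with torch.no_grad():
        return model(full_ids, labels=labels).loss.item()

def ifd(instruction: str, answer: str) -> float:
    conditioned = answer_loss(instruction + "\n", answer)  # s(A|Q)
    direct = answer_loss("", answer)                       # s(A)
    return conditioned / direct

print(ifd("Name the capital of France.", " Paris."))
```

Ranking a dataset by this score and keeping the high-IFD "cherry" fraction is the selection strategy the abstract describes.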
Virtual Prompt Injection for Instruction-Tuned Large Language Models
We present Virtual Prompt Injection (VPI) for instruction-tuned Large
Language Models (LLMs). VPI allows an attacker-specified virtual prompt to
steer the model's behavior under a specific trigger scenario without any
explicit injection into the model's input. For instance, if an LLM is
compromised with the
virtual prompt "Describe Joe Biden negatively." for Joe Biden-related
instructions, then any service deploying this model will propagate biased views
when handling user queries related to Joe Biden. VPI is especially harmful for
two primary reasons. Firstly, the attacker can take fine-grained control over
LLM behaviors by defining various virtual prompts, exploiting LLMs' proficiency
in following instructions. Secondly, this control is achieved without any
interaction from the attacker while the model is in service, leading to a
persistent attack. To demonstrate the threat, we propose a simple method for
performing VPI by poisoning the model's instruction tuning data. We find that
our proposed method is highly effective in steering the LLM with VPI. For
example, by injecting only 52 poisoned examples (0.1% of the training data
size) into the instruction tuning data, the percentage of negative responses
given by the trained model on Joe Biden-related queries changes from 0% to 40%.
We thus highlight the necessity of ensuring the integrity of the
instruction-tuning data, as even a small amount of poisoned data can cause stealthy and
persistent harm to the deployed model. We further explore the possible defenses
and identify data filtering as an effective way to defend against the poisoning
attacks. Our project page is available at https://poison-llm.github.io
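A minimal sketch of the data-filtering defense direction mentioned above: score each instruction-response pair and drop low-scoring pairs before tuning. The `quality_score` stub and threshold are hypothetical stand-ins; the paper's exact filtering procedure may differ.

```python
def quality_score(instruction: str, response: str) -> float:
    """Placeholder: score how well the response fits the instruction,
    e.g., with a judge LLM or a reward model."""
    raise NotImplementedError("plug in a judge model here")

def filter_dataset(pairs, threshold=0.5):
    kept = []
    for instruction, response in pairs:
        # Poisoned responses tend to score poorly against the stated
        # instruction (e.g., unprompted negative sentiment), so drop
        # low-scoring pairs before instruction tuning.
        if quality_score(instruction, response) >= threshold:
            kept.append((instruction, response))
    return kept
```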
Comprehensive evaluation of deep and graph learning on drug-drug interactions prediction
Recent advances and achievements of artificial intelligence (AI) as well as
deep and graph learning models have established their usefulness in biomedical
applications, especially in drug-drug interactions (DDIs). A DDI is a change
in the effect of one drug due to the presence of another drug in the human
body, and DDI prediction plays an essential role in drug discovery and clinical research.
DDIs prediction through traditional clinical trials and experiments is an
expensive and time-consuming process. To apply advanced AI and deep learning
correctly, developers and users face various challenges, such as the
availability and encoding of data resources and the design of computational
methods. This review summarizes chemical structure based, network based, NLP
based and hybrid methods, providing an updated and accessible guide to the
broad research and development community across domains of expertise. We
introduce widely used molecular representations and describe the theoretical
frameworks of graph neural network models for representing molecular
structures. We present the advantages and disadvantages of deep and graph
learning methods by performing comparative experiments. We discuss the
potential technical challenges and highlight future directions of deep and
graph learning models for accelerating DDIs prediction.
Comment: Accepted by Briefings in Bioinformatics
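A minimal sketch of one message-passing (GCN-style) layer over a toy molecular graph, the core operation behind the graph neural network models surveyed here. The three-atom "molecule" and its features are illustrative only.

```python
import numpy as np

# Adjacency for a 3-atom chain (atoms 0-1-2 bonded).
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
X = np.random.default_rng(0).normal(size=(3, 4))  # per-atom features
W = np.random.default_rng(1).normal(size=(4, 4))  # learnable weights

# Symmetric normalization with self-loops: D^{-1/2} (A + I) D^{-1/2}.
A_hat = A + np.eye(A.shape[0])
d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

# One layer: each atom aggregates its neighbors' features, then a shared
# linear map and nonlinearity produce updated atom embeddings.
H = np.maximum(A_norm @ X @ W, 0.0)  # ReLU
molecule_embedding = H.mean(axis=0)  # mean-pool readout for DDI tasks
print(molecule_embedding)
```

Pairing two such molecule embeddings and feeding them to a classifier is the typical setup for DDI prediction.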
Skywork: A More Open Bilingual Foundation Model
In this technical report, we present Skywork-13B, a family of large language
models (LLMs) trained on a corpus of over 3.2 trillion tokens drawn from both
English and Chinese texts. This bilingual foundation model is the most
extensively trained and openly published LLM of its size to date. We
introduce a two-stage training methodology using a segmented corpus, targeting
general purpose training and then domain-specific enhancement training,
respectively. We show that our model not only excels on popular benchmarks, but
also achieves state-of-the-art performance in Chinese language modeling
on diverse domains. Furthermore, we propose a novel leakage detection method,
demonstrating that test data contamination is a pressing issue warranting
further investigation by the LLM community. To spur future research, we release
Skywork-13B along with checkpoints obtained during intermediate stages of the
training process. We are also releasing part of our SkyPile corpus, a
collection of over 150 billion tokens of web text, which is the largest high
quality open Chinese pre-training corpus to date. We hope Skywork-13B and our
open corpus will serve as a valuable open-source resource for democratizing
access to high-quality LLMs.
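A minimal sketch of a loss-based contamination check in the spirit of the leakage detection discussed above: if a model's loss on official benchmark items is markedly lower than on freshly authored look-alike items from the same distribution, the test set may have leaked into pre-training. The loss values and margin are illustrative; this is the general idea, not necessarily the paper's exact procedure.

```python
import numpy as np

def leakage_suspected(test_losses, reference_losses, margin=0.1):
    """Flag contamination when mean test loss undercuts reference loss
    by more than `margin` nats per token."""
    gap = np.mean(reference_losses) - np.mean(test_losses)
    return gap > margin

# Hypothetical per-example average losses (nats/token) from the same LM.
test = np.array([2.1, 2.0, 1.9, 2.2])       # official benchmark items
reference = np.array([2.6, 2.5, 2.7, 2.4])  # newly authored look-alikes
print(leakage_suspected(test, reference))   # True -> investigate further
```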