Physicochemical Attributes of Nano-scale V2O5/TiO2 Catalyst and Its Effect on Soot Oxidation
V2O5 catalysts supported on nano-scale TiO2 with varying vanadium contents (5%, 10%, 20% and 40%) were prepared by an incipient-wetness impregnation method. The phase structures of the nano-scale V2O5/TiO2 catalysts at different loading rates were characterized by scanning electron microscopy (SEM), X-ray diffraction (XRD) and Fourier transform infrared (FT-IR) spectroscopy. The oxidation activities of the catalysts toward diesel soot were measured in a thermogravimetric analysis (TGA) system, and the kinetics of the catalytic oxidation process were analyzed with the Flynn-Wall-Ozawa method. The characterization results show that the phase structure of V2O5 supported on TiO2 depends heavily on the vanadium content, which in turn strongly affects the catalytic performance for soot oxidation. At low vanadium loading rates (V5-V20), the active species exist in monomeric and polymeric states; at a high loading rate (V40), crystalline bulk V2O5 covers the TiO2 surface, and the resulting crystal structure occupies the active sites and reduces the catalytic effect. Comparing the characteristic temperatures of soot oxidation over the V2O5 catalysts, the catalytic activities at the different loading rates rank as V5 < V10 < V40 < V20. Pyrolysis kinetics analysis reveals that the activation energy of soot oxidation is lowest at a vanadium loading rate of 20%, in good agreement with the TG experimental results. This consistency confirms that V20, the catalyst closest to the monolayer-dispersion saturation state of V2O5/TiO2, is the most active of the catalysts discussed in this paper, and it convincingly demonstrates the threshold effect in V2O5 catalysts.
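As a worked illustration of the Flynn-Wall-Ozawa analysis named above, the sketch below fits ln(beta) against 1/T at a fixed conversion across several heating rates; the slope gives the apparent activation energy. The heating rates and temperatures are made-up placeholders, not data from the paper.

```python
import numpy as np

R = 8.314  # gas constant, J/(mol*K)

# Hypothetical TGA runs: heating rates (K/min) and the temperature (K)
# at which a fixed conversion alpha is reached in each run.
beta = np.array([5.0, 10.0, 20.0, 40.0])          # heating rates
T_alpha = np.array([750.0, 768.0, 787.0, 807.0])  # illustrative values only

# FWO (Doyle approximation): ln(beta) = const - 1.052 * Ea / (R * T_alpha),
# so a linear fit of ln(beta) vs 1/T_alpha has slope = -1.052 * Ea / R.
slope, intercept = np.polyfit(1.0 / T_alpha, np.log(beta), 1)
Ea = -slope * R / 1.052  # apparent activation energy, J/mol

print(f"Apparent activation energy: {Ea / 1000:.1f} kJ/mol")
```

Repeating the fit at several conversions yields the activation-energy profile used to compare the catalysts.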
Analysis of High Frequency Noise of Inverter Rotary Compressor
An inverter-driven compressor generates high-frequency noise, which adversely affects both the overall noise level and the sound quality. To address this problem, an existing compact rotary inverter compressor is studied in this paper. The influence of the inverter carrier wave of the space vector pulse width modulation (SVPWM) technique on motor vibration and compressor noise is analyzed and summarized. Combining order analysis with motor modal analysis, the results show that the high-order harmonic current induced by the inverter carrier wave produces a high-frequency electromagnetic force that excites stator resonance, which in turn causes the high-frequency noise of the compressor. Through optimization of the motor structure, the high-frequency noise is reduced by more than 5 dB(A), and the sound quality is improved as well.
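A minimal sketch of why a PWM carrier creates high-frequency excitation: compare a sinusoidal reference against a triangular carrier (plain sine-triangle PWM here, a simplification of SVPWM) and inspect the spectrum of the switched output, where sidebands cluster around the carrier frequency. All frequencies are illustrative, not the compressor's.

```python
import numpy as np

fs = 200_000   # sample rate, Hz
f0 = 50.0      # fundamental (motor) frequency, Hz
fc = 5_000.0   # carrier (switching) frequency, Hz
t = np.arange(0, 0.2, 1 / fs)

reference = 0.8 * np.sin(2 * np.pi * f0 * t)
# Triangular carrier in [-1, 1]
carrier = 2 * np.abs(2 * ((fc * t) % 1) - 1) - 1
switched = np.where(reference > carrier, 1.0, -1.0)  # inverter leg output

spectrum = np.abs(np.fft.rfft(switched)) / len(t)
freqs = np.fft.rfftfreq(len(t), 1 / fs)

# Harmonic content near n*fc +/- k*f0 is the high-frequency current
# component that drives the electromagnetic force on the stator.
band = (freqs > fc - 500) & (freqs < fc + 500)
print("dominant components near the carrier (Hz):",
      np.sort(freqs[band][np.argsort(spectrum[band])[-5:]]))
```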
AlpaCare: Instruction-tuned Large Language Models for Medical Application
Large Language Models (LLMs) have demonstrated significant enhancements in
instruction-following abilities through instruction tuning, achieving notable
performance across various tasks. Previous research has focused on fine-tuning
medical domain-specific LLMs using an extensive array of medical-specific data,
incorporating millions of pieces of biomedical literature to augment their
medical capabilities. However, existing medical instruction-tuned LLMs have
been constrained by the limited scope of tasks and instructions available,
restricting the efficacy of instruction tuning and adversely affecting
performance in the general domain. In this paper, we fine-tune LLaMA-series
models on MedInstruct-52k, a diverse, machine-generated dataset of 52k medical
instruction-following examples, resulting in the model AlpaCare. Comprehensive
experimental results on both general and medical-specific domain free-form
instruction evaluations showcase AlpaCare's strong medical proficiency and
generalizability compared to previous instruction-tuned models in both medical
and general domains. We provide public access to our MedInstruct-52k dataset
and a clinician-crafted free-form instruction test set, MedInstruct-test, along
with our codebase, to foster further research and development. Our project page
is available at https://github.com/XZhang97666/AlpaCare
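A minimal sketch of the kind of supervised instruction tuning described above, masking the prompt so the loss falls only on the response. The base model, prompt template, and toy example are assumptions for illustration, not AlpaCare's released recipe.

```python
import torch
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "gpt2"  # small stand-in; the paper tunes LLaMA-series models
tok = AutoTokenizer.from_pretrained(model_name)
tok.pad_token = tok.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

examples = [  # toy records in the usual instruction/response format
    {"instruction": "List two common symptoms of dehydration.",
     "response": "Thirst and dark-colored urine."},
]

def encode(ex):
    prompt = f"### Instruction:\n{ex['instruction']}\n\n### Response:\n"
    full = prompt + ex["response"] + tok.eos_token
    ids = tok(full, truncation=True, max_length=512)["input_ids"]
    labels = list(ids)
    n_prompt = len(tok(prompt)["input_ids"])
    labels[:n_prompt] = [-100] * n_prompt  # compute loss on the response only
    return {"input_ids": ids, "labels": labels,
            "attention_mask": [1] * len(ids)}

dataset = [encode(ex) for ex in examples]

def collate(batch):
    # Pad every field to the longest sequence in the batch.
    maxlen = max(len(b["input_ids"]) for b in batch)
    pad = lambda seq, val: seq + [val] * (maxlen - len(seq))
    return {
        "input_ids": torch.tensor([pad(b["input_ids"], tok.pad_token_id) for b in batch]),
        "labels": torch.tensor([pad(b["labels"], -100) for b in batch]),
        "attention_mask": torch.tensor([pad(b["attention_mask"], 0) for b in batch]),
    }

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="sft-out", num_train_epochs=1,
                           per_device_train_batch_size=1),
    train_dataset=dataset,
    data_collator=collate,
)
trainer.train()
```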
Reflection-Tuning: Data Recycling Improves LLM Instruction-Tuning
Recent advancements in Large Language Models (LLMs) have expanded the
horizons of natural language understanding and generation. Notably, the output
control and alignment with the input of LLMs can be refined through instruction
tuning. However, as highlighted in several studies, low-quality data in the
training set are usually detrimental to instruction tuning, resulting in
inconsistent or even misleading LLM outputs. We propose a novel method, termed
"reflection-tuning," which addresses the problem by leveraging the
self-improvement and judging capabilities of LLMs. This approach utilizes an
oracle LLM to recycle
the original training data by introspecting and enhancing the quality of
instructions and responses in the data. Extensive experiments on widely used
evaluation benchmarks show that LLMs trained with our recycled data outperform
those trained with existing datasets.
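A minimal sketch of the recycling idea, assuming access to an oracle LLM: critique each instruction-response pair, then rewrite it under the critique. The `oracle_chat` stub and both prompts are hypothetical stand-ins, not the authors' exact procedure.

```python
def oracle_chat(prompt: str) -> str:
    """Placeholder for a call to an oracle LLM (e.g., an API client)."""
    raise NotImplementedError("wire up your LLM client here")

def recycle_pair(instruction: str, response: str) -> tuple[str, str]:
    # Step 1: the oracle introspects on the pair's quality.
    critique = oracle_chat(
        "Critique the following instruction-response pair for clarity, "
        "complexity, and answer correctness.\n"
        f"Instruction: {instruction}\nResponse: {response}"
    )
    # Step 2: the oracle rewrites the pair guided by its own critique.
    improved = oracle_chat(
        "Using this critique, rewrite the pair. Return the new instruction "
        "on the first line and the new response after it.\n"
        f"Critique: {critique}\n"
        f"Instruction: {instruction}\nResponse: {response}"
    )
    new_instruction, _, new_response = improved.partition("\n")
    return new_instruction.strip(), new_response.strip()

# recycled = [recycle_pair(ex["instruction"], ex["response"]) for ex in data]
```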
From Quantity to Quality: Boosting LLM Performance with Self-Guided Data Selection for Instruction Tuning
In the realm of Large Language Models, the balance between instruction data
quality and quantity has become a focal point. Recognizing this, we introduce a
self-guided methodology for LLMs to autonomously discern and select cherry
samples from vast open-source datasets, effectively minimizing manual curation
and the potential cost of instruction tuning an LLM. Our key innovation, the
Instruction-Following Difficulty (IFD) metric, emerges as a pivotal tool to
identify discrepancies between a model's expected responses and its autonomous
generation prowess. Through the adept application of IFD, cherry samples are
pinpointed, leading to a marked uptick in model training efficiency. Empirical
validations on renowned datasets like Alpaca and WizardLM underpin our
findings; with a mere 10% of conventional data input, our strategy showcases
improved results. This synthesis of self-guided cherry-picking and the IFD
metric signifies a transformative leap in the optimization of LLMs, promising
both efficiency and resource-conscious advancements. Codes, data, and models
are available: https://github.com/MingLiiii/Cherry_LL
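A minimal sketch of an IFD-style score: the ratio of the model's loss on the answer when conditioned on the instruction to its loss on the answer alone. A ratio near 1 means the instruction barely helps, marking a harder sample. The base model and prompt joining are assumptions for illustration.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def answer_loss(prefix: str, answer: str) -> float:
    """Average cross-entropy over the answer tokens given a prefix."""
    prefix_ids = tok(prefix, return_tensors="pt")["input_ids"]
    full_ids = tok(prefix + answer, return_tensors="pt")["input_ids"]
    labels = full_ids.clone()
    labels[:, : prefix_ids.shape[1]] = -100  # score only the answer span
    with torch.no_grad():
        return model(full_ids, labels=labels).loss.item()

def ifd(instruction: str, answer: str) -> float:
    conditioned = answer_loss(instruction + "\n", answer)  # s(A|Q)
    direct = answer_loss("", answer)                       # s(A)
    return conditioned / direct

print(ifd("Name the capital of France.", " Paris."))
```

Ranking a dataset by this score and keeping the high-IFD "cherry" fraction is the selection strategy the abstract describes.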
Virtual Prompt Injection for Instruction-Tuned Large Language Models
We present Virtual Prompt Injection (VPI) for instruction-tuned Large
Language Models (LLMs). VPI allows an attacker-specified virtual prompt to
steer the model's behavior under a specific trigger scenario without any
explicit injection into the model's input. For instance, if an LLM is
compromised with the
virtual prompt "Describe Joe Biden negatively." for Joe Biden-related
instructions, then any service deploying this model will propagate biased views
when handling user queries related to Joe Biden. VPI is especially harmful for
two primary reasons. Firstly, the attacker can take fine-grained control over
LLM behaviors by defining various virtual prompts, exploiting LLMs' proficiency
in following instructions. Secondly, this control is achieved without any
interaction from the attacker while the model is in service, leading to a
persistent attack. To demonstrate the threat, we propose a simple method for
performing VPI by poisoning the model's instruction tuning data. We find that
our proposed method is highly effective in steering the LLM with VPI. For
example, by injecting only 52 poisoned examples (0.1% of the training data
size) into the instruction tuning data, the percentage of negative responses
given by the trained model on Joe Biden-related queries changes from 0% to 40%.
We thus highlight the necessity of ensuring the integrity of the
instruction-tuning data, as even a small amount of poisoned data can cause stealthy and
persistent harm to the deployed model. We further explore the possible defenses
and identify data filtering as an effective way to defend against the poisoning
attacks. Our project page is available at https://poison-llm.github.io
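A minimal sketch of the data-filtering defense direction mentioned above: score each instruction-response pair and drop low-scoring pairs before tuning. The `quality_score` stub and threshold are hypothetical stand-ins; the paper's exact filtering procedure may differ.

```python
def quality_score(instruction: str, response: str) -> float:
    """Placeholder: score how well the response fits the instruction,
    e.g., with a judge LLM or a reward model."""
    raise NotImplementedError("plug in a judge model here")

def filter_dataset(pairs, threshold=0.5):
    kept = []
    for instruction, response in pairs:
        # Poisoned responses tend to score poorly against the stated
        # instruction (e.g., unprompted negative sentiment), so drop
        # low-scoring pairs before instruction tuning.
        if quality_score(instruction, response) >= threshold:
            kept.append((instruction, response))
    return kept
```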
Comprehensive evaluation of deep and graph learning on drug-drug interactions prediction
Recent advances and achievements of artificial intelligence (AI) as well as
deep and graph learning models have established their usefulness in biomedical
applications, especially in drug-drug interactions (DDIs). A DDI is a change
in the effect of one drug due to the presence of another drug in the human
body, and DDI prediction plays an essential role in drug discovery and clinical research.
DDIs prediction through traditional clinical trials and experiments is an
expensive and time-consuming process. To apply advanced AI and deep learning
correctly, developers and users face various challenges, such as the
availability and encoding of data resources and the design of computational
methods. This review summarizes chemical structure based, network based, NLP
based and hybrid methods, providing an updated and accessible guide to the
broad research and development community across domains of expertise. We
introduce widely used molecular representations and describe the theoretical
frameworks of graph neural network models for representing molecular
structures. We present the advantages and disadvantages of deep and graph
learning methods by performing comparative experiments. We discuss the
potential technical challenges and highlight future directions of deep and
graph learning models for accelerating DDIs prediction.
Comment: Accepted by Briefings in Bioinformatics
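A minimal sketch of one message-passing (GCN-style) layer over a toy molecular graph, the core operation behind the graph neural network models surveyed here. The three-atom "molecule" and its features are illustrative only.

```python
import numpy as np

# Adjacency for a 3-atom chain (atoms 0-1-2 bonded).
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
X = np.random.default_rng(0).normal(size=(3, 4))  # per-atom features
W = np.random.default_rng(1).normal(size=(4, 4))  # learnable weights

# Symmetric normalization with self-loops: D^{-1/2} (A + I) D^{-1/2}.
A_hat = A + np.eye(A.shape[0])
d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

# One layer: each atom aggregates its neighbors' features, then a shared
# linear map and nonlinearity produce updated atom embeddings.
H = np.maximum(A_norm @ X @ W, 0.0)  # ReLU
molecule_embedding = H.mean(axis=0)  # mean-pool readout for DDI tasks
print(molecule_embedding)
```

Pairing two such molecule embeddings and feeding them to a classifier is the typical setup for DDI prediction.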
Skywork: A More Open Bilingual Foundation Model
In this technical report, we present Skywork-13B, a family of large language
models (LLMs) trained on a corpus of over 3.2 trillion tokens drawn from both
English and Chinese texts. This bilingual foundation model is the most
extensively trained and openly published LLM of its size to date. We
introduce a two-stage training methodology using a segmented corpus, targeting
general purpose training and then domain-specific enhancement training,
respectively. We show that our model not only excels on popular benchmarks, but
also achieves state-of-the-art performance in Chinese language modeling
on diverse domains. Furthermore, we propose a novel leakage detection method,
demonstrating that test data contamination is a pressing issue warranting
further investigation by the LLM community. To spur future research, we release
Skywork-13B along with checkpoints obtained during intermediate stages of the
training process. We are also releasing part of our SkyPile corpus, a
collection of over 150 billion tokens of web text, which is the largest high
quality open Chinese pre-training corpus to date. We hope Skywork-13B and our
open corpus will serve as a valuable open-source resource for democratizing
access to high-quality LLMs.
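A minimal sketch of a loss-based contamination check in the spirit of the leakage detection discussed above: if a model's loss on official benchmark items is markedly lower than on freshly authored look-alike items from the same distribution, the test set may have leaked into pre-training. The loss values and margin are illustrative; this is the general idea, not necessarily the paper's exact procedure.

```python
import numpy as np

def leakage_suspected(test_losses, reference_losses, margin=0.1):
    """Flag contamination when mean test loss undercuts reference loss
    by more than `margin` nats per token."""
    gap = np.mean(reference_losses) - np.mean(test_losses)
    return gap > margin

# Hypothetical per-example average losses (nats/token) from the same LM.
test = np.array([2.1, 2.0, 1.9, 2.2])       # official benchmark items
reference = np.array([2.6, 2.5, 2.7, 2.4])  # newly authored look-alikes
print(leakage_suspected(test, reference))   # True -> investigate further
```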