MoDS: Model-oriented Data Selection for Instruction Tuning
Instruction tuning has become the de facto method to equip large language
models (LLMs) with the ability to follow user instructions. Usually,
hundreds of thousands or millions of instruction-following pairs are employed
to fine-tune the foundation LLMs. Recently, some studies have shown that a small
amount of high-quality instruction data is enough. However, how to select
appropriate instruction data for a given LLM is still an open problem. To
address this problem, in this paper we present a model-oriented data selection
(MoDS) approach, which selects instruction data based on a new criterion
considering three aspects: quality, coverage and necessity. First, our approach
utilizes a quality evaluation model to extract a high-quality subset from
the original instruction dataset, and then designs an algorithm to further
select from the high-quality subset a seed instruction dataset with good
coverage. The seed dataset is applied to fine-tune the foundation LLM to obtain
an initial instruction-following LLM. Finally, we develop a necessity
evaluation model to identify the instruction data on which the initial
instruction-following LLM performs poorly, and consider them necessary
instructions for further improving the LLM. In this way, we can obtain a small,
high-quality, broad-coverage and high-necessity subset from the original
instruction datasets. Experimental results show that the model fine-tuned with
4,000 instruction pairs selected by our approach performs better than the model
fine-tuned with the full original dataset, which includes 214k instruction data.
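The three-stage selection described above (quality filtering, coverage-oriented seed selection, necessity filtering) can be sketched as follows. This is a minimal sketch, not the authors' implementation: the abstract does not name the coverage algorithm, so greedy k-center selection over precomputed embeddings is an assumed stand-in, and the quality scores, embeddings, thresholds and the `response_score` function (a hypothetical stand-in for evaluating the initial fine-tuned LLM's answers) are all illustrative inputs.

```python
import math
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Example:
    instruction: str
    embedding: List[float]  # hypothetical precomputed embedding of the instruction
    quality: float          # hypothetical score from a quality evaluation model


def quality_filter(data: List[Example], threshold: float) -> List[Example]:
    """Stage 1: keep only examples whose quality score reaches the threshold."""
    return [ex for ex in data if ex.quality >= threshold]


def k_center_greedy(data: List[Example], k: int) -> List[Example]:
    """Stage 2 (assumed algorithm): repeatedly add the example farthest from
    the already-selected centers, so the seed set spreads over embedding space."""
    if not data or k <= 0:
        return []
    selected = [0]
    # Distance from each example to its nearest selected center.
    min_dist = [math.dist(ex.embedding, data[0].embedding) for ex in data]
    while len(selected) < min(k, len(data)):
        idx = max(range(len(data)), key=lambda i: min_dist[i])
        if min_dist[idx] == 0:  # every remaining example duplicates a center
            break
        selected.append(idx)
        for i, ex in enumerate(data):
            min_dist[i] = min(min_dist[i], math.dist(ex.embedding, data[idx].embedding))
    return [data[i] for i in selected]


def mods_select(
    data: List[Example],
    quality_threshold: float,
    k: int,
    response_score: Callable[[Example], float],
    necessity_threshold: float,
) -> List[Example]:
    """Return the seed set plus high-quality examples the initial model handles poorly."""
    high_quality = quality_filter(data, quality_threshold)
    seed = k_center_greedy(high_quality, k)
    # Stage 3: in MoDS the seed set first fine-tunes the foundation LLM;
    # `response_score` is a hypothetical stand-in for scoring that model's output.
    seed_ids = {id(ex) for ex in seed}
    necessary = [
        ex for ex in high_quality
        if id(ex) not in seed_ids and response_score(ex) < necessity_threshold
    ]
    return seed + necessary


if __name__ == "__main__":
    data = [
        Example("a", [0.0, 0.0], 0.9),
        Example("b", [0.1, 0.0], 0.8),
        Example("c", [5.0, 5.0], 0.95),
        Example("d", [9.0, 0.0], 0.3),  # dropped by the quality filter
        Example("e", [0.0, 9.0], 0.85),
    ]
    picked = mods_select(data, 0.5, 3, lambda ex: 0.2 if ex.instruction == "b" else 0.9, 0.5)
    print([ex.instruction for ex in picked])  # ['a', 'e', 'c', 'b']
```

In the toy run, "b" is left out of the seed set because it sits next to "a" in embedding space, but it re-enters through the necessity stage because the (hypothetical) initial model answers it poorly.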
ChineseWebText: Large-scale High-quality Chinese Web Text Extracted with Effective Evaluation Model
During the development of large language models (LLMs), the scale and quality
of the pre-training data play a crucial role in shaping LLMs' capabilities. To
accelerate the research of LLMs, several large-scale datasets, such as C4 [1],
Pile [2], RefinedWeb [3] and WanJuan [4], have been released to the public.
However, most of the released corpora focus mainly on English, and there is
still no complete tool-chain for extracting clean texts from web data.
Furthermore, fine-grained information about the corpus, e.g. the quality of each
text, is missing. To address these challenges, we propose in this paper a new
complete tool-chain, EvalWeb, to extract clean Chinese texts from noisy web data.
First, similar to previous work, manually crafted rules are employed to discard
explicitly noisy texts from the raw crawled web contents. Second, a well-designed
evaluation model is leveraged to assess the remaining relatively clean data,
and each text is assigned a specific quality score. Finally, we can easily
utilize an appropriate threshold to select the high-quality pre-training data
for Chinese. Using our proposed approach, we release ChineseWebText, the largest
and latest large-scale high-quality Chinese web text dataset. It consists of
1.42 TB of text, and each text is associated with a quality score, allowing LLM
researchers to choose data according to their desired quality thresholds. We
also release a much cleaner subset of 600 GB of Chinese data whose quality
exceeds 90%.
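The rule-filter, score and threshold steps described above can be sketched as follows. This is a minimal sketch under stated assumptions, not EvalWeb itself: the two rules (a minimum length and a minimum share of CJK characters) and their parameters `min_chars` and `min_cjk_ratio` are hypothetical, and `toy_score` is only a placeholder for the well-designed evaluation model the abstract refers to.

```python
import re
from typing import Callable, List, Tuple

# CJK Unified Ideographs block, used here as a simple proxy for "Chinese characters".
CJK = re.compile(r"[\u4e00-\u9fff]")


def passes_rules(text: str, min_chars: int = 10, min_cjk_ratio: float = 0.5) -> bool:
    """Hypothetical hand-crafted rules: drop very short texts and texts
    that are mostly non-Chinese."""
    stripped = text.strip()
    if len(stripped) < min_chars:
        return False
    return len(CJK.findall(stripped)) / len(stripped) >= min_cjk_ratio


def score_corpus(texts: List[str], score_fn: Callable[[str], float]) -> List[Tuple[str, float]]:
    """Apply the rules first, then attach a quality score to every surviving text."""
    return [(t, score_fn(t)) for t in texts if passes_rules(t)]


def select_by_threshold(scored: List[Tuple[str, float]], threshold: float) -> List[str]:
    """Keep the texts whose quality score reaches the chosen threshold."""
    return [t for t, s in scored if s >= threshold]


def toy_score(text: str) -> float:
    """Placeholder for the evaluation model (NOT EvalWeb's model): share of CJK characters."""
    return len(CJK.findall(text)) / max(len(text), 1)


if __name__ == "__main__":
    texts = [
        "大语言模型的预训练数据质量非常重要。",
        "short",
        "click here to subscribe now!!! 点击",
        "中文网页文本经过规则过滤后再由评估模型打分。abc",
    ]
    scored = score_corpus(texts, toy_score)
    print(len(scored))  # 2
    print(select_by_threshold(scored, 0.9))  # ['大语言模型的预训练数据质量非常重要。']
```

Because every surviving text keeps its score, a different threshold can be applied later without re-scoring the corpus, which is the practical benefit of releasing per-text quality scores.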
Graphene related materials for thermal management (2D Materials, topical review)
Almost 15 years have passed since the discovery of graphene as a single-atom layer. Numerous papers have demonstrated its high electron mobility and its excellent thermal, mechanical and optical properties. We have recently seen more and more applications of graphene in commercial products. This paper reviews and summarizes the current status of research on the thermal properties of graphene and other 2D materials, including manufacturing and characterization techniques and their applications, especially in electronics and power modules. The review shows that graphene has penetrated the market and is finding more and more applications in thermal management for commercial electronics. We also critically analyze how mature the manufacturing processes are, what the accuracies and challenges of the various characterization techniques are, and which questions and issues remain before further applications emerge in this exciting and fascinating field.
Photoredox-catalyzed reaction as a powerful tool for rapid natural product gem-dimethylation modification: discovery of potent anti-cancer agents with improved druggability
Tylophorine has diverse biological activities; however, its poor stability and solubility and its central nervous system toxicity have severely limited its use. The gem-dimethyl group is a functional group consisting of two methyl groups bonded to the same carbon atom. It has gained significant attention in medicinal chemistry due to its unique properties and potential applications in drug design. We applied a new photoredox methodology to tylophorine modification, obtaining a series of gem-dimethyl tylophorine analogues. Among the analogues, compound 4b demonstrated promising activity against a wide range of tumor cell lines and exhibited significantly improved drug-like properties, including enhanced solubility and stability. Compound 4b showed an exceptional inhibitory effect (7.8 nM) against a C481S mutation-induced ibrutinib-resistant non-Hodgkin’s lymphoma cell line, as well as against primary tumor cell lines obtained from patients. Importantly, compound 4b exhibited significantly reduced anti-proliferative activity against the normal cell line tested, indicating a potentially enhanced therapeutic window. Based on these early-stage data, we believe that our study provides a solid foundation for developing new therapeutic agents to treat drug-resistant cancers in the near future.
New techniques for raising fish in flooded ricefields
Meeting: National Rice Fish Farming Systems Symposium, 4-8 Oct. 1988, Wuxi, CN. In IDL-1614
Impact of Political Connection Strength on the Internationalization Outcome of Chinese Firms: Perspectives from Market Exploration and Technology Acquisition
Although the role of a home country’s government in firms’ internationalization processes has been investigated, there is a gap in the literature concerning the effectiveness of the government’s efforts. Based on data from 1996 Chinese listed firms, this study investigates how the strength of Chinese firms’ political connections with their home-country government affects the outcomes of their internationalization activities, which are classified into market exploration and technology acquisition. We construct an index to measure firms’ political connection strength and find that it exhibits a bimodal distribution: some firms maintain a close relationship with the government, while others conduct their business at a distance from it. The strength of the political connection has different moderating effects on firms’ internationalization processes depending on the international market context and on the type of internationalization activity. A strong political connection helps firms explore markets and acquire beneficial technology from developed countries. Compared with its role in exploring international markets, political connection plays a more significant moderating role in augmenting the positive effect of international technology acquisition on firms’ innovation capability. Therefore, Chinese firms may perform better in the internationalization process if they maintain a close relationship with the Chinese government, which promotes the internationalization of domestic firms through an array of policies that may compensate for the firms’ disadvantages. Our results reveal the mechanism through which emerging countries’ governments use directives and incentives to facilitate the internationalization of domestic firms.
Cultivating different breeds of fish in ricefields
Meeting: National Rice Fish Farming Systems Symposium, 4-8 Oct. 1988, Wuxi, CN. In IDL-1614
FPGA Implementation of Video Transmission System Based on LTE
To support high-definition video transmission, a video transmission system based on Long Term Evolution (LTE) is designed and implemented on the Xilinx Virtex-6 FPGA ML605 Evaluation Board. The paper describes the features of the baseband link designed in Xilinx ISE and of the protocol stack designed in Xilinx SDK, and explains how the hardware and software platform is set up in Xilinx XPS. Tests show that the system consumes relatively few hardware resources and transmits bidirectional video clearly and stably.
- …