
14 research outputs found

ChineseWebText: Large-scale High-quality Chinese Web Text Extracted with Effective Evaluation Model

Author: Chen Jianghao
Ding Chenglin
Du Qianlong
Jian Pu
Wang Jinqiao
Xi Tengxiao
Yi Dongyi
Zhang Jiajun
Zhu Guibo
Zong Chengqing
Publication date: 10/11/2023

During the development of large language models (LLMs), the scale and quality of the pre-training data play a crucial role in shaping LLMs' capabilities. To accelerate the research of LLMs, several large-scale datasets, such as C4 [1], Pile [2], RefinedWeb [3] and WanJuan [4], have been released to the public. However, most of the released corpora focus mainly on English, and there is still a lack of a complete tool-chain for extracting clean texts from web data. Furthermore, fine-grained information about the corpus, e.g. the quality of each text, is missing. To address these challenges, we propose in this paper a new complete tool-chain, EvalWeb, to extract clean Chinese texts from noisy web data. First, similar to previous work, manually crafted rules are employed to discard explicitly noisy texts from the raw crawled web contents. Second, a well-designed evaluation model is leveraged to assess the remaining relatively clean data, and each text is assigned a specific quality score. Finally, an appropriate threshold can be used to select the high-quality pre-training data for Chinese. Using our proposed approach, we release the largest and latest large-scale high-quality Chinese web text, ChineseWebText, which consists of 1.42 TB of data, with each text associated with a quality score, allowing LLM researchers to choose data according to their desired quality thresholds. We also release a much cleaner subset of 600 GB Chinese data with the quality exceeding 90%.

arXiv.org e-Print Archive

CoLLiE: Collaborative Training of Large Language Models in an Efficient Way

Author: Chen Keyu
Gu Tianle
Guo Honglin
Guo Qipeng
Hong Jiawei
Liu Tengxiao
Liu Xiaoran
Lv Kai
Qiu Xipeng
Sun Yu
Xing Shuhao
Yan Hang
Yang Yuqing
Zhang Shuo
Publication date: 01/12/2023

Large language models (LLMs) are increasingly pivotal in a wide range of natural language processing tasks. Access to pre-trained models, courtesy of the open-source community, has made it possible to adapt these models to specific applications for enhanced performance. However, the substantial resources required for training these models necessitate efficient solutions. This paper introduces CoLLiE, an efficient library that facilitates collaborative training of large language models using 3D parallelism, parameter-efficient fine-tuning (PEFT) methods, and optimizers such as Lion, Adan, Sophia, LOMO and AdaLomo. With its modular design and comprehensive functionality, CoLLiE offers a balanced blend of efficiency, ease of use, and customization. CoLLiE has demonstrated superior training efficiency in comparison with prevalent solutions in pre-training and fine-tuning scenarios. Furthermore, we provide an empirical evaluation of the correlation between model size and GPU memory consumption under different optimization methods, as well as an analysis of the throughput. Lastly, we carry out a comprehensive comparison of various optimizers and PEFT methods within the instruction-tuning context. CoLLiE is available at https://github.com/OpenLMLab/collie. Comment: To appear at EMNLP 2023 Demo.

arXiv.org e-Print Archive

Experience reverses the red effect among Chinese stockbrokers.

Author: Buxin Han
Tengxiao Zhang
Publication venue: Public Library of Science (PLoS)
Publication date: 24/02/2014

Recent research has shown that the color red influences psychological functioning. Red is hypothesized to be linked to aggression and danger in evolution, and these links are enhanced by culture-specific uses of red. Thus, color meanings are thought to be grounded in biologically based proclivities and learned associations. However, to date, there has been no direct evidence for the influence of experience on the red effect. This study focused on whether experience could change the psychological effects of the color red. In the context of the Chinese stock market, contrary to the meaning generally associated with red as negative and green as positive, red represents a rise in stock price and green stands for a decrease. An experiment using a 2×2 between-subjects factorial design demonstrated that red (compared with green) impaired Chinese college students' performance on an IQ test (in accordance with the red effect), but the opposite effect was found among stockbrokers. These results provide direct evidence of learned color meanings, in support of the general model of color effect.

Public Library of Science (PLOS)

Directory of Open Access Journals

PubMed Central

Institute of Psychology, Chinese Academy of Sciences

Institutional Repository of Institute of Psychology, Chinese Academy of Sciences

The effect of color on performance on Raven’s Standard Progressive Matrices. The college students in the red group (n = 12) performed worse than did those in the green group (n = 12).

Author: Buxin Han (490717)
Tengxiao Zhang (526467)

Conversely, the stockbrokers in the red group (n = 12) performed better than did those in the green group (n = 12). Error bars indicate standard error of test scores.

FigShare

Red color in flags: A signal for competition

Author: Feng Shiyu
Han Buxin
Sun Si
Zhang Tengxiao
Publication venue: Wiley
Publication date: 10/08/2017

The color-in-context theory and ecological valence theory suggest that color preference depends on the context and ecological object that define the psychological meanings of colors. The present study was conducted to identify the preference for the color red in national flags across the world. We explored 192 national flags across the world and found that red was the most frequently used color. Through a systemic examination of symbolic meanings behind use of the color red in flags, it was also found that the color red was often attached with an aggressive connotation. In contrast, the flags of the selected international collaborative organizations did not appear to prefer red. These results support the hypothesis of red flag preference in real-world competitive contexts. Limitations and future research directions are also discussed.

Crossref

Institutional Repository of Institute of Psychology, Chinese Academy of Sciences

Acute stress responses in Chinese soldiers performing various military tasks

Author: Huang Peng
Miao Danmin
Zhang Tengxiao
Zhu Xia
Publication date: 20/11/2014

Background: This study examined acute stress responses in Chinese soldiers.

Crossref

Institute of Psychology, Chinese Academy of Sciences

PubMed Central

Institutional Repository of Institute of Psychology, Chinese Academy of Sciences

Urban Older Adults Becoming Unhealthier in Modern China: A Cross-Temporal Meta-Analysis

Author: Han Buxin
Tan Hao
Wang Ting
Wu Yiling
Zhang Tengxiao
Publication venue: SAGE Publications
Publication date: 04/05/2016

This study investigated patterns of change in the health status of urban older adults in China from 2001 to 2013. A cross-temporal meta-analysis was applied to 111 selected studies in which the SF-36 had been administered to urban older adults in China. Scores from a total of 72,441 participants were analyzed. Correlations between the SF-36 scores and sampling years were examined. The self-reported health status of urban older adults in China has declined significantly in the past 13 years. The observed decline in the health status of older adults suggests that economic progress and a rapidly aging population have had more negative than positive effects on the health of this population.

Crossref

Institutional Repository of Institute of Psychology, Chinese Academy of Sciences

Influence of Ink Properties on the Morphology of Long-Wave Infrared HgSe Quantum Dot Films

Author: Shuya Cao
Suhui Wang
Tengxiao Guo
Xu Zhang
Yi Wang
Publication venue: MDPI AG
Publication date: 01/06/2022

The filter film array is the core device of the miniature quantum dot (QD) spectrometer, so control of its morphology cannot be ignored. We eliminated strong interference from additives on the spectrum of a long-wave infrared (LWIR) QD filter film by selecting volatile additives. This work is significant for detecting targets by spectroscopic methods. In this work, a filter film with characteristic spectral bands located in the LWIR was obtained by the natural evaporation of QD ink, which was prepared by mixing various volatile organic solvents with HgSe QD–toluene solution. The factors affecting the morphology of HgSe LWIR films, including ink surface tension, particle size, and solute volume fraction, were the main focus of the analysis. The experimental results suggested that the film slipped in the evaporation process, and the multilayer annular deposition formed when the surface tension of the ink was no more than 24.86 mN/m. The “coffee ring” and the multilayer annular deposition essentially disappeared when the solute particles were larger than 188.11 nm. QDs in the film were accumulated, and a “gully” morphology appeared when the solute volume fraction was greater than 0.1. In addition, both the increase rate of the film height and the decrease rate of the transmission slowed down. The relationship between film height and transmission was obtained by fitting, and the curve conformed to the Lambert–Beer law. Therefore, a uniform and flat film without “coffee rings” can be prepared by adjusting the surface tension, particle size, and volume fraction. This approach could provide an empirical method for the preparation of LWIR QD filter film arrays.

Multidisciplinary Digital Publishing Institute

Directory of Open Access Journals

PubMed Central

Synthesis and Application of Polymer SXFA in the Detection of Organophosphine Agents with a SAW Sensor

Author: Cancan Yan
Junchao Yang
Lin Zhang
Molin Qin
Tengxiao Guo
Yong Pan
Publication venue: MDPI AG
Publication date: 01/03/2024

The effective detection of isopropyl methylfluorophosphonate (GB, sarin), a type of organophosphine poisoning agent, is an urgent issue to address to maintain public safety. In this research, a gas-sensitive film material, poly (4-hydroxy-4,4-bis trifluoromethyl)-butyl-1-enyl)-siloxane (SXFA), with a structure of hexafluoroisopropyl (HFIP) functional group was synthesized by using methyl vinylpropyl dichlorosilane and hexafluoroacetone trihydrate as initial materials. The synthesis process products were characterized using FTIR. SXFA was prepared on a 200 MHz shear surface wave delay line using the spin-coating method for GB detection. A detection limit of 3 was achieved through conditional experiments. Meanwhile, we also obtained a maximum response of 2.168 mV at a 0.1 mg/m3 concentration, indicating the much lower detection limit of the SAW-SXFA sensor. Additionally, a maximum response standard deviation of 0.11 mV with a coefficient of variation of 0.01 and a maximum recovery standard deviation of 0.22 mV with a coefficient of variation of 0.02 were also obtained through five repeated experiments. The results show that the SAW-SXFA sensor has strong selectivity and reproducibility, good selectivity, positive detection ability, high sensitivity, and fast alarm performance for sarin detection

Directory of Open Access Journals

Thermally activated delayed fluorescence sensitizer for D–A–A type emitters with orange-red light emission

Author: Baldo
Cai
Chen
Chiang
Han
Huang
Li
Li
Li
Liu
Mukherjee
Nakanotani
Petri
Poriel
Poriel
Romain
Takahiro
Tengxiao
Tuong
Uoyama
Wang
Wang
Wang
Wang
Wu
Yao
Yu
Yu
Zhang
Zhang
Zhang
Zhang
Publication venue: Royal Society of Chemistry (RSC)
Publication date: 01/01/2018

Crossref