77 research outputs found

DyVal: Dynamic Evaluation of Large Language Models for Reasoning Tasks

Author: Chen Jiaao
Gong Neil Zhenqiang
Wang Jindong
Xie Xing
Yang Diyi
Zhu Kaijie
Publication date: 14/03/2024

Large language models (LLMs) have achieved remarkable performance in various evaluation benchmarks. However, concerns are raised about potential data contamination in their considerable volume of training corpus. Moreover, the static nature and fixed complexity of current benchmarks may inadequately gauge the advancing capabilities of LLMs. In this paper, we introduce DyVal, a general and flexible protocol for dynamic evaluation of LLMs. Based on our framework, we build graph-informed DyVal by leveraging the structural advantage of directed acyclic graphs to dynamically generate evaluation samples with controllable complexities. DyVal generates challenging evaluation sets on reasoning tasks including mathematics, logical reasoning, and algorithm problems. We evaluate various LLMs ranging from Flan-T5-large to GPT-3.5-Turbo and GPT-4. Experiments show that LLMs perform worse in DyVal-generated evaluation samples with different complexities, highlighting the significance of dynamic evaluation. We also analyze the failure cases and results of different prompting methods. Moreover, DyVal-generated samples are not only evaluation sets, but also helpful data for fine-tuning to improve the performance of LLMs on existing benchmarks. We hope that DyVal can shed light on future evaluation research of LLMs. Code is available at: https://github.com/microsoft/promptbench.

Comment: ICLR 2024 spotlight; 38 pages; code is at aka.ms/dyva
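
The graph-based generation idea in the DyVal abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `generate_sample`, the operator table `OPS`, the node naming scheme, and the use of a binary tree (a simple special case of a directed acyclic graph) are all assumptions made for illustration. The `depth` parameter stands in for the "controllable complexity" the abstract mentions.

```python
import operator
import random

# Hypothetical operator table; the operation set used by DyVal itself may differ.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def generate_sample(depth, seed=None):
    """Build a random arithmetic tree of the given depth.

    Returns (question, answer): the question lists node definitions in
    topological order, and the answer is the value of the root node.
    """
    rng = random.Random(seed)
    lines = []
    counter = 0

    def build(level):
        nonlocal counter
        if level == 0:
            # Leaf node: a random single-digit constant.
            value = rng.randint(1, 9)
            name = f"n{counter}"
            counter += 1
            lines.append(f"{name} = {value}")
            return name, value
        # Internal node: combine two freshly built children.
        left_name, left_val = build(level - 1)
        right_name, right_val = build(level - 1)
        symbol = rng.choice(sorted(OPS))
        value = OPS[symbol](left_val, right_val)
        name = f"n{counter}"
        counter += 1
        lines.append(f"{name} = {left_name} {symbol} {right_name}")
        return name, value

    root, answer = build(depth)
    question = "\n".join(lines) + f"\nWhat is the value of {root}?"
    return question, answer


if __name__ == "__main__":
    question, answer = generate_sample(depth=2, seed=0)
    print(question)
    print("expected answer:", answer)
```

Because every sample is drawn fresh from a seed, a test set built this way cannot have been memorized from a static benchmark, and raising `depth` doubles the number of leaf nodes at each step.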

arXiv.org e-Print Archive

PromptBench: Towards Evaluating the Robustness of Large Language Models on Adversarial Prompts

Author: Chen Hao
Gong Neil Zhenqiang
Wang Jindong
Wang Yidong
Wang Zichen
Xie Xing
Yang Linyi
Ye Wei
Zhang Yue
Zhou Jiaheng
Zhu Kaijie
Publication date: 13/06/2023

The increasing reliance on Large Language Models (LLMs) across academia and industry necessitates a comprehensive understanding of their robustness to prompts. In response to this vital need, we introduce PromptBench, a robustness benchmark designed to measure LLMs' resilience to adversarial prompts. This study uses a plethora of adversarial textual attacks targeting prompts across multiple levels: character, word, sentence, and semantic. These prompts are then employed in diverse tasks, such as sentiment analysis, natural language inference, reading comprehension, machine translation, and math problem-solving. Our study generates 4,032 adversarial prompts, meticulously evaluated over 8 tasks and 13 datasets, with 567,084 test samples in total. Our findings demonstrate that contemporary LLMs are vulnerable to adversarial prompts. Furthermore, we present comprehensive analysis to understand the mystery behind prompt robustness and its transferability. We then offer insightful robustness analysis and pragmatic recommendations for prompt composition, beneficial to both researchers and everyday users. We make our code, prompts, and methodologies to generate adversarial prompts publicly accessible, thereby enabling and encouraging collaborative exploration in this pivotal field: https://github.com/microsoft/promptbench.

Comment: Technical report; 23 pages; code is at: https://github.com/microsoft/promptbenc

arXiv.org e-Print Archive

Fine mapping and candidate gene analysis of gynoecy trait in chieh-qua (Benincasa hispida Cogn. var. chieh-qua How)

Author: Biao Jiang
Dasen Xie
Jinqiang Yan
Lin Chen
Min Wang
Qingwu Peng
Songguang Yang
Wei Liu
Wenrui Liu
Zhenqiang Cao
Publication venue: 'Frontiers Media SA'
Publication date: 01/04/2023

Gynoecy demonstrates an earlier production of hybrids and a higher yield and improves the efficiency of hybrid seed production. Therefore, the utilization of gynoecy is beneficial for the genetic breeding of chieh-qua. However, little knowledge of gynoecious-related genes in chieh-qua has been reported until now. Here, we used an F2 population from the cross between the gynoecious line ‘A36’ and the monoecious line ‘SX’ for genetic mapping and revealed that chieh-qua gynoecy was regulated by a single recessive gene. We fine-mapped it into a 530-kb region flanked by the markers Indel-3 and KASP145 on Chr.8, which harbors eight candidate genes. One of the candidate genes, Bhi08G000345, encoding networked protein 4 (CqNET4), contained a non-synonymous SNP resulting in the amino acid substitution of isoleucine (ATA; I) to methionine (ATG; M). CqNET4 was prominently expressed in the female flower, and only three genes related to ethylene synthesis were significantly expressed between ‘A36’ and ‘SX.’ The results presented here provide support for the CqNET4 as the most likely candidate gene for chieh-qua gynoecy, which differed from the reported gynoecious genes.

Directory of Open Access Journals

Modulatory Effect of Fermented Papaya Extracts on Mammary Gland Hyperplasia Induced by Estrogen and Progestin in Female Rats

Author: Fang Liu
Feng Xie
Gaoli Zheng
Guocan Chen
Hao Chen
Junying Sun
Lili Li
Sheng Zhang
Yanfei Xin
Yaoxian Xuan
Yisheng Song
Zhenqiang You
Zhiqin Chen
Publication venue: 'Hindawi Limited'
Publication date: 01/01/2017

Fermented papaya extracts (FPEs) are obtained by fermentation of papaya by Aspergillus oryzae and yeasts. In this study, we investigated the protective effects of FPEs on mammary gland hyperplasia induced by estrogen and progestogen. Rats were randomly divided into 6 groups, including a control group, an FPE-alone group, a model group, and three FPE treatment groups (each receiving 30, 15, or 5 ml/kg FPEs). Severe mammary gland hyperplasia was induced upon estradiol benzoate and progestin administration. FPEs could improve the pathological features of the animal model and reduce estrogen levels in the serum. Analysis of oxidant indices revealed that FPEs could increase superoxide dismutase (SOD) and glutathione peroxidase (GSH-Px) activities, decrease malondialdehyde (MDA) level in the mammary glands and serum of the animal models, and decrease the proportion of cells positive for the oxidative DNA damage marker 8-oxo-dG in the mammary glands. Additionally, estradiol benzoate and progestin altered the levels of serum biochemical compounds such as aspartate transaminase (AST), total bilirubin (TBIL), and alanine transaminase (ALT), as well as hepatic oxidant indices such as SOD, GSH-Px, MDA, and 8-oxo-2′-deoxyguanosine (8-oxo-dG). These indices reverted to normal levels upon oral administration of a high dose of FPEs. Taken together, our results indicate that FPEs can protect the mammary glands and other visceral organs from oxidative damage.

Crossref

Directory of Open Access Journals

The Reproducibility of Lists of Differentially Expressed Genes in Microarray Studies

Reproducibility is a fundamental requirement in scientific experiments and clinical contexts. Recent publications raise concerns about the reliability of microarray technology because of the apparent lack of agreement between lists of differentially expressed genes (DEGs). In this study we demonstrate that (1) such discordance may stem from ranking and selecting DEGs solely by statistical significance (P) derived from widely used simple t-tests; (2) when fold change (FC) is used as the ranking criterion, the lists become much more reproducible, especially when fewer genes are selected; and (3) the instability of short DEG lists based on P cutoffs is an expected mathematical consequence of the high variability of the t-values. We recommend the use of FC ranking plus a non-stringent P cutoff as a baseline practice in order to generate more reproducible DEG lists. The FC criterion enhances reproducibility while the P criterion balances sensitivity and specificity.

Crossref

Nature Precedings

Fertilization of Grapevine Based on Gene Expression

Author: Chen Wang
Cheng Zhang
Christensen L.P.
Conradie W.J.
Dai D.W.
Haifeng Jia
He P.C.
Jingjue Zeng
Keller M.
Kong Q.S.
Li S.J.
Ma ZQ
Marangoni B.
Neilsen G.H.
Peacock W.L.
Spayd S.E.
Tariq Perraiz
Wheeler S.J.
Xie H.X.
Xudong Zhu
Zhang C.X.
Zhang Y.
Zhenqiang Xie
Publication venue: 'Crop Science Society of America'

Crossref

Cross-platform comparability of microarray technology: Intra-platform consistency and appropriate data analysis procedures are essential

Author: A Barczak
AK Jarvinen
AT Rogojina
BH Mecham
CL Yauk
Daniel A Casciano
DF Ransohoff
E Marshall
EF Petricoin 3rd
Federico M Goodsaid
Felix W Frueh
FW Frueh
GP Page
H Van Bakel
Hong Fang
Huixiao Hong
James C Fuscoe
James J Chen
Jing Han
JL Hackett
L Shi
Lei Guo
Leming Shi
M Bakay
MD Piper
N Mah
N Raikhel
PK Tan
Qian Xie
R Breitling
R Shippy
Raj K Puri
Roger G Perkins
T Barrett
T Mehta
T Yuen
Tao Han
TR Hughes
Tucker A Patterson
Uwe Scherf
VG Tusher
Weida Tong
WP Kuo
Y Woo
Z Alex Xu
Zhenqiang Su
Publication venue: BioMed Central
Publication date: 01/01/2005

BACKGROUND: The acceptance of microarray technology in regulatory decision-making is being challenged by the existence of various platforms and data analysis methods. A recent report (E. Marshall, Science, 306, 630–631, 2004), by extensively citing the study of Tan et al. (Nucleic Acids Res., 31, 5676–5684, 2003), portrays a disturbingly negative picture of the cross-platform comparability, and, hence, the reliability of microarray technology. RESULTS: We reanalyzed Tan's dataset and found that the intra-platform consistency was low, indicating a problem in experimental procedures from which the dataset was generated. Furthermore, by using three gene selection methods (i.e., p-value ranking, fold-change ranking, and Significance Analysis of Microarrays (SAM)) on the same dataset we found that p-value ranking (the method emphasized by Tan et al.) results in much lower cross-platform concordance compared to fold-change ranking or SAM. Therefore, the low cross-platform concordance reported in Tan's study appears to be mainly due to a combination of low intra-platform consistency and a poor choice of data analysis procedures, instead of inherent technical differences among different platforms, as suggested by Tan et al. and Marshall. CONCLUSION: Our results illustrate the importance of establishing calibrated RNA samples and reference datasets to objectively assess the performance of different microarray platforms and the proficiency of individual laboratories as well as the merits of various data analysis procedures. Thus, we are progressively coordinating the MAQC project, a community-wide effort for microarray quality control.

Crossref

Springer - Publisher Connector

PubMed Central

Microarray scanner calibration curves: characteristics and implications

Author: AM Dudley
Axon
BA Rosenzweig
D Hekstra
EP Hoffman
F Naef
Federico M Goodsaid
Felix W Frueh
GA Held
H Bengtsson
H Lyng
H Yue
Hong Fang
Huixiao Hong
IV Yang
J Fuscoe
J Quackenbush
James C Fuscoe
James J Chen
Jing Han
JN Weinstein
K Dobbin
L Shi
LE Dodd
Lei Guo
Leming Shi
MJ Martinez
N Raghavachari
Qian Xie
Raj K Puri
Roger G Perkins
S Pickett
Stephen C Harris
T Yuen
Tao Han
VG Cheung
VG Desai
W Tong
Weida Tong
William S Branham
WR Foster
Y Zong
YH Yang
Z Alex Xu
Zhenqiang Su
Publication venue: BioMed Central
Publication date: 01/01/2005

BACKGROUND: Microarray-based measurement of mRNA abundance assumes a linear relationship between the fluorescence intensity and the dye concentration. In reality, however, the calibration curve can be nonlinear. RESULTS: By scanning a microarray scanner calibration slide containing known concentrations of fluorescent dyes under 18 PMT gains, we were able to evaluate the differences in calibration characteristics of Cy5 and Cy3. First, the calibration curve for the same dye under the same PMT gain is nonlinear at both the high and low intensity ends. Second, the degree of nonlinearity of the calibration curve depends on the PMT gain. Third, the two PMTs (for Cy5 and Cy3) behave differently even under the same gain. Fourth, the background intensity for the Cy3 channel is higher than that for the Cy5 channel. The impact of such characteristics on the accuracy and reproducibility of measured mRNA abundance and the calculated ratios was demonstrated. Combined with simulation results, we provided explanations for the existence of ratio underestimation, intensity-dependence of ratio bias, and anti-correlation of ratios in dye-swap replicates. We further demonstrated that although Lowess normalization effectively eliminates the intensity-dependence of ratio bias, the systematic deviation from true ratios largely remained. A method of calculating ratios based on concentrations estimated from the calibration curves was proposed for correcting ratio bias. CONCLUSION: It is preferable to scan microarray slides at fixed, optimal gain settings under which the linearity between concentration and intensity is maximized. Although normalization methods improve reproducibility of microarray measurements, they appear less effective in improving accuracy.
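
The abstract above mentions calculating ratios from concentrations estimated via the calibration curves rather than from raw intensities. The abstract does not specify the procedure, so the following is only a sketch under stated assumptions: each calibration curve is taken to be a monotone list of (concentration, intensity) points, and the inverse lookup uses piecewise-linear interpolation with clamping at the ends. The function names and the calibration numbers are hypothetical.

```python
from bisect import bisect_left


def intensity_to_concentration(intensity, curve):
    """Estimate dye concentration from a measured intensity.

    `curve` is a list of (concentration, intensity) calibration points
    sorted by strictly increasing intensity. Values outside the measured
    range are clamped to the nearest calibration point (an assumption).
    """
    intensities = [i for _, i in curve]
    if intensity <= intensities[0]:
        return curve[0][0]
    if intensity >= intensities[-1]:
        return curve[-1][0]
    hi = bisect_left(intensities, intensity)
    (c0, i0), (c1, i1) = curve[hi - 1], curve[hi]
    # Linear interpolation between the two neighbouring calibration points.
    return c0 + (c1 - c0) * (intensity - i0) / (i1 - i0)


def corrected_ratio(i_cy5, i_cy3, curve_cy5, curve_cy3):
    """Cy5/Cy3 ratio computed on estimated concentrations, not raw intensities."""
    c_cy5 = intensity_to_concentration(i_cy5, curve_cy5)
    c_cy3 = intensity_to_concentration(i_cy3, curve_cy3)
    return c_cy5 / c_cy3


if __name__ == "__main__":
    # Made-up calibration points for illustration only.
    curve_cy5 = [(1, 100), (2, 180), (4, 300)]
    curve_cy3 = [(1, 150), (2, 290), (4, 500)]
    print("raw intensity ratio:", 240 / 290)
    print("concentration ratio:", corrected_ratio(240, 290, curve_cy5, curve_cy3))
```

With these made-up curves the raw intensity ratio falls below 1 while the concentration-based ratio is above 1, which illustrates how channel-specific nonlinearity could bias intensity ratios.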

Crossref

Springer - Publisher Connector

PubMed Central

The balance of reproducibility, sensitivity, and specificity of lists of differentially expressed genes in microarray studies

Background: Reproducibility is a fundamental requirement in scientific experiments. Some recent publications have claimed that microarrays are unreliable because lists of differentially expressed genes (DEGs) are not reproducible in similar experiments. Meanwhile, new statistical methods for identifying DEGs continue to appear in the scientific literature. The resultant variety of existing and emerging methods exacerbates confusion and continuing debate in the microarray community on the appropriate choice of methods for identifying reliable DEG lists. Results: Using the data sets generated by the MicroArray Quality Control (MAQC) project, we investigated the impact on the reproducibility of DEG lists of a few widely used gene selection procedures. We present comprehensive results from inter-site comparisons using the same microarray platform, cross-platform comparisons using multiple microarray platforms, and comparisons between microarray results and those from TaqMan – the widely regarded "standard" gene expression platform. Our results demonstrate that (1) previously reported discordance between DEG lists could simply result from ranking and selecting DEGs solely by statistical significance (P) derived from widely used simple t-tests; (2) when fold change (FC) is used as the ranking criterion with a non-stringent P-value cutoff filtering, the DEG lists become much more reproducible, especially when fewer genes are selected as differentially expressed, as is the case in most microarray studies; and (3) the instability of short DEG lists solely based on P-value ranking is an expected mathematical consequence of the high variability of the t-values; the more stringent the P-value threshold, the less reproducible the DEG list is. These observations are also consistent with results from extensive simulation calculations. Conclusion: We recommend the use of FC-ranking plus a non-stringent P cutoff as a straightforward and baseline practice in order to generate more reproducible DEG lists. Specifically, the P-value cutoff should not be stringent (too small) and FC should be as large as possible. Our results provide practical guidance to choose the appropriate FC and P-value cutoffs when selecting a given number of DEGs. The FC criterion enhances reproducibility, whereas the P criterion balances sensitivity and specificity.
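
The recommended selection rule (filter by a non-stringent P cutoff, then rank by fold change) can be sketched as follows. This is an illustrative sketch, not code from the MAQC project: the function names, the tuple layout, the default cutoff of 0.05, and the example values are all assumptions, and the P-values are taken as precomputed inputs.

```python
def select_degs_by_fc(genes, top_n, p_cutoff=0.05):
    """Pick DEGs by fold-change ranking with a non-stringent P filter.

    `genes` is a list of (name, log2_fold_change, p_value) tuples.
    The default p_cutoff is an assumed placeholder, not a value from the paper.
    """
    passing = [g for g in genes if g[2] < p_cutoff]
    ranked = sorted(passing, key=lambda g: abs(g[1]), reverse=True)
    return [name for name, _, _ in ranked[:top_n]]


def select_degs_by_p(genes, top_n):
    """Baseline for contrast: rank solely by P-value (smallest first)."""
    ranked = sorted(genes, key=lambda g: g[2])
    return [name for name, _, _ in ranked[:top_n]]


if __name__ == "__main__":
    # Made-up values for illustration only.
    genes = [
        ("A", 3.0, 0.04),    # large change, modest significance
        ("B", -2.5, 0.001),  # large change, strong significance
        ("C", 0.2, 1e-6),    # tiny change, very small P
        ("D", 4.0, 0.3),     # large change, fails the P filter
    ]
    print(select_degs_by_fc(genes, top_n=2))
    print(select_degs_by_p(genes, top_n=2))
```

In this toy input, P-only ranking promotes gene C despite its negligible fold change, while the FC-based rule keeps genes with large effects that still pass the loose significance filter.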

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

The Novartis Repository

Methylprednisolone as Adjunct to Endovascular Thrombectomy for Large-Vessel Occlusion Stroke

Author: Cai Tieying
Chen Anqiang
Chen Junbin
Chen Kechun
Chen Yangmei
Chen Yifei
Chen Zhuo
Cheng Daoyou
Cheng Wen
Dai Ling
Deng Shenglin
Du Jie
Fan Shitao
Guo Changwei
Hao Yonggang
He Wencheng
Hu Jinrong
Huang Jiacheng
Huang Jiandi
Huang Wenguo
Huang Xinyuan
Huang Yihong
Jiang Lin
Jiang Shunfu
Jin Zhenglong
Ju Dongsheng
Kong Deyan
Kong Weilin
Lei Bo
Li Bo
Li Fengli
Li Jinglun
Li Linyu
Li Yanling
Li Zhenqiang
Li Zuopeng
Liao Jiasheng
Lin Xiaoli
Liu Chang
Liu Chen
Liu Da
Liu Jiazuo
Liu Junsheng
Liu Shudong
Liu Wenhua
Liu Xiang
Liu Zongtao
Luan Jianfang
Luo Jun
Luo Shiwei
Luo Xiaojun
Ma Jinfu
Mou Qingchun
Nguyen Thanh N.
Ni Yang
Niu Jianqin
Nogueira Raul G.
Ouyang Qin
Pan Chengde
Peng Yuqi
Peng Zhouzhou
Pu Jie
Pu Shengxiong
Qi Li
Qiu Tao
Qiu Zhongming
Ruan Zhongfan
Saver Jeffrey L.
Shan Yuanjun
Shi Xiaolei
Song Bo
Song Dengwen
Song Jiaxing
Sun Wenzhe
Sun Yaxuan
Tan Xiaolin
Tang Yufeng
Tian Yan
Tian Yaoyu
Wan Yue
Wang Duolao
Wang Hongjun
Wang Jian
Wang Kuiyun
Wang Pengfei
Wang Shouchun
Wang Zhixi
Wei Shirong
Wu Changchuan
Wu Youlin
Xia Zhongbin
Xie Dongjing
Xiong Xiaoping
Xu Rufu
Xu Xu
Yan Shiqiang
Yang Dahong
Yang De
Yang Jie
Yang Mei
Yang Qingwu
Yang Shihai
Yang Shuang
Yao Li
Yu Nizhen
Yu Shui
Yue Chengsong
Zeng Guoyong
Zhang Guling
Zhang Jie
Zhang Min
Zhao Haojin
Zheng Chong
Zheng Hongting
Zhou Peiyang
Zhu Xunfeng
Zi Wenjie
Zou Xin
Publication venue: American Medical Association (AMA)
Publication date: 12/03/2024

Importance: It is uncertain whether intravenous methylprednisolone improves outcomes for patients with acute ischemic stroke due to large-vessel occlusion (LVO) undergoing endovascular thrombectomy. Objective: To assess the efficacy and adverse events of adjunctive intravenous low-dose methylprednisolone to endovascular thrombectomy for acute ischemic stroke secondary to LVO. Design, Setting, and Participants: This investigator-initiated, randomized, double-blind, placebo-controlled trial was implemented at 82 hospitals in China, enrolling 1680 patients with stroke and proximal intracranial LVO presenting within 24 hours of time last known to be well. Recruitment took place between February 9, 2022, and June 30, 2023, with a final follow-up on September 30, 2023. Interventions: Eligible patients were randomly assigned to intravenous methylprednisolone (n = 839) at 2 mg/kg/d or placebo (n = 841) for 3 days adjunctive to endovascular thrombectomy. Main Outcomes and Measures: The primary efficacy outcome was disability level at 90 days as measured by the overall distribution of the modified Rankin Scale scores (range, 0 [no symptoms] to 6 [death]). The primary safety outcomes included mortality at 90 days and the incidence of symptomatic intracranial hemorrhage within 48 hours. Results: Among 1680 patients randomized (median age, 69 years; 727 female [43.3%]), 1673 (99.6%) completed the trial. The median 90-day modified Rankin Scale score was 3 (IQR, 1-5) in the methylprednisolone group vs 3 (IQR, 1-6) in the placebo group (adjusted generalized odds ratio for a lower level of disability, 1.10 [95% CI, 0.96-1.25]; P = .17). In the methylprednisolone group, there was a lower mortality rate (23.2% vs 28.5%; adjusted risk ratio, 0.84 [95% CI, 0.71-0.98]; P = .03) and a lower rate of symptomatic intracranial hemorrhage (8.6% vs 11.7%; adjusted risk ratio, 0.74 [95% CI, 0.55-0.99]; P = .04) compared with placebo. Conclusions and Relevance: Among patients with acute ischemic stroke due to LVO undergoing endovascular thrombectomy, adjunctive methylprednisolone added to endovascular thrombectomy did not significantly improve the degree of overall disability. Trial Registration: ChiCTR.org.cn Identifier: ChiCTR210005172

LSTM Online Archive