Examining Scientific Writing Styles from the Perspective of Linguistic Complexity

Authors: Bu Yi, Ding Ying, Lu Chao, Schnaars Matthew, Torvik Vetle, Wang Jie, Zhang Chengzhi
Publication date: 12/09/2018

Publishing articles in high-impact English journals is difficult for scholars around the world, especially for non-native English-speaking scholars (NNESs), most of whom struggle with proficiency in English. To uncover the differences in English scientific writing between native English-speaking scholars (NESs) and NNESs, we collected a large-scale data set containing more than 150,000 full-text articles published in PLoS between 2006 and 2015. We divided these articles into three groups according to the ethnic backgrounds of the first and corresponding authors, obtained by Ethnea, and examined the scientific writing styles in English from a two-fold perspective of linguistic complexity: (1) syntactic complexity, including measurements of sentence length and sentence complexity; and (2) lexical complexity, including measurements of lexical diversity, lexical density, and lexical sophistication. The observations suggest marginal differences between groups in syntactic and lexical complexity.

Comment: 6 figures
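As a rough illustration of the kinds of measurements the abstract names, the sketch below computes mean sentence length (a syntactic measure) and a type-token ratio (one common proxy for lexical diversity). The function names, the naive regex sentence splitter, and the choice of type-token ratio are assumptions made for this example; the abstract does not say which specific metrics or tools the authors used.

```python
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(text):
    """Lowercase alphabetic tokens (apostrophes kept); an assumed simplification."""
    return re.findall(r"[a-z']+", text.lower())


def mean_sentence_length(text):
    """Average number of tokens per sentence (syntactic measure)."""
    sentences = split_sentences(text)
    if not sentences:
        return 0.0
    return sum(len(tokenize(s)) for s in sentences) / len(sentences)


def type_token_ratio(text):
    """Distinct tokens divided by total tokens (lexical diversity proxy)."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


sample = "The cat sat. The cat ran fast."
print(mean_sentence_length(sample))
print(type_token_ratio(sample))
```

The type-token ratio is sensitive to text length, so comparisons between groups of articles would normally use a length-normalized variant or fixed-size text windows.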

arXiv.org e-Print Archive

IUScholarWorks Open

Mitigating Propagation Failures in PINNs using Evolutionary Sampling

Authors: Bu Jie, Daw Arka, Karpatne Anuj, Perdikaris Paris, Wang Sifan
Publication date: 03/10/2022

Despite the success of physics-informed neural networks (PINNs) in approximating partial differential equations (PDEs), it is known that PINNs can sometimes fail to converge to the correct solution in problems involving complicated PDEs. This is reflected in several recent studies on characterizing and mitigating the "failure modes" of PINNs. While most of these studies have focused on balancing loss functions or adaptively tuning PDE coefficients, what is missing is a thorough understanding of the connection between failure modes of PINNs and the sampling strategies used for training PINNs. In this paper, we provide a novel perspective on failure modes of PINNs by hypothesizing that the training of PINNs relies on successful "propagation" of the solution from initial and/or boundary condition points to interior points. We show that PINNs with poor sampling strategies can get stuck at trivial solutions if there are propagation failures. We additionally demonstrate that propagation failures are characterized by highly imbalanced PDE residual fields, where very high residuals are observed over very narrow regions. To mitigate propagation failures, we propose a novel evolutionary sampling (Evo) method that can incrementally accumulate collocation points in regions of high PDE residuals with little to no computational overhead. We provide an extension of Evo to respect the principle of causality while solving time-dependent PDEs. We theoretically analyze the behavior of Evo and empirically demonstrate its efficacy and efficiency in comparison with baselines on a variety of PDE problems.

Comment: 34 pages, 46 figures, 2 tables
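The idea of accumulating collocation points where the PDE residual is high can be sketched as a simple retain-and-resample loop. This is a toy illustration only: the retention rule (keep points whose absolute residual exceeds the population mean), the names `evo_step`, `toy_residual`, and `sample_uniform`, and the one-dimensional residual with a narrow peak are all assumptions for the example, not details taken from the abstract.

```python
import random


def evo_step(points, residual_fn, sample_fn, rng):
    """One resampling step: keep high-residual points, redraw the rest.

    Assumed rule: a point is retained if its absolute residual is above
    the mean absolute residual of the current population.
    """
    residuals = [abs(residual_fn(x)) for x in points]
    threshold = sum(residuals) / len(residuals)
    retained = [x for x, r in zip(points, residuals) if r > threshold]
    # Refill with fresh samples so the population size stays constant.
    fresh = [sample_fn(rng) for _ in range(len(points) - len(retained))]
    return retained + fresh


def toy_residual(x):
    """Hypothetical residual field: very high over a narrow region near x = 0.5."""
    return 1.0 / (1.0 + 1000.0 * (x - 0.5) ** 2)


def sample_uniform(rng):
    """Draw a fresh collocation point uniformly from [0, 1)."""
    return rng.random()


rng = random.Random(0)
points = [sample_uniform(rng) for _ in range(200)]
for _ in range(20):
    points = evo_step(points, toy_residual, sample_uniform, rng)

near_peak = sum(1 for x in points if abs(x - 0.5) < 0.05)
print(len(points), near_peak)
```

In a real PINN setting `residual_fn` would evaluate the network's PDE residual at each collocation point between training steps; here it is a fixed function so the concentration of points around the peak is easy to observe.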

arXiv.org e-Print Archive

Beyond Discriminative Regions: Saliency Maps as Alternatives to CAMs for Weakly Supervised Semantic Segmentation

Authors: Bu Jie, Daw Arka, Dutta Amartya, Karpatne Anuj, Maruf M.
Publication date: 21/08/2023

In recent years, several Weakly Supervised Semantic Segmentation (WS3) methods have been proposed that use class activation maps (CAMs) generated by a classifier to produce pseudo-ground truths for training segmentation models. While CAMs are good at highlighting discriminative regions (DR) of an image, they are known to disregard regions of the object that do not contribute to the classifier's prediction, termed non-discriminative regions (NDR). In contrast, attribution methods such as saliency maps provide an alternative approach for assigning a score to every pixel based on its contribution to the classification prediction. This paper provides a comprehensive comparison between saliencies and CAMs for WS3. Our study includes multiple perspectives on understanding their similarities and dissimilarities. Moreover, we provide new evaluation metrics that perform a comprehensive assessment of the WS3 performance of alternative methods with respect to CAMs. We demonstrate the effectiveness of saliencies in addressing the limitations of CAMs through our empirical studies on benchmark datasets. Furthermore, we propose random cropping as a stochastic aggregation technique that improves the performance of saliency, making it a strong alternative to CAM for WS3.

Comment: 24 pages, 13 figures, 4 tables

arXiv.org e-Print Archive

Stereotypical Images of STEM Professionals and STEM Career Interests in Chinese Elementary School Students

Authors: Bu Yanzhe, Lin Yuni, Pan Heliu, Tian Saiqi, Wu Ruying, Zhou Jie
Publication venue: Scholink Co, Ltd.
Publication date: 20/07/2022

This study investigated stereotypical images of STEM professionals and STEM career interests in Chinese elementary school students. The relationships between stereotypical images of STEM professionals and STEM career interests were also determined. Data for this study were gathered from two elementary schools in China, forming a convenience sample of 318 students enrolled in 3rd to 6th grade. Quantitative data on stereotypes about STEM professionals' social skills, positive images of STEM professionals, views on STEM implications for society, and STEM career interests were gathered with a Likert-scale questionnaire. Follow-up structured interviews were performed with 12 participants. Elementary school students had strong stereotypes about STEM professionals' social skills, a somewhat positive image of STEM professionals, and very positive views on STEM implications for society. However, their STEM career interests were not very high. In addition, elementary school students' stereotypes about STEM professionals' social skills have minor negative effects on their STEM career interests. Their positive image of STEM professionals and their views on STEM implications for society correlate significantly with their STEM career interests.

Scholink Journals

Bias Assessment and Mitigation in LLM-based Code Generation

Authors: Bu Qingwen, Chen Junjie, Cui Heming, Huang Dong, Xie Xiaofei, Zhang Jie
Publication date: 03/09/2023

Utilizing state-of-the-art Large Language Models (LLMs), automatic code generation models play a pivotal role in enhancing the productivity and efficiency of software development coding procedures. As the adoption of LLMs becomes more widespread in software coding ecosystems, a pressing issue has emerged: does the generated code contain social biases, such as those related to age, gender, and race? This issue concerns the integrity, fairness, and ethical foundation of software applications that depend on the code generated by these models, yet it is under-explored in the literature. This paper presents a novel bias assessment framework that is specifically designed for code generation tasks. Based on this framework, we conduct an extensive evaluation of the bias of nine state-of-the-art LLM-based code generation models. Our findings reveal that 31.45% to 79.93% of the code functions generated by the evaluated models are biased, and that the functionality of 9.68% to 37.37% of the code functions is affected by the bias. Biases therefore not only exist in code generation models but, in some cases, directly affect the functionality of the generated code, posing risks of unintended and possibly harmful software behaviors. To mitigate bias in code generation models, we propose three mitigation strategies, which can decrease the biased code ratio to a very low level of 0.4% to 4.57%.

arXiv.org e-Print Archive