Automated Refactoring of Nested-IF Formulae in Spreadsheets
Spreadsheets are the most popular end-user programming software, where
formulae act like programs and can also have smells. One well-recognized common
smell of spreadsheet formulae is the nested-IF expression, which has low
readability, imposes high cognitive cost on users, and is error-prone during
reuse or maintenance. However, end users usually lack the essential programming
knowledge and skills to tackle, or even recognize, the problem. Previous
research has made only initial attempts in this direction, and no effective,
automated approach is currently available.
This paper proposes an AST-based automated approach to systematically
refactor nested-IF formulae. The general idea is two-fold. First, we detect
and remove logic redundancy on the AST. Second, we identify higher-level
semantics that have been fragmented and scattered, and reassemble the syntax
using concise built-in functions. A comprehensive evaluation has been conducted
against a real-world spreadsheet corpus, collected in a leading IT company for
research purposes. The results on over 68,000 spreadsheets with 27 million
nested-IF formulae reveal that our approach is able to relieve the smell of
over 99% of nested-IF formulae. Over 50% of the refactorings reduce the
nesting level by more than half. In addition, a survey involving 49
participants indicates that in most cases the participants prefer the
refactored formulae and agree that such an automated refactoring approach is
necessary and helpful.
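One refactoring in this family can be sketched concretely: collapsing a chain of nested IFs into a single flat IFS call. The tuple-based AST and helper below are illustrative assumptions for exposition, not the paper's actual implementation.

```python
# Minimal sketch (illustrative, not the paper's code): collapse
# IF(c1, v1, IF(c2, v2, ..., default)) into a flat IFS with a TRUE fallback.

def flatten_nested_if(node):
    """Turn a nested-IF AST into ("IFS", (c1, v1), ..., ("TRUE", default))."""
    pairs = []
    # Walk down the else-branch of each IF, collecting (condition, value) pairs.
    while isinstance(node, tuple) and node[0] == "IF":
        _, cond, then, otherwise = node
        pairs.append((cond, then))
        node = otherwise
    pairs.append(("TRUE", node))  # the innermost else becomes the fallback
    return ("IFS",) + tuple(pairs)

# Example: IF(A1>90,"A",IF(A1>80,"B",IF(A1>70,"C","D")))
ast = ("IF", "A1>90", '"A"',
       ("IF", "A1>80", '"B"',
        ("IF", "A1>70", '"C"', '"D"')))
print(flatten_nested_if(ast))
# ("IFS", ("A1>90", '"A"'), ("A1>80", '"B"'), ("A1>70", '"C"'), ("TRUE", '"D"'))
```

The flat form reduces a nesting depth of three to one, which matches the kind of nesting-level reduction the evaluation reports.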
Distinct host immune responses in recurrent vulvovaginal candidiasis and vulvovaginal candidiasis
Recurrent vulvovaginal candidiasis (RVVC) and vulvovaginal candidiasis (VVC) are among the most common gynecological infections, primarily caused by Candida species. Although risk factors for RVVC and VVC have been identified in many studies, the antifungal immunological mechanisms are still not fully understood. We performed a 1-year prospective study in a local hospital, monitoring 98 patients clinically diagnosed with gynecological Candida infection. The results showed that 20.41% (20/98) of the patients had RVVC and 79.59% (78/98) had VVC. C. albicans accounted for 90% and 96.1% of all strains isolated from RVVC and VVC patients, respectively. Antifungal susceptibility testing showed no significant difference in Candida species between RVVC and VVC patients. However, the serum levels of IFN-γ, TNF-α, and IL-17F in the RVVC group were significantly lower than those in the VVC group, while IL-4, IL-6, and IL-10 were higher in RVVC patients than in VVC patients. IL-17A and IL-2 levels were comparable between the two groups. Taken together, our results suggest that host immune responses, especially Th1/Th2 immunity, may play important roles in the prognosis of RVVC and VVC.
Identification of Free and Bound Exciton States and Their Phase-Dependent Trapping Behavior in Lead Halide Perovskites
In this work, we probe the sub-gap energy states within polycrystalline and
single-crystal lead halide perovskites to better understand their intrinsic
photophysical behavior. Through combined temperature- and intensity-dependent
optical measurements, we reveal the existence of both free and bound exciton
contributions within the sub-gap energy state manifold. The trapping and
recombination dynamics of these excitons are shown to be strongly dependent on
the structural phase of the perovskite. The orthorhombic phase exhibits
ultrafast exciton trapping and distinct trap emission, while the tetragonal
phase gives a low monomolecular recombination velocity and small capture
cross-sections (~10⁻¹⁸ cm²). Within the multiphonon transition scenario, this
suppression in charge trapping is caused by an increase in the charge-capture
activation energy due to the reduction in electron-lattice interactions, which
can be the origin of the unexpectedly long carrier lifetimes in these material
systems.
Comment: 5 figures
Towards Efficient Fine-tuning of Pre-trained Code Models: An Experimental Study and Beyond
Recently, fine-tuning pre-trained code models such as CodeBERT on downstream
tasks has achieved great success in many software testing and analysis tasks.
While effective and prevalent, fine-tuning all pre-trained parameters incurs a
large computational cost. In this paper, we conduct an extensive experimental
study to explore what happens to layer-wise pre-trained representations and
their encoded code knowledge during fine-tuning. We then propose efficient
alternatives for fine-tuning large pre-trained code models based on these
findings. Our experimental study shows that (1) lexical, syntactic, and
structural properties of source code are encoded in the lower, intermediate,
and higher layers, respectively, while the semantic property spans the entire
model; (2) fine-tuning preserves most of the code properties: the basic
properties captured by the lower and intermediate layers persist, and only the
representations of the top two layers change substantially across various
downstream tasks; (3) based on these findings, we propose Telly, which
efficiently fine-tunes pre-trained code models via layer freezing. Extensive
experimental results on five diverse downstream tasks demonstrate that the
number of training parameters and the corresponding time cost are greatly
reduced, while performance remains similar or better. A replication package
including source code, datasets, and an online appendix is available at:
https://github.com/DeepSoftwareAnalytics/Telly.
Comment: Accepted by ISSTA 2023 (The 32nd ACM SIGSOFT International Symposium
on Software Testing and Analysis).
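The layer-freezing idea can be sketched as follows. This is a minimal illustration, not Telly's actual implementation: it assumes a RoBERTa/CodeBERT-style parameter naming scheme (`encoder.layer.<k>.*`) and keeps only the top two of twelve transformer layers trainable.

```python
# Illustrative sketch of layer freezing (assumed naming scheme, not Telly's code).
import re

def trainable_params(param_names, num_layers=12, top_k=2):
    """Return the parameter names left trainable when everything except
    the top `top_k` transformer layers is frozen."""
    keep_layers = set(range(num_layers - top_k, num_layers))
    trainable = []
    for name in param_names:
        m = re.match(r"encoder\.layer\.(\d+)\.", name)
        # Embeddings and lower layers are frozen; only top-k layers train.
        if m and int(m.group(1)) in keep_layers:
            trainable.append(name)
    return trainable

names = [f"encoder.layer.{k}.attention.self.query.weight" for k in range(12)]
names.append("embeddings.word_embeddings.weight")
print(trainable_params(names))
# Only the layer-10 and layer-11 parameters remain trainable.
```

In a real framework one would then set `requires_grad = False` on every parameter not returned by this selection, so the optimizer updates (and backpropagation through) only the top layers.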
GPT4Table: Can Large Language Models Understand Structured Table Data? A Benchmark and Empirical Study
Large language models (LLMs) are becoming attractive as few-shot reasoners for
natural language (NL) tasks. However, there is still much to learn about how
well LLMs understand structured data such as tables. While tables can be
serialized as input to LLMs, there is a lack of comprehensive studies examining
whether LLMs can truly comprehend such data. In this paper, we address this
question by designing a benchmark to evaluate the structural understanding
capabilities (SUC) of LLMs. The benchmark includes seven tasks, each with its
own unique challenges, e.g., cell lookup, row retrieval, and size detection. We
conduct a series of evaluations on GPT-3.5 and GPT-4 and find that performance
varies depending on several input choices, including table input format,
content order, role prompting, and partition marks. Drawing on the insights
gained from the benchmark evaluations, we propose self-augmentation for
effective structural prompting, such as critical value / range identification
using LLMs' internal knowledge. When combined with carefully chosen input
choices, these structural prompting methods lead to promising improvements in
LLM performance on a variety of tabular tasks, e.g., TabFact, HybridQA, SQA,
Feverous, and ToTTo. We believe that our benchmark and proposed prompting
methods can serve as a simple yet generic baseline for future research.
Comment: This paper has been accepted as a full paper at WSDM 202
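"Table input format" refers to how a table is flattened into text before the LLM sees it. The sketch below shows one common serialization (markdown); it is an illustrative example, not necessarily one of the exact formats the benchmark studies.

```python
# Illustrative table serialization for LLM input; the benchmark's exact
# formats may differ.

def to_markdown(header, rows):
    """Serialize a table as a markdown grid, one common LLM input format."""
    lines = ["| " + " | ".join(header) + " |",
             "| " + " | ".join("---" for _ in header) + " |"]
    for row in rows:
        lines.append("| " + " | ".join(str(c) for c in row) + " |")
    return "\n".join(lines)

header = ["city", "population"]
rows = [["Paris", 2100000], ["Rome", 2800000]]
print(to_markdown(header, rows))
# | city | population |
# | --- | --- |
# | Paris | 2100000 |
# | Rome | 2800000 |
```

Because cell boundaries, row order, and delimiter choices are all encoded in this string, it is plausible that varying them (as the benchmark does) changes what structure the model can recover.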
XInsight: eXplainable Data Analysis Through The Lens of Causality
In light of the growing popularity of Exploratory Data Analysis (EDA),
understanding the underlying causes of the knowledge acquired through EDA is
crucial, yet it remains under-researched. This study promotes a transparent and
explicable perspective on data analysis, called eXplainable Data Analysis
(XDA). To this end, we present XInsight, a general framework for XDA. XInsight
provides data analysis with qualitative and quantitative explanations of
causal and non-causal semantics, significantly improving human understanding
of, and confidence in, the outcomes of data analysis, and facilitating accurate
data interpretation and decision making in the real world. XInsight is a
three-module, end-to-end pipeline designed to extract causal graphs, translate
causal primitives into XDA semantics, and quantify the contribution of each
explanation to a data fact. XInsight uses a set of design concepts and
optimizations to address the inherent difficulties of integrating causality
into XDA. Experiments on synthetic and real-world datasets, as well as a user
study, demonstrate the highly promising capabilities of XInsight.
Detecting local processing unit in drosophila brain by using network theory
A community detection method from network theory was applied to a neuron
network, constructed from the image overlap between neuron pairs, to
automatically detect Local Processing Units (LPUs) in the Drosophila brain.
Twenty-six communities consistent with known LPUs, along with 13 subdivisions,
were found. In addition, 45 tracts were detected, which could be discriminated
from the LPUs by analyzing the distribution of the participation coefficient P.
Furthermore, layer structures were observed in the fan-shaped body (FB) that
coincided with optical imaging, and a total of 13 communities were shown to be
closely related to the FB. The method proposed in this work proved effective
for identifying LPU structure in the Drosophila brain independently of any
subjective judgment, and could be applied extensively in related areas.
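The participation coefficient P used to separate tracts from LPUs measures how evenly a node's links are spread over communities. Assuming the standard Guimerà-Amaral definition, P_i = 1 - Σ_s (k_is / k_i)², where k_is is the number of links node i has to community s and k_i is its total degree; the paper's exact pipeline may differ in details.

```python
# Participation coefficient sketch (standard definition assumed):
# P_i = 1 - sum over communities s of (k_is / k_i)^2.
# P ~ 0 for nodes whose links stay in one community (LPU-like);
# P -> 1 as links spread evenly over many communities (tract-like).

def participation_coefficient(neighbors, community):
    """neighbors: dict node -> list of neighbor nodes
    community: dict node -> community label"""
    P = {}
    for node, nbrs in neighbors.items():
        k = len(nbrs)
        if k == 0:
            P[node] = 0.0
            continue
        counts = {}
        for n in nbrs:  # count links into each neighboring community
            counts[community[n]] = counts.get(community[n], 0) + 1
        P[node] = 1.0 - sum((c / k) ** 2 for c in counts.values())
    return P

# Toy example: node "x" splits its 4 links evenly over two communities.
neighbors = {"x": ["a", "b", "c", "d"]}
community = {"a": 0, "b": 0, "c": 1, "d": 1}
print(participation_coefficient(neighbors, community)["x"])  # 0.5
```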
Demonstration of InsightPilot: An LLM-Empowered Automated Data Exploration System
Exploring data is crucial in data analysis, as it helps users understand and
interpret the data more effectively. However, performing effective data
exploration requires in-depth knowledge of the dataset and expertise in data
analysis techniques. Not being familiar with either can create obstacles that
make the process time-consuming and overwhelming for data analysts. To address
this issue, we introduce InsightPilot, an LLM (Large Language Model)-based,
automated data exploration system designed to simplify the data exploration
process. InsightPilot automatically selects appropriate analysis intents, such
as understanding, summarizing, and explaining. Then, these analysis intents are
concretized by issuing corresponding intentional queries (IQueries) to create a
meaningful and coherent exploration sequence. In brief, an IQuery is an
abstraction and automation of data analysis operations, which mimics the
approach of data analysts and simplifies the exploration process for users. By
employing an LLM to iteratively collaborate with a state-of-the-art insight
engine via IQueries, InsightPilot is effective in analyzing real-world
datasets, enabling users to gain valuable insights through natural language
inquiries. We demonstrate the effectiveness of InsightPilot in a case study,
showing how it helps users gain insights from their datasets.
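The iterative LLM-engine loop described above can be sketched schematically. Every name here (the intent list aside, which the abstract gives) is a hypothetical stand-in: the stub engine, the intent-selection function, and the IQuery string format are assumptions for illustration, not InsightPilot's real interfaces.

```python
# Highly simplified sketch of the LLM <-> insight-engine loop.
# StubInsightEngine, stub_llm_pick_intent, and the IQuery format are
# hypothetical; only the intents come from the abstract.

INTENTS = ["understand", "summarize", "explain"]

class StubInsightEngine:
    """Stands in for the insight engine; returns a canned data fact."""
    def run(self, iquery):
        return f"fact({iquery})"

def stub_llm_pick_intent(history):
    # A real LLM would choose based on context; cycle deterministically here.
    return INTENTS[len(history) % len(INTENTS)]

def explore(engine, steps=3):
    history = []
    for _ in range(steps):
        intent = stub_llm_pick_intent(history)   # 1. LLM selects an intent
        iquery = f"{intent}:sales_by_region"     # 2. intent -> concrete IQuery
        history.append(engine.run(iquery))       # 3. engine returns a data fact
    return history

print(explore(StubInsightEngine()))
# ['fact(understand:sales_by_region)', 'fact(summarize:sales_by_region)',
#  'fact(explain:sales_by_region)']
```

The point of the sketch is the control flow: the LLM never touches the data directly; it only steers the engine through a coherent sequence of IQueries.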