Automated Refactoring of Nested-IF Formulae in Spreadsheets
Spreadsheets are the most popular end-user programming software, where
formulae act like programs and can also have smells. One well-recognized common
smell of spreadsheet formulae is the nested-IF expression, which has low
readability, imposes high cognitive cost on users, and is error-prone during
reuse or maintenance. However, end users usually lack the essential programming
knowledge and skills to tackle, or even recognize, the problem. Previous
research has made only initial attempts in this direction, and no effective,
automated approach is currently available.
This paper proposes an AST-based automated approach to systematically
refactor nested-IF formulae. The general idea is two-fold. First, we detect
and remove logic redundancy on the AST. Second, we identify higher-level
semantics that have been fragmented and scattered, and reassemble the syntax
using concise built-in functions. A comprehensive evaluation has been conducted
against a real-world spreadsheet corpus, collected in a leading IT company for
research purposes. The results on over 68,000 spreadsheets with 27 million
nested-IF formulae reveal that our approach is able to relieve the smell of
over 99% of nested-IF formulae. Over 50% of the refactorings reduce the
nesting level by more than half. In addition, a survey involving 49
participants indicates that in most cases the participants prefer the
refactored formulae and agree that such an automated refactoring approach is
necessary and helpful.
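One refactoring in this family can be sketched concretely: collapsing a chain of nested IFs into a single flat IFS call. The tuple-based AST and helper below are illustrative assumptions for exposition, not the paper's actual implementation.

```python
# Minimal sketch (illustrative, not the paper's code): collapse
# IF(c1, v1, IF(c2, v2, ..., default)) into a flat IFS with a TRUE fallback.

def flatten_nested_if(node):
    """Turn a nested-IF AST into ("IFS", (c1, v1), ..., ("TRUE", default))."""
    pairs = []
    # Walk down the else-branch of each IF, collecting (condition, value) pairs.
    while isinstance(node, tuple) and node[0] == "IF":
        _, cond, then, otherwise = node
        pairs.append((cond, then))
        node = otherwise
    pairs.append(("TRUE", node))  # the innermost else becomes the fallback
    return ("IFS",) + tuple(pairs)

# Example: IF(A1>90,"A",IF(A1>80,"B",IF(A1>70,"C","D")))
ast = ("IF", "A1>90", '"A"',
       ("IF", "A1>80", '"B"',
        ("IF", "A1>70", '"C"', '"D"')))
print(flatten_nested_if(ast))
# ("IFS", ("A1>90", '"A"'), ("A1>80", '"B"'), ("A1>70", '"C"'), ("TRUE", '"D"'))
```

The flat form reduces a nesting depth of three to one, which matches the kind of nesting-level reduction the evaluation reports.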
Distinct host immune responses in recurrent vulvovaginal candidiasis and vulvovaginal candidiasis
Recurrent vulvovaginal candidiasis (RVVC) and vulvovaginal candidiasis (VVC) are among the most common gynecological infections, primarily caused by Candida species. Although risk factors for RVVC and VVC have been identified in many studies, the antifungal immunological mechanisms are still not fully understood. We performed a 1-year prospective study in a local hospital, monitoring 98 patients clinically diagnosed with gynecological Candida infection. The results showed that 20.41% (20/98) of the patients had RVVC and 79.59% (78/98) had VVC. C. albicans accounted for 90% and 96.1% of all strains isolated from RVVC and VVC patients, respectively. Antifungal susceptibility testing showed no significant difference in Candida species between RVVC and VVC patients. However, the serum levels of IFN-γ, TNF-α, and IL-17F in the RVVC group were significantly lower than those in the VVC group, while IL-4, IL-6, and IL-10 were higher in RVVC patients than in VVC patients. IL-17A and IL-2 levels were comparable between the two groups. Taken together, our results suggest that host immune responses, especially Th1/Th2 immunity, may play important roles in the prognosis of RVVC and VVC.
Identification of Free and Bound Exciton States and Their Phase-Dependent Trapping Behavior in Lead Halide Perovskites
In this work, we probe the sub-gap energy states within polycrystalline and
single-crystal lead halide perovskites to better understand their intrinsic
photophysical behavior. Through combined temperature- and intensity-dependent
optical measurements, we reveal the existence of both free and bound exciton
contributions within the sub-gap energy state manifold. The trapping and
recombination dynamics of these excitons are shown to be strongly dependent on
the structural phase of the perovskite. The orthorhombic phase exhibits
ultrafast exciton trapping and distinct trap emission, while the tetragonal
phase gives a low monomolecular recombination velocity and small capture
cross-sections (~10⁻¹⁸ cm²). Within the multiphonon transition scenario, this
suppression in charge trapping is caused by an increase in the charge-capture
activation energy due to the reduction in electron-lattice interactions, which
can be the origin of the unexpectedly long carrier lifetimes in these material
systems.
Comment: 5 figures
Towards Efficient Fine-tuning of Pre-trained Code Models: An Experimental Study and Beyond
Recently, fine-tuning pre-trained code models such as CodeBERT on downstream
tasks has achieved great success in many software testing and analysis tasks.
While effective and prevalent, fine-tuning all pre-trained parameters incurs a
large computational cost. In this paper, we conduct an extensive experimental
study to explore what happens to layer-wise pre-trained representations and
their encoded code knowledge during fine-tuning. We then propose efficient
alternatives for fine-tuning large pre-trained code models based on these
findings. Our experimental study shows that (1) lexical, syntactic, and
structural properties of source code are encoded in the lower, intermediate,
and higher layers, respectively, while the semantic property spans the entire
model; (2) fine-tuning preserves most of the code properties: the basic
properties captured by the lower and intermediate layers persist, and only the
representations of the top two layers change substantially across various
downstream tasks; (3) based on these findings, we propose Telly, which
efficiently fine-tunes pre-trained code models via layer freezing. Extensive
experimental results on five diverse downstream tasks demonstrate that the
number of training parameters and the corresponding time cost are greatly
reduced, while performance remains similar or better. A replication package
including source code, datasets, and an online appendix is available at:
https://github.com/DeepSoftwareAnalytics/Telly.
Comment: Accepted by ISSTA 2023 (The 32nd ACM SIGSOFT International Symposium
on Software Testing and Analysis).
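The layer-freezing idea can be sketched as follows. This is a minimal illustration, not Telly's actual implementation: it assumes a RoBERTa/CodeBERT-style parameter naming scheme (`encoder.layer.<k>.*`) and keeps only the top two of twelve transformer layers trainable.

```python
# Illustrative sketch of layer freezing (assumed naming scheme, not Telly's code).
import re

def trainable_params(param_names, num_layers=12, top_k=2):
    """Return the parameter names left trainable when everything except
    the top `top_k` transformer layers is frozen."""
    keep_layers = set(range(num_layers - top_k, num_layers))
    trainable = []
    for name in param_names:
        m = re.match(r"encoder\.layer\.(\d+)\.", name)
        # Embeddings and lower layers are frozen; only top-k layers train.
        if m and int(m.group(1)) in keep_layers:
            trainable.append(name)
    return trainable

names = [f"encoder.layer.{k}.attention.self.query.weight" for k in range(12)]
names.append("embeddings.word_embeddings.weight")
print(trainable_params(names))
# Only the layer-10 and layer-11 parameters remain trainable.
```

In a real framework one would then set `requires_grad = False` on every parameter not returned by this selection, so the optimizer updates (and backpropagation through) only the top layers.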
GPT4Table: Can Large Language Models Understand Structured Table Data? A Benchmark and Empirical Study
Large language models (LLMs) are becoming attractive as few-shot reasoners for
natural language (NL) tasks. However, there is still much to learn about how
well LLMs understand structured data such as tables. While tables can be
serialized as input to LLMs, there is a lack of comprehensive studies examining
whether LLMs can truly comprehend such data. In this paper, we address this
question by designing a benchmark to evaluate the structural understanding
capabilities (SUC) of LLMs. The benchmark includes seven tasks, each with its
own unique challenges, e.g., cell lookup, row retrieval, and size detection. We
conduct a series of evaluations on GPT-3.5 and GPT-4 and find that performance
varies depending on several input choices, including table input format,
content order, role prompting, and partition marks. Drawing on the insights
gained from the benchmark evaluations, we propose self-augmentation for
effective structural prompting, such as critical value / range identification
using LLMs' internal knowledge. When combined with carefully chosen input
choices, these structural prompting methods lead to promising improvements in
LLM performance on a variety of tabular tasks, e.g., TabFact, HybridQA, SQA,
Feverous, and ToTTo. We believe that our benchmark and proposed prompting
methods can serve as a simple yet generic baseline for future research.
Comment: This paper has been accepted as a full paper at WSDM 202
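"Table input format" refers to how a table is flattened into text before the LLM sees it. The sketch below shows one common serialization (markdown); it is an illustrative example, not necessarily one of the exact formats the benchmark studies.

```python
# Illustrative table serialization for LLM input; the benchmark's exact
# formats may differ.

def to_markdown(header, rows):
    """Serialize a table as a markdown grid, one common LLM input format."""
    lines = ["| " + " | ".join(header) + " |",
             "| " + " | ".join("---" for _ in header) + " |"]
    for row in rows:
        lines.append("| " + " | ".join(str(c) for c in row) + " |")
    return "\n".join(lines)

header = ["city", "population"]
rows = [["Paris", 2100000], ["Rome", 2800000]]
print(to_markdown(header, rows))
# | city | population |
# | --- | --- |
# | Paris | 2100000 |
# | Rome | 2800000 |
```

Because cell boundaries, row order, and delimiter choices are all encoded in this string, it is plausible that varying them (as the benchmark does) changes what structure the model can recover.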
XInsight: eXplainable Data Analysis Through The Lens of Causality
In light of the growing popularity of Exploratory Data Analysis (EDA),
understanding the underlying causes of the knowledge acquired through EDA is
crucial, yet it remains under-researched. This study promotes a transparent and
explicable perspective on data analysis, called eXplainable Data Analysis
(XDA). To this end, we present XInsight, a general framework for XDA. XInsight
provides data analysis with qualitative and quantitative explanations of
causal and non-causal semantics, significantly improving human understanding
of, and confidence in, the outcomes of data analysis, and facilitating accurate
data interpretation and decision making in the real world. XInsight is a
three-module, end-to-end pipeline designed to extract causal graphs, translate
causal primitives into XDA semantics, and quantify the contribution of each
explanation to a data fact. XInsight uses a set of design concepts and
optimizations to address the inherent difficulties of integrating causality
into XDA. Experiments on synthetic and real-world datasets, as well as a user
study, demonstrate the highly promising capabilities of XInsight.
Detecting local processing unit in drosophila brain by using network theory
A community detection method from network theory was applied to a neuron
network, constructed from the image overlap between neuron pairs, to
automatically detect Local Processing Units (LPUs) in the Drosophila brain.
Twenty-six communities consistent with known LPUs, along with 13 subdivisions,
were found. In addition, 45 tracts were detected, which could be discriminated
from the LPUs by analyzing the distribution of the participation coefficient P.
Furthermore, layer structures were observed in the fan-shaped body (FB) that
coincided with optical imaging, and a total of 13 communities were shown to be
closely related to the FB. The method proposed in this work proved effective
for identifying LPU structure in the Drosophila brain independently of any
subjective judgment, and could be applied extensively in related areas.
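The participation coefficient P used to separate tracts from LPUs measures how evenly a node's links are spread over communities. Assuming the standard Guimerà-Amaral definition, P_i = 1 - Σ_s (k_is / k_i)², where k_is is the number of links node i has to community s and k_i is its total degree; the paper's exact pipeline may differ in details.

```python
# Participation coefficient sketch (standard definition assumed):
# P_i = 1 - sum over communities s of (k_is / k_i)^2.
# P ~ 0 for nodes whose links stay in one community (LPU-like);
# P -> 1 as links spread evenly over many communities (tract-like).

def participation_coefficient(neighbors, community):
    """neighbors: dict node -> list of neighbor nodes
    community: dict node -> community label"""
    P = {}
    for node, nbrs in neighbors.items():
        k = len(nbrs)
        if k == 0:
            P[node] = 0.0
            continue
        counts = {}
        for n in nbrs:  # count links into each neighboring community
            counts[community[n]] = counts.get(community[n], 0) + 1
        P[node] = 1.0 - sum((c / k) ** 2 for c in counts.values())
    return P

# Toy example: node "x" splits its 4 links evenly over two communities.
neighbors = {"x": ["a", "b", "c", "d"]}
community = {"a": 0, "b": 0, "c": 1, "d": 1}
print(participation_coefficient(neighbors, community)["x"])  # 0.5
```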
Demonstration of InsightPilot: An LLM-Empowered Automated Data Exploration System
Exploring data is crucial in data analysis, as it helps users understand and
interpret the data more effectively. However, performing effective data
exploration requires in-depth knowledge of the dataset and expertise in data
analysis techniques. Not being familiar with either can create obstacles that
make the process time-consuming and overwhelming for data analysts. To address
this issue, we introduce InsightPilot, an LLM (Large Language Model)-based,
automated data exploration system designed to simplify the data exploration
process. InsightPilot automatically selects appropriate analysis intents, such
as understanding, summarizing, and explaining. Then, these analysis intents are
concretized by issuing corresponding intentional queries (IQueries) to create a
meaningful and coherent exploration sequence. In brief, an IQuery is an
abstraction and automation of data analysis operations, which mimics the
approach of data analysts and simplifies the exploration process for users. By
employing an LLM to iteratively collaborate with a state-of-the-art insight
engine via IQueries, InsightPilot is effective in analyzing real-world
datasets, enabling users to gain valuable insights through natural language
inquiries. We demonstrate the effectiveness of InsightPilot in a case study,
showing how it helps users gain insights from their datasets.
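The iterative LLM-engine loop described above can be sketched schematically. Every name here (the intent list aside, which the abstract gives) is a hypothetical stand-in: the stub engine, the intent-selection function, and the IQuery string format are assumptions for illustration, not InsightPilot's real interfaces.

```python
# Highly simplified sketch of the LLM <-> insight-engine loop.
# StubInsightEngine, stub_llm_pick_intent, and the IQuery format are
# hypothetical; only the intents come from the abstract.

INTENTS = ["understand", "summarize", "explain"]

class StubInsightEngine:
    """Stands in for the insight engine; returns a canned data fact."""
    def run(self, iquery):
        return f"fact({iquery})"

def stub_llm_pick_intent(history):
    # A real LLM would choose based on context; cycle deterministically here.
    return INTENTS[len(history) % len(INTENTS)]

def explore(engine, steps=3):
    history = []
    for _ in range(steps):
        intent = stub_llm_pick_intent(history)   # 1. LLM selects an intent
        iquery = f"{intent}:sales_by_region"     # 2. intent -> concrete IQuery
        history.append(engine.run(iquery))       # 3. engine returns a data fact
    return history

print(explore(StubInsightEngine()))
# ['fact(understand:sales_by_region)', 'fact(summarize:sales_by_region)',
#  'fact(explain:sales_by_region)']
```

The point of the sketch is the control flow: the LLM never touches the data directly; it only steers the engine through a coherent sequence of IQueries.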