When Neural Code Completion Models Size up the Situation: Attaining Cheaper and Faster Completion through Dynamic Model Inference
Leveraging recent advancements in large language models, modern neural code
completion models have demonstrated the capability to generate highly accurate
code suggestions. However, their massive size poses challenges in terms of
computational costs and environmental impact, hindering their widespread
adoption in practical scenarios. Dynamic inference emerges as a promising
solution, as it allocates minimal computation during inference while
maintaining the model's performance. In this research, we explore dynamic
inference within the context of code completion. Initially, we conducted an
empirical investigation on GPT-2, focusing on the inference capabilities of
intermediate layers for code completion. We found that 54.4% of tokens can be
accurately generated using just the first layer, indicating substantial
potential for computational savings. Moreover, even when using all layers, the
model still fails to predict 14.5% of tokens correctly, and completions
continued from these mispredicted tokens are rarely considered helpful, with
only a 4.2% acceptance rate. These findings motivate our exploration of dynamic inference
in code completion and inspire us to enhance it with a decision-making
mechanism that stops the generation of incorrect code. We thus propose a novel
dynamic inference method specifically tailored for code completion models. This
method aims not only to produce correct predictions with greatly reduced
computation but also to prevent incorrect predictions proactively. Our
extensive evaluation shows that it skips an average of 1.7 of the models' 16
layers, yielding an 11.2% speedup with only a marginal 1.1% reduction in
ROUGE-L.
Comment: Accepted to ICSE2
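The early-exit idea in this abstract can be sketched compactly. Below is a minimal, hypothetical PyTorch loop in which a toy encoder stack stands in for a GPT-2-style model and a fixed confidence threshold on the intermediate prediction decides when to stop; the paper's actual decision mechanism, including its component that halts generation of likely-incorrect code, is learned rather than a hand-set threshold.

import torch
import torch.nn as nn

# Toy stand-in for a 16-layer GPT-2-style model (sizes are illustrative).
d_model, vocab_size, num_layers = 64, 1000, 16
layers = nn.ModuleList(
    nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
    for _ in range(num_layers)
)
lm_head = nn.Linear(d_model, vocab_size)  # shared prediction head

@torch.no_grad()
def early_exit_next_token(hidden, threshold=0.9):
    """Run layers one at a time; exit as soon as the prediction for the
    last position is confident enough, skipping the remaining layers."""
    token = None
    for depth, layer in enumerate(layers, start=1):
        hidden = layer(hidden)
        probs = torch.softmax(lm_head(hidden[:, -1, :]), dim=-1)
        confidence, token = probs.max(dim=-1)
        if confidence.item() >= threshold:
            return token.item(), depth   # early exit
    return token.item(), num_layers      # no layer was confident enough

hidden = torch.randn(1, 8, d_model)      # batch of 1, 8-token prefix
token, depth = early_exit_next_token(hidden)
print(f"predicted token {token} after {depth}/{num_layers} layers")

A real system would additionally abort the whole completion when even the final layer remains unconfident, mirroring the paper's mechanism for preventing incorrect predictions.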
Learning Procedure-aware Video Representation from Instructional Videos and Their Narrations
The abundance of instructional videos and their narrations over the Internet
offers an exciting avenue for understanding procedural activities. In this
work, we propose to learn a video representation that encodes both action steps
and their temporal ordering, based on a large-scale dataset of web
instructional videos and their narrations, without using human annotations. Our
method jointly learns a video representation to encode individual step
concepts, and a deep probabilistic model to capture both temporal dependencies
and immense individual variations in the step ordering. We empirically
demonstrate that learning temporal ordering not only enables new capabilities
for procedure reasoning, but also reinforces the recognition of individual
steps. Our model significantly advances the state-of-the-art results on step
classification (+2.8% / +3.3% on COIN / EPIC-Kitchens) and step forecasting
(+7.4% on COIN). Moreover, our model attains promising results in zero-shot
inference for step classification and forecasting, as well as in predicting
diverse and plausible steps for incomplete procedures. Our code is available at
https://github.com/facebookresearch/ProcedureVRL.
Comment: Accepted to CVPR 202
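As a rough illustration of the temporal-ordering component, the toy sketch below fits a first-order Markov chain over hypothetical step labels and uses it to forecast the next step. The paper's deep probabilistic model is far more expressive (it also captures immense individual variation), but transition counting conveys the basic value of learning step ordering.

import numpy as np

# Hypothetical procedure steps and annotated orderings (illustrative only).
steps = ["crack eggs", "whisk", "heat pan", "pour", "flip"]
K = len(steps)
sequences = [[0, 1, 2, 3, 4], [0, 2, 1, 3, 4], [2, 0, 1, 3, 4]]

# Count step-to-step transitions with Laplace smoothing.
counts = np.ones((K, K))
for seq in sequences:
    for a, b in zip(seq, seq[1:]):
        counts[a, b] += 1
transition = counts / counts.sum(axis=1, keepdims=True)

def forecast_next(step_id):
    """Most likely next step given the current one."""
    return steps[int(transition[step_id].argmax())]

print(forecast_next(steps.index("whisk")))  # -> "pour" under these toy sequences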
Predicting adsorbed gas capacity of deep shales under high temperature and pressure: Experiments and modeling
Temperature and pressure conditions of deep shale lie beyond typical experimental ranges, making the amount of adsorbed gas difficult to determine. To predict the adsorbed gas content of deep shales under formation conditions, isothermal adsorption experiments and model building were conducted on shale samples from the Longmaxi Formation in China. A temperature-dependent adsorption model based on the Langmuir equation is proposed, which fits the observed isotherms well with a high correlation coefficient. Based on the parameters fitted at 303.15 K, the isothermal adsorption curves at 333.15 K, 363.15 K, and 393.15 K are predicted, showing good agreement with the available experimental curves. Compared with previous prediction methods, the biggest advantage of the proposed method is that it requires only a single isothermal adsorption experiment. The predictions indicate that the downward trend of the excess adsorption curves slows under high temperature and pressure, and once the pressure exceeds a certain level (> 80 MPa), temperature has little effect on the excess adsorption capacity. For absolute adsorption, although gas adsorption reaches saturation much more slowly at high temperature, it can still reach saturation under formation pressure. At the burial depths of marine shale, temperature plays the major role in controlling the adsorbed gas, resulting in a decrease of adsorbed gas content in deep shale; this ratio decreases further as depth increases.

Cited as: Zhou, S., Wang, H., Li, B., Li, S., Sepehrnoori, K., Cai, J. Predicting adsorbed gas capacity of deep shales under high temperature and pressure: Experiments and modeling. Advances in Geo-Energy Research, 2022, 6(6): 482-491. https://doi.org/10.46690/ager.2022.06.0
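As an illustration of the modeling workflow, the sketch below fits a Langmuir isotherm with an assumed Arrhenius-type temperature dependence of the Langmuir pressure to a single hypothetical 303.15 K isotherm, then extrapolates to 393.15 K. The parameterization, the assumed adsorption energy E, and the data points are illustrative stand-ins, not the paper's exact model or measurements.

import numpy as np
from scipy.optimize import curve_fit

R = 8.314      # J/(mol*K), universal gas constant
E = 15000.0    # J/mol, assumed adsorption energy (illustrative value)

def langmuir_T(PT, n_L, P_L0):
    """Langmuir isotherm n = n_L * P / (P_L + P) with a temperature-
    dependent Langmuir pressure P_L(T) = P_L0 * exp(-E / (R * T)).
    PT = (P in MPa, T in K)."""
    P, T = PT
    P_L = P_L0 * np.exp(-E / (R * T))
    return n_L * P / (P_L + P)

# Hypothetical 303.15 K isotherm (pressure in MPa, uptake in m^3/t).
P = np.array([2.0, 5.0, 8.0, 12.0, 16.0, 20.0, 25.0, 30.0])
T303 = np.full_like(P, 303.15)
n_obs = np.array([1.10, 1.86, 2.23, 2.55, 2.68, 2.79, 2.89, 2.96])

(n_L, P_L0), _ = curve_fit(langmuir_T, (P, T303), n_obs, p0=(3.0, 1000.0))

# Predict the 393.15 K isotherm from the single 303.15 K fit,
# mirroring the one-experiment prediction workflow described above.
n_393 = langmuir_T((P, np.full_like(P, 393.15)), n_L, P_L0)

Because exp(-E/(R*T)) grows with temperature, the fitted Langmuir pressure increases at depth and the predicted adsorbed amount falls, consistent with the trend the abstract reports.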
Large Language Models are Few-Shot Summarizers: Multi-Intent Comment Generation via In-Context Learning
Code comment generation aims at generating natural language descriptions for
a code snippet to facilitate developers' program comprehension activities.
Despite being studied for a long time, a bottleneck for existing approaches is
that given a code snippet, they can only generate one comment while developers
usually need information from diverse perspectives, such as what the
functionality of this code snippet is and how to use it. To tackle this
limitation, this study empirically investigates the feasibility of utilizing
large language models (LLMs) to generate comments that can fulfill developers'
diverse intents. Our intuition is based on the facts that (1) the code and its
paired comment are used during the pre-training process of LLMs to build the
semantic connection between the natural language and programming language, and
(2) comments in real-world projects, which are collected for the
pre-training, usually reflect different developers' intents. We thus postulate
that the LLMs can already understand the code from different perspectives after
the pre-training. Indeed, experiments on two large-scale datasets validate
our insights: by adopting the in-context learning paradigm and
giving adequate prompts to the LLM (e.g., providing it with ten or more
examples), the LLM can significantly outperform a state-of-the-art supervised
learning approach on generating comments with multiple intents. Results also
show that customized strategies for constructing the prompts and
post-processing strategies for reranking the results can both boost the LLM's
performance, shedding light on future research directions for using LLMs for
comment generation.
Comment: Accepted by the 46th International Conference on Software Engineering (ICSE 2024).
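Concretely, the in-context learning setup amounts to assembling a few-shot prompt per intent. The sketch below shows one hypothetical way to do so; the example pool, the intent labels, and the ask_llm call are placeholders rather than the study's actual prompts or data.

# Build a few-shot prompt for multi-intent comment generation.
# EXAMPLES, the intent labels, and ask_llm are hypothetical stand-ins.
EXAMPLES = [
    {"intent": "what", "code": "def add(a, b): return a + b",
     "comment": "Returns the sum of two numbers."},
    {"intent": "usage", "code": "def add(a, b): return a + b",
     "comment": "Call add(x, y) to combine two numeric values."},
    # ...in practice, ten or more demonstrations per intent, per the study
]

def build_prompt(code, intent, k=10):
    """Assemble k demonstrations sharing the target intent, then the query."""
    demos = [e for e in EXAMPLES if e["intent"] == intent][:k]
    parts = [f"Code:\n{e['code']}\nComment ({e['intent']}):\n{e['comment']}"
             for e in demos]
    parts.append(f"Code:\n{code}\nComment ({intent}):")
    return "\n\n".join(parts)

prompt = build_prompt("def area(r): return 3.14159 * r * r", intent="what")
# response = ask_llm(prompt)  # hypothetical LLM call; rerank as needed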
PEELER: Learning to Effectively Predict Flakiness without Running Tests
Regression testing is a widely adopted approach to expose change-induced bugs and to verify the correctness/robustness of code in modern software development settings. Unfortunately, flaky tests significantly increase the cost of regression testing and eventually reduce developers' productivity (i.e., their ability to find and fix real problems). State-of-the-art approaches leverage dynamic test information obtained through expensive re-execution of test cases to effectively identify flaky tests. To account for scalability constraints, some recent approaches build on static test case features, but they fall short on effectiveness. In this paper, we introduce PEELER, a new fully static approach for predicting flaky tests by exploring a representation of test cases based on data dependency relations. The predictor is trained as a neural-network-based model, which simultaneously achieves scalability (it requires no test execution), effectiveness (it exploits relevant test dependency features), and practicality (it can be applied in the wild to find new flaky tests). Experimental validation on 17,532 test cases from 21 Java projects shows that PEELER outperforms the state-of-the-art FlakeFlagger by around 20 percentage points: it catches 22% more flaky tests while yielding 51% fewer false positives. Finally, in a live study with projects in the wild, we reported 21 flakiness cases to developers, 12 of which they have already confirmed as indeed flaky.
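To make the fully static setup concrete, the sketch below shows the general shape of such a predictor: a test case is reduced to a fixed-length vector of hypothetical dependency-derived features (e.g., counts of shared-variable edges or external-resource reads) and classified by a small feed-forward network. PEELER's actual representation of data dependency relations is richer than this sketch.

import torch
import torch.nn as nn

NUM_FEATURES = 32  # hypothetical static features per test case

# Small feed-forward classifier: features in, flakiness logit out.
model = nn.Sequential(
    nn.Linear(NUM_FEATURES, 64),
    nn.ReLU(),
    nn.Linear(64, 1),
)

def predict_flaky(features: torch.Tensor, threshold: float = 0.5) -> bool:
    """Classify a test case from static features alone, with no execution."""
    with torch.no_grad():
        prob = torch.sigmoid(model(features)).item()
    return prob >= threshold

print(predict_flaky(torch.randn(NUM_FEATURES)))  # untrained demo run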
Natural Language to Code: How Far Are We?
A longstanding dream in software engineering research is to devise effective approaches for automating development tasks based on developers' informally-specified intentions. Such intentions are generally in the form of natural language descriptions. In recent literature, a number of approaches have been proposed to automate tasks such as code search and even code generation based on natural language inputs. While these approaches vary in terms of technical design, their objective is the same: transforming a developer's intention into source code. The literature, however, lacks a comprehensive understanding of the effectiveness of existing techniques as well as their complementarity to each other. We propose to fill this gap through a large-scale empirical study in which we systematically evaluate natural language to code techniques. Specifically, we consider six state-of-the-art techniques targeting code search, and four targeting code generation. Through extensive evaluations on a dataset of 22K+ natural language queries, our study reveals the following major findings: (1) code search techniques based on model pre-training are so far the most effective, while code generation techniques can also provide promising results; (2) complementarity widely exists among the existing techniques; and (3) combining the ten techniques together can enhance performance by 35% compared with the most effective standalone technique. Finally, we propose a post-processing strategy to automatically integrate different techniques based on their generated code. Experimental results show that our devised strategy is both effective and extensible.
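One simple, generic way to combine ranked outputs from several techniques is reciprocal-rank fusion, sketched below. The paper's post-processing strategy integrates techniques based on their generated code, so this fusion rule is a stand-in for illustration, not the strategy itself.

from collections import defaultdict

def fuse_rankings(rankings, k=60):
    """Reciprocal-rank fusion: rankings is a list of ranked candidate
    lists, one per technique; returns candidates by fused score."""
    scores = defaultdict(float)
    for ranked in rankings:
        for rank, candidate in enumerate(ranked, start=1):
            scores[candidate] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical outputs from a code search tool and a code generator.
search_results = ["snippet_a", "snippet_b", "snippet_c"]
generation_results = ["snippet_b", "snippet_d"]
print(fuse_rankings([search_results, generation_results]))  # snippet_b ranks first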