Determinants of quality, latency, and amount of Stack Overflow answers about recent Android APIs.
Stack Overflow is a popular crowdsourced question and answer website for programming-related issues. It is an invaluable resource for software developers; on average, questions posted there get answered within minutes to an hour. Questions about well-established topics, e.g., the coercion operator in C++ or the difference between canonical and class names in Java, get asked often in one form or another and are answered very quickly. On the other hand, questions on previously unseen or niche topics take a while to get a good answer. This is particularly the case with questions about recent updates to application programming interfaces (APIs) or the introduction of new ones. In a hyper-competitive online market, getting good answers to current programming questions sooner could increase the chances of an app getting released and used. So, can developers somehow hasten the arrival of good answers to questions about new APIs? Here, we empirically study Stack Overflow questions pertaining to new Android APIs and their associated answers. We contrast the interest in these questions, their answer quality, and the timeliness of their answers with those of questions about old APIs. We find that Stack Overflow answerers in general prioritize by currentness: questions about new APIs do get more answers, but good-quality answers take longer. We also find that incentives in the form of question bounties, if used appropriately, can significantly shorten the time to a good answer and increase answer quality. Interestingly, no operationalization of bounty amount shows significance in our models. In practice, our findings confirm the value of bounties in enhancing expert participation. In addition, they show that the Stack Overflow style of crowdsourcing, for all its glory in providing answers about established programming knowledge, is less effective for questions about new APIs.
Improving Code Example Recommendations on Informal Documentation Using BERT and Query-Aware LSH: A Comparative Study
Our research investigates the recommendation of code examples to aid software
developers, a practice that saves them significant time by providing
ready-to-use code snippets. The focus of our study is Stack Overflow, a
commonly used resource for coding discussions and solutions, particularly in
the context of the Java programming language.
We applied BERT, a powerful Large Language Model (LLM) that enables us to
transform code examples into numerical vectors by extracting their semantic
information. Once these numerical representations are prepared, we identify
Approximate Nearest Neighbors (ANN) using Locality-Sensitive Hashing (LSH). Our
research employed two variants of LSH: Random Hyperplane-based LSH and
Query-Aware LSH. We rigorously compared these two approaches on four
metrics: HitRate, Mean Reciprocal Rank (MRR), Average Execution Time, and
Relevance.
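A minimal sketch of this pipeline follows, assuming a generic Hugging Face BERT checkpoint and a single-table random-hyperplane LSH; the Query-Aware variant and the paper's exact configuration are not reproduced here:

import torch
import numpy as np
from collections import defaultdict
from transformers import AutoTokenizer, AutoModel

# Assumed embedding model; the study's exact checkpoint may differ.
tok = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")

def embed(snippets):
    # Mean-pooled BERT embeddings for a list of code snippets.
    enc = tok(snippets, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        out = bert(**enc).last_hidden_state          # (batch, seq_len, hidden)
    mask = enc["attention_mask"].unsqueeze(-1)       # ignore padding positions
    return ((out * mask).sum(1) / mask.sum(1)).numpy()

class RandomHyperplaneLSH:
    # Bucket vectors by the signs of projections onto random hyperplanes.
    def __init__(self, dim, n_bits=16, seed=0):
        rng = np.random.default_rng(seed)
        self.planes = rng.standard_normal((n_bits, dim))
        self.buckets = defaultdict(list)

    def _signature(self, v):
        return tuple((self.planes @ v > 0).tolist())

    def index(self, vectors):
        for i, v in enumerate(vectors):
            self.buckets[self._signature(v)].append(i)

    def query(self, v):
        # Approximate nearest neighbors: items sharing the query's bucket.
        return self.buckets.get(self._signature(v), [])

# Usage: vecs = embed(corpus); lsh = RandomHyperplaneLSH(vecs.shape[1])
# lsh.index(vecs); candidates = lsh.query(embed([query_snippet])[0])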
Our study revealed that the Query-Aware (QA) approach showed superior
performance over the Random Hyperplane-based (RH) method. Specifically, it
exhibited a notable improvement of 20% to 35% in HitRate for query pairs
compared to the RH approach. Furthermore, the QA approach proved significantly
more time-efficient, creating hash tables and assigning data samples to
buckets at least four times faster. It can return code examples within
milliseconds, whereas the RH approach typically requires several seconds.
Given the superior performance of the QA approach, we tested it against
PostFinder and FaCoY, the state-of-the-art baselines. Our QA method showed
comparable efficiency, proving its potential for effective code recommendation.
An Unsupervised Approach for Discovering Relevant Tutorial Fragments for APIs
Developers increasingly rely on API tutorials to facilitate software
development. However, it remains a challenging task for them to discover
relevant API tutorial fragments explaining unfamiliar APIs. Existing supervised
approaches suffer from the heavy burden of manually preparing corpus-specific
annotated data and features. In this study, we propose a novel unsupervised
approach, namely Fragment Recommender for APIs with PageRank and Topic model
(FRAPT). FRAPT addresses the two main challenges of the task and
effectively determines relevant tutorial fragments for APIs. In FRAPT, a
Fragment Parser is proposed to identify APIs in tutorial fragments and replace
ambiguous pronouns and variables with related ontologies and API names, so as
to address the pronoun and variable resolution challenge. Then, a Fragment
Filter employs a set of non-explanatory detection rules to remove
non-explanatory fragments, thus addressing the non-explanatory fragment
identification challenge. Finally, two correlation scores are computed and
aggregated to determine relevant fragments for APIs, by applying both a topic
model and the PageRank algorithm to the retained fragments. Extensive experiments
over two publicly available tutorial corpora show that FRAPT improves the
state-of-the-art approach by 8.77% and 12.32%, respectively, in terms of
F-Measure. The effectiveness of key components of FRAPT is also validated.
Comment: 11 pages, 8 figures, in Proc. of the 39th IEEE International Conference
on Software Engineering (ICSE'17).
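A hedged sketch of the final scoring step, assuming gensim for the topic model and networkx for PageRank; the fragment graph, similarity measure, and aggregation weight alpha are illustrative choices, not FRAPT's exact formulation:

import networkx as nx
from gensim import corpora, models
from gensim.matutils import cossim

def rank_fragments(fragments, api_query, num_topics=20, alpha=0.5):
    # Topic-model correlation: cosine similarity in LDA topic space.
    texts = [f.split() for f in fragments]
    dictionary = corpora.Dictionary(texts)
    bows = [dictionary.doc2bow(t) for t in texts]
    lda = models.LdaModel(bows, id2word=dictionary, num_topics=num_topics)
    query_bow = dictionary.doc2bow(api_query.split())
    topic_scores = [cossim(lda[query_bow], lda[b]) for b in bows]

    # PageRank correlation: fragments sharing terms are linked in a graph.
    g = nx.Graph()
    g.add_nodes_from(range(len(fragments)))
    for i in range(len(fragments)):
        for j in range(i + 1, len(fragments)):
            overlap = len(set(texts[i]) & set(texts[j]))
            if overlap:
                g.add_edge(i, j, weight=overlap)
    pr = nx.pagerank(g, weight="weight")

    # Aggregate the two correlation scores and rank the retained fragments.
    combined = {i: alpha * topic_scores[i] + (1 - alpha) * pr[i]
                for i in range(len(fragments))}
    return sorted(combined, key=combined.get, reverse=True)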
Let's Discover More API Relations: A Large Language Model-based AI Chain for Unsupervised API Relation Inference
APIs have intricate relations that can be described in text and represented
as knowledge graphs to aid software engineering tasks. Existing relation
extraction methods have limitations, such as a limited API text corpus and
sensitivity to the characteristics of the input text. To address these limitations,
we propose utilizing large language models (LLMs) (e.g., GPT-3.5) as a neural
knowledge base for API relation inference. This approach leverages the entire
Web used to pre-train LLMs as a knowledge base and is insensitive to the
context and complexity of input texts. To ensure accurate inference, we design
our analytic flow as an AI Chain with three AI modules: API FQN Parser, API
Knowledge Extractor, and API Relation Decider. The accuracies of the API FQN
Parser and API Relation Decider modules are 0.81 and 0.83, respectively. Using
the generative capacity of the LLM and our approach's inference capability, we
achieve an average F1 value of 0.76 across the three datasets, significantly
higher than the state-of-the-art method's average F1 value of 0.40. Compared to
a CoT-based method, our AI Chain design improves inference reliability by
67%, and the AI-crowd-intelligence strategy enhances the robustness of our
approach by 26%.
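A hedged sketch of such an AI Chain follows; the prompts, relation labels, and model name are illustrative assumptions, not the paper's exact design:

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def ask(prompt):
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # deterministic output aids inference reliability
    )
    return resp.choices[0].message.content.strip()

def parse_fqn(mention):
    # Module 1: resolve an API mention to its fully qualified name.
    return ask(f"Give only the fully qualified name of the API '{mention}'.")

def extract_knowledge(fqn):
    # Module 2: elicit what the LLM knows about the API.
    return ask(f"Briefly describe the purpose and behavior of {fqn}.")

def decide_relation(fqn_a, fqn_b):
    # Module 3: decide the relation, given knowledge of both APIs.
    ka, kb = extract_knowledge(fqn_a), extract_knowledge(fqn_b)
    return ask(
        f"API A: {fqn_a}\n{ka}\n\nAPI B: {fqn_b}\n{kb}\n\n"
        "Which relation holds between A and B "
        "(e.g., 'function replacement', 'behavioral difference', none)? "
        "Answer with the relation name only."
    )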
Analyzing the Predictability of Source Code and its Application in Creating Parallel Corpora for English-to-Code Statistical Machine Translation
Analyzing source code using computational linguistics and exploiting the linguistic properties of source code have recently become popular topics in the domain of software engineering. In the first part of the thesis, we study the predictability of source code and determine how well source code can be represented using language models developed for natural language processing. In the second part, we study how well English discussions of source code can be aligned with code elements to create parallel corpora for English-to-code statistical machine translation. This work is organized as a “manuscript” thesis whereby each core chapter constitutes a submitted paper.
The first part replicates recent works that have concluded that software is more repetitive and predictable, i.e., more natural, than English text. We find that much of the apparent “naturalness”
is artificial and is the result of language-specific tokens. For example, the syntax of a language, especially its separators, e.g., semicolons and brackets, accounts for 59% of all uses of Java tokens in our corpus. Furthermore, 40% of all 2-grams end in a separator, implying that a model autocompleting the next token would offer a trivial separator as its top suggestion 40% of the time. By following the standard NLP practice of eliminating punctuation (e.g., separators) and stopwords (e.g., keywords), we find that code is less repetitive and predictable than previous work suggested. We replicate this result across 7 programming languages.
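A simplified illustration of this measurement; the tokenizer, separator set, and keyword list are rough stand-ins for a real Java lexer:

import re
from collections import Counter

SEPARATORS = {";", ",", "(", ")", "{", "}", "[", "]", "."}
KEYWORDS = {"public", "static", "void", "int", "if", "else", "for",
            "while", "return", "class", "new"}  # partial list, for illustration

def tokenize(code):
    return re.findall(r"[A-Za-z_]\w*|\d+|[^\s\w]", code)

def bigram_stats(code, drop_closed_class=False):
    toks = tokenize(code)
    if drop_closed_class:
        # NLP-style cleaning: drop "punctuation" (separators) and
        # "stopwords" (keywords) before counting n-grams.
        toks = [t for t in toks if t not in SEPARATORS | KEYWORDS]
    bigrams = Counter(zip(toks, toks[1:]))
    total = sum(bigrams.values())
    ends_in_sep = sum(c for (_, b), c in bigrams.items() if b in SEPARATORS)
    return bigrams, (ends_in_sep / total if total else 0.0)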
Continuing this work, we find that, unlike the code written for a particular project, API code usage is similar across projects. For example, a file is opened and closed in the same manner irrespective of domain. When we restrict our n-grams to those contained in the Java API, we find that the entropy of 2-grams is significantly lower than that of the English corpus. This repetition perhaps explains the successful literature on API usage suggestion and autocompletion.
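A toy proxy for this entropy comparison (a plain empirical estimate; the thesis's exact estimator is not reproduced here):

import math
from collections import Counter

def bigram_entropy(tokens):
    # Shannon entropy of the empirical 2-gram distribution, in bits;
    # lower entropy means more repetitive, more predictable sequences.
    counts = Counter(zip(tokens, tokens[1:]))
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# The finding above corresponds to
# bigram_entropy(java_api_tokens) < bigram_entropy(english_tokens).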
We then study the impact of the representation of code on repetition. The n-gram model assumes that the current token can be predicted from the preceding n-1 tokens. When we extract program graphs of 2, 3, and 4 nodes, we see that the abstract graph representation is much more concise and repetitive than the n-gram representation of the same code. This suggests that future work should focus on graphs that include control and data flow dependencies rather than linear sequences of tokens.
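A minimal sketch of counting repeated small subgraphs, assuming the program graph is available as a networkx graph with labeled nodes; the label-multiset signature is a cheap stand-in for a proper canonical form:

from itertools import combinations
from collections import Counter
import networkx as nx

def small_subgraph_counts(g, size):
    counts = Counter()
    for nodes in combinations(g.nodes, size):
        sub = g.subgraph(nodes)
        if nx.is_connected(sub):
            # Signature: sorted node labels plus edge count. A real study
            # would use graph canonization to detect repeated shapes exactly.
            sig = (tuple(sorted(g.nodes[n]["label"] for n in nodes)),
                   sub.number_of_edges())
            counts[sig] += 1
    return counts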
The second part of this thesis focuses on cleaning English and code corpora to aid machine translation. Generating source code API sequences from an English query using Machine Translation (MT) has gained much interest in recent years. For any kind of MT, the model needs to be trained on a parallel corpus. We clean StackOverflow, one of the most popular online discussion forums for programmers, to generate parallel English-code corpora. We contrast three data cleaning approaches: standard NLP, title only, and software task. We evaluate the quality of each corpus for MT, measuring the corpus size, the percentage of unique tokens, and the per-word maximum likelihood alignment entropy. While many works have shown that code is repetitive and predictable, we find that English discussions of code are also repetitive. Creating a maximum likelihood MT model, we find that English words map to a small number of specific code elements, which partially explains the success of using StackOverflow for search and other tasks in the software engineering literature and paves the way for MT. Our scripts and corpora are publicly available.
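A hedged sketch of the alignment-entropy idea, assuming NLTK's IBM Model 1 and a toy three-pair corpus; the thesis's actual corpus and estimator may differ:

import math
from nltk.translate import AlignedSent, IBMModel1

# Toy parallel corpus: code tokens (target) paired with English text (source).
bitext = [
    AlignedSent(["FileReader", "open"], "open the file".split()),
    AlignedSent(["BufferedReader", "readLine"], "read a line".split()),
    AlignedSent(["FileReader", "close"], "close the file".split()),
]
model = IBMModel1(bitext, 5)  # 5 EM iterations

def alignment_entropy(model, english_word):
    # Entropy of P(code token | english_word), in bits; a low value means the
    # English word maps to a small number of specific code elements.
    probs = [d.get(english_word, 0.0) for d in model.translation_table.values()]
    probs = [p for p in probs if p > 1e-9]  # crude floor to drop near-zero mass
    z = sum(probs)
    return -sum((p / z) * math.log2(p / z) for p in probs)

print(alignment_entropy(model, "file"))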