A Survey on Query-based API Recommendation
Application Programming Interfaces (APIs) are designed to help developers
build software more effectively. Recommending the right APIs for specific tasks
has gained increasing attention among researchers and developers in recent
years. To comprehensively understand this research domain, we have conducted a
survey analyzing API recommendation studies published in the last 10 years. Our study
begins with an overview of the structure of API recommendation tools.
Subsequently, we systematically analyze prior research and pose four key
research questions. For RQ1, we examine the volume of published papers and the
venues in which these papers appear within the API recommendation field. In
RQ2, we categorize and summarize the prevalent data sources and collection
methods employed in API recommendation research. In RQ3, we explore the types
of data and common data representations utilized by API recommendation
approaches. We also investigate the typical data extraction procedures and
collection approaches employed by the existing approaches. RQ4 delves into the
modeling techniques employed by API recommendation approaches, encompassing
both statistical and deep learning models. Additionally, we compile an overview
of the prevalent ranking strategies and evaluation metrics used for assessing
API recommendation tools. Drawing from our survey findings, we identify current
challenges in API recommendation research that warrant further exploration,
along with potential avenues for future research.
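The abstract above mentions evaluation metrics for API recommendation tools without naming them. As a hedged illustration, the sketch below computes two metrics that are commonly used for ranked recommendations in general, mean reciprocal rank (MRR) and hit@k. Whether the survey covers these exact metrics is an assumption, and the function names and sample data are hypothetical.

```python
def reciprocal_rank(ranked, relevant):
    """Return 1/rank of the first relevant API in the ranked list, or 0.0 if none."""
    for position, api in enumerate(ranked, start=1):
        if api in relevant:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(results):
    """Average the reciprocal rank over (ranked_list, relevant_set) pairs."""
    return sum(reciprocal_rank(r, rel) for r, rel in results) / len(results)


def hit_at_k(ranked, relevant, k):
    """True if any relevant API appears among the top-k recommendations."""
    return any(api in relevant for api in ranked[:k])


# Hypothetical queries: each pairs a tool's ranked output with the ground truth.
sample_results = [
    (["List.add", "Map.put", "Set.add"], {"Map.put"}),    # first hit at rank 2
    (["File.exists", "File.delete"], {"File.exists"}),    # first hit at rank 1
    (["String.trim", "String.split"], {"String.strip"}),  # no hit
]
mrr = mean_reciprocal_rank(sample_results)
```

MRR rewards a tool for placing the first correct API near the top of the list, while hit@k only checks whether a correct API appears within the first k suggestions.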
Beyond Similar Code: Leveraging Social Coding Websites
Programmers often write code that is similar to existing code written elsewhere. Code search tools can help developers find similar solutions and identify possible improvements. For code search tools, good search results rely on valid data collection. Social coding websites, such as the Question & Answer forum Stack Overflow (SO) and the project repository GitHub, are popular destinations when programmers look for ways to achieve certain programming tasks. Over the years, SO and GitHub have accumulated an enormous knowledge base of, and around, code. Since these software artifacts are publicly available, it is possible to leverage them in code search tools. This dissertation explores the opportunities of leveraging software artifacts from the social coding websites in searching for not just similar, but related, code. Programmers query SO and GitHub extensively to search for suitable code for reuse; however, not much is known about the usability or quality of the available code from each website. This dissertation first investigates under what circumstances the software artifacts found in social coding websites can be leveraged for purposes other than their immediate use by developers. It points out a number of problems that need to be addressed before those artifacts can be leveraged for code search and development tools. Specifically, triviality, fragility, and duplication dominate these artifacts. However, once these problems are addressed, a considerable number of good-quality artifacts remain that can be leveraged. SO and GitHub are not just two separate data resources; together, they belong to a larger system of the software development process: the same users who rely on the facilities of GitHub often seek support on SO for their problems, and return to GitHub to apply the knowledge acquired.
This dissertation further studies the crossover of software artifacts between SO and GitHub, and categorizes the adaptations from an SO code snippet to its GitHub counterparts. Existing search tools only recommend other code locations that are syntactically or semantically similar to the given code, but do not reason about other kinds of relevant code that a developer should also pay attention to, e.g., auxiliary code needed to accomplish a complete task. With good-quality software artifacts and the crossover between the two systems available, this dissertation presents two approaches that leverage these artifacts in searching for related code. Aroma indexes GitHub projects, takes a partial code snippet as input, searches the corpus for methods containing the partial code snippet, and clusters and intersects the search results to produce recommendations. Aroma is evaluated on randomly selected queries created from the GitHub corpus, as well as queries derived from SO code snippets. It recommends related code for error checking and handling, object configuration, etc. Furthermore, a user study is conducted in which industrial developers are asked to complete programming tasks using Aroma and provide feedback. The results indicate that Aroma is capable of retrieving and recommending relevant code snippets efficiently. CodeAid reuses the crossover between SO and GitHub and recommends related code outside of a method body. For each SO snippet used as a query, CodeAid retrieves the co-occurring code fragments for its GitHub counterparts and clusters them to recommend common ones. 74% of the common co-occurring code fragments represent related functionality that should be included in code search results. Three major types of relevancy are identified: complementary, supplementary, and alternative methods.
Extracting Tasks from Customize Portal using Natural Language Processing
In software documentation, product knowledge and software requirements are very important for improving product quality. During the maintenance stage, developers cannot read the whole documentation of a large corpus. They need to obtain the relevant software documentation (e.g., on development, design, and testing) in a short period of time. Important information can be recorded in software documentation, yet there is a gap between the information a developer wants and what the documentation provides. To solve this problem, we describe an approach for extracting relevant tasks that is based on heuristically matching the structure of the documentation across three phases of software documentation (i.e., documentation, development, and testing). Our main idea is that tasks are extracted automatically from the software documentation, so that the developer can easily get the required data from the documentation through a customized portal using the WordNet library and machine learning techniques. The category of a task can then be generated easily from existing applications using natural language processing. Our approach uses the WordNet library to identify relevant tasks by calculating the frequency of each word, which allows developers of a piece of software to discover word usage.
Context-Sensitive Code Completion
Developers depend extensively on software frameworks and libraries to deliver products on time. While these frameworks and libraries support software reuse, save development time, and reduce the possibility of introducing errors, they do not come without a cost. Developers need to learn and remember Application Programming Interfaces (APIs) to use those frameworks and libraries effectively. However, APIs are difficult to learn and use. This is mostly because APIs are large in number, may not be properly documented, and involve complex relationships between various classes and methods. To support developers using those APIs, this thesis focuses on the code completion feature of modern integrated development environments (IDEs). As a developer types code, a code completion system offers a list of completion proposals through a popup menu to navigate and select. This research aims to improve the current state of code completion systems in discovering APIs.
To this end, a case study on tracking source code lines has been conducted to better understand how to capture code context and to evaluate the benefits of using the simhash technique. Observations from the study have helped to develop a simple, context-sensitive method call completion technique, called CSCC. The technique is compared with a large number of existing code completion techniques. The notion of context proposed in CSCC can even outperform graph-based statistical language models. Existing method call completion techniques leave the task of completing method parameters to developers. To address this issue, this thesis has investigated how developers complete method parameters. Based on the analysis, a method parameter completion technique, called PARC, has been developed. To date, the technique supports the largest number of expressions to complete method parameters. The technique has been implemented as an Eclipse plug-in that serves as a proof of concept. To meet application-specific requirements, software frameworks need to be customized via extension points. It was observed that developers often pass a framework-related object as an argument to an API call to customize default aspects of application frameworks. To enable such customizations, the object can be created by extending a framework class, implementing an interface, or changing the properties of the object via API calls. However, finding all the details related to the customizations is both a common and a non-trivial task. To address this issue, a technique called FEMIR has been developed. The technique utilizes partial program analysis and graph mining techniques to detect, group, and rank framework extension examples. The tool extends existing code completion infrastructure to inform developers about customization choices, enabling them to browse through the extension points of a framework and the frequent usages of each point in terms of code examples.
Findings from this research and the proposed techniques have the potential to help developers learn different aspects of APIs, thus easing software development and improving developer productivity.
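The simhash technique mentioned in this abstract maps similar inputs to fingerprints that differ in few bits, which is what makes it useful for tracking a source line across edits. The following is a minimal sketch of a generic 64-bit simhash over the tokens of a code line. It is not the thesis's implementation: the tokenizer, the use of MD5 as the per-token hash, and all function names are assumptions made for illustration.

```python
import hashlib
import re

BITS = 64  # fingerprint width; assumed here, not taken from the thesis


def tokenize(line):
    """Split a code line into identifiers, numbers, and single symbols."""
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", line)


def simhash(line):
    """Return a 64-bit fingerprint: each token votes +1/-1 on every bit."""
    votes = [0] * BITS
    for token in tokenize(line):
        digest = hashlib.md5(token.encode("utf-8")).digest()
        token_hash = int.from_bytes(digest[:8], "big")
        for bit in range(BITS):
            votes[bit] += 1 if (token_hash >> bit) & 1 else -1
    # A fingerprint bit is set when the majority of tokens set it.
    return sum(1 << bit for bit, vote in enumerate(votes) if vote > 0)


def hamming_distance(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


original = "int total = count + 1;"
reformatted = "int   total=count+1;"
distance = hamming_distance(simhash(original), simhash(reformatted))
```

Because this sketch hashes a bag of tokens, pure whitespace changes leave the fingerprint unchanged, while small edits typically flip only a few bits. A line tracker could then match lines across versions by choosing the candidate with the smallest Hamming distance.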
Stepwise API usage assistance based on N-gram language models
Software development requires the use of external Application Programming Interfaces
(APIs) in order to reuse libraries and frameworks. Programmers often
struggle with unfamiliar APIs due to a lack of resources or an uncommon design.
Such difficulties often lead to incorrect sequences of API calls that may
not produce the desired outcome. Language models have shown the ability to
capture regularities in text as well as in code.
In this work we explore the use of n-gram language models and their ability to
capture regularities in API usage through an intrinsic and extrinsic evaluation of
these models on some of the most widely used APIs for the Java programming
language. To achieve this, several language models were trained over a source
code corpus containing several hundred GitHub Java projects that use the
desired APIs. In order to fully assess the performance of the language models, we
have selected APIs from multiple domains and vocabulary sizes.
This work allowed us to conclude that n-gram language models are able to capture
API usage patterns, as shown by their low perplexity values and their high overall
coverage, going up to 100% in some cases, which encouraged us to create a code
completion tool to help programmers stay on the right path when using unknown
APIs while allowing for some exploration.
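The approach described in this abstract, training n-gram models on API call sequences and judging them by perplexity, can be sketched as follows. This is a minimal illustration assuming a bigram model with add-one smoothing. The call sequences, function names, and smoothing choice are hypothetical and are not taken from the thesis.

```python
import math
from collections import Counter, defaultdict

START, END = "<s>", "</s>"


def train_bigram(sequences):
    """Count contexts and bigrams over API call sequences."""
    context_counts = Counter()
    bigram_counts = defaultdict(Counter)
    vocab = {END}
    for seq in sequences:
        tokens = [START] + list(seq) + [END]
        vocab.update(seq)
        for prev, cur in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[prev][cur] += 1
    return context_counts, bigram_counts, vocab


def prob(model, prev, cur):
    """Add-one smoothed P(cur | prev); normalized over the training vocabulary."""
    context_counts, bigram_counts, vocab = model
    seen = bigram_counts.get(prev, Counter())
    return (seen[cur] + 1) / (context_counts[prev] + len(vocab))


def perplexity(model, seq):
    """Perplexity of one call sequence; lower means the model finds it more typical."""
    tokens = [START] + list(seq) + [END]
    log_sum = sum(math.log(prob(model, p, c)) for p, c in zip(tokens, tokens[1:]))
    return math.exp(-log_sum / (len(tokens) - 1))


def suggest(model, prev, k=3):
    """Rank candidate next calls after `prev` by observed frequency, ties by name."""
    _, bigram_counts, vocab = model
    seen = bigram_counts.get(prev, Counter())
    return sorted(vocab, key=lambda call: (-seen[call], call))[:k]


# Hypothetical training corpus of Java I/O call sequences.
corpus = [
    ["FileReader.new", "BufferedReader.new", "BufferedReader.readLine",
     "BufferedReader.close"],
    ["FileReader.new", "BufferedReader.new", "BufferedReader.readLine",
     "BufferedReader.readLine", "BufferedReader.close"],
    ["FileReader.new", "FileReader.read", "FileReader.close"],
]
model = train_bigram(corpus)
next_calls = suggest(model, "BufferedReader.new", k=1)
```

In this toy corpus, a sequence that follows the observed usage order gets a lower perplexity than the same calls in reverse order, which is the kind of regularity a stepwise completion tool can exploit when proposing the next call.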
Deep Learning for Code Intelligence: Survey, Benchmark and Toolkit
Code intelligence leverages machine learning techniques to extract knowledge
from extensive code corpora, with the aim of developing intelligent tools to
improve the quality and productivity of computer programming. Currently, there
is already a thriving research community focusing on code intelligence, with
efforts spanning software engineering, machine learning, data mining,
natural language processing, and programming languages. In this paper, we
conduct a comprehensive literature review on deep learning for code
intelligence, from the aspects of code representation learning, deep learning
techniques, and application tasks. We also benchmark several state-of-the-art
neural models for code intelligence, and provide an open-source toolkit
tailored for the rapid prototyping of deep-learning-based code intelligence
models. In particular, we examine existing code intelligence models on
the basis of code representation learning, and provide a comprehensive overview
to enhance comprehension of the present state of code intelligence.
Furthermore, we publicly release the source code and data resources to provide
the community with a ready-to-use benchmark, which can facilitate the
evaluation and comparison of existing and future code intelligence models
(https://xcodemind.github.io). Finally, we point out several challenging
and promising directions for future research.