Search CORE

3,258 research outputs found

A Survey on Query-based API Recommendation

Author: Belle Alvine Boaye
Harzevili Nima Shiri
Jiang
Shi Lin
Wang Junjie
Wang Song
Wei Moshi
Yang Jinqiu
Zhen Ming
Publication venue
Publication date: 26/01/2024
Field of study

Application Programming Interfaces (APIs) are designed to help developers build software more effectively. Recommending the right APIs for specific tasks has gained increasing attention among researchers and developers in recent years. To comprehensively understand this research domain, we have surveyed to analyze API recommendation studies published in the last 10 years. Our study begins with an overview of the structure of API recommendation tools. Subsequently, we systematically analyze prior research and pose four key research questions. For RQ1, we examine the volume of published papers and the venues in which these papers appear within the API recommendation field. In RQ2, we categorize and summarize the prevalent data sources and collection methods employed in API recommendation research. In RQ3, we explore the types of data and common data representations utilized by API recommendation approaches. We also investigate the typical data extraction procedures and collection approaches employed by the existing approaches. RQ4 delves into the modeling techniques employed by API recommendation approaches, encompassing both statistical and deep learning models. Additionally, we compile an overview of the prevalent ranking strategies and evaluation metrics used for assessing API recommendation tools. Drawing from our survey findings, we identify current challenges in API recommendation research that warrant further exploration, along with potential avenues for future research

arXiv.org e-Print Archive

Recommended from our members

Beyond Similar Code: Leveraging Social Coding Websites

Author: Yang Di
Publication venue: eScholarship, University of California
Publication date: 01/01/2019
Field of study

Programmers often write code with similarity to existing code written somewhere. Code search tools can help developers find similar solutions and identify possible improvements. For code search tools, good search results rely on valid data collection. Social coding websites, such as Question & Answer forum Stack Overflow (SO) and project repository GitHub, are popular destinations when programmers look for how to achieve certain programming tasks. Over the years, SO and GitHub have accumulated an enormous knowledge base of, and around, code. Since these software artifacts are publicly available, it is possible to leverage them in code search tools. This dissertation explores the opportunities of leveraging software artifacts from the social coding websites in searching for not just similar, but related, code. Programmers query SO and GitHub extensively to search for suitable code for reuse, however, not much is known about the usability or quality of the available code from each website. This dissertation first investigates under what circumstances the software artifacts found in social coding websites can be leveraged for purposes other than their immediate use by developers. It points out a number of problems that need to be addressed before those artifacts can be leveraged for code search and development tools. Specifically, triviality, fragility, and duplication, dominate these artifacts. However, when these problems are addressed, there is still a considerable amount of good quality artifacts that can be leveraged.SO and GitHub are not only two separate data resources, moreover, they together, belong to a larger system of software development process: the same users that rely on facilities of GitHub often seeks support on SO for their problems, and return to GitHub to apply the knowledge acquired. This dissertation further studies the crossover of software artifacts between SO and GitHub, and categorizes the adaptations from a SO code snippet to its GitHub counterparts. Existing search tools only recommend other code locations that are syntactically or semantically similar to the given code but do not reason about other kinds of relevant code that a developer should also pay attention to, e.g., auxiliary code to accomplish a complete task. With the good quality software artifacts and crossover between the two systems available, this dissertation presents two approaches that leverage these artifacts in searching for related code. Aroma indexes GitHub projects, takes a partial code snippet as input, searches the corpus for methods containing the partial code snippet, and clusters and intersects the results of the search to recommend. Aroma is evaluated on randomly selected queries created from the GitHub corpus, as well as queries derived from SO code snippets. It recommends related code for error checking and handling, objects configuring, etc. Furthermore, a user study is conducted where industrial developers are asked to complete programming tasks using Aroma and provide feedback. The results indicate that Aroma is capable of retrieving and recommending relevant code snippets efficiently. CodeAid reuses the crossover between SO and GitHub and recommends related code outside of a method body. For each SO snippet as a query, CodeAid retrieves the co-occurring code fragments for its GitHub counterparts and clusters them to recommend common ones. 74% of the common co-occurring code fragments represent related functionality that should be included in code search results. Three major types of relevancy--complementary, supplementary, and alternative methods, are identified

eScholarship - University of California

Extracting Tasks from Customize Portal using Natural Language Processing

Author: Prachi Rayate, Prof. Devendra Singh Thakore
Publication venue: 'Auricle Technologies, Pvt., Ltd.'
Publication date: 30/04/2016
Field of study

In software documentation, product knowledge and software requirement are very important to improve product quality. Within maintenance stage, reading of whole documentation of large corpus won’t be possible by developers. They need to receive software documentation i.e. (development, designing and testing etc.) in a short period of time. Important documents are able to record in software documentation. There live a space between information which developer wants and software documentation. To solve this problem, an approach for extracting relevant task that is based on heuristically matching the structure of the documentation under three phases of software documentation (i.e. documentation, development and testing) is described. Our main idea is that task is extracted automatically from the software documentation, freeing the developer easily get the required data from software documentation with customize portal using WordNet library and machine learning technique. And then the category of task can be generated easily from existing applications using natural language processing. Our approach use WordNet library to identify relevant tasks for calculating frequency of each word which allows developers in a piece of software to discover the word usage

International Journal on Recent and Innovation Trends in Computing and Communication

Context-Sensitive Code Completion

Author: Asaduzzaman Muhammad 1983-
Publication venue: 'University of Saskatchewan Library'
Publication date: 26/04/2018
Field of study

Developers depend extensively on software frameworks and libraries to deliver the products on time. While these frameworks and libraries support software reuse, save development time, and reduce the possibility of introducing errors, they do not come without a cost. Developers need to learn and remember Application Programming Interfaces (APIs) for effectively using those frameworks and libraries. However, APIs are difficult to learn and use. This is mostly due to APIs being large in number, they may not be properly documented, and finally there exist complex relationships between various classes and methods that make APIs difficult to learn. To support developers using those APIs, this thesis focuses on the code completion feature of modern integrated development environments (IDEs). As a developer types code, a code completion system offers a list of completion proposals through a popup menu to navigate and select. This research aims to improve the current state of code completion systems in discovering APIs. Towards this direction, a case study on tracking source code lines has been conducted to better understand capturing code context and to evaluate the benefits of using the simhash technique. Observations from the study have helped to develop a simple, context-sensitive method call completion technique, called CSCC. The technique is compared with a large number of existing code completion techniques. The notion of context proposed in CSCC can even outweigh graph-based statistical language models. Existing method call completion techniques leave the task of completing method parameters to developers. To address this issue, this thesis has investigated how developers complete method parameters. Based on the analysis, a method parameter completion technique, called PARC, has been developed. To date, the technique supports the largest number of expressions to complete method parameters. The technique has been implemented as an Eclipse plug-in that demonstrates the proof of the concept. To meet application-specific requirements, software frameworks need to be customized via extension points. It was observed that developers often pass a framework related object as an argument to an API call to customize default aspects of application frameworks. To enable such customizations, the object can be created by extending a framework class, implementing an interface, or changing the properties of the object via API calls. However, it is both a common and non-trivial task to find all the details related to the customizations. To address this issue, a technique has been developed, called FEMIR. The technique utilizes partial program analysis and graph mining technique to detect, group, and rank framework extension examples. The tool extends existing code completion infrastructure to inform developers about customization choices, enabling them to browse through extension points of a framework, and frequent usages of each point in terms of code examples. Findings from this research and proposed techniques have the potential to help developers to learn different aspects of APIs, thus ease software development, and improve the productivity of developers

eCommons@USASK

University of Saskatchewan Research Archive

Stepwise API usage assistance based on N-gram language models

Author: Prendi Gonçalo Queiroga
Publication venue
Publication date: 01/01/2015
Field of study

Software development requires the use of external Application Programming Interfaces (APIs) in order to reuse libraries and frameworks. Programmers often struggle with unfamiliar APIs due to their lack of resources or less common design. Such difficulties often lead to an incorrect sequences of API calls that may not produce the desired outcome. Language models have shown the ability to capture regularities in text as well as in code. In this work we explore the use of n-gram language models and their ability to capture regularities in API usage through an intrinsic and extrinsic evaluation of these models on some of the most widely used APIs for the Java programming language. To achieve this, several language models were trained over a source code corpora containing several hundreds of GitHub Java projects that use the desired APIs. In order to fully assess the performance of the language models, we have selected APIs from multiple domains and vocabulary sizes. This work allowed us to conclude that n-gram language models are able to capture the API usage patterns due to their low perplexity values and their high overall coverage, going up to 100% in some cases, which encouraged us to create a code completion tool to help programmers stay in the right path when using unknown APIs while allowing for some exploration.O desenvolvimento de software requer a utilização de Application Programming Interfaces (APIs) externas com o objectivo de reutilizar bibliotecas e frameworks. Muitas vezes, os programadores têm dificuldade em utilizar APIs desconhecidas, devido à falta de recursos ou desenho fora do comum. Essas dificuldades provocam inúmeras vezes sequências incorrectas de chamadas às APIs que poderão não produzir o resultado desejado. Os modelos de língua mostraram-se capazes de capturar regularidades em texto, bem como em código. Neste trabalho é explorada a utilização de modelos de língua de n-gramas e a sua capacidade de capturar regularidades na utilização de APIs, através de uma avaliação intrínseca e extrínseca destes modelos em algumas das APIs mais utilizadas na linguagem de programação Java. Para alcançar este objectivo, vários modelos foram treinados sobre repositórios de código do GitHub, contendo centenas de projectos Java que utilizam estas APIs. Com o objectivo de ter uma avaliação completa do desempenho dos modelos de língua, foram seleccionadas APIs de múltiplos domínios e tamanhos de vocabulário. Este trabalho permite concluir que os modelos de língua de n-gramas são capazes de capturar padrões de utilização de APIs devido aos seus baixos valores de perplexidade e a sua alta cobertura, chegando a atingir 100% em alguns casos, o que levou à criação de uma ferramenta de code completion para guiar os programadores na utilização de uma API desconhecida, mas mantendo a possibilidade de a explorar

Repositório Institucional do ISCTE-IUL

Deep Learning for Code Intelligence: Survey, Benchmark and Toolkit

Author: Bi Zhangqian
He Yang
Jin Hai
Sui Yulei
Wan Yao
Xu Guandong
Yu Philip S.
Zhang Hongyu
Zhang Jianguo
Publication venue
Publication date: 30/12/2023
Field of study

Code intelligence leverages machine learning techniques to extract knowledge from extensive code corpora, with the aim of developing intelligent tools to improve the quality and productivity of computer programming. Currently, there is already a thriving research community focusing on code intelligence, with efforts ranging from software engineering, machine learning, data mining, natural language processing, and programming languages. In this paper, we conduct a comprehensive literature review on deep learning for code intelligence, from the aspects of code representation learning, deep learning techniques, and application tasks. We also benchmark several state-of-the-art neural models for code intelligence, and provide an open-source toolkit tailored for the rapid prototyping of deep-learning-based code intelligence models. In particular, we inspect the existing code intelligence models under the basis of code representation learning, and provide a comprehensive overview to enhance comprehension of the present state of code intelligence. Furthermore, we publicly release the source code and data resources to provide the community with a ready-to-use benchmark, which can facilitate the evaluation and comparison of existing and future code intelligence models (https://xcodemind.github.io). At last, we also point out several challenging and promising directions for future research

arXiv.org e-Print Archive