398 research outputs found

    Mapping Language to Code in Programmatic Context

    Source code is rarely written in isolation. It depends significantly on the programmatic context, such as the class that the code would reside in. To study this phenomenon, we introduce the task of generating class member functions given English documentation and the programmatic context provided by the rest of the class. This task is challenging because the desired code can vary greatly depending on the functionality the class provides (e.g., a sort function may or may not be available when we are asked to "return the smallest element" in a particular member variable list). We introduce CONCODE, a new large dataset with over 100,000 examples consisting of Java classes from online code repositories, and develop a new encoder-decoder architecture that models the interaction between the method documentation and the class environment. We also present a detailed error analysis suggesting that there is significant room for future work on this task. Comment: Accepted at EMNLP 201
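    The "return the smallest element" example in the abstract can be illustrated with a small sketch. It is written in Python for brevity, although the dataset itself consists of Java classes, and the class names `SortedScores` and `PlainScores` and the helper `sorted_values` are hypothetical, not taken from CONCODE. The same documentation string leads to different method bodies depending on what the surrounding class offers.

```python
class SortedScores:
    """Hypothetical class context that already offers a sorting helper."""

    def __init__(self, values):
        self.values = list(values)

    def sorted_values(self):
        return sorted(self.values)

    # Documentation: "return the smallest element"
    def smallest(self):
        # A helper exists in the class, so the method can reuse it.
        return self.sorted_values()[0]


class PlainScores:
    """Hypothetical class context with no sorting helper."""

    def __init__(self, values):
        self.values = list(values)

    # Documentation: "return the smallest element"
    def smallest(self):
        # No helper is available, so the method scans the list itself.
        best = self.values[0]
        for v in self.values[1:]:
            if v < best:
                best = v
        return best
```

    Both methods satisfy the same documentation, which is why a generator for this task has to condition on the class environment as well as on the text.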

    Source-code Summarization of Java Methods Using Control-Flow Graphs

    Source-code summarization aims to generate natural-language summaries for software artifacts (e.g., methods and classes). Researchers have been exploring source-code summarization as one research area in software engineering. Prior work has used text-retrieval-based, heuristic-based, and data-driven techniques for source-code summarization. In data-driven techniques, researchers have used a sequence of source-code tokens and other representations of source code (e.g., application programming interface (API) sequences and abstract syntax trees (ASTs)) as input to source-code summarization models. According to the current published literature, researchers have not explored the use of a sequence extracted from a control-flow graph, which captures the contextual relationship between program instructions based on control flow, as input to source-code summarization models. In this work, we employ control-flow graph representations to increase the prediction accuracy of a bi-directional long short-term memory (LSTM) source-code summarization model in describing the functionality of Java methods. We use an attention-based bi-directional LSTM sequence-to-sequence model to show the use of linearized control-flow graph sequences alongside a sequence of source-code tokens. We compared our model, with and without a linearized control-flow graph, against the current state of the art. We created a source-code summarization dataset to train and evaluate our approach and conducted expert and automatic evaluations. In the expert evaluation, the participants rated the summaries generated by each model on how correctly they described the functionality of a Java method. Our models outperformed the state of the art in terms of mean average rating. The expert evaluation also showed that the model benefits from the structural information. In the automatic evaluation, we found that the use of control-flow graphs does not increase the prediction accuracy of a bi-directional LSTM model in terms of BLEU score compared to a bi-directional LSTM model that does not use control-flow graphs. However, we found that our source-code summarization approach, which uses a control-flow graph as an additional representation, performs better than encoding the AST in graph neural networks. Overall, we improved the state of the art for method summarization with our models, which take a sequence of method tokens with and without a control-flow graph.
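    The abstract does not say how the control-flow graph is linearized. As a minimal sketch of one possible approach, the Python snippet below flattens a toy graph into a sequence by depth-first preorder traversal. The traversal order, the function name `linearize_cfg`, and the node labels are all assumptions made for illustration, not the authors' method.

```python
def linearize_cfg(cfg, entry):
    """Return node labels in depth-first preorder, starting at `entry`.

    `cfg` maps each node label to the list of its successor labels.
    """
    order, seen, stack = [], set(), [entry]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        # Push successors in reverse so the first successor is visited first.
        stack.extend(reversed(cfg.get(node, [])))
    return order


# Toy control-flow graph for a method that returns an absolute value.
toy_cfg = {
    "entry": ["if x > 0"],
    "if x > 0": ["return x", "return -x"],
    "return x": ["exit"],
    "return -x": ["exit"],
    "exit": [],
}

sequence = linearize_cfg(toy_cfg, "entry")
```

    A sequence of this kind could then be fed to a sequence model alongside the source-code tokens, as the abstract describes.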

    Technical Debt Analysis and Project Architecturization of a Jenkins Platform based on Groovy

    Currently, Technical Debt (TD) is a latent problem in the vast majority of software projects. Due to the rapid growth of the market, business vision is increasingly focused on reducing the product's time-to-market, leaving aside the internal quality of its code. As a result, the global annual cost of maintaining such poor-quality code comes to approximately $85 billion (about €81 billion). The thesis focuses on deeply analyzing a corporate platform with heavy TD and defining a new architecture for it based on its requirements, prioritizing the quality of the product while reducing its technical debt. To achieve this, I will use refactoring techniques, implementation of new functionalities, and the definition of internal protocols for the team. The thesis documents the steps to follow to analyze and re-architect a project with similar characteristics. In addition, it raises strong awareness of technical debt and its problems, an issue that directly affects the code and indirectly affects the mental health of its developers.