40 research outputs found

    Sharing and Exploiting Requirement Decisions

    Get PDF
    Continuous software engineering is an agile development process that puts particular emphasis on the incremental implementation of requirements and their rapid validation through user feedback. This involves frequent and incremental decision making, which needs to be shared within the team. Requirements engineers and developers have to share their decision knowledge, since the decisions made are particularly important for future requirements. It has long been a vision that important decision knowledge is documented and shared. However, several reasons hinder requirements engineers and developers from doing this, for example, the intrusiveness and overhead of the documentation. Current software development tools provide opportunities to minimize this overhead. Issue tracking and version control systems offer lightweight documentation locations, such as issue comments, commit messages, and code comments. With ConDec, we develop tool support for the continuous management of decision knowledge that uses natural language processing techniques and integrates into tools that developers frequently use, for example, the issue tracking system Jira. In this work, we focus on how ConDec enables requirements engineers and developers to share and exploit decision knowledge regarding requirements.
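
    The core idea of mining decision knowledge from such lightweight documentation locations can be illustrated with a toy classifier. The following is a minimal sketch, not ConDec's actual model (ConDec uses trained natural language processing classifiers); the cue lists and element names are illustrative assumptions:

        import re

        # Minimal sketch: label sentences in issue comments or commit messages
        # as decision knowledge elements using simple keyword cues. The cue
        # lists and element names are illustrative assumptions, not ConDec's
        # actual (trained) classification model.
        CUES = {
            "issue":       ("how can", "how to", "should we", "which", "?"),
            "decision":    ("we decided", "we will use", "we chose", "decision:"),
            "alternative": ("alternatively", "another option", "we could"),
            "pro":         ("advantage", "benefit", "faster", "simpler"),
            "con":         ("drawback", "disadvantage", "slower", "risk"),
        }

        def classify_sentence(sentence: str) -> str:
            """Return the decision knowledge type suggested by keyword cues."""
            text = sentence.lower()
            for element, cues in CUES.items():
                if any(cue in text for cue in cues):
                    return element
            return "irrelevant"  # most comment text is not decision knowledge

        comment = ("How should we persist the knowledge graph? "
                   "We decided to store it in Jira issue properties. "
                   "Alternatively, we could use a separate database.")
        for sentence in re.split(r"(?<=[.?!])\s+", comment):
            print(f"{classify_sentence(sentence):12s} {sentence}")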

    Improving the Robustness to Data Inconsistency between Training and Testing for Code Completion by Hierarchical Language Model

    Full text link
    In the field of software engineering, applying language models to the token sequence of source code is the state-of-the-art approach to building code recommendation systems. The syntax tree of source code has a hierarchical structure, and ignoring the characteristics of this tree structure decreases model performance. The standard LSTM model handles sequential data, and its performance decreases sharply when noisy, unseen data is distributed throughout the test suite. As code has free naming conventions, it is common for a model trained on one project to encounter many unknown words on another project. If we mark all unseen words as UNK, as is common in natural language processing, the number of UNK tokens will be much greater than the sum of the most frequently appearing words. In an extreme case, just predicting UNK everywhere may achieve very high prediction accuracy, so such a solution cannot reflect the true performance of a model when encountering noisy, unseen data. In this paper, we only mark a small number of rare words as UNK and report the prediction performance of models under in-project and cross-project evaluation. We propose a novel Hierarchical Language Model (HLM) to improve the robustness of the LSTM model in dealing with the inconsistency of data distribution between training and testing. The newly proposed HLM takes the hierarchical structure of the code tree into consideration to predict code: it uses a BiLSTM to generate embeddings for sub-trees according to their hierarchy and collects the embeddings of sub-trees in context to predict the next code token. Experiments on in-project and cross-project data sets indicate that HLM performs better than the state-of-the-art LSTM model in dealing with the data inconsistency between training and testing, achieving on average an 11.2% improvement in prediction accuracy.
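
    The UNK handling described above can be made concrete with a small sketch. This illustrates the evaluation setup, not the paper's exact preprocessing; the frequency threshold and toy corpus are assumptions:

        from collections import Counter

        # Minimal sketch: only tokens below a small frequency threshold are
        # replaced by UNK, so UNK predictions cannot dominate accuracy. The
        # threshold and corpus are illustrative assumptions.
        def build_vocab(token_stream, min_count=2):
            counts = Counter(token_stream)
            return {tok for tok, n in counts.items() if n >= min_count}

        def encode(tokens, vocab):
            return [tok if tok in vocab else "<UNK>" for tok in tokens]

        train_tokens = ["if", "x", ">", "0", ":", "return", "x",
                        "if", "y", ">", "0", ":", "return", "y"]
        vocab = build_vocab(train_tokens, min_count=2)

        # Cross-project test code with new identifiers ("z"): only the rare,
        # project-specific names become UNK, not the shared keywords.
        test_tokens = ["if", "z", ">", "0", ":", "return", "z"]
        print(encode(test_tokens, vocab))
        # ['if', '<UNK>', '>', '0', ':', 'return', '<UNK>']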

    Continuous Management of Requirement Decisions Using the ConDec Tools

    Get PDF
    Context and motivation: While eliciting, prioritizing, and implementing requirements, requirements engineers and developers continuously make decisions. They establish important decision knowledge that needs to be documented and exploited, i.e., thoroughly managed, so that it contributes to the evolution and future changes of a software system. Question/problem: The management of decision knowledge is difficult for various reasons: 1) The documentation process is an additional effort, i.e., it is intrusive in the development process. 2) The documented knowledge can be of low quality in terms of completeness and consistency. 3) It might be distributed across many documentation locations, such as issue comments and commit messages, and thus difficult to access and use. Principal ideas/results: Continuous software engineering (CSE) emerged as a development process that involves frequent, incremental decision making, implementation, and validation of requirements. During CSE, requirements engineers and developers implicitly document decision knowledge during established practices, such as committing code, working with issues in an issue tracking system, or conducting meetings. That means that CSE offers opportunities for the non-intrusive capturing of decision knowledge in various documentation locations. Contribution: We develop the ConDec tools (https://se.ifi.uni-heidelberg.de/condec.html) that support requirements engineers and developers in documenting and exploiting decision knowledge directly within the tools they use, such as issue tracking and wiki systems, related to various software artifacts, such as requirements and code, and during change impact analysis.
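
    As a rough illustration of linking decision knowledge across documentation locations to software artifacts, consider the following minimal in-memory graph. The class and field names are assumptions for illustration, not ConDec's actual data model:

        from dataclasses import dataclass, field

        # Minimal sketch of a knowledge graph linking decision knowledge
        # elements, distributed over several documentation locations, to
        # requirements and code. Names are illustrative assumptions.
        @dataclass
        class KnowledgeElement:
            key: str
            type: str        # e.g. "requirement", "decision", "code"
            location: str    # e.g. "jira_description", "commit_message"
            links: set = field(default_factory=set)

        class KnowledgeGraph:
            def __init__(self):
                self.elements = {}

            def add(self, element):
                self.elements[element.key] = element

            def link(self, a, b):
                self.elements[a].links.add(b)
                self.elements[b].links.add(a)

            def impacted_by(self, key):
                """Naive change impact analysis: everything directly linked."""
                return {self.elements[k] for k in self.elements[key].links}

        graph = KnowledgeGraph()
        graph.add(KnowledgeElement("REQ-1", "requirement", "jira_description"))
        graph.add(KnowledgeElement("DEC-7", "decision", "commit_message"))
        graph.add(KnowledgeElement("Parser.java", "code", "repository"))
        graph.link("REQ-1", "DEC-7")
        graph.link("DEC-7", "Parser.java")
        print(sorted(e.key for e in graph.impacted_by("DEC-7")))
        # ['Parser.java', 'REQ-1']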

    Linking Rare and Popular Tags in CQA Sites

    Get PDF
    Community Question Answer (CQA) sites are popular means for sharing knowledge in the form of questions and answers. These sites rely on tags for many purposes, such as content organization, question routing, and content searching. Each CQA site has thousands of tags, making it challenging for users to manually annotate their question posts with the appropriate tags. Understanding the semantic relationships between tags could aid the tagging process, thereby properly routing questions to the experts. Although it is relatively easy to mine the semantic relationships among the frequently used popular tags, it is difficult to do so for the less commonly used rare tags due to a lack of information about them. Most often, the rare tags are specific concepts subsumed by popular tags. For questions to be routed to the right experts, they must be annotated with a proper mix of both popular and rare tags. In this paper, we propose a novel approach to mine the semantic relationships between the rare and the popular tags. In addition, we show that the methods proposed to mine semantic relationships between popular tags cannot be used for rare tags. Specifically, we identify the top-k popular tags that are semantically related to a given rare tag, using a set of semantic and topological features. Extensive evaluations on CQA datasets show the superiority of our proposed method over state-of-the-art methods.
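
    The following sketch illustrates the general idea of combining a semantic feature (embedding similarity) with a topological feature (tag co-occurrence) to rank popular tags for a rare tag. The feature mix, weights, and toy data are assumptions, not the paper's actual model:

        from math import sqrt

        # Minimal sketch: score each popular tag for a given rare tag by
        # mixing cosine similarity of tag embeddings with normalized
        # co-occurrence counts, then return the top-k. All values assumed.
        def cosine(u, v):
            dot = sum(a * b for a, b in zip(u, v))
            return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

        def top_k_popular(rare_tag, popular_tags, embeddings, cooccur, k=2, alpha=0.5):
            def score(p):
                semantic = cosine(embeddings[rare_tag], embeddings[p])
                topological = cooccur.get((rare_tag, p), 0) / max(cooccur.values())
                return alpha * semantic + (1 - alpha) * topological
            return sorted(popular_tags, key=score, reverse=True)[:k]

        embeddings = {
            "flask-sqlalchemy": [0.9, 0.1, 0.3],
            "python": [0.8, 0.2, 0.2],
            "sql": [0.5, 0.1, 0.9],
            "javascript": [0.1, 0.9, 0.1],
        }
        cooccur = {("flask-sqlalchemy", "python"): 40,
                   ("flask-sqlalchemy", "sql"): 25,
                   ("flask-sqlalchemy", "javascript"): 1}
        print(top_k_popular("flask-sqlalchemy", ["python", "sql", "javascript"],
                            embeddings, cooccur))
        # ['python', 'sql']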

    A survey on software defect prediction using deep learning

    Full text link
    Defect prediction is one of the key challenges in software development and programming language research for improving software quality and reliability. The problem in this area is to correctly identify defective source code with high accuracy. Developing a fault prediction model is a challenging problem, and many approaches have been proposed over the years. Recent breakthroughs in machine learning, especially the development of deep learning techniques, have led to many of these problems being solved by such methods. Our survey focuses on deep learning techniques for defect prediction. We analyse recent work on the topic, study methods for automatically learning semantic and structural features from code, discuss open problems, and present recent trends in the field.
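
    As a rough illustration of the structural features such models consume, the following sketch counts AST node types in a piece of source code. The surveyed deep learning approaches learn far richer representations (token sequences, AST paths, graphs); this hand-crafted variant is only an assumed, simplified illustration:

        import ast
        from collections import Counter

        # Minimal sketch: extract a simple structural feature vector from
        # source code by counting AST node types. A real defect prediction
        # model would learn representations rather than count by hand.
        def ast_node_counts(source: str) -> Counter:
            tree = ast.parse(source)
            return Counter(type(node).__name__ for node in ast.walk(tree))

        candidate = """
        def div(a, b):
            if b != 0:
                return a / b
        """
        import textwrap
        print(ast_node_counts(textwrap.dedent(candidate)))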

    FedCSD: A Federated Learning Based Approach for Code-Smell Detection

    Full text link
    This paper proposes a Federated Learning Code Smell Detection (FedCSD) approach that allows organizations to collaboratively train federated ML models while preserving their data privacy. These assertions are supported by three experiments on three manually validated datasets aimed at detecting and examining different code smell scenarios. In experiment 1, a centralized training experiment, dataset two achieved the lowest accuracy (92.30%) with fewer smells, while datasets one and three achieved the highest accuracy with a slight difference (98.90% and 99.5%, respectively). This was followed by experiment 2, a cross-evaluation, where each ML model was trained on one dataset and then evaluated on the other two datasets. Results from this experiment show a significant drop in model accuracy (lowest accuracy: 63.80%) where fewer smells exist in the training dataset, which has a noticeable reflection (technical debt) on the model's performance. Finally, the third experiment evaluates our approach by splitting the dataset across 10 companies. The ML model was trained on each company's site, and then all updated model weights were transferred to the server. Ultimately, an accuracy of 98.34% was achieved by the global model trained with 10 companies for 100 training rounds. The results reveal a slight difference between the global model's accuracy and the highest accuracy of the centralized model, which can be ignored in favour of the global model's comprehensive knowledge, lower training cost, preservation of data privacy, and avoidance of the technical debt problem.
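
    The aggregation step described in the third experiment resembles federated averaging. The following is a minimal sketch under that assumption, with local training stubbed out; in FedCSD it would fit the code smell detection model on each company's private code:

        import numpy as np

        # Minimal federated-averaging sketch: each company trains locally,
        # only weight updates travel to the server, and the server aggregates
        # them weighted by local dataset size. Per-company sample counts and
        # the toy model are illustrative assumptions.
        def local_update(weights, data_size, rng):
            # placeholder for local training on a company's private code
            return weights + rng.normal(scale=0.01, size=weights.shape), data_size

        def federated_round(global_weights, company_sizes, rng):
            updates = [local_update(global_weights, n, rng) for n in company_sizes]
            total = sum(n for _, n in updates)
            return sum(w * (n / total) for w, n in updates)

        rng = np.random.default_rng(0)
        company_sizes = [120, 80, 300]   # assumed per-company sample counts
        weights = np.zeros(4)            # toy model parameters
        for _ in range(100):             # 100 training rounds, as in experiment 3
            weights = federated_round(weights, company_sizes, rng)
        print(weights)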

    Identifying developers’ habits and expectations in copy and paste programming practice

    Full text link
    Máster Universitario en Investigación e Innovación en Inteligencia Computacional y Sistemas Interactivos.
    Both novice and experienced developers rely more and more on external sources of code, copying and pasting code snippets into their programs. This behavior differs from the traditional software design approach, where cohesion was achieved via a conscious design effort. It is therefore essential to know how copy and paste programming practices are actually carried out, so that IDEs (Integrated Development Environments) and code recommenders can be designed to fit developer expectations and habits.

    Development of Problem-Specific Modeling Language to Support Software Variability in "Smart Home" Systems

    Get PDF
    Building conceptual models for software design, in particular for high-tech applications such as smart home systems, is a complex task that significantly affects the efficiency of the development process. One innovative way of addressing this problem is the use of domain-specific modeling languages (DSMLs), which can reduce the time and other project resources required to create such systems. The subject of research in this paper is approaches to the development of DSMLs for smart home systems as a class of Internet of Things systems. The purpose of this work is to propose an approach to DSML development based on a model of the variability properties of such systems. The following tasks are solved: analysis of some existing approaches to the creation of DSMLs; construction of a multifaceted classification of requirements for them; application of these requirements to the design of the syntax of a specific language, DSML-V, for the creation of variable software in smart home systems; and development of a technological scheme and quantitative metrics for the experimental evaluation of the effectiveness of the proposed approach. The following methods are used: variability modeling based on the property model, formal notations for describing the syntax of the DSML-V language, and the open CASE tool metaDepth. Results: a multifaceted classification of requirements for a broad class of DSML languages is built; the basic syntactic constructs of the DSML-V language are developed to support the software variability properties of smart home systems; a formal description of this syntax is given in Backus-Naur notation; a technological scheme for compiling DSML-V specifications into the syntax of the open CASE tool metaDepth is created; and the effectiveness of the proposed approach is experimentally investigated using quantitative metrics. Conclusions: the proposed method of developing a specialized problem-oriented language for smart home systems allows for multilevel modeling of the variability properties of its software components and increases the efficiency of programming such models by about 14% compared to existing approaches.
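
    The variability concepts the language targets can be illustrated independently of DSML-V's concrete syntax. The following minimal sketch models mandatory and optional smart home features and checks a configuration against them; the feature names and rules are illustrative assumptions, not the paper's BNF:

        from dataclasses import dataclass, field

        # Minimal variability-model sketch for a smart home system: features
        # are mandatory or optional, and a configuration is valid if every
        # mandatory feature of each selected feature is also selected. The
        # feature tree and rules are illustrative assumptions.
        @dataclass
        class Feature:
            name: str
            mandatory: bool = False
            children: list = field(default_factory=list)

        def valid(config, feature):
            if feature.mandatory and feature.name not in config:
                return False
            if feature.name not in config:
                return True  # unselected optional feature: ignore its subtree
            return all(valid(config, child) for child in feature.children)

        smart_home = Feature("SmartHome", mandatory=True, children=[
            Feature("Heating", mandatory=True),
            Feature("Lighting", mandatory=False, children=[
                Feature("MotionSensor", mandatory=True),
            ]),
        ])

        print(valid({"SmartHome", "Heating"}, smart_home))              # True
        print(valid({"SmartHome", "Heating", "Lighting"}, smart_home))  # False: MotionSensor missing
        print(valid({"SmartHome", "Lighting", "MotionSensor"}, smart_home))  # False: Heating missing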