40 research outputs found
Sharing and Exploiting Requirement Decisions
Continuous software engineering is an agile development process that puts particular emphasis on the incremental implementation of requirements and their rapid validation through user feedback. This involves frequent and incremental decision making, which needs to be shared within the team. Requirements engineers and developers have to share their decision knowledge, since the decisions made are particularly important for future requirements. It has long been a vision that important decision knowledge gets documented and shared. However, several reasons hinder requirements engineers and developers from doing this, for example, the intrusiveness and overhead of the documentation. Current software development tools provide opportunities to minimize this overhead. Issue tracking and version control systems offer lightweight documentation locations, such as issue comments, commit messages, and code comments. With ConDec, we develop tool support for the continuous management of decision knowledge that uses techniques for natural language processing and integrates into tools that developers often use, for example, the issue tracking system Jira. In this work, we focus on how ConDec enables requirements engineers and developers to share and exploit decision knowledge regarding requirements.
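The abstract mentions natural language processing for identifying decision knowledge in lightweight documentation locations such as issue comments. As a purely illustrative sketch, and not ConDec's actual classifier, a minimal rule-based tagger for decision-knowledge elements might look as follows; the cue phrases and labels are hypothetical:

```python
# Hypothetical decision-knowledge labels and cue phrases (illustrative only).
DECISION_CUES = {
    "decision": ("we decided", "we chose", "decision:"),
    "issue": ("how to", "problem:"),
    "alternative": ("alternatively", "we could", "option:"),
}

def classify_sentence(sentence):
    """Tag a comment sentence with a decision-knowledge element, or None.

    A tiny rule-based stand-in for an NLP classifier: the first label
    whose cue phrase occurs in the sentence wins.
    """
    low = sentence.lower()
    for label, cues in DECISION_CUES.items():
        if any(cue in low for cue in cues):
            return label
    return None
```

A real classifier would be trained on labeled issue comments rather than hand-written cues, but the sketch shows the input/output shape such a component has.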
Improving the Robustness to Data Inconsistency between Training and Testing for Code Completion by Hierarchical Language Model
In the field of software engineering, applying language models to the token sequence of source code is the state-of-the-art approach to building a code recommendation system. The syntax tree of source code has a hierarchical structure, and ignoring the characteristics of this tree structure decreases model performance. The standard LSTM model handles sequential data, and its performance decreases sharply when noisy, unseen data is distributed throughout the test suite. As code has free naming conventions, it is common for a model trained on one project to encounter many unknown words on another project. If we mark all unseen words as UNK, as is the usual solution in natural language processing, the number of UNK tokens can be much greater than the sum of the most frequently appearing words. In an extreme case, just predicting UNK everywhere may achieve very high prediction accuracy. Thus, such a solution cannot reflect the true performance of a model when encountering noisy, unseen data. In this paper, we mark only a small number of rare words as UNK and report the prediction performance of models under in-project and cross-project evaluation. We propose a novel Hierarchical Language Model (HLM) that improves the robustness of the LSTM model so that it can deal with the inconsistency of data distribution between training and testing. The newly proposed HLM takes the hierarchical structure of the code tree into consideration to predict code. HLM uses a BiLSTM to generate embeddings for sub-trees according to their hierarchies and collects the embeddings of sub-trees in context to predict the next code. Experiments on in-project and cross-project data sets indicate that the newly proposed HLM performs better than the state-of-the-art LSTM model in dealing with the data inconsistency between training and testing, achieving an average improvement of 11.2% in prediction accuracy.
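The UNK-handling strategy described above, marking only a small number of rare words as UNK rather than all unseen words, can be sketched in a few lines. This is an illustrative sketch, not the paper's implementation; `build_vocab`, `encode`, and the `min_count` threshold are hypothetical names:

```python
from collections import Counter

def build_vocab(token_seqs, min_count=5):
    """Build a vocabulary in which only genuinely rare tokens map to UNK.

    Tokens seen fewer than `min_count` times in the training corpus are
    replaced by a single <UNK> symbol; everything else keeps its own id,
    so UNK stays a small fraction of the vocabulary.
    """
    counts = Counter(tok for seq in token_seqs for tok in seq)
    vocab = {"<UNK>": 0}
    for tok, count in counts.items():
        if count >= min_count:
            vocab[tok] = len(vocab)
    return vocab

def encode(seq, vocab):
    """Map a token sequence to ids, falling back to <UNK> for unseen words."""
    return [vocab.get(tok, vocab["<UNK>"]) for tok in seq]
```

Under this scheme, a model evaluated cross-project still has to predict real identifiers for most positions, so accuracy is not inflated by trivially predicting UNK.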
Continuous Management of Requirement Decisions Using the ConDec Tools
Context and motivation: While eliciting, prioritizing, and implementing requirements, requirements engineers and developers continuously make decisions. They establish important decision knowledge that needs to be documented and exploited, i.e., thoroughly managed, so that it contributes to the evolution and future changes of a software system.
Question/problem: The management of decision knowledge is difficult for various reasons: 1) The documentation process is an additional effort, i.e., it is intrusive in the development process. 2) The documented knowledge can be of low quality in terms of completeness and consistency. 3) It might be distributed across many documentation locations, such as issue comments and commit messages, and thus difficult to access and use.
Principal ideas/results: Continuous software engineering (CSE) emerged as a development process that involves frequent, incremental decision making, implementation, and validation of requirements. During CSE, requirements engineers and developers implicitly document decision knowledge during established practices, such as committing code, working with issues in an issue tracking system, or conducting meetings. That means that CSE offers opportunities for the non-intrusive capturing of decision knowledge in various documentation locations.
Contribution: We develop the ConDec tools (https://se.ifi.uni-heidelberg.de/condec.html) that support requirements engineers and developers in documenting and exploiting decision knowledge directly within the tools they use, such as issue tracking and wiki systems, in relation to various software artifacts, such as requirements and code, and during change impact analysis.
Linking Rare and Popular Tags in CQA Sites
Community Question Answer (CQA) sites are popular means for sharing knowledge in the form of questions and answers. These sites rely on tags for many purposes, such as content organization, question routing, content searching, etc. Each CQA site has thousands of tags, making it challenging for the users to manually annotate their question posts with the appropriate tags. Understanding the semantic relationships between tags could aid in the tagging process, thereby properly routing the questions to the experts. Although it is relatively easier to mine the semantic relationships amongst the frequently used popular tags, it is difficult to do so for the less commonly used rare tags due to a lack of information about them. Most often, the rare tags are specific concepts subsumed by popular tags. For the questions to be routed to the right experts, they must be annotated with a proper mix of both popular and rare tags. In this paper, we propose a novel approach to mine the semantic relationships between the rare and the popular tags. In addition, we show that the methods that are proposed to mine semantic relationships between popular tags cannot be used for rare tags. Specifically, we identify the top-k popular tags that are semantically related to a given rare tag, which is done using a set of semantic and topological features. Extensive evaluations on CQA datasets show the superiority of our proposed method over state-of-the-art methods.
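One simple way to relate a rare tag to popular tags, shown here only as a rough stand-in for the paper's richer set of semantic and topological features, is to compare tag co-occurrence vectors with cosine similarity. All function and parameter names below are hypothetical:

```python
import math
from collections import Counter, defaultdict

def top_k_popular_for_rare(posts, rare_tag, k=3, popular_min=3):
    """Rank popular tags by cosine similarity of tag co-occurrence vectors.

    `posts` is a list of tag sets, one per question. Tags appearing in at
    least `popular_min` posts count as popular; the rest are rare.
    """
    freq = Counter(tag for tags in posts for tag in tags)
    popular = {tag for tag, c in freq.items() if c >= popular_min}

    # Co-occurrence vector for each tag: how often it appears with others.
    co = defaultdict(Counter)
    for tags in posts:
        for t in tags:
            for u in tags:
                if u != t:
                    co[t][u] += 1

    def cosine(a, b):
        num = sum(a[x] * b[x] for x in a)
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return num / (na * nb) if na and nb else 0.0

    scores = [(p, cosine(co[rare_tag], co[p])) for p in popular if p != rare_tag]
    scores.sort(key=lambda s: -s[1])
    return [p for p, _ in scores[:k]]
```

The abstract notes that rare tags lack information, which is exactly why a pure co-occurrence approach like this sketch struggles for them and additional features are needed.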
A survey on software defect prediction using deep learning
Defect prediction is one of the key challenges in software development and programming language research for improving software quality and reliability. The problem in this area is to properly identify defective source code with high accuracy. Developing a fault prediction model is a challenging problem, and many approaches have been proposed over the years. The recent breakthrough in machine learning technologies, especially the development of deep learning techniques, has led to many problems being solved by these methods. Our survey focuses on deep learning techniques for defect prediction. We analyse the recent works on the topic, study the methods for automatic learning of semantic and structural features from code, discuss the open problems, and present the recent trends in the field. © 2021 by the authors. Licensee MDPI, Basel, Switzerland.
FedCSD: A Federated Learning Based Approach for Code-Smell Detection
This paper proposes a Federated Learning Code Smell Detection (FedCSD) approach that allows organizations to collaboratively train federated ML models while preserving their data privacy. The approach is evaluated in three experiments on three manually validated datasets aimed at detecting and examining different code smell scenarios. In experiment 1, a centralized training setting, dataset two achieved the lowest accuracy (92.30%), having fewer smells, while datasets one and three achieved the highest accuracies with a slight difference (98.90% and 99.5%, respectively). Experiment 2 performed cross-evaluation, where each ML model was trained on one dataset and then evaluated on the other two. Results from this experiment show a significant drop in model accuracy (lowest accuracy: 63.80%) when fewer smells exist in the training dataset, which is noticeably reflected, as technical debt, in the model's performance. Finally, the third experiment evaluates our approach by splitting the dataset across 10 companies. The ML model was trained at each company's site, and all updated model weights were then transferred to the server. Ultimately, the global model, trained with 10 companies over 100 training rounds, achieved an accuracy of 98.34%. The results reveal only a slight difference between the global model's accuracy and the highest accuracy of the centralized model, which can be disregarded in favour of the global model's comprehensive knowledge, lower training cost, preservation of data privacy, and avoidance of the technical debt problem.
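The aggregation step described above, where per-company weight updates are combined on a server, can be sketched as a minimal federated-averaging step. This is an illustrative sketch under the usual FedAvg assumption of weighting each client by its local dataset size, not the paper's exact aggregation code; `fed_avg` and its parameters are hypothetical names:

```python
def fed_avg(client_weights, client_sizes):
    """Federated averaging: combine per-company model weights into a global model.

    Each client's flat weight vector is weighted by its local dataset size.
    Only weights leave each site; the private code under analysis never does.
    """
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(weights[i] * size for weights, size in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]
```

In a full training round, the server would broadcast the averaged weights back to all companies before the next local training pass, repeating for the stated 100 rounds.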
Identifying developers’ habits and expectations in copy and paste programming practice
Máster Universitario en Investigación e Innovación en Inteligencia Computacional y Sistemas Interactivos.
Both novice and experienced developers rely more and more on external sources of code, which they include in their programs by copying and pasting code snippets. This behavior differs from the traditional software design approach, where cohesion was achieved via a conscious design effort. Due to this fact, it is essential to know how copy-and-paste programming practices are actually carried out, so that IDEs (Integrated Development Environments) and code recommenders can be designed to fit developer expectations and habits.
DEVELOPMENT OF PROBLEM-SPECIFIC MODELING LANGUAGE TO SUPPORT SOFTWARE VARIABILITY IN "SMART HOME" SYSTEMS
Building conceptual models for software design, in particular for high-tech applications such as smart home systems, is a complex task that significantly affects the efficiency of their development processes. One of the innovative methods of solving this problem is the use of domain-specific modeling languages (DSMLs), which can reduce the time and other project resources required to create such systems. The subject of research in this paper is approaches to the development of DSML for Smart Home systems as a separate class of Internet of Things systems. The purpose of this work is to propose an approach to the development of DSMLs based on a model of variability of the properties of such a system. The following tasks are being solved: analysis of some existing approaches to the creation of DSMLs; construction of a multifaceted classification of requirements for them, application of these requirements to the design of the syntax of a specific DSML-V for the creation of variable software in smart home systems; development of a technological scheme and quantitative metrics for experimental evaluation of the effectiveness of the proposed approach. The following methods are used: variability modeling based on the property model, formal notations for describing the syntax of the DSML-V language, and the use of the open CASE tool metaDepth. Results: a multifaceted classification of requirements for a broad class of DSML languages is built; the basic syntactic constructions of the DSML-V language are developed to support the properties of software variability of "Smart Home" systems; a formal description of such syntax in the Backus-Naur notation is given; a technological scheme for compiling DSML-V specifications into the syntax of the language of the open CASE tool metaDepth is created; the effectiveness of the proposed approach using quantitative metrics is experimentally investigated. 
Conclusions: The proposed method of developing a specialized problem-oriented language for smart home systems allows for multilevel modeling of the variability properties of its software components and provides an increase in the efficiency of programming such models by about 14% compared to existing approaches.