Search CORE

112 research outputs found

Supporting the Maintenance of Identifier Names: A Holistic Approach to High-Quality Automated Identifier Naming

Author: Peruma Anthony S
Publication venue: RIT Scholar Works
Publication date: 01/06/2022
Field of study

A considerable part of the source code is identifier names-- unique lexical tokens that provide information about entities, and entity interactions, within the code. Identifier names provide human-readable descriptions of classes, functions, variables, etc. Poor or ambiguous identifier names (i.e., names that do not correctly describe the code behavior they are associated with) will lead developers to spend more time working towards understanding the code\u27s behavior. Bad naming can also have detrimental effects on tools that rely on natural language clues; degrading the quality of their output and making them unreliable. Additionally, misinterpretations of the code, caused by poor names, can result in the injection of quality issues into the system under maintenance. Thus, improved identifier naming increases developer effectiveness, higher-quality software, and higher-quality software analysis tools. In this dissertation, I establish several novel concepts that help measure and improve the quality of identifiers. The output of this dissertation work is a set of identifier name appraisal and quality tools that integrate into the developer workflow. Through a sequence of empirical studies, I have formulated a series of heuristics and linguistic patterns to evaluate the quality of identifier names in the code and provide naming structure recommendations. I envision and working towards supporting developers in integrating my contributions, discussed in this dissertation, into their development workflow to significantly improve the process of crafting and maintaining high-quality identifier names in the source code

RIT Scholar Works

On the Generation, Structure, and Semantics of Grammar Patterns in Source Code Identifiers

Author: Alsuhaibani Reem S
Decker Michael J
Hill Emily
Kaushik Dishant
Mkaouer Mohamed Wiem
Newman , Christian D.
Peruma Anthony
Publication venue: RIT Scholar Works
Publication date: 15/07/2020
Field of study

Identifier names are the atoms of program comprehension. Weak identifier names decrease developer productivity and degrade the performance of automated approaches that leverage identifier names in source code analysis; threatening many of the advantages which stand to be gained from advances in artificial intelligence and machine learning. Therefore, it is vital to support developers in naming and renaming identifiers. In this paper, we extend our prior work, which studies the primary method through which names evolve: rename refactorings. In our prior work, we contextualize rename changes by examining commit messages and other refactorings. In this extension, we further consider data type changes which co-occur with these renames, with a goal of understanding how data type changes influence the structure and semantics of renames. In the long term, the outcomes of this study will be used to support research into: (1) recommending when a rename should be applied, (2) recommending how to rename an identifier, and (3) developing a model that describes how developers mentally synergize names using domain and project knowledge. We provide insights into how our data can support rename recommendation and analysis in the future, and reflect on the significant challenges, highlighted by our study, for future research in recommending renames

arXiv.org e-Print Archive

RIT Scholar Works

Toward an Effective Automated Tracing Process

Author: Mahmoud Anas Mohammad
Publication venue: Scholars Junction
Publication date: 28/04/2014
Field of study

Traceability is defined as the ability to establish, record, and maintain dependency relations among various software artifacts in a software system, in both a forwards and backwards direction, throughout the multiple phases of the project’s life cycle. The availability of traceability information has been proven vital to several software engineering activities such as program comprehension, impact analysis, feature location, software reuse, and verification and validation (V&V). The research on automated software traceability has noticeably advanced in the past few years. Various methodologies and tools have been proposed in the literature to provide automatic support for establishing and maintaining traceability information in software systems. This movement is motivated by the increasing attention traceability has been receiving as a critical element of any rigorous software development process. However, despite these major advances, traceability implementation and use is still not pervasive in industry. In particular, traceability tools are still far from achieving performance levels that are adequate for practical applications. Such low levels of accuracy require software engineers working with traceability tools to spend a considerable amount of their time verifying the generated traceability information, a process that is often described as tedious, exhaustive, and error-prone. Motivated by these observations, and building upon a growing body of work in this area, in this dissertation we explore several research directions related to enhancing the performance of automated tracing tools and techniques. In particular, our work addresses several issues related to the various aspects of the IR-based automated tracing process, including trace link retrieval, performance enhancement, and the role of the human in the process. Our main objective is to achieve performance levels, in terms of accuracy, efficiency, and usability, that are adequate for practical applications, and ultimately to accomplish a successful technology transfer from research to industry

Mississippi State University Libraries ETD database

Scholars Junction - Mississippi State University Institutional Repository

Automatic Detection and Classification of Identifier Renamings

Author: Mousavi Eshkevari Laleh
Publication venue
Publication date: 01/12/2015
Field of study

RÉSUMÉ Le lexique du code source joue un rôle primordial dans la maintenabilité des logiciels. Un lexique pauvre peut induire à une mauvaise compréhension du programme et à l'augmentation des erreurs du logiciel. Il est donc important que les développeurs maintiennent le lexique de leur code source en renommant les identifiants afin qu'ils reflètent les concepts qu'ils expriment. Dans cette thèse, nous étudions le lexique et proposons une approche pour détecter et classifier les renommages des identifiants dans le code source. La détection des renommages est basée sur la combinaison de deux techniques: la différenciation des codes sources et l'analyse de flux de données. Tandis que le classificateur de renommage utilise une base de données ontologique et un analyseur syntaxique du langage naturel pour classer les renommages selon la taxonomie que nous avons défini. Afin d'évaluer l'exactitude et l'exhaustivité du détecteur de renommage, nous avons réalisé une étude empirique sur l’historique de cinq programmes Java open-source. Les résultats de cette étude rapportent une précision de 88% et un rappel 92%. Nous avons également mené une étude exploratoire qui analyse et discute comment les identifiants sont renommés, selon la taxonomie proposée, dans les cinq programmes Java de l’étude précédente. Les résultats de cette étude exploratoire montrent qu’il existe des renommages dans chaque dimension de notre taxonomie. Afin d’appliquer l’approche proposée aux programmes PHP, nous avons adapte notre détecteur de renommages pour prendre en compte les caractéristiques inhérentes à ces programmes. Une étude préliminaire effectuée sur trois programmes PHP montre que notre approche est applicable aux programmes PHP. Cependant, ces programmes ont des tendances de renommages différentes de celles observées dans les programmes Java. Cette thèse propose deux résultats. Tout d'abord, la détection et la classification des renommages et un outil, qui peut être utilisé pour documenter les renommages. Les développeurs seront en mesure de, par exemple, rechercher des méthodes qui font partie de l’interface de programmation car celles-ci impactent les applications clientes. Ils pourront également identifier les incohérences entre le nom et la fonctionnalité d'une entité en cas de renommage dit risqué comme lors d’un renommage vers un antonyme. Deuxièmement, les résultats de nos études nous fournissent des leçons qui constituent une base de connaissance et de conseils pouvant aider les développeurs à éviter des renommages inappropriés ou inutiles et ainsi maintenir la cohérence du lexique de leur code source.----------ABSTRACT Source code lexicon plays a paramount role in software maintainability: a poor lexicon can lead to poor comprehensibility and increase software fault-proneness. For this reason, developers should maintain their source code lexicon by renaming identifiers when they do not reflect the concepts that they should express. In this thesis, we study lexicon and propose an approach to detect and classify identifier renamings in source code. The renaming detection is based on a combination of source code differencing and data flow analysis, while the renaming classifier uses an ontological database and a natural language parser to classify renamings according to a taxonomy we define. We report a study—conducted on the evolution history of five open-source Java programs—aimed at evaluating the accuracy and completeness of the renaming detector. The study reports a precision of 88% and a recall of 92%. In addition, we report an exploratory study investigating and discussing how identifiers are renamed in the five Java programs, according to our taxonomy. Moreover, we report the challenges and applicability of the proposed approach to PHP programs and report our preliminary results of renaming detection and classification for three programs. This thesis provides two outcomes. First, the renaming detection and classification approach and tool, which can be used for documenting renamings. Developers will be able to, for example, look up methods that are part of the public API (as they impact client applications), or look for inconsistencies between the name and the implementation of an entity that underwent a high risk renaming (e.g., towards the opposite meaning). Second, pieces of actionable knowledge, based on our qualitative study of renamings, that provide advice on how to avoid some unnecessary renamings

PolyPublie

Towards Improving the Code Lexicon and its Consistency

Author: Arnaoudova Venera
Publication venue
Publication date: 01/08/2014
Field of study

RÉSUMÉ La compréhension des programmes est une activité clé au cours du développement et de la maintenance des logiciels. Bien que ce soit une activité fréquente—même plus fré- quente que l’écriture de code—la compréhension des programmes est une activité difficile et la difficulté augmente avec la taille et la complexité des programmes. Le plus souvent, les mesures structurelles—telles que la taille et la complexité—sont utilisées pour identifier ces programmes complexes et sujets aux bogues. Cependant, nous savons que l’information linguistique contenue dans les identifiants et les commentaires—c’est-à-dire le lexique du code source—font partie des facteurs qui influent la complexité psychologique des programmes, c’est-à-dire les facteurs qui rendent les programmes difficiles à comprendre et à maintenir par des humains. Dans cette thèse, nous apportons la preuve que les mesures évaluant la qualité du lexique du code source sont un atout pour l’explication et la prédiction des bogues. En outre, la qualité des identifiants et des commentaires peut ne pas être suffisante pour révéler les bogues si on les considère en isolation—dans sa théorie sur la compréhension de programmes par exemple, Brooks avertit qu’il peut arriver que les commentaires et le code soient en contradiction. C’est pourquoi nous adressons le problème de la contradiction et, plus généralement, d’incompatibilité du lexique en définissant un catalogue d’Antipatrons Linguistiques (LAs), que nous définissons comme des mauvaises pratiques dans le choix des identifiants résultant en incohérences entre le nom, l’implémentation et la documentation d’une entité de programmation. Nous évaluons empiriquement les LAs par des développeurs de code propriétaire et libre et montrons que la majorité des développeurs les perçoivent comme mauvaises pratiques et par conséquent elles doivent être évitées. Nous distillons aussi un sous-ensemble de LAs canoniques que les développeurs perçoivent particulièrement inacceptables ou pour lesquelles ils ont entrepris des actions. En effet, nous avons découvert que 10% des exemples contenant les LAs ont été supprimés par les développeurs après que nous les leur ayons présentés. Les explications des développeurs et la forte proportion de LAs qui n’ont pas encore été résolus suggèrent qu’il peut y avoir d’autres facteurs qui influent sur la décision d’éliminer les LAs, qui est d’ailleurs souvent fait par le moyen de renommage. Ainsi, nous menons une enquête auprès des développeurs et montrons que plusieurs facteurs peuvent empêcher les développeurs de renommer. Ces résultats suggèrent qu’il serait plus avantageux de souligner les LAs et autres mauvaises pratiques lexicales quand les développeurs écrivent du code source—par exemple en utilisant notre plugin LAPD Checkstyle détectant des LAs—de sorte que l’amélioration puisse se faire sur la volée et sans impacter le reste du code.----------ABSTRACT Program comprehension is a key activity during software development and maintenance. Although frequently performed—even more often than actually writing code—program comprehension is a challenging activity. The difficulty to understand a program increases with its size and complexity and as a result the comprehension of complex programs, in the best- case scenario, more time consuming when compared to simple ones but it can also lead to introducing faults in the program. Hence, structural properties such as size and complexity are often used to identify complex and fault prone programs. However, from early theories studying developers’ behavior while understanding a program, we know that the textual in- formation contained in identifiers and comments—i.e., the source code lexicon—is part of the factors that affect the psychological complexity of a program, i.e., factors that make a program difficult to understand and maintain by humans. In this dissertation we provide evidence that metrics evaluating the quality of source code lexicon are an asset for software fault explanation and prediction. Moreover, the quality of identifiers and comments considered in isolation may not be sufficient to reveal flaws—in his theory about the program understanding process for example, Brooks warns that it may happen that comments and code are contradictory. Consequently, we address the problem of contradictory, and more generally of inconsistent, lexicon by defining a catalog of Linguistic Antipatterns (LAs), i.e., poor practices in the choice of identifiers resulting in inconsistencies among the name, implementation, and documentation of a programming entity. Then, we empirically evaluate the relevance of LAs—i.e., how important they are—to industrial and open-source developers. Overall, results indicate that the majority of the developers perceives LAs as poor practices and therefore must be avoided. We also distill a subset of canonical LAs that developers found particularly unacceptable or for which they undertook an action. In fact, we discovered that 10% of the examples containing LAs were removed by developers after we pointed them out. Developers’ explanations and the large proportion of yet unresolved LAs suggest that there may be other factors that impact the decision of removing LAs, which is often done through renaming. We conduct a survey with developers and show that renaming is not a straightforward activity and that there are several factors preventing developers from renaming. These results suggest that it would be more beneficial to highlight LAs and other lexicon bad smells as developers write source code—e.g., using our LAPD Checkstyle plugin detecting LAs—so that the improvement can be done on-the-fly without impacting other program entities

PolyPublie

Model refactoring by example: A multi‐objective search based software engineering approach

Author: Bechikh
Biermann
Deb
Deligiannis
Douglas
Du Bois
El-Boussaidi
El-Boussaidi
Fowler
Ghannem
Ghannem
Ghannem
Ghannem
Harman
Howe
Kalboussi
Kessentini
Kessentini
Kessentini
Kessentini
Kessentini
Li
Mansoor
Mariani
Marinescu
Mens
Mens
Mens
Miryung
Moha
Mäntylä
O' Cinnéide
O' Keeffe
O' Keeffe
Ouni
Qayum
Rachmawati
Ragnhild
Seng
Wilcoxon
Zhang
Zhang
Publication venue: 'Wiley'
Publication date: 01/04/2018
Field of study

Declarative rules are frequently used in model refactoring in order to detect refactoring opportunities and to apply the appropriate ones. However, a large number of rules is required to obtain a complete specification of refactoring opportunities. Companies usually have accumulated examples of refactorings from past maintenance experiences. Based on these observations, we consider the model refactoring problem as a multi objective problem by suggesting refactoring sequences that aim to maximize both structural and textual similarity between a given model (the model to be refactored) and a set of poorly designed models in the base of examples (models that have undergone some refactorings) and minimize the structural similarity between a given model and a set of well‐designed models in the base of examples (models that do not need any refactoring). To this end, we use the Non‐dominated Sorting Genetic Algorithm (NSGA‐II) to find a set of representative Pareto optimal solutions that present the best trade‐off between structural and textual similarities of models. The validation results, based on 8 real world models taken from open‐source projects, confirm the effectiveness of our approach, yielding refactoring recommendations with an average correctness of over 80%. In addition, our approach outperforms 5 of the state‐of‐the‐art refactoring approaches.Peer Reviewedhttps://deepblue.lib.umich.edu/bitstream/2027.42/143783/1/smr1916.pdfhttps://deepblue.lib.umich.edu/bitstream/2027.42/143783/2/smr1916_am.pd

Crossref

Deep Blue Documents at the University of Michigan

User Review Analysis for Requirement Elicitation: Thesis and the framework prototype's source code

Author: Wang Juan
Publication venue: Oxford Brookes University
Publication date: 01/01/2020
Field of study

Online reviews are an important channel for requirement elicitation. However, requirement engineers face challenges when analysing online user reviews, such as data volumes, technical supports, existing techniques, and legal barriers. Juan Wang proposes a framework solving user review analysis problems for the purpose of requirement elicitation that sets up a channel from downloading user reviews to structured analysis data. The main contributions of her work are: (1) the thesis proposed a framework to solve the user review analysis problem for requirement elicitation; (2) the prototype of this framework proves its feasibility; (3) the experiments prove the effectiveness and efficiency of this framework. This resource here is the latest version of Juan Wang's PhD thesis "User Review Analysis for Requirement Elicitation" and all the source code of the prototype for the framework as the results of her thesis

Oxford Brookes University: RADAR