
67 research outputs found

Streamlining code smells: Using collective intelligence and visualization

Author: Reis José Vicente Pereira dos
Publication date: 09/09/2022

Context. Code smells are seen as a major source of technical debt and, as such, should be detected and removed. Code smells have long been catalogued together with corresponding mitigating solutions called refactoring operations. However, while the latter are supported in current IDEs (e.g., Eclipse), support for code smells detection still has many limitations. Researchers argue that the subjectiveness of the code smells detection process is a major hindrance to mitigating the problem of smells-infected code. Objective. This thesis presents a new approach to code smells detection, which we have called CrowdSmelling, and the results of a validation experiment for this approach. Our CrowdSmelling approach is based on supervised machine learning techniques, where the wisdom of the crowd (of software developers) is used to collectively calibrate code smells detection algorithms, thereby lessening the subjectivity issue. Method. In the context of three consecutive years of a Software Engineering course, a total "crowd" of around a hundred teams, with an average of three members each, classified the presence of 3 code smells (Long Method, God Class, and Feature Envy) in Java source code. These classifications were the basis of the oracles used for training six machine learning algorithms. Over one hundred models were generated and evaluated to determine which machine learning algorithms had the best performance in detecting each of the aforementioned code smells. Results. Good performances were obtained for God Class detection (ROC=0.896 for Naive Bayes) and Long Method detection (ROC=0.870 for AdaBoostM1), but much lower for Feature Envy (ROC=0.570 for Random Forest). Conclusions. The results suggest that CrowdSmelling is a feasible approach for the detection of code smells, but further validation experiments are required to cover more code smells and to increase external validity.
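The abstract above describes building an oracle from crowd classifications and scoring detectors by ROC. The following is only a minimal sketch of that idea, not the thesis's actual pipeline: the majority-vote rule, the method names, the votes and the detector scores are all hypothetical, and the ROC area is computed directly from its rank interpretation rather than with any particular machine learning toolkit.

```python
from collections import Counter


def majority_label(votes):
    """Majority vote among team classifications (True = smell present).
    Ties are resolved as 'no smell', an arbitrary choice for this sketch."""
    counts = Counter(votes)
    return counts[True] > counts[False]


def roc_auc(labels, scores):
    """Area under the ROC curve: the probability that a randomly chosen
    positive is scored above a randomly chosen negative (ties count 0.5)."""
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical data: each method was classified by three teams (Long Method?).
crowd_votes = {
    "parseAll": [True, True, False],
    "getName": [False, False, False],
    "renderPage": [True, True, True],
    "toString": [False, True, False],
}
oracle = {name: majority_label(v) for name, v in crowd_votes.items()}

# Hypothetical detector output: predicted probability that the smell is present.
scores = {"parseAll": 0.8, "getName": 0.1, "renderPage": 0.6, "toString": 0.7}

names = sorted(oracle)
auc = roc_auc([oracle[n] for n in names], [scores[n] for n in names])
print(f"ROC AUC = {auc:.3f}")  # prints "ROC AUC = 0.750"
```

With this toy data, three of the four positive/negative pairs are ranked correctly, hence the value 0.75. A value near 0.5, like the one reported for Feature Envy, means the detector ranks smelly and clean code almost at random.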

Repositório Institucional do ISCTE-IUL

Are Multi-language Design Smells Fault-prone? An Empirical Study

Author: Abidi Mouna
Khomh Foutse
Openja Moses
Rahman Md Saidur
Publication date: 02/11/2020

Modern applications are often developed using components written in different programming languages. These multi-language systems offer several advantages. However, as the number of languages increases, so do the challenges related to the development and maintenance of these systems. In such situations, developers may introduce design smells (i.e., anti-patterns and code smells), which are symptoms of poor design and implementation choices. Design smells are defined as poor design and coding choices that can negatively impact the quality of a software program despite satisfying functional requirements. Studies on mono-language systems suggest that the presence of design smells affects code comprehension, thus making systems harder to maintain. However, these studies target only mono-language systems and do not consider the interaction between different programming languages. In this paper, we present an approach to detect multi-language design smells in the context of JNI systems. We then investigate the prevalence of those design smells. Specifically, we detect 15 design smells in 98 releases of nine open-source JNI projects. Our results show that the design smells are prevalent in the selected projects and persist throughout the releases of the systems. We observe that in the analyzed systems, 33.95% of the files involving communications between Java and C/C++ contain occurrences of multi-language design smells. Some kinds of smells are more prevalent than others, e.g., Unused Parameters, Too Much Scattering, and Unused Method Declaration. Our results suggest that files with multi-language design smells are more often associated with bugs than files without these smells, and that specific smells are more correlated with fault-proneness than others.

arXiv.org e-Print Archive

PolyPublie

On the Fault-Proneness of Javascript Code Smells

Author: Saboury Amir
Publication date: 01/12/2016

JavaScript is a scripting language that has gained a lot of popularity over the past decade. Initially used exclusively for client-side web development, it has evolved to become one of the most popular programming languages, with developers now using it for both client-side and server-side application development. Like applications written in other programming languages, JavaScript applications contain code smells, which are poor design and implementation choices that can negatively impact maintainability and quality. In this thesis, we investigate code smells in server-side JavaScript applications with the aim of understanding how they affect the fault-proneness of software. We detect 12 types of code smells in 537 releases of five popular JavaScript libraries (i.e., express, grunt, bower, less.js, and request) and perform a survival analysis, comparing the time until the occurrence of a fault in files containing code smells and in files without code smells. Results show that (1) on average, files without code smells have hazard rates 65% lower than files with code smells, and (2) among the studied smells, "Variable Re-assign" and "Assignment in Conditional Statements" have the highest hazard rates. Additionally, we conduct a survey of 1,484 JavaScript developers to understand how developers perceive the studied code smells. We found that developers consider the "Nested Callbacks," "Variable Re-assign," and "Long Parameter List" code smells to be serious design problems that hinder the maintainability and reliability of applications, an assessment in line with the findings of our quantitative analysis. Overall, code smells increase the fault-proneness of JavaScript applications. Therefore, developers should consider tracking and removing them early, before the software is released to the public.

PolyPublie

Code smells detection and visualization: A systematic literature review

Author: Abreu Fernando Brito e
Anslow Craig
Carneiro Glauco de Figueiredo
Reis José Pereira dos
Publication date: 16/12/2020

Context: Code smells (CS) tend to compromise software quality and also demand more effort by developers to maintain and evolve the application throughout its life-cycle. They have long been catalogued with corresponding mitigating solutions called refactoring operations. Objective: This SLR has a twofold goal: the first is to identify the main code smells detection techniques and tools discussed in the literature, and the second is to analyze to what extent visual techniques have been applied to support the former. Method: Over 83 primary studies indexed in major scientific repositories were identified by our search string in this SLR. Then, following existing best practices for secondary studies, we applied inclusion/exclusion criteria to select the most relevant works, extract their features and classify them. Results: We found that the most commonly used approaches to code smells detection are search-based (30.1%) and metric-based (24.1%). Most of the studies (83.1%) use open-source software, with the Java language occupying the first position (77.1%). In terms of code smells, God Class (51.8%), Feature Envy (33.7%), and Long Method (26.5%) are the most covered ones. Machine learning techniques are used in 35% of the studies. Around 80% of the studies only detect code smells, without providing visualization techniques. In visualization-based approaches, several methods are used, such as city metaphors and 3D visualization techniques. Conclusions: We confirm that the detection of CS is a non-trivial task, and there is still a lot of work to be done in terms of: reducing the subjectivity associated with the definition and detection of CS; increasing the diversity of detected CS and of supported programming languages; constructing and sharing oracles and datasets to facilitate the replication of CS detection and visualization techniques validation experiments.
Comment: submitted to ARC

arXiv.org e-Print Archive

Towards Usable API Documentation

Author: Khan Junaed Younus
Publication venue: University of Calgary
Publication date: 01/07/2023

The learning and usage of an API is supported by documentation. Like source code, API documentation is itself a software product. Several research results show that bad design in API documentation can make the reuse of API features difficult. Indeed, similar to code smells, poorly designed API documentation can also exhibit 'smells'. Such documentation smells can be described as bad documentation styles that do not necessarily produce incorrect documentation but make the documentation difficult to understand and use. This thesis aims to enhance API documentation usability by addressing such documentation smells in three phases. In the first phase, we developed a catalog of five API documentation smells by consulting the literature on API documentation issues and online developer discussions. We validated their presence in the real world by creating a benchmark of 1K official Java API documentation units and conducting a survey of 21 developers. The developers confirmed that these smells hinder their productivity and called for automatic detection and fixing. In the second phase, we developed machine-learning models to detect the smells using the 1K benchmark; however, they performed poorly when evaluated on larger and more diverse documentation sources. We explored more advanced models and employed re-training and hyperparameter tuning to further improve the performance. Our best-performing model, RoBERTa, achieved F1-scores of 0.71-0.93 in detecting different smells. In the third phase, we first focused on evaluating the feasibility and impact of fixing various smells in the eyes of practitioners. Through a second survey of 30 practitioners, we found that fixing the lazy smell was perceived as the most feasible and impactful. However, there was no universal consensus on whether and how other smells can or should be fixed. Finally, we proposed a two-stage pipeline for fixing lazy documentation, involving additional textual description and documentation-specific code example generation. Our approach utilized a large language model, GPT-3, to generate enhanced documentation based on non-lazy examples and to produce code examples. The generated code examples were refined iteratively until they were error-free. Our technique demonstrated a high success rate, with a significant number of lazy documentation instances being fixed and error-free code examples being generated.

PRISM: University of Calgary Digital Repository

Crowdsmelling: The use of collective knowledge in code smells detection

Author: Abreu Fernando Brito e
Carneiro Glauco de Figueiredo
Reis José Pereira dos
Publication date: 23/12/2020

Code smells are seen as a major source of technical debt and, as such, should be detected and removed. However, researchers argue that the subjectiveness of the code smells detection process is a major hindrance to mitigating the problem of smells-infected code. We proposed the crowdsmelling approach based on supervised machine learning techniques, where the wisdom of the crowd (of software developers) is used to collectively calibrate code smells detection algorithms, thereby lessening the subjectivity issue. This paper presents the results of a validation experiment for the crowdsmelling approach. In the context of three consecutive years of a Software Engineering course, a total "crowd" of around a hundred teams, with an average of three members each, classified the presence of 3 code smells (Long Method, God Class, and Feature Envy) in Java source code. These classifications were the basis of the oracles used for training six machine learning algorithms. Over one hundred models were generated and evaluated to determine which machine learning algorithms had the best performance in detecting each of the aforementioned code smells. Good performances were obtained for God Class detection (ROC=0.896 for Naive Bayes) and Long Method detection (ROC=0.870 for AdaBoostM1), but much lower for Feature Envy (ROC=0.570 for Random Forest). The results suggest that crowdsmelling is a feasible approach for the detection of code smells, but further validation experiments are required to cover more code smells and to increase external validity.

arXiv.org e-Print Archive

Code smells detection and visualization: A systematic literature review

Author: Anslow C.
Brito e Abreu F.
Carneiro G.
Pereira dos Reis J.
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2022

Context: Code smells (CS) tend to compromise software quality and also demand more effort by developers to maintain and evolve the application throughout its life-cycle. They have long been cataloged with corresponding mitigating solutions called refactoring operations. Objective: This SLR has a twofold goal: the first is to identify the main code smells detection techniques and tools discussed in the literature, and the second is to analyze to what extent visual techniques have been applied to support the former. Method: Over 83 primary studies indexed in major scientific repositories were identified by our search string in this SLR. Then, following existing best practices for secondary studies, we applied inclusion/exclusion criteria to select the most relevant works, extract their features and classify them. Results: We found that the most commonly used approaches to code smells detection are search-based (30.1%) and metric-based (24.1%). Most of the studies (83.1%) use open-source software, with the Java language occupying the first position (77.1%). In terms of code smells, God Class (51.8%), Feature Envy (33.7%), and Long Method (26.5%) are the most covered ones. Machine learning techniques are used in 35% of the studies. Around 80% of the studies only detect code smells, without providing visualization techniques. In visualization-based approaches, several methods are used, such as city metaphors and 3D visualization techniques. Conclusions: We confirm that the detection of CS is a non-trivial task, and there is still a lot of work to be done in terms of: reducing the subjectivity associated with the definition and detection of CS; increasing the diversity of detected CS and of supported programming languages; constructing and sharing oracles and datasets to facilitate the replication of CS detection and visualization techniques validation experiments.

Repositório Institucional do ISCTE-IUL

PHP code smells in web apps: Evolution, survival and anomalies

Author: Abreu Fernando Brito e
Rio Américo
Publication venue: Elsevier BV
Publication date: 01/01/2023

Rio, A., & Abreu, F. B. E. (2023). PHP code smells in web apps: Evolution, survival and anomalies. Journal of Systems and Software, 200, 1-23. [111644]. https://doi.org/10.1016/j.jss.2023.111644

Abstract. Context: Code smells are symptoms of poor design, leading to future problems, such as reduced maintainability. Therefore, it becomes necessary to understand their evolution and how long they stay in code. This paper presents a longitudinal study on the evolution and survival of code smells (CS) for web apps built with PHP, the most widely used server-side programming language in web development, yet one that is seldom studied. Objectives: We aimed to discover how CS evolve and what their survival/lifespan is in typical PHP web apps. Does CS survival depend on their scope or app life period? Are there sudden variations (anomalies) in the density of CS through the evolution of web apps? Method: We analyzed the evolution of 18 CS in 12 PHP web applications and compared it with changes in app and team size. We characterized the distribution of CS and used survival analysis techniques to study the lifespan of CS. We specialized the survival studies into localized (specific location) and scattered (spanning multiple classes/methods) CS categories. We further split the observations for each web app into two consecutive time frames. As for the CS evolution anomalies, we standardized their detection criteria. Results: The CS density trend along the evolution of PHP web apps is mostly stable, with variations, and correlates with the number of developers. We identified the smells that survived the most. CS live for about 37% of the life of the applications on average, almost 4 years in our study; around 61% of the CS introduced are removed. Most applications have different survival times for localized and scattered CS, and localized CS have a shorter life. The CS survival time is shorter, and more CS are introduced and removed, in the first half of the life of the applications. We found anomalies in the evolution of 5 apps and show how a graphical representation of sudden variations found in the evolution of CS unveils the story of a development project. Conclusion: CS stay a long time in code. The removal rate is low and did not change substantially in recent years. An effort should be made to avoid this bad behavior and to make the CS density trend decrease.
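The abstract mentions survival analysis of code smell lifespans without naming a specific estimator. As an illustration only, the sketch below uses the Kaplan-Meier estimator, a standard survival analysis technique that handles smells still present when observation ends (right-censored data). The function name, the lifespans and the removal flags are hypothetical and are not taken from the study.

```python
def kaplan_meier(durations, observed):
    """Kaplan-Meier survival estimate.

    durations: time until removal, or until observation ended, per smell.
    observed:  True if the removal was seen; False if the smell was still
               present when observation stopped (right-censored).
    Returns a list of (time, survival_probability), one per removal time.
    """
    event_times = sorted({t for t, e in zip(durations, observed) if e})
    survival = 1.0
    curve = []
    for t in event_times:
        # Smells still present (not yet removed or censored) just before t.
        at_risk = sum(1 for d in durations if d >= t)
        # Removals that happen exactly at time t.
        events = sum(1 for d, e in zip(durations, observed) if d == t and e)
        survival *= 1.0 - events / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical lifespans, in months, of five code smell instances.
lifespans = [6, 12, 12, 30, 48]
removed = [True, True, False, True, False]

curve = kaplan_meier(lifespans, removed)
for time, prob in curve:
    print(f"after {time:2d} months: {prob:.2f} of smells estimated to survive")
```

In this toy data the estimated survival drops to 0.80 after 6 months, 0.60 after 12 and 0.30 after 30. The two censored smells never count as removals, but they do stay in the at-risk group until their observation ends, which is why simply averaging the observed lifespans would understate how long smells persist.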

Repositório Institucional do ISCTE-IUL

Repositório da Universidade Nova de Lisboa