9 research outputs found

    State of Refactoring Adoption: Towards Better Understanding Developer Perception of Refactoring

    Context: Refactoring is the art of improving the structural design of a software system without altering its external behavior. Today, refactoring has become a well-established and disciplined software engineering practice that has attracted a significant amount of research presuming that refactoring is primarily motivated by the need to improve system structures. However, recent studies have shown that developers may incorporate refactoring strategies in other development-related activities that go beyond improving the design, especially with the emerging challenges in contemporary software engineering. Unfortunately, these studies are limited to developer interviews and a reduced set of projects. Objective: We aim to explore how developers document their refactoring activities during the software life cycle. We call such activity Self-Affirmed Refactoring (SAR), which is an indication of the developer-related refactoring events in the commit messages. We then propose an approach to identify whether a commit describes developer-related refactoring events and to classify such commits according to the common refactoring quality improvement categories. To complement this goal, we aim to reveal insights into how reviewers decide whether to accept or reject a submitted refactoring request, what makes such a review challenging, and how to improve the efficiency of refactoring code review. Method: Our empirically driven study follows a mixture of qualitative and quantitative methods. We text-mine refactoring-related documentation, develop a refactoring taxonomy, and automatically classify a large set of commits containing refactoring activities. To better understand the motivation behind refactoring, we also identify, among the various quality models presented in the literature, the ones that are most in line with developers' vision of quality optimization when they explicitly mention that they are refactoring to improve quality. After that, we performed an industrial case study with professional developers at Xerox to study the motivations, documentation practices, challenges, verification, and implications of refactoring activities during code review. Result: We introduced a SAR taxonomy of how developers document their refactoring strategies in commit messages and proposed a SAR model to automate the detection of refactoring. Our survey with code reviewers revealed several difficulties related to understanding the refactoring intent and its implications for the functional and non-functional aspects of the software. Conclusion: Our SAR taxonomy and model can work in conjunction with refactoring detectors to report any early inconsistency between refactoring types and their documentation, and can serve as a solid background for various empirical investigations. In light of the findings of the industrial case study, we recommended a procedure to properly document refactoring activities, as part of our survey feedback.

    Refactoring for Reuse: An Empirical Study

    Refactoring is the de facto practice for optimizing software health. While several studies propose refactoring strategies to optimize software design through applying design patterns and removing design defects, little is known about how developers actually refactor their code to improve its reuse. Therefore, we extract, from 1,828 open source projects, a set of refactorings that were intended to improve software reusability. We analyze the impact of reusability refactorings on state-of-the-art reusability metrics, and we compare the distribution of reusability refactoring types with the distribution of the remaining mainstream refactorings. Overall, we found that the distribution of refactoring types applied in the context of reusability is different from the distribution of refactoring types in mainstream development. In the refactorings performed to improve reusability, source files are subject to more design-level types of refactorings. Reusability refactorings significantly impact high-level code elements, such as packages, classes, and methods, while typical refactorings impact all code elements, including identifiers and parameters. These findings provide practical insights into the current practice of refactoring in the context of code reuse.

    Selenite: Scaffolding Online Sensemaking with Comprehensive Overviews Elicited from Large Language Models

    Sensemaking in unfamiliar domains can be challenging, demanding considerable user effort to compare different options with respect to various criteria. Prior research and our formative study found that people would benefit from reading an overview of an information space upfront, including the criteria others previously found useful. However, existing sensemaking tools struggle with the "cold-start" problem -- it not only requires significant input from previous users to generate and share these overviews, but such overviews may also turn out to be biased and incomplete. In this work, we introduce a novel system, Selenite, which leverages Large Language Models (LLMs) as reasoning machines and knowledge retrievers to automatically produce a comprehensive overview of options and criteria to jumpstart users' sensemaking processes. Subsequently, Selenite also adapts as people use it, helping users find, read, and navigate unfamiliar information in a systematic yet personalized manner. Through three studies, we found that Selenite produced accurate and high-quality overviews reliably, significantly accelerated users' information processing, and effectively improved their overall comprehension and sensemaking experience. Comment: Accepted to CHI 202

    Holistic recommender systems for software engineering

    The knowledge possessed by developers is often not sufficient to overcome a programming problem. Short of talking to teammates, when available, developers often gather additional knowledge from development artifacts (e.g., project documentation), as well as online resources. The web has become an essential component in the modern developer’s daily life, providing a plethora of information from sources like forums, tutorials, Q&A websites, API documentation, and even video tutorials. Recommender Systems for Software Engineering (RSSE) provide developers with assistance to navigate the information space, automatically suggest useful items, and reduce the time required to locate the needed information. Current RSSEs consider development artifacts as containers of homogeneous information in the form of pure text. However, text is a means to represent heterogeneous information provided by, for example, natural language, source code, interchange formats (e.g., XML, JSON), and stack traces. Interpreting the information from a purely textual point of view misses the intrinsic heterogeneity of the artifacts, thus leading to a reductionist approach. We propose the concept of Holistic Recommender Systems for Software Engineering (H-RSSE), i.e., RSSEs that go beyond the textual interpretation of the information contained in development artifacts. Our thesis is that modeling and aggregating information in a holistic fashion enables novel and advanced analyses of development artifacts. To validate our thesis, we developed a framework to extract, model, and analyze information contained in development artifacts in a reusable meta-information model. We show how RSSEs benefit from a meta-information model, since it enables customized and novel analyses built on top of our framework. The information can thus be reinterpreted from a holistic point of view, preserving its multi-dimensionality, and opening the path towards the concept of holistic recommender systems for software engineering.

    Thinking FORTH: a language and philosophy for solving problems

    XIV, 313 p.; 24 cm. Electronic book. Thinking Forth is a book about the philosophy of problem solving and programming style, applied to the unique programming language Forth. Published first in 1984, it could be among the timeless classics of computer books, such as Fred Brooks' The Mythical Man-Month and Donald Knuth's The Art of Computer Programming. Many software engineering principles discussed here have been rediscovered in eXtreme Programming, including (re)factoring, modularity, bottom-up and incremental design. Here you'll find all of those and more - such as the value of analysis and design - described in Leo Brodie's down-to-earth, humorous style, with illustrations, code examples, practical real life applications, illustrative cartoons, and interviews with Forth's inventor, Charles H. Moore, as well as other Forth thinkers. If you program in Forth, this is a must-read book. If you don't, the fundamental concepts are universal: Thinking Forth is meant for anyone interested in writing software to solve problems. The concepts go beyond Forth, but the simple beauty of Forth throws those concepts into stark relief. So flip open the book, and read all about the philosophy of Forth, analysis, decomposition, problem solving, style and conventions, factoring, handling data, and minimizing control structures. But be prepared: you may not be able to put it down. This book has been scanned, OCR'd, typeset in LaTeX, and brought back to print (and your monitor) by a collaborative effort under a Creative Commons license.
    http://thinking-forth.sourceforge.net/

    The Philosophy of Forth
        An Armchair History of Software Elegance; The Superficiality of Structure; Looking Back, and Forth; Component Programming; Hide From Whom?; Hiding the Construction of Data Structures; But Is It a High-Level Language?; The Language of Design; The Language of Performance; Summary; References
    Analysis
        The Nine Phases of the Programming Cycle; The Iterative Approach; The Value of Planning; The Limitations of Planning; The Analysis Phase; Defining the Interfaces; Defining the Rules; Defining the Data Structures; Achieving Simplicity; Budgeting and Scheduling; Reviewing the Conceptual Model; References
    Preliminary Design/Decomposition
        Decomposition by Component; Example: A Tiny Editor; Maintaining a Component-based Application; Designing and Maintaining a Traditional Application; The Interface Component; Decomposition by Sequential Complexity; The Limits of Level Thinking; Summary; For Further Thinking
    Detailed Design/Problem Solving
        Problem-Solving Techniques; Interview with a Software Inventor; Detailed Design; Forth Syntax; Algorithms and Data Structures; Calculations vs. Data Structures vs. Logic; Solving a Problem: Computing Roman Numerals; Summary; References; For Further Thinking
    Implementation: Elements of Forth Style
        Listing Organization; Screen Layout; Comment Conventions; Vertical Format vs. Horizontal Format; Choosing Names: The Art; Naming Standards: The Science; More Tips for Readability; Summary; References
    Factoring
        Factoring Techniques; Factoring Criteria; Compile-Time Factoring; The Iterative Approach in Implementation; References
    Handling Data: Stacks and States
        The Stylish Stack; The Stylish Return Stack; The Problem With Variables; Local and Global Variables/Initialization; Saving and Restoring a State; Application Stacks; Sharing Components; The State Table; Vectored Execution; Using DOER/MAKE; Summary; References
    Minimizing Control Structures
        What’s So Bad about Control Structures?; How to Eliminate Control Structures; A Note on Tricks; Summary; References; For Further Thinking
    Forth’s Effect on Thinking
    Appendix A Overview of Forth (For Newcomers)
    Appendix B Defining DOER/MAKE
    Appendix C Other Utilities Described in This Book
    Appendix D Answers to “Further Thinking” Problems
    Appendix E Summary of Style Conventions
    Inde

    Supporting complex workflows for data-intensive discovery reliably and efficiently

    Scientific workflows have emerged as well-established pillars of large-scale computational science, serving to formalize and structure massive amounts of complex heterogeneous data and to accelerate scientific progress. Scientists in diverse domains can analyze their data by constructing scientific workflows, a useful paradigm for managing complex scientific computations. A workflow can analyze terabyte-scale datasets, contain numerous individual tasks, and coordinate between heterogeneous tasks with the help of scientific workflow management systems (SWfMSs). However, even for expert users, workflow creation is a complex task due to the dramatic growth of tools and data heterogeneity. Scientists are now more willing to publicly share scientific datasets and analysis pipelines in the interest of open science. As sharing of research data and resources increases in scientific communities, scientists can reuse existing workflows shared in several workflow repositories. Unfortunately, several challenges can prevent scientists from reusing those workflows, which defeats the purpose of a community-oriented knowledge base. In this thesis, we first identify the repositories that scientists use to share and reuse scientific workflows. Among several repositories, we find that the Galaxy repositories have numerous workflows and that Galaxy is the most used SWfMS. After selecting the Galaxy repositories, we attempt to explore the workflows and encounter several challenges in reusing them. We classify the reusability status of each workflow (reusable/nonreusable). Based on the effort level, we further categorize the reusable workflows (reusable without modification, easily reusable, moderately difficult to reuse, and difficult to reuse). Upon failure, we record the associated challenges that prevent reusability. We also list the actions taken upon success. The challenges preventing reusability include tool upgrading, tool support unavailability, design flaws, incomplete workflows, failure to load a workflow, etc. We need to perform several actions to overcome these challenges. The actions include identifying proper input datasets, updating/upgrading tools, finding alternative tool support for obsolete tools, debugging to find and fix the tools and connections that cause issues, modifying tool connections, etc. Such challenges and our action list offer guidelines to future workflow composers for creating better workflows with enhanced reusability. A SWfMS stores provenance data at different phases of a workflow life cycle, which can help workflow construction. This provenance data allows reproducibility and knowledge reuse in the scientific community. However, this provenance information is usually many times larger than the workflow and input data, and managing provenance data is growing in complexity with large-scale applications. In our second study, we document the challenges of provenance management and reuse in e-science, focusing primarily on scientific workflow approaches by exploring different SWfMSs and provenance management systems. We also investigate ways to overcome the challenges. Creating a workflow is difficult but essential for data-intensive complex analysis, and existing workflows are challenging to reuse. In our third study, we therefore build a recommendation system that uses machine learning approaches to recommend tool(s), helping scientists create optimal, error-free, and efficient workflows from existing reusable workflows in the Galaxy workflow repositories. The findings from our studies and the proposed techniques have the potential to simplify data-intensive analysis while ensuring reliability and efficiency.

    Data Science Techniques for Modelling Execution Tracing Quality

    This research presents how to handle a research problem when the research variables are still unknown and no quantitative study is possible: how to identify the research variables so that quantitative research can be performed, how to collect data by means of the identified research variables, and how to carry out modelling that takes the specificities of the problem domain into account. In addition, validation is also encompassed in the scope of modelling in the current study. Thus, the work presented in this thesis comprises the typical stages a complex data science problem requires, including qualitative and quantitative research, data collection, modelling of vagueness and uncertainty, and the use of artificial intelligence to gain insights that are impossible to obtain with traditional methods. The problem domain of the research encompasses software product quality modelling and assessment, with particular focus on execution tracing quality. The terms execution tracing quality and logging are used interchangeably throughout the thesis. The research methods and mathematical tools used make it possible to consider the uncertainty and vagueness inherently associated with the quality measurement and assessment process, so that reality can be approximated more appropriately than with plain statistical modelling techniques. Furthermore, the modelling approach offers direct insights into the problem domain through the application of linguistic rules, which is an additional advantage. The thesis reports (1) an in-depth investigation of all the identified software product quality models, (2) a unified summary of the identified software product quality models with their terminologies and concepts, (3) the identification of the variables influencing execution tracing quality, (4) the quality model constructed to describe execution tracing quality, and (5) the link of the constructed quality model to the quality model of the ISO/IEC 25010 standard, with the possibility of tailoring to specific project needs. Further work, outside the scope of this PhD thesis, would also be useful, as presented in the study: (1) to define application-project profiles to assist in tailoring the quality model for execution tracing to specific application and project domains, and (2) to approximate the present quality model for execution tracing, within defined bounds, by simpler mathematical approaches. In conclusion, the research contributes to (1) supporting the daily work of software professionals who need to analyse execution traces; (2) raising awareness that execution tracing quality has a huge impact on software development, software maintenance, and the professionals involved in the different stages of the software development life-cycle; (3) providing a framework in which the present endeavours for log improvements can be placed; and (4) suggesting an extension of the ISO/IEC 25010 standard by linking the constructed quality model to it. In addition, in the scope of the qualitative research methodology, the current PhD thesis contributes to the knowledge of research methods by determining a saturation point in the course of the data collection process.