9 research outputs found
State of Refactoring Adoption: Towards Better Understanding Developer Perception of Refactoring
Context: Refactoring is the art of improving the structural design of a software system without altering its external behavior. Today, refactoring has become a well-established and disciplined software engineering practice that has attracted a significant amount of research presuming that refactoring is primarily motivated by the need to improve system structures. However, recent studies have shown that developers may incorporate refactoring strategies in other development-related activities that go beyond improving the design especially with the emerging challenges in contemporary software engineering. Unfortunately, these studies are limited to developer interviews and a reduced set of projects. Objective: We aim at exploring how developers document their refactoring activities during the software life cycle. We call such activity Self-Affirmed Refactoring (SAR), which is an indication of the developer-related refactoring events in the commit messages. After that, we propose an approach to identify whether a commit describes developer-related refactoring events, to classify them according to the refactoring common quality improvement categories. To complement this goal, we aim to reveal insights into how reviewers develop a decision about accepting or rejecting a submitted refactoring request, what makes such review challenging, and how to the efficiency of refactoring code review. Method: Our empirically driven study follows a mixture of qualitative and quantitative methods. We text mine refactoring-related documentation, then we develop a refactoring taxonomy, and automatically classify a large set of commits containing refactoring activities, and identify, among the various quality models presented in the literature, the ones that are more in-line with the developer\u27s vision of quality optimization, when they explicitly mention that they are refactoring to improve them to obtain an enhanced understanding of the motivation behind refactoring. After that, we performed an industrial case study with professional developers at Xerox to study the motivations, documentation practices, challenges, verification, and implications of refactoring activities during code review. Result: We introduced SAR taxonomy on how developers document their refactoring strategies in commit messages and proposed a SAR model to automate the detection of refactoring. Our survey with code reviewers has revealed several difficulties related to understanding the refactoring intent and implications on the functional and non-functional aspects of the software. Conclusion: Our SAR taxonomy and model, can work in conjunction with refactoring detectors, to report any early inconsistency between refactoring types and their documentation and can serve as a solid background for various empirical investigations. In light of our findings of the industrial case study, we recommended a procedure to properly document refactoring activities, as part of our survey feedback
Refactoring for Reuse: An Empirical Study
Refactoring is the de-facto practice to optimize software health. While several studies propose refactoring strategies to optimize software design through applying design patterns and removing design defects, little is known about how developers actually refactor their code to improve its reuse. Therefore, we extract, from 1,828 open source projects, a set of refactorings that were intended to improve the software reusability. We analyze the impact of reusability refactorings on the state-of-the-art reusability metrics, and we compare the distribution of reusability refactoring types, with the distribution of the remaining mainstream refactorings. Overall, we found that the distribution of refactoring types, applied in the context of reusability, is different from the distribution of refactoring types in mainstream development. In the refactorings performed to improve reusability, source files are subject to more design level types of refactorings. Reusability refactorings significantly impact, high-level code elements, such as packages, classes, and methods, while typical refactorings, impact all code elements, including identifiers, and parameters. These findings provide practical insights into the current practice of refactoring in the context of code reuse involving the act of refactoring
Selenite: Scaffolding Online Sensemaking with Comprehensive Overviews Elicited from Large Language Models
Sensemaking in unfamiliar domains can be challenging, demanding considerable
user effort to compare different options with respect to various criteria.
Prior research and our formative study found that people would benefit from
reading an overview of an information space upfront, including the criteria
others previously found useful. However, existing sensemaking tools struggle
with the "cold-start" problem -- it not only requires significant input from
previous users to generate and share these overviews, but such overviews may
also turn out to be biased and incomplete. In this work, we introduce a novel
system, Selenite, which leverages Large Language Models (LLMs) as reasoning
machines and knowledge retrievers to automatically produce a comprehensive
overview of options and criteria to jumpstart users' sensemaking processes.
Subsequently, Selenite also adapts as people use it, helping users find, read,
and navigate unfamiliar information in a systematic yet personalized manner.
Through three studies, we found that Selenite produced accurate and
high-quality overviews reliably, significantly accelerated users' information
processing, and effectively improved their overall comprehension and
sensemaking experience.Comment: Accepted to CHI 202
Holistic recommender systems for software engineering
The knowledge possessed by developers is often not sufficient to overcome a programming problem. Short of talking to teammates, when available, developers often gather additional knowledge from development artifacts (e.g., project documentation), as well as online resources. The web has become an essential component in the modern developer’s daily life, providing a plethora of information from sources like forums, tutorials, Q&A websites, API documentation, and even video tutorials. Recommender Systems for Software Engineering (RSSE) provide developers with assistance to navigate the information space, automatically suggest useful items, and reduce the time required to locate the needed information. Current RSSEs consider development artifacts as containers of homogeneous information in form of pure text. However, text is a means to represent heterogeneous information provided by, for example, natural language, source code, interchange formats (e.g., XML, JSON), and stack traces. Interpreting the information from a pure textual point of view misses the intrinsic heterogeneity of the artifacts, thus leading to a reductionist approach. We propose the concept of Holistic Recommender Systems for Software Engineering (H-RSSE), i.e., RSSEs that go beyond the textual interpretation of the information contained in development artifacts. Our thesis is that modeling and aggregating information in a holistic fashion enables novel and advanced analyses of development artifacts. To validate our thesis we developed a framework to extract, model and analyze information contained in development artifacts in a reusable meta- information model. We show how RSSEs benefit from a meta-information model, since it enables customized and novel analyses built on top of our framework. The information can be thus reinterpreted from an holistic point of view, preserving its multi-dimensionality, and opening the path towards the concept of holistic recommender systems for software engineering
Thinking FORTH: a language and philosophy for solving problems
XIV, 313 p. ; 24 cmLibro ElectrĂłnicoThinking Forth is a book about the philosophy of problem solving and programming style, applied to the unique programming language Forth. Published first in 1984, it could be among the timeless classics of computer books, such as Fred Brooks' The Mythical Man-Month and Donald Knuth's The Art of Computer Programming. Many software engineering principles discussed here have been rediscovered in eXtreme Programming, including (re)factoring, modularity, bottom-up and incremental design. Here you'll find all of those and more - such as the value of analysis and design - described in Leo Brodie's down-to-earth, humorous style, with illustrations, code examples, practical real life applications, illustrative cartoons, and interviews with Forth's inventor, Charles H. Moore as well as other Forth thinkers. If you program in Forth, this is a must-read book. If you don't, the fundamental concepts are universal: Thinking Forth is meant for anyone interested in writing software to solve problems. The concepts go beyond Forth, but the simple beauty of Forth throws those concepts into stark relief. So flip open the book, and read all about the philosophy of Forth, analysis, decomposition, problem solving, style and conventions, factoring, handling data, and minimizing control structures. But be prepared: you may not be able to put it down. This book has been scanned, OCR'd, typeset in LaTeX, and brought back to print (and your monitor) by a collaborative effort under a Creative Commons license. http://thinking-forth.sourceforge.net/The Philosophy of Forth
An Armchair History of Software Elegance; The Superficiality of Structure; Looking Back, and Forth; Component Programming; Hide From Whom?; Hiding the Construction of Data Structures; But Is It a High-Level Language?; The Language of Design; The Language of Performance; Summary; References
Analysis
The Nine Phases of the Programming Cycle; The Iterative Approach; The Value of Planning; The Limitations of Planning; The Analysis Phase; Defining the Interfaces; Defining the Rules; Defining the Data Structures; Achieving Simplicity; Budgeting and Scheduling; Reviewing the Conceptual Model; References
Preliminary Design/Decomposition
Decomposition by Component; Example: A Tiny Editor; Maintaining a Component-based Application; Designing and Maintaining a Traditional Application; The Interface Component; Decomposition by Sequential Complexity; The Limits of Level Thinking; Summary; For Further Thinking;
Detailed Design/Problem Solving
Problem-Solving Techniques; Interview with a Software Inventor; Detailed Design; Forth Syntax; Algorithms and Data Structures; Calculations vs. Data Structures vs. Logic; Solving a Problem: Computing Roman Numerals; Summary; References; For Further Thinking
Implementation: Elements of Forth Style
Listing Organization; Screen Layout; Comment Conventions; Vertical Format vs. Horizontal Format; Choosing Names: The Art; Naming Standards: The Science; More Tips for Readability; Summary; References
Factoring
Factoring Techniques; Factoring Criteria; Compile-Time Factoring; The Iterative Approach in Implementation; References
Handling Data: Stacks and States
The Stylish Stack; The Stylish Return Stack; The Problem With Variables; Local and Global Variables/Initialization; Saving and Restoring a State; Application Stacks; Sharing Components; The State Table; Vectored Execution; Using DOER/MAKE; Summary; References
Minimizing Control Structures
What’s So Bad about Control Structures?; How to Eliminate Control Structures; A Note on Tricks; Summary; References; For Further Thinking
Forth’s Effect on Thinking
Appendix A Overview of Forth (For Newcomers); Appendix B Defining DOER/MAKE; Appendix C Other Utilities Described in This Book; Appendix D Answers to “Further Thinking” Problems; Appendix E Summary of Style Conventions;
Inde
Supporting complex workflows for data-intensive discovery reliably and efficiently
Scientific workflows have emerged as well-established pillars of large-scale computational science and appeared as torchbearers to formalize and structure a massive amount of complex heterogeneous data and accelerate scientific progress. Scientists of diverse domains can analyze their data by constructing scientific workflows as a useful paradigm to manage complex scientific computations. A workflow can analyze terabyte-scale datasets, contain numerous individual tasks, and coordinate between heterogeneous tasks with the help of scientific workflow management systems (SWfMSs). However, even for expert users, workflow creation is a complex task due to the dramatic growth of tools and data heterogeneity. Scientists are now more willing to publicly share scientific datasets and analysis pipelines in the interest of open science. As sharing of research data and resources increases in scientific communities, scientists can reuse existing workflows shared in several workflow repositories. Unfortunately, several challenges can prevent scientists from reusing those workflows, which hurts the purpose of the community-oriented knowledge base. In this thesis, we first identify the repositories that scientists use to share and reuse scientific workflows. Among several repositories, we find Galaxy repositories have numerous workflows, and Galaxy is the mostly used SWfMS. After selecting the Galaxy repositories, we attempt to explore the workflows and encounter several challenges in reusing them. We classify the reusability status (reusable/nonreusable). Based on the effort level, we further categorize the reusable workflows (reusable without modification, easily reusable, moderately difficult to reuse, and difficult to reuse). Upon failure, we record the associated challenges that prevent reusability. We also list the actions upon success. The challenges preventing reusability include tool upgrading, tool support unavailability, design flaws, incomplete workflows, failure to load a workflow, etc. We need to perform several actions to overcome the challenges. The actions include identifying proper input datasets, updating/upgrading tools, finding alternative tools support for obsolete tools, debugging to find the issue creating tools and connections and solving them, modifying tools connections, etc. Such challenges and our action list offer guidelines to future workflow composers to create better workflows with enhanced reusability. A SWfMS stores provenance data at different phases of a workflow life cycle, which can help workflow construction. This provenance data allows reproducibility and knowledge reuse in the scientific community. But, this provenance information is usually many times larger than the workflow and input data, and managing provenance data is growing in complexity with large-scale applications. In our second study, we document the challenges of provenance management and reuse in e-science, focusing primarily on scientific workflow approaches by exploring different SWfMSs and provenance management systems. We also investigate the ways to overcome the challenges. Creating a workflow is difficult but essential for data-intensive complex analysis, and the existing workflows have several challenges to be reused, so in our third study, we build a recommendation system to recommend tool(s) using machine learning approaches to help scientists create optimal, error-free, and efficient workflows by using existing reusable workflows in Galaxy workflow repositories. The findings from our studies and proposed techniques have the potential to simplify the data-intensive analysis, ensuring reliability and efficiency
Data Science Techniques for Modelling Execution Tracing Quality
This research presents how to handle a research problem when the research variables are still unknown, and
no quantitative study is possible; how to identify the research variables, to be able to perform a quantitative
research, how to collect data by means of the research variables identified, and how to carry out modelling
with the considerations of the specificities of the problem domain. In addition, validation is also encompassed
in the scope of modelling in the current study. Thus, the work presented in this thesis comprises the typical
stages a complex data science problem requires, including qualitative and quantitative research, data collection,
modelling of vagueness and uncertainty, and the leverage of artificial intelligence to gain such insights, which
are impossible with traditional methods.
The problem domain of the research conducted encompasses software product quality modelling, and assessment,
with particular focus on execution tracing quality. The terms execution tracing quality and logging are
used interchangeably throughout the thesis.
The research methods and mathematical tools used allow considering uncertainty and vagueness inherently
associated with the quality measurement and assessment process through which reality can be approximated
more appropriately in comparison to plain statistical modelling techniques. Furthermore, the modelling approach
offers direct insights into the problem domain by the application of linguistic rules, which is an additional
advantage.
The thesis reports (1) an in-depth investigation of all the identified software product quality models, (2) a
unified summary of the identified software product quality models with their terminologies and concepts, (3)
the identification of the variables influencing execution tracing quality, (4) the quality model constructed to
describe execution tracing quality, and (5) the link of the constructed quality model to the quality model of the
ISO/IEC 25010 standard, with the possibility of tailoring to specific project needs.
Further work, outside the frames of this PhD thesis, would also be useful as presented in the study: (1) to define
application-project profiles to assist tailoring the quality model for execution tracing to specific application and
project domains, and (2) to approximate the present quality model for execution tracing, within defined bounds,
by simpler mathematical approaches.
In conclusion, the research contributes to (1) supporting the daily work of software professionals, who need
to analyse execution traces; (2) raising awareness that execution tracing quality has a huge impact on software
development, software maintenance and on the professionals involved in the different stages of the software
development life-cycle; (3) providing a framework in which the present endeavours for log improvements
can be placed, and (4) suggesting an extension of the ISO/IEC 25010 standard by linking the constructed
quality model to that. In addition, in the scope of the qualitative research methodology, the current PhD thesis
contributes to the knowledge of research methods with determining a saturation point in the course of the data
collection process