    Aspect of Code Cloning Towards Software Bug and Imminent Maintenance: A Perspective on Open-source and Industrial Mobile Applications

    As a part of the digital era of microtechnology, mobile application (app) development is evolving with lightning speed to enrich our lives and bring new challenges and risks. In particular, software bugs and failures cost trillions of dollars every year, including fatalities such as a software bug in a self-driving car that resulted in a pedestrian fatality in March 2018 and the recent Boeing-737 Max tragedies that resulted in hundreds of deaths. Software clones (duplicated fragments of code) are also found to be one of the crucial factors for having bugs or failures in software systems. There have been many significant studies on software clones and their relationships to software bugs for desktop-based applications. Unfortunately, while mobile apps have become an integral part of today’s era, there is a marked lack of such studies for mobile apps. In order to explore this important aspect, in this thesis, first, we studied the characteristics of software bugs in the context of mobile apps, which might not be prevalent for desktop-based apps such as energy-related (battery drain while using apps) and compatibility-related (different behaviors of same app in different devices) bugs/issues. Using Support Vector Machine (SVM), we classified about 3K mobile app bug reports of different open-source development sites into four categories: crash, energy, functionality and security bug. We then manually examined a subset of those bugs and found that over 50% of the bug-fixing code-changes occurred in clone code. There have been a number of studies with desktop-based software systems that clearly show the harmful impacts of code clones and their relationships to software bugs. 
Given that there is a marked lack of such studies for mobile apps, in our second study, we examined 11 open-source and industrial mobile apps written in two different languages (Java and Swift) and found that clone code is more bug-prone than non-clone code and that industrial mobile apps have a higher code clone ratio than open-source mobile apps. Furthermore, we compared our study outcomes with those of existing desktop-based studies and surveyed 23 mobile app developers to validate our findings. Besides validating our findings, the survey showed that around 95% of the developers usually copy and paste code fragments (code cloning) from the popular crowdsourcing platform Stack Overflow (SO) into their projects and that over 75% of those developers experience bugs after doing so. Existing studies of desktop-based systems also showed that while SO is one of the most popular online platforms for code reuse (and code cloning), SO code fragments are usually toxic from a software maintenance perspective. Thus, in the third study of this thesis, we studied the consequences of code cloning from SO in different open-source and industrial mobile apps. We observed that closed-source industrial apps reused even more SO code fragments than open-source mobile apps and that SO code fragments were more change-prone (for example, due to bugs) than non-SO code fragments. We also found that SO code fragments were related to more bugs in industrial projects than in open-source ones. Our studies show how clone-related software bugs in mobile apps could be managed efficiently and effectively by exploiting the positive sides of code cloning while overcoming (or at least minimizing) the negative consequences of clone fragments.

    Characterizing and Detecting Duplicate Logging Code Smells

    Developers rely on software logs for a wide variety of tasks, such as debugging, testing, program comprehension, verification, and performance analysis. Despite the importance of logs, prior studies show that there is no industrial standard on how to write logging statements. Recent research on logs often considers the appropriateness of a log only as an individual item (e.g., one single logging statement), whereas logs are typically analyzed in tandem. In this thesis, we focus on studying duplicate logging statements, which are logging statements that have the same static text message. Such duplications in the text message are potential indications of logging code smells, which may affect developers’ understanding of the dynamic view of the system. We manually studied over 3K duplicate logging statements and their surrounding code in four large-scale open source systems: Hadoop, CloudStack, ElasticSearch, and Cassandra. We uncovered five patterns of duplicate logging code smells. For each instance of a code smell, we further manually identified the problematic (i.e., requiring fixes) and justifiable (i.e., not requiring fixes) cases. We then contacted developers to verify the results of our manual study. We integrated our manual study results and the developers’ feedback into our automated static analysis tool, DLFinder, which automatically detects problematic duplicate logging code smells. We evaluated DLFinder on the four manually studied systems and four additional systems: Kafka, Flink, Camel and Wicket. In total, combining the results of DLFinder and our manual analysis, we reported 91 problematic code smell instances to developers, and all of them have been fixed. This thesis provides an initial step towards creating a logging guideline for developers to improve the quality of logging code. DLFinder is also able to detect duplicate logging code smells with high precision and recall.
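The definition used in the abstract above (logging statements that share the same static text message) can be illustrated with a minimal sketch. This is not DLFinder and does not reproduce its analysis or its five smell patterns; the regular expression, the assumed logger names (`log`, `logger`), and the function name `find_duplicate_log_messages` are all hypothetical choices made for illustration only.

```python
import re
from collections import defaultdict

# Hypothetical pattern (an assumption, not DLFinder's): a call on an object
# named "log" or "logger" whose first argument begins with a string literal.
# The captured group is the static text of the message.
LOG_CALL = re.compile(
    r'\b(?:log|logger)\.(?:trace|debug|info|warn|error)\(\s*"((?:[^"\\]|\\.)*)"',
    re.IGNORECASE,
)


def find_duplicate_log_messages(sources):
    """Return {static message: [(file name, line number), ...]} for every
    static message that is used by more than one logging statement.

    `sources` maps a file name to the file's source text.
    """
    locations = defaultdict(list)
    for filename, text in sources.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            for match in LOG_CALL.finditer(line):
                locations[match.group(1)].append((filename, lineno))
    # Keep only messages that occur at least twice.
    return {msg: locs for msg, locs in locations.items() if len(locs) > 1}


if __name__ == "__main__":
    demo = {
        "A.java": 'LOG.info("Starting service");\nLOG.warn("Disk full: " + path);',
        "B.java": 'logger.info("Starting service");',
    }
    for message, places in find_duplicate_log_messages(demo).items():
        print(message, places)
```

A text-level match like this only finds candidates. Deciding whether a duplicate is problematic or justifiable, as the abstract describes, requires looking at the surrounding code.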

    Analysis of Human Affect and Bug Patterns to Improve Software Quality and Security

    The impact of software is ever increasing as more and more systems are operated by software. Despite the usefulness of software, many software failures have caused tremendous losses in lives and dollars. Software failures take place because of bugs (i.e., faults) in the software systems. These bugs cause the program to malfunction or crash and expose security vulnerabilities exploitable by malicious hackers. Studies confirm that software defects and vulnerabilities appear in source code largely due to the mistakes and errors of the developers. Human performance is affected by the underlying development process and by human affects, such as sentiment and emotion. This thesis examines these human affects of software developers, which have recently drawn interest in the community. For capturing developers’ sentimental and emotional states, we have developed several software tools (i.e., SentiStrength-SE, DEVA, and MarValous). These are novel tools that facilitate automatic detection of sentiments and emotions in software engineering textual artifacts. Using such an automated tool, we study the variations in developers’ sentiments with respect to the underlying development tasks (e.g., bug-fixing, bug-introducing), development periods (i.e., days and times), team sizes and project sizes. We expose opportunities for exploiting developers’ sentiments for higher productivity and improved software quality. While developers’ sentiments and emotions can be leveraged as a proactive and active safeguard in identifying and minimizing software bugs, this dissertation also includes in-depth studies of the relationship among various bug patterns, such as software defects, security vulnerabilities, and code smells, to find actionable insights for minimizing software bugs and improving software quality and security. Bug patterns are exposed through mining software repositories and bug databases. 
These bug patterns are crucial in localizing bugs and security vulnerabilities in software codebase for fixing them, predicting portions of software susceptible to failure or exploitation by hackers, devising techniques for automated program repair, and avoiding code constructs and coding idioms that are bug-prone. The software tools produced from this thesis are empirically evaluated using standard measurement metrics (e.g., precision, recall). The findings of all the studies are validated with appropriate tests for statistical significance. Finally, based on our experience and in-depth analysis of the present state of the art, we expose avenues for further research and development towards a holistic approach for developing improved and secure software systems

    Deep Learning Software Repositories

    Bridging the abstraction gap between artifacts and concepts is the essence of software engineering (SE) research problems. SE researchers regularly use machine learning to bridge this gap, but there are three fundamental issues with traditional applications of machine learning in SE research. Traditional applications are too reliant on labeled data. They are too reliant on human intuition, and they are not capable of learning expressive yet efficient internal representations. Ultimately, SE research needs approaches that can automatically learn representations of massive, heterogeneous, datasets in situ, apply the learned features to a particular task and possibly transfer knowledge from task to task. Improvements in both computational power and the amount of memory in modern computer architectures have enabled new approaches to canonical machine learning tasks. Specifically, these architectural advances have enabled machines that are capable of learning deep, compositional representations of massive data depots. The rise of deep learning has ushered in tremendous advances in several fields. Given the complexity of software repositories, we presume deep learning has the potential to usher in new analytical frameworks and methodologies for SE research and the practical applications it reaches. This dissertation examines and enables deep learning algorithms in different SE contexts. We demonstrate that deep learners significantly outperform state-of-the-practice software language models at code suggestion on a Java corpus. Further, these deep learners for code suggestion automatically learn how to represent lexical elements. We use these representations to transmute source code into structures for detecting similar code fragments at different levels of granularity—without declaring features for how the source code is to be represented. 
Then we use our learning-based framework for encoding fragments to intelligently select and adapt statements in a codebase for automated program repair. In our work on code suggestion, code clone detection, and automated program repair, everything for representing lexical elements and code fragments is mined from the source code repository. Indeed, our work aims to move SE research from the art of feature engineering to the science of automated discovery

    Identifying developers’ habits and expectations in copy and paste programming practice

    Máster Universitario en Investigación e Innovación en Inteligencia Computacional y Sistemas Interactivos. Both novice and experienced developers rely more and more on external sources of code, which they include in their programs by copying and pasting code snippets. This behavior differs from the traditional software design approach, where cohesion was achieved via a conscious design effort. For this reason, it is essential to know how copy and paste programming practices are actually carried out, so that IDEs (Integrated Development Environments) and code recommenders can be designed to fit developers' expectations and habits.

    Towards Collaborative Scientific Workflow Management System

    The big data explosion has affected several domains in recent years, from research areas to a diverse range of business models. As this vast amount of data opens up possibilities for interesting knowledge discoveries, a diverse range of research domains have shifted over the past few years towards analyzing such massive amounts of data. Scientific Workflow Management Systems (SWfMSs) have gained much popularity in recent years for accelerating data-intensive analysis, visualization, and discovery of important information. Data-intensive tasks are often time-consuming and complex, and hence SWfMSs are designed to efficiently support the specification, modification, execution, failure handling, and monitoring of the tasks in a scientific workflow. Given the complexity, dimension, and volume of the data, effective analysis or management often becomes challenging for an individual and instead requires the collaboration of multiple scientists. Hence, the notion of a 'Collaborative SWfMS' was coined, which has gained significant interest among researchers in recent years, as none of the existing SWfMSs directly supports real-time collaboration among scientists. For collaborative SWfMSs, consistency management in the face of conflicting concurrent operations by the collaborators is a major challenge because of the highly interconnected document structure among the computational modules, where any minor change in one part of the workflow can strongly affect other parts of the collaborative workflow through the datalink relations among them. 
In addition to consistency management, studies show several other challenges that need to be addressed for a successful design of collaborative SWfMSs, such as sub-workflow composition and execution by different sub-groups, the relationship between scientific workflows and collaboration models, sub-workflow monitoring, and seamless integration and access control of the workflow components among collaborators. In this thesis, we propose a locking scheme to facilitate consistency management in collaborative SWfMSs. The proposed method works by locking workflow components at a granular attribute level, in addition to supporting locks on a targeted part of the collaborative workflow. We conducted several experiments to analyze the performance of the proposed method in comparison to related existing methods. Our studies show that the proposed method can reduce the average waiting time of a collaborator by up to 36% while increasing the average workflow update rate by up to 15% in comparison to existing descendant modular-level locking techniques for collaborative SWfMSs. We also propose a role-based access control technique for the management of collaborative SWfMSs. We leverage the Collaborative Interactive Application Methodology (CIAM) for the investigation of role-based access control in the context of collaborative SWfMSs. We present our proposed method with a use case from the Plant Phenotyping and Genotyping research domain. A recent study shows that collaborative SWfMSs often present different sets of opportunities and challenges. From our investigations of existing research on collaborative SWfMSs and the findings of our prior two studies, we propose an architecture for collaborative SWfMSs. We propose SciWorCS, a Collaborative Scientific Workflow Management System, as a proof of concept of the proposed architecture; to the best of our knowledge, it is the first of its kind. We present several real-world use cases of scientific workflows using SciWorCS. 
Finally, we conduct several user studies using SciWorCS with different real-world scientific workflows (i.e., from myExperiment) to understand user behavior and styles of work in the context of collaborative SWfMSs. In addition to evaluating SciWorCS, the user studies reveal several interesting facts that can significantly contribute to the research domain, as none of the existing methods considered such empirical studies; they relied only on computer-generated simulation studies for evaluation.
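The idea of locking workflow components at the attribute level, mentioned in the abstract above, can be illustrated with a small sketch. This is not the SciWorCS implementation and does not cover the thesis's locks on a targeted part of the workflow; the class `AttributeLockTable`, its methods, and the example component and attribute names are hypothetical and assume a single-process, in-memory lock table.

```python
class AttributeLockTable:
    """Hypothetical in-memory lock table keyed by (component, attribute).

    Two collaborators can edit different attributes of the same workflow
    component at the same time; they conflict only on the same attribute.
    """

    def __init__(self):
        self._owners = {}  # (component, attribute) -> user holding the lock

    def acquire(self, user, component, attribute):
        """Grant the lock if it is free or already held by `user`."""
        key = (component, attribute)
        owner = self._owners.get(key)
        if owner is None or owner == user:
            self._owners[key] = user
            return True
        return False  # held by someone else: the caller has to wait or retry

    def release(self, user, component, attribute):
        """Release the lock only if `user` is the current holder."""
        key = (component, attribute)
        if self._owners.get(key) == user:
            del self._owners[key]
            return True
        return False


if __name__ == "__main__":
    locks = AttributeLockTable()
    print(locks.acquire("alice", "filter", "threshold"))  # lock is free
    print(locks.acquire("bob", "filter", "input"))        # other attribute
    print(locks.acquire("bob", "filter", "threshold"))    # held by alice
```

With a coarser, module-level lock, the second request in the usage example would have to wait even though it touches a different attribute. Avoiding that kind of wait is the intuition behind finer lock granularity.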

    Software Maintenance At Commit-Time

    Software maintenance activities such as debugging and feature enhancement are known to be challenging and costly, which explains an ever-growing line of research in software maintenance areas including mining software repositories, defect prevention, clone detection, and bug reproduction. The main goal is to improve the productivity of software developers as they undertake maintenance tasks. Existing tools, however, operate in an offline fashion, i.e., after the changes to the systems have been made. Studies have shown that software developers tend to be reluctant to use these tools as part of a continuous development process. This is because the tools require installation and training, which hinders their integration with developers’ workflow and in turn limits their adoption. In this thesis, we propose novel approaches to support software developers at commit-time. As part of the developer’s workflow, a commit marks the end of a given task. We show how commits can be used to catch unwanted modifications to the system, and to prevent the introduction of clones and bugs, before these modifications reach the central code repository. We also propose a bug reproduction technique that is based on model checking and crash traces. Furthermore, we propose a new way of classifying bugs based on the location of fixes, which can serve as the basis for future research in this field of study. The techniques proposed in this thesis have been tested on over 400 open and closed (industrial) systems, resulting in high levels of precision and recall. They are also scalable and non-intrusive.

    Fundamental Approaches to Software Engineering

    This open access book constitutes the proceedings of the 25th International Conference on Fundamental Approaches to Software Engineering, FASE 2022, which was held during April 4-5, 2022, in Munich, Germany, as part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2022. The 17 regular papers presented in this volume were carefully reviewed and selected from 64 submissions. The proceedings also contain 3 contributions from the Test-Comp Competition. The papers deal with the foundations on which software engineering is built, including topics like software engineering as an engineering discipline, requirements engineering, software architectures, software quality, model-driven development, software processes, software evolution, AI-based software engineering, and the specification, design, and implementation of particular classes of systems, such as (self-)adaptive, collaborative, AI, embedded, distributed, mobile, pervasive, cyber-physical, or service-oriented applications

    Beyond Traditional Software Development: Studying and Supporting the Role of Reusing Crowdsourced Knowledge in Software Development

    As software development is becoming increasingly complex, developers often need to reuse others’ code or knowledge made available online to tackle problems encountered during software development and maintenance. This phenomenon of using others' code or knowledge, often found on online forums, is referred to as crowdsourcing. A good example of crowdsourcing is posting a coding question on the Stack Overflow website and having others contribute code that solves that question. Recently, the phenomenon of crowdsourcing has attracted much attention from researchers and practitioners and recent studies show that crowdsourcing improves productivity and reduces time-to-market. However, like any solution, crowdsourcing brings with it challenges such as quality, maintenance, and even legal issues. The research presented in this thesis presents the result of a series of large-scale empirical studies involving some of the most popular crowdsourcing platforms such as Stack Overflow, Node Package Manager (npm), and Python Package Index (PyPI). The focus of these empirical studies is to investigate the role of reusing crowdsourcing knowledge and more particularly crowd code in the software development process. We first present two empirical studies on the reuse of knowledge from crowdsourcing platforms namely Stack Overflow. We found that reusing knowledge from this crowdsourcing platform has the potential to assist software development practices, specifically through source code reuse. However, relying on such crowdsourced knowledge might also negatively affect the quality of the software projects. Second, we empirically examine the type of development knowledge constructed on crowdsourcing platforms. We examine the use of trivial packages on npm and PyPI platforms. We found that trivial packages are common and developers tend to use them because they provide them with well tested and implemented code. 
However, developers are concerned about the maintenance overhead of these trivial packages due to the extra dependencies that trivial packages introduce. Finally, we used the knowledge gained to propose a pragmatic solution to improve the efficiency of relying on the crowd in software development. We proposed a rule-based technique that automatically detects commits that can skip the continuous integration process. We evaluated the performance of the proposed technique on a dataset of open-source Java projects. Our results show that continuous integration can be used to improve the efficiency of the reused code from crowdsourcing platforms. Among the findings of this thesis is that the way software is developed has changed dramatically. Developers rely on crowdsourcing to address problems encountered during software development and maintenance. The results presented in this thesis provide new insights into how knowledge from these crowdsourced platforms is reused in software systems and how some of this knowledge can be better integrated into current software development processes and best practices.