11 research outputs found

    Social aspects of collaboration in online software communities

    Get PDF

    Toxic Code Snippets on Stack Overflow

    Get PDF
    Online code clones are code fragments that are copied from software projects or online sources to Stack Overflow as examples. Due to an absence of a checking mechanism after the code has been copied to Stack Overflow, they can become toxic code snippets, e.g., they suffer from being outdated or violating the original software license. We present a study of online code clones on Stack Overflow and their toxicity by incorporating two developer surveys and a large-scale code clone detection. A survey of 201 high-reputation Stack Overflow answerers (33% response rate) showed that 131 participants (65%) have ever been notified of outdated code and 26 of them (20%) rarely or never fix the code. 138 answerers (69%) never check for licensing conflicts between their copied code snippets and Stack Overflow?s CC BY-SA 3.0. A survey of 87 Stack Overflow visitors shows that they experienced several issues from Stack Overflow answers: mismatched solutions, outdated solutions, incorrect solutions, and buggy code. 85% of them are not aware of CC BY-SA 3.0 license enforced by Stack Overflow, and 66% never check for license conflicts when reusing code snippets. Our clone detection found online clone pairs between 72,365 Java code snippets on Stack Overflow and 111 open source projects in the curated Qualitas corpus. We analysed 2,289 non-trivial online clone candidates. Our investigation revealed strong evidence that 153 clones have been copied from a Qualitas project to Stack Overflow. We found 100 of them (66%) to be outdated, of which 10 were buggy and harmful for reuse. Furthermore, we found 214 code snippets that could potentially violate the license of their original software and appear 7,112 times in 2,427 GitHub projects

    Understanding and Mitigating Flaky Software Test Cases

    Get PDF
    A flaky test is a test case that can pass or fail without changes to the test case code or the code under test. They are a wide-spread problem with serious consequences for developers and researchers alike. For developers, flaky tests lead to time wasted debugging spurious failures, tempting them to ignore future failures. While unreliable, flaky tests can still indicate genuine issues in the code under test, so ignoring them can lead to bugs being missed. The non-deterministic behaviour of flaky tests is also a major snag to continuous integration, where a single flaky test can fail an entire build. For researchers, flaky tests challenge the assumption that a test failure implies a bug, an assumption that many fundamental techniques in software engineering research rely upon, including test acceleration, mutation testing, and fault localisation. Despite increasing research interest in the topic, open problems remain. In particular, there has been relatively little attention paid to the views and experiences of developers, despite a considerable body of empirical work. This is essential to guide the focus of research into areas that are most likely to be beneficial to the software engineering industry. Furthermore, previous automated techniques for detecting flaky tests are typically either based on exhaustively rerunning test cases or machine learning classifiers. The prohibitive runtime of the rerunning approach and the demonstrably poor inter-project generalisability of classifiers leaves practitioners with a stark choice when it comes to automatically detecting flaky tests. In response to these challenges, I set two high-level goals for this thesis: (1) to enhance the understanding of the manifestation, causes, and impacts of flaky tests; and (2) to develop and empirically evaluate efficient automated techniques for mitigating flaky tests. In pursuit of these goals, this thesis makes five contributions: (1) a comprehensive systematic literature review of 76 published papers; (2) a literature-guided survey of 170 professional software developers; (3) a new feature set for encoding test cases in machine learning-based flaky test detection; (4) a novel approach for reducing the time cost of rerunning-based techniques for detecting flaky tests by combining them with machine learning classifiers; and (5) an automated technique that detects and classifies existing flaky tests in a project and produces reusable project-specific machine learning classifiers able to provide fast and accurate predictions for future test cases in that project

    Towards an Improved Understanding of Software Vulnerability Assessment Using Data-Driven Approaches

    Get PDF
    Software Vulnerabilities (SVs) can expose software systems to cyber-attacks, potentially causing enormous financial and reputational damage for organizations. There have been significant research efforts to detect these SVs so that developers can promptly fix them. However, fixing SVs is complex and time-consuming in practice, and thus developers usually do not have sufficient time and resources to fix all SVs at once. As a result, developers often need SV information, such as exploitability, impact, and overall severity, to prioritize fixing more critical SVs. Such information required for fixing planning and prioritization is typically provided in the SV assessment step of the SV lifecycle. Recently, data-driven methods have been increasingly proposed to automate SV assessment tasks. However, there are still numerous shortcomings with the existing studies on data-driven SV assessment that would hinder their application in practice. This PhD thesis aims to contribute to the growing literature in data-driven SV assessment by investigating and addressing the constant changes in SV data as well as the lacking considerations of source code and developers’ needs for SV assessment that impede the practical applicability of the field. Particularly, we have made the following five contributions in this thesis. (1) We systematize the knowledge of data-driven SV assessment to reveal the best practices of the field and the main challenges affecting its application in practice. Subsequently, we propose various solutions to tackle these challenges to better support the real-world applications of data-driven SV assessment. (2) We first demonstrate the existence of the concept drift (changing data) issue in descriptions of SV reports that current studies have mostly used for predicting the Common Vulnerability Scoring System (CVSS) metrics. We augment report-level SV assessment models with subwords of terms extracted from SV descriptions to help the models more effectively capture the semantics of ever-increasing SVs. (3) We also identify that SV reports are usually released after SV fixing. Thus, we propose using vulnerable code to enable earlier SV assessment without waiting for SV reports. We are the first to use Machine Learning techniques to predict CVSS metrics on the function level leveraging vulnerable statements directly causing SVs and their context in code functions. The performance of our function-level SV assessment models is promising, opening up research opportunities in this new direction. (4) To facilitate continuous integration of software code nowadays, we present a novel deep multi-task learning model, DeepCVA, to simultaneously and efficiently predict multiple CVSS assessment metrics on the commit level, specifically using vulnerability-contributing commits. DeepCVA is the first work that enables practitioners to perform SV assessment as soon as vulnerable changes are added to a codebase, supporting just-in-time prioritization of SV fixing. (5) Besides code artifacts produced from a software project of interest, SV assessment tasks can also benefit from SV crowdsourcing information on developer Question and Answer (Q&A) websites. We automatically retrieve large-scale security/SVrelated posts from these Q&A websites. We then apply a topic modeling technique on these posts to distill developers’ real-world SV concerns that can be used for data-driven SV assessment. Overall, we believe that this thesis has provided evidence-based knowledge and useful guidelines for researchers and practitioners to automate SV assessment using data-driven approaches.Thesis (Ph.D.) -- University of Adelaide, School of Computer Science, 202

    Socio-technical systems for loosely bound cooperation

    Get PDF
    Thesis (S.M.)--Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2009.Cataloged from PDF version of thesis.Includes bibliographical references (p. 118-123).This thesis introduces a programming environment entitled Share that is designed to support and encourage loosely bound cooperation between individuals within communities of practice through the sharing of code. Loosely bound cooperation refers to the opportunity members of communities have to assist and share resources with one another while maintaining their autonomy and independent practice. We contrast this model with forms of collaboration that enable large numbers of distributed individuals to collaborate on large scale works where they are guided by a shared vision of what they are collectively trying to achieve. Our hypothesis is that providing fine-grained, publicly visible attribution of code sharing activity within a community can provide socially motivated encouragement for participation as well as pragmatic value of being able to better track downstream use and changes to contributions that an individual makes. We shall present a discussion of loosely bound collaborative practice in various creative domains and the technological and social factors that contribute to the salience of these forms of cooperation today as well as discussing the motivational factors associated with open source development and how they differ in the case of cooperating individuals who do not share a project. We will also present an overview of the design of our tool and the objectives that guided its design and a discussion of a small-scale deployment of our prototype among members of a particular community of practice.by Yannick Mahugnon Assogba.S.M

    Big Code Applications and Approaches

    Get PDF
    The availability of a huge amount of source code from code archives and open-source projects opens up the possibility to merge machine learning, programming languages, and software engineering research fields. This area is often referred to as Big Code where programming languages are treated instead of natural languages while different features and patterns of code can be exploited to perform many useful tasks and build supportive tools. Among all the possible applications which can be developed within the area of Big Code, the work presented in this research thesis mainly focuses on two particular tasks: the Programming Language Identification (PLI) and the Software Defect Prediction (SDP) for source codes. Programming language identification is commonly needed in program comprehension and it is usually performed directly by developers. However, when it comes at big scales, such as in widely used archives (GitHub, Software Heritage), automation of this task is desirable. To accomplish this aim, the problem is analyzed from different points of view (text and image-based learning approaches) and different models are created paying particular attention to their scalability. Software defect prediction is a fundamental step in software development for improving quality and assuring the reliability of software products. In the past, defects were searched by manual inspection or using automatic static and dynamic analyzers. Now, the automation of this task can be tackled using learning approaches that can speed up and improve related procedures. Here, two models have been built and analyzed to detect some of the commonest bugs and errors at different code granularity levels (file and method levels). Exploited data and models’ architectures are analyzed and described in detail. Quantitative and qualitative results are reported for both PLI and SDP tasks while differences and similarities concerning other related works are discussed

    Security considerations in the open source software ecosystem

    Get PDF
    Open source software plays an important role in the software supply chain, allowing stakeholders to utilize open source components as building blocks in their software, tooling, and infrastructure. But relying on the open source ecosystem introduces unique challenges, both in terms of security and trust, as well as in terms of supply chain reliability. In this dissertation, I investigate approaches, considerations, and encountered challenges of stakeholders in the context of security, privacy, and trustworthiness of the open source software supply chain. Overall, my research aims to empower and support software experts with the knowledge and resources necessary to achieve a more secure and trustworthy open source software ecosystem. In the first part of this dissertation, I describe a research study investigating the security and trust practices in open source projects by interviewing 27 owners, maintainers, and contributors from a diverse set of projects to explore their behind-the-scenes processes, guidance and policies, incident handling, and encountered challenges, finding that participants’ projects are highly diverse in terms of their deployed security measures and trust processes, as well as their underlying motivations. More on the consumer side of the open source software supply chain, I investigated the use of open source components in industry projects by interviewing 25 software developers, architects, and engineers to understand their projects’ processes, decisions, and considerations in the context of external open source code, finding that open source components play an important role in many of the industry projects, and that most projects have some form of company policy or best practice for including external code. On the side of end-user focused software, I present a study investigating the use of software obfuscation in Android applications, which is a recommended practice to protect against plagiarism and repackaging. The study leveraged a multi-pronged approach including a large-scale measurement, a developer survey, and a programming experiment, finding that only 24.92% of apps are obfuscated by their developer, that developers do not fear theft of their own apps, and have difficulties obfuscating their own apps. Lastly, to involve end users themselves, I describe a survey with 200 users of cloud office suites to investigate their security and privacy perceptions and expectations, with findings suggesting that users are generally aware of basic security implications, but lack technical knowledge for envisioning some threat models. The key findings of this dissertation include that open source projects have highly diverse security measures, trust processes, and underlying motivations. That the projects’ security and trust needs are likely best met in ways that consider their individual strengths, limitations, and project stage, especially for smaller projects with limited access to resources. That open source components play an important role in industry projects, and that those projects often have some form of company policy or best practice for including external code, but developers wish for more resources to better audit included components. This dissertation emphasizes the importance of collaboration and shared responsibility in building and maintaining the open source software ecosystem, with developers, maintainers, end users, researchers, and other stakeholders alike ensuring that the ecosystem remains a secure, trustworthy, and healthy resource for everyone to rely on

    Computational Intelligence and Human- Computer Interaction: Modern Methods and Applications

    Get PDF
    The present book contains all of the articles that were accepted and published in the Special Issue of MDPI’s journal Mathematics titled "Computational Intelligence and Human–Computer Interaction: Modern Methods and Applications". This Special Issue covered a wide range of topics connected to the theory and application of different computational intelligence techniques to the domain of human–computer interaction, such as automatic speech recognition, speech processing and analysis, virtual reality, emotion-aware applications, digital storytelling, natural language processing, smart cars and devices, and online learning. We hope that this book will be interesting and useful for those working in various areas of artificial intelligence, human–computer interaction, and software engineering as well as for those who are interested in how these domains are connected in real-life situations

    Chatbots for Modelling, Modelling of Chatbots

    Full text link
    Tesis Doctoral inédita leída en la Universidad Autónoma de Madrid, Escuela Politécnica Superior, Departamento de Ingeniería Informática. Fecha de Lectura: 28-03-202

    Transforming Conservation

    Get PDF
    There are severe problems with the decision-making processes currently widely used, leading to ineffective use of evidence, faulty decisions, wasting of resources and the erosion of public and political support. In this book an international team of experts provide solutions. The transformation suggested includes rethinking how evidence is assessed, combined, communicated and used in decision-making; using effective methods when asking experts to make judgements (i.e. avoiding just asking an expert or a group of experts!); using a structured process for making decisions that incorporate the evidence and having effective processes for learning from actions. In each case, the specific problem with decision making is described with a range of practical solutions. Adopting this approach to decision-making requires societal change so detailed suggestions are made for transforming organisations, governments, businesses, funders and philanthropists. The practical suggestions include twelve downloadable checklists. The vision of the authors is to transform conservation so it is more effective, more cost-efficient, learns from practice and is more attractive to funders. However, the lessons of this important book go well beyond conservation to decision-makers in any field
    corecore