324 research outputs found

    Profiling Developers Through the Lens of Technical Debt

    Context: Technical Debt needs to be managed to avoid disastrous consequences, and knowing how developers handle technical debt management is invaluable information in software development. Objective: This study aims to characterize how developers manage technical debt based on the code smells they induce and the refactorings they apply. Method: We mined a publicly available Technical Debt dataset for Git commit information, code smells, coding violations, and refactoring activities for each developer of a selected project. Results: By combining this information, we profile developers to recognize prolific coders, highlight activities that discriminate among developer roles (reviewer, lead, architect), and estimate coding maturity and technical debt tolerance.
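
    The profiling idea can be illustrated with a small per-developer aggregation. This is a hypothetical sketch, not the study's pipeline: the `CommitRecord` fields and the `profile_developers` helper are assumptions, and "net smells" (smells introduced minus smells removed) is only a crude stand-in for technical debt tolerance.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class CommitRecord:
    # Hypothetical per-commit fields; a real dataset would have its own schema.
    author: str
    smells_introduced: int
    smells_removed: int
    refactorings: int


def profile_developers(records: list[CommitRecord]) -> dict[str, dict]:
    """Aggregate hypothetical commit records into one profile per developer."""
    totals = defaultdict(Counter)
    for r in records:
        t = totals[r.author]
        t["commits"] += 1
        t["smells_introduced"] += r.smells_introduced
        t["smells_removed"] += r.smells_removed
        t["refactorings"] += r.refactorings

    profiles = {}
    for author, t in totals.items():
        profiles[author] = {
            "commits": t["commits"],
            # Positive values mean the developer added more smells than they removed.
            "net_smells": t["smells_introduced"] - t["smells_removed"],
            "refactorings_per_commit": t["refactorings"] / t["commits"],
        }
    return profiles
```

    Ranking the resulting profiles by `commits` gives a naive notion of a prolific coder; the study combines more signals than this.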

    Improving Software Project Health Using Machine Learning

    In recent years, systems that previously lived on different platforms have been integrated under a single umbrella. The increased use of GitHub, which offers pull requests, issue tracking and version history, its integration with other solutions such as Gerrit or Travis, and the response from competitors have created development environments that favour agile methodologies by increasingly automating non-coding tasks: automated build systems, automated issue triaging, etc. In essence, source-code hosting platforms shifted to continuous integration/continuous delivery (CI/CD) as a service. This facilitated a shift in development paradigms: adherents of agile methodology can now adopt a CI/CD infrastructure more easily. It has also created large, publicly accessible sources of source code together with related project artefacts: GHTorrent and similar datasets now offer programmatic access to the whole of GitHub. Project health encompasses traceability, documentation, and adherence to coding conventions: tasks that reduce maintenance costs and increase accountability but may not directly impact features. Overfocus on health can slow velocity (new feature delivery), so the Agile Manifesto suggests developers should travel light, forgoing tasks focused on project health in favour of higher feature velocity. Obviously, injudiciously following this suggestion can undermine a project's chances of success. Simultaneously, this shift to CI/CD has allowed the proliferation of programmatically accessible textual artefacts written in natural language, or in a mix of natural and formal language: GitHub and its competitors allow API access to their infrastructure to enable the creation of CI/CD bots. This suggests that approaches from Natural Language Processing (NLP) and Machine Learning (ML) are now feasible and indeed desirable. This thesis aims to (semi-)automate tasks for this new paradigm and its attendant infrastructure by bringing the relevant NLP and ML techniques to the foreground.
Under this umbrella, I focus on three synergistic tasks from this domain: (1) improving issue-pull-request traceability, which can help existing systems automatically curate the issue backlog as pull requests are merged; (2) untangling commits in a version history, which can aid the aforementioned traceability task as well as improve the usability of tools such as git bisect for locating a fault-introducing commit, or of cherry-picking; (3) mixed-text parsing, to allow better API mining and open new avenues for project-specific code-recommendation tools.
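
    Issue-pull-request links are often first recovered with simple keyword rules. The sketch below is such a rule-based baseline, not the thesis's NLP/ML technique: it assumes links are declared with GitHub-style closing keywords (close, fix, resolve and their inflections) followed by `#<number>`, and `linked_issues` is a hypothetical helper. Links that are never written out in the text are missed, which is one reason to consider learned approaches.

```python
import re

# Closing keywords and their inflections: close/closes/closed,
# fix/fixes/fixed, resolve/resolves/resolved, followed by "#<number>".
CLOSING_PATTERN = re.compile(
    r"\b(?:close[sd]?|fix(?:e[sd])?|resolve[sd]?)\s+#(\d+)\b",
    re.IGNORECASE,
)


def linked_issues(pr_text: str) -> list[int]:
    """Return issue numbers referenced with a closing keyword, first occurrence order, no duplicates."""
    found: list[int] = []
    for match in CLOSING_PATTERN.finditer(pr_text):
        number = int(match.group(1))
        if number not in found:
            found.append(number)
    return found
```

    Plain references such as "See #99" are deliberately ignored, since they mention an issue without claiming to resolve it.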

    Nineteenth-Century Scientific American Illustrations and the Development of American Mechanical Drawing

    This project, sponsored by the American Antiquarian Society, involves work on the database that allows researchers to search illustrations of inventions in the early volumes of Scientific American (1845-1869). This database has been actively developed in a series of WPI IQPs over the last twelve years. The current project focuses on improving the website user interface. This report also discusses the history of mechanical drawing pedagogy and the adoption of professional methods for producing mechanical drawings in America during the 1800s.

    Studying the Characteristics of AIOps Projects on GitHub

    Artificial Intelligence for IT Operations (AIOps) leverages AI approaches to handle the massive data generated during the operation of software systems. Prior works have proposed various AIOps solutions to support different tasks in system operations and maintenance (e.g., anomaly detection). In this work, we investigate open-source AIOps projects in depth to understand the characteristics of AIOps in practice. We first carefully identify a set of AIOps projects on GitHub and analyze their repository metrics (e.g., the programming languages used). Then, we qualitatively study the projects to understand their input data, analysis techniques, and goals. Finally, we analyze the quality of these projects using different quality metrics, such as the number of bugs. We also sample two sets of baseline projects from GitHub: a random sample of machine learning projects, and a random sample of general-purpose projects. We compare different metrics of our identified AIOps projects with these baselines. Our results show a recent and growing interest in AIOps solutions. However, the quality metrics indicate that AIOps projects suffer from more issues than our baseline projects. We also pinpoint the most common issues in AIOps approaches and discuss possible solutions to overcome them. Our findings help practitioners and researchers understand the current state of AIOps practices and shed light on ways to improve the weak aspects of AIOps. To the best of our knowledge, this work is the first to characterize open-source AIOps projects. Comment: 31 pages, 6 pages of references, 8 figures, 12 tables

    On the Way to SBOMs: Investigating Design Issues and Solutions in Practice

    A Software Bill of Materials (SBOM) offers improved transparency and supply chain security by providing a machine-readable inventory of the software components used. With the rise in software supply chain attacks, the SBOM has attracted attention from both academia and industry. This paper presents a study on the practice of SBOM, based on the analysis of 4,786 GitHub discussions from 510 SBOM-related projects. Our study identifies key topics, challenges, and solutions associated with effective SBOM usage. We also highlight commonly used tools and frameworks for generating SBOMs, along with their respective strengths and limitations. Our research underscores the importance of SBOMs in software development and the need for their widespread adoption to enhance supply chain security. Additionally, the insights gained from our study can inform future research and development in this field.
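
    To make "machine-readable inventory" concrete, here is a minimal sketch that builds a CycloneDX-style JSON document for one dependency. The `make_sbom` helper and the example component are illustrative assumptions; the output has not been validated against the CycloneDX schema, and real SBOM generators record many more fields (hashes, licenses, dependency relationships).

```python
import json


def make_sbom(components: list[tuple[str, str, str]]) -> dict:
    """Build a minimal CycloneDX-style SBOM from (name, version, purl) tuples."""
    return {
        "bomFormat": "CycloneDX",
        "specVersion": "1.5",
        "version": 1,  # revision of this BOM document, not of any component
        "components": [
            {"type": "library", "name": name, "version": version, "purl": purl}
            for name, version, purl in components
        ],
    }


if __name__ == "__main__":
    # Illustrative component; the package URL (purl) identifies it unambiguously.
    sbom = make_sbom([("requests", "2.31.0", "pkg:pypi/requests@2.31.0")])
    print(json.dumps(sbom, indent=2))
```

    Because every component carries a package URL, downstream tools can match the inventory against vulnerability databases without parsing project-specific build files.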

    A large-scale comparative analysis of Coding Standard conformance in Open-Source Data Science projects

    Background: Meeting the growing industry demand for Data Science requires cross-disciplinary teams that can translate machine learning research into production-ready code. Software engineering teams value adherence to coding standards as an indication of code readability, maintainability, and developer expertise. However, there are no large-scale empirical studies of coding standards focused specifically on Data Science projects. Aims: This study investigates the extent to which Data Science projects follow coding standards. In particular, which standards are followed, which are ignored, and how does this differ from traditional software projects? Method: We compare a corpus of 1048 Open-Source Data Science projects to a reference group of 1099 non-Data Science projects with a similar level of quality and maturity. Results: Data Science projects suffer from a significantly higher rate of functions that use an excessive number of parameters and local variables. Data Science projects also follow different variable naming conventions from non-Data Science projects. Conclusions: The differences indicate that Data Science codebases are distinct from traditional software codebases and do not follow traditional software engineering conventions. Our conjecture is that this may be because traditional software engineering conventions are inappropriate in the context of Data Science projects. Comment: 11 pages, 7 figures. To appear in ESEM 2020. Updated based on peer review.
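
    The parameter, local-variable, and naming findings can be approximated with Python's standard `ast` module. This is a simplified sketch, not the study's tooling: the thresholds (5 parameters, 15 locals) are assumed to echo pylint's default `max-args` and `max-locals`, the counting rules are cruder than pylint's, and `check_functions` is a hypothetical helper.

```python
import ast
import re

# Assumed thresholds, chosen to echo pylint's default max-args / max-locals.
MAX_ARGS = 5
MAX_LOCALS = 15
SNAKE_CASE = re.compile(r"^[a-z_][a-z0-9_]*$")


def check_functions(source: str) -> list[str]:
    """Flag functions with too many parameters or locals, and non-snake_case names."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        a = node.args
        params = [p.arg for p in a.posonlyargs + a.args + a.kwonlyargs]
        params += [p.arg for p in (a.vararg, a.kwarg) if p is not None]
        if len(params) > MAX_ARGS:
            findings.append(f"{node.name}: {len(params)} parameters")
        # Distinct names bound by assignment inside the function body.
        # Names in nested functions are included, so this can over-count.
        stored = {
            n.id
            for n in ast.walk(node)
            if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Store)
        }
        if len(stored) > MAX_LOCALS:
            findings.append(f"{node.name}: {len(stored)} local variables")
        for name in [node.name, *params, *sorted(stored)]:
            if not SNAKE_CASE.match(name):
                findings.append(f"{node.name}: non-snake_case name {name!r}")
    return findings
```

    On a training helper such as `def fit(X, y, lr, epochs, batch_size, seed)` that assigns `X_train`, the sketch flags six parameters plus the capitalized `X` and `X_train`, a common naming style in Data Science code.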