4 research outputs found

    Python Coding Style Compliance on Stack Overflow

    Get PDF
    Software developers all over the world use Stack Overflow (SO) to interact and exchange code snippets. Research also uses SO to harvest code snippets for use with recommendation systems. However, previous work has shown that code on SO may have quality issues, such as security or license problems. We analyse Python code on SO to determine its coding style compliance. From 1,962,535 code snippets tagged with 'python', we extracted 407,097 snippets of at least 6 statements of Python code. Surprisingly, 93.87% of the extracted snippets contain style violations, with an average of 0.7 violations per statement and a huge number of snippets with a considerably higher ratio. Researchers and developers should, therefore, be aware that code snippets on SO may not representative of good coding style. Furthermore, while user reputation seems to be unrelated to coding style compliance, for posts with vote scores in the range between -10 and 20, we found a strong correlation (r = -0.87, p <; 10^-7) between the vote score a post received and the average number of violations per statement for snippets in such posts

    A large-scale comparative analysis of Coding Standard conformance in Open-Source Data Science projects

    Full text link
    Background: Meeting the growing industry demand for Data Science requires cross-disciplinary teams that can translate machine learning research into production-ready code. Software engineering teams value adherence to coding standards as an indication of code readability, maintainability, and developer expertise. However, there are no large-scale empirical studies of coding standards focused specifically on Data Science projects. Aims: This study investigates the extent to which Data Science projects follow code standards. In particular, which standards are followed, which are ignored, and how does this differ to traditional software projects? Method: We compare a corpus of 1048 Open-Source Data Science projects to a reference group of 1099 non-Data Science projects with a similar level of quality and maturity. Results: Data Science projects suffer from a significantly higher rate of functions that use an excessive numbers of parameters and local variables. Data Science projects also follow different variable naming conventions to non-Data Science projects. Conclusions: The differences indicate that Data Science codebases are distinct from traditional software codebases and do not follow traditional software engineering conventions. Our conjecture is that this may be because traditional software engineering conventions are inappropriate in the context of Data Science projects.Comment: 11 pages, 7 figures. To appear in ESEM 2020. Updated based on peer revie

    Factors affecting Technical Debt Raw data from a systematic literature map

    Get PDF
    "This document presents the complete list of references that have been short listed during the systematic review process carried out during the months of April-September 2012. The objective of the systematic review was to identify current research trends in technical debt and to explore the relationship between technical debt measures and agile software development. This documents includes 352 references that are categorized according to their relevance to technical debt research." [Abstract

    Eliminating Code Duplication in Cascading Style Sheets

    Get PDF
    Cascading Style Sheets (i.e., CSS) is the standard styling language, widely used for defining the presentation semantics of user interfaces for web, mobile and desktop applications. Despite its popularity, CSS has not received much attention from academia. Indeed, developing and maintaining CSS code is rather challenging, due to the inherent language design shortcomings, the interplay of CSS with other programming languages (e.g., HTML and JavaScript), the lack of empirically- evaluated coding best-practices, and immature tool support. As a result, the quality of CSS code bases is poor in many cases. In this thesis, we focus on one of the major issues found in CSS code bases, i.e., the duplicated code. In a large, representative dataset of CSS code, we found an average of 68% duplication in style declarations. To alleviate this, we devise techniques for refactoring CSS code (i.e., grouping style declarations into new style rules), or migrating CSS code to take advantage of the code abstraction features provided by CSS preprocessor languages (i.e., superset languages for CSS that augment it by adding extra features that facilitate code maintenance). Specifically for the migration transformations, we attempt to align the resulting code with manually-developed code, by relying on the knowledge gained by conducting an empirical study on the use of CSS preprocessors, which revealed the common coding practices of the developers who use CSS preprocessor languages. To guarantee the behavior preservation of the proposed transformations, we come up with a list of preconditions that should be met, and also describe a lightweight testing technique. By applying a large number of transformations on several web sites and web applications, it is shown that the transformations are indeed presentation-preserving, and can effectively reduce the amount of duplicated code in CSS
    corecore