324 research outputs found
Profiling Developers Through the Lens of Technical Debt
Context: Technical Debt needs to be managed to avoid disastrous consequences,
and understanding developers' habits concerning technical debt management is
invaluable in software development. Objective: This study aims to
characterize how developers manage technical debt based on the code smells they
induce and the refactorings they apply. Method: We mined a publicly-available
Technical Debt dataset for Git commit information, code smells, coding
violations, and refactoring activities for each developer of a selected
project. Results: By combining this information, we profile developers to
recognize prolific coders, highlight activities that discriminate among
developer roles (reviewer, lead, architect), and estimate coding maturity and
technical debt tolerance.
Improving Software Project Health Using Machine Learning
In recent years, systems that would previously live on different platforms have been integrated under a single umbrella. The increased use of GitHub, which offers pull-requests, issue tracking and version history, and its integration with other solutions such as Gerrit or Travis, as well as the response from competitors, created development environments that favour agile methodologies by increasingly automating non-coding tasks: automated build systems, automated issue triaging, etc. In essence, source-code hosting platforms shifted to continuous integration/continuous delivery (CI/CD) as a service. This facilitated a shift in development paradigms: adherents of agile methodology can now adopt a CI/CD infrastructure more easily. This has also created large, publicly accessible sources of source code together with related project artefacts: GHTorrent and similar datasets now offer programmatic access to the whole of GitHub.
Project health encompasses traceability, documentation, and adherence to coding conventions: tasks that reduce maintenance costs and increase accountability but may not directly impact features. Overfocus on health can slow velocity (new feature delivery), so the Agile Manifesto suggests developers should travel light, that is, forgo tasks focused on project health in favour of higher feature velocity. Obviously, injudiciously following this suggestion can undermine a project’s chances for success.
Simultaneously, this shift to CI/CD has allowed the proliferation of Natural Language or Natural Language and Formal Language textual artefacts that are programmatically accessible: GitHub and its competitors allow API access to their infrastructure to enable the creation of CI/CD bots. This suggests that approaches from Natural Language Processing and Machine Learning are now feasible and indeed desirable. This thesis aims to (semi-)automate tasks for this new paradigm and its attendant infrastructure by bringing to the foreground the relevant NLP and ML techniques.
Under this umbrella, I focus on three synergistic tasks from this domain: (1) improving issue-pull-request traceability, which can aid existing systems to automatically curate the issue backlog as pull-requests are merged; (2) untangling commits in a version history, which can aid the aforementioned traceability task as well as improve the usability of determining a fault-introducing commit, or cherry-picking via tools such as git bisect; (3) mixed-text parsing, to allow better API mining and open new avenues for project-specific code-recommendation tools.
Nineteenth-Century Scientific American Illustrations and the Development of American Mechanical Drawing
This project, sponsored by the American Antiquarian Society, involves work on the database that allows researchers to search illustrations of inventions in the early volumes of Scientific American (1845-1869). This database has been actively developed in a series of WPI IQPs over the last twelve years. The current project focuses on improving the website user interface. This report also discusses the history of mechanical drawing pedagogy and the adoption of professional methods for producing mechanical drawings in America during the 1800s.
Studying the Characteristics of AIOps Projects on GitHub
Artificial Intelligence for IT Operations (AIOps) leverages AI approaches to
handle the massive data generated during the operations of software systems.
Prior works have proposed various AIOps solutions to support different tasks in
system operations and maintenance (e.g., anomaly detection). In this work, we
investigate open-source AIOps projects in depth to understand the
characteristics of AIOps in practice. We first carefully identify a set of
AIOps projects from GitHub and analyze their repository metrics (e.g., the
programming languages used). Then, we qualitatively study the projects to understand
their input data, analysis techniques, and goals. Finally, we analyze the
quality of these projects using different quality metrics, such as the number
of bugs. We also sample two sets of baseline projects from GitHub: a random
sample of machine learning projects, and a random sample of general purpose
projects. We compare different metrics of our identified AIOps projects with
these baselines. Our results show a recent and growing interest in AIOps
solutions. However, the quality metrics indicate that AIOps projects suffer
from more issues than our baseline projects. We also pinpoint the most common
issues in AIOps approaches and discuss the possible solutions to overcome them.
Our findings help practitioners and researchers understand the current state of
AIOps practices and shed light on different ways to improve the weak aspects of
AIOps. To the best of our knowledge, this work is the first to characterize
open-source AIOps projects.
Comment: 31 pages, 6 pages of references, 8 figures, 12 tables
On the Way to SBOMs: Investigating Design Issues and Solutions in Practice
A Software Bill of Materials (SBOM) offers improved transparency and supply
chain security by providing a machine-readable inventory of software components
used. With the rise in software supply chain attacks, the SBOM has attracted
attention from both academia and industry. This paper presents a study on the
practice of SBOM, based on the analysis of 4,786 GitHub discussions from 510
SBOM-related projects. Our study identifies key topics, challenges, and
solutions associated with effective SBOM usage. We also highlight commonly used
tools and frameworks for generating SBOMs, along with their respective
strengths and limitations. Our research underscores the importance of SBOMs in
software development and the need for their widespread adoption to enhance
supply chain security. Additionally, the insights gained from our study can
inform future research and development in this field.
A large-scale comparative analysis of Coding Standard conformance in Open-Source Data Science projects
Background: Meeting the growing industry demand for Data Science requires
cross-disciplinary teams that can translate machine learning research into
production-ready code. Software engineering teams value adherence to coding
standards as an indication of code readability, maintainability, and developer
expertise. However, there are no large-scale empirical studies of coding
standards focused specifically on Data Science projects. Aims: This study
investigates the extent to which Data Science projects follow code standards.
In particular, which standards are followed, which are ignored, and how does
this differ from traditional software projects? Method: We compare a corpus of
1048 Open-Source Data Science projects to a reference group of 1099 non-Data
Science projects with a similar level of quality and maturity. Results: Data
Science projects suffer from a significantly higher rate of functions that use
an excessive number of parameters and local variables. Data Science projects
also follow different variable naming conventions to non-Data Science projects.
Conclusions: The differences indicate that Data Science codebases are distinct
from traditional software codebases and do not follow traditional software
engineering conventions. Our conjecture is that this may be because traditional
software engineering conventions are inappropriate in the context of Data
Science projects.
Comment: 11 pages, 7 figures. To appear in ESEM 2020. Updated based on peer review