
    Training future ML engineers: a project-based course on MLOps

    Recently, the proliferation of commercial ML-based services has given rise to new job roles, such as ML engineers. Although ML engineers are highly sought after in the job market, they are difficult to recruit, possibly because universities lack academic curricula specialized for this position. To address this gap, over the past two years we have supplemented traditional Computer Science and Data Science university courses with a project-based course on MLOps that focuses on the fundamental skills required of ML engineers. In this paper, we present an overview of the course by showcasing a couple of sample projects developed by our students. We also share the lessons learned from offering the course at two different institutions.

    This work is partially supported by the NRRP Initiative – Next Generation EU ("FAIR - Future Artificial Intelligence Research", code PE00000013, CUP H97G22000210007); the Complementary National Plan PNC-I.1 ("DARE - DigitAl lifelong pRevEntion initiative", code PNC0000002, CUP B53C22006420001); and the project TED2021-130923B-I00, funded by MCIN/AEI/10.13039/501100011033 and the European Union Next Generation EU/PRTR. Peer reviewed. Postprint (author's final draft).

    CiTAR - Preserving Software-based Research

    In contrast to books or published articles, the purely digital output of research projects is more fragile and is therefore harder to preserve, to make available, and to reuse by a wider research community. Not only does the fast-growing diversity of formats in research data sets require additional software preservation, but today's computer-assisted research disciplines also increasingly devote significant resources to creating new digital resources and software-based methods. To adopt the FAIR data principles, and especially to ensure the reusability of a wide variety of research outputs, novel ways of preserving software and other digital resources are required, as is their integration into existing research data management strategies. This article addresses the challenges and options of preserving containers and virtual machines that encapsulate software-based research methods as portable, preservable research resources, and it provides a preservation plan as well as an implementation.

    Restoring Execution Environments of Jupyter Notebooks

    More than ninety percent of published Jupyter notebooks do not state their dependencies on external packages. This makes them non-executable and thus hinders the reproducibility of scientific results. We present SnifferDog, an approach that 1) collects the APIs of Python packages across their versions, creating a database of APIs; 2) analyzes notebooks to determine candidates for the required packages and versions; and 3) checks which packages are required to make the notebook executable (and, ideally, to reproduce its stored results). In our evaluation, we show that SnifferDog precisely restores the execution environments of the vast majority of notebooks, making them immediately executable for end users. Comment: to be published in the 43rd ACM/IEEE International Conference on Software Engineering (ICSE 2021).
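
    Step 2 (matching the APIs a notebook uses against an API database) can be illustrated with a minimal sketch. It assumes a toy database for a hypothetical package `mylib` (the package, its versions, and its API names are invented for illustration, not real data), notebook cells that are plain Python without IPython magics, and only `import pkg [as alias]` statements. It is not SnifferDog's implementation, and it omits step 3 (actually running the notebook to confirm a candidate environment).

```python
import ast

# Hypothetical toy API database: (package, version) -> fully qualified APIs it exposes.
API_DB = {
    ("mylib", "1.0"): {"mylib.load", "mylib.old_fit"},
    ("mylib", "2.0"): {"mylib.load", "mylib.fit"},
}


def used_apis(source: str) -> set[str]:
    """Collect fully qualified attribute accesses on imported modules (e.g. ml.load -> mylib.load)."""
    tree = ast.parse(source)
    aliases = {}  # local name -> module name; only `import pkg [as alias]` is handled
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                aliases[alias.asname or alias.name] = alias.name
    apis = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Attribute):
            parts = []
            cur = node
            # Walk down an attribute chain such as a.b.c to its base name.
            while isinstance(cur, ast.Attribute):
                parts.append(cur.attr)
                cur = cur.value
            if isinstance(cur, ast.Name) and cur.id in aliases:
                parts.append(aliases[cur.id])
                apis.add(".".join(reversed(parts)))
    return apis


def candidate_versions(used: set[str], db: dict) -> dict[str, list[str]]:
    """For each package, keep the versions whose API set covers every API the notebook uses."""
    result = {}
    for pkg in sorted({p for p, _ in db}):
        needed = {api for api in used if api == pkg or api.startswith(pkg + ".")}
        if not needed:
            continue
        result[pkg] = sorted(ver for (p, ver), apis in db.items() if p == pkg and needed <= apis)
    return result


# Usage: two notebook cells, joined before parsing.
cells = ["import mylib as ml", "data = ml.load('x.csv')\nmodel = ml.old_fit(data)"]
print(candidate_versions(used_apis("\n".join(cells)), API_DB))
```

    Because the notebook calls `old_fit`, which exists only in the toy `1.0` entry, version `1.0` is the only candidate left for the execution check.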

    MIDST: an enhanced development environment that improves the maintainability of a data science analysis

    With the increasing ability to generate actionable insight from data, the field of data science has seen significant growth. As more teams develop data science solutions, the analytical code they write will later need to be enhanced by an existing or a new team member. Thus, the ability to easily maintain and enhance the code behind an analysis will become increasingly important. However, to date, there has been minimal research on the maintainability of an analysis performed by a data science team. To help address this gap, data science maintainability was explored by (1) creating a data science maintainability model, (2) creating a new tool, called MIDST (Modular Interactive Data Science Tool), that aims to improve data science maintainability, and (3) conducting a mixed-method experiment to evaluate MIDST. The new tool aims to improve a team member's ability to update and rerun an existing data science analysis by providing a visual data flow view of the analysis within an integrated code and computational environment. Based on an analysis of the quantitative and qualitative survey results, the experiment found that MIDST helps improve the maintainability of an analysis. Thus, this research demonstrates the importance of enhanced tools for improving the maintainability of data science projects.
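
    One way to picture why a data flow view helps when updating and rerunning an analysis: if each step records which other steps' outputs it consumes, a change to one step determines exactly which downstream steps must be rerun, and in what order. The sketch below assumes a hypothetical five-step analysis (the step names are invented) and uses Python's standard `graphlib` module (Python 3.9+); it illustrates the general idea only, not how MIDST itself is implemented.

```python
from graphlib import TopologicalSorter

# Hypothetical analysis: each step maps to the set of steps whose outputs it consumes.
DEPENDS_ON = {
    "load": set(),
    "clean": {"load"},
    "features": {"clean"},
    "model": {"features"},
    "report": {"model", "clean"},
}


def downstream(changed: str, depends_on: dict[str, set[str]]) -> set[str]:
    """Return the changed step plus every step that transitively consumes its output."""
    affected = {changed}
    grew = True
    while grew:
        grew = False
        for step, deps in depends_on.items():
            if step not in affected and deps & affected:
                affected.add(step)
                grew = True
    return affected


def rerun_order(changed: str, depends_on: dict[str, set[str]]) -> list[str]:
    """Order the affected steps so each runs only after the steps it depends on."""
    affected = downstream(changed, depends_on)
    order = TopologicalSorter(depends_on).static_order()
    return [step for step in order if step in affected]


print(rerun_order("clean", DEPENDS_ON))
```

    For example, editing `clean` reruns `clean`, `features`, `model`, and `report`, while `load` is left untouched.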

    Building a Culture of Reproducibility in Academic Research

    Reproducibility is an ideal that no researcher would dispute "in the abstract", but when aspirations meet the cold, hard reality of the academic grind, reproducibility often "loses out". In this essay, I share some personal experiences grappling with how to operationalize reproducibility while balancing its demands against other priorities. Over the past few years, my research group has had some success building a "culture of reproducibility", which I attempt to distill into lessons learned and actionable advice, organized around three questions: why, what, and how. I believe that reproducibility efforts should yield easy-to-use, well-packaged, and self-contained software artifacts that allow others to reproduce and generalize research findings. At its core, my approach centers on self-interest: I argue that the primary beneficiaries of reproducibility efforts are, in fact, those making the investments. I believe that (unashamedly) appealing to self-interest, augmented with expectations of reciprocity, increases the chances of success. Building on repeatability, social processes and standardized tools are the two additional ingredients that help achieve these aspirational ideals. The dogfood principle nicely ties these ideas together.