
    Tackling Version Management and Reproducibility in MLOps

    The growing adoption of machine learning (ML) solutions requires advancements in applying best practices to maintain these systems in production. Machine Learning Operations (MLOps) incorporates DevOps principles into machine learning development, promoting automation and continuous delivery, monitoring, and training capabilities. Due to multiple factors, such as the experimental nature of the machine learning process or the need for model optimizations derived from changes in business needs, data scientists are expected to create multiple experiments to develop a model or predictor that satisfactorily addresses the main challenges of a given problem. Since the re-evaluation of models is a constant need, metadata is continually produced by the many experiment runs. This metadata is known as ML artifacts or assets. Proper lineage between these artifacts enables recreation of the environment in which they were developed, facilitating model reproducibility. Linking information from experiments, models, datasets, configurations, and code changes requires proper organization, tracking, maintenance, and version control of these artifacts. This work investigates the best practices, current issues, and open challenges related to artifact versioning and management, and applies this knowledge to develop an ML workflow that supports ML engineering and operationalization, applying MLOps principles that facilitate model reproducibility. Scenarios covering data preparation, model generation, comparison between model versions, deployment, monitoring, debugging, and retraining demonstrate how the selected frameworks and tools can be integrated to achieve that goal.
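
    The artifact lineage described above can be made concrete with a small experiment-tracking sketch. The snippet below uses MLflow as one plausible tracking tool; the abstract does not name the frameworks it selected, so the tool choice, tag names, and values here are illustrative assumptions, not the work's actual setup.

    import mlflow

    # Hypothetical configuration and lineage identifiers for one experiment run.
    config = {"n_estimators": 200, "max_depth": 8}

    with mlflow.start_run(run_name="experiment-001") as run:
        # Lineage anchors: which code and data produced this model version.
        mlflow.set_tag("git_commit", "abc1234")      # hypothetical commit id
        mlflow.set_tag("dataset_version", "v1.2")    # hypothetical data tag

        # Experiment configuration, comparable across runs.
        mlflow.log_params(config)

        # Evaluation results for comparison between model versions.
        mlflow.log_metric("val_accuracy", 0.91)

        # Persist the configuration itself as a versioned artifact.
        mlflow.log_dict(config, "config.json")
        print(f"logged run {run.info.run_id}")

    Tying every run to a code commit and a dataset version is what makes the environment recreatable later, which is the reproducibility property the work targets.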

    Development of a Machine Learning Platform

    As the adoption of machine learning becomes widespread, it is natural to see companies embrace this technology more broadly, not only to enhance their products and services but also to gain greater market competitiveness. In light of this new paradigm, the present dissertation focuses on the implementation of a platform to optimize and enhance the development of machine learning projects. This challenge arises from a proposal put forward by the company GMV, which aims to make the machine learning process more accessible and intuitive for its workers and, in parallel, to ensure high levels of consistency and productivity in the development of its projects. Starting from these premises, the dissertation first examines how a machine learning project is organized and which problems arise throughout its development. Next, platforms already on the market were studied to understand which problems they intend to solve and which solutions have been developed to address them. From this analysis, the characteristics to be integrated into the platform were identified. A study and comparison of technologies available on the market then allowed the most promising ones, covering the previously identified characteristics, to be selected and implemented. Finally, the proposed solution is presented, explaining both the functioning of the platform and the choices made throughout its development.

    The Pipeline for the Continuous Development of Artificial Intelligence Models -- Current State of Research and Practice

    Companies struggle to continuously develop and deploy AI models to complex production systems while assuring quality, owing to the particular characteristics of AI. To ease the development process, continuous pipelines for AI have become an active research area, where consolidated and in-depth analysis of the terminology, triggers, tasks, and challenges is required. This paper includes a Multivocal Literature Review in which we consolidated 151 relevant formal and informal sources. In addition, nine semi-structured interviews with participants from academia and industry verified and extended the obtained information. Based on these sources, this paper provides and compares terminologies for DevOps and CI/CD for AI, MLOps, (end-to-end) lifecycle management, and CD4ML. Furthermore, the paper provides an aggregated list of potential triggers for reiterating the pipeline, such as alert systems or schedules. In addition, this work uses a taxonomy creation strategy to present a consolidated pipeline comprising the tasks of the continuous development of AI. This pipeline consists of four stages: Data Handling, Model Learning, Software Development, and System Operations. Moreover, we map challenges regarding pipeline implementation, adaptation, and usage for the continuous development of AI to these four stages.
    Comment: accepted in the Journal of Systems and Software
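
    The four-stage pipeline and the trigger types the review consolidates can be illustrated as a minimal orchestration loop. This is a schematic sketch under stated assumptions: the stage bodies and trigger conditions are hypothetical placeholders, not the concrete tasks from the paper's taxonomy.

    import time

    # The four consolidated stages; all bodies are hypothetical placeholders.
    def data_handling(): ...        # ingest, validate, and version data
    def model_learning(): ...       # train and evaluate candidate models
    def software_development(): ... # package and test the model in software
    def system_operations(): ...    # deploy, monitor, and log in production

    STAGES = [data_handling, model_learning, software_development, system_operations]

    def alert_fired() -> bool:
        """Hypothetical alert trigger, e.g., raised by a monitoring system."""
        return False

    def run_pipeline():
        for stage in STAGES:
            stage()

    # Two of the surveyed trigger types: a schedule and an alert system.
    for _ in range(3):              # stand-in for a long-running scheduler
        run_pipeline()              # schedule trigger: periodic rerun
        if alert_fired():           # alert trigger: immediate rerun
            run_pipeline()
        time.sleep(1)               # e.g., daily in a real deployment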

    Training future ML engineers: a project-based course on MLOps

    Recently, the proliferation of commercial ML-based services has given rise to new job roles, such as ML engineers. Despite being highly sought after in the job market, ML engineers are difficult to recruit, possibly due to the lack of specialized academic curricula for this position at universities. To address this gap, in the past two years we have supplemented traditional Computer Science and Data Science university courses with a project-based course on MLOps focused on the fundamental skills required of ML engineers. In this paper, we present an overview of the course by showcasing a couple of sample projects developed by our students. Additionally, we share the lessons learned from offering the course at two different institutions. This work is partially supported by the NRRP Initiative – Next Generation EU ("FAIR - Future Artificial Intelligence Research", code PE00000013, CUP H97G22000210007); the Complementary National Plan PNC-I.1 ("DARE - DigitAl lifelong pRevEntion initiative", code PNC0000002, CUP B53C22006420001); and the project TED2021-130923B-I00, funded by MCIN/AEI/10.13039/501100011033 and the European Union Next Generation EU/PRTR.

    Operationalizing Machine Learning: An Interview Study

    Organizations rely on machine learning engineers (MLEs) to operationalize ML, i.e., to deploy and maintain ML pipelines in production. The process of operationalizing ML, or MLOps, consists of a continual loop of (i) data collection and labeling, (ii) experimentation to improve ML performance, (iii) evaluation throughout a multi-staged deployment process, and (iv) monitoring of performance drops in production. Considered together, these responsibilities seem staggering -- how does anyone do MLOps, what are the unaddressed challenges, and what are the implications for tool builders? We conducted semi-structured ethnographic interviews with 18 MLEs working across many applications, including chatbots, autonomous vehicles, and finance. Our interviews expose three variables that govern success for a production ML deployment: Velocity, Validation, and Versioning. We summarize common practices for successful ML experimentation, deployment, and sustaining production performance. Finally, we discuss interviewees' pain points and anti-patterns, with implications for tool design.
    Comment: 20 pages, 4 figures
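
    The monitoring responsibility (stage iv of the loop) lends itself to a small illustration: watch a rolling window of production scores and flag drops that warrant retraining. This is a minimal sketch; the window size, baseline, and tolerance are assumed values, not figures from the study.

    from collections import deque

    class PerformanceMonitor:
        """Flags production performance drops over a rolling window.
        All thresholds here are illustrative assumptions."""

        def __init__(self, window: int = 100, baseline: float = 0.90,
                     tolerance: float = 0.05):
            self.scores = deque(maxlen=window)   # recent per-batch accuracy
            self.baseline = baseline             # accuracy at deployment time
            self.tolerance = tolerance           # allowed degradation

        def record(self, score: float) -> bool:
            """Add a new score; return True if a sustained drop is detected."""
            self.scores.append(score)
            if len(self.scores) < self.scores.maxlen:
                return False                     # not enough data yet
            avg = sum(self.scores) / len(self.scores)
            return avg < self.baseline - self.tolerance

    monitor = PerformanceMonitor()
    if monitor.record(0.82):
        print("performance drop detected: consider retraining")

    A rolling average rather than a single-batch check avoids reacting to noise, which matches the interviewees' emphasis on sustaining production performance over time.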