Tackling Version Management and Reproducibility in MLOps
The growing adoption of machine learning solutions requires advancements in applying best practices to maintain artificial intelligence systems in production. Machine Learning Operations (MLOps) incorporates DevOps principles into machine learning development, promoting automation, continuous delivery, monitoring, and training capabilities. Due to multiple factors, such as the experimental nature of the machine learning process or the need for model optimizations derived from changes in business needs, data scientists are expected to create multiple experiments to develop a model or predictor that satisfactorily addresses the main challenges of a given problem.
Since the re-evaluation of models is a constant need, metadata is constantly produced due to multiple experiment runs. This metadata is known as ML artifacts or assets. The proper lineage between these artifacts enables environment recreation, facilitating model reproducibility. Linking information from experiments, models, datasets, configurations, and code changes requires proper organization, tracking, maintenance, and version control of these artifacts.
This work will investigate the best practices, current issues, and open challenges related to artifact versioning and management, and apply this knowledge to develop an ML workflow that supports ML engineering and operationalization, applying MLOps principles that facilitate model reproducibility. Scenarios covering data preparation, model generation, comparison between model versions, deployment, monitoring, debugging, and retraining demonstrated how the selected frameworks and tools can be integrated to achieve that goal.
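The artifact lineage the abstract describes (linking experiments, models, datasets, configurations, and code changes) can be illustrated with a minimal, hypothetical run-metadata recorder in plain Python; no specific MLOps framework is assumed, and the commit string and parameter names are illustrative:

```python
import hashlib
import json

def fingerprint(data: bytes) -> str:
    """Content hash that ties a run to the exact dataset bytes used."""
    return hashlib.sha256(data).hexdigest()

def record_run(dataset: bytes, code_commit: str, params: dict, metrics: dict) -> dict:
    """Bundle the metadata needed to later recreate a run's environment."""
    return {
        "dataset_sha256": fingerprint(dataset),
        "code_commit": code_commit,   # version-control reference for the code
        "params": params,             # experiment configuration
        "metrics": metrics,           # outcome, for comparing model versions
    }

run = record_run(
    dataset=b"feature,label\n1,0\n2,1\n",
    code_commit="a1b2c3d",                       # hypothetical git commit
    params={"learning_rate": 0.01, "epochs": 5},
    metrics={"accuracy": 0.91},
)
print(json.dumps(run, indent=2))
```

Persisting one such record per experiment run is the simplest form of the lineage the work investigates: given a model, the record identifies the exact data, code, and configuration that produced it.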
Development of a Machine Learning Platform
Adoption of machine learning is becoming widespread, so it is natural to see companies embracing this technology more broadly, not only to enhance their products and services but also to gain greater market competitiveness. In light of this new paradigm, the present dissertation focuses on the implementation of a platform to optimize and enhance the development of machine learning projects. The challenge arises from a proposal put forward by the company GMV, which aims to make the machine learning process more accessible and intuitive for its workers and, in parallel, to ensure high levels of consistency and productivity in the development of its projects.
Based on these assumptions, the dissertation first examines both how a machine learning project is organized and the problems that arise throughout its development. A study was then made of the functioning of some platforms already on the market, in order to understand which problems they intend to solve and which solutions have been developed to address them. Next, the characteristics to be integrated into the platform were identified.
The study and comparison of some technologies present in the market allowed us to select and implement the most promising ones regarding the characteristics previously identified. Finally, the proposed solution is presented, explaining both the functioning
of the platform and the options taken throughout its development.
The Pipeline for the Continuous Development of Artificial Intelligence Models -- Current State of Research and Practice
Companies struggle to continuously develop and deploy AI models to complex
production systems due to AI characteristics while assuring quality. To ease
the development process, continuous pipelines for AI have become an active
research area where consolidated and in-depth analysis regarding the
terminology, triggers, tasks, and challenges is required. This paper includes a
Multivocal Literature Review where we consolidated 151 relevant formal and
informal sources. In addition, nine semi-structured interviews with
participants from academia and industry verified and extended the obtained
information. Based on these sources, this paper provides and compares
terminologies for DevOps and CI/CD for AI, MLOps, (end-to-end) lifecycle
management, and CD4ML. Furthermore, the paper provides an aggregated list of
potential triggers for reiterating the pipeline, such as alert systems or
schedules. In addition, this work uses a taxonomy creation strategy to present
a consolidated pipeline comprising tasks regarding the continuous development
of AI. This pipeline consists of four stages: Data Handling, Model Learning,
Software Development and System Operations. Moreover, we map challenges
regarding pipeline implementation, adaption, and usage for the continuous
development of AI to these four stages.
Comment: accepted in the Journal of Systems and Software
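The four consolidated stages can be sketched as a minimal, hypothetical pipeline of chained stage functions; the stage bodies below are trivial placeholders, not the paper's task taxonomy:

```python
from typing import Any, Callable

# Hypothetical stage functions; real stages would each comprise many tasks.
def data_handling(raw: list) -> list:
    """Data Handling: clean and prepare the raw records."""
    return [x for x in raw if x is not None]

def model_learning(data: list) -> dict:
    """Model Learning: 'train' a trivial stand-in model (the mean)."""
    return {"mean": sum(data) / len(data)}

def software_development(model: dict) -> dict:
    """Software Development: package the model with a version tag."""
    return {"model": model, "version": "0.1.0"}

def system_operations(artifact: dict) -> str:
    """System Operations: 'deploy' the artifact and report status."""
    return f"deployed model v{artifact['version']}"

def run_pipeline(raw: list, stages: list[Callable[[Any], Any]]) -> Any:
    """Run the stages in order, piping each output into the next stage."""
    result: Any = raw
    for stage in stages:
        result = stage(result)
    return result

status = run_pipeline(
    [1, None, 2, 3],
    [data_handling, model_learning, software_development, system_operations],
)
print(status)  # deployed model v0.1.0
```

A trigger such as an alert or a schedule would simply call `run_pipeline` again, which is the reiteration the paper's trigger list describes.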
Training future ML engineers: a project-based course on MLOps
Recently, the proliferation of commercial ML-based services has given rise to new job roles, such as ML engineers. Despite being highly sought-after in the job market, ML engineers are difficult to recruit, possibly due to the lack of specialized academic curricula for this position at universities. To address this gap, in the past two years, we have supplemented traditional Computer Science and Data Science university courses with a project-based course on MLOps focused on the fundamental skills required of ML engineers. In this paper, we present an overview of the course by showcasing a couple of sample projects developed by our students. Additionally, we share the lessons learned from offering the course at two different institutions.This work is partially supported by the NRRP Initiative – Next Generation EU ("FAIR - Future Artificial Intelligence Research", code PE00000013, CUP H97G22000210007); the Complementary National Plan PNC-I.1 ("DARE - DigitAl lifelong pRevEntion initiative", code PNC0000002, CUP B53C22006420001), and the project TED2021- 130923B-I00, funded by MCIN/AEI/10.13039/50110001 1033 and the European Union Next Generation EU/PRTR.Peer ReviewedPostprint (author's final draft
Operationalizing Machine Learning: An Interview Study
Organizations rely on machine learning engineers (MLEs) to operationalize ML,
i.e., deploy and maintain ML pipelines in production. The process of
operationalizing ML, or MLOps, consists of a continual loop of (i) data
collection and labeling, (ii) experimentation to improve ML performance, (iii)
evaluation throughout a multi-staged deployment process, and (iv) monitoring of
performance drops in production. When considered together, these
responsibilities seem staggering -- how does anyone do MLOps, what are the
unaddressed challenges, and what are the implications for tool builders?
We conducted semi-structured ethnographic interviews with 18 MLEs working
across many applications, including chatbots, autonomous vehicles, and finance.
Our interviews expose three variables that govern success for a production ML
deployment: Velocity, Validation, and Versioning. We summarize common practices
for successful ML experimentation, deployment, and sustaining production
performance. Finally, we discuss interviewees' pain points and anti-patterns,
with implications for tool design.
Comment: 20 pages, 4 figures
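The continual loop the interviews describe, monitoring production performance and reacting to drops, can be sketched with a hypothetical threshold-based retraining trigger; the metric, readings, and tolerance below are illustrative assumptions, not values from the study:

```python
def needs_retraining(baseline_accuracy: float,
                     live_accuracy: float,
                     tolerance: float = 0.05) -> bool:
    """Fire the retraining trigger when live accuracy falls more than
    `tolerance` below the accuracy recorded at deployment time."""
    return (baseline_accuracy - live_accuracy) > tolerance

# Simulated monitoring readings from production.
baseline = 0.90
readings = [0.89, 0.87, 0.80]

triggered = [r for r in readings if needs_retraining(baseline, r)]
print(triggered)  # the readings that would fire the retraining trigger
```

In practice the same check would feed the loop's experimentation step, closing the cycle of collection, experimentation, evaluation, and monitoring.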