6 research outputs found

    Machine Learning at Microsoft with ML.NET

    Machine Learning is transitioning from an art and science into a technology available to every developer. In the near future, every application on every platform will incorporate trained models to encode data-based decisions that would be impossible for developers to author. This presents a significant engineering challenge, since currently data science and modeling are largely decoupled from standard software development processes. This separation makes incorporating machine learning capabilities inside applications unnecessarily costly and difficult, and furthermore discourages developers from embracing ML in the first place. In this paper we present ML.NET, a framework developed at Microsoft over the last decade in response to the challenge of making it easy to ship machine learning models in large software applications. We present its architecture and illuminate the application demands that shaped it. Specifically, we introduce DataView, the core data abstraction of ML.NET, which allows it to capture full predictive pipelines efficiently and consistently across training and inference lifecycles. We close the paper with a surprisingly favorable performance study of ML.NET compared to more recent entrants, and a discussion of some lessons learned.
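    To give a feel for the idea behind DataView, here is a minimal Python sketch of a lazy, composable pipeline abstraction in which the same transformation chain is fitted once and then reused unchanged at inference time. All names are hypothetical; this is not the ML.NET API, which is .NET-based.

```python
# Sketch of a DataView-style pipeline: transformations are composed
# declaratively, fitted once, and the identical chain serves both
# training and inference. Hypothetical names; not the ML.NET API.

class Pipeline:
    def __init__(self, *steps):
        self.steps = steps                  # ordered transformations

    def fit(self, rows):
        for step in self.steps:
            step.fit(rows)
            rows = [step.transform(r) for r in rows]
        return self

    def transform(self, row):
        for step in self.steps:             # identical path at inference time
            row = step.transform(row)
        return row

class Normalize:
    """Scales a numeric column to [0, 1] using statistics seen at fit time."""
    def __init__(self, col):
        self.col = col

    def fit(self, rows):
        vals = [r[self.col] for r in rows]
        self.lo, self.hi = min(vals), max(vals)

    def transform(self, row):
        row = dict(row)
        row[self.col] = (row[self.col] - self.lo) / ((self.hi - self.lo) or 1.0)
        return row

train = [{"price": 10.0}, {"price": 30.0}, {"price": 20.0}]
pipe = Pipeline(Normalize("price")).fit(train)
print(pipe.transform({"price": 25.0}))      # same code path as training
```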

    GPU ν™˜κ²½μ—μ„œ λ¨Έμ‹ λŸ¬λ‹ μ›Œν¬λ‘œλ“œμ˜ 효율적인 μ‹€ν–‰

    Ph.D. dissertation, Seoul National University, Department of Computer Science and Engineering, February 2023. Advisor: Byung-Gon Chun. Machine learning (ML) workloads are becoming increasingly important in many types of real-world applications. We attribute this trend to the development of software systems for ML, which have facilitated the widespread adoption of heterogeneous accelerators such as GPUs. Today's ML software stack has made great improvements in terms of efficiency; however, not all use cases are well supported. In this dissertation, we study how to improve the execution efficiency of ML workloads on GPUs from a software system perspective. We identify workloads where current systems for ML are inefficient in utilizing GPUs and devise new system techniques that handle those workloads efficiently. We first present Nimble, an ML execution engine equipped with carefully optimized GPU scheduling. The proposed scheduling techniques improve execution efficiency by up to 22.34Γ—. Second, we propose Orca, an inference serving system specialized for Transformer-based generative models. By incorporating new scheduling and batching techniques, Orca significantly outperforms state-of-the-art systems, achieving a 36.9Γ— throughput improvement at the same level of latency. The last topic of this dissertation is WindTunnel, a framework that translates classical ML pipelines into neural networks, providing GPU training capabilities for classical ML workloads. WindTunnel also allows joint training of pipeline components via backpropagation, resulting in improved accuracy over both the original pipeline and neural network baselines.
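    One of Nimble's ideas is running independent GPU operators concurrently on multiple CUDA streams. Below is a minimal PyTorch sketch of that multi-stream idea only, assuming two independent branches of work; it is illustrative, not Nimble's implementation, which additionally prepares the schedule ahead of time to remove framework overhead.

```python
# Sketch: overlapping two independent GPU operators on separate CUDA
# streams, the kind of parallelism Nimble's stream assignment automates.
import torch

if torch.cuda.is_available():
    a = torch.randn(1024, 1024, device="cuda")
    s1, s2 = torch.cuda.Stream(), torch.cuda.Stream()

    torch.cuda.synchronize()
    with torch.cuda.stream(s1):      # branch 1: independent matmul
        x = a @ a
    with torch.cuda.stream(s2):      # branch 2: can run concurrently with branch 1
        y = a + 1.0
    torch.cuda.synchronize()         # join both streams before using results
    print((x.sum() + y.sum()).item())
```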
λ˜ν•œ WindTunnel은 gradient backpropagation을 톡해 νŒŒμ΄ν”„λΌμΈμ˜ μ—¬λŸ¬ μš”μ†Œλ₯Ό ν•œ λ²ˆμ— κ³΅λ™μœΌλ‘œ ν•™μŠ΅ ν•  수 있으며, 이λ₯Ό 톡해 νŒŒμ΄ν”„λΌμΈμ˜ 정확도λ₯Ό 더 ν–₯μƒμ‹œν‚¬ 수 μžˆμŒμ„ ν™•μΈν•˜μ˜€λ‹€.Chapter 1 Introduction 1 1.1 Motivation 1 1.2 Dissertation Overview 2 1.3 Previous Publications 4 1.4 Roadmap 5 Chapter 2 Background 6 2.1 ML Workloads 6 2.2 The GPU Execution Model 7 2.3 GPU Scheduling in ML Frameworks 8 2.4 Engine Scheduling in Inference Servers 10 2.5 Inference Procedure of Generative Models 11 Chapter 3 Nimble: Lightweight and Parallel GPU Task Scheduling for Deep Learning 17 3.1 Introduction 17 3.2 Motivation 21 3.3 System Design 24 3.3.1 Ahead-of-time (AoT) Scheduling 25 3.3.2 Stream Assignment Algorithm 28 3.4 Evaluation 32 3.4.1 Inference Latency 36 3.4.2 Impact of Multi-stream Execution 36 3.4.3 Training Throughput 37 3.5 Summary 38 Chapter 4 Orca: A Distributed Serving System for Transformer-Based Generative Models 40 4.1 Introduction 40 4.2 Challenges and Proposed Solutions 44 4.3 Orca System Design 51 4.3.1 Distributed Architecture 51 4.3.2 Scheduling Algorithm 54 4.4 Implementation 60 4.5 Evaluation 61 4.5.1 Engine Microbenchmark 63 4.5.2 End-to-end Performance 66 4.6 Summary 71 Chapter 5 WindTunnel: Towards Differentiable ML Pipelines Beyond a Single Model 72 5.1 Introduction 72 5.2 Pipeline Translation 78 5.2.1 Translating Arithmetic Operators 80 5.2.2 Translating Algorithmic Operators: GBDT 81 5.2.3 Translating Algorithmic Operators for Categorical Features 85 5.2.4 Fine-Tuning 87 5.3 Implementation 87 5.4 Experiments 88 5.4.1 Experimental Setup 89 5.4.2 Overall Performance 94 5.4.3 Ablation Study 95 5.5 Summary 98 Chapter 6 Related Work 99 Chapter 7 Conclusion 105 Bibliography 107 Appendix A Appendix: Nimble 131 A.1 Proofs on the Stream Assignment Algorithm of Nimble 131 A.1.1 Proof of Theorem 1 132 A.1.2 Proof of Theorem 2 134 A.1.3 Proof of Theorem 3 135 A.1.4 Time Complexity Analysis 137 A.2 Evaluation Results on Various GPUs 139 A.3 Evaluation Results on Different Training Batch Sizes 139λ°•

    Service Abstractions for Scalable Deep Learning Inference at the Edge

    Deep learning driven intelligent edge has already become a reality, where millions of mobile, wearable, and IoT devices analyze real-time data and transform it into actionable insights on-device. Typical approaches for optimizing deep learning inference mostly focus on accelerating the execution of individual inference tasks, without considering the contextual correlation unique to edge environments and the statistical nature of learning-based computation. Specifically, they treat inference workloads as individual black boxes and apply canonical system optimization techniques, developed over the last few decades, to handle them as yet another type of computation-intensive application. As a result, deep learning inference on edge devices still faces the ever-increasing challenges of customization to edge device heterogeneity, fuzzy computation redundancy between inference tasks, and end-to-end deployment at scale. In this thesis, we propose the first framework that automates and scales the end-to-end process of deploying efficient deep learning inference from the cloud to heterogeneous edge devices. The framework consists of a series of service abstractions that handle DNN model tailoring, model indexing and query, and computation reuse for runtime inference, respectively. Together, these services bridge the gap between deep learning training and inference, eliminate computation redundancy during inference execution, and further lower the barrier for deep learning algorithm and system co-optimization. To build efficient and scalable services, we take a unique algorithmic approach of harnessing the semantic correlation between learning-based computations. Rather than viewing individual tasks as isolated black boxes, we optimize them collectively in a white-box approach, proposing primitives to formulate the semantics of deep learning workloads, algorithms to assess their hidden correlation (in terms of the input data, the neural network models, and the deployment trials), and techniques to merge common processing steps to minimize redundancy.
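    One way to picture the computation-reuse service is a cache of inference results keyed by input similarity, so near-duplicate inputs across tasks skip redundant DNN execution. The sketch below uses a hypothetical embedding function and similarity threshold as placeholders; the thesis's actual primitives and correlation algorithms are more involved.

```python
# Sketch of computation reuse for inference: cache results keyed by an
# input embedding and serve near-duplicate inputs from the cache.
# embed() and the threshold are hypothetical placeholders.
import numpy as np

def embed(x):
    return x / (np.linalg.norm(x) + 1e-12)        # placeholder feature extractor

class ReuseCache:
    def __init__(self, model, threshold=0.98):
        self.model, self.threshold = model, threshold
        self.keys, self.values = [], []

    def infer(self, x):
        e = embed(x)
        for k, v in zip(self.keys, self.values):
            if float(np.dot(e, k)) >= self.threshold:  # near-duplicate input
                return v                               # reuse cached result
        y = self.model(x)                              # fall back to full inference
        self.keys.append(e)
        self.values.append(y)
        return y

cache = ReuseCache(model=lambda x: float(x.sum()))     # stand-in "DNN"
x1 = np.ones(8)
print(cache.infer(x1))           # full inference, result cached
print(cache.infer(x1 * 1.001))   # served from cache: nearly identical input
```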

    Believability Assessment and Modelling in Video Games

    Artificial Intelligence remains one of the most sought-after subjects in computer science to this day. One of its subfields, and the focus of this thesis, is the application of AI to video games in the form of believable agents. This means focusing on implementing agents that behave like us rather than simply attempting to win, whether that means cooperating or competing as we do. Success in building more human-like characters can enhance immersion and enjoyment in games, potentially increasing their gameplay value and ultimately bringing benefits to industry and academia. However, believability is a hard concept to define. It depends on how and what one considers to be "believable", which is often very subjective. This means that developing believable agents remains a sought-after, albeit difficult, challenge. There are many approaches to development, ranging from finite state machines or imitation learning to emotional models, with no single solution to creating a human-like agent. The same problem arises when attempting to assess these solutions. Assessing the believability of agents, characters, and simulated actors is also a core challenge for human-like behaviour. While numerous approaches are suggested in the literature, there is no dominant solution for evaluation either. In addition, assessment rarely receives as much attention as development or modelling do. Mostly, it is treated as a necessity for evaluating agents, with little attention to how the assessment process itself could affect the outcome of the evaluation. This thesis takes a different approach to developing believability and its assessment, exploring assessment first. In previous years, several researchers have tried to find ways of assessing human-like behaviour in games through adaptations of Turing Tests applied to their agents. Given the limited diversity of the parameters explored in believability assessment and the prior focus on programming the bots, this thesis starts by exploring different parameters for evaluating believability in video games. The objective of this work is to analyze the different ways believability can be assessed for both humans and non-player characters (NPCs), comparing how results and scores for each are affected when the parameters change. This thesis also explores the concept of believability and its need in video games in general. Another aspect of assessment explored in this thesis is believability's overall representation. Past research shows methodologies limited to discrete, low-granularity representations of believable behaviour. This work focuses, for the first time, on viewing believability as a time-continuous phenomenon and explores the suitability of two different affect annotation schemes for its assessment. These techniques are also compared to previously used discrete methodologies, to understand how moment-to-moment assessment can contribute to them. In addition, this thesis studies the degree to which we can predict character believability in a continuous fashion. This is achieved by training random forest models to predict believability based on annotations of the context extracted from a game. This thesis then tackles development. For this work, different solutions are combined into one, and in a different order: time-continuous data based on people's assessment of believability is modelled and integrated into a game agent to affect its behaviour. This results in a final comparison between two agents, where one uses a believability-biased model and the other does not, showing that biasing agents' behaviour with assessment data can increase their overall believability.
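    The continuous-prediction step described above amounts to supervised regression from game-context features to moment-to-moment believability annotations. A hedged scikit-learn sketch on synthetic stand-in data (the thesis's actual features and annotation traces are not reproduced here):

```python
# Sketch: random forest regression from game-context features to
# time-continuous believability annotations. All data here is synthetic.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Hypothetical per-frame context features, e.g. speed, aim error, pause time
X = rng.normal(size=(500, 3))
# Synthetic trace standing in for moment-to-moment believability annotations
y = 0.5 * X[:, 0] - 0.3 * X[:, 1] + rng.normal(scale=0.1, size=500)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_tr, y_tr)
print("R^2 on held-out frames:", model.score(X_te, y_te))
```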