83 research outputs found

    Robust health stream processing

    2014 Fall. Includes bibliographical references. As the cost of personal health sensors decreases along with improvements in battery life and connectivity, it becomes more feasible to allow patients to leave full-time care environments sooner. Such devices could lead to greater independence for the elderly, as well as for others who would normally require full-time care. They would also allow surgery patients to spend less time in the hospital, both pre- and post-operation, as all data could be gathered via remote sensors in the patient's home. While sensor technology is rapidly approaching the point where this is a feasible option, we still lack processing frameworks that would make such a leap not only feasible but safe. This work focuses on developing a framework that is robust both to failures of processing elements and to interference from other computations processing health sensor data. We work with three disparate data streams and accompanying computations: electroencephalogram (EEG) data gathered for a brain-computer interface (BCI) application, electrocardiogram (ECG) data gathered for arrhythmia detection, and thorax data gathered from monitoring patient sleep status

    The parallel event loop model and runtime: a parallel programming model and runtime system for safe event-based parallel programming

    Recent trends in programming models for server-side development have shown an increasing popularity of event-based single-threaded programming models based on the combination of dynamic languages such as JavaScript and event-based runtime systems for asynchronous I/O management such as Node.js. Reasons for the success of such models are the simplicity of the single-threaded event-based programming model as well as the growing popularity of the Cloud as a deployment platform for Web applications. Unfortunately, the popularity of single-threaded models comes at the price of performance and scalability, as single-threaded event-based models present limitations when parallel processing is needed, and traditional approaches to concurrency such as threads and locks don't play well with event-based systems. This dissertation proposes a programming model and a runtime system to overcome such limitations by providing single-threaded event-based applications with support for speculative parallel execution. The model, called Parallel Event Loop, has the goal of bringing parallel execution to the domain of single-threaded event-based programming without relaxing the main characteristics of the single-threaded model, thereby providing developers with the impression of a safe, single-threaded runtime. Rather than supporting only pure single-threaded programming, however, the parallel event loop can also be used to derive safe, high-level parallel programming models characterized by a strong compatibility with single-threaded runtimes. We describe three distinct implementations of speculative runtimes enabling the parallel execution of event-based applications. The first implementation we describe is a pessimistic runtime system based on locks to implement speculative parallelization. The second and the third implementations are based on two distinct optimistic runtimes using software transactional memory. Each of the implementations supports the parallelization of applications written using an asynchronous single-threaded programming style, and each of them enables applications to benefit from parallel execution

    Methods to Improve Applicability and Efficiency of Distributed Data-Centric Compute Frameworks

    The success of modern applications depends on the insights they collect from their data repositories. Data repositories for such applications currently exceed exabytes and are rapidly increasing in size, as they collect data from varied sources - web applications, mobile phones, sensors and other connected devices. Distributed storage and data-centric compute frameworks have been invented to store and analyze these large datasets. This dissertation focuses on extending the applicability and improving the efficiency of distributed data-centric compute frameworks

    3rd Many-core Applications Research Community (MARC) Symposium. (KIT Scientific Reports ; 7598)

    This manuscript includes recent scientific work regarding the Intel Single-chip Cloud Computer and describes novel approaches for programming and run-time organization

    Distributed Sparse Computing and Communication for Big Graph Analytics and Deep Learning

    Sparsity can be found in the underlying structure of many real-world computationally expensive problems including big graph analytics and large scale sparse deep neural networks. In addition, if gracefully investigated, many of these problems contain a broad substratum of parallelism suitable for parallel and distributed executions of sparse computation. However, usually, dense computation is preferred to its sparse alternative as sparse computation is not only hard to parallelize due to the irregular nature of the sparse data, but also complicated to implement in terms of rewriting a dense algorithm into a sparse one. Hence, foolproof sparse computation requires customized data structures to encode the sparsity of the sparse data and new algorithms to mask the complexity of the sparse computation. However, by carefully exploiting the sparse data structures and algorithms, sparse computation can reduce memory consumption, communication volume, and processing power and thus undoubtedly move the scalability boundaries compared to its dense equivalent. In this dissertation, I explain how to use parallel and distributed computing techniques in the presence of sparsity to solve large scientific problems including graph analytics and deep learning. To meet this end goal, I leverage the duality between graph theory and sparse linear algebra primitives, and thus solve graph analytics and deep learning problems with the sparse matrix operations. 
My contributions are fourfold: (1) design and implementation of a new distributed compressed sparse matrix data structure that reduces both computation and communication volumes and is suitable for sparse matrix-vector and sparse matrix-matrix operations, (2) introducing the new MPI*X parallelism model that deems threads as basic units of computing and communication, (3) optimizing sparse matrix-matrix multiplication by employing different hashing techniques, and (4) proposing the new data-then-model parallelism that mitigates the effect of stragglers in sparse deep learning by combining data and model parallelisms. Altogether, these contributions provide a set of data structures and algorithms to accelerate and scale the sparse computing and communication
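The duality between graph operations and sparse linear algebra mentioned in this abstract can be illustrated with a small single-machine sketch. It is not the dissertation's distributed data structure: it uses the standard compressed sparse row (CSR) layout, and the names `CSRMatrix`, `from_dense`, `spmv`, and `bfs_levels` are hypothetical, chosen only for illustration. Breadth-first search is written as repeated sparse matrix-vector products over the transposed adjacency matrix.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CSRMatrix:
    """Compressed sparse row storage: only nonzero entries are kept."""
    n_rows: int
    n_cols: int
    indptr: List[int]    # indptr[i]:indptr[i + 1] delimits the nonzeros of row i
    indices: List[int]   # column index of each stored nonzero
    data: List[float]    # value of each stored nonzero


def from_dense(dense: List[List[float]]) -> CSRMatrix:
    """Build a CSR matrix from a dense list-of-rows matrix."""
    indptr, indices, data = [0], [], []
    for row in dense:
        for j, value in enumerate(row):
            if value != 0:
                indices.append(j)
                data.append(float(value))
        indptr.append(len(indices))
    n_cols = len(dense[0]) if dense else 0
    return CSRMatrix(len(dense), n_cols, indptr, indices, data)


def spmv(a: CSRMatrix, x: List[float]) -> List[float]:
    """Sparse matrix-vector product y = A x, touching only stored nonzeros."""
    y = [0.0] * a.n_rows
    for i in range(a.n_rows):
        for k in range(a.indptr[i], a.indptr[i + 1]):
            y[i] += a.data[k] * x[a.indices[k]]
    return y


def bfs_levels(adj_t: CSRMatrix, source: int) -> List[int]:
    """BFS as repeated SpMV; adj_t[i][j] != 0 means there is an edge j -> i."""
    levels = [-1] * adj_t.n_rows
    frontier = [0.0] * adj_t.n_rows
    frontier[source] = 1.0
    levels[source] = 0
    depth = 0
    while any(frontier):
        depth += 1
        reached = spmv(adj_t, frontier)      # one SpMV expands the frontier
        frontier = [0.0] * adj_t.n_rows
        for v, weight in enumerate(reached):
            if weight != 0 and levels[v] == -1:
                levels[v] = depth            # first time v is reached
                frontier[v] = 1.0
    return levels


# Edges 0->1, 0->2, 1->3, 2->3, stored transposed (row = destination vertex).
graph_t = from_dense([
    [0, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [0, 1, 1, 0],
])
print(bfs_levels(graph_t, 0))
```

Because each step visits only stored nonzeros, the work per BFS level is proportional to the number of edges stored rather than to the square of the vertex count, which is the kind of saving the abstract attributes to sparse computation.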

    Inter-workgroup barrier synchronisation on graphics processing units

    GPUs are parallel devices that are able to run thousands of independent threads concurrently. Traditional GPU programs are data-parallel, requiring little to no communication, i.e. synchronisation, between threads. However, classical concurrency in the context of CPUs often exploits synchronisation idioms that are not supported on GPUs. By studying such idioms on GPUs, with an aim to facilitate them in a portable way, a wider and more generic space of GPU applications can be made possible. While the breadth of this thesis extends to many aspects of GPU systems, the common thread throughout is the global barrier: an execution barrier that synchronises all threads executing a GPU application. The idea of such a barrier might seem straightforward; however, this investigation reveals many challenges and insights. In particular, this thesis includes the following studies: Execution models: while a general global barrier can deadlock due to starvation on GPUs, it is shown that the scheduling guarantees of current GPUs can be used to dynamically create an execution environment that allows for a safe and portable global barrier across a subset of the GPU threads. Application optimisations: a set of GPU optimisations is examined that are tailored for graph applications, including one optimisation enabled by the global barrier. It is shown that these optimisations can provide substantial performance improvements, e.g. the barrier optimisation achieves over a 10X speedup on AMD and Intel GPUs. The performance portability of these optimisations is investigated, as their utility varies across input, application, and architecture. Multitasking: because many GPUs do not support preemption, long-running GPU compute tasks (e.g. applications that use the global barrier) may block other GPU functions, including graphics. A simple cooperative multitasking scheme is proposed that allows graphics tasks to meet their deadlines with reasonable overheads. Open Acces
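As a rough CPU-side analogy for the global barrier described in this abstract, the sketch below uses Python's standard `threading.Barrier` to separate two phases of a computation. It is not GPU code and not the thesis's barrier; the function name `run_phases` and the two-phase workload are assumptions made for illustration.

```python
import threading
from typing import List


def run_phases(n_threads: int) -> List[int]:
    """Run a two-phase computation with a barrier between the phases."""
    barrier = threading.Barrier(n_threads)
    partial = [0] * n_threads
    totals = [0] * n_threads

    def worker(tid: int) -> None:
        partial[tid] = tid + 1          # phase 1: each thread writes its own slot
        barrier.wait()                  # no thread proceeds until all have written
        totals[tid] = sum(partial)      # phase 2: safe to read every slot

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return totals


print(run_phases(4))
```

The barrier here completes because every participating thread is eventually scheduled by the operating system. The abstract's point is that a GPU offers no such guarantee for all threads in general, which is why a naive global barrier can deadlock there.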

    Dynamic Optimization Techniques for Resource-Efficient Execution of Distributed Machine Learning

    Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Computer Science and Engineering, February 2020. Byung-Gon Chun. Machine learning (ML) systems are widely used to extract insights from data. Ever-increasing dataset sizes and model complexity have given rise to many efforts towards efficient distributed machine learning systems. One of the popular approaches to support large-scale data and complicated models is the parameter server (PS) approach. In this approach, a training job runs with distributed worker and server tasks, where workers iteratively compute gradients to update the global model parameters that are kept in servers. To improve PS system performance, this dissertation proposes two solutions that automatically optimize resource efficiency and system performance. First, we propose a solution that optimizes the resource configuration and workload partitioning of distributed ML training on a PS system. To find the best configuration, we build an Optimizer based on a cost model that works with online metrics. To efficiently apply decisions made by the Optimizer, we design our runtime to be elastic, performing reconfiguration in the background with minimal overhead. The second solution optimizes the scheduling of resources and tasks of multiple ML training jobs in a shared cluster. Specifically, we co-locate jobs with complementary resource use to increase resource utilization, while executing their tasks in fine-grained units to avoid resource contention. To alleviate memory pressure from co-located jobs, we enable dynamic spill/reload of data, which adaptively changes the ratio of data between disk and memory. We build a working system that implements our approaches. The above two solutions are implemented in the same system and share the runtime part that can dynamically migrate jobs between machines and reallocate machine resources. 
We evaluate our system with popular ML applications to verify the effectiveness of our solutions, and it shows marked performance improvements over existing systems.
Contents:
Chapter 1. Introduction: 1.1 Distributed Machine Learning on Parameter Servers; 1.2 Automating System Configuration of Distributed Machine Learning; 1.3 Scheduling of Multiple Distributed Machine Learning Jobs; 1.4 Contributions; 1.5 Dissertation Structure.
Chapter 2. Background.
Chapter 3. Automating System Configuration of Distributed Machine Learning: 3.1 System Configuration Challenges; 3.2 Finding Good System Configuration (3.2.1 Cost Model; 3.2.2 Cost Formulation; 3.2.3 Optimization); 3.3 Cruise (3.3.1 Optimizer; 3.3.2 Elastic Runtime); 3.4 Evaluation (3.4.1 Experimental Setup; 3.4.2 Finding Baselines with Grid Search; 3.4.3 Optimization in the Homogeneous Environment; 3.4.4 Utilizing Opportunistic Resources; 3.4.5 Optimization in the Heterogeneous Environment; 3.4.6 Reconfiguration Speed); 3.5 Related Work; 3.6 Summary.
Chapter 4. A Scheduling Framework Optimized for Multiple Distributed Machine Learning Jobs: 4.1 Resource Under-utilization Problems in PS ML Training; 4.2 Harmony Overview; 4.3 Multiplexing ML Jobs (4.3.1 Fine-grained Execution with Subtasks; 4.3.2 Dynamic Grouping of Jobs; 4.3.3 Dynamic Data Reloading); 4.4 Evaluation (4.4.1 Baselines; 4.4.2 Experimental Setup; 4.4.3 Performance Comparison; 4.4.4 Performance Breakdown; 4.4.5 Workload Sensitivity Analysis; 4.4.6 Accuracy of the Performance Model; 4.4.7 Performance and Scalability of the Scheduling Algorithm; 4.4.8 Dynamic Data Reloading); 4.5 Discussion; 4.6 Related Work; 4.7 Summary.
Chapter 5. Conclusion: 5.1 Summary; 5.2 Future Work (5.2.1 Other Communication Architecture Support; 5.2.2 Deep Learning & GPU Resource Support).
Abstract (in Korean). Docto
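The parameter server pattern described in this abstract, where workers repeatedly pull the global parameters, compute gradients on their own data shard, and push updates back, can be sketched in a few lines. This is a toy, single-process illustration and not the thesis's system: the names `ParameterServer`, `pull`, `push`, `worker_gradient`, and `train` are hypothetical, the model is a plain linear regression, and the workers run one after another rather than in parallel.

```python
from typing import List, Sequence, Tuple

Example = Tuple[List[float], float]   # (feature vector, target)


class ParameterServer:
    """Holds the global model parameters and applies pushed gradients."""

    def __init__(self, dim: int, learning_rate: float) -> None:
        self.weights = [0.0] * dim
        self.learning_rate = learning_rate

    def pull(self) -> List[float]:
        """Return a copy of the current global parameters."""
        return list(self.weights)

    def push(self, gradient: Sequence[float]) -> None:
        """Apply one gradient-descent step to the global parameters."""
        for i, g in enumerate(gradient):
            self.weights[i] -= self.learning_rate * g


def worker_gradient(weights: Sequence[float], shard: Sequence[Example]) -> List[float]:
    """Mean-squared-error gradient of a linear model over one non-empty shard."""
    gradient = [0.0] * len(weights)
    for features, target in shard:
        error = sum(w * x for w, x in zip(weights, features)) - target
        for i, x in enumerate(features):
            gradient[i] += 2.0 * error * x / len(shard)
    return gradient


def train(server: ParameterServer, shards: Sequence[Sequence[Example]], iterations: int) -> None:
    """Each iteration, every worker pulls, computes a gradient, and pushes."""
    for _ in range(iterations):
        for shard in shards:          # sequential stand-in for parallel workers
            weights = server.pull()
            server.push(worker_gradient(weights, shard))


# Two workers, each holding a shard of data generated from y = 2 * x.
shards = [
    [([1.0], 2.0), ([2.0], 4.0)],
    [([3.0], 6.0)],
]
server = ParameterServer(dim=1, learning_rate=0.05)
train(server, shards, iterations=200)
print(server.pull())
```

In a real deployment the number of workers and servers and the way data is partitioned across them all affect throughput, which is the configuration space the abstract says its Optimizer searches.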

    Orchestration of music emotion recognition services - automating deployment, scaling and management

    Every day, thousands of new songs are created and distributed over the internet. These ever-increasing databases have introduced the need for automatic search and organization methods that allow users to better filter and browse such collections. However, fundamental research in the music emotion recognition (MER) field is very academic, with the typical work presenting results in the form of classification metrics (how well the approach worked on the tested datasets) and providing access to the data and methods. To overcome this problem, we built and deployed a platform to orchestrate a distributed, resilient, and scalable MER application using Kubernetes that can be easily expanded in the future. The solution developed is based on a proof of concept that explored the usage of containers and microservices in MER but had some gaps. We reengineered and expanded it, proposing a properly orchestrated, container-based solution and adopting a DevOps development culture with continuous integration (CI) and continuous delivery (CD) that, in an automated way, makes it easy for the different teams to focus on developing new blocks separately. At the application level, instead of analyzing the audio signal using only three audio features, the system now combines a large number of audio and lyric (text) features, explores different parts of the audio (vocals, accompaniment) in segments (e.g., 30-second segments instead of the full song), and uses properly trained machine learning (ML) classifiers, a contribution by Tiago António. At the orchestration level, it uses Kubernetes with Calico as the networking plugin, providing networking for the containers and pods, and Rook with Ceph for persistent block and file storage. To allow external traffic into the cluster, it uses HAProxy as an external ingress controller on an external node, with BIRD providing BGP peering with Calico, allowing communication between the pods and the external node. ArgoCD was selected as the continuous delivery tool, constantly syncing with a git repository and thus keeping the state of the cluster manifests up to date, which fully abstracts developers from the infrastructure. A monitoring stack combining Prometheus, Alertmanager and Grafana allows the constant monitoring of running applications and cluster status, collecting metrics that can help to understand the state of operations. The administration of the cluster can be carried out in a simplified way using Portainer. The continuous integration pipelines run on GitHub Actions, integrating software and security tests, automatically building new versions of the containers based on tag releases, and publishing them on Docker Hub. This implementation is fully cloud native and backed only by open source software