Search CORE

2,711 research outputs found

DeSyRe: on-Demand System Reliability

Author: Armato Antonino
Bouganis Christos-Savvas
Falsafi Babak
Gaydadjiev Georgi
Isaza Sebastian
Malek Alirad
Mariani Riccardo
Pnevmatikatos Dionisios N
Pradhan Dhiraj K
Rauwerda Gerard
Seepers Robert
Shafik Rishad Ahmed
Sourdis Ioannis
Strydis Christos
Sunesen Kim
Theodoropoulos Dimitris
Tzilis Stavros
Vavouras Michail
Publication venue: 'Elsevier BV'
Publication date: 01/01/2013
Field of study

The DeSyRe project builds on-demand adaptive and reliable Systems-on-Chips (SoCs). As fabrication technology scales down, chips are becoming less reliable, thereby incurring increased power and performance costs for fault tolerance. To make matters worse, power density is becoming a significant limiting factor in SoC design, in general. In the face of such changes in the technological landscape, current solutions for fault tolerance are expected to introduce excessive overheads in future systems. Moreover, attempting to design and manufacture a totally defect and fault-free system, would impact heavily, even prohibitively, the design, manufacturing, and testing costs, as well as the system performance and power consumption. In this context, DeSyRe delivers a new generation of systems that are reliable by design at well-balanced power, performance, and design costs. In our attempt to reduce the overheads of fault-tolerance, only a small fraction of the chip is built to be fault-free. This fault-free part is then employed to manage the remaining fault-prone resources of the SoC. The DeSyRe framework is applied to two medical systems with high safety requirements (measured using the IEC 61508 functional safety standard) and tight power and performance constraints

Southampton (e-Prints Soton)

EUR Research Repository

Chalmers Research

Chalmers Publication Library

Explore Bristol Research

Dynamic load balancing for the distributed mining of molecular structures

Author: Berthold M.R.
Di Fatta Giuseppe
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2006
Field of study

In molecular biology, it is often desirable to find common properties in large numbers of drug candidates. One family of methods stems from the data mining community, where algorithms to find frequent graphs have received increasing attention over the past years. However, the computational complexity of the underlying problem and the large amount of data to be explored essentially render sequential algorithms useless. In this paper, we present a distributed approach to the frequent subgraph mining problem to discover interesting patterns in molecular compounds. This problem is characterized by a highly irregular search tree, whereby no reliable workload prediction is available. We describe the three main aspects of the proposed distributed algorithm, namely, a dynamic partitioning of the search space, a distribution process based on a peer-to-peer communication framework, and a novel receiverinitiated load balancing algorithm. The effectiveness of the distributed method has been evaluated on the well-known National Cancer Institute’s HIV-screening data set, where we were able to show close-to linear speedup in a network of workstations. The proposed approach also allows for dynamic resource aggregation in a non dedicated computational environment. These features make it suitable for large-scale, multi-domain, heterogeneous environments, such as computational grids

KOPS - The Institutional Repository of the University of Konstanz

Central Archive at the University of Reading

Crossref

SplitPlace: AI augmented splitting and placement of large-scale neural networks in mobile edge environments

Author: Casale G
Jennings NR
Tuli S
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 20/05/2022
Field of study

In recent years, deep learning models have become ubiquitous in industry and academia alike. Deep neural networks can solve some of the most complex pattern-recognition problems today, but come with the price of massive compute and memory requirements. This makes the problem of deploying such large-scale neural networks challenging in resource-constrained mobile edge computing platforms, specifically in mission-critical domains like surveillance and healthcare. To solve this, a promising solution is to split resource-hungry neural networks into lightweight disjoint smaller components for pipelined distributed processing. At present, there are two main approaches to do this: semantic and layer-wise splitting. The former partitions a neural network into parallel disjoint models that produce a part of the result, whereas the latter partitions into sequential models that produce intermediate results. However, there is no intelligent algorithm that decides which splitting strategy to use and places such modular splits to edge nodes for optimal performance. To combat this, this work proposes a novel AI-driven online policy, SplitPlace, that uses Multi-Armed-Bandits to intelligently decide between layer and semantic splitting strategies based on the input task's service deadline demands. SplitPlace places such neural network split fragments on mobile edge devices using decision-aware reinforcement learning for efficient and scalable computing. Moreover, SplitPlace fine-tunes its placement engine to adapt to volatile environments. Our experiments on physical mobile-edge environments with real-world workloads show that SplitPlace can significantly improve the state-of-the-art in terms of average response time, deadline violation rate, inference accuracy, and total reward by up to 46, 69, 3 and 12 percent respectively

Spiral - Imperial College Digital Repository

분산 기계 학습의 자원 효율적인 수행을 위한 동적 최적화 기술

Author: 이우연
Publication venue: 서울대학교 대학원
Publication date: 01/02/2020
Field of study

학위논문(박사)--서울대학교 대학원 :공과대학 컴퓨터공학부,2020. 2. 전병곤.Machine Learning(ML) systems are widely used to extract insights from data. Ever increasing dataset sizes and model complexity gave rise to many efforts towards efﬁcient distributed machine learning systems. One of the popular approaches to support large scale data and complicated models is the parameter server (PS) approach. In this approach, a training job runs with distributed worker and server tasks, where workers iteratively compute gradients to update the global model parameters that are kept in servers. To improve the PS system performance, this dissertation proposes two solutions that automatically optimize resource efﬁciency and system performance. First, we propose a solution that optimizes the resource conﬁguration and workload partitioning of distributed ML training on PS system. To ﬁnd the best conﬁguration, we build an Optimizer based on a cost model that works with online metrics. To efﬁciently apply decisions by Optimizer, we design our runtime elastic to perform reconﬁguration in the background with minimal overhead. The second solution optimizes the scheduling of resources and tasks of multiple ML training jobs in a shared cluster. Speciﬁcally, we co-locate jobs with complementary resource use to increase resource utilization, while executing their tasks with ﬁne-grained unit to avoid resource contention. To alleviate memory pressure by co-located jobs, we enable dynamic spill/reload of data, which adaptively changes the ratio of data between disk and memory. We build a working system that implements our approaches. The above two solutions are implemented in the same system and share the runtime part that can dynamically migrate jobs between machines and reallocate machine resources. We evaluate our system with popular ML applications to verify the effectiveness of our solutions.기계 학습 시스템은 데이터에 숨겨진 의미를 뽑아내기 위해 널리 사용되고 있다. 데이터셋의 크기와 모델의 복잡도가 어느때보다 커짐에 따라 효율적인 분산 기계 학습 시스템을위한 많은 노력들이 이루어지고 있다. 파라미터 서버 방식은 거대한 스케일의 데이터와 복잡한 모델을 지원하기 위한 유명한 방법들 중 하나이다. 이 방식에서, 학습 작업은 분산 워커와 서버들로 구성되고, 워커들은 할당된 입력 데이터로부터 반복적으로 그레디언트를 계산하여 서버들에 보관된 글로벌 모델 파 라미터들을 업데이트한다. 파라미터 서버 시스템의 성능을 향상시키기 위해, 이 논문에서는 자동적으로 자원 효율성과 시스템 성능을 최적화하는 두가지의 해법을 제안한다. 첫번째 해법은, 파라미터 시스템에서 분산 기계 학습을 수행시에 자원 설정 및 워크로드 분배를 자동화하는 것이다. 최고의 설정을 찾기 위해 우리는 온라인 메트릭을 사용하는 비용 모델을 기반으로 하는 Optimizer를 만들었다. Optimizer의 결정을 효율적으로 적용하기 위해, 우리는 런타임을 동적 재설정을 최소의 오버헤드로 백그라운드에서 수행하도록 디자인했다. 두번째 해법은 공유 클러스터 상황에서 여러 개의 기계 학습 작업의 세부 작업 과 자원의 스케쥴링을 최적화한 것이다. 구체적으로, 우리는 세부 작업들을 세밀한 단위로 수행함으로써 자원 경쟁을 억제하고, 서로를 보완하는 자원 사용 패턴을 보이는 작업들을 같은 자원에 함께 위치시켜 자원 활용율을 끌어올렸다. 함께 위치한 작업들의 메모리 압력을 경감시키기 위해 우리는 동적으로 데이터를 디스크로 내렸다가 다시 메모리로 읽어오는 기능을 지원함과 동시에, 디스크와 메모리간의 데이터 비율을 상황에 맞게 시스템이 자동으로 맞추도록 하였다. 위의 해법들을 실체화하기 위해, 실제 동작하는 시스템을 만들었다. 두가지의 해법을 하나의 시스템에 구현함으로써, 동적으로 작업을 머신 간에 옮기고 자원을 재할당할 수 있는 런타임을 공유한다. 해당 솔루션들의 효과를 보여주기 위해, 이 시스템을 많이 사용되는 기계 학습 어플리케이션으로 실험하였고 기존 시스템들 대비 뛰어난 성능 향상을 보여주었다.Chapter1. Introduction 1 1.1 Distributed Machine Learning on Parameter Servers 1 1.2 Automating System Conguration of Distributed Machine Learning 2 1.3 Scheduling of Multiple Distributed Machine Learning Jobs 3 1.4 Contributions 5 1.5 Dissertation Structure 6 Chapter2. Background 7 Chapter3. Automating System Conguration of Distributed Machine Learning 10 3.1 System Conguration Challenges 11 3.2 Finding Good System Conguration 13 3.2.1 Cost Model 13 3.2.2 Cost Formulation 15 3.2.3 Optimization 16 3.3 Cruise 18 3.3.1 Optimizer 19 3.3.2 Elastic Runtime 21 3.4 Evaluation 26 3.4.1 Experimental Setup 26 3.4.2 Finding Baselines with Grid Search 28 3.4.3 Optimization in the Homogeneous Environment 28 3.4.4 Utilizing Opportunistic Resources 30 3.4.5 Optimization in the Heterogeneous Environment 31 3.4.6 Reconguration Speed 32 3.5 Related Work 33 3.6 Summary 34 Chapter4 A Scheduling Framework Optimized for Multiple Distributed Machine Learning Jobs 36 4.1 Resource Under-utilization Problems in PS ML Training 37 4.2 Harmony Overview 42 4.3 Multiplexing ML Jobs 43 4.3.1 Fine-grained Execution with Subtasks 44 4.3.2 Dynamic Grouping of Jobs 45 4.3.3 Dynamic Data Reloading 54 4.4 Evaluation 56 4.4.1 Baselines 56 4.4.2 Experimental Setup 57 4.4.3 Performance Comparison 59 4.4.4 Performance Breakdown 59 4.4.5 Workload Sensitivity Analysis 61 4.4.6 Accuracy of the Performance Model 63 4.4.7 Performance and Scalability of the Scheduling Algorithm 64 4.4.8 Dynamic Data Reloading 66 4.5 Discussion 67 4.6 Related Work 67 4.7 Summary 70 Chapter5 Conclusion 71 5.1 Summary 71 5.2 Future Work 71 5.2.1 Other Communication Architecture Support 71 5.2.2 Deep Learning & GPU Resource Support 72 요약 81Docto

SNU Open Repository and Archive

DyScale: A MapReduce Job Scheduler for Heterogeneous Multicore Processors

Author: Cherkasova Ludmila
Smirni Evgenia
Yan Feng
Zhang Zhuoyao
Publication venue: W&M ScholarWorks
Publication date: 01/07/2017
Field of study

The functionality of modern multi-core processors is often driven by a given power budget that requires designers to evaluate different decision trade-offs, e.g., to choose between many slow, power-efficient cores, or fewer faster, power-hungry cores, or a combination of them. Here, we prototype and evaluate a new Hadoop scheduler, called DyScale, that exploits capabilities offered by heterogeneous cores within a single multi-core processor for achieving a variety of performance objectives. A typical MapReduce workload contains jobs with different performance goals: large, batch jobs that are throughput oriented, and smaller interactive jobs that are response time sensitive. Heterogeneous multi-core processors enable creating virtual resource pools based on slow and fast cores for multi-class priority scheduling. Since the same data can be accessed with either slow or fast slots, spare resources (slots) can be shared between different resource pools. Using measurements on an actual experimental setting and via simulation, we argue in favor of heterogeneous multi-core processors as they achieve faster (up to 40 percent) processing of small, interactive MapReduce jobs, while offering improved throughput (up to 40 percent) for large, batch jobs. We evaluate the performance benefits of DyScale versus the FIFO and Capacity job schedulers that are broadly used in the Hadoop community

College of William & Mary: W&M Publish